Parsing into Map #49

@piotrrzysko

Introduction

Sometimes users want to parse a JSON object into a map.

Let's assume that we have the following example object:

{
    "intKey": 123,
    "objKey": {
        "key1": "abc",
        "key2": false
    },
    "arrayKey": [1, 2, 3]
}

We expect the parser to produce a Map<String, Object> from which we should be able to extract the object's fields in the following way:

Map<String, Object> map = parser.parse(bytes, bytes.length, Map.class);

int intValue = (int) map.get("intKey");

Map<String, Object> obj = (Map<String, Object>) map.get("objKey");
String value1 = (String) obj.get("key1");
boolean value2 = (boolean) obj.get("key2");

List<Object> array = (List<Object>) map.get("arrayKey");

Question

Let’s assume that the parser exposes an API like:

Map<String, Object> map = parser.parse(bytes, bytes.length, Map.class);

The returned map is immutable.

JSON parsing benchmarks often show that, in Java, creating new strings takes a significant portion of the time. So, the question is: at which stage should this happen? I see two options:

Option 1

Map<String, Object> map = parser.parse(bytes, bytes.length, Map.class);

// at this point all Strings are created

String value1 = (String) map.get("key"); // this doesn’t create a new one
String value2 = (String) map.get("key"); // this doesn’t create a new one either

Option 2

Map<String, Object> map = parser.parse(bytes, bytes.length, Map.class);

// at this point, the map only holds its own copy of a byte array with all parsed strings, but no instance of String has been created so far

String value1 = (String) map.get("key"); // this creates a new instance of String
String value2 = (String) map.get("key"); // this also creates a new instance of String

I suppose the second option is far more efficient when someone reads only a small subset of the fields and reads each of them only once. With the first option, the cost of creating every string is paid up front, even for fields that are never accessed.
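To make the second option more concrete, here is a minimal sketch of how such a lazily decoding container could behave. Everything in it is an assumption made for illustration: the class name `LazyStringMap`, the flat `slices` layout, and the linear key lookup are hypothetical and are not the parser's actual design. It also covers only string values and does not implement `java.util.Map`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of Option 2: string values stay as UTF-8 bytes until requested. */
final class LazyStringMap {
    // The map's own copy of the bytes of all keys and string values.
    private final byte[] buffer;
    // Four ints per entry: keyOffset, keyLength, valueOffset, valueLength.
    private final int[] slices;

    LazyStringMap(byte[] buffer, int[] slices) {
        // Defensive copies keep the container immutable from the caller's side.
        this.buffer = buffer.clone();
        this.slices = slices.clone();
    }

    /** Decodes the value on every call, so each call allocates a new String. */
    String get(String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < slices.length; i += 4) {
            int keyOffset = slices[i];
            int keyLength = slices[i + 1];
            // Compare raw bytes so that no String is created for the stored keys.
            if (Arrays.equals(buffer, keyOffset, keyOffset + keyLength,
                              keyBytes, 0, keyBytes.length)) {
                return new String(buffer, slices[i + 2], slices[i + 3], StandardCharsets.UTF_8);
            }
        }
        return null; // key not present
    }
}
```

With a buffer holding the bytes of `key1abc` and `slices` equal to `{0, 4, 4, 3}`, two calls to `get("key1")` return equal but distinct `String` instances, which is exactly the trade-off in question: nothing is allocated for fields that are never read, but repeated reads of the same field pay the decoding cost each time.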

@ZhaiMo15 @zekronium since you reported this topic, what are your thoughts? I’d like to understand your use cases better to be able to choose a more suitable option or come up with something else.
