Sleepless Dev: Boon / Jackson discussion between Hightower and Cowtowncoder.

I am mostly putting this here so I can read it later just in case the issue gets deleted. Boon got a visit from Cowtowncoder.

cowtowncoder opened this issue 8 hours ago

Improvement to `JacksonSerializer.roundTrip()`

Existing code serializes content as a String, but given that Jackson heavily optimizes byte stream case, and since parse tests take in a byte[] or InputStream, it would make more sense to use:

byte[] content = mapper.writeValueAsBytes(alltype);

and feed that to parser. Combination will be significantly faster than construction to and parsing from a String (esp. when using StringWriter; there is mapper.writeValueAsString() method as well that is marginally faster).

The same output method probably also makes sense for serialize() method, looking at matching Boon serializer implementation.

3 participants

slandelle commented 8 hours ago

Sorry, I don't get it. The benchmark focuses on deserialization, not serialization => read from bytes, not write to bytes.

slandelle commented 8 hours ago

Ah, maybe you're talking about additional tests that Boon's author has done in his fork?

cowtowncoder commented 8 hours ago

Ok sorry if that is the case: I did start with the work, and did not verify the original code base.
I will verify and close this if irrelevant.

cowtowncoder commented 8 hours ago

Indeed. Did not realize there is an actual fork there. Apologies for noise.

Closed

cowtowncoder closed the issue 8 hours ago

RichardHightower commented 6 hours ago

Hi! @cowtowncoder
The repo is here:
https://github.com/RichardHightower/json-parsers-benchmark

It has code for serializing and parsing as you mentioned.

RE: Existing code serializes content as a String, but given that Jackson heavily optimizes byte stream case, and since parse tests take in a byte[] or InputStream, it would make more sense to use:

byte[] content = mapper.writeValueAsBytes(alltype);

Seems logical as a benchmark. Not sure it makes the other benchmark invalid, as there are users who do want strings. Boon is optimized to serialize with char[]. I don't have a byte[] serializer.

RE: and feed that to parser. Combination will be significantly faster than construction to and parsing from a String (esp. when using StringWriter; there is mapper.writeValueAsString() method as well that is marginally faster).

RE: The same output method probably also makes sense for serialize() method, looking at matching Boon serializer implementation.

The thought of a byte serializer never crossed my mind. Currently there is only a char[] serializer. I like the idea. Not sure it is high on my list of priorities, as most of the buffer sizes that I am dealing with in production are satisfied fine with String.getBytes. I'll test Jackson vs. Boon going to and from bytes only. This should put Boon at an equal disadvantage to where Jackson is now. I don't think it negates the earlier benchmark, and in fact I think the benchmark in question is a much more common case. But that is conjecture for both of us.
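To make the two routes to bytes concrete, here is a minimal sketch that uses only the JDK. It is not Boon's or Jackson's API: the class `CharsToBytes` and its methods `viaString` and `direct` are hypothetical names for illustration. It shows that going through a String adds an intermediate copy, while a charset encoder can go from a char[] straight to a byte[]. Whether the direct route is actually faster for small payloads would have to be measured.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not part of Boon or Jackson).
final class CharsToBytes {

    // Route 1: char[] -> String -> byte[] (the intermediate String is an extra copy).
    static byte[] viaString(char[] json, int length) {
        return new String(json, 0, length).getBytes(StandardCharsets.UTF_8);
    }

    // Route 2: encode the char[] directly, skipping the intermediate String.
    static byte[] direct(char[] json, int length) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(json, 0, length));
        byte[] out = new byte[encoded.remaining()];
        encoded.get(out);
        return out;
    }

    public static void main(String[] args) {
        char[] json = "{\"myLong\":1389916800000,\"name\":\"caf\u00e9\"}".toCharArray();
        // Both routes should produce the same UTF-8 bytes.
        System.out.println(Arrays.equals(viaString(json, json.length), direct(json, json.length)));
    }
}
```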

cowtowncoder commented 4 hours ago

@RichardHightower right, it is difficult to have proper apples-to-apples tests. In this case it is not so much that output as byte[] is more efficient (it may be marginally so but not significantly), but that parsing is.

I think it is perfectly reasonable to have different sources/targets to test, and report these separately (esp. for parsing). For serialization, perhaps it would make sense to also have OutputStream output type, which should be relatively fair, since it does not assume that the serializer has an efficient way of producing a byte[]?

For round-tripping use case it seems to me that the intermediate format should be the most efficient one for implementation; this does require some knowledge of implementation. I was thinking that this use case perhaps emulates case like writing a JSON file to disk, reading it back; or web service call (although payload would differ in directions), or maybe send-modify-return.

One last thing: I would have filed this at the forked repo, but I couldn't find an issue tracker. Maybe github does not have all the facilities for forks.

RichardHightower commented 4 hours ago

I am a github neophyte. I can create a new benchmark project. Or you can
add the issues directly to boon.

I see your point about mapper.writeValueAsString.

I am curious to know the difference. I think curiosity has been a driving
factor. I suspect for larger files this will tip the scale towards Jackson.

Background: Boon was written and it needed JSON. I decided to write a JSON
parser to reduce the dependencies with the thought being you could always
just use GSON or Jackson if the feature set / use case was not a direct
match but Boon has its own.

I have been using Jackson from 2009 and maybe before. I have never used
GSON in production.

I was aiming / optimizing for REST calls and Websocket messages around the
1k to 1mb range. Boon JSON is tuned/optimized for service calls and
messaging not really large files. Also I am using it in somewhat reactive
style so I always have complete buffers to parse and typically only accept
a buffer when it is complete. I've used it in conjunction with a file
batcher that writes a JSON entry per line and I quit benchmarking it after
300mb per second sustained writing to disk (well disk array storage
anyway). I only needed 30 mb per second for this app. The file had JSON in
it but it was not a JSON file as it was one JSON entry per line (1k or
so each line).

The largest service call I have to deal with is about 500k but this is more
or less a data push update. End user wise the service calls are typically
under 1k. The returns are under 10k. This pretty much describes every SOA /
REST / Websocket app that I have ever worked with and I have worked with
quite a few.

Boon had an IO lib and an in-memory no-SQL-ish lib and will one day have a
REST framework and an MVC framework and a mustache template lib. It needs
JSON but JSON is not Boon.

In short I am going to make the string/serialize parse as fair as possible.

I will add a byte parse and serialize parse.

Also I realize that Jackson and boon have different design goals and there
are features and things that Jackson does that boon does not.

I compared boon to jackson because that is the first question people ask
me. How does it compare to Jackson? No one asks me about GSON but I don't
hang out with Android developers. I am an old java dog.

…

cowtowncoder commented 2 hours ago

@RichardHightower I thought I recognized your name from NFJS? Unless I am confusing you with another Mr. Hightower. :)
I appreciate your explanation on goals for Boon. It is always interesting to read about that part, since it makes it much easier to understand various implementation choices and strategies.

The part about writeValueAsString() is really just about how StringWriter works, which like ByteArrayOutputStream keeps on doubling up, reallocating its buffer. If the end result is String, that isn't very optimal unless initial size guess is correct. With segments, one can reduce allocations. But it's probably not a huge deal in the end. Same is true for writeValueAsBytes: intermediate storage is segmented, and final allocation is done when total length is known, instead of doubling up until full size is known, and then making one more copy.
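The segmented approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Jackson's actual code: the class name `SegmentedBytes` and the deliberately tiny `SEGMENT_SIZE` are made up for the example. Writes fill fixed-size segments without ever copying earlier data, and the only exact-size allocation happens once the total length is known.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of segmented intermediate storage (not Jackson's implementation).
final class SegmentedBytes {
    private static final int SEGMENT_SIZE = 4; // tiny on purpose, to force several segments

    private final List<byte[]> full = new ArrayList<>(); // completed segments
    private byte[] current = new byte[SEGMENT_SIZE];     // segment being filled
    private int used = 0;                                // bytes used in the current segment
    private int total = 0;                               // bytes written overall

    void write(byte b) {
        if (used == current.length) {
            // Start a new segment; earlier data is never copied at this point.
            full.add(current);
            current = new byte[SEGMENT_SIZE];
            used = 0;
        }
        current[used++] = b;
        total++;
    }

    void write(byte[] src) {
        for (byte b : src) {
            write(b);
        }
    }

    // One exact-size allocation, done only once the total length is known.
    byte[] toByteArray() {
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] segment : full) {
            System.arraycopy(segment, 0, out, pos, segment.length);
            pos += segment.length;
        }
        System.arraycopy(current, 0, out, pos, used);
        return out;
    }

    int segmentCount() {
        return full.size() + 1;
    }
}
```

A doubling buffer, by contrast, copies everything written so far each time it grows, and typically copies once more to trim the result to size.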

Bigger impact for round-trip was simply just that the intermediate Object would be efficient to create & consume. I had a look at results you found, and round-trip was one where I couldn't quite see where the difference comes from.

Other cases (where Boon does very well -- impressive!) make more sense to me.
Jackson optimizes heavily for POJO data-binding case to/from raw byte input; and specifically binding as Maps is probably somewhat sub-optimal: I suspect that the one dictionary JSON where keys are numbers is particularly tricky for Jackson due to symbol table churn. Basically, Jackson assumes that keys are mostly repetitive; but if all keys are unique, this does not hold. Which is fine, except that for larger files it starts to degrade performance.
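The effect of repetitive versus unique keys can be illustrated with a toy key cache. This is a sketch only and does not reproduce Jackson's symbol tables: `KeyCache`, its size limit of 64, and the clear-when-full eviction are all assumptions made for the example. With repeated field names nearly every lookup is a hit; with all-unique keys every lookup is a miss and the table is repeatedly rebuilt.

```java
import java.util.HashMap;
import java.util.Map;

// Toy key-canonicalization cache (hypothetical; not Jackson's symbol table).
final class KeyCache {
    private final Map<String, String> canonical = new HashMap<>();
    private final int maxEntries;
    int hits = 0;
    int misses = 0;

    KeyCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns a shared instance for the key, adding it the first time it is seen.
    String intern(String key) {
        String known = canonical.get(key);
        if (known != null) {
            hits++;
            return known;
        }
        misses++;
        if (canonical.size() >= maxEntries) {
            canonical.clear(); // crude eviction: throw everything away and start over
        }
        canonical.put(key, key);
        return key;
    }

    // Field names repeat across records, as in typical POJO data.
    static KeyCache repetitive(int records) {
        KeyCache cache = new KeyCache(64);
        for (int i = 0; i < records; i++) {
            cache.intern("id");
            cache.intern("name");
        }
        return cache;
    }

    // Every key is a distinct number, as in a dictionary keyed by numeric ids.
    static KeyCache unique(int records) {
        KeyCache cache = new KeyCache(64);
        for (int i = 0; i < records; i++) {
            cache.intern(Integer.toString(i));
        }
        return cache;
    }
}
```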

One thing I have been curious about has been the performance cost of doing range checks (to allow incremental input): that is, whether requiring all input to be in memory would allow short-cuts. I tried removing those checks once, with pre-allocated buffer, but did not see much difference. So for the way Jackson streaming parser is implemented, there isn't much benefit from requiring full input. But this could well be different with a different parsing technique.
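For readers unfamiliar with the range checks in question, the following sketch contrasts the two scanning styles. It is an illustration with hypothetical names (`ScanSketch`, `lengthFull`, `lengthWindowed`), not code from either parser, and it measures nothing. Both methods count the bytes before the next double quote; the windowed one must check on every step whether its buffer is exhausted and refill it from the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: scanning with and without refill checks.
final class ScanSketch {

    // Full input in memory: the loop bound is the only check needed.
    static int lengthFull(byte[] input, int start) {
        int i = start;
        while (i < input.length && input[i] != '"') {
            i++;
        }
        return i - start;
    }

    // Incremental input: before each byte, check whether the window is used up
    // and, if so, refill it from the stream.
    static int lengthWindowed(InputStream in, int windowSize) {
        byte[] window = new byte[windowSize];
        int pos = 0;
        int end = 0;
        int length = 0;
        try {
            while (true) {
                if (pos >= end) { // the range check
                    end = in.read(window, 0, window.length);
                    pos = 0;
                    if (end <= 0) {
                        return length; // end of input
                    }
                }
                if (window[pos++] == '"') {
                    return length;
                }
                length++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = "hello\" rest".getBytes(StandardCharsets.UTF_8);
        System.out.println(lengthFull(input, 0));
        System.out.println(lengthWindowed(new ByteArrayInputStream(input), 2));
    }
}
```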

Maybe I should see how Boon does parsing: it's been a while since I have had a look elsewhere. The last one was FastJSON, which actually had very cool tricks for data-binding. Its approach is much more integrated than Jackson's (sort of like SAX + data-binding in one bundle), and that is something that could yield improvements too. Current division between streaming and data-binding is useful, but it has some non-zero cost.

RichardHightower commented an hour ago

RE: @RichardHightower I thought I recognized your name from NFJS?

There might be other Hightowers but I did speak for a while on the NFJS
circuit. I was as big as a house (fat). Hard to miss. I know you from Jackson. :)
Someone said your name at work, and I said, who is that? Then they said
cowboycoder, and I knew right away "The guy who wrote Jackson". I think of
you as cowboycoder first, and Tatu Saloranta second. I did read your blogs
and posts and such wrt JSON and have used Jackson exclusively until
recently. :) I can honestly say that I am a fan of your work.


RE: The part about writeValueAsString() is really just about how
StringWriter works, which like ByteArrayOutputStream keeps on doubling up,
reallocating its buffer. If the end result is String, that isn't very
optimal unless initial size guess is correct. With segments, one can reduce
allocations. But it's probably not a huge deal in the end. Same is true for
writeValueAsBytes, intermediate storage is segmented, and final allocation
is done when total length is known, instead of doubling up until full size
is known, and then making one more copy.

I did not know they did that but I am not a very trusting person so I wrote
my own CharBuf and my own ByteBuf which work like you say which I think is
how StringBuilder works underneath or so I have heard. :) I was not trying
to hamstring Jackson. I missed the writeValueAsString method when I was writing
the benchmark. I have no excuse now because I have created an interface that
mirrors ObjectMapper and GSON (to a degree, the object mapping logic needs
to be cleaned up, but it works). So Boon has a writeValueAsString too and a
factory that creates an interface similar to ObjectMapper and GSON. I
figured the easiest way to document it was to mimic the 800 lbs gorillas in
the space.

…

RichardHightower commented an hour ago

```
Benchmark                                       Mode  Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonPropertySerializer.roundTriper     thrpt    8      5    1  249719.367   19752.708  ops/s
i.g.j.s.BoonSerializer.roundTriper             thrpt    8      5    1  280467.817   27254.161  ops/s
i.g.j.s.JacksonSerializer.roundTriper          thrpt    8      5    1  189743.087    9706.744  ops/s

i.g.j.s.BoonPropertySerializer.serializeSmall  thrpt    8      5    1  711160.027  185676.527  ops/s
i.g.j.s.BoonSerializer.serializeSmall          thrpt    8      5    1  785138.997   38081.272  ops/s
i.g.j.s.JacksonSerializer.serializeSmall       thrpt    8      5    1  543163.990   31147.317  ops/s
```

I changed the test like I think you asked me to. I can't check it in right
now because I am working on some other stuff that I don't want public yet,
and I don't feel like merging, branching and what not (resource constrained
wrt time), but it is what is in github plus these changes that I think you
asked for.

It seems in this configuration that Boon is faster for serialization of
fields or properties. Boon does not have all of the features of Jackson and
if you enable additional features it is possible that Jackson will be
faster.

```

import com.fasterxml.jackson.databind.ObjectMapper;

import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.logic.BlackHole;

import java.io.StringWriter;
import java.util.concurrent.TimeUnit;

import static org.boon.Boon.puts;

/**
 * Created by rick on 12/27/13.
 */
public class JacksonSerializer {

    private static final ObjectMapper serializer = new ObjectMapper();

    private Object serialize(AllTypes allTypes) throws Exception {
        allTypes.setMyLong ( System.currentTimeMillis () );

        return serializer.writeValueAsString( allTypes );
    }



    private Object roundTrip(AllTypes alltype) throws Exception {
        String string = serializer.writeValueAsString( alltype );
        return serializer.readValue (string,  AllTypes.class);
    }



    @GenerateMicroBenchmark
    @OutputTimeUnit ( TimeUnit.SECONDS)
    public void serializeSmall(BlackHole bh) throws Exception {
        bh.consume(serialize(TestObjects.OBJECT));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void roundTriper(BlackHole bh) throws Exception {
        bh.consume(roundTrip ( TestObjects.OBJECT ));
    }


}

```


It looks like for the String case Boon is faster (for this particular
test). byte[] coming up.

…

RichardHightower commented 15 minutes ago

```
Benchmark                                           Mode  Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.serializeSmall     thrpt    8      5    1  799044.697   53333.092  ops/s
i.g.j.s.BoonSerializer.serializeSmall              thrpt    8      5    1  763242.507   39028.941  ops/s
i.g.j.s.JacksonSerializer.serializeSmall           thrpt    8      5    1  606863.367  118585.905  ops/s
i.g.j.s.JacksonByteArraySerializer.serializeSmall  thrpt    8      5    1  514680.657   30010.829  ops/s

i.g.j.s.BoonSerializer.roundTriper                 thrpt    8      5    1  314664.487    1792.830  ops/s
i.g.j.s.BoonByteArraySerializer.roundTriper        thrpt    8      5    1  282110.953   22238.562  ops/s
i.g.j.s.JacksonByteArraySerializer.roundTriper     thrpt    8      5    1  182594.197    5808.897  ops/s
i.g.j.s.JacksonSerializer.roundTriper              thrpt    8      5    1  161474.970   45330.686  ops/s

```

The results don't make a lot of sense for byte[]. It seems Jackson was
slower for byte array serialization than it was for String serialization, which
makes no sense, and Boon got faster for byte[] serialization than it did for
char[] serialization, which really does not make sense. But those are the
results.

The delta is about the same between Jackson and Boon.
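One way to read the serializeSmall figures in the table above is to check whether the ranges overlap. The sketch below assumes, as a simplification, that the "Mean error" column can be treated as a symmetric half-width around the mean; the class `MeanErrorCheck` and its `overlaps` method are hypothetical helpers. Overlapping ranges do not prove two results are equal, but they do suggest that a surprising ordering may be within the noise of a five-iteration run.

```java
// Hypothetical helper for comparing the benchmark rows quoted above.
final class MeanErrorCheck {

    // True when [meanA - errA, meanA + errA] and [meanB - errB, meanB + errB] intersect.
    static boolean overlaps(double meanA, double errA, double meanB, double errB) {
        return (meanA - errA) <= (meanB + errB) && (meanB - errB) <= (meanA + errA);
    }

    public static void main(String[] args) {
        // serializeSmall figures from the table above, in ops/s.
        System.out.println("Jackson String vs Jackson byte[]: "
                + overlaps(606863.367, 118585.905, 514680.657, 30010.829));
        System.out.println("Boon byte[] vs Boon String: "
                + overlaps(799044.697, 53333.092, 763242.507, 39028.941));
        System.out.println("Boon byte[] vs Jackson byte[]: "
                + overlaps(799044.697, 53333.092, 514680.657, 30010.829));
    }
}
```

Under that assumption, the two Jackson rows overlap and the two Boon rows overlap, while the Boon byte[] and Jackson byte[] rows do not.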



Jackson doing byte[]:
```
package io.gatling.jsonbenchmark.serialization;


import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.logic.BlackHole;

import java.util.concurrent.TimeUnit;

public class JacksonByteArraySerializer {

    private static final ObjectMapper serializer = new ObjectMapper();

    private Object serialize(AllTypes allTypes) throws Exception {
        allTypes.setMyLong ( System.currentTimeMillis () );

        return serializer.writeValueAsBytes( allTypes );
    }



    private Object roundTrip(AllTypes alltype) throws Exception {
        byte[] bytes = serializer.writeValueAsBytes( alltype );
        return serializer.readValue (bytes,  AllTypes.class);
    }



    @GenerateMicroBenchmark
    @OutputTimeUnit( TimeUnit.SECONDS)
    public void serializeSmall(BlackHole bh) throws Exception {
        bh.consume(serialize(TestObjects.OBJECT));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void roundTriper(BlackHole bh) throws Exception {
        bh.consume(roundTrip ( TestObjects.OBJECT ));
    }

}

```

Now wait until you see the Boon way.. And you will see that this benchmark
does not make sense unless JVM is doing some sort of intrinsic operation in
the background. Boon should suffer for this and it does not.

```
package io.gatling.jsonbenchmark.serialization;


import org.boon.json.JsonParser;
import org.boon.json.JsonParserFactory;
import org.boon.json.JsonSerializer;
import org.boon.json.JsonSerializerFactory;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.logic.BlackHole;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State
public class BoonByteArraySerializer {

    private final JsonSerializer serializer = new JsonSerializerFactory().useFieldsOnly().create ();
    private final JsonParser parser = new JsonParserFactory().create();



    private Object serialize(AllTypes alltype) throws Exception {
        alltype.setMyLong ( System.currentTimeMillis () );
        return serializer.serialize ( alltype ).toString ().getBytes ( StandardCharsets.UTF_8 );
    }

    private Object roundTrip(AllTypes alltype) throws Exception {
        return parser.parse ( AllTypes.class,
                serializer.serialize( alltype ).toString().getBytes(StandardCharsets.UTF_8) );
    }


    @GenerateMicroBenchmark
    @OutputTimeUnit( TimeUnit.SECONDS)
    public void serializeSmall(BlackHole bh) throws Exception {
        bh.consume(serialize(TestObjects.OBJECT));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void roundTriper(BlackHole bh) throws Exception {
        bh.consume(roundTrip ( TestObjects.OBJECT ));
    }


}

```

Here I only run the byte[] tests to eliminate getting caught with
whatever...

```
Benchmark                                           Mode  Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.serializeSmall     thrpt    8      5    1  773485.440   80583.732  ops/s
i.g.j.s.JacksonByteArraySerializer.serializeSmall  thrpt    8      5    1  578652.557   21243.642  ops/s

i.g.j.s.BoonByteArraySerializer.roundTriper        thrpt    8      5    1  279642.553   78978.092  ops/s
i.g.j.s.JacksonByteArraySerializer.roundTriper     thrpt    8      5    1  202312.350    5241.249  ops/s
```

Notice that the roundTriper got a lot closer. And the mean of the
roundTriper for boon is a lot higher.

Let me run just roundTriper.

```

Benchmark                                        Mode  Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.roundTriper     thrpt    8      5    1  309763.747    6562.801  ops/s
i.g.j.s.JacksonByteArraySerializer.roundTriper  thrpt    8      5    1  208100.320    9395.750  ops/s
```

Now just Jackson and then just Boon.

```
Benchmark                                        Mode  Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.JacksonByteArraySerializer.roundTriper  thrpt    8      5    1  201209.153   13437.474  ops/s
```

```
Benchmark                                     Mode  Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.roundTriper  thrpt    8      5    1  308146.663    4197.227  ops/s
```

It looks to me for this set of tests Boon is faster. Take heart. 1) Boon
does not have a streaming mode so really large files are limited
to available memory resources. Jackson does not have this limitation.  2GB
JSON file is not feasible with Boon. Boon is truly geared for
REST/Websocket not large streaming JSON. 2) Jackson has a lot more features
than Boon. Boon has a subset of features.

I don't think the delta for inputStream or reader will matter for this
test. Boon has an optimized IO lib and I can't see how we will be measuring
anything but diluting the parser speed test. I base this on past experience
of adding Reader, InputStream, etc. and benchmarking it. When I/O is
involved to disk the deltas are usually still there but harder to see due
to the constant speed of the disk I/O. But I could be wrong. It did not
matter much on the other tests that I have run against Jackson and Boon.

Also one last thing.. sorry if I said anything too caustic in my blog post.
You have been extremely nice, and most nights I was working really late and
not being as well behaved and professional as you are now.

I honestly think the delta is largely due to the streaming buffer windowing
logic, and just more logic due to more features of Jackson.

…

RichardHightower commented 12 minutes ago

RE: Notice that the roundTriper got a lot closer. And the mean of the
roundTriper for boon is a lot higher.

Notice that the roundTriper got a lot closer. And the mean error of the
roundTriper for boon is a lot higher.

See how one word makes me look caustic versus self-deprecating. I read what
I wrote and sometimes wish I could say that English is my
second language because the way I write sure looks like it is.

…

Sleepless Dev

Rick

Friday, January 17, 2014
