Improvement to `JacksonSerializer.roundTrip()`
Sorry, I don't get it. The benchmark focuses on deserialization, not serialization => read from bytes, not write to bytes.
Ok sorry if that is the case: I did start with the work, and did not verify the original code base.
I will verify and close this if irrelevant.
cowtowncoder closed the issue
It has code for serializing and parsing, as you mentioned.
RE: Existing code serializes content as a String, but given that Jackson heavily optimizes byte stream case, and since parse tests take in a byte[] or InputStream, it would make more sense to use:
byte[] content = mapper.writeValueAsBytes(alltype);
Seems logical as a benchmark. Not sure it makes the other benchmark invalid, as there are users who do want strings. Boon is optimized to serialize with char[]. I don't have a byte[] serializer.
RE: and feed that to parser. Combination will be significantly faster than construction to and parsing from a String (esp. when using StringWriter; there is mapper.writeValueAsString() method as well that is marginally faster).
RE: The same output method probably also makes sense for serialize() method, looking at matching Boon serializer implementation.
The thought of a byte serializer never crossed my mind. Currently there is only a char[] serializer. I like the idea. Not sure it is high on my list of priorities, as most of the buffer sizes that I am dealing with in production are satisfied fine with String.getBytes. I'll test Jackson vs. Boon with to/from bytes only. This should put Boon at an equal disadvantage to where Jackson is now. I don't think it negates the earlier benchmark, and in fact I think the benchmark in question is a much more common case. But that is conjecture for both of us.
@RichardHightower right, it is difficult to have proper apples-to-apples tests. In this case it is not so much that output as `byte[]` is more efficient (it may be marginally so but not significantly), but that parsing is.
I think it is perfectly reasonable to have different sources/targets to test, and report these separately (esp. for parsing). For serialization, perhaps it would make sense to also have an `OutputStream` output type, which should be relatively fair, since it does not assume whether the serializer has an efficient way of producing a `byte[]` or not?
For the round-tripping use case it seems to me that the intermediate format should be the most efficient one for the implementation; this does require some knowledge of the implementation. I was thinking that this use case perhaps emulates a case like writing a JSON file to disk and reading it back; or a web service call (although payload would differ in directions), or maybe send-modify-return.
One last thing: I would have filed this at the forked repo, but I couldn't find an issue tracker. Maybe github does not have all the facilities for forks.
I am a github neophyte. I can create a new benchmark project. Or you can
add the issues directly to boon.
I see your point about mapper.writeValueAsString.
I am curious to know the difference. I think curiosity has been a driving
factor. I suspect for larger files this will tip the scale towards Jackson.
Background: Boon was written and it needed JSON. I decided to write a JSON
parser to reduce the dependencies with the thought being you could always
just use GSON or Jackson if the feature set / use case was not a direct
match but Boon has its own.
I have been using Jackson from 2009 and maybe before. I have never used
GSON in production.
I was aiming / optimizing for REST calls and Websocket messages around the
1k to 1mb range. Boon JSON is tuned/optimized for service calls and
messaging not really large files. Also I am using it in somewhat reactive
style so I always have complete buffers to parse and typically only accept
a buffer when it is complete. I've used it in conjunction with a file
batcher that writes a JSON entry per line and I quit benchmarking it after
300mb per second sustained writing to disk (well disk array storage
anyway). I only needed 30 mb per second for this app. The file had JSON in
it but it was not a JSON file as it was one JSON entry per line (1k or
so each line).
The largest service call I have to deal with is about 500k but this is more
or less a data push update. End user wise the service calls are typically
under 1k. The returns are under 10k. This pretty much describes every SOA /
REST / Websocket app that I have ever worked with and I have worked with
quite a few.
Boon had an IO lib and an in-memory no-SQL-ish lib and will one day have a
REST framework and an MVC framework and a mustache template lib. It needs
JSON but JSON is not Boon.
In short I am going to make the string/serialize parse as fair as possible.
I will add a byte parse and serialize parse.
Also I realize that Jackson and boon have different design goals and there
are features and things that Jackson does that boon does not.
I compared boon to jackson because that is the first question people ask
me. How does it compare to Jackson? No one asks me about GSON but I don't
hang out with Android developers. I am an old java dog.
@RichardHightower I thought I recognized your name from NFJS? Unless I am confusing you with another mr Hightower. :)
I appreciate your explanation on goals for Boon. It is always interesting to read about that part, since it makes it much easier to understand various implementations choices and strategies.
The part about `writeValueAsString()` is really just about how `StringWriter` works, which like `ByteArrayOutputStream` keeps on doubling up, reallocating its buffer. If the end result is String, that isn't very optimal unless initial size guess is correct. With segments, one can reduce allocations. But it's probably not a huge deal in the end. Same is true for `writeValueAsBytes`: intermediate storage is segmented, and final allocation is done when total length is known, instead of doubling up until full size is known, and then making one more copy.
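To make the segmented-buffer idea concrete, here is a minimal sketch in plain Java. The class `SegmentedBuffer` and its methods are hypothetical names invented for illustration; this is not Jackson's actual buffer class, only the general technique as described above: append into fixed-size segments without copying earlier data, then do one exact-size allocation when the total length is known.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical segmented byte buffer; an illustration, not Jackson's implementation. */
final class SegmentedBuffer {
    private final List<byte[]> fullSegments = new ArrayList<>();
    private byte[] current;
    private int currentLen = 0;
    private int totalLen = 0;

    SegmentedBuffer(int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        current = new byte[segmentSize];
    }

    /** Appends bytes; a full segment is set aside as-is and a fresh one is started. */
    void write(byte[] src) {
        int offset = 0;
        while (offset < src.length) {
            if (currentLen == current.length) {
                fullSegments.add(current);
                current = new byte[current.length]; // earlier data is not copied
                currentLen = 0;
            }
            int n = Math.min(src.length - offset, current.length - currentLen);
            System.arraycopy(src, offset, current, currentLen, n);
            currentLen += n;
            offset += n;
            totalLen += n;
        }
    }

    /** One exact-size allocation, made only once the total length is known. */
    byte[] toByteArray() {
        byte[] result = new byte[totalLen];
        int pos = 0;
        for (byte[] segment : fullSegments) {
            System.arraycopy(segment, 0, result, pos, segment.length);
            pos += segment.length;
        }
        System.arraycopy(current, 0, result, pos, currentLen);
        return result;
    }

    public static void main(String[] args) {
        SegmentedBuffer buf = new SegmentedBuffer(8);
        buf.write("{\"name\":\"boon\",".getBytes(StandardCharsets.UTF_8));
        buf.write("\"fast\":true}".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

A doubling buffer copies everything written so far on each growth step and usually needs one more copy to trim to the final size; the sketch copies each byte once into a segment and once into the result.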
Bigger impact for round-trip was simply just that the intermediate `Object` would be efficient to create & consume. I had a look at results you found, and round-trip was one where I couldn't quite see where the difference comes from.
Other cases (where Boon does very well -- impressive!) make more sense to me.
Jackson optimizes heavily for POJO data-binding case to/from raw byte input; and specifically binding as `Map`s is probably somewhat sub-optimal: I suspect that the results for that one dictionary JSON where keys are numbers is particularly tricky for Jackson due to symbol table churn. Basically, Jackson assumes that keys are mostly repetitive; but if all keys are unique, this does not hold. Which is fine, except that for larger files it starts to degrade performance.
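The churn effect can be sketched with a toy key table. Everything here is a simplification made up for illustration (the class `KeyTable`, its size cap, and the flush-when-full policy are assumptions, not Jackson's actual symbol table): repetitive keys are almost always found, while all-unique keys miss every time and keep forcing the table to be rebuilt.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical key-canonicalizing table; a simplification, not Jackson's symbol table. */
final class KeyTable {
    private final Map<String, String> symbols = new HashMap<>();
    private final int maxEntries;
    private int hits;
    private int misses;
    private int flushes;

    KeyTable(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Returns the canonical instance for a key, adding it on first sight. */
    String intern(String key) {
        String existing = symbols.get(key);
        if (existing != null) {
            hits++;
            return existing;
        }
        misses++;
        if (symbols.size() >= maxEntries) { // table is full: start over (assumed policy)
            symbols.clear();
            flushes++;
        }
        symbols.put(key, key);
        return key;
    }

    int hits() { return hits; }
    int misses() { return misses; }
    int flushes() { return flushes; }

    public static void main(String[] args) {
        // Repetitive keys, as in a list of similar records.
        KeyTable repetitive = new KeyTable(1000);
        for (int i = 0; i < 10_000; i++) {
            repetitive.intern("id");
            repetitive.intern("name");
            repetitive.intern("price");
        }
        // Unique keys, as in a dictionary whose keys are numbers.
        KeyTable unique = new KeyTable(1000);
        for (int i = 0; i < 30_000; i++) {
            unique.intern(Integer.toString(i));
        }
        System.out.println("repetitive: hits=" + repetitive.hits()
                + " misses=" + repetitive.misses() + " flushes=" + repetitive.flushes());
        System.out.println("unique:     hits=" + unique.hits()
                + " misses=" + unique.misses() + " flushes=" + unique.flushes());
    }
}
```

In the repetitive case the table pays for three insertions and then only does lookups; in the unique case every lookup is wasted work followed by an insertion.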
One thing I have been curious about has been performance cost of doing range checks (to allow incremental input): that is, whether requiring all input to be in memory would allow short-cuts. I tried removing those checks once, with pre-allocated buffer, but did not see much difference. So for the way Jackson streaming parser is implemented, there isn't much benefit from requiring full input. But this could well be different with a different parsing technique.
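As a rough picture of the range check in question, the sketch below scans the same input two ways: through a small window that must be checked and refilled before each read, and over a complete in-memory array. The class and method names are hypothetical and the "scanner" only counts double quotes; it is not how either library parses, just the shape of the two loops.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical scanner comparing windowed and complete-buffer access. */
final class RangeCheckSketch {

    /** Incremental style: each read first checks whether the window is exhausted. */
    static int countQuotesIncremental(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize]; // bufferSize is assumed to be positive
        int ptr = 0;
        int end = 0;
        int count = 0;
        while (true) {
            if (ptr >= end) {               // the range check that allows incremental input
                end = in.read(buf, 0, buf.length);
                ptr = 0;
                if (end <= 0) {
                    return count;           // end of input
                }
            }
            if (buf[ptr++] == '"') {
                count++;
            }
        }
    }

    /** Complete-buffer style: the whole input is in memory, so no refill logic is needed. */
    static int countQuotesComplete(byte[] input) {
        int count = 0;
        for (byte b : input) {
            if (b == '"') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] json = "{\"a\":1,\"b\":[true,\"x\"]}".getBytes(StandardCharsets.UTF_8);
        System.out.println(countQuotesIncremental(new ByteArrayInputStream(json), 4));
        System.out.println(countQuotesComplete(json));
    }
}
```

Both loops return the same count; the only difference is the extra branch and refill path in the incremental version.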
Maybe I should see how Boon does parsing: it's been a while since I have had a look elsewhere. Last one was FastJSON, which actually had very cool tricks for data-binding. Its approach is much more integrated than Jackson's (sort of like SAX + data-binding in one bundle), and that is something that could yield improvements too. Current division between streaming and data-binding is useful, but it has some non-zero cost.
RE: @RichardHightower <https://github.com/RichardHightower> I thought I
recognized your name from NFJS?
There might be other Hightowers but I did speak for a while on the NFJS
circuit. I was as big as a house (fat). Hard to miss. I know you from Jackson. :)
Someone said your name at work, and I said, who is that? Then they said
cowtowncoder, and I knew right away: "The guy who wrote Jackson". I think of
you as cowtowncoder first, and Tatu Saloranta second. I did read your blogs
and posts and such wrt JSON and have used Jackson exclusively until
recently. :) I can honestly say that I am a fan of your work.
RE: The part about writeValueAsString() is really just about how
StringWriter works, which like ByteArrayOutputStream keeps on doubling up,
reallocating its buffer. If the end result is String, that isn't very
optimal unless initial size guess is correct. With segments, one can reduce
allocations. But it's probably not a huge deal in the end. Same is true for
writeValueAsBytes, intermediate storage is segmented, and final allocation
is done when total length is known, instead of doubling up until full size
is known, and then making one more copy.
I did not know they did that but I am not a very trusting person so I wrote
my own CharBuf and my own ByteBuf which work like you say which I think is
how StringBuilder works underneath or so I have heard. :) I was not trying
to hamstring Jackson. I missed the writeValueAsString method when I was writing
the benchmark. I have no excuse now because I have created an interface that
mirrors ObjectMapper and GSON (to a degree, the object mapping logic needs
to be cleaned up, but it works). So Boon has a writeValueAsString too and a
factory that creates an interface similar to ObjectMapper and GSON. I
figured the easiest way to document it was to mimic the 800 lb gorillas in
the space.
```
Benchmark                                      Mode   Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonPropertySerializer.roundTriper     thrpt    8      5    1  249719.367   19752.708  ops/s
i.g.j.s.BoonSerializer.roundTriper             thrpt    8      5    1  280467.817   27254.161  ops/s
i.g.j.s.JacksonSerializer.roundTriper          thrpt    8      5    1  189743.087    9706.744  ops/s
i.g.j.s.BoonPropertySerializer.serializeSmall  thrpt    8      5    1  711160.027  185676.527  ops/s
i.g.j.s.BoonSerializer.serializeSmall          thrpt    8      5    1  785138.997   38081.272  ops/s
i.g.j.s.JacksonSerializer.serializeSmall       thrpt    8      5    1  543163.990   31147.317  ops/s
```
I changed the test like I think you asked me to. I can't check it in right
now because I am working on some other stuff that I don't want public yet,
and I don't feel like merging, branching and what not (resource constrained
wrt time), but it is what is in github plus these changes that I think you
asked for.
It seems in this configuration that Boon is faster for serialization of
fields or properties. Boon does not have all of the features of Jackson and
if you enable additional features it is possible that Jackson will be
faster.
```
import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.logic.BlackHole;

import java.io.StringWriter;
import java.util.concurrent.TimeUnit;

import static org.boon.Boon.puts;

/**
 * Created by rick on 12/27/13.
 */
public class JacksonSerializer {

    private static final ObjectMapper serializer = new ObjectMapper();

    private Object serialize(AllTypes allTypes) throws Exception {
        allTypes.setMyLong(System.currentTimeMillis());
        return serializer.writeValueAsString(allTypes);
    }

    private Object roundTrip(AllTypes alltype) throws Exception {
        String string = serializer.writeValueAsString(alltype);
        return serializer.readValue(string, AllTypes.class);
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void serializeSmall(BlackHole bh) throws Exception {
        bh.consume(serialize(TestObjects.OBJECT));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void roundTriper(BlackHole bh) throws Exception {
        bh.consume(roundTrip(TestObjects.OBJECT));
    }
}
```
It looks like for the String case Boon is faster (for this particular
test). byte[] coming up.
```
Benchmark                                          Mode   Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.serializeSmall     thrpt    8      5    1  799044.697   53333.092  ops/s
i.g.j.s.BoonSerializer.serializeSmall              thrpt    8      5    1  763242.507   39028.941  ops/s
i.g.j.s.JacksonSerializer.serializeSmall           thrpt    8      5    1  606863.367  118585.905  ops/s
i.g.j.s.JacksonByteArraySerializer.serializeSmall  thrpt    8      5    1  514680.657   30010.829  ops/s
i.g.j.s.BoonSerializer.roundTriper                 thrpt    8      5    1  314664.487    1792.830  ops/s
i.g.j.s.BoonByteArraySerializer.roundTriper        thrpt    8      5    1  282110.953   22238.562  ops/s
i.g.j.s.JacksonByteArraySerializer.roundTriper     thrpt    8      5    1  182594.197    5808.897  ops/s
i.g.j.s.JacksonSerializer.roundTriper              thrpt    8      5    1  161474.970   45330.686  ops/s
```
The results don't make a lot of sense for byte[]. It seems Jackson was
slower for byte array serialization than it was for String serialization, which
makes no sense, and Boon got faster for byte[] serialization than it did for
char[] serialization, which really does not make sense. But those are the
results.
The delta is about the same between Jackson and Boon.
Jackson doing byte[]:
```
package io.gatling.jsonbenchmark.serialization;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.logic.BlackHole;

import java.util.concurrent.TimeUnit;

public class JacksonByteArraySerializer {

    private static final ObjectMapper serializer = new ObjectMapper();

    private Object serialize(AllTypes allTypes) throws Exception {
        allTypes.setMyLong(System.currentTimeMillis());
        return serializer.writeValueAsBytes(allTypes);
    }

    private Object roundTrip(AllTypes alltype) throws Exception {
        byte[] bytes = serializer.writeValueAsBytes(alltype);
        return serializer.readValue(bytes, AllTypes.class);
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void serializeSmall(BlackHole bh) throws Exception {
        bh.consume(serialize(TestObjects.OBJECT));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void roundTriper(BlackHole bh) throws Exception {
        bh.consume(roundTrip(TestObjects.OBJECT));
    }
}
```
Now wait until you see the Boon way.. And you will see that this benchmark
does not make sense unless JVM is doing some sort of intrinsic operation in
the background. Boon should suffer for this and it does not.
```
package io.gatling.jsonbenchmark.serialization;

import org.boon.json.JsonParser;
import org.boon.json.JsonParserFactory;
import org.boon.json.JsonSerializer;
import org.boon.json.JsonSerializerFactory;
import org.openjdk.jmh.annotations.GenerateMicroBenchmark;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.logic.BlackHole;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

@State
public class BoonByteArraySerializer {

    private final JsonSerializer serializer = new JsonSerializerFactory().useFieldsOnly().create();
    private final JsonParser parser = new JsonParserFactory().create();

    private Object serialize(AllTypes alltype) throws Exception {
        alltype.setMyLong(System.currentTimeMillis());
        return serializer.serialize(alltype).toString().getBytes(StandardCharsets.UTF_8);
    }

    private Object roundTrip(AllTypes alltype) throws Exception {
        return parser.parse(AllTypes.class,
                serializer.serialize(alltype).toString().getBytes(StandardCharsets.UTF_8));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void serializeSmall(BlackHole bh) throws Exception {
        bh.consume(serialize(TestObjects.OBJECT));
    }

    @GenerateMicroBenchmark
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void roundTriper(BlackHole bh) throws Exception {
        bh.consume(roundTrip(TestObjects.OBJECT));
    }
}
```
Here I only run the byte[] tests, to rule out interference from
whatever else was running...
```
Benchmark                                          Mode   Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.serializeSmall     thrpt    8      5    1  773485.440   80583.732  ops/s
i.g.j.s.JacksonByteArraySerializer.serializeSmall  thrpt    8      5    1  578652.557   21243.642  ops/s
i.g.j.s.BoonByteArraySerializer.roundTriper        thrpt    8      5    1  279642.553   78978.092  ops/s
i.g.j.s.JacksonByteArraySerializer.roundTriper     thrpt    8      5    1  202312.350    5241.249  ops/s
```
Notice that the roundTriper got a lot closer. And the mean of the
roundTriper for boon is a lot higher.
Let me run just roundTriper.
```
Benchmark                                          Mode   Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.roundTriper        thrpt    8      5    1  309763.747    6562.801  ops/s
i.g.j.s.JacksonByteArraySerializer.roundTriper     thrpt    8      5    1  208100.320    9395.750  ops/s
```
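One rough way to read the two byte[] runs above is to treat each row as an interval of mean plus or minus mean error and check whether the Boon and Jackson intervals overlap. The sketch below does that arithmetic with the roundTriper numbers copied from the tables. It assumes the "Mean error" column can be read as a half-width around the mean; the class `ThroughputInterval` is a hypothetical helper, not part of JMH.

```java
/** Hypothetical helper: a throughput result read as mean +/- mean error. */
final class ThroughputInterval {
    final double mean;
    final double error;

    ThroughputInterval(double mean, double error) {
        this.mean = mean;
        this.error = error;
    }

    double low() { return mean - error; }
    double high() { return mean + error; }

    /** True when the two intervals share at least one value. */
    boolean overlaps(ThroughputInterval other) {
        return low() <= other.high() && other.low() <= high();
    }

    public static void main(String[] args) {
        // Run with serializeSmall and roundTriper together (roundTriper rows).
        ThroughputInterval boonMixed = new ThroughputInterval(279642.553, 78978.092);
        ThroughputInterval jacksonMixed = new ThroughputInterval(202312.350, 5241.249);
        // Run with roundTriper only.
        ThroughputInterval boonAlone = new ThroughputInterval(309763.747, 6562.801);
        ThroughputInterval jacksonAlone = new ThroughputInterval(208100.320, 9395.750);

        System.out.println("mixed run overlaps: " + boonMixed.overlaps(jacksonMixed));
        System.out.println("roundTriper-only run overlaps: " + boonAlone.overlaps(jacksonAlone));
    }
}
```

Under that reading, the large Boon error in the mixed run stretches its interval down into Jackson's range, while in the roundTriper-only run the two intervals are clearly separated.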
Now just Jackson and then just Boon.
```
Benchmark                                          Mode   Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.JacksonByteArraySerializer.roundTriper     thrpt    8      5    1  201209.153   13437.474  ops/s
```
```
Benchmark                                          Mode   Thr  Count  Sec        Mean  Mean error  Units
i.g.j.s.BoonByteArraySerializer.roundTriper        thrpt    8      5    1  308146.663    4197.227  ops/s
```
It looks to me for this set of tests Boon is faster. Take heart. 1) Boon
does not have a streaming mode so really large files are limited
to available memory resources. Jackson does not have this limitation. 2GB
JSON file is not feasible with Boon. Boon is truly geared for
REST/Websocket not large streaming JSON. 2) Jackson has a lot more features
than Boon. Boon has a subset of features.
I don't think the delta for inputStream or reader will matter for this
test. Boon has an optimized IO lib and I can't see how we will be measuring
anything but diluting the parser speed test. I base this on past experience
of adding Reader, InputStream, etc. and benchmarking it. When I/O is
involved to disk the deltas are usually still there but harder to see due
to the constant speed of the disk I/O. But I could be wrong. It did not
matter much on the other tests that I have run against Jackson and Boon.
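The dilution argument is simple arithmetic: if both parsers pay the same fixed I/O cost per operation, the absolute gap stays the same but the ratio between them shrinks. A small sketch, using made-up per-operation times (3 and 5 microseconds are hypothetical, not measurements from this thread):

```java
/** Hypothetical arithmetic: how a fixed I/O cost dilutes a parser speed ratio. */
final class IoDilution {

    /** Ratio of slower to faster per-operation time once the same I/O cost is added to both. */
    static double slowdownRatio(double fastMicros, double slowMicros, double ioMicros) {
        return (slowMicros + ioMicros) / (fastMicros + ioMicros);
    }

    public static void main(String[] args) {
        double fast = 3.0; // hypothetical parse time in microseconds
        double slow = 5.0; // hypothetical parse time in microseconds
        for (double io : new double[] {0.0, 10.0, 50.0}) {
            System.out.printf("I/O cost %5.1f us -> slower/faster = %.3f%n",
                    io, slowdownRatio(fast, slow, io));
        }
    }
}
```

With no I/O the slower parser takes about 1.67 times as long; with 50 microseconds of I/O added to both, the ratio drops to about 1.04, even though the 2 microsecond gap is unchanged.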
Also one last thing.. sorry if I said anything too caustic in my blog post.
You have been extremely nice, and most nights I was working really late and
not being as well behaved and professional as you are now.
I honestly think the delta is largely due to the streaming buffer windowing
logic, and just more logic due to more features of Jackson.
RE: Notice that the roundTriper got a lot closer. And the mean of the
roundTriper for boon is a lot higher.
Notice that the roundTriper got a lot closer. And the mean error of the
roundTriper for boon is a lot higher.
See how one word makes me look caustic versus self deprecating. I read what
I wrote and sometimes wish I could say that English is my
second language because the way I write sure looks like it is.