Sunday, May 11, 2014

More details on GSON 100x faster than Jackson and Boon (for Hash Collision attack) with code examples (very unlikely but now never with Boon 0.18)

For Boon, the hash collision would be a very rare occurrence, which I explain below. And what is rare in Boon 0.17 should be impossible in Boon 0.18 with the standard JSON parser that comes with Boon.

Update: Boon is now patched. It should not be possible to mount a hash key collision attack against the standard Boon JSON parser. Boon checks the Java version and, if it detects 1.7, checks whether jdk.map.althashing.threshold is set. If Java is below 1.7, Boon uses TreeMap. If Java is 1.7 and jdk.map.althashing.threshold is not set, Boon also uses TreeMap. If Java is 1.8, Boon uses LinkedHashMap. So the one-in-a-million use case that I have never seen is now a never-possible use case with the standard Boon JSON parser. Boon was never impacted with POJOs, even prior to this patch. The index overlay is the default parser, so if you do nothing special, Boon will just work. There are other places where Boon uses HashMap and LinkedHashMap, but not in the default parser, so for public-facing APIs use the standard JSON parser (if you do nothing special, that is what you get).
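
The selection rule described above could be sketched roughly as follows. This is not Boon's actual source. JsonMapChooser, featureVersion and newMap are hypothetical names, and the branch for Java 1.7 with jdk.map.althashing.threshold set is an assumption, since the description only says what happens when it is not set.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Boon: picks a Map implementation
// based on the Java version and the alternative-hashing setting.
final class JsonMapChooser {

    // "1.6.0_45" -> 6, "1.7.0_55" -> 7, "1.8.0_05" -> 8, "17.0.2" -> 17
    static int featureVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        // The old naming scheme is "1.x"; otherwise the first number is the version.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    static Map<String, Object> newMap(String javaVersion, String altHashingThreshold) {
        int version = featureVersion(javaVersion);
        if (version >= 8) {
            return new LinkedHashMap<>();
        }
        if (version == 7 && altHashingThreshold != null) {
            // Assumption: with alternative hashing enabled, a hash-based map is acceptable.
            return new LinkedHashMap<>();
        }
        // Below 1.7, or 1.7 without the setting: a sorted tree does not depend on hashCode().
        return new TreeMap<>();
    }

    // Convenience overload that reads the values from the running JVM.
    static Map<String, Object> newMap() {
        return newMap(System.getProperty("java.version"),
                      System.getProperty("jdk.map.althashing.threshold"));
    }
}
```

The point of falling back to TreeMap is that its lookups compare keys instead of hashing them, so colliding hash codes cannot pile keys into one bucket.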

Older post.

Boon's Index overlay by design does not hash the keys.
Boon only works on JDK 1.7 and above. JDK 1.7 and above can block this attack.

This is in response to this:

Where to begin....

POJOs, the dominant case for REST and Websocket, are not impacted on any version of the JDK.

There are two examples of Boon handling the evil JSON file from hell that Jesse Wilson provided without any issue. The most common case for REST/Websocket uses POJOs, and Boon handles that with no issues.

The second less common case uses Boon index overlay API to white list the keys. I did think of this when I designed Boon. It has been mentioned before. 

In short if you use Pojos, repeat after me: NOT AN ISSUE and never was.

If you have an internal API (a non public API), repeat after me NOT AN ISSUE and never was.

If you use JDK 1.7 and set jdk.map.althashing.threshold, NOT AN ISSUE and never was.

If you use JDK 1.8, NOT AN ISSUE and never was.

If you control both sides of the wire, NOT AN ISSUE and never was.

If you are using JSON to write to a file, post a message to an internal event bus, read a config file, whatever, NOT AN ISSUE and never was.

If you have a public API, don't know about the jdk.map.althashing.threshold setting of JDK 1.7 or have somehow gotten Boon to work on JDK 1.6 (Boon only supports 1.7 and above), and insist on using a map instead of POJOs, then you are a purple unicorn, and Boon has just the API for you. It is called ValueMap, example provided. It is easy to white list your property names and avoid this issue. (Update: this is now mostly moot due to the patch, but it is an example of using the index overlay API.)

(NOTE: You no longer have to use white listing as of Boon 0.18. Boon has been patched; see https://github.com/RichardHightower/boon/issues/182).
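
To make the white listing idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Boon's ValueMap API. KeyWhitelist and toSafeMap are hypothetical names, and the list of key/value pairs stands in for parsed but not yet hashed entries.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Boon: only known keys ever reach a hash map.
final class KeyWhitelist {
    private final Set<String> allowed;

    KeyWhitelist(String... allowedKeys) {
        // The white list is small and trusted, so hashing it is safe.
        this.allowed = new HashSet<>(Arrays.asList(allowedKeys));
    }

    // Copies the pairs into a fresh map and rejects the payload
    // on the first key that is not in the white list.
    Map<String, Object> toSafeMap(List<Map.Entry<String, Object>> parsedPairs) {
        Map<String, Object> safe = new LinkedHashMap<>();
        for (Map.Entry<String, Object> pair : parsedPairs) {
            if (!allowed.contains(pair.getKey())) {
                throw new IllegalArgumentException("Unexpected JSON key: " + pair.getKey());
            }
            safe.put(pair.getKey(), pair.getValue());
        }
        return safe;
    }
}
```

Because an attacker-supplied key is only looked up in the trusted set and never inserted, a payload full of colliding keys is rejected on its first unknown key instead of filling one hash bucket.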

There are actually about 20 other ways to do this with Boon as well. There is a whole bucket load of classes that do things while using a white listed set of keys and/or properties.

Best to keep the description short.

Also, your ops team can block large JSON files with an F5, etc.

Thanks Jesse for bringing this up.

Rather than document it at length, I figured it was easier just to fix it.

BTW that json file was DAMN evil. It blew up IntelliJ like three times and blew up the github Atom editor twice. That JSON is awesome in its evilness, and I feel more powerful having grasped that the Sith can live so completely in a single JSON file.

Boon loves purple unicorns so come on over. Save a tree. Use Boon.

___ previous post before picture provided ____

I looked at what GSON does for safety, and it just does not make sense in the real world to slow down every use case for what is, in my mind, a minority use case when there are better ways to handle it. Both Boon and Jackson have ways to avoid this, and there are plenty of other ways.

Firstly, Boon parses to an index overlay, so you can actually convert this into any sort of Map. You can even use the same Map that GSON uses IF YOU NEEDED TO. Notice that last part.


The whole assertion that this is a common case is just wrong. There are internal SOA/REST/JSON calls. There are JSON/REST calls where you control both sides of the wire. There is JSON serialized to disk. Tying all JSON parsing and serializing to hash collision protection is not needed (note: I added it anyway). It is like saying SSL/TLS is safer, so everyone should always use it.

There is always Uncle Fester who disables JavaScript cookies and wears an aluminum foil hat, but.... In a way he is right (thanks NSA), but there is a tradeoff, and I think Boon and Jackson make the right ones, and I think GSON does not make the right tradeoff. (Uncle Fester wins, and I will explain my change of heart later, but...)

It should take me less than ten minutes to show an example that uses Boon to prevent hash collisions. (Update: two examples above, and now a patch.) Boon's index overlay by design does not hash the keys. You should be able to white list keys, and this should never come up. (I have some examples, but I will double check that they never create the map until after the white list is validated.) I believe the POJO case using the index overlay would also not have this problem. I am fairly certain of it, but I will double check and add a unit test to make sure.
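
For readers new to the term, here is a simplified illustration of the general index overlay idea: keep the original text and store only offsets into it, so no key is hashed until you ask for one. This is not Boon's implementation. FlatOverlay is a hypothetical class, and it assumes a flat object whose keys and values are all strings without escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy overlay, not Boon's code: it records where keys and values
// sit in the original text instead of copying them into a HashMap.
final class FlatOverlay {
    private final String json;
    // Each int[] holds {keyStart, keyEnd, valueStart, valueEnd} as offsets into json.
    private final List<int[]> entries = new ArrayList<>();

    FlatOverlay(String json) {
        this.json = json;
        int pos = json.indexOf('"');
        while (pos >= 0) {
            int keyStart = pos + 1;
            int keyEnd = find('"', keyStart);
            int valueStart = find('"', find(':', keyEnd)) + 1;
            int valueEnd = find('"', valueStart);
            entries.add(new int[] {keyStart, keyEnd, valueStart, valueEnd});
            pos = json.indexOf('"', valueEnd + 1);
        }
    }

    private int find(char c, int from) {
        int at = json.indexOf(c, from);
        if (at < 0) {
            throw new IllegalArgumentException("Malformed input near offset " + from);
        }
        return at;
    }

    // Looks a key up by comparing characters in place; nothing is hashed.
    String get(String key) {
        for (int[] e : entries) {
            boolean sameLength = (e[1] - e[0]) == key.length();
            if (sameLength && json.regionMatches(e[0], key, 0, key.length())) {
                return json.substring(e[2], e[3]);
            }
        }
        return null;
    }
}
```

The tradeoff in this toy version is that each lookup is a linear scan over the entries, which is fine when you only pull out a handful of white listed keys.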

It is more about separation of concerns. Safety and speed are a continuum, and when you tie yourself to a particular design choice you are limiting your solution. Also, if you are expecting REST calls between 2 KB and 4 KB, let's say, you can block large posts such as a 1.2 MB one at the F5 or Nginx, which would end that DDoS as well. You can even filter payloads that don't have certain strings in their body. (Which might have been true, but I will explain later.)
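
The size limit is best enforced at the F5 or Nginx as described, but the same idea can be sketched in application code as a second line of defense. BoundedPayload and readAtMost are hypothetical names, not part of Boon, and the limit is whatever you expect your largest legitimate request to be.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical guard, not part of Boon: refuse to buffer more than maxChars
// before the payload is ever handed to a JSON parser.
final class BoundedPayload {

    static String readAtMost(Reader reader, int maxChars) throws IOException {
        StringBuilder body = new StringBuilder();
        char[] buffer = new char[1024];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            // Stop as soon as the limit would be exceeded, without reading the rest.
            if (body.length() + read > maxChars) {
                throw new IOException("Payload exceeds " + maxChars + " characters");
            }
            body.append(buffer, 0, read);
        }
        return body.toString();
    }
}
```

With a limit of a few thousand characters, a 1.2 MB body is rejected after at most a few reads and never reaches the parser.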

I know Jackson has similar capabilities to white list keys. There is usually more than one way to skin a cat. Boon was designed with this case in mind, and I did look at what GSON did. I decided it did not make sense for Boon to depend on a collection lib for a use case that would impact less than 5% of all REST/Websocket usage, and 0% of the other 900 reasons you might want to use JSON parsing/serialization (see SlumberDB). Since the map that GSON uses is on Google Code, you can even use it with Boon. Boon is pretty flexible. The index overlay has an object that looks like a Map, and it can be converted into a real map... even the one that GSON uses.

I did not make this map the default because 1) you can white list the keys before they get to the map 2) not all APIs are public facing free for alls that need that level of protection 3) not every case is even used for SOA/REST/Websocket at all so the additional overhead and dependency on a lib did not make sense when you have plenty of other ways using Boon to accomplish this same thing.

With all that said, this has given me an opportunity to ponder this again, and I can at least have a WIKI page that describes what one should not do. 

Please understand that 100% of the use cases I have seen for public JSON REST calls involve POJOs, and this attack should not be possible with JSON to Java mapping using the index overlay, which is the default with POJOs. So of the 5% where this matters, 100% of that 5% also does not matter, because POJO mapping inherently uses white listing. That said, I will verify that this is the case (even though I know it is, because avoiding hashing was how I improved the JSON to Java mapping). I will do some more due diligence.

Jesse Wilson
12:03 PM
I think we mostly agree actually. Safety/speed are a continuum. Gson is safer. Boon is faster. This is the Boon code that spent 15 seconds parsing 1.2 MiB of nasty JSON.

    Reader reader = ...
    Map<?, ?> map = (Map) JsonFactory.fromJson(reader);

Was I using it wrong?

Rick Hightower
1:38 PM
If you were concerned with safety you would use the Boon parser that builds an index overlay map and then white list the keys. If you are going to spend the time making a benchmark, you could look at the one you are trying to disprove which has been validated by three other benchmarks and uses the index overlay throughout.

Different use cases can use an index overlay. The API you are testing against is the convenience API for when you just want to do a quick and dirty parse of a config file.

The benchmark I wrote uses the index overlay, which does not use a HashMap. Put your benchmark on GitHub and let me take a look. I kept quiet for months and had my benchmarks validated by several people before I went public. This is not the first time I have heard this. I've been waiting for you to go public with this since December. Everyone knows everyone else.

Also I think the POJO mapping would not have this problem. It was designed that way.

Jackson has ways around this too. But Tatu is nicer than me and better at handling this than I am, so he can speak for Jackson.

It is not a matter of safety and non safety. It is a matter of how do you prevent a DDoS and when. If you control both sides of the wire, might be overkill especially if payloads are already encrypted via TLS.

You can also white list the keys. You can also use the same map that GSON uses which is on google code.

Publish your code. I am pretty sure Boon does handle this case but not by default.

Most of the services I work with are internal SOA or mobile to backend where we own both sides of the wire and encrypt the JSON. In these cases your point is moot.

In cases where your point is not moot, I would use the index overlay and then white list the allowed keys. Or use POJOs, which do the same thing, or convert the index overlay map into the map you use.

Publish your benchmark. Publish your code. I have not really tested this case because in the apps I work on it does not come up but if you publish your code then I can more easily validate that boon does not have this issue.

Btw I have found some use cases where Jackson is a lot faster up to 4x. It is not that I cherry picked, it is that these use cases have not come up yet in the projects that I am working on.

Also I plan to match Jackson speed in these use cases within the next week or so.

Sorry if I was rough but you came out swinging and I grew up in the murder capital of the US.

Rick Hightower
2:16 PM
Btw blocking comments on your blog is very cowardly. Open it up for comments. Fight like a man! :)

Update: Boon is still a LOT faster than GSON. Boon is part of QBit.

To learn more about QBit and microservices check out these links:

  • [Detailed Tutorial] QBit microservice example
  • [Doc] Queue Callbacks for QBit queue based services
  • [Quick Start] Building a simple Rest web microservice server with QBit
  • [Quick Start] Building a TODO web microservice client with QBit
  • [Quick Start] Building a TODO web microservice server with QBit
  • [Quick Start] Building boon for the QBit microservice engine
  • [Quick Start] Building QBit the microservice lib for Java
  • [Rough Cut] Delivering up Single Page Applications from QBit Java JSON Microservice lib
  • [Rough Cut] Working with event bus for QBit the microservice engine
  • [Rough Cut] Working with inproc MicroServices
  • [Rough Cut] Working with private event bus for inproc microservices
  • [Rough Cut] Working with strongly typed event bus proxies for QBit Java Microservice lib
  • [Rough Cut] Working with System Manager for QBit Microservice lib
  • [Z Notebook] More benchmarking internal
  • [Z Notebook] Performance testing for REST
  • [Z Notebook] Roadmap
  • Home
  • Introduction to QBit
  • Local Service Proxies
  • QBit Boon New Wave of JSON HTTP and Websocket
  • QBit Docs