Rick

Rick
Rick

Monday, March 17, 2014

JSON ETL, JSON Transformation, Using Boon REPO to transform one JSON Feed into another

JSON ETL....


I wrote you up an example where I transform a JSON list into another JSON list.


Check out this JSON LIST that I converted into the next list and the example code to do it.


[
   {
      "name":"Engineering",
      "employees":[
         {
            "id":1,
            "salary":100,
            "firstName":"Rick",
            "lastName":"Hightower",
            "contactInfo":{
               "phoneNumbers":[
                  "555-555-0000"
               ]
            }
         },
         {
            "id":2,
            "salary":200,
            "firstName":"John",
            "lastName":"Smith",
            "contactInfo":{
               "phoneNumbers":[
                  "555-555-1215",
                  "555-555-1214",
                  "555-555-1213"
               ]
            }
         },
         {
            "id":3,
            "salary":300,
            "firstName":"Drew",
            "lastName":"Donaldson",
            "contactInfo":{
               "phoneNumbers":[
                  "555-555-1216"
               ]
            }
         },
         {
            "id":4,
            "salary":400,
            "firstName":"Nick",
            "lastName":"LaySacky",
            "contactInfo":{
               "phoneNumbers":[
                  "555-555-1217"
               ]
            }
         }
      ]
   },
   {
      "name":"HR",
      "employees":[
         {
            "id":5,
            "salary":100,
            "departmentName":"HR",
            "firstName":"Dianna",
            "lastName":"Hightower",
            "contactInfo":{
               "phoneNumbers":[
                  "555-555-1218"
               ]
            }
         },
…..


Here is the output list:

[
   {
      "fn":"Derek",
      "ln":"Smith",
      "pay":2.0
   },
   {
      "fn":"Dianna",
      "ln":"Hightower",
      "pay":1.0
   },
   {
      "fn":"Tonya",
      "ln":"Donaldson",
      "pay":3.0
   },
   {
      "fn":"Sue",
      "ln":"LaySacky",
      "pay":4.0
   },
   {
      "fn":"John",
      "ln":"Smith",
      "pay":2.0
   },
   {
      "fn":"Rick",
      "ln":"Hightower",
      "pay":1.0
   },
   {
      "fn":"Drew",
      "ln":"Donaldson",
      "pay":3.0
   },
   {
      "fn":"Nick",
      "ln":"LaySacky",
      "pay":4.0
   }
]

The code looks like this:


    //READ IN JSON STREAM
    List<?> array =  (List<?>) fromJson(json);

    //Flatten the list of maps of lists into a list!
    employeeMaps =
                (List<Map<String, Object>>) atIndex(array, "employees");



    //BUILD A REPO FROM THE LIST
    employeeMapRepo = (Repo<Integer,Map<String, Object>>) (Object)
                Repos.builder()
                        .primaryKey("id")
                        .build(int.class, Map.class).init((List)employeeMaps);


    //TRANSFORM IT! NOTE THAT THIS EXAMPLE CONVERTS 
    // THE INT to an INT / 100 just to show we can.
    List<Map<String, Object>> list = employeeMapRepo.query(
                selects(
                     selectAs("firstName", "fn"), //turn firstName to fn
                     selectAs("lastName", "ln"),  //turn lastName to ln
                     selectAs("salary", "pay", new Function<Integer, Float>() {
                            @Override
                            public Float apply(Integer salary) {
                                float pay = salary.floatValue() / 100;
                                return pay;
                            }
                        })
                )
    );


    puts (toJson(list));


Note that we convert salary into pay, and pay is a float which is salary / 100. 

Check out this IETF spec for JSON ETL merge/patch/move



null means remove
non null means replace

And for this they write up a 20 page document.

The more interesting use cases....

[ { "op": "move", "from": "/a", "to": "/c" } ]

Those are operations to change a property name. 

I think there are better ways to do ETL on JSON.

JSON Transformation example (with some patched to data repo to make it work)



Here is the complete example:



The JSON transformation is towards the end of the main method.

We can easily add all of the operations in https://github.com/fge/json-patch plus more.



Kafka and Cassandra support, training for AWS EC2 Cassandra 3.0 Training