Rick


Wednesday, March 19, 2014

Boon filtering for Java Beans, JSON and Java Maps like JDK 8 streams but much faster (added Data Repo Index Search, and JSON ETL)

Boon Home | Boon Source | If you are new to boon, you might want to start here. Simple opinionated Java for the novice to expert level Java Programmer. Low Ceremony. High Productivity. A real boon to Java developers!

Java Boon - Boon Filtering for Java beans, JSON and Maps

Many languages have support for querying and filtering objects easily. Java does as of JDK 8, but Boon adds it as well, and it can complement the features of JDK 8 as well as work in Java 7 land. Boon adds filtering to Java. You can filter JSON, Java objects, and Maps. You can also use indexed queries, which are a lot faster than the linear search queries that come with JDK 8.

This tutorial builds on the tutorial for path expressions and sorting.
Let's create some sample objects to run some filters on.
List of Java Bean instances to filter
    static List<Department> departmentsList = list(
            new Department("Engineering").add(
                    new Employee(1, 100, "Rick", "Hightower", "555-555-1000"),
                    new Employee(2, 200, "John", "Smith", "555-555-1215", "555-555-1214", "555-555-1213"),
                    new Employee(3, 300, "Drew", "Donaldson", "555-555-1216"),
                    new Employee(4, 400, "Nick", "LaySacky", "555-555-1217")

            ),
            new Department("HR").add(
                    new Employee(5, 100, "Dianna", "Hightower", "555-555-1218"),
                    new Employee(6, 200, "Derek", "Smith", "555-555-1219"),
                    new Employee(7, 300, "Tonya", "Donaldson", "555-555-1220"),
                    new Employee(8, 400, "Sue", "LaySacky", "555-555-9999")

            ), new Department("Manufacturing").add(),
            new Department("Sales").add(),
            new Department("Marketing").add()

    );

The above configures many departments each with a few employees.
But since this is Boon and Boon treats JSON like it is part of Java, there is support for lists and maps and JSON.
The listing below creates lists of maps that we can run our filters on as well. Same code works with maps and Java objects.
List of Java Maps to filter that contain similar department and employee data
    static List<?> departmentObjects = list(
            map("name", "Engineering",
                    "employees", list(
                    map("id", 1, "salary", 100, "firstName", "Rick", "lastName", "Hightower",
                            "contactInfo", map("phoneNumbers",
                            list("555-555-0000")
                    )
                    ),
                    map("id", 2, "salary", 200, "firstName", "John", "lastName", "Smith",
                            "contactInfo", map("phoneNumbers", list("555-555-1215",
                            "555-555-1214", "555-555-1213"))),
                    map("id", 3, "salary", 300, "firstName", "Drew", "lastName", "Donaldson",
                            "contactInfo", map("phoneNumbers", list("555-555-1216"))),
                    map("id", 4, "salary", 400, "firstName", "Nick", "lastName", "LaySacky",
                            "contactInfo", map("phoneNumbers", list("555-555-1217")))

            )
            ),
            map("name", "HR",
                    "employees", list(
                    map("id", 5, "salary", 100, "departmentName", "HR",
                            "firstName", "Dianna", "lastName", "Hightower",
                            "contactInfo",
                            map("phoneNumbers", list("555-555-1218"))),
                    map("id", 6, "salary", 200, "departmentName", "HR",
                            "firstName", "Derek", "lastName", "Smith",
                            "contactInfo",
                            map("phoneNumbers", list("555-555-1219"))),
                    map("id", 7, "salary", 300, "departmentName", "HR",
                            "firstName", "Tonya", "lastName", "Donaldson",
                            "contactInfo", map("phoneNumbers", list("555-555-1220"))),
                    map("id", 8, "salary", 400, "departmentName", "HR",
                            "firstName", "Sue", "lastName", "LaySacky",
                            "contactInfo", map("phoneNumbers", list("555-555-9999")))

            )
            ),
            map("name", "Manufacturing", "employees", Collections.EMPTY_LIST),
            map("name", "Sales", "employees", Collections.EMPTY_LIST),
            map("name", "Marketing", "employees", Collections.EMPTY_LIST)
    ); 
Without further ado, let's start filtering some objects. Follow the comments in the listing below.
Lists to work with
        List<Department> departments;
        List<Department> departmentsWithEmployeeNamedRick;
        List<Employee> employees;

        /** Copy list of departments. */
        departments = Lists.deepCopy(departmentsList);

        /** Get all employees in every department. */
        employees = (List<Employee>) atIndex(departments, "employees");
The above just uses some of Boon's helper features to make copies of lists and slice and dice them. Essentially we created a list by extracting all of the employees in each department in the list, and it took one line of code. :) That is Boon, baby!
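For comparison, here is roughly what that flattening step looks like with plain JDK 8 streams. This is only a sketch to show the idea, not how Boon implements atIndex; the FlattenSketch class and its tiny Employee and Department types are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlattenSketch {
    // Hypothetical, stripped-down stand-ins for the tutorial's domain classes.
    static class Employee {
        final String firstName;
        Employee(String firstName) { this.firstName = firstName; }
    }

    static class Department {
        final String name;
        final List<Employee> employees = new ArrayList<>();
        Department(String name, Employee... staff) {
            this.name = name;
            employees.addAll(Arrays.asList(staff));
        }
    }

    /** Collects the employees of every department into one flat list. */
    static List<Employee> allEmployees(List<Department> departments) {
        return departments.stream()
                .flatMap(d -> d.employees.stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Department> departments = Arrays.asList(
                new Department("Engineering", new Employee("Rick"), new Employee("John")),
                new Department("HR", new Employee("Dianna")),
                new Department("Sales"));
        System.out.println(allEmployees(departments).size()); // prints 3
    }
}
```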
Now let's start filtering the list. In Boon a filter is basically like a SQL query.
Search for departments who have employees with the first name Rick
        /* -----------------
        Search for departments that have an employee with the
        first name Rick.
         */
        departmentsWithEmployeeNamedRick =  filter(departments,
                contains("employees.firstName", "Rick"));

Think about what we just did. We just searched all of the departments to see if any of them have an employee named "Rick", and we did it with one line of code.
Now I do a little validation so I can turn this example into a test. :)
Validation that it works...Search for departments who have employees with the first name Rick
        /* Verify. */
        Int.equalsOrDie(1, departmentsWithEmployeeNamedRick.size());

        Str.equalsOrDie("Engineering",
                departmentsWithEmployeeNamedRick.get(0).getName());
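If you want to see what that one-liner is doing conceptually, the sketch below runs the same search with plain JDK 8 streams: keep a department when any of its employees has the given first name. It is an illustration only, not Boon's code, and the ContainsSketch class and its minimal types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ContainsSketch {
    // Hypothetical, stripped-down stand-ins for the tutorial's domain classes.
    static class Employee {
        final String firstName;
        Employee(String firstName) { this.firstName = firstName; }
    }

    static class Department {
        final String name;
        final List<Employee> employees = new ArrayList<>();
        Department(String name, Employee... staff) {
            this.name = name;
            employees.addAll(Arrays.asList(staff));
        }
    }

    /** Keeps the departments that have at least one employee with the given first name. */
    static List<Department> withEmployeeNamed(List<Department> departments, String firstName) {
        return departments.stream()
                .filter(d -> d.employees.stream()
                        .anyMatch(e -> firstName.equals(e.firstName)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Department> departments = Arrays.asList(
                new Department("Engineering", new Employee("Rick"), new Employee("John")),
                new Department("HR", new Employee("Dianna"), new Employee("Derek")),
                new Department("Sales"));
        List<Department> hits = withEmployeeNamed(departments, "Rick");
        System.out.println(hits.get(0).name); // prints Engineering
    }
}
```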

Now let's search for all employees who are in the HR department and then verify that the search works.
Grab All employees in HR
        /* Grab all employees in HR. */
        List<Employee> results = filter(employees, eq("department.name", "HR"));


        /* Verify. */
        Int.equalsOrDie(4, results.size());

        Str.equalsOrDie("HR", results.get(0).getDepartment().getName());


Notice Boon has a DSL-like construct for doing queries, so eq("department.name", "HR") matches the employees who are in the department named HR.
You can nest criteria inside and methods and or methods. The filter method takes many criteria objects, so here we are searching for employees who are in HR and who have a salary greater than 301.
Find All employees who work in HR and make more than 301
        /** Grab employees in HR with a salary greater than 301 */
        results = filter(employees, eq("department.name", "HR"), gt("salary", 301));


        /* Verify. */
        Int.equalsOrDie(1, results.size());

        Str.equalsOrDie("HR", results.get(0).getDepartment().getName());

        Str.equalsOrDie("Sue", results.get(0).getFirstName());
Thus far we have worked with Boon with Java instances in a List, but you can also query java.util.Map instances in a List as follows:
Now work with Maps
        /** Now work with maps */

        List<Map<String, Object>> employeeMaps =
        (List<Map<String, Object>>) atIndex(departmentObjects, "employees");

        /** Grab employees in HR with a salary greater than 301 */

        List<Map<String, Object>> resultObjects = filter(employeeMaps,
                eq("departmentName", "HR"), gt("salary", 301));



        /* Verify. */
        Int.equalsOrDie(1, resultObjects.size());

        Str.equalsOrDie("HR", (String) resultObjects.get(0).get("departmentName"));

        Str.equalsOrDie("Sue", (String) resultObjects.get(0).get("firstName"));

The above is the same code but against a list of maps instead of a list of Java objects.
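One way to picture criteria like eq and gt is as small predicates that get AND-ed together. The sketch below builds that idea with java.util.function.Predicate over a list of maps. The eq, gt, and filter helpers here are local stand-ins written for this example (they borrow Boon's names but are not Boon's classes), and they only handle flat keys, not path expressions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class CriteriaSketch {
    /** Matches rows whose value under key equals the given value. */
    static Predicate<Map<String, Object>> eq(String key, Object value) {
        return row -> value.equals(row.get(key));
    }

    /** Matches rows whose numeric value under key is greater than bound. */
    static Predicate<Map<String, Object>> gt(String key, int bound) {
        return row -> row.get(key) instanceof Number
                && ((Number) row.get(key)).intValue() > bound;
    }

    /** Keeps the rows that satisfy every criterion (a logical AND). */
    @SafeVarargs
    static List<Map<String, Object>> filter(List<Map<String, Object>> rows,
                                            Predicate<Map<String, Object>>... criteria) {
        Predicate<Map<String, Object>> all = row -> true;
        for (Predicate<Map<String, Object>> criterion : criteria) {
            all = all.and(criterion);
        }
        return rows.stream().filter(all).collect(Collectors.toList());
    }

    /** Hypothetical helper that builds one employee row. */
    static Map<String, Object> employee(String departmentName, String firstName, int salary) {
        Map<String, Object> row = new HashMap<>();
        row.put("departmentName", departmentName);
        row.put("firstName", firstName);
        row.put("salary", salary);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = Arrays.asList(
                employee("HR", "Dianna", 100),
                employee("HR", "Sue", 400),
                employee("Engineering", "Nick", 400));
        List<Map<String, Object>> hits =
                filter(rows, eq("departmentName", "HR"), gt("salary", 301));
        System.out.println(hits.get(0).get("firstName")); // prints Sue
    }
}
```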
The filtering also works with JSON. This is Boon remember and Boon always has a way to work with JSON.
Now work with JSON
        /** Now with JSON. */

        String json = toJson(departmentObjects);
        puts(json);

        List<?> array =  (List<?>) fromJson(json);
        employeeMaps =
                (List<Map<String, Object>>) atIndex(array, "employees");


        resultObjects = filter(employeeMaps,
                eq("departmentName", "HR"), gt("salary", 301));



        /* Verify. */
        Int.equalsOrDie(1, resultObjects.size());

        Str.equalsOrDie("HR", (String) resultObjects.get(0).get("departmentName"));

        Str.equalsOrDie("Sue", (String) resultObjects.get(0).get("firstName"));

The criteria can have complex path expressions. We have just scratched the surface of what Boon can do. Boon can slice, dice, transform, and search Java objects 50 ways to Sunday.

In addition to Boon filtering you have Boon indexed searching which is based on Boon Data Repo.

Boon Data Repo is an in-memory query engine that can use high speed indexed collections for very fast in-memory queries. This brief tutorial will show you how to use Boon to query JSON, Java objects and Java maps. It also shows how to do ETL transforms with JSON, Java objects and Java Maps.

Let's redo the example with Boon Data Repo. We are going to create a repo so we can easily search this list of Departments with lists of Employees, or the list of maps of lists of maps of lists.
Grab just the employees
        List<Employee> employees;



        /** Get all employees in every department. */
        employees = (List<Employee>) atIndex(departmentsList, "employees");


Bam! We have not really gotten to the data repo part, but did you just see that Boon Kung Fu? We just grabbed every employee from every department and pulled all of the employees into one list. We took a hierarchy and made it flat in just one line of code. BOOM! No?! Boon. Boon is full of these utility methods and I will write a whole tutorial on just those. But I digress....
Let's create our Data Repo which is a set of indexed collections as follows:
Creating a Repo with a primary key and two search indexes
        Repo<Integer,Employee> employeeRepo;

        /** It builds indexes on properties. */
        employeeRepo = Repos.builder()
                .primaryKey("id")
                .searchIndex("department.name")
                .searchIndex("salary")
                .build(int.class, Employee.class).init(employees);

We set up a primary key on the field id (Boon can use fields or Java properties), and a search index on department name so we can search by department name really fast. We also set up a search index on salary. Then we told the Repo builder that our primary key is an int and that the repo manages Employee instances.
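To make the idea of a primary key plus search indexes concrete, here is a minimal sketch built from plain JDK collections: a HashMap keyed by id, a HashMap that buckets employees by department name, and a TreeMap that keeps salaries sorted for range lookups. This is an assumption about what such indexes can look like, not a description of Boon's internals; IndexSketch and its flat Employee class are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexSketch {
    // Hypothetical flat employee; the tutorial's Employee holds a Department instead.
    static class Employee {
        final int id;
        final String department;
        final int salary;
        Employee(int id, String department, int salary) {
            this.id = id;
            this.department = department;
            this.salary = salary;
        }
    }

    final Map<Integer, Employee> byId = new HashMap<>();                    // primary key
    final Map<String, List<Employee>> byDepartment = new HashMap<>();       // equality lookups
    final TreeMap<Integer, List<Employee>> bySalary = new TreeMap<>();      // range lookups

    /** Builds all three indexes in a single pass over the list. */
    IndexSketch(List<Employee> employees) {
        for (Employee e : employees) {
            byId.put(e.id, e);
            byDepartment.computeIfAbsent(e.department, k -> new ArrayList<>()).add(e);
            bySalary.computeIfAbsent(e.salary, k -> new ArrayList<>()).add(e);
        }
    }

    /** Equality lookup: one hash probe instead of a scan of every employee. */
    List<Employee> inDepartment(String name) {
        return byDepartment.getOrDefault(name, Collections.emptyList());
    }

    /** Range lookup: walks only the sorted entries strictly above the bound. */
    List<Employee> salaryAbove(int bound) {
        List<Employee> out = new ArrayList<>();
        for (List<Employee> bucket : bySalary.tailMap(bound, false).values()) {
            out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch(Arrays.asList(
                new Employee(1, "Engineering", 100), new Employee(4, "Engineering", 400),
                new Employee(5, "HR", 100), new Employee(8, "HR", 400)));
        System.out.println(index.inDepartment("HR").size()); // prints 2
        System.out.println(index.salaryAbove(301).size());   // prints 2
    }
}
```

The trade-off is the usual one: the indexes cost memory and some build time up front, and in exchange each query touches only the matching buckets.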
Now let's start searching.
Searching the employee list with Boon
        List<Employee> results =
                employeeRepo.query(eq("department.name", "HR"));

        /* Verify. */
        Int.equalsOrDie(4, results.size());

        Str.equalsOrDie("HR", results.get(0).getDepartment().getName());

Boon Repo has search criteria like eq (equals), gt (greater than), and between, plus some you never dreamed of, like searching objects based on type hierarchy. All of them can use indexes. After the indexes narrow the results down, Boon applies the remaining criteria as filters, so Boon basically does in memory what a typical database does. Boon Repo has a query engine. It is really fast.
Passing multiple criteria search by department name with salary above 301
        results = employeeRepo.query( eq("department.name", "HR"),
                gt("salary", 301));

        /* Verify. */
        Int.equalsOrDie(1, results.size());

        Str.equalsOrDie("HR", results.get(0).getDepartment().getName());

        Str.equalsOrDie("Sue", results.get(0).getFirstName());

Boon is happy to create indexes on Java objects, but it can also just work with lists of maps or even JSON.
Working with List/Maps
        List<Map<String, Object>> employeeMaps;

        Repo<Integer,Map<String, Object>> employeeMapRepo;


        /** Get all employees in every department. */
        employeeMaps = (List<Map<String, Object>>) atIndex(departmentObjects, "employees");


        /** It builds indexes on properties. */
        employeeMapRepo = (Repo<Integer,Map<String, Object>>) (Object)
                Repos.builder()
                        .primaryKey("id")
                        .searchIndex("departmentName")
                        .searchIndex("salary")
                        .build(int.class, Map.class).init((List)employeeMaps);

Now we can search just like before but now our domain objects so to speak are just maps and lists.
Searching against Maps and Lists
        List<Map<String, Object>> resultMaps =
                employeeMapRepo.query(eq("departmentName", "HR"));

Verifying that we found what we were looking for... Searching against Maps and Lists
        /* Verify. */
        Int.equalsOrDie(4, resultMaps.size());

        Str.equalsOrDie("HR", (String) resultMaps.get(0).get("departmentName"));


        resultMaps = employeeMapRepo.query( eq("departmentName", "HR"),
                gt("salary", 301));

         /* Verify. */
        Int.equalsOrDie(1, resultMaps.size());

        Str.equalsOrDie("HR", (String) resultMaps.get(0).get("departmentName"));

        Str.equalsOrDie("Sue", (String) resultMaps.get(0).get("firstName"));

As mentioned earlier, Boon also works directly against JSON as follows:
Searching JSON with Boon
        /** Now with JSON. */

        String json = toJson(departmentObjects);
        puts(json);

        List<?> array =  (List<?>) fromJson(json);
        employeeMaps =
                (List<Map<String, Object>>) atIndex(array, "employees");

        employeeMapRepo = (Repo<Integer,Map<String, Object>>) (Object)
                Repos.builder()
                        .primaryKey("id")
                        .searchIndex("departmentName")
                        .searchIndex("salary")
                        .build(int.class, Map.class).init((List)employeeMaps);


        resultMaps = employeeMapRepo.query(
                eq("departmentName", "HR"), gt("salary", 301));


Making sure we found the maps we were looking for
        /* Verify. */
        Int.equalsOrDie(1, resultMaps.size());

        Str.equalsOrDie("HR", (String) resultMaps.get(0).get("departmentName"));

        Str.equalsOrDie("Sue", (String) resultMaps.get(0).get("firstName"));

Boon sorting, filtering and index searching can all use Boon's property path expressions, which work with Java objects, java.util.Map, and, you guessed it, JSON.
Here is an example of an ETL transform that converts one list into another.
JSON ETL Transform with Boon
        List<Map<String, Object>> list = employeeMapRepo.query(
                selects(
                        selectAs("firstName", "fn"),
                        selectAs("lastName", "ln"),
                        selectAs("contactInfo.phoneNumbers[0]", "ph"),

                        selectAs("salary", "pay", new Function<Integer, Float>() {
                            @Override
                            public Float apply(Integer salary) {
                                float pay = salary.floatValue() / 100;
                                return pay;
                            }
                        })
                )
        );

        puts (toJson(list));

The above turns this JSON:
[{"name":"Engineering","employees":[{"id":1,"salary":100,"firstName":"Rick","lastName":"Hightower","contactInfo":{"phoneNumbers":["555-555-0000"]}}
Into This JSON:
{"fn":"Rick","ln":"Hightower","ph":"555-555-0000","pay":1.0}
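To show what a select with a path expression boils down to, the sketch below walks a path like contactInfo.phoneNumbers[0] through nested maps and lists and copies the selected values into a new map under new names. It is a hand-rolled illustration that assumes well-formed paths and does no null checking; the SelectSketch class and its at and transform methods are made up here and are not Boon's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SelectSketch {
    /** Resolves a dotted path with optional [n] list indexes against nested maps and lists. */
    static Object at(Object root, String path) {
        Object current = root;
        for (String part : path.split("\\.")) {
            String key = part;
            int index = -1;
            int bracket = part.indexOf('[');
            if (bracket >= 0) {
                // Split "phoneNumbers[0]" into the key and the list index.
                key = part.substring(0, bracket);
                index = Integer.parseInt(part.substring(bracket + 1, part.length() - 1));
            }
            current = ((Map<?, ?>) current).get(key);
            if (index >= 0) {
                current = ((List<?>) current).get(index);
            }
        }
        return current;
    }

    /** Renames and reshapes one employee map, mirroring the select list in the text. */
    static Map<String, Object> transform(Map<String, Object> employee) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("fn", at(employee, "firstName"));
        out.put("ln", at(employee, "lastName"));
        out.put("ph", at(employee, "contactInfo.phoneNumbers[0]"));
        out.put("pay", ((Number) at(employee, "salary")).floatValue() / 100);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> contactInfo = new HashMap<>();
        contactInfo.put("phoneNumbers", Arrays.asList("555-555-0000"));
        Map<String, Object> rick = new HashMap<>();
        rick.put("firstName", "Rick");
        rick.put("lastName", "Hightower");
        rick.put("salary", 100);
        rick.put("contactInfo", contactInfo);
        System.out.println(transform(rick));
        // prints {fn=Rick, ln=Hightower, ph=555-555-0000, pay=1.0}
    }
}
```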
Now keep in mind that this works with JSON, Java Instances and Maps. So you pick and it can transform from one to the other and back. What about munging stuff together and calling other functions to do transforms, and what not... You know... some real nasty ETL! Boon does nasty ETL.
Nasty ETL with Boon
        template = template();
        template.addFunctions(new DecryptionService());

        template.addFunctions(new Salary());



        list = employeeMapRepo.query(
                selects(
                        selectAsTemplate("fullName", "{{firstName}} {{lastName}}",
                                template),
                        selectAs("contactInfo.phoneNumbers[0]", "ph"),
                        selectAsTemplate("pay", "{{pay(salary)}}", template),
                        selectAsTemplate("id", "{{DecryptionService.decrypt(id)}}", template)

                )
        );

        puts (list);
Notice that this uses templates to do concatenation, it uses path expressions, and calls methods from the DecryptionService and the Salary (pay()), which are just pretend services that I made up for this example.
Boon also has a few template engines built in which makes data munging a bit easier.
Now we have this pretty JSON:
{"fullName":"Rick Hightower", "ph":"555-555-0000", "pay":1.0, "id":49}
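The concatenation part of a template such as {{firstName}} {{lastName}} is simple placeholder substitution. Here is a minimal sketch of that one piece using java.util.regex. It is not Boon's template engine: it only replaces plain {{name}} placeholders with values from a map and does not support function calls like pay(salary). The TemplateSketch class and its render method are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateSketch {
    // Matches {{name}} where name is made of letters, digits, or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    /** Replaces each {{key}} with the string form of values.get(key). */
    static String render(String template, Map<String, Object> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            Object value = values.get(matcher.group(1));
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> rick = new HashMap<>();
        rick.put("firstName", "Rick");
        rick.put("lastName", "Hightower");
        System.out.println(render("{{firstName}} {{lastName}}", rick)); // prints Rick Hightower
    }
}
```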
Boon has easy ways to turn maps into objects so it is child's play to convert the list of maps into a list of objects. Remember all of the selection business works with JSON, Java instances and Java HashMaps.
Of course the above is a query, so you can add criteria to filter out objects and such. I am trying not to drown you in detail, but it is easy to manipulate the objects to and from JSON/Maps/Java instances. Flick of the wrist!
Also if you hate handlebars, there is a JSTL syntax so let's show that.
Just like last example but we use JSTL instead of handlebar syntax
        template = jstl();
        template.addFunctions(new DecryptionService());
        template.addFunctions(new Salary());



        list = employeeMapRepo.query(
                selects(
                        selectAsTemplate("fullName", "${lastName},${firstName}",
                                template),
                        selectAsTemplate("ph", "${contactInfo.phoneNumbers[0]}",
                                template),
                        selectAsTemplate("pay", "${pay(salary)}", template),
                        selectAsTemplate("id", "${DecryptionService.decrypt(id)}", template)

                )
        );

        puts (list);

Which begets…
{"fullName":"Hightower,Rick", "ph":"555-555-0000", "pay":1.0, "id":49}
I forgot to show off the where clause so let's see it again.
list = employeeMapRepo.query(
                selects(
                        selectAsTemplate("fullName", "${lastName},${firstName}",
                                template),
                        selectAsTemplate("ph", "${contactInfo.phoneNumbers[0]}",
                                template),
                        selectAsTemplate("pay", "${pay(salary)}", template),
                        selectAsTemplate("id", "${DecryptionService.decrypt(id)}", template)

                ), gt("salary", 50), eq("firstName", "Rick")
        );


Now we have this.
[{fullName=Hightower,Rick, ph=555-555-0000, pay=1.0, id=49}] 


The criteria can have complex path expressions (see ${contactInfo.phoneNumbers[0]}). We have just scratched the surface of what Boon can do. Boon can slice, dice, transform, and search Java objects 50 ways to Sunday.
To read more about Boon sorting and searching capabilities please read.
In addition to linear searching as you saw above, Boon has the ability to run indexed searches as well as ETL transforms for Java and JSON objects in memory.
Much of Boon's sorting, index searching, ETL, and so on came out of Boon's Data Repo, which provides the indexed search capabilities. You can churn through millions of objects in memory in mere moments with indexed collections.
Boon's DataRepo allows you to treat Java collections more like a database. DataRepo is not an in-memory database, and it is not a substitute for arranging your objects into data structures optimized for your application.
If you want to spend your time providing customer value and building your objects and classes and using the Collections API for your data structures, then Boon DataRepo is meant for you. This does not preclude breaking out the Knuth books and coming up with an optimized data structure. It just helps keep the mundane things easy so you can spend your time making the hard things possible.
This project came out of a need. I was working on a project that planned to store a large collection of domain objects in memory for speed, and somebody asked an all-too-important question that I had overlooked: how are we going to query this data? My answer was that we would use the Collections API and the Streaming API. Then I tried to do this...
I tried using the JDK 8 stream API on a large data set, and it was slow. It was a linear search/filter. This is by design, but for what I was doing, it did not work. I needed indexes to support arbitrary queries.
Boon's data repo augments the streaming API and provides an indexed collection search and many other utilities around searching, filtering and transforming Java class instances, Maps, and JSON.
Boon's DataRepo does not endeavor to replace the JDK 8 stream API, and in fact it works well with it. By design, DataRepo works with standard collection libraries.
Boon's data repo makes doing index based queries on collections a lot easier.
It provides a simplified API for doing so.
You can use a wrapper class to wrap a collection into an indexed collection.
Let's say you have a method that creates 200,000 employee objects like this:
        List<Employee> employees = TestHelper.createMetricTonOfEmployees(200_000);

So now we have 200,000 employees. Let's search them...
First wrap Employees in a searchable query:
        employees = query(employees);

Now search:
        List<Employee> results = query(employees, eq("firstName", firstName));
So what is the main difference between the above and the stream API?
        employees.stream().filter(emp -> emp.getFirstName().equals(firstName));
Using Boon DataRepo is about 20,000% faster!
There is an API that looks just like your built-in collections. There is also an API that looks more like a DAO object or a Repo Object.
A simple query with the Repo/DAO object looks like this:
        List<Employee> employees = repo.query(eq("firstName", "Diana"));
A more involved query would look like this:
        List<Employee> employees = repo.query(
                and(eq("firstName", "Diana"), eq("lastName", "Smith"), eq("ssn", "21785999")));
Or this:
        List<Employee> employees = repo.query(
                and(startsWith("firstName", "Bob"), eq("lastName", "Smith"), lte("salary", 200_000),
                        gte("salary", 190_000)));
Or this:
        List<Employee> employees = repo.query(
                and(startsWith("firstName", "Bob"), eq("lastName", "Smith"), between("salary", 190_000, 200_000)));
Or if you want to use JDK 8 stream API, this works with it not against it:
        int sum = repo.query(eq("lastName", "Smith")).stream().filter(emp -> emp.getSalary()>50_000)
                .mapToInt(b -> b.getSalary())
                .sum();
The above is much faster when the number of employees is large. It narrows the employees down to those whose last name is Smith and then keeps those with a salary above 50,000. Let's say you had 100,000 employees and only 50 named Smith: the TreeMap index quickly pulls those 50 employees out of the 100,000, and then the stream filter runs over just 50 objects instead of the whole 100,000.
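Here is the same narrow-then-stream pattern as a plain JDK sketch, so you can see where the savings come from: a map keyed by last name hands back the small bucket of Smiths, and the stream only runs over that bucket. This is a sketch under my own assumptions, not Boon code; it uses a HashMap from groupingBy for brevity, and the NarrowThenStream class and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NarrowThenStream {
    // Hypothetical minimal employee for this example.
    static class Employee {
        final String lastName;
        final int salary;
        Employee(String lastName, int salary) {
            this.lastName = lastName;
            this.salary = salary;
        }
    }

    /** Builds the index once: last name -> employees with that last name. */
    static Map<String, List<Employee>> indexByLastName(List<Employee> all) {
        return all.stream().collect(Collectors.groupingBy(e -> e.lastName));
    }

    /** Looks up one bucket, then streams over just that bucket. */
    static int sumSalariesAbove(Map<String, List<Employee>> byLastName,
                                String lastName, int floor) {
        return byLastName.getOrDefault(lastName, Collections.emptyList()).stream()
                .filter(e -> e.salary > floor)
                .mapToInt(e -> e.salary)
                .sum();
    }

    public static void main(String[] args) {
        Map<String, List<Employee>> index = indexByLastName(Arrays.asList(
                new Employee("Smith", 60_000),
                new Employee("Smith", 40_000),
                new Employee("Smith", 70_000),
                new Employee("Hightower", 90_000)));
        System.out.println(sumSalariesAbove(index, "Smith", 50_000)); // prints 130000
    }
}
```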
To learn more about the Boon Data Repo go here:

Thoughts

Thoughts? Write me at richard high tower AT g mail dot c-o-m (Rick Hightower).

Further Reading:

If you are new to boon start here:

Why Boon?

Easily read in files into lines or a giant string with one method call. Works with files, URLs, class-path, etc. You will be surprised how easy Boon IO support is. Boon has Slice notation for dealing with Strings, Lists, primitive arrays, Tree Maps, etc. If you are from Groovy land, Ruby land, Python land, or whatever land, and you have to use Java then Boon might give you some relief from API bloat. If you are like me, and you like to use Java, then Boon is for you too. Boon lets Java be Java, but adds the missing productive APIs from Python, Ruby, and Groovy. Boon may not be Ruby or Groovy, but it's a real Boon to Java development.

Core Boon Philosophy

Core Boon will never have any dependencies. It will always be able to run as a single jar. This is not just NIH, but it is partly. My view of what Java needs is more in line with what Python, Ruby and Groovy provide. Boon is an addition on top of the JVM to make up the difference between the harder to use APIs that come with Java and the types of utilities that are built into Ruby, Python, PHP, Groovy etc. Boon is a Java centric view of those libs. The vision of Boon and the current implementation are really far apart.

Contact Info
