Monday, December 28, 2015

WebSocket microservice vs REST microservice

With WebSocket, you connect to a server once and keep sending requests while the server sends back responses over the same connection. You can essentially stream calls. How does this compare with REST, which uses plain HTTP? WebSocket can have 8x to 20x more throughput and uses fewer resources.
Let's demonstrate. For this test I will run everything on my MacBook Pro. I have made no tweaks to the OS TCP/IP stack (you can expect slightly higher throughput if you do).
To test the REST code we will use wrk. The wrk load tester is capable of generating significant load when run on a single multi-core CPU. It combines a multithreaded design with event notification systems such as epoll and kqueue to maximize the number of HTTP requests per second.
The REST service code:

REST service code

package io.advantageous.qbit.example.perf.websocket;

import io.advantageous.qbit.admin.ManagedServiceBuilder;
import io.advantageous.qbit.annotation.RequestMapping;
import io.advantageous.qbit.annotation.http.GET;
import io.advantageous.qbit.annotation.http.PUT;
import static io.advantageous.qbit.admin.ManagedServiceBuilder.managedServiceBuilder;

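/**
 * curl  -H "Content-Type: application/json"  -X PUT http://localhost:8080/trade -d '{"name":"ibm", "amount":1}'
 * curl  http://localhost:8080/count
 */
@RequestMapping("/")
public class TradeService {

    /** Number of trades handled so far. */
    private long count;

    @PUT("/trade")
    public boolean trade(final Trade trade) {
        count++; // Assumed: without this, count() would always return 0.
        return true;
    }

    @GET("/count")
    public long count() {
        return count;
    }

    public static void main(final String... args) {

        final ManagedServiceBuilder managedServiceBuilder = managedServiceBuilder();

        // The listing is cut off after addEndpointService(...). The root URI
        // and the build/start calls below are an assumed completion.
        managedServiceBuilder
                .addEndpointService(new TradeService())
                .setRootURI("/")
                .getEndpointServerBuilder()
                .build()
                .startServer();
    }
}
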
To test this with wrk, we need a Lua script (saved here as trade.lua) to run the PUT operations.
wrk.method = "PUT"
wrk.body   = '{"name":"ibm", "amount":1}'
wrk.headers["Content-Type"] = "application/json"

100 connections 70K TPS

$ wrk -c100 -d10s -strade.lua http://localhost:8080/trade
Running 10s test @ http://localhost:8080/trade
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.45ms  199.79us   7.61ms   88.80%
    Req/Sec    34.69k     0.87k   36.02k    85.15%
  696962 requests in 10.10s, 49.19MB read
Requests/sec:  68993.88
Transfer/sec:      4.87MB
We can tweak the server a bit, reducing the flush rate or the batch size, to get higher throughput with fewer connections. We can also tweak the OS to make more ephemeral ports available and then use a lot more connections. With those tweaks, experience tells me that we can get close to 90K TPS or so on a MacBook Pro. We could also test from two machines, with one of those machines being a Linux server, and get even more throughput.
This test has the disadvantage of running everything on the same machine, but the WebSocket version has the same disadvantage, so the comparison is somewhat fair. We could also employ HTTP pipelining to increase the throughput. That trick is great for benchmarks but rarely works in production environments with real clients. On an average server, we can get close to 150K TPS to 200K TPS from experience (I will show this later perhaps).
OK, let's see how the WebSocket version does on the same machine with the same server. The QBit microservice lib supports both REST and WebSocket.
In order to use WebSocket, we need to create an async interface so we can build a client proxy.

Async interface

package io.advantageous.qbit.example.perf.websocket;

import io.advantageous.qbit.reactive.Callback;

public interface TradeServiceAsync {

    void trade(Callback<Boolean> callback, final Trade trade);
    void count(Callback<Long> callback);
}
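To make the calling convention concrete, here is a minimal, self-contained sketch of the callback-first style that the async interface implies. It does not use QBit at all: the `Callback`, `Trade`, and `InMemoryTradeService` types below are hypothetical stand-ins written for this illustration, and the "server" is just a single worker thread in the same process.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class AsyncCallbackSketch {

    /** Hypothetical stand-in for a callback type; this is not QBit's Callback. */
    interface Callback<T> extends Consumer<T> { }

    /** Hypothetical stand-in for the Trade class used in the article. */
    static final class Trade {
        final String name;
        final long amount;
        Trade(String name, long amount) { this.name = name; this.amount = amount; }
    }

    /** Same shape as the async interface: callback first, no return values. */
    interface TradeServiceAsync {
        void trade(Callback<Boolean> callback, Trade trade);
        void count(Callback<Long> callback);
    }

    /** In-memory implementation that answers every call on one worker thread. */
    static final class InMemoryTradeService implements TradeServiceAsync, AutoCloseable {
        private final ExecutorService worker = Executors.newSingleThreadExecutor();
        private long count; // only touched on the worker thread

        @Override
        public void trade(Callback<Boolean> callback, Trade trade) {
            worker.execute(() -> { count++; callback.accept(true); });
        }

        @Override
        public void count(Callback<Long> callback) {
            worker.execute(() -> callback.accept(count));
        }

        @Override
        public void close() { worker.shutdown(); }
    }

    /** Sends n trades without waiting, then asks for the count and waits for that reply. */
    static long sendTrades(int n) throws Exception {
        try (InMemoryTradeService service = new InMemoryTradeService()) {
            for (int i = 0; i < n; i++) {
                service.trade(ok -> { }, new Trade("IBM", 1));
            }
            CompletableFuture<Long> result = new CompletableFuture<>();
            service.count(result::complete);
            return result.get(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("count after 3 trades: " + sendTrades(3)); // prints 3
    }
}
```

The caller never blocks on an individual `trade` call. It only waits once, at the end, for the reply to `count`, which is what lets many calls be in flight on one connection.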
Here is the code to create clients and run them against the same server on the same machine.

Load testing with WebSocket client

package io.advantageous.qbit.example.perf.websocket;

import io.advantageous.boon.core.Str;
import io.advantageous.boon.core.Sys;
import io.advantageous.qbit.client.Client;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import static io.advantageous.qbit.service.ServiceProxyUtils.flushServiceProxy;

import static io.advantageous.boon.core.IO.puts;
import static io.advantageous.qbit.client.ClientBuilder.clientBuilder;

public class TradeServiceLoadTestWebSocket {

    public static void main(final String... args) {

        /** Hold the number of clients we will run. */
        final int numClients = 3;

        /** Hold the number of calls each thread will make. */
        final int numCalls = 50_000_000;

        /** Hold the client threads to run. */
        final List<Thread> threadList = new ArrayList<>(numClients);

        /** Hold the counts to total. */
        final List<AtomicInteger> counts = new ArrayList<>();

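        /** Create the client threads. */
        for (int c = 0; c < numClients; c++) {
            final AtomicInteger count = new AtomicInteger();
            counts.add(count); // Register the counter so the reporting loop can total it.
            threadList.add(new Thread(() -> {
                runCalls(numCalls, count);
            }));
        }

        /** Start the threads. */
        threadList.forEach(Thread::start);

        /** Grab the start time. */
        long startTime = System.currentTimeMillis();

        for (int index = 0; index < 1000; index++) {
            Sys.sleep(1_000); // Assumed reporting interval; the listing does not show it.

            long totalCount = 0L;

            for (int c = 0; c < counts.size(); c++) {
                totalCount += counts.get(c).get();
            }

            /* Integer division truncates to whole calls per millisecond,
               so the printed rate is always a multiple of 1,000. */
            puts("total", Str.num(totalCount),
                    "\telapsed time", Str.num(System.currentTimeMillis()-startTime),
                    "\trate", Str.num(totalCount/(System.currentTimeMillis()-startTime)*1000));
        }
    }
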
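    /** Each client will run this
     *
     * @param numCalls number of times to make calls
     * @param count holds the total count
     */
    private static void runCalls(final int numCalls, final AtomicInteger count) {
        final Client client = clientBuilder().setAutoFlush(false).build();

        final TradeServiceAsync tradeService = client.createProxy(TradeServiceAsync.class, "tradeservice");

        // Lines marked "assumed" are missing from the listing and are
        // inferred from the imports and the surrounding logic.
        client.start(); // assumed

        for (int call = 0; call < numCalls; call++) {
            tradeService.trade(response -> {
                if (response) {
                    count.incrementAndGet(); // assumed: count each successful reply
                }
            }, new Trade("IBM", 1));

            /** Apply some back pressure so the server is not overwhelmed. */
            if (call % 10 == 0) {
                while (call - 5_000 > count.get()) {
                    flushServiceProxy(tradeService); // assumed: auto flush is off
                    Sys.sleep(10);                   // assumed wait interval
                }
            }
        }

        flushServiceProxy(tradeService); // assumed: send any calls still queued
    }
}
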
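The back pressure check in the load test keeps the client from running more than about 5,000 calls ahead of the replies. Here is a minimal sketch of that idea on its own, with no QBit involved: the "server" is a hypothetical single-threaded executor that acknowledges each call, and the class and method names are made up for this illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BackPressureSketch {

    /**
     * Sends numCalls fake requests to a simulated server.
     * Returns {acknowledged calls, largest backlog seen right after a check}.
     */
    static int[] run(final int numCalls, final int window) throws InterruptedException {
        final AtomicInteger acked = new AtomicInteger();
        // Hypothetical stand-in for the remote service: one thread that acks calls.
        final ExecutorService server = Executors.newSingleThreadExecutor();
        int maxBacklog = 0;

        for (int call = 0; call < numCalls; call++) {
            server.execute(acked::incrementAndGet);

            // Same shape as the load test: check every tenth call and wait
            // while the sender is more than `window` calls ahead of the replies.
            if (call % 10 == 0) {
                while (call - window > acked.get()) {
                    Thread.sleep(1);
                }
                maxBacklog = Math.max(maxBacklog, call + 1 - acked.get());
            }
        }

        server.shutdown();
        server.awaitTermination(10, TimeUnit.SECONDS);
        return new int[] { acked.get(), maxBacklog };
    }

    public static void main(String[] args) throws InterruptedException {
        final int[] result = run(100_000, 500);
        System.out.println("acked: " + result[0]);
        System.out.println("max backlog at a check: " + result[1]);
    }
}
```

Because the check only runs every tenth call, the real backlog can briefly exceed the window by a few calls between checks. That trade-off keeps the check cheap while still bounding how much work is queued up.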
How does WebSocket do? Quite well!
total 26,668,058    elapsed time 52,186     rate 511,000
That is 511,000 requests and 511,000 responses, a total of 1,022,000 messages a second, using just three WebSocket connections. A single WebSocket connection seems to handle around 700K TPS, and then we start running into more and more IO contention, which again can be solved by having bigger pipes or by tweaking the TCP/IP stack. But in this simple test, we can see that we have roughly a 7.4X improvement over REST by using WebSocket (511K TPS vs. about 69K TPS).
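The arithmetic behind these numbers can be checked with a short sketch. It uses only the figures printed above (total count, elapsed milliseconds, and the wrk requests/sec result). The class and method names are made up for this illustration.

```java
import java.util.Locale;

public class RateMath {

    /** Rate as the load test computes it: integer division first, then scale to seconds. */
    static long truncatedRate(long totalCount, long elapsedMillis) {
        return totalCount / elapsedMillis * 1000;
    }

    /** Rate with the multiplication first, which keeps the digits below 1,000. */
    static long preciseRate(long totalCount, long elapsedMillis) {
        return totalCount * 1000 / elapsedMillis;
    }

    /** Ratio of the WebSocket rate to the REST rate, formatted to one decimal place. */
    static String speedup(long webSocketTps, double restTps) {
        return String.format(Locale.ROOT, "%.1fx", webSocketTps / restTps);
    }

    public static void main(String[] args) {
        final long total = 26_668_058L;   // "total" from the WebSocket run
        final long elapsed = 52_186L;     // "elapsed time" in milliseconds
        final double restTps = 68_993.88; // Requests/sec reported by wrk

        final long rate = truncatedRate(total, elapsed);
        System.out.println("reported rate: " + rate);                        // 511000
        System.out.println("precise rate:  " + preciseRate(total, elapsed)); // 511019
        System.out.println("messages/sec:  " + rate * 2);                    // 1022000
        System.out.println("speedup:       " + speedup(rate, restTps));      // 7.4x
    }
}
```

The reported rate is a round 511,000 only because the division happens before the multiplication. The difference from the precise figure is too small to change the comparison.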