Saturday, May 2, 2015

Microservices Runtime Statistics and Metrics

Reactive Microservices Architecture and Runtime Statistics & Metrics

Runtime statistics and metrics are important for distributed systems. Since microservices architectures tend to promote and encourage remote process communication, they are inherently distributed systems. Runtime statistics and metrics can include requests per second, available system memory, number of threads being used, open connections, failed authentications, expired tokens, and their ilk. If there is a parameter that is important to you, then you will want to track it. Given the complications of debugging a distributed system, you will find that runtime statistics of important parameters are a godsend.

Microservices Architecture Statistics
This is even more the case if you’re dealing with a lot of message queues. It can be difficult to determine where a message stopped being processed, and runtime statistics can help you track down issues.

Runtime statistics and metrics can also be a data stream to your big data systems. Understanding the types of requests and their counts, and being able to correlate those with time of day and events, can aid in understanding how people use your application. In the age of big data, data science, and microservices, one may conclude that runtime statistics are no longer an optional feature, but a required feature for application development in an increasingly mobile and cloud world.

Just as logging became a must-have for applications, so have runtime statistics. Runtime statistics can be important for tracking errors, and when a certain threshold of errors occurs, a circuit breaker can be thrown open.

Remote calls and message buses can fail, or hang without a response until a timeout is reached. When a system is down, a multitude of timeouts can cause a cascading failure. The Circuit Breaker pattern can be used to prevent a catastrophic cascade. Runtime statistics can be used to track errors and trigger circuit breakers to open. You would want to use runtime statistics and circuit breakers together with service discovery so that you can mark nodes as unhealthy.
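To make the link between error statistics and a circuit breaker concrete, here is a minimal sketch in Java. The class name ErrorRateCircuitBreaker, its parameters, and the sliding-window policy are all assumptions made for illustration; this is not QBit's API, and a production breaker would normally add a half-open state that lets a trial request through before closing fully.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: opens when too many errors fall inside a sliding time window. */
class ErrorRateCircuitBreaker {
    private final int maxErrors;        // errors tolerated within the window before opening
    private final long windowMillis;    // how far back errors are counted
    private final long coolDownMillis;  // how long the breaker stays open
    private final Deque<Long> errorTimes = new ArrayDeque<>();
    private long openedAt = -1;         // -1 means the breaker is closed

    ErrorRateCircuitBreaker(int maxErrors, long windowMillis, long coolDownMillis) {
        this.maxErrors = maxErrors;
        this.windowMillis = windowMillis;
        this.coolDownMillis = coolDownMillis;
    }

    /** Record one failed remote call (the runtime statistic) at the given time. */
    synchronized void recordError(long now) {
        errorTimes.addLast(now);
        // Forget errors that have slid out of the window.
        while (!errorTimes.isEmpty() && now - errorTimes.peekFirst() >= windowMillis) {
            errorTimes.removeFirst();
        }
        if (errorTimes.size() >= maxErrors) {
            openedAt = now; // threshold reached: throw the breaker open
        }
    }

    /** True if calls may go through; false while the breaker is open. */
    synchronized boolean allowRequest(long now) {
        if (openedAt >= 0 && now - openedAt >= coolDownMillis) {
            openedAt = -1;      // cool-down elapsed: close and start counting afresh
            errorTimes.clear();
        }
        return openedAt < 0;
    }
}
```

A caller would check allowRequest(System.currentTimeMillis()) before each remote call and call recordError(...) on every timeout or failure. While the breaker is open, the caller fails fast instead of piling up more timeouts, and the node can be reported as unhealthy to service discovery.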

You can use runtime statistics to do application-specific things like rate limiting a partner's Application ID so that they do not consume your resources outside the bounds of their service agreements. Once you make microservices publicly available, you have to monitor and rate limit to collaborate effectively with partners. If you have ever used a public REST API, you are well aware of rate limiting, which may limit the number of connections you are allowed to make and/or the number of certain requests that you are allowed to make in a given time period.

If you believe in the concepts of the Reactive Manifesto, then you will want to gather runtime statistics that allow you to write reactive microservices.
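The per-partner rate limiting described above can be sketched as a request counter kept per Application ID. The class AppIdRateLimiter, its fixed-window policy, and the example IDs are assumptions for illustration only; real systems often prefer token buckets or sliding windows, and would keep the counts in a shared stats service rather than in local memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: fixed-window request counting per application ID. */
class AppIdRateLimiter {
    private final int maxRequestsPerWindow;
    private final long windowMillis;
    // appId -> {start of the current window, requests counted in that window}
    private final Map<String, long[]> counters = new HashMap<>();

    AppIdRateLimiter(int maxRequestsPerWindow, long windowMillis) {
        this.maxRequestsPerWindow = maxRequestsPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Counts the request and returns true if the partner is still within its quota. */
    synchronized boolean tryAcquire(String appId, long now) {
        long windowStart = now - (now % windowMillis);
        long[] state = counters.computeIfAbsent(appId, id -> new long[] {windowStart, 0});
        if (state[0] != windowStart) {
            state[0] = windowStart; // a new window began: reset the count
            state[1] = 0;
        }
        if (state[1] >= maxRequestsPerWindow) {
            return false;           // over quota for this window
        }
        state[1]++;
        return true;
    }
}
```

For example, new AppIdRateLimiter(1000, 60_000) would allow each Application ID 1000 requests per minute, and a service could answer with HTTP 429 whenever tryAcquire returns false.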

QBit StatsService

QBit is a reactive microservices library that comes with a runtime statistics engine. QBit services are exposed via WebSocket RPC using JSON and REST. The statistics engine is easy to query and use. QBit's stats engine can be integrated with StatsD for display and consumption of stats. There are tools and vendors who deal directly with StatsD feeds. You can also query QBit stats and use them to implement features like rate limiting, or spinning up new nodes when you detect that things are getting overloaded.

StatsD the standard stats engine

StatsD is a network daemon that listens for statistics, such as counters and timers, sent over UDP, aggregates them, and ships the aggregates to backend services such as Graphite or Datadog. In less than 5 years since it was first introduced, StatsD has become an important tool to aid in debugging and monitoring microservices. If you are doing DevOps, then you are likely using StatsD.

StatsD is a simple daemon developed and released by Etsy. It is used to aggregate and summarize application metrics. StatsD has a plethora of clients for various programming languages (Ruby, Python, Java, Erlang, Node, Scala, Go, Haskell, etc.). The StatsD daemon collects stats from these clients using a published wire protocol. StatsD is so popular that its wire protocol has become a de facto standard for application metrics collection. The Etsy StatsD daemon is the reference implementation, but there are other implementations like Go Stats Daemon and many more.

StatsD captures different types of metrics: Gauges, Counters, Timing Summary Statistics, and Sets. You can decorate your code to capture these types of data and report them.
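On the wire, each of these metric types is a single line of text in the form name:value|type, where the type code is c for counters, g for gauges, ms for timings, and s for sets. The helper below is a minimal sketch of that formatting; the class name StatsdLines and the metric names are made up for illustration, and real client libraries also handle sample rates and name sanitizing.

```java
/** Hypothetical helper that formats metrics in the StatsD line protocol. */
final class StatsdLines {
    private StatsdLines() {}

    /** Counter: the daemon adds up all deltas received between flushes. */
    static String counter(String name, long delta) {
        return name + ":" + delta + "|c";
    }

    /** Gauge: an absolute value that holds until it is set again. */
    static String gauge(String name, long value) {
        return name + ":" + value + "|g";
    }

    /** Timing: a duration in milliseconds, summarized by the daemon per flush. */
    static String timing(String name, long millis) {
        return name + ":" + millis + "|ms";
    }

    /** Set: the daemon counts the unique members seen between flushes. */
    static String set(String name, String member) {
        return name + ":" + member + "|s";
    }
}
```

For instance, StatsdLines.counter("logins.failed", 1) yields the line logins.failed:1|c, and StatsdLines.timing("db.query", 320) yields db.query:320|ms.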

A StatsD daemon listens to UDP traffic from StatsD clients. StatsD collects runtime statistics data over time and does periodic “flushes” of the data to the analysis and monitoring engines you choose.

Tools can turn your runtime statistics and metrics into actionable charts and alerts. Tools like Graphite are often used to visualize the state of microservices. Graphite is made up of Graphite-Web, which renders graphs and dashboards; the Carbon metric processing daemons; and Whisper, a time-series database library.
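The collect-then-flush cycle that a StatsD daemon performs can be sketched in a few lines. This is an illustration of the idea only, not how the Etsy daemon is implemented: the class CounterAggregator is hypothetical, it handles only plain counter lines (anything with a sample rate or another type is ignored), and a real daemon would call flush on a timer and forward the result to a backend such as Graphite.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of daemon-side aggregation for counter lines ("name:value|c"). */
class CounterAggregator {
    private Map<String, Long> counts = new HashMap<>();

    /** Parse one incoming line and add it to the running totals; bad lines are dropped. */
    synchronized void accept(String line) {
        int colon = line.indexOf(':');
        int bar = line.indexOf('|', colon + 1);
        if (colon <= 0 || bar < 0 || !line.substring(bar + 1).equals("c")) {
            return; // not a plain counter line
        }
        long delta;
        try {
            delta = Long.parseLong(line.substring(colon + 1, bar));
        } catch (NumberFormatException e) {
            return; // malformed value
        }
        counts.merge(line.substring(0, colon), delta, Long::sum);
    }

    /** Hand back everything collected since the last flush and start a fresh interval. */
    synchronized Map<String, Long> flush() {
        Map<String, Long> snapshot = counts;
        counts = new HashMap<>();
        return snapshot;
    }
}
```

Because each flush resets the totals, the backend receives one data point per metric per interval, which is what makes the resulting time series easy to graph.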

There are other alternatives that QBit can integrate with as well, like Coda Hale's Metrics library, a Java library for capturing application metrics.

StatsD seems to be the current champion of mindshare, mainly due to its simplicity and fire-and-forget protocol. Because clients send UDP packets and never wait for a reply, StatsD can't cause a cascading failure, and its client libraries are very small.
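Here is a minimal sketch of what fire-and-forget means on the client side, using only java.net from the JDK. The class name FireAndForgetSender is an assumption for illustration, and port 8125 in the usage note is the port StatsD daemons conventionally listen on. The point is that send never waits for an acknowledgement and never lets a failure reach the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a fire-and-forget StatsD client transport. */
class FireAndForgetSender implements AutoCloseable {
    private final DatagramSocket socket;
    private final InetAddress host;
    private final int port;

    FireAndForgetSender(String host, int port) {
        try {
            this.socket = new DatagramSocket();       // any free local UDP port
            this.host = InetAddress.getByName(host);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.port = port;
    }

    /** Hands one metric line to the OS. Returns false on failure; never throws, never blocks on a reply. */
    boolean send(String line) {
        try {
            byte[] payload = line.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(payload, payload.length, host, port));
            return true;
        } catch (Exception e) {
            return false; // metrics must never take the application down
        }
    }

    @Override
    public void close() {
        socket.close();
    }
}
```

Usage would look like new FireAndForgetSender("127.0.0.1", 8125).send("requests:1|c"). If the daemon is down, the packet is simply lost and the service carries on, which is the trade-off that keeps a metrics outage from cascading.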

Datadog and StatsD

Datadog allows importing of StatsD metrics for graphing, alerting, and event correlation. They embedded a StatsD daemon within the Datadog Agent, so it is a drop-in replacement. Datadog added tagging to StatsD, which lets you attach information to metrics, such as application version, for event correlation and more. Datadog is a monitoring service for IT, Operations, Development and DevOps. It attempts to take input from many vendors, cloud providers, open source tools, and servers, and aggregate their data into actionable metrics.
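Datadog's tagging extension appends a |# section with comma-separated key:value tags to the ordinary StatsD line. The sketch below formats a tagged counter; the class TaggedStatsdLines and the tag names are made up for illustration, and tags are sorted by key here only to keep the output deterministic.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper that formats a counter with Datadog-style tags. */
final class TaggedStatsdLines {
    private TaggedStatsdLines() {}

    static String counter(String name, long delta, Map<String, String> tags) {
        StringBuilder line = new StringBuilder(name).append(':').append(delta).append("|c");
        if (!tags.isEmpty()) {
            // Sort by key so the same tags always produce the same line.
            String joined = new TreeMap<String, String>(tags).entrySet().stream()
                    .map(e -> e.getKey() + ":" + e.getValue())
                    .collect(Collectors.joining(","));
            line.append("|#").append(joined);
        }
        return line.toString();
    }
}
```

With tags env=prod and version=1.2.0, a failed-login counter would be sent as logins.failed:1|c|#env:prod,version:1.2.0, which lets you slice the same metric by release or environment in a dashboard.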

StatsD and Kibana

Kibana is a flexible analytics and visualization platform for Elasticsearch. It provides real-time summary and charting of streaming data from a variety of sources, including Logstash. Kibana has an intuitive interface which allows you to configure dashboards. Kibana can be used to graph data from Logstash, which stores its data in Elasticsearch. Logstash has a plugin for StatsD. Kibana allows you to visualize streams of data that reach Elasticsearch from Logstash, es-hadoop, or 3rd party technologies like Apache Flume, Fluentd, and many others.

StatsD and SOLR and Banana

LucidWorks ported Kibana (the port is called Banana) and Logstash to work with SOLR, so if you are a SOLR shop, you have that as an option.


Runtime statistics and metrics are a very important component of a microservices architecture. They help you debug, understand, and react to events in your application. They help you build circuit breakers. Make sure that runtime statistics are not treated like an afterthought in your microservices lib, but rather as part of the core. Tools like StatsD and Coda Hale Metrics allow you to gather metrics in a standard way. Tools like Graphite, Kibana, Datadog and Banana help you understand the data and build dashboards. QBit, the Java microservices library, includes a queryable stats service which can feed into StatsD or Coda Hale Metrics, or can be used to implement features like rate limiting.
