Architecture

A high-level description of Samsara's components and how they fit together

The idea behind the project

The basic idea behind the project is to provide an analytics-capable system that includes everything you need out of the box.

For those who like to measure things and want to better understand how their user base interacts with their products, we have prepared a system which out of the box gives you:

  • a fast, scalable solution to ingest user/machine generated events
  • a real-time processing pipeline with a collection of common processing tools
  • an interactive frontend user interface to explore your data-set in real time.

Unlike most analytics systems, Samsara doesn’t aggregate data during the ingestion phase. Aggregation happens at query time, which gives you more flexibility in choosing which events to aggregate and how.

Samsara’s components

Samsara is composed of 4 major parts: the ingestion APIs, the real-time processing pipeline, the live index and query APIs, and the frontend data exploration tool.

There are several other components which are used for internal housekeeping.

Overall Architecture

At the top of the stack we find the Ingestion APIs. This tier is an elastically scalable layer of RESTful web services. They respond to the client SDKs and accept payloads composed of one or more events. These events are validated and then sent to a high-throughput queueing system such as Apache Kafka.
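For illustration, a client could submit events with a plain HTTP POST along the lines of the sketch below. The endpoint path and event fields here are assumptions for the example, not Samsara's actual API contract.

```python
import time
import requests

# Hypothetical ingestion endpoint -- adjust to your deployment.
INGESTION_URL = "http://localhost:9000/v1/events"

# A payload is a list of one or more events; field names below are illustrative.
events = [
    {
        "timestamp": int(time.time() * 1000),   # epoch milliseconds
        "sourceId": "device-42",                 # who generated the event
        "eventName": "user.item.purchased",      # what happened
        "amount": 19.99                          # arbitrary extra attributes
    }
]

resp = requests.post(INGESTION_URL, json=events, timeout=5)
resp.raise_for_status()   # the API validates events and forwards them to Kafka
```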

Kafka stores the data locally for a configurable amount of time, after which it is removed according to the topic’s eviction policy. This property allows a consumer to rewind a topic back in time, up to the retention limit defined in the policy.
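From a consumer's point of view, rewinding looks roughly like the sketch below, here using the kafka-python client; topic, broker and partition names are assumptions.

```python
from kafka import KafkaConsumer, TopicPartition

# Illustrative only: topic, broker and partition are assumptions.
consumer = KafkaConsumer(bootstrap_servers="kafka:9092",
                         enable_auto_commit=False)

partition = TopicPartition("ingestion-events", 0)
consumer.assign([partition])

# Rewind to the oldest record the broker still retains,
# bounded by the topic's retention (eviction) policy.
consumer.seek_to_beginning(partition)

for record in consumer:
    print(record.offset, record.value)   # replay from the earliest retained offset
```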

For additional durability, all the data stored in Kafka is continuously pulled and archived into deep storage such as HDFS, AWS S3 or Azure Blobs.
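A minimal sketch of such an archiver, assuming the kafka-python and boto3 clients and hypothetical topic, bucket and batch-size values:

```python
import boto3
from kafka import KafkaConsumer

# Illustrative only: topic, bucket and batch size are assumptions.
s3 = boto3.client("s3")
consumer = KafkaConsumer("ingestion-events",
                         bootstrap_servers="kafka:9092",
                         group_id="deep-storage-archiver")

batch, first_offset = [], None
for record in consumer:
    if first_offset is None:
        first_offset = record.offset
    batch.append(record.value.decode("utf-8"))
    if len(batch) >= 1000:
        # one object per batch, keyed by the first offset it contains
        s3.put_object(Bucket="samsara-deep-storage",
                      Key=f"events/offset-{first_offset}.json",
                      Body="\n".join(batch).encode("utf-8"))
        batch, first_offset = [], None
```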

Every event sent to Kafka is then processed by Samsara’s CORE. Here events are filtered, enriched and correlated to produce a much richer stream of data. Samsara CORE keeps its transient processing data in a key/value store, typically an in-memory k/v-store backed by a Kafka stream for durability, and can optionally spill over to an external database such as Cassandra. The output of Samsara CORE is then sent to another Kafka topic, ready to be indexed.
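The filter/enrich/correlate idea can be sketched as follows. This is an illustrative Python stand-in, not Samsara CORE's actual processing API, and the field names are assumptions.

```python
# Transient processing state: a plain dict here; in Samsara it lives in an
# in-memory k/v-store backed by a Kafka stream (optionally spilling to Cassandra).
state = {}

def enrich(event):
    """Filter, enrich and correlate a single event (illustrative field names)."""
    if event.get("eventName") is None:
        return None                                # filter out malformed events

    previous = state.get(event["sourceId"])        # correlate with earlier events
    state[event["sourceId"]] = event

    enriched = dict(event)                         # enrich with derived attributes
    if previous is not None:
        enriched["millisSinceLastEvent"] = event["timestamp"] - previous["timestamp"]
    return enriched
```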

Qanal, our parallel indexer, takes the enriched stream of events and indexes it into ElasticSearch.
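A simplified sketch of what the indexing step does, assuming the kafka-python and elasticsearch-py clients and hypothetical host, topic and index names:

```python
import json
from elasticsearch import Elasticsearch, helpers
from kafka import KafkaConsumer

# Illustrative only: host, topic and index names are assumptions.
es = Elasticsearch("http://elasticsearch:9200")
consumer = KafkaConsumer("enriched-events", bootstrap_servers="kafka:9092")

def actions():
    for record in consumer:
        yield {"_index": "events", "_source": json.loads(record.value)}

# Bulk-index the enriched events as they arrive.
helpers.bulk(es, actions())
```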

Once available in the index, the data can be queried immediately through the Kibana frontend, which lets you slice and dice the data as you need, compute aggregations and build powerful real-time dashboards.
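Because aggregation happens at query time, any breakdown can be computed on demand rather than being fixed at ingestion. A hedged example using the elasticsearch-py 8.x client, with hypothetical index and field names:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://elasticsearch:9200")

# Count events per minute over the last hour and the 10 most frequent event names.
response = es.search(
    index="events",
    size=0,
    query={"range": {"timestamp": {"gte": "now-1h"}}},
    aggs={
        "events_per_minute": {
            "date_histogram": {"field": "timestamp", "fixed_interval": "1m"}
        },
        "top_event_names": {"terms": {"field": "eventName", "size": 10}}
    },
)
print(response["aggregations"]["top_event_names"]["buckets"])
```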

Cloud independent vs Cloud Native

Each part of the system can be scaled independently, focusing capacity on the areas which require it most. The system is built on top of container technology such as Docker, and it can run in clouds such as AWS and Azure, as well as on premises or in your own data center.

When running in a cloud, you can choose between a fully cloud-independent solution, which uses only the cloud’s virtual machines, or swap one or more components for the provider’s cloud-native offerings.

The following table shows which native components can be used in a cloud solution:

Component       Amazon Web Services    Azure Cloud
Kafka           Amazon Kinesis         Azure Event Hubs
Cassandra       Amazon DynamoDB        Azure Table Storage
Deep storage    Amazon S3              Azure Blob Storage

When running in the cloud, it is usually better to use the load balancers offered by the provider, unless you expect your traffic to have long flat periods followed by huge spikes. In that case it is better to run your own load balancers, and we recommend HAProxy.