Architecture

Design goals, system components, and data architecture for Astra.

Goals

Astra is a cloud-native search and analytics engine for log, trace, and audit data. It is designed to be easy to operate, cost-effective, and to scale to petabytes of data.

  • Native support for log, trace, and audit use cases.

  • Aggressively prioritize ingest of recent data over older data.

  • Full-text search capability.

  • First-class Kubernetes support for all components.

  • Autoscaling of ingest and query capacity.

  • Coordination-free ingestion, so the failure of a single node does not impact ingestion.

  • Works out of the box with sensible defaults.

  • Designed for zero data loss.

  • First-class Grafana support with accompanying plugin.

  • Built-in multi-tenancy, supporting several small use-cases on a single cluster.

  • Supports the majority of Apache Lucene features.

  • Drop-in replacement for most OpenSearch log use cases.

Non-goals

  • General-purpose search cases, such as for an ecommerce site.

  • Document mutability; records are expected to be append-only.

  • Additional storage engines other than Lucene.

  • Support for JVM versions other than the current LTS.

  • Support for multiple Lucene versions.

System overview

Astra is a Lucene-based log search engine built on an architecture similar to the Aggregator/Leaf/Tailer pattern. This approach separates compute from durability and storage: Kafka provides durability for non-indexed data, and S3 provides storage for indexed data.
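To make the split between durability and compute concrete, here is a minimal sketch of one indexing step. It assumes a simplified model in which an indexer reads a Kafka partition from its last committed offset, uploads the finished chunk to S3, and only then commits the new offset. `FakeLog`, `FakeObjectStore`, `index_chunk`, and the `chunk-<start>-<end>` key format are hypothetical in-memory stand-ins, not Astra's actual classes or S3 layout.

```python
"""Sketch: Kafka holds raw records durably, S3 holds indexed chunks.

FakeLog and FakeObjectStore are hypothetical in-memory stand-ins for a
Kafka partition and an S3 bucket; they are not Astra classes.
"""
from dataclasses import dataclass, field


@dataclass
class FakeLog:
    """Stand-in for one Kafka partition: an append-only list of records."""
    records: list[str] = field(default_factory=list)
    committed_offset: int = 0  # next offset a (re)started indexer reads


@dataclass
class FakeObjectStore:
    """Stand-in for S3: object key -> immutable indexed chunk."""
    objects: dict[str, list[str]] = field(default_factory=dict)


def index_chunk(log: FakeLog, store: FakeObjectStore, chunk_size: int) -> bool:
    """Index up to chunk_size records starting at the committed offset.

    The chunk is uploaded before the offset is committed, so a crash
    between the two steps loses nothing: the next indexer re-reads the
    same records from the log.
    """
    start = log.committed_offset
    end = min(start + chunk_size, len(log.records))
    if start == end:
        return False  # caught up; nothing to index
    # Real indexing would build a Lucene index here; the sketch just copies.
    chunk = list(log.records[start:end])
    store.objects[f"chunk-{start}-{end - 1}"] = chunk  # 1. upload to "S3"
    log.committed_offset = end                         # 2. then commit the offset
    return True


if __name__ == "__main__":
    log = FakeLog(records=[f"event {i}" for i in range(5)])
    store = FakeObjectStore()
    while index_chunk(log, store, chunk_size=2):
        pass
    print(sorted(store.objects))  # ['chunk-0-1', 'chunk-2-3', 'chunk-4-4']
```

The offset moves only after the upload. An indexer that crashes mid-chunk therefore holds no state worth saving, and a replacement simply re-reads the same records from Kafka.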

Because Astra indexers are stateless, we can dynamically scale indexer nodes during peak hours to ensure real-time ingestion of logs. This helps us absorb peak ingestion load and prioritize ingesting fresh logs over older ones.
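The following sketch shows one way a stateless indexer could favor fresh logs. It assumes a hypothetical `max_lag` threshold. When a restarted or newly added indexer finds its backlog too large, it starts at the head of the partition and hands the skipped range to a recovery task, which is consistent with the recovery nodes in the system diagram. `RecoveryTask`, `plan_start`, and `indexers_needed` (a naive capacity-based scaling rule) are illustrative names and formulas, not Astra's configuration or API.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecoveryTask:
    """Hypothetical work item: an offset range for a recovery node to index."""
    partition: int
    start_offset: int  # inclusive
    end_offset: int    # exclusive


def plan_start(partition: int, committed: int, head: int,
               max_lag: int) -> Tuple[int, Optional[RecoveryTask]]:
    """Pick the offset a (re)started indexer should consume from.

    head is the offset the next produced record will get. If the backlog
    (head - committed) is within max_lag, resume where the last indexer
    stopped. Otherwise skip to head so fresh logs are indexed immediately,
    and return the skipped range as a recovery task.
    """
    lag = head - committed
    if lag <= max_lag:
        return committed, None
    return head, RecoveryTask(partition, committed, head)


def indexers_needed(ingest_bytes_per_sec: float,
                    per_indexer_bytes_per_sec: float) -> int:
    """Naive scaling rule: enough indexers to cover the current ingest rate."""
    return max(1, math.ceil(ingest_bytes_per_sec / per_indexer_bytes_per_sec))


if __name__ == "__main__":
    print(plan_start(0, committed=1_000, head=1_050, max_lag=100))
    # (1000, None)
    print(plan_start(0, committed=1_000, head=50_000, max_lag=100))
    # (50000, RecoveryTask(partition=0, start_offset=1000, end_offset=50000))
    print(indexers_needed(250_000_000, 40_000_000))  # 7
```

Splitting the work this way keeps the live indexer focused on the newest data. Catching up on the backlog becomes a separate, parallelizable job rather than something that delays fresh logs.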

[Figure: Astra system overview. Components: HTTP bulk ingest, preprocessor, Kafka, indexers (indexer1 to indexerN), recovery nodes (recovery1 to recoveryN), cache nodes (cache1 to cacheN), S3, query, Grafana, manager, and Zookeeper.]
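On the query side, the Aggregator/Leaf pattern means a query node fans a search out to the leaves and merges their partial results. This sketch assumes the leaves are the indexers, for recent data, and the cache nodes, serving indexed data from S3. It also assumes each leaf returns its top `k` hits sorted newest first, and it combines them with the standard library's `heapq.merge`. The `Hit` type, the substring match standing in for Lucene, and the timestamp-only ordering are simplifying assumptions.

```python
import heapq
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    timestamp: int  # e.g. epoch millis; newer is larger
    message: str
    source: str     # which leaf returned the hit


def search_leaf(leaf: str, docs: Iterable[Tuple[int, str]],
                term: str, k: int) -> List[Hit]:
    """Leaf-side search: substring match (a stand-in for Lucene), newest k hits first."""
    hits = [Hit(ts, msg, leaf) for ts, msg in docs if term in msg]
    hits.sort(key=lambda h: h.timestamp, reverse=True)
    return hits[:k]


def aggregate(leaf_results: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Aggregator-side merge of per-leaf lists that are each sorted newest first."""
    merged = heapq.merge(*leaf_results, key=lambda h: h.timestamp, reverse=True)
    return list(islice(merged, k))


if __name__ == "__main__":
    leaves = {
        "indexer1": [(1_700, "error: disk full"), (1_690, "ok")],
        "cache1": [(1_500, "error: timeout"), (1_400, "error: disk full")],
    }
    per_leaf = [search_leaf(name, docs, "error", k=2) for name, docs in leaves.items()]
    top = aggregate(per_leaf, k=2)
    print([(h.timestamp, h.source) for h in top])  # [(1700, 'indexer1'), (1500, 'cache1')]
```

Each leaf returns at most `k` hits. The aggregator therefore merges at most `k` times the number of leaves, regardless of how much data each leaf holds.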

Twitter LogLens

The Astra architecture largely follows LogLens, an internal system developed at Twitter around 2015. The key differences are that Astra uses S3 instead of HDFS and leverages OpenSearch for the query engine.

LogLens was discontinued at Twitter around 2021 in favor of Splunk, due to "limited resource investment".

[Figure: LogLens architecture]
Last modified: 03 September 2024