Understanding cardinality is crucial for building effective observability systems. This concept, borrowed from set theory, directly impacts the performance and cost of your monitoring infrastructure.

The Mathematical Origin

Cardinality comes from set theory and refers to the number of distinct elements in a set. In mathematics, the cardinality of a set A is denoted as |A|.

Simple Example

Consider a metric with two dimensions: service and status.

Initial state:

  • service: {api, web, db} → 3 values
  • status: {success, error} → 2 values

Cardinality = 3 × 2 = 6 unique metric series

Series 1: {service="api", status="success"}
Series 2: {service="api", status="error"}
Series 3: {service="web", status="success"}
Series 4: {service="web", status="error"}
Series 5: {service="db", status="success"}
Series 6: {service="db", status="error"}

Now add a high-cardinality dimension:

  • service: {api, web, db} → 3 values
  • status: {success, error} → 2 values
  • user_id: {user1, user2, …, user1000} → 1000 values

New cardinality = 3 × 2 × 1000 = 6,000 unique metric series

Each unique combination of dimension values creates a separate time series that must be tracked, stored, and queried.
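The multiplication above is easy to verify programmatically. A minimal sketch in Python, using the same hypothetical label sets as the example:

```python
from itertools import product

# Label value sets from the example above
services = ["api", "web", "db"]
statuses = ["success", "error"]

# Each unique combination of label values is one time series
series = [{"service": s, "status": st} for s, st in product(services, statuses)]
print(len(series))  # 3 * 2 = 6 unique series

# Adding a high-cardinality label multiplies the total
user_ids = [f"user{i}" for i in range(1, 1001)]  # 1000 values
print(len(services) * len(statuses) * len(user_ids))  # 6000
```

Note that cardinality is a product, not a sum: every new label multiplies the existing series count by its number of distinct values.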

Impact on Observability

Memory and Storage

High-cardinality metrics consume proportionally more resources, and the multiplicative growth across labels makes the total balloon quickly:

  • Low cardinality (6 series): Minimal memory, fast queries
  • High cardinality (6,000 series): Roughly 1,000x the memory, noticeably slower queries
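A back-of-envelope estimate makes the 1,000x gap concrete. The per-series cost below is an illustrative assumption (a few KiB per active in-memory series is a common ballpark for time-series databases, not a guarantee for any particular system):

```python
# Assumed cost of one active in-memory series (illustrative figure only)
BYTES_PER_SERIES = 3 * 1024

def estimated_memory_bytes(num_series: int) -> int:
    """Rough memory footprint for a given number of active series."""
    return num_series * BYTES_PER_SERIES

low = estimated_memory_bytes(6)       # low-cardinality example
high = estimated_memory_bytes(6_000)  # after adding the user_id label
print(low, high)  # high is exactly 1000x low
```

Whatever the true per-series constant is, the ratio between the two cases is fixed by the series counts alone.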

Query Performance

As cardinality grows, query performance degrades:

# Fast: Filtering 6 series
rate(http_requests_total{service="api"}[5m])

# Slow: the selector must be matched against an index of 6,000 series
rate(http_requests_total{service="api", user_id="user123"}[5m])

Best Practices

  1. Avoid high-cardinality dimensions: Don’t use user_id, request_id, or timestamps as labels
  2. Use aggregation: Group similar events together (e.g., by status instead of individual errors)
  3. Monitor cardinality: Track the number of unique series per metric
  4. Use logs for high-cardinality data: Save detailed, unique identifiers for log analysis, not metrics
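Practice 3 above (monitor cardinality) can be sketched with a few lines of Python. The sample stream and metric name here are hypothetical; the idea is simply to count distinct label combinations per metric:

```python
# Hypothetical stream of observed (metric_name, labels) samples
samples = [
    ("http_requests_total", (("service", "api"), ("status", "success"))),
    ("http_requests_total", (("service", "api"), ("status", "error"))),
    ("http_requests_total", (("service", "api"), ("status", "success"))),
    ("http_requests_total", (("service", "web"), ("status", "success"))),
]

# Collect the set of unique label combinations (series) per metric
series_per_metric: dict[str, set] = {}
for name, labels in samples:
    series_per_metric.setdefault(name, set()).add(labels)

cardinality = {name: len(s) for name, s in series_per_metric.items()}
print(cardinality)  # {'http_requests_total': 3}
```

An alert on this number growing past a budget catches cardinality problems before they hit storage.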

The Cardinality Explosion Problem

When a dimension has unbounded or very large value sets (like user IDs, request IDs, or IP addresses), you create a cardinality explosion:

  • Each unique value multiplies the total series count
  • Most series will have count=1 (no aggregation benefit)
  • Storage and query costs grow linearly with unique values
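The explosion is easy to simulate. In this sketch (the metric name and label are hypothetical), every request comes from a fresh user, so every sample creates a brand-new series holding a single observation:

```python
from collections import Counter

# Count samples per (metric, label) series; each new user_id value
# creates a new series rather than incrementing an existing one.
series_counts: Counter = Counter()
for request_id in range(10_000):
    user_id = f"user{request_id}"  # unbounded label value
    series_counts[("http_requests_total", ("user_id", user_id))] += 1

print(len(series_counts))                           # 10000 series
print(all(c == 1 for c in series_counts.values()))  # True: no aggregation
```

Ten thousand series, each with a count of one: all the storage cost of a metric with none of the aggregation benefit.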

Key insight: Metrics should aggregate, not enumerate. If you need to track every unique occurrence, use logs or traces instead.