OTel & SpanMetrics: The Right Way!

Understanding how to configure dimensions in OpenTelemetry’s SpanMetrics connector is crucial for creating effective, efficient metrics from spans. This guide explains the fundamental concepts and best practices for avoiding common pitfalls like cardinality explosion.

What is a Metric Series?

A metric series (also called a time series) is a unique stream of metric data points identified by:

A metric name (e.g., ci_span_metrics_calls)
A unique combination of dimension labels (e.g., {service.name="api", ci.job.name="build", ci.status="success"})

Example: Understanding Metric Series

Metric name: ci_span_metrics_calls

Three different metric series for the same metric:

Series 1: ci_span_metrics_calls{service.name="api", ci.job.name="build", ci.status="success"}
Series 2: ci_span_metrics_calls{service.name="api", ci.job.name="build", ci.status="failed"}
Series 3: ci_span_metrics_calls{service.name="api", ci.job.name="test", ci.status="success"}

Even though they share the same metric name, each combination of label values creates a separate series that tracks its own count over time.
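The identity rule above can be sketched in a few lines of Go. This is a minimal illustration, assuming a series is identified by its metric name plus its sorted label set; `seriesKey` is a hypothetical helper, not part of OpenTelemetry or any metrics backend. Sorting the label names makes the key independent of map iteration order, so the same labels always map to the same series.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey (hypothetical helper) builds an identity string for a series
// from the metric name and its labels. Label names are sorted so the same
// label set always produces the same key.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ", ") + "}"
}

func main() {
	const name = "ci_span_metrics_calls"
	series := []map[string]string{
		{"service.name": "api", "ci.job.name": "build", "ci.status": "success"},
		{"service.name": "api", "ci.job.name": "build", "ci.status": "failed"},
		{"service.name": "api", "ci.job.name": "test", "ci.status": "success"},
	}
	distinct := map[string]bool{}
	for _, labels := range series {
		k := seriesKey(name, labels)
		distinct[k] = true
		fmt.Println(k)
	}
	fmt.Println("distinct series:", len(distinct)) // distinct series: 3
}
```

Changing any single label value (here `ci.status` or `ci.job.name`) yields a different key, and therefore a different series.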

How Series Work with SpanMetrics

When a span arrives, the spanmetrics connector (spanmetricsconnector):

Extracts the dimension values from the span
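Builds a unique key from ALL dimension values (the always-included service name, span name, span kind, and status code, plus any dimensions you configure)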
Finds or creates the metric series for that key
Increments the counter for that series by 1

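    // Build the series key from the service name and every dimension value;
    // spans with identical values produce the same key.
    key := p.buildKey(serviceName, span, callsDimensions, resourceAttr)
    // Attributes are passed as a function, so they only need to be built
    // when a new series is created.
    attributesFun := func() pcommon.Map {
        return p.buildAttributes(serviceName, span, resourceAttr, callsDimensions, ils.Scope())
    }
    // aggregate sums metrics: find the series for this key, or create it
    s, limitReached := sums.GetOrCreate(key, attributesFun, startTimestamp)
    // Exemplars (trace and span IDs) are attached only when no limit was reached
    if !limitReached && p.config.Exemplars.Enabled && !span.TraceID().IsEmpty() {
        s.AddExemplar(span.TraceID(), span.SpanID(), duration)
    }
    // Every span increments its series' call count by exactly 1
    s.Add(1)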
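The four steps above can be reduced to a toy aggregator. This is a simplified Go sketch, not the connector's actual code: `Span`, `Aggregator`, `NewAggregator`, and `Observe` are hypothetical names, spans carry only a service name and an attribute map, and only the calls counter is modeled (no histograms, exemplars, or cardinality limit).

```go
package main

import (
	"fmt"
	"strings"
)

// Span is a hypothetical, simplified span: a service name plus attributes.
type Span struct {
	Service    string
	Attributes map[string]string
}

// Aggregator keeps one call counter per unique combination of dimension values.
type Aggregator struct {
	Dimensions []string         // configured dimension names, in a fixed order
	Counts     map[string]int64 // series key -> call count
}

func NewAggregator(dims []string) *Aggregator {
	return &Aggregator{Dimensions: dims, Counts: make(map[string]int64)}
}

// Observe extracts the dimension values, builds a key from all of them,
// finds or creates the series, and increments it by 1.
func (a *Aggregator) Observe(s Span) {
	// Steps 1-2: extract each configured dimension and join them into a key.
	// A NUL separator keeps "a"+"bc" distinct from "ab"+"c".
	parts := []string{s.Service}
	for _, d := range a.Dimensions {
		parts = append(parts, s.Attributes[d]) // a missing attribute becomes ""
	}
	key := strings.Join(parts, "\x00")

	// Steps 3-4: a missing map entry reads as 0, so this creates the
	// series on first sight and increments it on every later span.
	a.Counts[key]++
}

func main() {
	agg := NewAggregator([]string{"ci.job.name", "ci.status"})
	for i := 0; i < 3; i++ {
		agg.Observe(Span{Service: "api", Attributes: map[string]string{
			"ci.job.name": "build", "ci.status": "success",
		}})
	}
	fmt.Println("series:", len(agg.Counts)) // series: 1
}
```

The important property is the same as in the real connector: the number of series is driven entirely by how many distinct key combinations arrive, not by how many spans arrive.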

Good Aggregation (Proper Dimensions)

Span 1: {job="build", status="success"} → Series A count: 1
Span 2: {job="build", status="success"} → Series A count: 2  ✅ Aggregated!
Span 3: {job="build", status="success"} → Series A count: 3  ✅ Aggregated!

Result: 1 series with a meaningful count (3)

Bad Aggregation (With Timestamps)

Span 1: {job="build", started_at="100"} → Series A count: 1
Span 2: {job="build", started_at="101"} → Series B count: 1  ❌ New series!
Span 3: {job="build", started_at="102"} → Series C count: 1  ❌ New series!

Result: 3 series, each with count 1 (no aggregation)

Cardinality Impact

The maximum number of metric series (cardinality) is the product of the number of distinct values each dimension can take:

    # Example with good dimensions:
    10 jobs × 3 statuses × 5 projects = 150 possible series ✅ Manageable

    # Example with timestamp dimensions:
    10 jobs × 3 statuses × 1000 unique timestamps = 30,000 series ❌ Explosion!
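The arithmetic above as a tiny Go function. `maxSeries` is a hypothetical helper; it computes the worst case, since the series that actually exist are only the combinations that really show up in your spans.

```go
package main

import "fmt"

// maxSeries (hypothetical helper) returns the worst-case series count:
// the product of the number of distinct values per dimension.
func maxSeries(distinctValues ...int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

func main() {
	fmt.Println(maxSeries(10, 3, 5))    // 150: jobs × statuses × projects
	fmt.Println(maxSeries(10, 3, 1000)) // 30000: jobs × statuses × timestamps
}
```

Because the dimensions multiply, one unbounded dimension (such as a timestamp) dominates the total no matter how small the others are.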

High cardinality causes:

Excessive memory usage in the collector and the backend
Poor query performance
Storage bloat in metrics backends
Most series holding a count of 1 (no aggregation benefit)

Visual Impact: Aggregation vs No Aggregation

✅ WITH PROPER DIMENSIONS (Good Aggregation)

10 Spans arrive with: {job="build", status="success"}

    Span 1 ──┐
    Span 2 ──┤
    Span 3 ──┤
    Span 4 ──┼──► Series A: {job="build", status="success"} → COUNT: 10
    Span 5 ──┤
    Span 6 ──┤
    Span 7 ──┤
    Span 8 ──┤
    Span 9 ──┤
    Span 10 ─┘

Result: 1 metric series with value 10

✅ Useful for queries: “How many build successes?”
✅ Storage: 1 time series to track
✅ Memory: Minimal

❌ WITH TIMESTAMP DIMENSIONS (No Aggregation)

Same 10 spans, but each has a unique timestamp:

    Span 1 ──► Series A: {job="build", started_at="100"} → COUNT: 1
    Span 2 ──► Series B: {job="build", started_at="101"} → COUNT: 1
    Span 3 ──► Series C: {job="build", started_at="102"} → COUNT: 1
    Span 4 ──► Series D: {job="build", started_at="103"} → COUNT: 1
    Span 5 ──► Series E: {job="build", started_at="104"} → COUNT: 1
    Span 6 ──► Series F: {job="build", started_at="105"} → COUNT: 1
    Span 7 ──► Series G: {job="build", started_at="106"} → COUNT: 1
    Span 8 ──► Series H: {job="build", started_at="107"} → COUNT: 1
    Span 9 ──► Series I: {job="build", started_at="108"} → COUNT: 1
    Span 10 ─► Series J: {job="build", started_at="109"} → COUNT: 1

Result: 10 metric series, each with value 1

❌ Useless for queries: “How many build successes?”
❌ Storage: 10 time series to track (10x overhead!)
❌ Memory: Grows linearly with span count

Over time with 1000 spans:

► 1000 series, each with count 1 ❌
► Instead of 1 series with count 1000 ✅

Impact Over Time

Assuming a steady 100 spans per minute, each carrying a unique timestamp:

Time	Good Dimensions	Bad Dimensions (Timestamps)
Minute 1	1 series, count=100	100 series, count=1 each
Minute 2	1 series, count=200	200 series, count=1 each
Minute 3	1 series, count=300	300 series, count=1 each
Day 1	1 series, count=144,000	144,000 series, count=1 each ⚠️
Month 1 (30 days)	1 series, count=4.32M	4,320,000 series, count=1 each 💥
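The table values follow from simple arithmetic, checked here in Go. This sketch assumes a constant 100 spans per minute and a 30-day month; `seriesAfter` is a hypothetical helper that treats "good" dimensions as collapsing every span into one series and timestamp dimensions as giving every span its own series.

```go
package main

import "fmt"

// seriesAfter (hypothetical helper) returns the number of series after the
// given number of minutes at a steady span rate.
func seriesAfter(minutes, spansPerMinute int, uniqueTimestamps bool) int {
	if uniqueTimestamps {
		// every span creates a brand-new series
		return minutes * spansPerMinute
	}
	// identical dimension values: all spans land in the same series
	return 1
}

func main() {
	const rate = 100
	day := 24 * 60
	month := 30 * day                            // assumes a 30-day month
	fmt.Println(seriesAfter(day, rate, true))    // 144000
	fmt.Println(seriesAfter(month, rate, true))  // 4320000
	fmt.Println(seriesAfter(month, rate, false)) // 1
}
```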

Key Insight: With timestamps as dimensions, you’re creating a new metric series for every single span instead of aggregating them. You lose all the benefits of metrics!
