Observability in Go: OpenTelemetry & Metrics

Go Observability & OTel: The Telemetry Mirror
In the world of microservices, a single user request can spark a chain reaction across dozens of services. When a request fails or becomes slow, finding the cause is impossible without Observability.
Unlike simple "Monitoring" (which tells you IF something is wrong), "Observability" tells you WHY something is wrong by correlating three pillars: Metrics, Logs, and Traces. In 2026, the industry standard for this is OpenTelemetry (OTel). Because Go is designed for distributed systems, its OTel integration is among the most performant and type-safe in the world.
1. The Three Pillars of Observability
1. Metrics (How many?)
Aggregated numeric data over time. Metrics tell you the Quantity of things:
- How many requests per second?
- What is the 99th percentile latency?
- How much RAM are we consuming?
- Tool: Prometheus (via the OpenTelemetry Exporter).
2. Traces (Where?)
The journey of a single request across the system. Traces tell you the Path:
- Which service called which?
- Where exactly in the chain did the 5-second delay happen?
- Tool: Jaeger or Zipkin.
3. Logs (Why?)
A record of discrete events. Logs tell you the Detail:
- "User 123 failed to login due to an invalid password hash."
2. The Telemetry Mirror: Tracing Physics
Distributed tracing is the "Execution Mirror" that follows a request across different physical machines.
The Span Physics
- The Trace Mirror: A complete trace represents the entire lifecycle of a request.
- The Span Geometry: Each step (e.g., a DB query, an API call) is a "Span." Spans have a start time, an end time, and metadata (attributes).
- The Linkage Mirror: Spans are parented to each other, creating a tree-like mirror of the execution path. In Go, this is handled by the context.Context mirror, which carries the "Trace State" across function boundaries.
3. The Hardware-Mirror: The Tax of Insight
Instrumenting your code has a physical cost. Observability is essentially Information Leakage—you are taking internal state and pushing it to an external system.
The CPU Interrupt Cost
- Stopwatches everywhere: Every time you start a Trace Span or increment a Metric counter, you are calling time.Now(). This triggers a syscall (on some kernels) or reads the hardware VDSO. Doing this 1,000,000 times a second consumes CPU cycles.
- Context Propagation: Passing Trace IDs across goroutines and over the network (B3 or W3C headers) adds bytes to your NIC's throughput.
- Memory Buffering: OpenTelemetry doesn't send data instantly (that would be inefficient). It buffers Spans in RAM before "Flushing" them in batches.
Hardware-Mirror Rule: For high-throughput services, use Sampling. You don't need to trace every single request. Tracing 1% of requests gives you a statistically perfect view of your latency distribution while reducing the "Observability Tax" on your Hardware by 99%.
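The sampling decision itself is cheap. A simplified sketch of how a ratio-based sampler (in the spirit of OTel's TraceIDRatioBased) decides: compare the random bits of the trace ID against a threshold derived from the ratio.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample models a TraceIDRatioBased sampler: keep the trace when the
// 63 random low-order bits of its ID fall below ratio * 2^63.
// Because trace IDs are uniformly random, this keeps ~ratio of all traces.
func shouldSample(traceID [16]byte, ratio float64) bool {
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < uint64(ratio*float64(1<<63))
}

func main() {
	var low, high [16]byte
	for i := range high {
		high[i] = 0xFF
	}
	fmt.Println(shouldSample(low, 0.01))  // true: ID below the 1% threshold
	fmt.Println(shouldSample(high, 0.01)) // false: ID above it
}
```

Because the decision is a pure function of the trace ID, every service in the chain makes the same choice, so sampled traces are always complete.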
4. Implementing Metrics with Prometheus in Go
Prometheus expects your Go app to expose a /metrics HTTP endpoint. Go's runtime already has excellent metrics (GC duration, Goroutine count) built-in.
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	httpRequestsTotal = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Count of all HTTP requests",
		},
	)
)

func main() {
	prometheus.MustRegister(httpRequestsTotal)
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequestsTotal.Inc() // count every request served
	})
	// ... start server: http.ListenAndServe(":8080", nil) ...
}
5. Distributed Tracing with OpenTelemetry (OTel)
The magic of OTel in Go is the Context propagation. By passing the context.Context object between functions and services, the Trace ID is automatically preserved.
Instrumentation Example:
func processOrder(ctx context.Context, orderID string) {
// Start a new span from the existing context
ctx, span := otel.Tracer("order-service").Start(ctx, "processOrder")
defer span.End() // Make sure to end the span!
span.SetAttributes(attribute.String("order.id", orderID))
// Call downstream service - the Trace ID follows!
callInventoryService(ctx, orderID)
}
6. The "Red-Line" Dashboard: The Golden Signals
When building your observability dashboards (Grafana), focus on the Four Golden Signals:
- Latency: Time it takes to service a request.
- Traffic: Demand placed on the system (RPS).
- Errors: The rate of requests that fail.
- Saturation: How "Full" your hardware is (CPU limit, Disk I/O, Network queue).
If your "Saturation" is at 95% but your "Latency" is still low, you are one spike away from a total system failure. This "Hardware-Signal" is your early warning system.
7. Observability in Production: The "Z-Level" Detail
In 2026, we also look at Execution Profiling as part of observability.
Go's net/http/pprof allows you to see:
- Which function is currently hogging the CPU?
- Where is the most memory being allocated?
- Are we blocked on a Mutex (Lock Contention)?
Combined with OTel Tracing, this gives you Quantum Visibility: you can see exactly what a single user did, and exactly which line of code was responsible for the 2ms delay they experienced.
Summary: Master of the Pulse
- Standardize on OTel: Don't lock yourself into a single vendor; use OpenTelemetry to keep your instrumentation neutral.
- Context is Everything: Always pass ctx to your functions to preserve the Trace ID.
- Gold Over Silver: Prioritize the Golden Signals (Latency, Traffic, Errors, Saturation) on your primary dashboards.
- Sample Responsibly: Protect your hardware and network by only tracing what you need to understand the system.
You can now "See" into the heart of your microservices. You are ready for the ultimate architectural pattern: Domain-Driven Design.
Frequently Asked Questions
Q: What is OpenTelemetry and why should Go services use it?
OpenTelemetry (OTel) is a vendor-neutral observability framework that provides APIs and SDKs for collecting traces, metrics, and logs. Using OTel in Go means you instrument your code once with the go.opentelemetry.io/otel packages and export to any backend — Jaeger, Zipkin, Prometheus, Datadog, Honeycomb — by swapping the exporter configuration. This avoids vendor lock-in and lets you change your observability stack without touching application code.
Q: How do you add distributed tracing to a Go HTTP service with OpenTelemetry?
Initialize a tracer provider with your chosen exporter (e.g., OTLP to a Jaeger collector), set it as the global provider, and wrap your HTTP server with the OTel middleware: otelhttp.NewHandler(mux, "service-name"). Inside handlers, start child spans with tracer.Start(ctx, "operation-name") and defer span.End(). Propagate context through your call stack — pass ctx to every function and database call so spans nest correctly. The go.opentelemetry.io/contrib packages provide ready-made instrumentation for net/http, gRPC, and popular databases.
Q: What is the difference between traces, metrics, and logs in OpenTelemetry?
Traces track the journey of a single request across services — a tree of spans showing where time was spent and where errors occurred, essential for debugging latency issues in distributed systems. Metrics are aggregated numerical measurements over time (request rate, error rate, p99 latency) — ideal for dashboards and alerts. Logs are timestamped records of discrete events — useful for debugging specific errors. OTel links all three: spans can carry log events (span.AddEvent), and exemplars connect metric data points to specific traces.
Part of the Go Mastery Course — engineering the pulse.
Phase 27: Observability & OTel Mastery Checklist
- Verify Context Sovereignty: Ensure that every span is created from the request's context. Never use context.Background() for sub-spans, as it breaks the trace mirror.
- Audit Sampling Strategy: Confirm that you are using a probabilistic sampler (e.g., 1% or 10%) for high-volume endpoints to prevent the observability mirror from crushing your performance.
- Implement Sovereign Labels: Add business-relevant attributes (e.g., user_id, tenant_id) to your spans to enable multidimensional filtering in your telemetry mirror.
- Test Propagation Geometry: Use a tool like Jaeger to verify that traces correctly span multiple microservices without losing their parentage.
- Use Standard Semantic Conventions: Follow the OTel Semantic Conventions for attribute naming to ensure your telemetry remains compatible with all analysis mirrors.
Read next: Go Reflection & Unsafe: The Deep-Space Mirror →
