Observability in Go: OpenTelemetry & Metrics

Go Observability & OTel: The Telemetry Mirror
In the world of microservices, a single user request can spark a chain reaction across dozens of services. When a request fails or becomes slow, finding the cause is impossible without Observability.
Unlike simple "Monitoring" (which tells you IF something is wrong), "Observability" tells you WHY something is wrong by correlating three pillars: Metrics, Logs, and Traces. In 2026, the industry standard for this is OpenTelemetry (OTel). Because Go is designed for distributed systems, its OTel integration is among the most performant and type-safe in the world.
1. The Three Pillars of Observability
1. Metrics (How many?)
Aggregated numeric data over time. Metrics tell you the Quantity of things:
- How many requests per second?
- What is the 99th percentile latency?
- How much RAM are we consuming?
- Tool: Prometheus (via the OpenTelemetry Exporter).
2. Traces (Where?)
The journey of a single request across the system. Traces tell you the Path:
- Which service called which?
- Where exactly in the chain did the 5-second delay happen?
- Tool: Jaeger or Zipkin.
3. Logs (Why?)
A record of discrete events. Logs tell you the Detail:
- "User 123 failed to login due to an invalid password hash."
2. The Telemetry Mirror: Tracing Physics
Distributed tracing is the "Execution Mirror" that follows a request across different physical machines.
The Span Physics
- The Trace Mirror: A complete trace represents the entire lifecycle of a request.
- The Span Geometry: Each step (e.g., a DB query, an API call) is a "Span." Spans have a start time, an end time, and metadata (attributes).
- The Linkage Mirror: Spans are parented to each other, creating a tree-like mirror of the execution path. In Go, this is handled by the context.Context mirror, which carries the "Trace State" across function boundaries.
3. The Hardware-Mirror: The Tax of Insight
Instrumenting your code has a physical cost. Observability is essentially Information Leakage—you are taking internal state and pushing it to an external system.
The CPU Interrupt Cost
- Stopwatches everywhere: Every time you start a Trace Span or increment a Metric counter, you are calling time.Now(). This triggers a syscall (on some kernels) or reads the hardware VDSO. Doing this 1,000,000 times a second consumes CPU cycles.
- Context Propagation: Passing Trace IDs across goroutines and over the network (B3 or W3C headers) adds bytes to your NIC's throughput.
- Memory Buffering: OpenTelemetry doesn't send data instantly (that would be inefficient). It buffers Spans in RAM before "Flushing" them in batches.
Hardware-Mirror Rule: For high-throughput services, use Sampling. You don't need to trace every single request. Tracing 1% of requests gives you a statistically perfect view of your latency distribution while reducing the "Observability Tax" on your Hardware by 99%.
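The sampling decision itself is cheap. A simplified sketch of how a ratio-based sampler (in the spirit of OTel's TraceIDRatioBased) decides: compare the random bits of the trace ID against a threshold derived from the ratio.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample models a TraceIDRatioBased sampler: keep the trace when the
// 63 random low-order bits of its ID fall below ratio * 2^63.
// Because trace IDs are uniformly random, this keeps ~ratio of all traces.
func shouldSample(traceID [16]byte, ratio float64) bool {
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < uint64(ratio*float64(1<<63))
}

func main() {
	var low, high [16]byte
	for i := range high {
		high[i] = 0xFF
	}
	fmt.Println(shouldSample(low, 0.01))  // true: ID below the 1% threshold
	fmt.Println(shouldSample(high, 0.01)) // false: ID above it
}
```

Because the decision is a pure function of the trace ID, every service in the chain makes the same choice, so sampled traces are always complete.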
4. Implementing Metrics with Prometheus in Go
Prometheus expects your Go app to expose a /metrics HTTP endpoint. Go's runtime already has excellent metrics (GC duration, Goroutine count) built-in.
import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	httpRequestsTotal = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Count of all HTTP requests",
		},
	)
)

func main() {
	prometheus.MustRegister(httpRequestsTotal)
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequestsTotal.Inc() // count every request served
	})
	// ... start server: http.ListenAndServe(":8080", nil) ...
}
5. Distributed Tracing with OpenTelemetry (OTel)
The magic of OTel in Go is the Context propagation. By passing the context.Context object between functions and services, the Trace ID is automatically preserved.
Instrumentation Example:
func processOrder(ctx context.Context, orderID string) {
// Start a new span from the existing context
ctx, span := otel.Tracer("order-service").Start(ctx, "processOrder")
defer span.End() // Make sure to end the span!
span.SetAttributes(attribute.String("order.id", orderID))
// Call downstream service - the Trace ID follows!
callInventoryService(ctx, orderID)
}
6. The "Red-Line" Dashboard: The Golden Signals
When building your observability dashboards (Grafana), focus on the Four Golden Signals:
- Latency: Time it takes to service a request.
- Traffic: Demand placed on the system (RPS).
- Errors: The rate of requests that fail.
- Saturation: How "Full" your hardware is (CPU limit, Disk I/O, Network queue).
If your "Saturation" is at 95% but your "Latency" is still low, you are one spike away from a total system failure. This "Hardware-Signal" is your early warning system.
7. Observability in Production: The "Z-Level" Detail
In 2026, we also look at Execution Profiling as part of observability.
Go's net/http/pprof allows you to see:
- Which function is currently hogging the CPU?
- Where is the most memory being allocated?
- Are we blocked on a Mutex (Lock Contention)?
Combined with OTel Tracing, this gives you Quantum Visibility: you can see exactly what a single user did, and exactly which line of code was responsible for the 2ms delay they experienced.
Summary: Master of the Pulse
- Standardize on OTel: Don't lock yourself into a single vendor; use OpenTelemetry to keep your instrumentation neutral.
- Context is Everything: Always pass ctx to your functions to preserve the Trace ID.
- Gold Over Silver: Prioritize the Golden Signals (Latency, Traffic, Errors, Saturation) on your primary dashboards.
- Sample Responsibly: Protect your hardware and network by only tracing what you need to understand the system.
You can now "See" into the heart of your microservices. You are ready for the ultimate architectural pattern: Domain-Driven Design.
Frequently Asked Questions
Q: What is OpenTelemetry and why should Go services use it?
OpenTelemetry (OTel) is a vendor-neutral observability framework that provides APIs and SDKs for collecting traces, metrics, and logs. Using OTel in Go means you instrument your code once with the go.opentelemetry.io/otel packages and export to any backend — Jaeger, Zipkin, Prometheus, Datadog, Honeycomb — by swapping the exporter configuration. This avoids vendor lock-in and lets you change your observability stack without touching application code.
Q: How do you add distributed tracing to a Go HTTP service with OpenTelemetry?
Initialize a tracer provider with your chosen exporter (e.g., OTLP to a Jaeger collector), set it as the global provider, and wrap your HTTP server with the OTel middleware: otelhttp.NewHandler(mux, "service-name"). Inside handlers, start child spans with tracer.Start(ctx, "operation-name") and defer span.End(). Propagate context through your call stack — pass ctx to every function and database call so spans nest correctly. The go.opentelemetry.io/contrib packages provide ready-made instrumentation for net/http, gRPC, and popular databases.
Q: What is the difference between traces, metrics, and logs in OpenTelemetry?
Traces track the journey of a single request across services — a tree of spans showing where time was spent and where errors occurred, essential for debugging latency issues in distributed systems. Metrics are aggregated numerical measurements over time (request rate, error rate, p99 latency) — ideal for dashboards and alerts. Logs are timestamped records of discrete events — useful for debugging specific errors. OTel links all three: spans can carry log events (span.AddEvent), and exemplars connect metric data points to specific traces.
Part of the Go Mastery Course — engineering the pulse.
Phase 27: Observability & OTel Mastery Checklist
- Verify Context Sovereignty: Ensure that every span is created from the request's context. Never use context.Background() for sub-spans, as it breaks the trace mirror.
- Audit Sampling Strategy: Confirm that you are using a probabilistic sampler (e.g., 1% or 10%) for high-volume endpoints to prevent the observability mirror from crushing your performance.
- Implement Sovereign Labels: Add business-relevant attributes (e.g., user_id, tenant_id) to your spans to enable multidimensional filtering in your telemetry mirror.
- Test Propagation Geometry: Use a tool like Jaeger to verify that traces correctly span multiple microservices without losing their parentage.
- Use Standard Semantic Conventions: Follow the OTel Semantic Conventions for attribute naming to ensure your telemetry remains compatible with all analysis mirrors.
Read next: Go Reflection & Unsafe: The Deep-Space Mirror →
