Module 36: Service Discovery with Consul

In a cloud-native environment, hardware is fluid. Instances are created, moved, and destroyed by schedulers like Kubernetes or AWS Auto Scaling Groups. You cannot rely on static IP addresses. You need a Service Registry that acts as the "Phonebook" of your infrastructure.

While Netflix Eureka (Module 31) is popular, HashiCorp Consul provides a more robust, multi-datacenter-ready alternative that includes health-aware service discovery, key-value storage, and a service mesh.


1. The Consul Architecture: Agents and Servers

Consul operates using a distributed cluster model. Unlike Eureka (which is a passive registry), Consul is active.

  • Consul Agent: A lightweight process that runs on every physical hardware node in your cluster. It is responsible for health checking the services running on that specific host.
  • Consul Server: The "Brain" of the cluster. A small group (typically 3 or 5) of servers that maintain the state using the Raft Consensus Algorithm.
  • Quorum: For a write to be successful, a majority of servers must agree. This ensures consistency even if a hardware node fails.
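The server/quorum rules above map directly onto a server's configuration file. A minimal sketch (the datacenter name and data directory are placeholders):

```hcl
# server.hcl -- one of the 3 (or 5) Consul servers
datacenter       = "dc1"          # placeholder datacenter name
data_dir         = "/opt/consul"  # Raft log and snapshots live here -- put this on fast disk
server           = true           # run in server mode, not client-agent mode
bootstrap_expect = 3              # wait for 3 servers before electing a Raft leader
```

With `bootstrap_expect = 3`, the cluster tolerates the loss of one server while still maintaining a write quorum of two.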

2. Hardware-Mirror: The Gossip Protocol (LAN & WAN)

Consul uses a Gossip Protocol (based on Serf) to manage cluster membership and broadcast events. This is where the hardware meets the software.

The Physics of Discovery:

  1. UDP Background Hum: Agents talk to each other using small UDP packets. Even when your application is idle, your NIC (Network Interface Card) is constantly processing "Member Joined" or "Member Alive" packets.
  2. CPU Interrupts: Each incoming gossip packet triggers a hardware interrupt. In a cluster of 1,000 nodes, the sheer volume of "Gossip Noise" can consume 1-2% of your System CPU just managing the registry.
  3. Convergence Time: When a node fails (e.g., a physical RAM error), the gossip protocol propagates this news. The time it takes for the whole cluster to "Know" depends on the Network Latency between your hardware racks.

Hardware-Mirror Rule: Ensure your network switch fabric supports high-volume UDP traffic. In multi-datacenter setups, place Consul Servers on high-IOPS hardware with NVMe SSDs to ensure the Raft log can be written to disk with sub-millisecond latency.


3. Implementation: Service Registration

In the Spring ecosystem, Consul integration is seamless. You simply add the starter, and the application will self-register on startup.

Maven Dependencies

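A minimal dependency set might look like the following (a sketch; versions are assumed to be managed by the spring-cloud-dependencies BOM, which is omitted here):

```xml
<!-- Registers the application with Consul on startup -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-consul-discovery</artifactId>
</dependency>
<!-- Exposes /actuator/health for Consul's HTTP health check -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```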

Configuration (application.yml)

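A minimal registration config sketch (the service name is a placeholder; `localhost:8500` assumes a local Consul agent on its default HTTP port):

```yaml
spring:
  application:
    name: billing-service        # placeholder; this becomes the service name in Consul
  cloud:
    consul:
      host: localhost            # address of the local Consul agent
      port: 8500                 # Consul's default HTTP API port
      discovery:
        prefer-ip-address: true  # register the IP, not a possibly unresolvable hostname
```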

4. Health Checks: The Hardware-Software Heartbeat

Consul's discovery is "Health-Aware." If a service instance is running but its database connection is broken, Consul will stop routing traffic to it.

Common Check Types:

  • HTTP: Consul Agent calls a URL (e.g., /actuator/health). If it returns 200, the instance is "Healthy."
  • TCP: Consul attempts to open a socket on a specific port.
  • TTL (Time To Live): The application must "Check-in" with Consul periodically. If it fails to do so (e.g., due to a JVM Freeze/Long GC pause), Consul marks it as failed.

Hardware-Mirror Insight: Frequent health checks (e.g., every 1 second) provide fast failover but create constant I/O load on your application and network overhead on the wire. Balance your check interval against your hardware's capacity for context switching.
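The interval tradeoff above maps to a handful of Spring Cloud Consul properties. A sketch (the values are illustrative, not recommendations):

```yaml
spring:
  cloud:
    consul:
      discovery:
        health-check-path: /actuator/health  # URL the Consul agent probes
        health-check-interval: 10s           # probe frequency: lower = faster failover, more load
        health-check-critical-timeout: 1m    # deregister instances that stay critical this long
```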


5. Consul as a Config Source (KV Store)

Beyond discovery, Consul includes a distributed Key-Value (KV) Store. This can be an alternative to the Spring Cloud Config Server (Module 35).

Why use Consul for Config?

  • Real-time Updates: Consul uses "Long Polling" to watch for changes. When you update a key in the Consul UI, the change travels to the application in milliseconds.
  • Consistency: Values are stored via Raft, ensuring that all nodes see the same configuration.
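Wiring this up might look like the following sketch (assumes the Spring Boot 2.4+ `spring.config.import` syntax; the prefix shown is the default):

```yaml
spring:
  config:
    import: "consul:"       # pull configuration from Consul at startup
  cloud:
    consul:
      host: localhost
      port: 8500
      config:
        enabled: true
        format: yaml        # treat each KV value as a YAML document
        prefix: config      # keys are read under config/<application-name>/
```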

6. Consul vs. Eureka: The Hardware Tradeoff

| Feature | Netflix Eureka | HashiCorp Consul |
|---|---|---|
| Consistency Model | AP (Availability/Partition), eventually consistent | CP (Consistency/Partition), strongly consistent |
| Hardware Density | Passive; servers can be small. | Active; servers need high IOPS/CPU for Raft. |
| Failover Speed | Slower (heartbeats + cache expiry). | Faster (gossip + active probing). |
| Multi-Datacenter | Difficult to coordinate. | Native WAN gossip support. |

Hardware Selection: Use Eureka if you run on unreliable hardware or prioritize availability over consistency, accepting that the registry may briefly serve stale data. Use Consul if you have a stable infrastructure and require strong guarantees that a service is actually healthy before sending traffic.


7. Service Mesh: The Future with Consul Connect

In advanced "Hardware-Mirror" setups, Consul provides Connect, which handles service-to-service security via mTLS (Mutual TLS).

  1. Consul generates and distributes certificates to every node.
  2. Hardware-level encryption occurs in a "Sidecar Proxy" (like Envoy).
  3. Your Java application thinks it's talking plain HTTP, but the sidecar is actually encrypting the bytes on the wire, typically accelerated by the CPU's AES-NI instructions.
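A service registration that opts into Connect might look like this (a sketch; the service name and port are placeholders):

```hcl
# billing.hcl -- register a service with a Connect sidecar proxy
service {
  name = "billing"   # placeholder service name
  port = 8080        # placeholder application port

  connect {
    # Ask Consul to manage a sidecar proxy registration for this service;
    # traffic between sidecars is then mTLS-encrypted transparently.
    sidecar_service {}
  }
}
```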

8. Summary

Consul is the "Nervous System" of your microservice cluster. By leveraging the Gossip Protocol for discovery and Raft for consistency, it provides a rock-solid foundation for high-availability systems. Understanding the hardware tax of gossip noise and the importance of health-check intervals allows you to tune Consul for maximum reliability.

In the next module, Module 37: Fault Tolerance with Resilience4j, we'll see how to handle the inevitable failures that occur when Service Discovery reports a node is "Healthy" but it's actually slow.


Next Steps:

  1. Run Consul in a Docker container: docker run -d -p 8500:8500 hashicorp/consul (the unprefixed consul image on Docker Hub is deprecated).
  2. Enable spring-cloud-starter-consul-discovery in an existing Spring Boot app.
  3. Open the Consul UI at http://localhost:8500 and watch your service register.
  4. Kill the application process and see how quickly Consul detects the "Critical" state.