Distributed Key-Value Store

Problem Statement

Modern distributed applications require highly available, fault-tolerant data storage systems that can maintain consistency across multiple nodes while handling network partitions and node failures. Traditional single-node databases don't meet these requirements, creating a need for distributed consensus-based storage systems.

This project challenges you to build a production-grade distributed key-value store that uses the Raft consensus algorithm to ensure strong consistency guarantees while maintaining availability in the face of failures.

The Challenge:
Design and implement a distributed key-value store that can:

  • Maintain consistency across multiple nodes using Raft consensus
  • Handle node failures without data loss
  • Provide linearizable read/write operations
  • Support horizontal scaling through data sharding
  • Enable efficient inter-node communication via gRPC
  • Automatically elect leaders and replicate logs
  • Provide client libraries with transparent failover

Requirements

Functional Requirements

Core Storage Operations:

  • Get: Retrieve value for a given key
  • Set: Store key-value pair with replication
  • Delete: Remove key-value pair across cluster
  • List: Query keys by prefix pattern
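
For reference, the four operations above map naturally onto a small Go interface. The sketch below is illustrative only; the interface and method names are not a required API:

package kvstore

// Store sketches the client-facing operations listed above.
type Store interface {
    // Get returns the value for key and whether it exists.
    Get(key string) (value string, found bool, err error)
    // Set stores a key-value pair and returns once it has been replicated.
    Set(key, value string) error
    // Delete removes the key across the cluster.
    Delete(key string) error
    // List returns all pairs whose keys start with prefix.
    List(prefix string) (map[string]string, error)
}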

Distributed Consensus:

  • Implement Raft consensus for state machine replication
  • Leader election with timeout-based voting
  • Log replication from leader to followers
  • Snapshot mechanism for log compaction
  • Handle network partitions and split-brain scenarios

Node Communication:

  • gRPC-based inter-node RPC for Raft operations
  • RESTful HTTP API for client interactions
  • Service discovery and cluster membership management
  • Heartbeat mechanism for failure detection

Client Library:

  • Simple API for Get/Set/Delete operations
  • Automatic leader discovery and failover
  • Connection pooling and retry logic
  • Support for both synchronous and asynchronous operations

Non-Functional Requirements

Consistency:

  • Linearizable reads and writes
  • All operations go through Raft consensus
  • Guaranteed ordering of operations
  • No lost updates or stale reads from leader

Availability:

  • System remains available as long as a majority of nodes (⌊N/2⌋ + 1) is operational
  • Automatic recovery from transient failures
  • Graceful degradation during network partitions
  • Read operations can be served from any node

Performance:

  • Sub-10ms latency for local operations
  • Handle 10,000+ operations per second per node
  • Efficient snapshot creation and restoration
  • Minimal overhead for consensus protocol

Fault Tolerance:

  • Survive up to ⌊(N-1)/2⌋ node failures
  • Automatic log replication to maintain durability
  • Persistent storage with crash recovery
  • Corruption detection and repair

Scalability:

  • Support for 3-7 node clusters
  • Data sharding for horizontal scaling
  • Efficient memory usage with log compaction
  • Resource limits for storage and memory

Technical Constraints

Technology Stack:

  • Go 1.19+ for implementation
  • hashicorp/raft library for Raft consensus
  • gRPC with Protocol Buffers for RPC
  • BadgerDB or BoltDB for persistent storage
  • Docker for containerized deployment

CAP Theorem Positioning:

  • CP system: Prioritize Consistency and Partition Tolerance
  • During network partition: Minority partition becomes unavailable
  • Trade-off: Strong consistency over availability during splits

Storage Backend:

  • Pluggable storage interface
  • Support for embedded databases
  • Efficient key-value operations
  • Snapshot and restore capabilities
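
A pluggable backend can be expressed as a narrow Go interface that BadgerDB and BoltDB adapters both satisfy. The method set below is a sketch under that assumption, not a prescribed contract:

package storage

import "io"

// Backend sketches the pluggable storage interface; one adapter per embedded
// database (BadgerDB, BoltDB) would implement it.
type Backend interface {
    Get(key []byte) (value []byte, err error)
    Put(key, value []byte) error
    Delete(key []byte) error
    // Snapshot streams the full state out; Restore replaces it from a stream.
    Snapshot(w io.Writer) error
    Restore(r io.Reader) error
    Close() error
}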

Design Considerations

System Architecture Overview

The distributed key-value store follows a leader-follower architecture based on the Raft consensus algorithm:

Cluster Architecture:

┌──────────────┐         ┌──────────────┐         ┌──────────────┐
│   Client     │────────►│   Leader     │────────►│  Follower 1  │
│              │         │              │         │              │
└──────────────┘         └──────────────┘         └──────────────┘
                                │                         │
                                │      Log Replication    │
                                └─────────────────────────┘
                                          │
                                          ▼
                                 ┌──────────────┐
                                 │  Follower 2  │
                                 │              │
                                 └──────────────┘

Core Components

Raft Consensus Layer:

  • Manages leader election and term transitions
  • Replicates log entries from leader to followers
  • Maintains state machine consistency across nodes
  • Handles snapshot creation for log compaction

Storage Engine:

  • Embedded key-value database
  • Finite State Machine for applying Raft log entries
  • Snapshot mechanism for efficient recovery
  • ACID guarantees for local operations
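
Most of the consensus machinery comes from the library; the project's job is to supply the finite state machine it drives. Below is a minimal sketch against the hashicorp/raft FSM interface, assuming replicated commands are JSON-encoded set/delete operations and using an in-memory map in place of the embedded database for brevity. The kvFSM, kvSnapshot, and command names are illustrative:

package store

import (
    "encoding/json"
    "io"
    "sync"

    "github.com/hashicorp/raft"
)

// command is the assumed wire format for replicated operations.
type command struct {
    Op    string `json:"op"` // "set" or "delete"
    Key   string `json:"key"`
    Value string `json:"value,omitempty"`
}

// kvFSM applies committed Raft log entries to local state. A real
// implementation would write to the embedded database instead of a map.
type kvFSM struct {
    mu   sync.Mutex
    data map[string]string
}

// Apply is invoked once a log entry has been committed by a quorum.
func (f *kvFSM) Apply(l *raft.Log) interface{} {
    var c command
    if err := json.Unmarshal(l.Data, &c); err != nil {
        return err
    }
    f.mu.Lock()
    defer f.mu.Unlock()
    switch c.Op {
    case "set":
        f.data[c.Key] = c.Value
    case "delete":
        delete(f.data, c.Key)
    }
    return nil
}

// Snapshot captures the current state so the log can be compacted.
func (f *kvFSM) Snapshot() (raft.FSMSnapshot, error) {
    f.mu.Lock()
    defer f.mu.Unlock()
    clone := make(map[string]string, len(f.data))
    for k, v := range f.data {
        clone[k] = v
    }
    return &kvSnapshot{data: clone}, nil
}

// Restore replaces the state from a previously persisted snapshot.
func (f *kvFSM) Restore(rc io.ReadCloser) error {
    defer rc.Close()
    data := make(map[string]string)
    if err := json.NewDecoder(rc).Decode(&data); err != nil {
        return err
    }
    f.mu.Lock()
    f.data = data
    f.mu.Unlock()
    return nil
}

type kvSnapshot struct {
    data map[string]string
}

// Persist writes the snapshot to the sink provided by the Raft library.
func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
    if err := json.NewEncoder(sink).Encode(s.data); err != nil {
        sink.Cancel()
        return err
    }
    return sink.Close()
}

func (s *kvSnapshot) Release() {}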

Network Communication:

  • gRPC for efficient inter-node RPC
  • HTTP REST API for client requests
  • Protocol Buffers for message serialization
  • TCP transport for Raft consensus messages
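
As a sketch of how the client-facing REST layer could be wired (matching the /health, /api/get, and /api/set routes exercised in the usage examples later), assuming a subset of the Store interface sketched earlier; handler shapes and names are illustrative:

package server

import (
    "encoding/json"
    "net/http"
)

// Store is the subset of the client-facing interface used by these handlers.
type Store interface {
    Get(key string) (value string, found bool, err error)
    Set(key, value string) error
}

// RegisterHandlers wires the illustrative REST routes onto a mux.
func RegisterHandlers(mux *http.ServeMux, s Store) {
    mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    mux.HandleFunc("/api/get", func(w http.ResponseWriter, r *http.Request) {
        value, found, err := s.Get(r.URL.Query().Get("key"))
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        json.NewEncoder(w).Encode(map[string]interface{}{"value": value, "found": found})
    })

    mux.HandleFunc("/api/set", func(w http.ResponseWriter, r *http.Request) {
        var req struct{ Key, Value string }
        if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        if err := s.Set(req.Key, req.Value); err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        json.NewEncoder(w).Encode(map[string]bool{"success": true})
    })
}

Raft's own AppendEntries and RequestVote traffic flows over the gRPC/TCP transport between nodes; only client traffic uses this HTTP surface.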

Client Library:

  • Automatic leader discovery
  • Transparent failover to new leader
  • Connection pooling for performance
  • Retry logic with exponential backoff
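
A sketch of the retry helper such a client might use, with exponential backoff and full jitter; the base delay, cap, and attempt budget below are illustrative values:

package client

import (
    "math/rand"
    "time"
)

// Retry runs op until it succeeds or the attempt budget is exhausted,
// doubling the backoff window (up to a cap) between attempts.
func Retry(op func() error, maxAttempts int) error {
    backoff := 50 * time.Millisecond
    var err error
    for attempt := 0; attempt < maxAttempts; attempt++ {
        if err = op(); err == nil {
            return nil
        }
        // Sleep a random fraction of the window so clients that failed
        // together do not retry in lockstep.
        time.Sleep(time.Duration(rand.Int63n(int64(backoff))))
        if backoff < 2*time.Second {
            backoff *= 2
        }
    }
    return err
}

On a "not leader" error the client would also re-resolve the current leader address before the next attempt.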

Key Design Decisions

Consistency Model:

  • All writes go through Raft consensus
  • Reads can optionally bypass consensus for performance
  • Strong consistency guarantees for critical operations
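
With hashicorp/raft this split looks roughly like the sketch below: writes go through Apply, and a linearizable read issues a Barrier through the log before reading locally. The Server shape, the 10-second timeout, and the fsm accessor are assumptions:

package store

import (
    "encoding/json"
    "time"

    "github.com/hashicorp/raft"
)

// Server pairs the Raft node with a local read path into the applied state.
type Server struct {
    raft *raft.Raft
    // fsm exposes a local read into the applied state machine (accessor assumed).
    fsm interface {
        Get(key string) (value string, found bool)
    }
}

// Set replicates a write: Apply only returns once a quorum has committed the
// entry, and it fails with raft.ErrNotLeader when called on a follower.
func (s *Server) Set(key, value string) error {
    cmd, err := json.Marshal(map[string]string{"op": "set", "key": key, "value": value})
    if err != nil {
        return err
    }
    return s.raft.Apply(cmd, 10*time.Second).Error()
}

// LinearizableGet waits for a barrier entry to commit so the read reflects
// every prior write; dropping the Barrier gives the cheaper, possibly stale
// local read mentioned above.
func (s *Server) LinearizableGet(key string) (string, bool, error) {
    if err := s.raft.Barrier(10 * time.Second).Error(); err != nil {
        return "", false, err
    }
    value, found := s.fsm.Get(key)
    return value, found, nil
}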

Failure Handling:

  • Quorum-based approach requires a majority of nodes (⌊N/2⌋ + 1) for availability
  • Automatic leader election on failure detection
  • Log replication ensures durability before acknowledgment
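
These behaviours are driven by a handful of hashicorp/raft configuration knobs. The values below are illustrative, not tuned recommendations:

package store

import (
    "time"

    "github.com/hashicorp/raft"
)

// newRaftConfig sketches the failure-detection and election settings.
func newRaftConfig(nodeID string) *raft.Config {
    cfg := raft.DefaultConfig()
    cfg.LocalID = raft.ServerID(nodeID)
    // A follower that misses heartbeats for this long becomes a candidate.
    cfg.HeartbeatTimeout = 500 * time.Millisecond
    // Randomized election timeout reduces the chance of split votes.
    cfg.ElectionTimeout = 500 * time.Millisecond
    // The leader steps down if it cannot reach a quorum within its lease.
    cfg.LeaderLeaseTimeout = 250 * time.Millisecond
    // Log compaction: snapshot after this many new entries or this interval.
    cfg.SnapshotThreshold = 8192
    cfg.SnapshotInterval = 2 * time.Minute
    return cfg
}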

Performance Optimizations:

  • Batch log entries for reduced network overhead
  • Asynchronous replication with configurable timeouts
  • Log compaction through snapshots
  • Read-only operations can be served locally

Scalability Approach:

  • Consistent hashing for data sharding
  • Read replicas for read scaling
  • Connection pooling to minimize overhead
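
Consistent hashing is the planned sharding mechanism; a bare-bones ring (no weighting or rebalancing logic) can look like the sketch below, with illustrative names throughout:

package shard

import (
    "crypto/sha1"
    "encoding/binary"
    "fmt"
    "sort"
)

// Ring maps keys to shard owners via consistent hashing.
type Ring struct {
    points []uint32          // sorted positions on the ring
    owner  map[uint32]string // position -> node name
}

func hashOf(s string) uint32 {
    sum := sha1.Sum([]byte(s))
    return binary.BigEndian.Uint32(sum[:4])
}

// NewRing places each node at several virtual points to even out the key
// distribution across shards.
func NewRing(nodes []string, virtualNodes int) *Ring {
    r := &Ring{owner: make(map[uint32]string)}
    for _, n := range nodes {
        for i := 0; i < virtualNodes; i++ {
            p := hashOf(fmt.Sprintf("%s#%d", n, i))
            r.points = append(r.points, p)
            r.owner[p] = n
        }
    }
    sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
    return r
}

// Node returns the shard responsible for key: the first point at or after
// the key's hash, wrapping around the ring.
func (r *Ring) Node(key string) string {
    h := hashOf(key)
    i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
    if i == len(r.points) {
        i = 0
    }
    return r.owner[r.points[i]]
}

Only keys whose ring segments change ownership move when a node joins or leaves, which keeps rebalancing incremental.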

Acceptance Criteria

Core Functionality

  • All CRUD operations work correctly
  • Operations are replicated across all nodes in the cluster
  • Data persists across node restarts
  • Client can connect to any node and operations succeed

Raft Consensus

  • Leader election completes within 5 seconds of cluster start
  • New leader elected within 10 seconds of leader failure
  • Log entries replicated to majority before acknowledgment
  • Followers apply log entries in correct order
  • Snapshot creation works and reduces log size
  • Cluster recovers correctly from snapshot

Fault Tolerance

  • System remains available with 1 node failure (3-node cluster or larger)
  • System remains available with 2 node failures (5-node cluster or larger)
  • No data loss when minority of nodes fail
  • Automatic recovery when failed node rejoins
  • Handles network partitions correctly

Client Library

  • Client automatically discovers leader node
  • Client retries operations on leader failure
  • Client fails over to new leader transparently
  • Connection pooling reduces overhead
  • Proper error handling and timeout management

Performance

  • Write operations complete in <50ms on local network
  • Read operations complete in <10ms from leader
  • System handles 5,000+ ops/sec per node
  • Snapshot creation completes in <30 seconds for 1GB data
  • Log compaction keeps memory usage under 500MB

Testing

  • Unit tests for all core components
  • Integration tests for cluster operations
  • Chaos tests for node failures and network partitions
  • Performance benchmarks for throughput and latency
  • Test coverage above 70%

Documentation

  • README with setup and usage instructions
  • Architecture documentation with diagrams
  • API reference for client library
  • Deployment guide for Docker and Kubernetes
  • Troubleshooting guide for common issues

Usage Examples

Starting the Cluster

# Start a 3-node cluster using Docker Compose
docker-compose up -d

# Verify cluster health
curl http://localhost:8081/health
curl http://localhost:8082/health
curl http://localhost:8083/health

# Check Raft status
curl http://localhost:8081/api/stats

Basic Operations via HTTP API

# Set a key-value pair
curl -X POST http://localhost:8081/api/set \
  -H "Content-Type: application/json" \
  -d '{"key":"user:1001","value":"John Doe"}'

# Response: {"success": true}

# Get a value
curl "http://localhost:8081/api/get?key=user:1001"

# Response: {"value": "John Doe", "found": true}

# List keys by prefix
curl "http://localhost:8081/api/list?prefix=user:"

# Response: {"pairs": [{"key": "user:1001", "value": "John Doe"}]}

# Delete a key
curl -X DELETE "http://localhost:8081/api/delete?key=user:1001"

# Response: {"success": true}

Using the Go Client Library

package main

import (
    "fmt"
    "log"

    "github.com/yourusername/kvstore/client"
)

func main() {
    // Connect to cluster
    nodes := []string{
        "localhost:9001",
        "localhost:9002",
        "localhost:9003",
    }

    c, err := client.NewClient(nodes)
    if err != nil {
        log.Fatal(err)
    }
    defer c.Close()

    // Set a value
    err = c.Set("hello", "world")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Set hello=world")

    // Get a value
    value, err := c.Get("hello")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("Get hello: %s\n", value)

    // Delete a value
    err = c.Delete("hello")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Deleted hello")
}

Testing Fault Tolerance

# Stop the leader node
docker-compose stop node1

# Operations should still work
curl -X POST http://localhost:8082/api/set \
  -H "Content-Type: application/json" \
  -d '{"key":"test","value":"fault-tolerant"}'

# Verify data persisted
curl "http://localhost:8082/api/get?key=test"

# Restart the node
docker-compose start node1

# Verify it rejoins and syncs data
curl "http://localhost:8081/api/get?key=test"

Monitoring Cluster State

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
)

func main() {
    // Check Raft stats
    resp, err := http.Get("http://localhost:8081/api/stats")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    var stats map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
        log.Fatal(err)
    }

    fmt.Printf("State: %s\n", stats["state"])
    fmt.Printf("Leader: %s\n", stats["leader"])
    fmt.Printf("Term: %v\n", stats["term"])
    fmt.Printf("Commit Index: %v\n", stats["commit_index"])
}

Key Takeaways

Distributed Systems Concepts

Raft Consensus Algorithm:

  • Leader election ensures single source of truth for writes
  • Log replication maintains consistency across nodes
  • Term-based voting prevents split-brain scenarios
  • Quorum requirements balance availability and consistency

Fault Tolerance Principles:

  • System survives up to ⌊(N-1)/2⌋ node failures in an N-node cluster
  • Majority quorum ensures consistency during partitions
  • Automatic recovery when nodes rejoin cluster
  • No single point of failure with proper replication

CAP Theorem in Practice:

  • CP system: Consistency and Partition Tolerance prioritized
  • During network partition, minority becomes unavailable
  • Strong consistency guarantees prevent split-brain
  • Trade-off is explicit and understood

Implementation Insights

Raft Integration:

  • hashicorp/raft library provides robust consensus implementation
  • Finite State Machine applies log entries to storage
  • Snapshot mechanism enables efficient recovery and log compaction
  • Transport layer handles network communication between nodes

gRPC for Performance:

  • Efficient binary protocol reduces overhead
  • Protocol Buffers provide type-safe serialization
  • Streaming support for bulk operations
  • Connection pooling minimizes latency

Client Design Patterns:

  • Automatic leader discovery through retry logic
  • Transparent failover improves reliability
  • Connection pooling reduces overhead
  • Exponential backoff prevents thundering herd

Production Considerations

Monitoring and Observability:

  • Track Raft state (leader, follower, or candidate) and term changes per node
  • Monitor replication lag between nodes
  • Alert on quorum loss or leader election failures
  • Measure operation latency and throughput

Operational Challenges:

  • Node addition/removal requires careful coordination
  • Snapshot creation can impact performance temporarily
  • Prolonged network partitions may require operator intervention to resolve
  • Capacity planning for log growth and storage

Security Considerations:

  • TLS for inter-node communication
  • Authentication for client requests
  • Authorization for sensitive operations
  • Audit logging for compliance
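
A sketch of an mTLS configuration for inter-node traffic using the standard library; the function name and file-path parameters are placeholders, and certificate provisioning is out of scope here:

package security

import (
    "crypto/tls"
    "crypto/x509"
    "os"
)

// serverTLS builds a config that both presents a node certificate and
// requires peers to present one signed by the cluster CA.
func serverTLS(certFile, keyFile, caFile string) (*tls.Config, error) {
    cert, err := tls.LoadX509KeyPair(certFile, keyFile)
    if err != nil {
        return nil, err
    }
    caPEM, err := os.ReadFile(caFile)
    if err != nil {
        return nil, err
    }
    pool := x509.NewCertPool()
    pool.AppendCertsFromPEM(caPEM)
    return &tls.Config{
        Certificates: []tls.Certificate{cert},
        ClientCAs:    pool,
        ClientAuth:   tls.RequireAndVerifyClientCert, // peers must present a valid cert
        MinVersion:   tls.VersionTLS12,
    }, nil
}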

Next Steps

Immediate Enhancements

  1. Data Sharding:

    • Implement consistent hashing for horizontal scaling
    • Route operations to correct shard
    • Rebalance data when nodes added/removed
  2. Read Replicas:

    • Support eventual consistency for read scaling
    • Follower read optimization with staleness bounds
    • Read-your-writes consistency guarantees
  3. Advanced Features:

    • Multi-key transactions with 2PC
    • Time-to-live for automatic expiration
    • Secondary indexes for range queries
    • Compression for storage efficiency

Production Hardening

  1. Security:

    • TLS/mTLS for all communication
    • Authentication via JWT or mTLS
    • Role-based access control
    • Encryption at rest for sensitive data
  2. Observability:

    • Prometheus metrics for cluster health
    • Distributed tracing for operations
    • Structured logging with correlation IDs
    • Grafana dashboards for visualization
  3. Operational Tools:

    • CLI for cluster management
    • Admin API for node operations
    • Backup and restore automation
    • Cluster topology visualization

Advanced Distributed Systems

  1. Multi-Region Deployment:

    • Cross-datacenter replication
    • Regional failure handling
    • Latency-aware routing
    • Geo-distributed consensus
  2. Performance Optimization:

    • Batch operations for throughput
    • Pipeline parallelism for replication
    • Zero-copy optimizations
    • Custom allocators for reduced GC pressure
  3. Compatibility:

    • Redis protocol compatibility
    • etcd API compatibility
    • Kubernetes operator for deployment
    • Cloud-native integrations

Download Complete Solution

Get the full implementation with all source code, tests, and deployment configuration:

Includes:

  • Complete Go source code
  • Protocol Buffer definitions and generated code
  • Raft FSM implementation with snapshots
  • gRPC and HTTP server implementations
  • Client library with automatic failover
  • Docker Compose setup for 3-node cluster
  • Comprehensive test suite
  • README with architecture guide and API documentation

The README contains a detailed implementation guide, architecture diagrams, deployment instructions, and troubleshooting tips.


Project Status: This is an advanced checkpoint project that demonstrates production-grade distributed systems engineering with Go. Completing this project proves your ability to build fault-tolerant, consensus-based systems.

Estimated Effort: 14-20 hours for full implementation including testing and documentation.

Learning Path: After completing this project, proceed to Observability Platform or Container Orchestrator to continue building production systems expertise.