# Distributed KV Store - Production Deployment Guide

Complete guide for deploying the distributed KV store to production environments.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Local Development Setup](#local-development-setup)
- [Docker Deployment](#docker-deployment)
- [Kubernetes Deployment](#kubernetes-deployment)
- [Cloud Provider Deployments](#cloud-provider-deployments)
- [Security Hardening](#security-hardening)
- [Monitoring Setup](#monitoring-setup)
- [Backup and Recovery](#backup-and-recovery)
- [Performance Tuning](#performance-tuning)
- [High Availability Architecture](#high-availability-architecture)
- [Deployment Checklist](#deployment-checklist)

---

## Prerequisites

### System Requirements

- Go 1.24 or later
- Docker 20.10+ (for containerized deployment)
- Kubernetes 1.19+ (for K8s deployment)
- 2+ GB RAM per node
- 10+ GB storage for production clusters

### Network Requirements

- Outbound HTTPS access (for cloud providers)
- Raft communication: TCP ports 7000-7010
- HTTP API: TCP port 8080 (or custom)
- Inter-node communication: every node must be able to reach every other node on the Raft ports

### Security Prerequisites

- TLS certificates for HTTPS (optional but recommended)
- DNS resolution for node discovery
- Load balancer (optional, for HA)

---

## Local Development Setup

### Single Node for Development

```bash
# Clone repository
git clone <repo-url>
cd kvstore-simplified

# Install dependencies
go mod download

# Build binary
go build -o kvstore ./cmd/kvstore

# Run single node
./kvstore --node-id=dev --bootstrap --data-dir=./data/dev
```

### Local 3-Node Cluster

**Terminal 1 - Node 1 (Leader)**
```bash
./kvstore \
  --node-id=node1 \
  --raft-addr=localhost:7000 \
  --http-addr=:8081 \
  --data-dir=./data/node1 \
  --bootstrap
```

**Terminal 2 - Node 2**
```bash
./kvstore \
  --node-id=node2 \
  --raft-addr=localhost:7001 \
  --http-addr=:8082 \
  --data-dir=./data/node2
```

**Terminal 3 - Node 3**
```bash
./kvstore \
  --node-id=node3 \
  --raft-addr=localhost:7002 \
  --http-addr=:8083 \
  --data-dir=./data/node3
```

**Terminal 4 - Join Cluster**
```bash
# Join nodes
curl -X POST http://localhost:8081/api/join \
  -H "Content-Type: application/json" \
  -d '{"node_id":"node2","addr":"localhost:7001"}'

curl -X POST http://localhost:8081/api/join \
  -H "Content-Type: application/json" \
  -d '{"node_id":"node3","addr":"localhost:7002"}'

# Verify cluster
curl http://localhost:8081/api/stats | jq '.is_leader, .leader'
```
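The join calls above can also be issued from Go, for example from a provisioning tool. The following is a minimal sketch, assuming `/api/join` accepts the JSON body shown in the curl commands; `joinRequest`, `joinBody`, and `joinCluster` are hypothetical names and not part of the kvstore codebase.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// joinRequest mirrors the JSON body used in the curl examples.
type joinRequest struct {
	NodeID string `json:"node_id"`
	Addr   string `json:"addr"`
}

// joinBody encodes the request body for POST /api/join.
func joinBody(nodeID, raftAddr string) ([]byte, error) {
	return json.Marshal(joinRequest{NodeID: nodeID, Addr: raftAddr})
}

// joinCluster asks the node at baseURL to add another node to the cluster.
func joinCluster(client *http.Client, baseURL, nodeID, raftAddr string) error {
	body, err := joinBody(nodeID, raftAddr)
	if err != nil {
		return err
	}
	resp, err := client.Post(baseURL+"/api/join", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("join %s: unexpected status %s", nodeID, resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	peers := []joinRequest{
		{NodeID: "node2", Addr: "localhost:7001"},
		{NodeID: "node3", Addr: "localhost:7002"},
	}
	for _, p := range peers {
		// node1 was started with --bootstrap and serves HTTP on :8081.
		if err := joinCluster(client, "http://localhost:8081", p.NodeID, p.Addr); err != nil {
			fmt.Println("join failed:", err)
			continue
		}
		fmt.Println("joined", p.NodeID)
	}
}
```

Unlike the plain curl calls, this version treats any non-2xx response as an error, so a rejected join is not silently ignored.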

---

## Docker Deployment

### Build Docker Image

Create `Dockerfile`:
```dockerfile
FROM golang:1.24-alpine AS builder

WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o kvstore ./cmd/kvstore

FROM alpine:latest
RUN apk --no-cache add ca-certificates

WORKDIR /root/
COPY --from=builder /app/kvstore .

EXPOSE 8080 7000

ENTRYPOINT ["./kvstore"]
CMD ["--bootstrap"]
```

Build image:
```bash
docker build -t kvstore:latest .
docker tag kvstore:latest kvstore:v1.0
```

### Single Node Docker

```bash
docker run -d \
  --name kvstore-single \
  -p 8080:8080 \
  -p 7000:7000 \
  -v kvstore-data:/root/data \
  kvstore:latest \
  --bootstrap \
  --http-addr=:8080 \
  --raft-addr=:7000
```

### Docker Compose - 3 Node Cluster

Create `docker-compose.yml`:
```yaml
version: '3.8'

services:
  node1:
    image: kvstore:latest
    container_name: kvstore-node1
    ports:
      - "8081:8080"
      - "7000:7000"
    environment:
      - NODE_ID=node1
      - RAFT_ADDR=node1:7000
      - HTTP_ADDR=:8080
      - BOOTSTRAP=true
      - DATA_DIR=/root/data
    volumes:
      - kvstore-data-node1:/root/data
    networks:
      - kvstore-network
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 5
    command: >
      --node-id node1
      --raft-addr node1:7000
      --http-addr :8080
      --bootstrap

  node2:
    image: kvstore:latest
    container_name: kvstore-node2
    ports:
      - "8082:8080"
      - "7001:7000"
    environment:
      - NODE_ID=node2
      - RAFT_ADDR=node2:7000
      - HTTP_ADDR=:8080
      - DATA_DIR=/root/data
    volumes:
      - kvstore-data-node2:/root/data
    networks:
      - kvstore-network
    depends_on:
      node1:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 5
    command: >
      --node-id node2
      --raft-addr node2:7000
      --http-addr :8080

  node3:
    image: kvstore:latest
    container_name: kvstore-node3
    ports:
      - "8083:8080"
      - "7002:7000"
    environment:
      - NODE_ID=node3
      - RAFT_ADDR=node3:7000
      - HTTP_ADDR=:8080
      - DATA_DIR=/root/data
    volumes:
      - kvstore-data-node3:/root/data
    networks:
      - kvstore-network
    depends_on:
      node1:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 5
    command: >
      --node-id node3
      --raft-addr node3:7000
      --http-addr :8080

  # Init service to join nodes
  init:
    image: curlimages/curl:latest
    container_name: kvstore-init
    depends_on:
      node1:
        condition: service_healthy
      node2:
        condition: service_healthy
      node3:
        condition: service_healthy
    networks:
      - kvstore-network
    command: >
      sh -c "
        sleep 5 &&
        curl -X POST http://node1:8080/api/join \
          -H 'Content-Type: application/json' \
          -d '{\"node_id\":\"node2\",\"addr\":\"node2:7000\"}' &&
        sleep 2 &&
        curl -X POST http://node1:8080/api/join \
          -H 'Content-Type: application/json' \
          -d '{\"node_id\":\"node3\",\"addr\":\"node3:7000\"}' &&
        echo 'Cluster initialized'
      "

volumes:
  kvstore-data-node1:
  kvstore-data-node2:
  kvstore-data-node3:

networks:
  kvstore-network:
    driver: bridge
```

Deploy:
```bash
docker-compose up -d
docker-compose logs -f
docker-compose ps
```

Verify:
```bash
curl http://localhost:8081/api/stats | jq '.is_leader'
curl http://localhost:8081/api/set -X POST \
  -H "Content-Type: application/json" \
  -d '{"key":"test","value":"docker"}'
curl "http://localhost:8082/api/get?key=test"
```
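The same write-then-read check can be scripted in Go. This is a sketch under the assumption that `/api/set` and `/api/get` behave as in the curl commands above; the response schema is not described in this guide, so the body is returned undecoded. The helper names (`getURL`, `setValue`, `getValue`) are illustrative only.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// getURL builds the /api/get URL, escaping the key so that characters such as
// '&' or spaces survive the query string.
func getURL(baseURL, key string) string {
	q := url.Values{}
	q.Set("key", key)
	return baseURL + "/api/get?" + q.Encode()
}

// setValue writes a key/value pair through the node at baseURL.
func setValue(c *http.Client, baseURL, key, value string) error {
	body, err := json.Marshal(map[string]string{"key": key, "value": value})
	if err != nil {
		return err
	}
	resp, err := c.Post(baseURL+"/api/set", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("set %q: unexpected status %s", key, resp.Status)
	}
	return nil
}

// getValue returns the raw response body for a key.
func getValue(c *http.Client, baseURL, key string) (string, error) {
	resp, err := c.Get(getURL(baseURL, key))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("get %q: unexpected status %s", key, resp.Status)
	}
	return string(data), nil
}

func main() {
	c := &http.Client{Timeout: 5 * time.Second}
	if err := setValue(c, "http://localhost:8081", "test", "docker"); err != nil {
		fmt.Println("set failed:", err)
		return
	}
	// Read from a different node to confirm the write was replicated.
	got, err := getValue(c, "http://localhost:8082", "test")
	if err != nil {
		fmt.Println("get failed:", err)
		return
	}
	fmt.Println(got)
}
```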

Stop:
```bash
docker-compose down -v  # -v removes volumes
```

---

## Kubernetes Deployment

### Namespace and RBAC

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kvstore
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kvstore
  namespace: kvstore
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kvstore
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kvstore
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kvstore
subjects:
- kind: ServiceAccount
  name: kvstore
  namespace: kvstore
```

### StatefulSet Deployment

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kvstore
  namespace: kvstore
  labels:
    app: kvstore
spec:
  clusterIP: None
  ports:
  - port: 8080
    name: http
    protocol: TCP
  - port: 7000
    name: raft
    protocol: TCP
  selector:
    app: kvstore

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kvstore
  namespace: kvstore
  labels:
    app: kvstore
spec:
  serviceName: kvstore
  replicas: 3
  selector:
    matchLabels:
      app: kvstore
  template:
    metadata:
      labels:
        app: kvstore
    spec:
      serviceAccountName: kvstore
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - kvstore
              topologyKey: kubernetes.io/hostname

      containers:
      - name: kvstore
        image: kvstore:v1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        - containerPort: 7000
          name: raft
          protocol: TCP
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: NODE_ID
          value: $(POD_NAME)
        - name: RAFT_ADDR
          value: "$(POD_NAME).kvstore.$(POD_NAMESPACE).svc.cluster.local:7000"
        - name: HTTP_ADDR
          value: ":8080"
        - name: DATA_DIR
          value: /data
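        args:
        # NOTE: no pod is started with --bootstrap here. As in the Docker
        # examples, one node (kvstore-0) has to bootstrap the cluster before
        # the join requests from the init job can succeed.
        - --node-id=$(NODE_ID)
        - --raft-addr=$(RAFT_ADDR)
        - --http-addr=$(HTTP_ADDR)
        - --data-dir=$(DATA_DIR)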

        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2

        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

        volumeMounts:
        - name: data
          mountPath: /data

      terminationGracePeriodSeconds: 30

  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi
```

### Cluster Initialization Job

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: kvstore-init
  namespace: kvstore
spec:
  backoffLimit: 3
  template:
    spec:
      serviceAccountName: kvstore
      containers:
      - name: init
        image: curlimages/curl:latest
        command:
        - /bin/sh
        - -c
        - |
          set -e
          echo "Waiting for kvstore-0..."
          sleep 20

          echo "Joining kvstore-1 to cluster..."
          # -f makes curl exit non-zero on HTTP errors, so `set -e` fails the job
          curl -fsS -X POST http://kvstore-0.kvstore.kvstore.svc.cluster.local:8080/api/join \
            -H 'Content-Type: application/json' \
            -d '{"node_id":"kvstore-1","addr":"kvstore-1.kvstore.kvstore.svc.cluster.local:7000"}'

          sleep 2

          echo "Joining kvstore-2 to cluster..."
          # -f makes curl exit non-zero on HTTP errors, so `set -e` fails the job
          curl -fsS -X POST http://kvstore-0.kvstore.kvstore.svc.cluster.local:8080/api/join \
            -H 'Content-Type: application/json' \
            -d '{"node_id":"kvstore-2","addr":"kvstore-2.kvstore.kvstore.svc.cluster.local:7000"}'

          echo "Cluster initialized successfully"
      restartPolicy: Never
```
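The job above waits a fixed 20 seconds before sending join requests. A more robust approach is to poll `/health` until the first node answers. The sketch below shows that idea in Go; `retry`, `checkHealth`, and `errNotReady` are hypothetical helpers, and running it inside the cluster would require packaging it in its own image instead of `curlimages/curl`.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errNotReady is returned while the node has not yet reported healthy.
var errNotReady = errors.New("node not ready")

// retry calls fn up to attempts times, sleeping delay between failed tries.
// It returns nil on the first success, otherwise the last error.
func retry(attempts int, delay time.Duration, fn func() error) error {
	err := errNotReady
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

// checkHealth succeeds only if GET <baseURL>/health returns 200 OK.
func checkHealth(client *http.Client, baseURL string) error {
	resp, err := client.Get(baseURL + "/health")
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%w: status %s", errNotReady, resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 3 * time.Second}
	base := "http://kvstore-0.kvstore.kvstore.svc.cluster.local:8080"
	err := retry(10, 2*time.Second, func() error { return checkHealth(client, base) })
	if err != nil {
		fmt.Println("kvstore-0 never became healthy:", err)
		return
	}
	fmt.Println("kvstore-0 is healthy; join requests can be sent")
}
```

Polling bounds the wait by the number of attempts instead of guessing a sleep duration, and it fails loudly when the node never comes up.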

Deploy to Kubernetes:
```bash
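# Create namespace
kubectl create namespace kvstore

# Create the ServiceAccount and RBAC objects from "Namespace and RBAC"
# (rbac.yaml is an assumed file name for that manifest)
kubectl apply -f rbac.yaml

# Deploy StatefulSet
kubectl apply -f statefulset.yaml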

# Wait for pods to be ready
kubectl wait --for=condition=ready pod -l app=kvstore -n kvstore --timeout=300s

# Run init job
kubectl apply -f init-job.yaml

# Check status
kubectl get statefulset -n kvstore
kubectl get pods -n kvstore
kubectl logs kvstore-0 -n kvstore
```

Verify deployment:
```bash
# Port forward to node (this command blocks; run the rest in a second terminal)
kubectl port-forward kvstore-0 8080:8080 -n kvstore

# Test API
curl http://localhost:8080/health
curl http://localhost:8080/api/stats | jq '.'

# Set value
curl -X POST http://localhost:8080/api/set \
  -H "Content-Type: application/json" \
  -d '{"key":"k8s","value":"deployment"}'

# Get value
curl "http://localhost:8080/api/get?key=k8s"
```

---

## Cloud Provider Deployments

### AWS ECS Deployment

Create `task-definition.json`:
```json
{
  "family": "kvstore",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "kvstore",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/kvstore:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp"
        },
        {
          "containerPort": 7000,
          "hostPort": 7000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ID", "value": "node1"},
        {"name": "BOOTSTRAP", "value": "true"}
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/kvstore",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ],
  "cpu": "256",
  "memory": "512",
  "requiresCompatibilities": ["FARGATE"]
}
```

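Register the task definition. On Fargate, pulling the image from ECR and using the `awslogs` driver also require an `executionRoleArn` in the task definition, which is not shown above: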
```bash
aws ecs register-task-definition --cli-input-json file://task-definition.json
```

### Google Cloud Run Deployment

```bash
# Build and push image
docker tag kvstore:latest gcr.io/your-project/kvstore:latest
docker push gcr.io/your-project/kvstore:latest

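# Deploy to Cloud Run (single node only: Cloud Run routes traffic to one
# container port, so Raft port 7000 is not reachable, and container storage
# does not persist across restarts)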
gcloud run deploy kvstore \
  --image gcr.io/your-project/kvstore:latest \
  --port 8080 \
  --memory 512Mi \
  --cpu 1 \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated
```

### Azure Container Instances

```bash
# ACI pulls from a registry, so push the image first and use its full name
az container create \
  --resource-group myresourcegroup \
  --name kvstore \
  --image <registry>/kvstore:latest \
  --ports 8080 7000 \
  --environment-variables \
    NODE_ID=node1 \
    BOOTSTRAP=true \
  --memory 0.5 \
  --cpu 0.5
```

---

## Security Hardening

### TLS Configuration

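Update `cmd/kvstore/main.go` to support TLS. Note that `httpServer.server` is an unexported field of the `server` package, so the call below only compiles from `main` if that package exposes the underlying `*http.Server` through an exported field or method: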
```go
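// startHTTPSServer serves the HTTP API over TLS using the given
// certificate and private key files.
func startHTTPSServer(addr string, node *Node, certFile, keyFile string) error {
    httpServer := server.NewHTTPServer(addr, node)
    // Unexported field: swap in an exported accessor when calling from package main.
    return httpServer.server.ListenAndServeTLS(certFile, keyFile)
}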
```

Generate self-signed certificate:
```bash
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
```
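For reference, the standard library alone is enough to serve HTTPS with the generated `cert.pem` and `key.pem`. The following is a standalone sketch, not the kvstore server: `newTLSServer` is a hypothetical helper, port `:8443` is an arbitrary choice, and the `/health` handler is a stand-in for the real API handlers.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"log"
	"net/http"
	"time"
)

// newTLSServer returns an http.Server that refuses protocol versions
// older than TLS 1.2 and bounds how long clients may take to send headers.
func newTLSServer(addr string, handler http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           handler,
		ReadHeaderTimeout: 10 * time.Second,
		TLSConfig:         &tls.Config{MinVersion: tls.VersionTLS12},
	}
}

func main() {
	mux := http.NewServeMux()
	// Stand-in handler; the real handlers come from the kvstore server package.
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})

	srv := newTLSServer(":8443", mux)
	// cert.pem and key.pem are the files produced by the openssl command above.
	if err := srv.ListenAndServeTLS("cert.pem", "key.pem"); err != nil {
		log.Println("https server stopped:", err)
	}
}
```

With a self-signed certificate, clients have to trust `cert.pem` explicitly, for example with `curl --cacert cert.pem`, and the certificate's subject must match the host name used in the URL.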

### Network Policies (Kubernetes)

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kvstore-network-policy
  namespace: kvstore
spec:
  podSelector:
    matchLabels:
      app: kvstore
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: kvstore
    ports:
    - protocol: TCP
      port: 7000
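  - from:
    - namespaceSelector:
        matchLabels:
          # Label set automatically on every namespace (Kubernetes 1.21+);
          # on older clusters, add this label to the namespace manually.
          kubernetes.io/metadata.name: kvstore
    ports:
    - protocol: TCP
      port: 8080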
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: kvstore
    ports:
    - protocol: TCP
      port: 7000
  - to:
    - podSelector:
        matchLabels:
          app: kvstore
    ports:
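    - protocol: TCP
      port: 8080
  # Allow DNS lookups: the Raft addresses are DNS names, and an Egress
  # policy without this rule blocks name resolution.
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53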
```

---

## Monitoring Setup

### Prometheus Metrics

Add the Prometheus Go client as a dependency:
```bash
go get github.com/prometheus/client_golang/prometheus
```

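Expose the metrics endpoint on the HTTP router in `cmd/kvstore/main.go` (`promhttp` is the package `github.com/prometheus/client_golang/prometheus/promhttp`):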
```go
r.Handle("/metrics", promhttp.Handler())
```

### Prometheus Configuration

Create `prometheus.yml`:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
- job_name: kvstore
  static_configs:
  - targets:
    - localhost:8081
    - localhost:8082
    - localhost:8083
```

### Grafana Dashboards

Create a dashboard that tracks:
- Leader election state
- Replication lag
- Log indices
- Request latency
- Error rates

---

## Backup and Recovery

### Snapshot-based Backup

```bash
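# Save cluster stats for reference (this is metadata, not a data snapshot)
curl http://localhost:8080/api/stats > state.json

# Backup Raft logs; stop the node first so the copy is consistent
mkdir -p ./backups
cp -r ./data/node1 "./backups/node1-$(date +%s)"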
```

### Automated Backups

Create `backup.sh`:
```bash
#!/bin/bash
# Abort on errors, unset variables, and failed pipeline stages
set -euo pipefail

BACKUP_DIR="./backups"
NODE_ID="node1"
TIMESTAMP=$(date +%s)

mkdir -p "$BACKUP_DIR"

# Backup data directory
tar -czf "$BACKUP_DIR/kvstore-$NODE_ID-$TIMESTAMP.tar.gz" "./data/$NODE_ID"

# Delete backups older than 7 days
find "$BACKUP_DIR" -name "kvstore-*.tar.gz" -mtime +7 -delete

echo "Backup complete: kvstore-$NODE_ID-$TIMESTAMP.tar.gz"
```

Schedule with cron:
```bash
# 2 AM daily; cd first because backup.sh uses paths relative to the kvstore directory
0 2 * * * cd /path/to/kvstore && ./backup.sh
```
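If backups are copied to storage where file modification times are not preserved, retention can be based on the Unix timestamp that `backup.sh` embeds in each file name instead of `find -mtime`. The sketch below illustrates that alternative; `backupTimestamp` and `expired` are hypothetical helpers, and the file names in `main` are made-up examples of the `kvstore-<node>-<timestamp>.tar.gz` pattern.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// backupTimestamp extracts the Unix timestamp from a name such as
// "kvstore-node1-1700000000.tar.gz". It returns false for other names.
func backupTimestamp(name string) (int64, bool) {
	if !strings.HasPrefix(name, "kvstore-") || !strings.HasSuffix(name, ".tar.gz") {
		return 0, false
	}
	stem := strings.TrimSuffix(name, ".tar.gz")
	i := strings.LastIndex(stem, "-") // always >= 7 because of the prefix check
	ts, err := strconv.ParseInt(stem[i+1:], 10, 64)
	if err != nil {
		return 0, false
	}
	return ts, true
}

// expired returns the backup names whose embedded timestamp is more than
// maxAgeSeconds older than now. Names that do not match the pattern are skipped.
func expired(names []string, now, maxAgeSeconds int64) []string {
	var out []string
	for _, n := range names {
		if ts, ok := backupTimestamp(n); ok && now-ts > maxAgeSeconds {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	const sevenDays = 7 * 24 * 60 * 60
	names := []string{
		"kvstore-node1-1700000000.tar.gz",
		"kvstore-node1-1700900000.tar.gz",
		"README.md",
	}
	// Use a fixed "now" so the example is reproducible.
	for _, n := range expired(names, 1701000000, sevenDays) {
		fmt.Println("would delete:", n)
	}
}
```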

### Recovery Procedure

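1. Stop the node.
2. Extract the backup archive so that it replaces the node's data directory (for example `./data/node1`).
3. Restart the node with the same `--node-id` and `--data-dir` as before.
4. Verify cluster state, for example with `curl http://localhost:8081/api/stats`.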

---

## Performance Tuning

### Raft Tuning Parameters

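Adjust these in node initialization. The values below are the defaults returned by `DefaultConfig()` in `hashicorp/raft` (assuming that is the Raft library in use), so they serve as a baseline: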
```go
raftConfig.HeartbeatTimeout = 1000 * time.Millisecond
raftConfig.ElectionTimeout = 1000 * time.Millisecond
raftConfig.CommitTimeout = 50 * time.Millisecond
raftConfig.MaxAppendEntries = 64
raftConfig.SnapshotInterval = 120 * time.Second
raftConfig.SnapshotThreshold = 8192
```
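When these values are made configurable, it helps to reject inconsistent combinations at startup. The following sketch uses a local `raftTuning` struct (a hypothetical type, not the Raft library's config) and a few rule-of-thumb checks; the Raft library may enforce its own, stricter validation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// raftTuning holds a subset of the timing parameters discussed above.
type raftTuning struct {
	HeartbeatTimeout time.Duration
	ElectionTimeout  time.Duration
	CommitTimeout    time.Duration
	MaxAppendEntries int
}

// defaultTuning returns the values listed in this guide.
func defaultTuning() raftTuning {
	return raftTuning{
		HeartbeatTimeout: 1000 * time.Millisecond,
		ElectionTimeout:  1000 * time.Millisecond,
		CommitTimeout:    50 * time.Millisecond,
		MaxAppendEntries: 64,
	}
}

// validate applies illustrative sanity checks and returns the first violation.
func validate(t raftTuning) error {
	switch {
	case t.HeartbeatTimeout <= 0:
		return errors.New("HeartbeatTimeout must be positive")
	case t.ElectionTimeout < t.HeartbeatTimeout:
		// A follower should not start an election before a heartbeat could arrive.
		return errors.New("ElectionTimeout must be at least HeartbeatTimeout")
	case t.CommitTimeout <= 0 || t.CommitTimeout >= t.HeartbeatTimeout:
		return errors.New("CommitTimeout must be positive and below HeartbeatTimeout")
	case t.MaxAppendEntries <= 0:
		return errors.New("MaxAppendEntries must be positive")
	}
	return nil
}

func main() {
	cfg := defaultTuning()
	cfg.ElectionTimeout = 200 * time.Millisecond // deliberately too low
	if err := validate(cfg); err != nil {
		fmt.Println("invalid tuning:", err)
		return
	}
	fmt.Println("tuning accepted")
}
```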

### Connection Pooling

For clients with many requests, implement connection pooling:
```go
import (
    "net/http"
    "time"
)

client := &http.Client{
    Transport: &http.Transport{
        MaxIdleConns:        100,
        MaxIdleConnsPerHost: 100,
        IdleConnTimeout:     90 * time.Second,
    },
}
```

---

## High Availability Architecture

### Recommended Production Setup

```
                        ┌─────────────────┐
                        │  Load Balancer  │
                        │   (HAProxy)     │
                        └────────┬────────┘
                                 │
                ┌────────────────┼────────────────┐
                │                │                │
                ▼                ▼                ▼
            ┌────────┐      ┌────────┐      ┌────────┐
            │ Node 1 │      │ Node 2 │      │ Node 3 │
            │(Leader)│      │        │      │        │
            └────┬───┘      └────┬───┘      └────┬───┘
                 │               │               │
                 └───────────────┼───────────────┘
                                 │
                        ┌────────┴────────┐
                        │  Raft Network   │
                        │  (TCP 7000+)    │
                        └─────────────────┘

        With automatic DNS failover and health checks
```

### Load Balancer Configuration (HAProxy)

```haproxy
global
    log stdout local0
    maxconn 4096

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend kvstore-nodes

backend kvstore-nodes
    balance roundrobin
    # Use the HTTP health endpoint instead of a plain TCP connect check
    option httpchk GET /health
    server node1 node1:8080 check inter 2000 rise 2 fall 5
    server node2 node2:8080 check inter 2000 rise 2 fall 5
    server node3 node3:8080 check inter 2000 rise 2 fall 5
```
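Round-robin balancing sends requests to every node, including followers. If followers do not forward writes to the leader, a client can locate the leader itself through `/api/stats`. The sketch below assumes that the response contains a boolean `is_leader` field, as the `jq '.is_leader'` examples in this guide suggest; `parseIsLeader` and `findLeader` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// nodeStats decodes only the field this sketch needs from /api/stats.
type nodeStats struct {
	IsLeader bool `json:"is_leader"`
}

// parseIsLeader extracts is_leader from a /api/stats response body.
func parseIsLeader(body []byte) (bool, error) {
	var s nodeStats
	if err := json.Unmarshal(body, &s); err != nil {
		return false, err
	}
	return s.IsLeader, nil
}

// findLeader queries each node in order and returns the base URL of the
// first one that reports itself as leader.
func findLeader(client *http.Client, nodes []string) (string, error) {
	for _, base := range nodes {
		resp, err := client.Get(base + "/api/stats")
		if err != nil {
			continue // node unreachable; try the next one
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil || resp.StatusCode != http.StatusOK {
			continue
		}
		if leader, err := parseIsLeader(body); err == nil && leader {
			return base, nil
		}
	}
	return "", errors.New("no node reported itself as leader")
}

func main() {
	client := &http.Client{Timeout: 2 * time.Second}
	nodes := []string{"http://node1:8080", "http://node2:8080", "http://node3:8080"}
	leader, err := findLeader(client, nodes)
	if err != nil {
		fmt.Println("leader lookup failed:", err)
		return
	}
	fmt.Println("send writes to", leader)
}
```

Leadership can change at any time, so a client using this approach should repeat the lookup whenever a write fails.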

---

## Deployment Checklist

- [ ] Security: TLS certificates installed
- [ ] Security: Network policies configured
- [ ] Security: Access controls implemented
- [ ] Monitoring: Prometheus scraping working
- [ ] Monitoring: Alerts configured
- [ ] Backup: Backup strategy implemented
- [ ] Backup: Recovery tested
- [ ] Performance: Load testing completed
- [ ] HA: Failover tested
- [ ] Documentation: Runbooks updated

---

For questions or additional deployment scenarios, refer to the main README.md.
