# Container Orchestrator - Architecture Deep Dive

## System Architecture Diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                         REST API Layer                           │
│        (HTTP Endpoints: Deploy, Scale, Monitor, Logs)            │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────┴──────────────────────────────────────┐
│                    Orchestrator Core Engine                    │
├──────────────┬──────────────┬──────────────┬──────────────────┤
│  Scheduler   │   Registry   │   Health     │  Docker Client   │
│              │              │   Checker    │                  │
│  - Node      │ - Service    │ - Monitors   │ - Create/Start   │
│    Selection │   Discovery  │ - Failures   │ - Stop/Remove    │
│  - Resource  │ - Tracking   │ - Callbacks  │ - Log Streaming  │
│    Planning  │ - Load Bal.  │              │                  │
└──────┬───────┴──────┬───────┴──────┬───────┴────────┬─────────┘
       │              │              │                │
       ▼              ▼              ▼                ▼
  ┌─────────┐  ┌─────────┐  ┌──────────┐  ┌──────────────┐
  │ Nodes   │  │Services │  │Container │  │  Docker      │
  │         │  │         │  │ Monitor  │  │  Engine      │
  │ Pools   │  │ Catalog │  │ Goroutine│  │              │
  └─────────┘  └─────────┘  └──────────┘  └──────────────┘
```

## Component Interactions

### 1. Request Flow: Service Deployment

```
Client Request (Deploy Service)
        │
        ▼
  API Handler
  │ - Parse JSON
  │ - Validate input
  │ - Check for duplicates
        │
        ▼
  Service Registry
  │ - Create new service record
  │ - Initialize container list
        │
        └────────────┐
                     │
                     ▼
              For each replica:
              │
              ├─1. Scheduler
              │   │ - Select best node
              │   │ - Check capacity
              │   ▼
              │
              ├─2. Docker Client
              │   │ - Pull image
              │   │ - Create container
              │   │ - Configure resources
              │   │ - Set labels
              │   ▼
              │
              ├─3. Start Container
              │   │ - Transition to "running"
              │   ▼
              │
              ├─4. Register in Service
              │   │ - Add to service's container list
              │   ▼
              │
              └─5. Health Monitoring (if configured)
                  │ - Start monitoring goroutine
                  │ - Register failure callback
                  ▼

        ▼
  Event Log
  │ - Record "deployment" event
  │ - Store timestamp and details
        │
        ▼
  HTTP Response (201 Created)
  │ - Return service details
  │ - Include container info
```

### 2. Health Check Flow

```
Health Checker
│
├─ StartMonitoring(containerID, config, callback)
│  │
│  ├─ Create HealthMonitor
│  ├─ Store in checks map
│  └─ Spawn goroutine
│
└─ Monitor Goroutine (per container)
   │
   ├─ Ticker: Check every interval
   │
   ├─ Loop iteration:
   │  ├─ performCheck()
   │  │  ├─ HTTP: GET request with timeout
   │  │  ├─ TCP: Connection attempt
   │  │  └─ Exec: Command execution
   │  │
   │  ├─ Result:
   │  │  ├─ Success: Reset failure count, status="healthy"
   │  │  │
   │  │  └─ Failure:
   │  │     ├─ Increment failure count
   │  │     ├─ if failures >= retries:
   │  │     │  ├─ Set status="unhealthy"
   │  │     │  └─ Call callback(containerID)
   │  │     │     │
   │  │     │     └─ API Handler's handleContainerFailure()
   │  │     │        ├─ Stop container
   │  │     │        ├─ Remove from service
   │  │     │        └─ Log event
   │  │     │
   │  │     └─ Wait for next check
   │
   └─ StopMonitoring(containerID)
      └─ Send signal to stop channel
```
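
The monitor loop above can be sketched in Go roughly as follows. This is an illustration under assumptions, not the project's code: `Monitor`, `Probe`, `recordResult`, and `run` are hypothetical names, and the probe is reduced to a function returning `true` on success so the HTTP, TCP, and exec variants stay out of the way.

```go
package main

import (
	"sync"
	"time"
)

// Probe is a hypothetical stand-in for performCheck(): it reports
// whether a single HTTP, TCP, or exec check succeeded.
type Probe func() bool

// Monitor holds the per-container state (illustrative, not the real type).
type Monitor struct {
	mu       sync.Mutex
	status   string
	failures int
	stopCh   chan struct{}
}

// recordResult applies one probe result and reports whether the
// consecutive-failure count has reached the retry threshold.
func (m *Monitor) recordResult(ok bool, retries int) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ok {
		m.failures = 0
		m.status = "healthy"
		return false
	}
	m.failures++
	if m.failures >= retries {
		m.status = "unhealthy"
		return true
	}
	return false
}

// run probes once per tick and returns as soon as stopCh is closed.
func (m *Monitor) run(id string, interval time.Duration, retries int, probe Probe, onFailure func(string)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if m.recordResult(probe(), retries) {
				onFailure(id)
			}
		case <-m.stopCh:
			return
		}
	}
}
```

Because the threshold test is `failures >= retries`, the callback in this sketch fires on every failed check after the threshold until the monitor is stopped, so the callback is expected to stop monitoring as part of its cleanup.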

### 3. Scaling Flow

```
Scale Request (POST /api/services/{name}/scale)
│
├─ Parse desired replicas
├─ Get current service
├─ Calculate delta
│
├─ if delta > 0 (scale up):
│  │
│  └─ For each new replica:
│     ├─ Scheduler.SelectNode()
│     ├─ Docker.CreateContainer()
│     ├─ Docker.StartContainer()
│     ├─ Registry.AddContainer()
│     └─ Health.StartMonitoring()
│
├─ if delta < 0 (scale down):
│  │
│  └─ For each container to remove:
│     ├─ Health.StopMonitoring()
│     ├─ Docker.StopContainer()
│     ├─ Docker.RemoveContainer()
│     └─ Registry.RemoveContainer()
│
└─ Log scale event
```
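
The delta calculation in the scaling flow could look like the sketch below. `ScalePlan` and `planScale` are hypothetical names, and the choice of which containers to remove on scale-down (here, the ones at the end of the list) is an assumption, since the flow does not specify it.

```go
package main

// ScalePlan describes the work implied by a scale request (hypothetical type).
type ScalePlan struct {
	Add    int      // number of replicas to create
	Remove []string // IDs of containers to stop and remove
}

// planScale compares the desired replica count with the current containers.
// Assumption: scale-down removes containers from the end of the list.
func planScale(current []string, desired int) ScalePlan {
	if desired < 0 {
		desired = 0
	}
	delta := desired - len(current)
	switch {
	case delta > 0:
		return ScalePlan{Add: delta}
	case delta < 0:
		// Copy so the caller's slice is not aliased by the plan.
		victims := make([]string, -delta)
		copy(victims, current[desired:])
		return ScalePlan{Remove: victims}
	default:
		return ScalePlan{}
	}
}
```

Separating the plan from its execution keeps the arithmetic testable without a Docker daemon; the handler would then walk `Add` or `Remove` and call the steps listed above for each entry.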

## Data Structures

### Service Registry Structure

```
Registry {
  services: map[string]*Service {
    "web-app": Service {
      Name: "web-app",
      Image: "nginx:alpine",
      Replicas: 3,
      Containers: [
        Container {
          ID: "abc123",
          Name: "web-app-0",
          Status: "running",
          Health: "healthy",
          NodeID: "local-node"
        },
        Container {
          ID: "def456",
          Name: "web-app-1",
          Status: "running",
          Health: "healthy",
          NodeID: "local-node"
        },
        Container {
          ID: "ghi789",
          Name: "web-app-2",
          Status: "running",
          Health: "unhealthy",
          NodeID: "local-node"
        }
      ],
      HealthCheck: {
        Type: "http",
        Endpoint: "http://localhost:80/health",
        Interval: 10s,
        Timeout: 5s,
        Retries: 3
      },
      CreatedAt: 2024-01-15T10:00:00Z
    }
  },
  mu: sync.RWMutex
}
```
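
As a rough Go rendering of the structure above, the following sketch shows how the map and its `sync.RWMutex` might fit together. The health check config and timestamps are left out for brevity, and the rule that `Discover` returns only running, healthy containers is an assumption rather than something the layout states.

```go
package main

import "sync"

// Container and Service mirror the fields shown above.
type Container struct {
	ID, Name, Status, Health, NodeID string
}

type Service struct {
	Name       string
	Image      string
	Replicas   int
	Containers []Container
}

type Registry struct {
	mu       sync.RWMutex
	services map[string]*Service
}

func NewRegistry() *Registry {
	return &Registry{services: make(map[string]*Service)}
}

// Register stores a service and reports false if the name is taken.
func (r *Registry) Register(s *Service) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.services[s.Name]; exists {
		return false
	}
	r.services[s.Name] = s
	return true
}

// AddContainer appends a container to a known service.
func (r *Registry) AddContainer(service string, c Container) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.services[service]
	if !ok {
		return false
	}
	s.Containers = append(s.Containers, c)
	return true
}

// Discover returns a copy of the containers that can serve traffic.
// Assumption: that means Status "running" and Health "healthy".
func (r *Registry) Discover(service string) []Container {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.services[service]
	if !ok {
		return nil
	}
	var out []Container
	for _, c := range s.Containers {
		if c.Status == "running" && c.Health == "healthy" {
			out = append(out, c)
		}
	}
	return out
}
```

Returning a fresh slice from `Discover` means callers can iterate without holding the lock.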

### Scheduler Structure

```
Scheduler {
  nodes: map[string]*Node {
    "local-node": Node {
      ID: "local-node",
      Address: "localhost",
      CPUCores: 4,
      MemoryMB: 8192,
      Available: true,
      LastSeen: 2024-01-15T10:05:30Z
    }
  },
  mu: sync.RWMutex
}
```
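
A linear-scan `SelectNode` over this map might look like the sketch below. The selection policy is an assumption: it picks the available node with the most memory that satisfies the request and breaks ties by ID so the result does not depend on map iteration order. `ErrNoNode` and the `needMemoryMB` parameter are hypothetical.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// Node mirrors the fields shown above.
type Node struct {
	ID        string
	Address   string
	CPUCores  int
	MemoryMB  int
	Available bool
	LastSeen  time.Time
}

type Scheduler struct {
	mu    sync.RWMutex
	nodes map[string]*Node
}

// ErrNoNode is a hypothetical sentinel for "no suitable node found".
var ErrNoNode = errors.New("no suitable node found")

// SelectNode scans every node once (O(N)) under the read lock.
// Assumed policy: largest MemoryMB wins, ties broken by lowest ID.
func (s *Scheduler) SelectNode(needMemoryMB int) (*Node, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var best *Node
	for _, n := range s.nodes {
		if !n.Available || n.MemoryMB < needMemoryMB {
			continue
		}
		if best == nil || n.MemoryMB > best.MemoryMB ||
			(n.MemoryMB == best.MemoryMB && n.ID < best.ID) {
			best = n
		}
	}
	if best == nil {
		return nil, ErrNoNode
	}
	return best, nil
}
```

The structure shown above has no field for memory already in use, so this sketch compares the request against total `MemoryMB` only.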

### Health Checker Structure

```
Checker {
  checks: map[string]*HealthMonitor {
    "abc123": HealthMonitor {
      ContainerID: "abc123",
      Config: HealthCheckConfig {...},
      Status: "healthy",
      FailureCount: 0,
      LastCheck: 2024-01-15T10:05:25Z,
      StopCh: chan
    }
  },
  mu: sync.RWMutex
}
```
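
The bookkeeping around the `checks` map could be sketched as below. The config, the callback, and the goroutine itself are omitted here to isolate the map handling, and the initial status `"unknown"` is an assumption.

```go
package main

import "sync"

// HealthMonitor mirrors the fields shown above, minus Config and LastCheck.
type HealthMonitor struct {
	ContainerID  string
	Status       string
	FailureCount int
	StopCh       chan struct{}
}

type Checker struct {
	mu     sync.RWMutex
	checks map[string]*HealthMonitor
}

func NewChecker() *Checker {
	return &Checker{checks: make(map[string]*HealthMonitor)}
}

// StartMonitoring registers a monitor and reports false if the
// container is already monitored. Simplified: no config or callback.
func (c *Checker) StartMonitoring(containerID string) (*HealthMonitor, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if m, exists := c.checks[containerID]; exists {
		return m, false
	}
	m := &HealthMonitor{
		ContainerID: containerID,
		Status:      "unknown", // assumed initial status
		StopCh:      make(chan struct{}),
	}
	c.checks[containerID] = m
	return m, true
}

// StopMonitoring removes the monitor and closes StopCh. Deleting the
// entry under the write lock ensures the channel is closed only once.
func (c *Checker) StopMonitoring(containerID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	m, ok := c.checks[containerID]
	if !ok {
		return false
	}
	delete(c.checks, containerID)
	close(m.StopCh)
	return true
}

// GetStatus is a single map lookup under the read lock.
func (c *Checker) GetStatus(containerID string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	m, ok := c.checks[containerID]
	if !ok {
		return "", false
	}
	return m.Status, true
}
```

Closing the channel rather than sending on it means the signal is not lost if the goroutine is busy running a check when `StopMonitoring` is called.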

## Threading Model

### Concurrency Patterns

```
Main Goroutine
│
├─ HTTP Server (net/http)
│  ├─ Listener goroutine
│  └─ Handler goroutine per request
│
├─ Health Check Goroutines (1 per monitored container)
│  ├─ Ticker-driven loop
│  ├─ Independent failure counting
│  └─ Callback on failure
│
└─ Signal Handler
   ├─ Waits for SIGINT/SIGTERM
   └─ Initiates graceful shutdown
```

### Synchronization Primitives

1. **sync.RWMutex in Scheduler**
   - Protects `nodes` map
   - Read locks for SelectNode() and GetNodes()
   - Write locks for RegisterNode() and UnregisterNode()
   - Many readers, single writer pattern

2. **sync.RWMutex in Registry**
   - Protects `services` map
   - Read locks for Get(), List(), Discover()
   - Write locks for Register(), Unregister(), AddContainer(), RemoveContainer()

3. **sync.RWMutex in Health Checker**
   - Protects `checks` map
   - Guards lookups and updates of health monitors (shared reads, exclusive writes)
   - Ensures consistent status updates

4. **Channels in Health Monitor**
   - `StopCh` signals graceful shutdown
   - A `select` on the ticker and `StopCh` lets a monitor exit without waiting out its interval

## Memory Layout

### Heap Allocations

```
main()
├─ Docker Client (once)
├─ Scheduler (once)
├─ Registry (once)
├─ Health Checker (once)
│  └─ Health Monitors (one per container)
├─ API Handler (once)
├─ HTTP Server (once)
│
└─ Request-scoped
   ├─ Each container creation
   ├─ JSON marshaling/unmarshaling
   └─ Event logging
```

### Resource Limits

- **Container Memory**: Enforced via Docker limits
- **CPU Shares**: Relative weight via Docker
- **Goroutines**: 1 per health monitor + 1 per HTTP request
- **Event Log**: Capped at 100 events; older entries are discarded

## Error Handling Strategy

### Error Propagation

```
Error Handling Hierarchy
├─ Validation Errors (400 Bad Request)
│  ├─ Missing required fields
│  ├─ Invalid JSON
│  └─ Invalid health check config
│
├─ Not Found Errors (404)
│  ├─ Service doesn't exist
│  └─ Container doesn't exist
│
├─ Scheduling Errors (500)
│  ├─ No suitable node found
│  └─ Insufficient resources
│
├─ Docker Errors (500)
│  ├─ Image pull failure
│  ├─ Container creation failure
│  ├─ Container start failure
│  └─ API communication error
│
└─ Internal Errors (500)
   ├─ Race condition recovery
   ├─ Timeout errors
   └─ Unexpected state

Error Response Format:
{
  "error": "Descriptive error message",
  "code": "ErrorCode",
  "details": "Additional context"
}
```
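
One way to map the hierarchy above onto HTTP responses is sketched below. The sentinel errors, `statusFor`, and `writeError` are hypothetical names, and the `code` string passed by the caller is a placeholder, since the document does not list the actual error codes.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
)

// ErrorResponse matches the JSON shape shown above.
type ErrorResponse struct {
	Error   string `json:"error"`
	Code    string `json:"code"`
	Details string `json:"details"`
}

// Hypothetical sentinel errors for the categories in the hierarchy.
var (
	ErrValidation = errors.New("validation failed")
	ErrNotFound   = errors.New("not found")
	ErrScheduling = errors.New("no suitable node found")
)

// statusFor maps an error to a status code; anything that is not a
// validation or not-found error falls through to 500.
func statusFor(err error) int {
	switch {
	case errors.Is(err, ErrValidation):
		return http.StatusBadRequest
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	default:
		return http.StatusInternalServerError
	}
}

// writeError sends the error as JSON with the matching status code.
func writeError(w http.ResponseWriter, err error, code, details string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(statusFor(err))
	_ = json.NewEncoder(w).Encode(ErrorResponse{
		Error:   err.Error(),
		Code:    code,
		Details: details,
	})
}
```

Because `statusFor` uses `errors.Is`, handlers can wrap a sentinel with `fmt.Errorf("...: %w", ErrNotFound)` to add context without changing the resulting status code.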

## Performance Considerations

### Algorithmic Complexity

```
Operation          | Time Complexity | Space Complexity
-------------------|-----------------|---------------------------
SelectNode()       | O(N)            | O(1)      (N = nodes)
Discover()         | O(M)            | O(M)      (M = containers)
CreateContainer()  | O(1)            | O(1)
StartMonitoring()  | O(1)            | O(1)
StopMonitoring()   | O(1)            | O(1)
GetStatus()        | O(1)            | O(1)
DeployHandler()    | O(R*N)          | O(R)      (R = replicas)
ScaleHandler()     | O(delta*N)      | O(delta)
```

### Optimization Opportunities

1. **Node Selection**: Use a heap (priority queue) once the node count makes the O(N) scan costly
2. **Health Checks**: Connection pooling for HTTP checks
3. **Image Caching**: Cache image list locally
4. **Event Log**: Use ring buffer instead of slice
5. **Goroutine Pooling**: Limit health check goroutines
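
As an illustration of item 4, a fixed-capacity ring buffer for events might look like the following. `Event`, `EventLog`, and their methods are hypothetical names; the capacity of 100 mentioned elsewhere in this document would be passed to `NewEventLog`.

```go
package main

import "sync"

// Event is a simplified stand-in for the orchestrator's event record.
type Event struct {
	Type    string
	Message string
}

// EventLog keeps the most recent events in a fixed-size ring.
type EventLog struct {
	mu    sync.Mutex
	buf   []Event
	next  int // index of the slot the next Add will write
	count int // number of valid entries, at most len(buf)
}

func NewEventLog(capacity int) *EventLog {
	if capacity < 1 {
		capacity = 1
	}
	return &EventLog{buf: make([]Event, capacity)}
}

// Add overwrites the oldest entry once the buffer is full.
func (l *EventLog) Add(e Event) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.buf[l.next] = e
	l.next = (l.next + 1) % len(l.buf)
	if l.count < len(l.buf) {
		l.count++
	}
}

// Snapshot returns a copy of the stored events, oldest first.
func (l *EventLog) Snapshot() []Event {
	l.mu.Lock()
	defer l.mu.Unlock()
	out := make([]Event, 0, l.count)
	start := (l.next - l.count + len(l.buf)) % len(l.buf)
	for i := 0; i < l.count; i++ {
		out = append(out, l.buf[(start+i)%len(l.buf)])
	}
	return out
}
```

Compared with trimming a slice, `Add` here never allocates or shifts elements; only `Snapshot` copies.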

## Failure Recovery

### Container Failure Recovery

```
Container Failure Detected
    │
    ├─ Health check fails or times out
    ├─ Failure count increments
    ├─ if failures >= retries:
    │  ├─ Status becomes "unhealthy"
    │  └─ onFailure callback triggered
    │
    └─ handleContainerFailure()
       ├─ Log event
       ├─ Stop container (5s timeout)
       ├─ Remove container
       ├─ Update service registry
       └─ Manual re-deployment needed
          (Auto-recovery not implemented)
```
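
The failure handler's sequence can be sketched with its dependencies injected as functions. Everything named here (`failureHooks` and its fields) is a hypothetical stand-in for the Docker client, registry, and event log; the decision to continue with removal after a failed stop is also an assumption.

```go
package main

import "fmt"

// failureHooks bundles the operations the handler needs.
type failureHooks struct {
	logEvent   func(msg string)
	stop       func(containerID string) error // expected to apply the 5s timeout
	remove     func(containerID string) error
	unregister func(containerID string)
}

// handleContainerFailure follows the order listed above: log, stop,
// remove, update the registry. It does not start a replacement,
// matching the note that auto-recovery is not implemented.
func handleContainerFailure(h failureHooks, containerID string) error {
	h.logEvent("container unhealthy: " + containerID)
	if err := h.stop(containerID); err != nil {
		// Assumption: a failed stop is recorded but does not block removal.
		h.logEvent(fmt.Sprintf("stop %s failed: %v", containerID, err))
	}
	if err := h.remove(containerID); err != nil {
		return fmt.Errorf("remove %s: %w", containerID, err)
	}
	h.unregister(containerID)
	return nil
}
```

Passing the operations as functions keeps the sequence testable without a Docker daemon.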

### Graceful Shutdown

```
SIGINT/SIGTERM Received
    │
    ├─ Stop accepting new requests
    ├─ Wait for in-flight requests (30s timeout)
    │
    ├─ Stop health monitors
    │  └─ Send StopCh signal to each goroutine
    │
    ├─ Close Docker client
    │  └─ Release daemon connection
    │
    └─ Exit (0 = success, 1 = timeout)
```
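
A minimal sketch of this sequence using the standard library is shown below. `waitForSignal` and `gracefulShutdown` are hypothetical helper names, and the cleanup hooks stand in for stopping the health monitors and closing the Docker client.

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// waitForSignal blocks until SIGINT or SIGTERM arrives.
func waitForSignal() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	<-ctx.Done()
}

// gracefulShutdown drains the server, then runs the cleanup hooks in
// order. It returns the exit code: 0 on success, 1 if draining failed
// or timed out. A nil server is skipped so the hooks can be exercised
// on their own.
func gracefulShutdown(srv *http.Server, timeout time.Duration, cleanup ...func()) int {
	code := 0
	if srv != nil {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		// Shutdown stops accepting new connections and waits for
		// in-flight requests until ctx expires.
		if err := srv.Shutdown(ctx); err != nil {
			code = 1
		}
	}
	for _, fn := range cleanup {
		fn()
	}
	return code
}
```

In `main`, this would typically be called as `waitForSignal()` followed by `os.Exit(gracefulShutdown(srv, 30*time.Second, stopMonitors, closeDocker))`, where `stopMonitors` and `closeDocker` are placeholders for the project's own cleanup functions.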

## Security Architecture

### Access Control

- No authentication (add API gateway for production)
- Docker socket access required (equivalent to root on the host)
- All containers get orchestrator labels

### Network Isolation

- Containers can communicate via Docker network
- No built-in service mesh (a possible later addition)
- Port mappings expose container ports on the host

### Data Sensitivity

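- No secrets management (Docker's built-in secrets require Swarm mode; otherwise use an external secret store)
- Environment variables passed to containers (readable via `docker inspect`)
- Event log records operations as an audit trail, limited to the 100 most recent events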

## Scalability Limits

### Current Design Limits

```
Single Node Orchestrator
├─ Typical capacity: 50-100 containers
├─ Scheduling latency: <100ms
├─ Health check frequency: one check per container every 10s
└─ Event log: 100 most recent events

Scaling Bottlenecks:
├─ Single-node scheduling
├─ Docker socket bandwidth
├─ Health check interval
└─ Event log size
```

### Path to Multi-Node

1. Replace the local scheduler with one that coordinates placement through distributed consensus
2. Add node discovery (heartbeat-based)
3. Implement service mesh for inter-container networking
4. Add persistent state storage (etcd, Consul)
5. Implement leader election (Raft)

---

**Version**: 1.0.0
**Last Updated**: 2024-01-15
