Container Orchestrator

Problem Statement

You're building a lightweight container orchestration system for a small DevOps team. While Kubernetes is powerful, it's too complex for your team's simple microservice deployment needs. You need a mini-orchestrator that can:

  • Manage container lifecycle
  • Schedule containers across multiple nodes
  • Perform health checks and auto-restart failed containers
  • Handle basic networking between containers
  • Provide a simple API for deployment management

Real-World Scenario:
Your team runs 20 microservices across 5 servers. Containers occasionally crash, and manual restarts are time-consuming. You need automated health monitoring and recovery.

# Deploy a service
$ orchestrator deploy --name api-server --image myapp:latest --replicas 3

# Check status
$ orchestrator ps
NAME         STATUS    REPLICAS   HEALTH
api-server   running   3/3        healthy
worker       running   2/2        healthy

# Scale a service
$ orchestrator scale api-server --replicas 5

# View logs
$ orchestrator logs api-server

# View events (a failed health check triggers an automatic restart)
$ orchestrator events
[2024-01-15 10:23:45] Container api-server-2 failed health check
[2024-01-15 10:23:46] Restarting api-server-2...
[2024-01-15 10:23:50] Container api-server-2 healthy
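
One way the deploy subcommand above might parse its flags is with Go's standard flag package. This is a minimal sketch, not the project's actual CLI: the DeployOptions type and the parseDeploy function are hypothetical names, and the default of one replica is an assumption.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// DeployOptions holds the values parsed from "orchestrator deploy" (hypothetical type).
type DeployOptions struct {
	Name     string
	Image    string
	Replicas int
}

// parseDeploy parses the arguments that follow the "deploy" subcommand.
func parseDeploy(args []string) (DeployOptions, error) {
	var opts DeployOptions
	fs := flag.NewFlagSet("deploy", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // return errors instead of printing usage
	fs.StringVar(&opts.Name, "name", "", "service name")
	fs.StringVar(&opts.Image, "image", "", "container image")
	fs.IntVar(&opts.Replicas, "replicas", 1, "number of replicas (default is an assumption)")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	if opts.Name == "" || opts.Image == "" {
		return opts, errors.New("deploy: --name and --image are required")
	}
	if opts.Replicas < 1 {
		return opts, errors.New("deploy: --replicas must be at least 1")
	}
	return opts, nil
}

func main() {
	// Arguments as they would appear after "orchestrator deploy".
	args := []string{"--name", "api-server", "--image", "myapp:latest", "--replicas", "3"}
	opts, err := parseDeploy(args)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("deploy %s image=%s replicas=%d\n", opts.Name, opts.Image, opts.Replicas)
}
```

The flag package accepts both -name and --name. It stops parsing at the first non-flag argument, so a command shaped like "orchestrator scale api-server --replicas 5" would need to take the service name from the first argument before parsing the remaining flags.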

Requirements

Functional Requirements

Must Have:

  • ✅ Container lifecycle management
  • ✅ Multi-node scheduling with resource awareness
  • ✅ Health check monitoring
  • ✅ Automatic restart of failed containers
  • ✅ Service discovery and DNS resolution
  • ✅ Basic load balancing across replicas
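
As an illustration of the last item, basic load balancing across replicas can be as simple as round-robin selection that skips unhealthy containers. The sketch below assumes a hypothetical Replica record and Balancer type; the source does not specify which balancing strategy the project uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Replica is a hypothetical record for one running container of a service.
type Replica struct {
	Addr    string
	Healthy bool
}

// Balancer hands out healthy replicas in round-robin order.
type Balancer struct {
	mu       sync.Mutex
	replicas []Replica
	next     int // index at which the next search starts
}

// NewBalancer copies the replica list so later caller changes do not race.
func NewBalancer(replicas []Replica) *Balancer {
	return &Balancer{replicas: append([]Replica(nil), replicas...)}
}

// Next returns the address of the next healthy replica, skipping unhealthy ones.
func (b *Balancer) Next() (string, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for i := 0; i < len(b.replicas); i++ {
		idx := (b.next + i) % len(b.replicas)
		if b.replicas[idx].Healthy {
			b.next = (idx + 1) % len(b.replicas)
			return b.replicas[idx].Addr, nil
		}
	}
	return "", errors.New("no healthy replicas")
}

func main() {
	lb := NewBalancer([]Replica{
		{Addr: "10.0.0.1:8080", Healthy: true},
		{Addr: "10.0.0.2:8080", Healthy: false},
		{Addr: "10.0.0.3:8080", Healthy: true},
	})
	for i := 0; i < 4; i++ {
		addr, err := lb.Next()
		fmt.Println(addr, err)
	}
}
```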

Should Have:

  • ✅ Rolling updates with zero downtime
  • ✅ Resource limits
  • ✅ Log aggregation from containers
  • ✅ CLI tool for management
  • ✅ REST API for programmatic access

Nice to Have:

  • Container image caching
  • Volume management
  • Network isolation with VLANs
  • Metrics collection

Non-Functional Requirements

  • Performance: Schedule containers in < 2 seconds
  • Reliability: Detect failures within 10 seconds
  • Scalability: Support up to 100 containers across 10 nodes
  • Availability: Continue operating if 1 node fails
  • Security: Basic authentication for API access

Constraints

  • Technology: Use Docker API for container operations
  • Platform: Linux-based
  • Networking: Use Docker bridge networks
  • Storage: SQLite for state management
  • Deployment: Single binary with embedded database

Design Considerations

High-Level Architecture

The orchestrator follows a master-agent architecture with these core components:

Master Node Components:

  • Scheduler: Selects optimal nodes for container placement based on resource availability
  • Health Checker: Monitors container health via HTTP, TCP, or command-based probes
  • Service Registry: Manages service definitions and enables service discovery
  • REST API: Provides programmatic access for deployment management
  • State Store: Persists orchestrator state using SQLite

Worker Node Components:

  • Docker Client: Manages container lifecycle
  • Container Runtime: Executes containers using Docker API
  • Resource Monitor: Tracks CPU and memory usage
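
To make the Scheduler component concrete, here is a minimal sketch of resource-aware placement. It assumes each node reports free CPU shares and free memory, and it picks the healthy node with the most free memory that can fit the request. The Node and Resources types, the pickNode function, and the tie-breaking rules are assumptions for illustration, not the project's actual algorithm.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical view of a worker as reported by its resource monitor.
type Node struct {
	ID        string
	Healthy   bool
	FreeCPU   int // CPU shares not yet reserved
	FreeMemMB int // memory not yet reserved, in MB
}

// Resources mirrors the cpu_shares and memory_mb fields of a deploy request.
type Resources struct {
	CPUShares int
	MemoryMB  int
}

// ErrNoCapacity is returned when no node can host the container.
var ErrNoCapacity = errors.New("no healthy node has enough free CPU and memory")

// pickNode returns the best healthy node that satisfies req.
func pickNode(nodes []Node, req Resources) (Node, error) {
	best := -1
	for i, n := range nodes {
		if !n.Healthy || n.FreeCPU < req.CPUShares || n.FreeMemMB < req.MemoryMB {
			continue // node is down or too full
		}
		if best == -1 || better(n, nodes[best]) {
			best = i
		}
	}
	if best == -1 {
		return Node{}, ErrNoCapacity
	}
	return nodes[best], nil
}

// better prefers more free memory, then more free CPU, then the smaller ID
// so that the result is deterministic.
func better(a, b Node) bool {
	if a.FreeMemMB != b.FreeMemMB {
		return a.FreeMemMB > b.FreeMemMB
	}
	if a.FreeCPU != b.FreeCPU {
		return a.FreeCPU > b.FreeCPU
	}
	return a.ID < b.ID
}

func main() {
	nodes := []Node{
		{ID: "node-1", Healthy: true, FreeCPU: 2048, FreeMemMB: 1024},
		{ID: "node-2", Healthy: true, FreeCPU: 4096, FreeMemMB: 4096},
		{ID: "node-3", Healthy: false, FreeCPU: 8192, FreeMemMB: 8192},
	}
	n, err := pickNode(nodes, Resources{CPUShares: 1024, MemoryMB: 512})
	fmt.Println(n.ID, err)
}
```

Choosing the emptiest node spreads replicas across servers, which helps the system keep running when one node fails. A bin-packing strategy that fills nodes one at a time would be an equally valid choice with different trade-offs.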

Key Design Principles

  1. Simplicity First: Focus on essential orchestration features without Kubernetes complexity
  2. Resource-Aware Scheduling: Select nodes based on available CPU and memory
  3. Automated Recovery: Detect and restart failed containers automatically
  4. API-Driven: All operations accessible via REST API for automation
  5. Extensible: Modular design allows adding features like volume management and networking

Technical Decisions

  • Docker API: Uses the Docker SDK for Go rather than building a custom container runtime
  • SQLite Storage: Lightweight embedded database eliminates external dependencies
  • Goroutines for Health Checks: Concurrent health monitoring without blocking
  • Bridge Networks: Uses Docker's built-in networking for container communication
  • Single Binary Deployment: Static compilation for easy distribution

Acceptance Criteria

The project is considered complete when it meets these criteria:

Core Functionality:

  • Deploy services with configurable replica counts
  • Schedule containers across multiple registered nodes
  • Perform HTTP and TCP health checks with configurable intervals
  • Automatically restart containers that fail health checks
  • Scale services up and down without downtime
  • Retrieve container logs through API

API Requirements:

  • POST /api/deploy creates new services
  • GET /api/services lists all deployed services
  • POST /api/services/{name}/scale adjusts replica count
  • GET /api/services/{name}/logs retrieves container logs
  • GET /api/events shows orchestration events

Performance Requirements:

  • Container scheduling completes in under 2 seconds
  • Health check failures detected within 10 seconds
  • System supports 100 containers across 10 nodes
  • Continues operating when 1 node fails

Quality Requirements:

  • Unit tests cover scheduler, health checker, and registry
  • Integration tests deploy real containers
  • Code follows Go best practices and passes go vet
  • README includes setup instructions and API documentation

Usage Examples

Deploy a Service

# Deploy API server with 3 replicas
curl -X POST http://localhost:8080/api/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "name": "api-server",
    "image": "myapp:latest",
    "replicas": 3,
    "ports": [{"container_port": 8080, "host_port": 8080, "protocol": "tcp"}],
    "resources": {"cpu_shares": 1024, "memory_mb": 512},
    "health_check": {
      "type": "http",
      "endpoint": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "2s",
      "retries": 3
    }
  }'

List Services

curl http://localhost:8080/api/services

Response:

[
  {
    "name": "api-server",
    "image": "myapp:latest",
    "replicas": 3,
    "containers": [
      {
        "id": "abc123",
        "name": "api-server-0",
        "status": "running",
        "health": "healthy",
        "node_id": "node-1"
      }
    ]
  }
]

Scale a Service

curl -X POST http://localhost:8080/api/services/api-server/scale \
  -H "Content-Type: application/json" \
  -d '{"replicas": 5}'

View Logs

curl http://localhost:8080/api/services/api-server/logs

Monitor Events

curl http://localhost:8080/api/events

Response:

[
  {
    "timestamp": "2024-01-15T10:23:45Z",
    "type": "health_check",
    "service": "api-server",
    "container": "api-server-2",
    "message": "Container failed health check, restarting..."
  }
]

Key Takeaways

After completing this project, you will have gained:

Container Orchestration Skills:

  • Understanding of how container orchestrators work internally
  • Experience with Docker API and container lifecycle management
  • Knowledge of scheduling algorithms and resource allocation
  • Insight into health checking and automated recovery strategies

Distributed Systems Concepts:

  • Multi-node coordination and service discovery
  • Fault tolerance and failure handling
  • State management in distributed systems
  • Event-driven architecture for monitoring

Production Engineering:

  • Building API control planes for infrastructure tools
  • Implementing graceful shutdowns and signal handling
  • Designing for testability with integration tests
  • Creating maintainable Go project structures

Practical Skills:

  • Using Docker SDK for Go
  • Building REST APIs with Gorilla Mux
  • Concurrent programming with goroutines and channels
  • Working with SQLite for state persistence

Next Steps

Extend the Project

  1. Rolling Updates: Implement zero-downtime deployments with gradual rollout
  2. Volume Management: Add persistent storage support for stateful services
  3. Web Dashboard: Build a React/Vue UI for visual monitoring
  4. Metrics Export: Add Prometheus metrics for observability
  5. Multi-Node Support: Implement agent nodes running on separate machines

Explore Advanced Features

  1. Container Networking: Implement custom overlay networks for isolation
  2. Load Balancing: Add service-level load balancing across replicas
  3. Auto-Scaling: Automatically adjust replicas based on CPU/memory usage
  4. Config Management: Support ConfigMaps and Secrets like Kubernetes
  5. Log Aggregation: Stream logs from all containers to centralized storage

Further study:

  • Study Kubernetes internals and control plane architecture
  • Explore service mesh technologies
  • Learn about container networking
  • Investigate distributed consensus
  • Practice with production orchestrators

Download Complete Solution

Get the full implementation with detailed README, setup instructions, and deployment guides:

Includes: Complete source code, Docker integration, REST API implementation, health checking system, comprehensive tests, Dockerfile, Makefile, docker-compose.yml, and detailed README with architecture documentation and implementation guide.