Project: Container Orchestrator

Problem Statement

You're building a lightweight container orchestration system for a small DevOps team. While Kubernetes is powerful, it's too complex for your team's simple microservice deployment needs. You need a mini-orchestrator that can:

Manage container lifecycle
Schedule containers across multiple nodes
Perform health checks and auto-restart failed containers
Handle basic networking between containers
Provide a simple API for deployment management

Real-World Scenario:
Your team runs 20 microservices across 5 servers. Containers occasionally crash, and manual restarts are time-consuming. You need automated health monitoring and recovery.

# Deploy a service
$ orchestrator deploy --name api-server --image myapp:latest --replicas 3

# Check status
$ orchestrator ps
NAME         STATUS    REPLICAS   HEALTH
api-server   running   3/3        healthy
worker       running   2/2        healthy

# Scale a service
$ orchestrator scale api-server --replicas 5

# View logs
$ orchestrator logs api-server

# Health check automatic restart
$ orchestrator events
[2024-01-15 10:23:45] Container api-server-2 failed health check
[2024-01-15 10:23:46] Restarting api-server-2...
[2024-01-15 10:23:50] Container api-server-2 healthy

Requirements

Functional Requirements

Must Have:

✅ Container lifecycle management
✅ Multi-node scheduling with resource awareness
✅ Health check monitoring
✅ Automatic restart of failed containers
✅ Service discovery and DNS resolution
✅ Basic load balancing across replicas
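
One common way to meet the "basic load balancing across replicas" requirement is round-robin selection. The following is a minimal sketch, not the project's prescribed design: the `RoundRobin` type, its methods, and the sample addresses are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// RoundRobin hands out replica addresses in rotation.
// Hypothetical type for illustration; not part of any library.
type RoundRobin struct {
	mu       sync.Mutex
	replicas []string
	next     int
}

// NewRoundRobin copies the replica list so later changes by the caller
// do not affect the balancer.
func NewRoundRobin(replicas []string) *RoundRobin {
	return &RoundRobin{replicas: append([]string(nil), replicas...)}
}

// Pick returns the next replica, or false when none are registered.
func (r *RoundRobin) Pick() (string, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if len(r.replicas) == 0 {
		return "", false
	}
	addr := r.replicas[r.next]
	r.next = (r.next + 1) % len(r.replicas)
	return addr, true
}

func main() {
	// Sample addresses are made up for the demo.
	lb := NewRoundRobin([]string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"})
	for i := 0; i < 4; i++ {
		addr, _ := lb.Pick()
		fmt.Println(addr) // the fourth pick wraps around to the first replica
	}
}
```

A real balancer would also need to drop replicas that fail health checks; that is left out here to keep the rotation logic visible.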

Should Have:

✅ Rolling updates with zero downtime
✅ Resource limits
✅ Log aggregation from containers
✅ CLI tool for management
✅ REST API for programmatic access

Nice to Have:

Container image caching
Volume management
Network isolation with VLANs
Metrics collection

Non-Functional Requirements

Performance: Schedule containers in < 2 seconds
Reliability: Detect failures within 10 seconds
Scalability: Support up to 100 containers across 10 nodes
Availability: Continue operating if 1 node fails
Security: Basic authentication for API access

Constraints

Technology: Use Docker API for container operations
Platform: Linux-based
Networking: Use Docker bridge networks
Storage: SQLite for state management
Deployment: Single binary with embedded database

Design Considerations

High-Level Architecture

The orchestrator follows a master-agent architecture with these core components:

Master Node Components:

Scheduler: Selects optimal nodes for container placement based on resource availability
Health Checker: Monitors container health via HTTP, TCP, or command-based probes
Service Registry: Manages service definitions and enables service discovery
REST API: Provides programmatic access for deployment management
State Store: Persists orchestrator state using SQLite

Worker Node Components:

Docker Client: Manages container lifecycle
Container Runtime: Executes containers using Docker API
Resource Monitor: Tracks CPU and memory usage

Key Design Principles

Simplicity First: Focus on essential orchestration features without Kubernetes complexity
Resource-Aware Scheduling: Select nodes based on available CPU and memory
Automated Recovery: Detect and restart failed containers automatically
API-Driven: All operations accessible via REST API for automation
Extensible: Modular design allows adding features like volume management and networking
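
To make the resource-aware scheduling principle concrete, here is a minimal sketch of one possible placement rule: skip nodes that cannot fit the request, then prefer the node with the most free memory. The `Node` and `Request` types, their fields, and the "most free memory" policy are assumptions for illustration; the brief does not say which policy counts as optimal.

```go
package main

import (
	"errors"
	"fmt"
)

// Node describes the free capacity a worker reports.
// Field names are assumptions for this sketch.
type Node struct {
	ID        string
	FreeCPU   int // CPU shares not yet allocated
	FreeMemMB int // memory not yet allocated, in MB
}

// Request is the capacity one container needs.
type Request struct {
	CPUShares int
	MemoryMB  int
}

// ErrNoCapacity is returned when no node can fit the request.
var ErrNoCapacity = errors.New("no node has enough free CPU and memory")

// Schedule picks the fitting node with the most free memory
// (ties broken by free CPU) and reserves the capacity on it.
func Schedule(nodes []*Node, req Request) (*Node, error) {
	var best *Node
	for _, n := range nodes {
		if n.FreeCPU < req.CPUShares || n.FreeMemMB < req.MemoryMB {
			continue // node cannot fit the container
		}
		if best == nil || n.FreeMemMB > best.FreeMemMB ||
			(n.FreeMemMB == best.FreeMemMB && n.FreeCPU > best.FreeCPU) {
			best = n
		}
	}
	if best == nil {
		return nil, ErrNoCapacity
	}
	best.FreeCPU -= req.CPUShares
	best.FreeMemMB -= req.MemoryMB
	return best, nil
}

func main() {
	nodes := []*Node{
		{ID: "node-1", FreeCPU: 2048, FreeMemMB: 1024},
		{ID: "node-2", FreeCPU: 4096, FreeMemMB: 2048},
	}
	req := Request{CPUShares: 1024, MemoryMB: 512}
	for i := 0; i < 3; i++ {
		n, err := Schedule(nodes, req)
		if err != nil {
			fmt.Println("schedule failed:", err)
			return
		}
		fmt.Printf("replica %d -> %s\n", i, n.ID)
	}
}
```

This policy spreads load across nodes. A bin-packing policy (prefer the fullest node that still fits) would be an equally valid choice when the goal is to keep some nodes empty.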

Technical Decisions

Docker API: Uses the Docker SDK for Go instead of building a custom container runtime
SQLite Storage: Lightweight embedded database eliminates external dependencies
Goroutines for Health Checks: Concurrent health monitoring without blocking
Bridge Networks: Uses Docker's built-in networking for container communication
Single Binary Deployment: Static compilation for easy distribution
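
The goroutine-based health checking decision can be sketched as one goroutine per container, driven by a ticker and stopped through a context. This is an illustrative outline only: `Probe`, `Monitor`, and `restartTriggered` are hypothetical names, and treating `retries` as "consecutive failures before a restart" is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Probe reports nil when the container is healthy.
type Probe func(ctx context.Context) error

// Monitor runs probe every interval until ctx is cancelled. After `retries`
// consecutive failures it calls onUnhealthy and resets the failure count.
func Monitor(ctx context.Context, probe Probe, interval, timeout time.Duration,
	retries int, onUnhealthy func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	failures := 0
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pctx, cancel := context.WithTimeout(ctx, timeout)
			err := probe(pctx)
			cancel()
			if err == nil {
				failures = 0
				continue
			}
			failures++
			if failures >= retries {
				onUnhealthy()
				failures = 0
			}
		}
	}
}

// restartTriggered is a demo helper: it monitors a fake probe for up to
// 500ms and reports whether the restart callback fired.
func restartTriggered(healthy bool) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	probe := func(context.Context) error {
		if healthy {
			return nil
		}
		return errors.New("connection refused")
	}
	restarted := make(chan struct{}, 1)
	go Monitor(ctx, probe, 10*time.Millisecond, 5*time.Millisecond, 3, func() {
		select {
		case restarted <- struct{}{}:
		default: // a restart was already reported
		}
	})
	select {
	case <-restarted:
		return true
	case <-time.After(500 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("failing probe triggers restart:", restartTriggered(false))
	fmt.Println("healthy probe triggers restart:", restartTriggered(true))
}
```

Because each monitor owns its own ticker and failure counter, a slow probe for one container does not delay checks for the others.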

Acceptance Criteria

The project is considered complete when it meets these criteria:

Core Functionality:

Deploy services with configurable replica counts
Schedule containers across multiple registered nodes
Perform HTTP and TCP health checks with configurable intervals
Automatically restart containers that fail health checks
Scale services up and down without downtime
Retrieve container logs through API
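
The HTTP and TCP health checks listed above could be built on the Go standard library alone. In this sketch, `CheckHTTP`, `CheckTCP`, and the demo helper `probeLocal` are hypothetical names, and accepting any status from 200 to 399 as healthy is an assumption rather than a stated requirement.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// CheckHTTP treats a response with status 200-399 received within
// timeout as healthy.
func CheckHTTP(url string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 400 {
		return fmt.Errorf("unhealthy status %d", resp.StatusCode)
	}
	return nil
}

// CheckTCP is healthy when a TCP connection to addr opens within timeout.
func CheckTCP(addr string, timeout time.Duration) error {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return err
	}
	return conn.Close()
}

// probeLocal starts a throwaway local HTTP server that always answers with
// `status` and runs both probes against it. Demo helper only.
func probeLocal(status int) (httpErr, tcpErr error) {
	srv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(status)
		}))
	defer srv.Close()
	httpErr = CheckHTTP(srv.URL+"/health", 2*time.Second)
	tcpErr = CheckTCP(srv.Listener.Addr().String(), 2*time.Second)
	return httpErr, tcpErr
}

func main() {
	httpErr, tcpErr := probeLocal(http.StatusOK)
	fmt.Println("200 server:", httpErr, tcpErr)
	httpErr, tcpErr = probeLocal(http.StatusServiceUnavailable)
	fmt.Println("503 server:", httpErr, tcpErr)
}
```

The second case shows why the two probe types are not interchangeable: a service that returns 503 still accepts TCP connections, so only the HTTP probe reports it as unhealthy.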

API Requirements:

POST /api/deploy creates new services
GET /api/services lists all deployed services
POST /api/services/{name}/scale adjusts replica count
GET /api/services/{name}/logs retrieves container logs
GET /api/events shows orchestration events
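
As a rough illustration of how three of these endpoints might be wired together, the sketch below routes requests with only the standard library and keeps state in memory. The `API` and `Service` types and the `call` helper are hypothetical; a full build would add the logs and events routes, authentication, and the SQLite state store, and could use a router package instead of manual path matching.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// Service is a trimmed-down service record; fields are assumptions.
type Service struct {
	Name     string `json:"name"`
	Image    string `json:"image"`
	Replicas int    `json:"replicas"`
}

// API keeps services in memory for the purposes of this sketch.
type API struct {
	mu       sync.Mutex
	services map[string]*Service
}

func NewAPI() *API { return &API{services: map[string]*Service{}} }

// ServeHTTP routes by method and path.
func (a *API) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	a.mu.Lock()
	defer a.mu.Unlock()
	path := r.URL.Path
	switch {
	case r.Method == http.MethodPost && path == "/api/deploy":
		var s Service
		if err := json.NewDecoder(r.Body).Decode(&s); err != nil || s.Name == "" || s.Replicas < 1 {
			http.Error(w, "invalid service definition", http.StatusBadRequest)
			return
		}
		a.services[s.Name] = &s
		w.WriteHeader(http.StatusCreated)
	case r.Method == http.MethodGet && path == "/api/services":
		list := make([]*Service, 0, len(a.services))
		for _, s := range a.services {
			list = append(list, s)
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(list)
	case r.Method == http.MethodPost && strings.HasPrefix(path, "/api/services/") &&
		strings.HasSuffix(path, "/scale"):
		name := strings.TrimSuffix(strings.TrimPrefix(path, "/api/services/"), "/scale")
		s, ok := a.services[name]
		if !ok {
			http.NotFound(w, r)
			return
		}
		var body struct {
			Replicas int `json:"replicas"`
		}
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil || body.Replicas < 0 {
			http.Error(w, "invalid replica count", http.StatusBadRequest)
			return
		}
		s.Replicas = body.Replicas
	default:
		http.NotFound(w, r)
	}
}

// call sends one in-process request and returns the status code and body.
func call(h http.Handler, method, path, body string) (int, string) {
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	api := NewAPI()
	fmt.Println(call(api, "POST", "/api/deploy",
		`{"name":"api-server","image":"myapp:latest","replicas":3}`))
	fmt.Println(call(api, "POST", "/api/services/api-server/scale", `{"replicas":5}`))
	fmt.Println(call(api, "GET", "/api/services", ""))
}
```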

Performance Requirements:

Container scheduling completes in under 2 seconds
Health check failures detected within 10 seconds
System supports 100 containers across 10 nodes
Continues operating when 1 node fails
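
The 10-second detection target constrains the health check settings. As a back-of-the-envelope sketch, assume probes start at a fixed interval, a failed probe can take up to its full timeout, and a container is declared unhealthy only after `retries` consecutive failures (these semantics are an assumption; the brief does not define them). The worst-case delay is then roughly retries × interval + timeout. The function names below are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// WorstCaseDetection estimates the longest delay between a container failing
// and the checker declaring it unhealthy, under the assumptions stated above.
func WorstCaseDetection(interval, timeout time.Duration, retries int) time.Duration {
	return time.Duration(retries)*interval + timeout
}

// MaxInterval returns the largest probe interval that keeps the worst case
// within budget, or 0 when no interval can meet it.
func MaxInterval(budget, timeout time.Duration, retries int) time.Duration {
	if retries < 1 || budget <= timeout {
		return 0
	}
	return (budget - timeout) / time.Duration(retries)
}

func main() {
	// Settings from the deploy example in Usage Examples: 10s interval,
	// 2s timeout, 3 retries.
	fmt.Println(WorstCaseDetection(10*time.Second, 2*time.Second, 3)) // 32s
	// Largest interval that still fits a 10s budget with the same timeout
	// and retry count.
	fmt.Println(MaxInterval(10*time.Second, 2*time.Second, 3))
}
```

Under these assumptions, the sample settings used in the deploy example (10s interval, 2s timeout, 3 retries) would take up to about 32 seconds to flag a failure, so meeting the 10-second target needs a shorter interval, fewer retries, or both.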

Quality Requirements:

Unit tests cover scheduler, health checker, and registry
Integration tests deploy real containers
Code follows Go best practices and passes go vet
README includes setup instructions and API documentation

Usage Examples

Deploy a Service

# Deploy API server with 3 replicas
curl -X POST http://localhost:8080/api/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "name": "api-server",
    "image": "myapp:latest",
    "replicas": 3,
    "ports": [{"container_port": 8080, "host_port": 8080, "protocol": "tcp"}],
    "resources": {"cpu_shares": 1024, "memory_mb": 512},
    "health_check": {
      "type": "http",
      "endpoint": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "2s",
      "retries": 3
    }
  }'
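
On the server side, a payload like this could be decoded into Go structs and validated before scheduling. The sketch below mirrors the JSON field names from the request above; the struct names, the `ParseDeploy` function, and the specific validation rules are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

type Port struct {
	ContainerPort int    `json:"container_port"`
	HostPort      int    `json:"host_port"`
	Protocol      string `json:"protocol"`
}

type Resources struct {
	CPUShares int `json:"cpu_shares"`
	MemoryMB  int `json:"memory_mb"`
}

type HealthCheck struct {
	Type     string `json:"type"`
	Endpoint string `json:"endpoint"`
	Interval string `json:"interval"` // e.g. "10s", parsed with time.ParseDuration
	Timeout  string `json:"timeout"`
	Retries  int    `json:"retries"`
}

type DeployRequest struct {
	Name        string      `json:"name"`
	Image       string      `json:"image"`
	Replicas    int         `json:"replicas"`
	Ports       []Port      `json:"ports"`
	Resources   Resources   `json:"resources"`
	HealthCheck HealthCheck `json:"health_check"`
}

// ParseDeploy decodes a deploy payload and applies basic validation.
func ParseDeploy(data []byte) (*DeployRequest, error) {
	var req DeployRequest
	if err := json.Unmarshal(data, &req); err != nil {
		return nil, fmt.Errorf("decode deploy request: %w", err)
	}
	if req.Name == "" || req.Image == "" {
		return nil, errors.New("name and image are required")
	}
	if req.Replicas < 1 {
		return nil, errors.New("replicas must be at least 1")
	}
	// The health check is treated as optional; validate it only when present.
	if hc := req.HealthCheck; hc.Type != "" {
		interval, err := time.ParseDuration(hc.Interval)
		if err != nil {
			return nil, fmt.Errorf("health_check.interval: %w", err)
		}
		timeout, err := time.ParseDuration(hc.Timeout)
		if err != nil {
			return nil, fmt.Errorf("health_check.timeout: %w", err)
		}
		if timeout >= interval {
			return nil, errors.New("health_check.timeout must be shorter than interval")
		}
	}
	return &req, nil
}

func main() {
	payload := []byte(`{"name":"api-server","image":"myapp:latest","replicas":3,
		"health_check":{"type":"http","endpoint":"http://localhost:8080/health",
		"interval":"10s","timeout":"2s","retries":3}}`)
	req, err := ParseDeploy(payload)
	if err != nil {
		fmt.Println("invalid request:", err)
		return
	}
	fmt.Printf("deploying %d replicas of %s\n", req.Replicas, req.Image)
}
```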

List Services

curl http://localhost:8080/api/services

Response:

[
  {
    "name": "api-server",
    "image": "myapp:latest",
    "replicas": 3,
    "containers": [
      {
        "id": "abc123",
        "name": "api-server-0",
        "status": "running",
        "health": "healthy",
        "node_id": "node-1"
      }
    ]
  }
]

Scale a Service

curl -X POST http://localhost:8080/api/services/api-server/scale \
  -H "Content-Type: application/json" \
  -d '{"replicas": 5}'

View Logs

curl http://localhost:8080/api/services/api-server/logs

Monitor Events

curl http://localhost:8080/api/events

Response:

[
  {
    "timestamp": "2024-01-15T10:23:45Z",
    "type": "health_check",
    "service": "api-server",
    "container": "api-server-2",
    "message": "Container failed health check, restarting..."
  }
]

Key Takeaways

After completing this project, you will have gained:

Container Orchestration Skills:

Understanding of how container orchestrators work internally
Experience with Docker API and container lifecycle management
Knowledge of scheduling algorithms and resource allocation
Insight into health checking and automated recovery strategies

Distributed Systems Concepts:

Multi-node coordination and service discovery
Fault tolerance and failure handling
State management in distributed systems
Event-driven architecture for monitoring

Production Engineering:

Building API control planes for infrastructure tools
Implementing graceful shutdowns and signal handling
Designing for testability with integration tests
Creating maintainable Go project structures

Practical Skills:

Using Docker SDK for Go
Building REST APIs with Gorilla Mux
Concurrent programming with goroutines and channels
Working with SQLite for state persistence

Next Steps

Extend the Project

Rolling Updates: Implement zero-downtime deployments with gradual rollout
Volume Management: Add persistent storage support for stateful services
Web Dashboard: Build a React/Vue UI for visual monitoring
Metrics Export: Add Prometheus metrics for observability
Multi-Node Support: Implement agent nodes running on separate machines
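
For the rolling update extension, one simple strategy is to replace replicas one at a time, starting each replacement and waiting for it to become healthy before stopping the container it replaces. The sketch below assumes a hypothetical `Runtime` interface standing in for Docker API calls; `fakeRuntime` exists only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
)

// Runtime is a hypothetical abstraction over the container operations a
// rolling update needs; a real implementation would call the Docker API.
type Runtime interface {
	Start(image string) (id string, err error)
	WaitHealthy(id string) error
	Stop(id string) error
}

// RollingUpdate replaces old containers one at a time. A replacement must be
// healthy before its predecessor is stopped, so the number of healthy
// replicas never drops below the starting count. It returns the IDs that
// should be serving traffic when it exits.
func RollingUpdate(rt Runtime, old []string, newImage string) ([]string, error) {
	current := append([]string(nil), old...)
	for i, oldID := range old {
		newID, err := rt.Start(newImage)
		if err != nil {
			return current, fmt.Errorf("start replacement for %s: %w", oldID, err)
		}
		if err := rt.WaitHealthy(newID); err != nil {
			rt.Stop(newID) // discard the bad replacement; the old container keeps serving
			return current, fmt.Errorf("replacement %s unhealthy: %w", newID, err)
		}
		if err := rt.Stop(oldID); err != nil {
			// Both containers are still running; report both.
			return append(current, newID), fmt.Errorf("stop %s: %w", oldID, err)
		}
		current[i] = newID
	}
	return current, nil
}

// fakeRuntime records calls instead of touching Docker. Demo helper only.
type fakeRuntime struct {
	next      int
	unhealthy map[string]bool // IDs that never become healthy
	log       []string
}

func (f *fakeRuntime) Start(image string) (string, error) {
	f.next++
	id := fmt.Sprintf("new-%d", f.next)
	f.log = append(f.log, "start "+id)
	return id, nil
}

func (f *fakeRuntime) WaitHealthy(id string) error {
	if f.unhealthy[id] {
		return errors.New("health check failed")
	}
	return nil
}

func (f *fakeRuntime) Stop(id string) error {
	f.log = append(f.log, "stop "+id)
	return nil
}

func main() {
	rt := &fakeRuntime{}
	ids, err := RollingUpdate(rt, []string{"old-1", "old-2"}, "myapp:v2")
	fmt.Println(ids, err)
	fmt.Println(rt.log)
}
```

Stopping at the first unhealthy replacement leaves the remaining old replicas untouched, which gives a basic form of automatic rollback protection without extra bookkeeping.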

Explore Advanced Features

Container Networking: Implement custom overlay networks for isolation
Load Balancing: Add service-level load balancing across replicas
Auto-Scaling: Automatically adjust replicas based on CPU/memory usage
Config Management: Support ConfigMaps and Secrets like Kubernetes
Log Aggregation: Stream logs from all containers to centralized storage
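
For the auto-scaling idea, a reasonable starting point is the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentUsage / targetUsage), clamped to a minimum and maximum. The sketch below applies that rule; the function name, parameters, and percentage units are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// DesiredReplicas scales the replica count in proportion to how far the
// observed usage is from the target, then clamps the result.
// usagePct and targetPct are average utilization percentages (0-100+).
func DesiredReplicas(current int, usagePct, targetPct float64, minReplicas, maxReplicas int) int {
	desired := current
	if current > 0 && targetPct > 0 {
		desired = int(math.Ceil(float64(current) * usagePct / targetPct))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 3 replicas at 90% CPU with a 60% target: ceil(3 * 90 / 60) = ceil(4.5) = 5.
	fmt.Println(DesiredReplicas(3, 90, 60, 1, 10)) // 5
	// 4 replicas at 30% CPU with a 60% target: ceil(4 * 30 / 60) = 2.
	fmt.Println(DesiredReplicas(4, 30, 60, 1, 10)) // 2
}
```

A production autoscaler would usually add a tolerance band and a cooldown period so that small fluctuations in usage do not cause replicas to be added and removed repeatedly.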

Study Kubernetes internals and control plane architecture
Explore service mesh technologies
Learn about container networking
Investigate distributed consensus
Practice with production orchestrators
