Project: Container Orchestrator
Problem Statement
You're building a lightweight container orchestration system for a small DevOps team. While Kubernetes is powerful, it's too complex for your team's simple microservice deployment needs. You need a mini-orchestrator that can:
- Manage container lifecycle
- Schedule containers across multiple nodes
- Perform health checks and auto-restart failed containers
- Handle basic networking between containers
- Provide a simple API for deployment management
Real-World Scenario:
Your team runs 20 microservices across 5 servers. Containers occasionally crash, and manual restarts are time-consuming. You need automated health monitoring and recovery.
# Deploy a service
$ orchestrator deploy --name api-server --image myapp:latest --replicas 3

# Check status
$ orchestrator ps
NAME        STATUS   REPLICAS  HEALTH
api-server  running  3/3       healthy
worker      running  2/2       healthy

# Scale a service
$ orchestrator scale api-server --replicas 5

# View logs
$ orchestrator logs api-server

# Health check automatic restart
$ orchestrator events
[2024-01-15 10:23:45] Container api-server-2 failed health check
[2024-01-15 10:23:46] Restarting api-server-2...
[2024-01-15 10:23:50] Container api-server-2 healthy
Requirements
Functional Requirements
Must Have:
- ✅ Container lifecycle management
- ✅ Multi-node scheduling with resource awareness
- ✅ Health check monitoring
- ✅ Automatic restart of failed containers
- ✅ Service discovery and DNS resolution
- ✅ Basic load balancing across replicas
Should Have:
- ✅ Rolling updates with zero downtime
- ✅ Resource limits
- ✅ Log aggregation from containers
- ✅ CLI tool for management
- ✅ REST API for programmatic access
Nice to Have:
- Container image caching
- Volume management
- Network isolation with VLANs
- Metrics collection
Non-Functional Requirements
- Performance: Schedule containers in < 2 seconds
- Reliability: Detect failures within 10 seconds
- Scalability: Support up to 100 containers across 10 nodes
- Availability: Continue operating if 1 node fails
- Security: Basic authentication for API access
Constraints
- Technology: Use Docker API for container operations
- Platform: Linux-based
- Networking: Use Docker bridge networks
- Storage: SQLite for state management
- Deployment: Single binary with embedded database
Design Considerations
High-Level Architecture
The orchestrator follows a master-agent architecture with these core components:
Master Node Components:
- Scheduler: Selects optimal nodes for container placement based on resource availability
- Health Checker: Monitors container health via HTTP, TCP, or command-based probes
- Service Registry: Manages service definitions and enables service discovery
- REST API: Provides programmatic access for deployment management
- State Store: Persists orchestrator state using SQLite
Worker Node Components:
- Docker Client: Manages container lifecycle
- Container Runtime: Executes containers using Docker API
- Resource Monitor: Tracks CPU and memory usage
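To make the Service Registry component more concrete, here is one possible in-memory sketch that also covers the basic load-balancing requirement by handing out replica addresses in round-robin order. The `Registry` type, its method names, and the sample addresses are hypothetical; a real implementation would back this with the SQLite state store.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNoEndpoints is returned when a service has no registered replicas.
var ErrNoEndpoints = errors.New("no endpoints registered for service")

// Registry maps a service name to the addresses of its replicas.
// Hypothetical type: the brief does not define the registry's API.
type Registry struct {
	mu        sync.Mutex
	endpoints map[string][]string
	next      map[string]int // round-robin cursor per service
}

func NewRegistry() *Registry {
	return &Registry{endpoints: map[string][]string{}, next: map[string]int{}}
}

// Register adds a replica address to a service.
func (r *Registry) Register(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.endpoints[service] = append(r.endpoints[service], addr)
}

// Deregister removes a replica address, e.g. after a failed health check.
func (r *Registry) Deregister(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	eps := r.endpoints[service]
	for i, e := range eps {
		if e == addr {
			r.endpoints[service] = append(eps[:i], eps[i+1:]...)
			return
		}
	}
}

// Resolve returns the next replica address in round-robin order.
func (r *Registry) Resolve(service string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	eps := r.endpoints[service]
	if len(eps) == 0 {
		return "", ErrNoEndpoints
	}
	i := r.next[service] % len(eps)
	r.next[service] = i + 1
	return eps[i], nil
}

func main() {
	reg := NewRegistry()
	reg.Register("api-server", "10.0.0.1:8080")
	reg.Register("api-server", "10.0.0.2:8080")
	for i := 0; i < 3; i++ {
		addr, _ := reg.Resolve("api-server")
		fmt.Println(addr) // alternates between the two replicas
	}
}
```

The mutex keeps the registry safe to call from the health checker and the API handlers at the same time.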
Key Design Principles
- Simplicity First: Focus on essential orchestration features without Kubernetes complexity
- Resource-Aware Scheduling: Select nodes based on available CPU and memory
- Automated Recovery: Detect and restart failed containers automatically
- API-Driven: All operations accessible via REST API for automation
- Extensible: Modular design allows adding features like volume management and networking
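The resource-aware scheduling principle can be illustrated with a small placement function. This sketch assumes a "most free memory first" policy with free CPU as the tie-breaker, which is only one of several reasonable strategies; the `Node`, `Request`, `PickNode`, and `Place` names are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Node describes a worker's unallocated capacity (illustrative fields).
type Node struct {
	ID        string
	FreeCPU   int // CPU shares still unallocated
	FreeMemMB int
	Healthy   bool
}

// Request is what a single container replica needs.
type Request struct {
	CPUShares int
	MemoryMB  int
}

var ErrNoCapacity = errors.New("no node has enough free resources")

// PickNode returns the index of the healthy node that fits the request
// and has the most free memory; ties are broken by free CPU.
func PickNode(nodes []Node, req Request) (int, error) {
	best := -1
	for i, n := range nodes {
		if !n.Healthy || n.FreeCPU < req.CPUShares || n.FreeMemMB < req.MemoryMB {
			continue
		}
		if best == -1 ||
			n.FreeMemMB > nodes[best].FreeMemMB ||
			(n.FreeMemMB == nodes[best].FreeMemMB && n.FreeCPU > nodes[best].FreeCPU) {
			best = i
		}
	}
	if best == -1 {
		return -1, ErrNoCapacity
	}
	return best, nil
}

// Place schedules the given number of replicas, reserving resources on the
// chosen node after each placement so later replicas see updated capacity.
func Place(nodes []Node, req Request, replicas int) ([]string, error) {
	placed := make([]string, 0, replicas)
	for r := 0; r < replicas; r++ {
		i, err := PickNode(nodes, req)
		if err != nil {
			return placed, err
		}
		nodes[i].FreeCPU -= req.CPUShares
		nodes[i].FreeMemMB -= req.MemoryMB
		placed = append(placed, nodes[i].ID)
	}
	return placed, nil
}

func main() {
	nodes := []Node{
		{ID: "node-1", FreeCPU: 2048, FreeMemMB: 1024, Healthy: true},
		{ID: "node-2", FreeCPU: 4096, FreeMemMB: 2048, Healthy: true},
		{ID: "node-3", FreeCPU: 4096, FreeMemMB: 4096, Healthy: false},
	}
	placed, err := Place(nodes, Request{CPUShares: 1024, MemoryMB: 512}, 3)
	fmt.Println(placed, err)
}
```

Because the unhealthy node is skipped and capacity is reserved after every placement, replicas drift toward whichever node currently has the most headroom. A spread-first policy (at most one replica per node until all nodes are used) would be a natural alternative when availability matters more than packing.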
Technical Decisions
- Docker API: Leverages Docker SDK for Go instead of building custom container runtime
- SQLite Storage: Lightweight embedded database eliminates external dependencies
- Goroutines for Health Checks: Concurrent health monitoring without blocking
- Bridge Networks: Uses Docker's built-in networking for container communication
- Single Binary Deployment: Static compilation for easy distribution
Acceptance Criteria
The project is considered complete when it meets these criteria:
Core Functionality:
- Deploy services with configurable replica counts
- Schedule containers across multiple registered nodes
- Perform HTTP and TCP health checks with configurable intervals
- Automatically restart containers that fail health checks
- Scale services up and down without downtime
- Retrieve container logs through API
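Scaling up and down reduces to comparing the current replica count with the desired one and deciding which replicas to start or stop. The sketch below assumes replicas are named `<service>-<index>`, as names such as api-server-2 in the examples suggest, and that scale-down removes the highest indices first; both the `PlanScale` function and that removal order are illustrative choices.

```go
package main

import "fmt"

// ScalePlan lists the replica names to start and to stop.
type ScalePlan struct {
	Start []string
	Stop  []string
}

// PlanScale computes the actions needed to move from current to desired
// replicas. Hypothetical helper; the naming scheme is an assumption.
func PlanScale(service string, current, desired int) ScalePlan {
	var plan ScalePlan
	// Scale up: add the missing indices.
	for i := current; i < desired; i++ {
		plan.Start = append(plan.Start, fmt.Sprintf("%s-%d", service, i))
	}
	// Scale down: remove the highest indices first.
	for i := current - 1; i >= desired && i >= 0; i-- {
		plan.Stop = append(plan.Stop, fmt.Sprintf("%s-%d", service, i))
	}
	return plan
}

func main() {
	fmt.Printf("%+v\n", PlanScale("api-server", 3, 5))
	fmt.Printf("%+v\n", PlanScale("api-server", 5, 2))
}
```

To avoid downtime, a caller would typically wait for newly started replicas to pass their health checks before routing traffic to them, and would deregister a replica from service discovery before stopping it.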
API Requirements:
- POST /api/deploy: creates new services
- GET /api/services: lists all deployed services
- POST /api/services/{name}/scale: adjusts replica count
- GET /api/services/{name}/logs: retrieves container logs
- GET /api/events: shows orchestration events
Performance Requirements:
- Container scheduling completes in under 2 seconds
- Health check failures detected within 10 seconds
- System supports 100 containers across 10 nodes
- Continues operating when 1 node fails
Quality Requirements:
- Unit tests cover scheduler, health checker, and registry
- Integration tests deploy real containers
- Code follows Go best practices and passes go vet
- README includes setup instructions and API documentation
Usage Examples
Deploy a Service
# Deploy API server with 3 replicas
curl -X POST http://localhost:8080/api/deploy \
  -H "Content-Type: application/json" \
  -d '{
    "name": "api-server",
    "image": "myapp:latest",
    "replicas": 3,
    "ports": [{"container_port": 8080, "host_port": 8080, "protocol": "tcp"}],
    "resources": {"cpu_shares": 1024, "memory_mb": 512},
    "health_check": {
      "type": "http",
      "endpoint": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "2s",
      "retries": 3
    }
  }'
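On the server side, the deploy request body could be decoded into Go structs along these lines. The JSON field names are taken from the request above; the struct names, the `ParseDeploy` function, and the validation rules (required name and image, at least one replica, timeout shorter than interval) are assumptions added for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

type Port struct {
	ContainerPort int    `json:"container_port"`
	HostPort      int    `json:"host_port"`
	Protocol      string `json:"protocol"`
}

type Resources struct {
	CPUShares int `json:"cpu_shares"`
	MemoryMB  int `json:"memory_mb"`
}

type HealthCheck struct {
	Type     string `json:"type"`
	Endpoint string `json:"endpoint"`
	Interval string `json:"interval"` // Go duration string, e.g. "10s"
	Timeout  string `json:"timeout"`
	Retries  int    `json:"retries"`
}

type DeployRequest struct {
	Name        string      `json:"name"`
	Image       string      `json:"image"`
	Replicas    int         `json:"replicas"`
	Ports       []Port      `json:"ports"`
	Resources   Resources   `json:"resources"`
	HealthCheck HealthCheck `json:"health_check"`
}

// ParseDeploy decodes the request body and applies basic validation.
// The rules here are illustrative, not taken from the specification.
func ParseDeploy(body []byte) (DeployRequest, error) {
	var req DeployRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, fmt.Errorf("invalid JSON: %w", err)
	}
	if req.Name == "" || req.Image == "" {
		return req, errors.New("name and image are required")
	}
	if req.Replicas < 1 {
		return req, errors.New("replicas must be at least 1")
	}
	if req.HealthCheck.Type != "" { // health check treated as optional
		interval, err := time.ParseDuration(req.HealthCheck.Interval)
		if err != nil {
			return req, fmt.Errorf("health_check.interval: %w", err)
		}
		timeout, err := time.ParseDuration(req.HealthCheck.Timeout)
		if err != nil {
			return req, fmt.Errorf("health_check.timeout: %w", err)
		}
		if timeout >= interval {
			return req, errors.New("health_check.timeout must be shorter than interval")
		}
	}
	return req, nil
}

func main() {
	body := []byte(`{
	  "name": "api-server", "image": "myapp:latest", "replicas": 3,
	  "ports": [{"container_port": 8080, "host_port": 8080, "protocol": "tcp"}],
	  "resources": {"cpu_shares": 1024, "memory_mb": 512},
	  "health_check": {"type": "http", "endpoint": "http://localhost:8080/health",
	                   "interval": "10s", "timeout": "2s", "retries": 3}
	}`)
	req, err := ParseDeploy(body)
	fmt.Printf("%+v\nerr=%v\n", req, err)
}
```

Keeping durations as strings in the wire format and converting them with `time.ParseDuration` lets clients write values such as "10s" or "500ms" without a custom JSON type.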
List Services
curl http://localhost:8080/api/services
Response:
[
  {
    "name": "api-server",
    "image": "myapp:latest",
    "replicas": 3,
    "containers": [
      {
        "id": "abc123",
        "name": "api-server-0",
        "status": "running",
        "health": "healthy",
        "node_id": "node-1"
      }
    ]
  }
]
Scale a Service
curl -X POST http://localhost:8080/api/services/api-server/scale \
  -H "Content-Type: application/json" \
  -d '{"replicas": 5}'
View Logs
curl http://localhost:8080/api/services/api-server/logs
Monitor Events
curl http://localhost:8080/api/events
Response:
[
  {
    "timestamp": "2024-01-15T10:23:45Z",
    "type": "health_check",
    "service": "api-server",
    "container": "api-server-2",
    "message": "Container failed health check, restarting..."
  }
]
Key Takeaways
After completing this project, you will have gained:
Container Orchestration Skills:
- Understanding of how container orchestrators work internally
- Experience with Docker API and container lifecycle management
- Knowledge of scheduling algorithms and resource allocation
- Insight into health checking and automated recovery strategies
Distributed Systems Concepts:
- Multi-node coordination and service discovery
- Fault tolerance and failure handling
- State management in distributed systems
- Event-driven architecture for monitoring
Production Engineering:
- Building API control planes for infrastructure tools
- Implementing graceful shutdowns and signal handling
- Designing for testability with integration tests
- Creating maintainable Go project structures
Practical Skills:
- Using Docker SDK for Go
- Building REST APIs with Gorilla Mux
- Concurrent programming with goroutines and channels
- Working with SQLite for state persistence
Next Steps
Extend the Project
- Rolling Updates: Implement zero-downtime deployments with gradual rollout
- Volume Management: Add persistent storage support for stateful services
- Web Dashboard: Build a React/Vue UI for visual monitoring
- Metrics Export: Add Prometheus metrics for observability
- Multi-Node Support: Implement agent nodes running on separate machines
Explore Advanced Features
- Container Networking: Implement custom overlay networks for isolation
- Load Balancing: Add service-level load balancing across replicas
- Auto-Scaling: Automatically adjust replicas based on CPU/memory usage
- Config Management: Support ConfigMaps and Secrets like Kubernetes
- Log Aggregation: Stream logs from all containers to centralized storage
Related Learning
- Study Kubernetes internals and control plane architecture
- Explore service mesh technologies
- Learn about container networking
- Investigate distributed consensus
- Practice with production orchestrators
Download Complete Solution
Get the full implementation with detailed README, setup instructions, and deployment guides:
Includes: complete source code, Docker integration, REST API implementation, health checking system, comprehensive tests, Dockerfile, Makefile, docker-compose.yml, and a detailed README with architecture documentation and an implementation guide.