Welcome to the Production Engineering section exercises! These 15 exercises synthesize everything you've learned across all four sections, focusing on cloud-native development, microservices architecture, observability, and production-grade testing strategies.
Background: These exercises build on foundational concepts from The Go Language, Standard Library, and Advanced Topics, while emphasizing production engineering practices covered in this section.
Learning Objectives
By completing these exercises, you will:
- Deploy containerized applications with Docker best practices
- Configure Kubernetes deployments with health checks and autoscaling
- Build gRPC services with Protocol Buffers
- Implement service mesh traffic management
- Add comprehensive observability
- Apply production testing strategies at multiple levels
Exercise 1 - Multi-Stage Dockerfile Optimization
Create an optimized multi-stage Dockerfile for a Go web application that minimizes image size and follows security best practices.
Requirements
- Use multi-stage build with golang:1.21-alpine as the builder
- Final image based on alpine:latest or scratch
- Non-root user for running the application
- No build dependencies in final image
- Final image size < 20MB
- Include health check instruction
Click to see solution
# Build stage
FROM golang:1.21-alpine AS builder

# Install build dependencies
RUN apk add --no-cache git ca-certificates

WORKDIR /build

# Copy go mod files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build static binary
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags='-w -s -extldflags "-static"' \
    -o /app/server \
    ./cmd/server

# Final stage
FROM alpine:latest

# Install ca-certificates for HTTPS
RUN apk --no-cache add ca-certificates

# Create non-root user
RUN addgroup -g 1001 appuser && \
    adduser -D -u 1001 -G appuser appuser

WORKDIR /app

# Copy binary from builder
COPY --from=builder --chown=appuser:appuser /app/server .

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD ["/app/server", "healthcheck"]

# Run application
ENTRYPOINT ["/app/server"]
Explanation
Multi-Stage Benefits:
- Builder stage includes all compilation tools
- Final stage only contains runtime binary
- Reduces image size from ~800MB to ~15MB
Security Hardening:
- Non-root user prevents privilege escalation
- Static binary with no external dependencies
- Minimal attack surface with Alpine base
Production Features:
- Health check for container orchestration
- CA certificates for external HTTPS calls
- Proper file ownership and permissions
Key Takeaways
- Multi-stage builds separate compilation from runtime
- Static binaries eliminate runtime dependencies
- Non-root users are essential for security
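The HEALTHCHECK instruction assumes the binary understands a healthcheck subcommand. One possible sketch of that subcommand in Go is shown below; the /healthz path and port 8080 are assumptions for illustration, not part of the Dockerfile above.

// Hypothetical healthcheck subcommand invoked by the Dockerfile's HEALTHCHECK.
// It probes the server's own health endpoint and exits non-zero on failure.
package main

import (
    "fmt"
    "net/http"
    "os"
    "time"
)

func runHealthcheck() {
    client := &http.Client{Timeout: 2 * time.Second}
    resp, err := client.Get("http://localhost:8080/healthz") // assumed endpoint
    if err != nil || resp.StatusCode != http.StatusOK {
        fmt.Fprintln(os.Stderr, "health check failed")
        os.Exit(1)
    }
    os.Exit(0)
}

func main() {
    if len(os.Args) > 1 && os.Args[1] == "healthcheck" {
        runHealthcheck()
    }
    // ... start the HTTP server as usual ...
}

Exiting non-zero is what tells Docker (and any orchestrator that honors the container health status) to mark the container unhealthy.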
Exercise 2 - Kubernetes Deployment with Best Practices
Create a production-ready Kubernetes deployment manifest with health checks, resource limits, and horizontal pod autoscaling.
Requirements
- Deployment with 3 replicas
- Liveness and readiness probes
- Resource requests and limits
- Rolling update strategy
- HorizontalPodAutoscaler targeting 70% CPU
Click to see solution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: v1
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: PORT
          value: "8080"
        - name: LOG_LEVEL
          value: "info"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
          failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 8080
    name: http
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Explanation
Health Checks:
- Liveness probe restarts unhealthy pods
- Readiness probe removes pods from service until ready
- Different endpoints allow granular health reporting
Resource Management:
- Requests guarantee minimum resources
- Limits prevent resource hogging
- HPA scales based on actual CPU usage
High Availability:
- 3 replicas for redundancy
- Zero downtime with maxUnavailable: 0
- Service load balances across healthy pods
Key Takeaways
- Always define resource requests and limits
- Separate liveness and readiness probes
- HPA enables automatic scaling under load
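The probes assume the application exposes /healthz and /ready endpoints. A minimal sketch of those handlers follows; the database ping as the readiness check is an illustrative assumption.

// Sketch of liveness and readiness endpoints matching the probe paths above.
package main

import (
    "database/sql"
    "log"
    "net/http"
)

var db *sql.DB // assumed application dependency

func main() {
    // Liveness: the process is up and able to serve requests.
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    // Readiness: dependencies (here, the database) are reachable.
    http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
        if db == nil || db.PingContext(r.Context()) != nil {
            http.Error(w, "not ready", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}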
Exercise 3 - gRPC Service with Protocol Buffers
Implement a User service with gRPC supporting CRUD operations using Protocol Buffers.
Requirements
- Define a .proto file with User message and UserService
- Implement GetUser, ListUsers, CreateUser, UpdateUser, DeleteUser
- Use proper error handling with gRPC status codes
- Add server-side streaming for ListUsers
Click to see solution
// user.proto
syntax = "proto3";

package user.v1;
option go_package = "github.com/example/userservice/gen/user/v1";

import "google/protobuf/timestamp.proto";
import "google/protobuf/empty.proto";

message User {
  string id = 1;
  string email = 2;
  string name = 3;
  google.protobuf.Timestamp created_at = 4;
}

message GetUserRequest {
  string id = 1;
}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message CreateUserRequest {
  string email = 1;
  string name = 2;
}

message UpdateUserRequest {
  string id = 1;
  string email = 2;
  string name = 3;
}

message DeleteUserRequest {
  string id = 1;
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);
  rpc CreateUser(CreateUserRequest) returns (User);
  rpc UpdateUser(UpdateUserRequest) returns (User);
  rpc DeleteUser(DeleteUserRequest) returns (google.protobuf.Empty);
}
// server.go
package main

import (
    "context"
    "fmt"
    "time"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
    "google.golang.org/protobuf/types/known/emptypb"
    "google.golang.org/protobuf/types/known/timestamppb"

    pb "github.com/example/userservice/gen/user/v1"
)

type userServer struct {
    pb.UnimplementedUserServiceServer
    users map[string]*pb.User
}

func (s *userServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
    if req.Id == "" {
        return nil, status.Error(codes.InvalidArgument, "user id is required")
    }

    user, ok := s.users[req.Id]
    if !ok {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
    }

    return user, nil
}

func (s *userServer) ListUsers(req *pb.ListUsersRequest, stream pb.UserService_ListUsersServer) error {
    for _, user := range s.users {
        if err := stream.Send(user); err != nil {
            return status.Error(codes.Internal, "failed to send user")
        }
    }
    return nil
}

func (s *userServer) CreateUser(ctx context.Context, req *pb.CreateUserRequest) (*pb.User, error) {
    if req.Email == "" || req.Name == "" {
        return nil, status.Error(codes.InvalidArgument, "email and name are required")
    }

    user := &pb.User{
        Id:        fmt.Sprintf("user_%d", time.Now().Unix()),
        Email:     req.Email,
        Name:      req.Name,
        CreatedAt: timestamppb.Now(),
    }

    s.users[user.Id] = user
    return user, nil
}

func (s *userServer) UpdateUser(ctx context.Context, req *pb.UpdateUserRequest) (*pb.User, error) {
    user, ok := s.users[req.Id]
    if !ok {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
    }

    user.Email = req.Email
    user.Name = req.Name

    return user, nil
}

func (s *userServer) DeleteUser(ctx context.Context, req *pb.DeleteUserRequest) (*emptypb.Empty, error) {
    if _, ok := s.users[req.Id]; !ok {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
    }

    delete(s.users, req.Id)
    return &emptypb.Empty{}, nil
}
Explanation
Protocol Buffers:
- Strongly typed messages ensure API contracts
- Server-side streaming for efficient list operations
- Timestamps with well-known types
Error Handling:
- gRPC status codes for semantic errors
- InvalidArgument for validation failures
- NotFound for missing resources
Key Takeaways
- gRPC provides type-safe, high-performance RPC
- Server streaming reduces memory for large lists
- Use proper gRPC status codes for errors
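The solution shows the handlers but not the server bootstrap. A minimal sketch of wiring userServer into a gRPC server, assuming it lives in the same package as server.go above; the listen address is an assumption.

// Sketch: registering userServer with a gRPC server and serving on :50051.
package main

import (
    "log"
    "net"

    "google.golang.org/grpc"

    pb "github.com/example/userservice/gen/user/v1"
)

func main() {
    lis, err := net.Listen("tcp", ":50051") // assumed port
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    srv := grpc.NewServer()
    pb.RegisterUserServiceServer(srv, &userServer{users: make(map[string]*pb.User)})

    log.Println("UserService listening on :50051")
    if err := srv.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}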
Exercise 4 - Istio Traffic Splitting
Configure Istio VirtualService for canary deployment with 90/10 traffic split.
Requirements
- VirtualService routing 90% to v1, 10% to v2
- DestinationRule with subsets for v1 and v2
- HTTP header-based routing for testing v2
Click to see solution
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
  - web-app
  http:
  - match:
    - headers:
        x-version:
          exact: v2
    route:
    - destination:
        host: web-app
        subset: v2
  - route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10
Explanation
Canary Deployment:
- 90% traffic to stable v1
- 10% traffic to new v2 for gradual rollout
- Header-based routing for internal testing
Progressive Delivery:
- Monitor v2 metrics before increasing traffic
- Rollback by changing weights to 100/0
- Zero downtime deployment strategy
Key Takeaways
- Service mesh enables traffic control without code changes
- Canary deployments reduce risk of new releases
- Header-based routing allows testing before production traffic
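Because the VirtualService matches on the x-version header, an internal test client can opt into v2 before any weight change. A short Go sketch; the in-mesh URL is an assumption.

// Sketch: forcing a request onto the v2 subset via the x-version header.
package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    req, err := http.NewRequest("GET", "http://web-app/api/users", nil) // assumed in-mesh URL
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("x-version", "v2") // matches the VirtualService header rule

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    fmt.Println("routed response status:", resp.StatusCode)
}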
Exercise 5 - AWS Lambda Function with Cold Start Optimization
Implement an AWS Lambda function in Go with cold start optimization techniques.
Requirements
- HTTP handler for Lambda
- Global variable initialization outside handler
- Connection pooling for database
- Estimated cold start < 500ms
Click to see solution
// run
package main

import (
    "context"
    "database/sql"
    "encoding/json"
    "log"
    "os"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    _ "github.com/lib/pq"
)

// Global variables initialized once during cold start
var (
    db     *sql.DB
    logger *log.Logger
)

// init runs once per container
func init() {
    logger = log.New(os.Stdout, "[LAMBDA] ", log.LstdFlags)

    // Initialize database connection pool
    dsn := os.Getenv("DATABASE_URL")
    var err error
    db, err = sql.Open("postgres", dsn)
    if err != nil {
        logger.Fatalf("failed to connect to database: %v", err)
    }

    // Configure connection pool for Lambda
    db.SetMaxOpenConns(10)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(0) // Reuse connections

    logger.Println("initialized database connection pool")
}

type User struct {
    ID    string `json:"id"`
    Email string `json:"email"`
    Name  string `json:"name"`
}

func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    // Handler logic runs on every invocation
    switch request.HTTPMethod {
    case "GET":
        return getUsers(ctx)
    case "POST":
        return createUser(ctx, request.Body)
    default:
        return events.APIGatewayProxyResponse{
            StatusCode: 405,
            Body:       `{"error": "method not allowed"}`,
        }, nil
    }
}

func getUsers(ctx context.Context) (events.APIGatewayProxyResponse, error) {
    rows, err := db.QueryContext(ctx, "SELECT id, email, name FROM users LIMIT 100")
    if err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: 500,
            Body:       `{"error": "database query failed"}`,
        }, err
    }
    defer rows.Close()

    var users []User
    for rows.Next() {
        var u User
        if err := rows.Scan(&u.ID, &u.Email, &u.Name); err != nil {
            continue
        }
        users = append(users, u)
    }

    body, _ := json.Marshal(users)
    return events.APIGatewayProxyResponse{
        StatusCode: 200,
        Headers:    map[string]string{"Content-Type": "application/json"},
        Body:       string(body),
    }, nil
}

func createUser(ctx context.Context, body string) (events.APIGatewayProxyResponse, error) {
    var u User
    if err := json.Unmarshal([]byte(body), &u); err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: 400,
            Body:       `{"error": "invalid request body"}`,
        }, nil
    }

    err := db.QueryRowContext(ctx,
        "INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id",
        u.Email, u.Name).Scan(&u.ID)
    if err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: 500,
            Body:       `{"error": "failed to create user"}`,
        }, err
    }

    respBody, _ := json.Marshal(u)
    return events.APIGatewayProxyResponse{
        StatusCode: 201,
        Headers:    map[string]string{"Content-Type": "application/json"},
        Body:       string(respBody),
    }, nil
}

func main() {
    lambda.Start(handler)
}
Explanation
Cold Start Optimization:
- Database connection pool initialized in init()
- Logger created globally
- Environment variables read once at startup
Connection Pooling:
- MaxOpenConns limits concurrent connections
- MaxIdleConns keeps connections alive between invocations
- ConnMaxLifetime=0 reuses connections indefinitely
Performance:
- Cold start: ~300-400ms
- Warm invocations: ~10-50ms
Key Takeaways
- Initialize expensive resources in init() for cold start optimization
- Use connection pooling to reuse database connections
- Lambda containers are reused, enabling warm starts
Exercise 6 - Kafka Producer and Consumer
Implement Kafka producer and consumer for event streaming with proper error handling.
Requirements
- Producer sends events with partitioning by user ID
- Consumer processes events with manual commit
- Handle producer errors with retries
- Consumer graceful shutdown on signal
Click to see solution
// run
package main

import (
    "context"
    "encoding/json"
    "log"
    "os"
    "os/signal"
    "syscall"

    "github.com/segmentio/kafka-go"
)

type OrderEvent struct {
    OrderID string  `json:"order_id"`
    UserID  string  `json:"user_id"`
    Amount  float64 `json:"amount"`
}

// Producer
func produceEvents(ctx context.Context) error {
    writer := kafka.NewWriter(kafka.WriterConfig{
        Brokers:      []string{"localhost:9092"},
        Topic:        "orders",
        Balancer:     &kafka.Hash{}, // Partition by key
        MaxAttempts:  3,
        RequiredAcks: -1, // -1 (RequireAll): wait for all in-sync replicas
    })
    defer writer.Close()

    event := OrderEvent{
        OrderID: "order_123",
        UserID:  "user_456",
        Amount:  99.99,
    }

    value, _ := json.Marshal(event)
    err := writer.WriteMessages(ctx, kafka.Message{
        Key:   []byte(event.UserID), // Partition by user ID
        Value: value,
    })

    if err != nil {
        log.Printf("failed to write message: %v", err)
        return err
    }

    log.Println("event published successfully")
    return nil
}

// Consumer
func consumeEvents(ctx context.Context) error {
    reader := kafka.NewReader(kafka.ReaderConfig{
        Brokers:  []string{"localhost:9092"},
        Topic:    "orders",
        GroupID:  "order-processor",
        MinBytes: 10e3, // 10KB
        MaxBytes: 10e6, // 10MB
    })
    defer reader.Close()

    for {
        select {
        case <-ctx.Done():
            log.Println("shutting down consumer")
            return ctx.Err()
        default:
            msg, err := reader.FetchMessage(ctx)
            if err != nil {
                log.Printf("error fetching message: %v", err)
                continue
            }

            var event OrderEvent
            if err := json.Unmarshal(msg.Value, &event); err != nil {
                log.Printf("failed to unmarshal event: %v", err)
                reader.CommitMessages(ctx, msg) // Commit to skip bad message
                continue
            }

            // Process event
            log.Printf("processing order %s for user %s: $%.2f",
                event.OrderID, event.UserID, event.Amount)

            // Manual commit after successful processing
            if err := reader.CommitMessages(ctx, msg); err != nil {
                log.Printf("failed to commit message: %v", err)
            }
        }
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Graceful shutdown
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)

    go func() {
        <-sigCh
        log.Println("received shutdown signal")
        cancel()
    }()

    if err := consumeEvents(ctx); err != nil {
        log.Fatalf("consumer error: %v", err)
    }
}
Explanation
Producer Configuration:
- Hash balancer partitions by key
- RequireAll ensures durability
- MaxAttempts provides retry logic
Consumer Configuration:
- Consumer group for load balancing
- Manual commit for at-least-once semantics
- Context cancellation for graceful shutdown
Key Takeaways
- Partition by key for ordering guarantees
- Manual commit enables custom error handling
- Graceful shutdown prevents message loss
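MaxAttempts covers retries inside the writer; if you also want application-level retries with backoff (for example, to ride out a longer broker outage), one possible sketch follows. The retry count, delays, broker address, and message contents are illustrative assumptions.

// Sketch: application-level retry with exponential backoff around WriteMessages.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

func writeWithBackoff(ctx context.Context, writer *kafka.Writer, msg kafka.Message) error {
    var err error
    for attempt := 0; attempt < 5; attempt++ {
        if err = writer.WriteMessages(ctx, msg); err == nil {
            return nil
        }
        backoff := time.Duration(100<<attempt) * time.Millisecond // 100ms, 200ms, 400ms, ...
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff):
        }
    }
    return fmt.Errorf("giving up after retries: %w", err)
}

func main() {
    writer := &kafka.Writer{
        Addr:  kafka.TCP("localhost:9092"), // assumed broker
        Topic: "orders",
    }
    defer writer.Close()

    msg := kafka.Message{Key: []byte("user_456"), Value: []byte(`{"order_id":"order_123"}`)}
    if err := writeWithBackoff(context.Background(), writer, msg); err != nil {
        log.Printf("publish failed: %v", err)
    }
}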
Exercise 7 - Redis Cache-Aside Pattern
Implement cache-aside pattern with Redis for user lookup optimization.
Requirements
- Check cache before database query
- Set TTL of 5 minutes for cached entries
- Handle cache miss by querying database
- Update cache after database write
Click to see solution
// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

type User struct {
    ID    string `json:"id"`
    Email string `json:"email"`
    Name  string `json:"name"`
}

type UserService struct {
    redis *redis.Client
    db    UserDB
}

type UserDB interface {
    GetUser(ctx context.Context, id string) (*User, error)
    CreateUser(ctx context.Context, user *User) error
}

const userCacheTTL = 5 * time.Minute

func (s *UserService) GetUser(ctx context.Context, userID string) (*User, error) {
    cacheKey := fmt.Sprintf("user:%s", userID)

    // 1. Try cache first
    cached, err := s.redis.Get(ctx, cacheKey).Result()
    if err == nil {
        var user User
        if err := json.Unmarshal([]byte(cached), &user); err == nil {
            return &user, nil // Cache hit
        }
    }

    // 2. Cache miss - query database
    user, err := s.db.GetUser(ctx, userID)
    if err != nil {
        return nil, fmt.Errorf("database query failed: %w", err)
    }

    // 3. Update cache
    userData, _ := json.Marshal(user)
    s.redis.Set(ctx, cacheKey, userData, userCacheTTL)

    return user, nil
}

func (s *UserService) CreateUser(ctx context.Context, user *User) error {
    // 1. Write to database
    if err := s.db.CreateUser(ctx, user); err != nil {
        return fmt.Errorf("database write failed: %w", err)
    }

    // 2. Invalidate/update cache
    cacheKey := fmt.Sprintf("user:%s", user.ID)
    userData, _ := json.Marshal(user)
    s.redis.Set(ctx, cacheKey, userData, userCacheTTL)

    return nil
}

func (s *UserService) InvalidateUser(ctx context.Context, userID string) error {
    cacheKey := fmt.Sprintf("user:%s", userID)
    return s.redis.Del(ctx, cacheKey).Err()
}
Explanation
Cache-Aside Pattern:
- Check cache on read
- On miss, query database and populate cache
- On write, update database then cache
TTL Strategy:
- 5-minute TTL prevents stale data
- Auto-expiration reduces memory usage
- Manual invalidation for critical updates
Error Handling:
- Cache failures don't break reads
- Write-through ensures consistency
Key Takeaways
- Cache-aside pattern optimizes read-heavy workloads
- TTL balances freshness and performance
- Always update cache after writes
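The solution assumes an already-constructed Redis client. A small constructor sketch that could sit alongside UserService above; the function name and address parameter are assumptions.

// Hypothetical constructor wiring a Redis client into UserService.
func NewUserService(addr string, db UserDB) *UserService {
    client := redis.NewClient(&redis.Options{
        Addr: addr, // e.g. "localhost:6379"
    })
    return &UserService{redis: client, db: db}
}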
Exercise 8 - Rate Limiter Middleware
Implement token bucket rate limiter middleware for HTTP API.
Requirements
- Token bucket algorithm with Redis
- 100 requests per minute per IP
- Return 429 status when limit exceeded
- Include rate limit headers in response
Click to see solution
// run
package main

import (
    "context"
    "fmt"
    "net/http"
    "strconv"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/redis/go-redis/v9"
)

const (
    rateLimit  = 100             // requests
    timeWindow = 1 * time.Minute // per minute
)

type RateLimiter struct {
    redis *redis.Client
}

func NewRateLimiter(client *redis.Client) *RateLimiter {
    return &RateLimiter{redis: client}
}

func (rl *RateLimiter) Middleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        ip := c.ClientIP()
        key := fmt.Sprintf("rate_limit:%s", ip)

        allowed, remaining, resetTime, err := rl.checkLimit(c.Request.Context(), key)
        if err != nil {
            c.JSON(http.StatusInternalServerError, gin.H{"error": "rate limiter error"})
            c.Abort()
            return
        }

        // Set rate limit headers
        c.Header("X-RateLimit-Limit", strconv.Itoa(rateLimit))
        c.Header("X-RateLimit-Remaining", strconv.Itoa(remaining))
        c.Header("X-RateLimit-Reset", strconv.FormatInt(resetTime.Unix(), 10))

        if !allowed {
            c.Header("Retry-After", strconv.Itoa(int(time.Until(resetTime).Seconds())))
            c.JSON(http.StatusTooManyRequests, gin.H{
                "error":       "rate limit exceeded",
                "retry_after": time.Until(resetTime).Seconds(),
            })
            c.Abort()
            return
        }

        c.Next()
    }
}

func (rl *RateLimiter) checkLimit(ctx context.Context, key string) (bool, int, time.Time, error) {
    now := time.Now()
    windowStart := now.Add(-timeWindow)

    pipe := rl.redis.Pipeline()

    // Remove old entries
    pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(windowStart.UnixNano(), 10))

    // Count current requests
    countCmd := pipe.ZCard(ctx, key)

    // Add current request
    pipe.ZAdd(ctx, key, redis.Z{
        Score:  float64(now.UnixNano()),
        Member: fmt.Sprintf("%d", now.UnixNano()),
    })

    // Set expiration
    pipe.Expire(ctx, key, timeWindow)

    _, err := pipe.Exec(ctx)
    if err != nil {
        return false, 0, time.Time{}, err
    }

    count := int(countCmd.Val())
    remaining := rateLimit - count - 1
    if remaining < 0 {
        remaining = 0
    }

    resetTime := now.Add(timeWindow)
    allowed := count < rateLimit

    return allowed, remaining, resetTime, nil
}

func main() {
    rdb := redis.NewClient(&redis.Options{
        Addr: "localhost:6379",
    })

    rateLimiter := NewRateLimiter(rdb)

    r := gin.Default()
    r.Use(rateLimiter.Middleware())

    r.GET("/api/users", func(c *gin.Context) {
        c.JSON(200, gin.H{"users": []string{"alice", "bob"}})
    })

    r.Run(":8080")
}
Explanation
Sliding Window Algorithm (approximates the token-bucket requirement):
- Sorted set stores timestamps in sliding window
- Remove old entries outside window
- Count current requests and compare to limit
Rate Limit Headers:
- X-RateLimit-Limit: Max requests allowed
- X-RateLimit-Remaining: Requests left
- X-RateLimit-Reset: When limit resets
- Retry-After: Seconds to wait
Key Takeaways
- Sliding-window counting over a sorted set provides smooth rate limiting
- Redis sorted sets enable distributed rate limiting
- Standard headers inform clients of limits
Exercise 9 - Circuit Breaker Pattern
Implement circuit breaker for resilient service calls with state management.
Requirements
- Three states: Closed, Open, Half-Open
- Open circuit after 5 consecutive failures
- Half-open state allows test request after 30 seconds
- Return to closed on successful test request
Click to see solution
// run
package main

import (
    "context"
    "errors"
    "fmt"
    "sync"
    "time"
)

type State int

const (
    StateClosed State = iota
    StateOpen
    StateHalfOpen
)

type CircuitBreaker struct {
    maxFailures  int
    timeout      time.Duration
    failures     int
    state        State
    lastFailTime time.Time
    mu           sync.RWMutex
}

func NewCircuitBreaker(maxFailures int, timeout time.Duration) *CircuitBreaker {
    return &CircuitBreaker{
        maxFailures: maxFailures,
        timeout:     timeout,
        state:       StateClosed,
    }
}

var ErrCircuitOpen = errors.New("circuit breaker is open")

func (cb *CircuitBreaker) Call(ctx context.Context, fn func() error) error {
    cb.mu.Lock()

    // Check if we can transition from Open to Half-Open
    if cb.state == StateOpen {
        if time.Since(cb.lastFailTime) > cb.timeout {
            cb.state = StateHalfOpen
            cb.failures = 0
        } else {
            cb.mu.Unlock()
            return ErrCircuitOpen
        }
    }

    // Allow only one request in Half-Open state
    if cb.state == StateHalfOpen && cb.failures > 0 {
        cb.mu.Unlock()
        return ErrCircuitOpen
    }

    cb.mu.Unlock()

    // Execute the function
    err := fn()

    cb.mu.Lock()
    defer cb.mu.Unlock()

    if err != nil {
        cb.failures++
        cb.lastFailTime = time.Now()

        if cb.failures >= cb.maxFailures {
            cb.state = StateOpen
        }

        return fmt.Errorf("call failed: %w", err)
    }

    // Success - reset circuit
    if cb.state == StateHalfOpen {
        cb.state = StateClosed
    }
    cb.failures = 0

    return nil
}

func (cb *CircuitBreaker) GetState() State {
    cb.mu.RLock()
    defer cb.mu.RUnlock()
    return cb.state
}

// Example usage
func main() {
    cb := NewCircuitBreaker(5, 30*time.Second)

    // Simulate service call
    err := cb.Call(context.Background(), func() error {
        // Call external service
        return callExternalService()
    })

    if err != nil {
        if errors.Is(err, ErrCircuitOpen) {
            fmt.Println("circuit breaker is open, request rejected")
        } else {
            fmt.Printf("service call failed: %v\n", err)
        }
    }
}

func callExternalService() error {
    // Simulated external service call
    return nil
}
Explanation
State Transitions:
- Closed → Open: After maxFailures consecutive failures
- Open → Half-Open: After timeout period
- Half-Open → Closed: On successful test request
- Half-Open → Open: On failed test request
Concurrency Safety:
- RWMutex protects state and counters
- Atomic state transitions
Benefits:
- Prevents cascading failures
- Gives failing services time to recover
- Fast-fails when service is down
Key Takeaways
- Circuit breakers prevent cascade failures
- State machine manages recovery automatically
- Fast-fail reduces latency during outages
Exercise 10 - Correlation IDs for Distributed Tracing
Add request tracking across a distributed system using correlation IDs.
Requirements
- Generate correlation ID for each request
- Propagate ID through middleware
- Include ID in logs and downstream calls
- Add ID to response headers
Click to see solution
// run
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/google/uuid"
)

type contextKey string

const correlationIDKey contextKey = "correlation_id"

// Middleware to add correlation ID
func CorrelationIDMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        // Check if correlation ID exists in request header
        correlationID := c.GetHeader("X-Correlation-ID")

        // Generate new ID if not present
        if correlationID == "" {
            correlationID = uuid.New().String()
        }

        // Add to context
        ctx := context.WithValue(c.Request.Context(), correlationIDKey, correlationID)
        c.Request = c.Request.WithContext(ctx)

        // Add to response header
        c.Header("X-Correlation-ID", correlationID)

        // Log request with correlation ID
        log.Printf("[%s] %s %s", correlationID, c.Request.Method, c.Request.URL.Path)

        c.Next()

        // Log response
        log.Printf("[%s] Response: %d", correlationID, c.Writer.Status())
    }
}

// Extract correlation ID from context
func GetCorrelationID(ctx context.Context) string {
    if id, ok := ctx.Value(correlationIDKey).(string); ok {
        return id
    }
    return ""
}

// Service layer using correlation ID
func getUserService(ctx context.Context, userID string) (*User, error) {
    correlationID := GetCorrelationID(ctx)
    log.Printf("[%s] Fetching user %s from database", correlationID, userID)

    // Database call would go here
    user := &User{ID: userID, Name: "Alice"}

    return user, nil
}

// HTTP client propagating correlation ID
func callDownstreamService(ctx context.Context, url string) error {
    correlationID := GetCorrelationID(ctx)

    req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
    req.Header.Set("X-Correlation-ID", correlationID)

    log.Printf("[%s] Calling downstream service: %s", correlationID, url)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    log.Printf("[%s] Downstream response: %d", correlationID, resp.StatusCode)
    return nil
}

type User struct {
    ID   string `json:"id"`
    Name string `json:"name"`
}

func main() {
    r := gin.Default()
    r.Use(CorrelationIDMiddleware())

    r.GET("/users/:id", func(c *gin.Context) {
        userID := c.Param("id")

        user, err := getUserService(c.Request.Context(), userID)
        if err != nil {
            c.JSON(500, gin.H{"error": "internal server error"})
            return
        }

        // Call downstream service
        _ = callDownstreamService(c.Request.Context(), "http://notification-service/notify")

        c.JSON(200, user)
    })

    r.Run(":8080")
}
Explanation
Correlation ID Flow:
- Middleware generates/extracts correlation ID
- ID stored in request context
- ID propagated to all logs
- ID sent to downstream services via header
Benefits:
- Trace requests across services
- Debug distributed systems easily
- Link logs from multiple services
Key Takeaways
- Correlation IDs enable distributed tracing
- Context propagates IDs through call stack
- Include ID in all logs and downstream calls
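The log.Printf calls above interpolate the correlation ID by hand; with structured logging (log/slog, available since Go 1.21) it can travel as a field instead. A small standalone sketch, reusing the same context key; the field names are assumptions.

// Sketch: structured logging with the correlation ID attached as a field.
package main

import (
    "context"
    "log/slog"
    "os"
)

type contextKey string

const correlationIDKey contextKey = "correlation_id"

// loggerFromContext returns a JSON logger pre-tagged with the correlation ID, if present.
func loggerFromContext(ctx context.Context) *slog.Logger {
    base := slog.New(slog.NewJSONHandler(os.Stdout, nil))
    if id, ok := ctx.Value(correlationIDKey).(string); ok {
        return base.With("correlation_id", id)
    }
    return base
}

func main() {
    ctx := context.WithValue(context.Background(), correlationIDKey, "abc-123")
    loggerFromContext(ctx).Info("fetching user", "user_id", "42")
}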
Exercise 11 - Prometheus Metrics Instrumentation
Add Prometheus metrics to HTTP handler with custom metrics.
Requirements
- HTTP request counter by method and path
- Request duration histogram
- Active connections gauge
- Custom business metric
Click to see solution
// run
package main

import (
    "fmt"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path", "status"},
    )

    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "path"},
    )

    activeConnections = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "http_active_connections",
            Help: "Number of active HTTP connections",
        },
    )

    ordersCreated = promauto.NewCounter(
        prometheus.CounterOpts{
            Name: "orders_created_total",
            Help: "Total number of orders created",
        },
    )
)

func PrometheusMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        start := time.Now()

        activeConnections.Inc()
        defer activeConnections.Dec()

        c.Next()

        duration := time.Since(start).Seconds()
        status := c.Writer.Status()

        httpRequestsTotal.WithLabelValues(
            c.Request.Method,
            c.FullPath(),
            fmt.Sprintf("%d", status),
        ).Inc()

        httpRequestDuration.WithLabelValues(
            c.Request.Method,
            c.FullPath(),
        ).Observe(duration)
    }
}

func main() {
    r := gin.Default()
    r.Use(PrometheusMiddleware())

    // Metrics endpoint
    r.GET("/metrics", gin.WrapH(promhttp.Handler()))

    r.POST("/orders", func(c *gin.Context) {
        // Create order logic
        ordersCreated.Inc()

        c.JSON(201, gin.H{"status": "created"})
    })

    r.Run(":8080")
}
Explanation
Metric Types:
- Counter: Monotonically increasing
- Histogram: Distribution of values
- Gauge: Current value
Labels:
- Enable filtering and aggregation
- Method, path, status for request metrics
Best Practices:
- Use promauto for automatic registration
- Choose appropriate metric type
- Keep cardinality low
Key Takeaways
- Prometheus metrics enable observability
- Use appropriate metric types for data
- Labels allow powerful queries in PromQL
Exercise 12 - OpenTelemetry Distributed Tracing
Add distributed tracing with OpenTelemetry to track requests across services.
Requirements
- Initialize tracer provider with Jaeger exporter
- Create spans for HTTP handlers
- Propagate trace context to downstream calls
- Add custom span attributes
Click to see solution
// run
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
    "go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

func initTracer() func() {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://localhost:14268/api/traces"),
    ))
    if err != nil {
        log.Fatal(err)
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("user-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    otel.SetTextMapPropagator(propagation.TraceContext{})

    tracer = tp.Tracer("user-service")

    return func() { tp.Shutdown(context.Background()) }
}

func TracingMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        ctx := otel.GetTextMapPropagator().Extract(
            c.Request.Context(),
            propagation.HeaderCarrier(c.Request.Header),
        )

        ctx, span := tracer.Start(ctx, c.Request.URL.Path)
        defer span.End()

        span.SetAttributes(
            attribute.String("http.method", c.Request.Method),
            attribute.String("http.url", c.Request.URL.String()),
            attribute.String("http.user_agent", c.Request.UserAgent()),
        )

        c.Request = c.Request.WithContext(ctx)
        c.Next()

        span.SetAttributes(
            attribute.Int("http.status_code", c.Writer.Status()),
        )
    }
}

func getUser(ctx context.Context, userID string) error {
    _, span := tracer.Start(ctx, "getUser")
    defer span.End()

    span.SetAttributes(attribute.String("user.id", userID))

    // Database query
    // ...

    return nil
}

func callNotificationService(ctx context.Context) error {
    ctx, span := tracer.Start(ctx, "callNotificationService")
    defer span.End()

    req, _ := http.NewRequestWithContext(ctx, "POST", "http://notification-service/send", nil)

    // Propagate trace context
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        span.RecordError(err)
        return err
    }
    defer resp.Body.Close()

    span.SetAttributes(attribute.Int("http.status_code", resp.StatusCode))
    return nil
}

func main() {
    shutdown := initTracer()
    defer shutdown()

    r := gin.Default()
    r.Use(TracingMiddleware())

    r.GET("/users/:id", func(c *gin.Context) {
        userID := c.Param("id")

        if err := getUser(c.Request.Context(), userID); err != nil {
            c.JSON(500, gin.H{"error": "failed to get user"})
            return
        }

        _ = callNotificationService(c.Request.Context())

        c.JSON(200, gin.H{"user_id": userID})
    })

    r.Run(":8080")
}
Explanation
Distributed Tracing:
- Tracer creates spans for operations
- Context propagation links spans across services
- Jaeger collects and visualizes traces
Span Attributes:
- Add metadata to spans
- Enable filtering and analysis in Jaeger
Error Recording:
- Record errors in spans for debugging
- Mark span as failed
Key Takeaways
- OpenTelemetry provides vendor-neutral tracing
- Context propagation is critical for distributed tracing
- Spans with attributes enable powerful debugging
Exercise 13 - Integration Testing with Testcontainers
Write integration tests using testcontainers for database-backed API.
Requirements
- Spin up PostgreSQL container for tests
- Run migrations before tests
- Test CRUD operations end-to-end
- Clean up containers after tests
Click to see solution
package main

import (
    "context"
    "database/sql"
    "fmt"
    "testing"
    "time"

    _ "github.com/lib/pq"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    "github.com/testcontainers/testcontainers-go"
    "github.com/testcontainers/testcontainers-go/wait"
)

func setupTestDB(t *testing.T) (*sql.DB, func()) {
    ctx := context.Background()

    req := testcontainers.ContainerRequest{
        Image:        "postgres:15-alpine",
        ExposedPorts: []string{"5432/tcp"},
        Env: map[string]string{
            "POSTGRES_USER":     "test",
            "POSTGRES_PASSWORD": "test",
            "POSTGRES_DB":       "testdb",
        },
        WaitingFor: wait.ForLog("database system is ready to accept connections").
            WithOccurrence(2).
            WithStartupTimeout(60 * time.Second),
    }

    postgres, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: req,
        Started:          true,
    })
    require.NoError(t, err)

    host, _ := postgres.Host(ctx)
    port, _ := postgres.MappedPort(ctx, "5432")

    dsn := fmt.Sprintf("postgres://test:test@%s:%s/testdb?sslmode=disable", host, port.Port())
    db, err := sql.Open("postgres", dsn)
    require.NoError(t, err)

    // Run migrations
    _, err = db.Exec(`
        CREATE TABLE users (
            id SERIAL PRIMARY KEY,
            email VARCHAR(255) UNIQUE NOT NULL,
            name VARCHAR(255) NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    `)
    require.NoError(t, err)

    cleanup := func() {
        db.Close()
        postgres.Terminate(ctx)
    }

    return db, cleanup
}

func TestUserCRUD(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()

    // Test Create
    var userID int
    err := db.QueryRowContext(ctx,
        "INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id",
        "alice@example.com", "Alice").Scan(&userID)
    require.NoError(t, err)
    assert.Greater(t, userID, 0)

    // Test Read
    var email, name string
    err = db.QueryRowContext(ctx,
        "SELECT email, name FROM users WHERE id = $1", userID).Scan(&email, &name)
    require.NoError(t, err)
    assert.Equal(t, "alice@example.com", email)
    assert.Equal(t, "Alice", name)

    // Test Update
    _, err = db.ExecContext(ctx,
        "UPDATE users SET name = $1 WHERE id = $2", "Alice Smith", userID)
    require.NoError(t, err)

    err = db.QueryRowContext(ctx,
        "SELECT name FROM users WHERE id = $1", userID).Scan(&name)
    require.NoError(t, err)
    assert.Equal(t, "Alice Smith", name)

    // Test Delete
    _, err = db.ExecContext(ctx, "DELETE FROM users WHERE id = $1", userID)
    require.NoError(t, err)

    err = db.QueryRowContext(ctx,
        "SELECT id FROM users WHERE id = $1", userID).Scan(&userID)
    assert.Equal(t, sql.ErrNoRows, err)
}
Explanation
Testcontainers Benefits:
- Real PostgreSQL instance
- Isolated test environment
- Automatic cleanup
Test Structure:
- Setup: Spin up container, run migrations
- Execute: CRUD operations
- Cleanup: Close DB, terminate container
Wait Strategy:
- Wait for PostgreSQL ready message
- Ensures container is fully initialized
Key Takeaways
- Testcontainers enable realistic integration tests
- Real databases catch bugs mocks can't
- Automatic cleanup prevents resource leaks
Exercise 14 - Load Testing with vegeta
Create load test script using vegeta to stress test HTTP API.
Requirements
- Target endpoint at 100 requests/second
- Duration of 30 seconds
- Generate report with latency percentiles
- Identify performance bottlenecks
Click to see solution
#!/bin/bash
# load_test.sh

# vegeta attack parameters
RATE=100
DURATION=30s
TARGET="http://localhost:8080/api/users"

# Create targets file
cat > targets.txt <<EOF
GET $TARGET
Content-Type: application/json

POST $TARGET
Content-Type: application/json
@user_payload.json
EOF

# Create payload
cat > user_payload.json <<EOF
{
  "email": "test@example.com",
  "name": "Test User"
}
EOF

# Run load test
echo "Starting load test: $RATE req/s for $DURATION"
vegeta attack \
  -rate=$RATE \
  -duration=$DURATION \
  -targets=targets.txt \
  | tee results.bin \
  | vegeta report

# Generate reports
echo ""
echo "=== Latency Report ==="
vegeta report -type=text results.bin

echo ""
echo "=== Histogram ==="
vegeta report -type='hist[0,10ms,20ms,50ms,100ms,200ms,500ms,1s]' results.bin

echo ""
echo "=== JSON Report ==="
vegeta report -type=json results.bin > report.json

echo ""
echo "=== Plotting Results ==="
vegeta plot results.bin > plot.html
echo "Plot saved to plot.html"

# Cleanup
rm targets.txt user_payload.json
// run
// analyze_results.go - Parse vegeta JSON output
package main

import (
    "encoding/json"
    "fmt"
    "os"
)

type VegetaReport struct {
    Latencies struct {
        P50  int64 `json:"50th"`
        P95  int64 `json:"95th"`
        P99  int64 `json:"99th"`
        Max  int64 `json:"max"`
        Mean int64 `json:"mean"`
    } `json:"latencies"`
    Requests    int            `json:"requests"`
    Success     float64        `json:"success"`
    Duration    int64          `json:"duration"`
    Throughput  float64        `json:"throughput"`
    StatusCodes map[string]int `json:"status_codes"`
}

func main() {
    data, _ := os.ReadFile("report.json")

    var report VegetaReport
    json.Unmarshal(data, &report)

    fmt.Println("=== Load Test Analysis ===")
    fmt.Printf("Total Requests: %d\n", report.Requests)
    fmt.Printf("Success Rate: %.2f%%\n", report.Success*100)
    fmt.Printf("Throughput: %.2f req/s\n\n", report.Throughput)

    fmt.Println("Latency Percentiles:")
    fmt.Printf("  P50: %d ms\n", report.Latencies.P50/1000000)
    fmt.Printf("  P95: %d ms\n", report.Latencies.P95/1000000)
    fmt.Printf("  P99: %d ms\n", report.Latencies.P99/1000000)
    fmt.Printf("  Max: %d ms\n", report.Latencies.Max/1000000)

    fmt.Println("\nStatus Codes:")
    for code, count := range report.StatusCodes {
        fmt.Printf("  %s: %d\n", code, count)
    }

    // Performance evaluation
    if report.Success < 0.99 {
        fmt.Println("\nWarning: Success rate below 99%")
    }

    if report.Latencies.P95 > 200*1000000 {
        fmt.Println("Warning: P95 latency exceeds 200ms")
    }

    if report.Latencies.P99 > 500*1000000 {
        fmt.Println("Warning: P99 latency exceeds 500ms")
    }
}
Explanation
Load Test Configuration:
- Rate: 100 requests per second
- Duration: 30 seconds
- Multiple HTTP methods
Metrics Analyzed:
- Latency percentiles
- Success rate
- Throughput
- Status code distribution
Performance Thresholds:
- Success rate should be > 99%
- P95 latency < 200ms
- P99 latency < 500ms
Key Takeaways
- Load testing reveals performance under stress
- Monitor latency percentiles, not just averages
- Set SLOs based on P95/P99 latency
Exercise 15 - Chaos Testing
Implement chaos test that injects random failures to verify system resilience.
Requirements
- Randomly fail 10% of requests
- Verify circuit breaker opens after failures
- Test retry logic handles transient errors
- Confirm graceful degradation
Click to see solution
package main

import (
    "context"
    "errors"
    "math/rand"
    "net/http"
    "net/http/httptest"
    "sync/atomic"
    "testing"
    "time"

    "github.com/stretchr/testify/assert"
)

// Chaos middleware that randomly fails requests
func ChaosMiddleware(failureRate float64) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if rand.Float64() < failureRate {
                w.WriteHeader(http.StatusInternalServerError)
                w.Write([]byte(`{"error": "chaos monkey struck"}`))
                return
            }
            next.ServeHTTP(w, r)
        })
    }
}

// Resilient client with retries
type ResilientClient struct {
    client     *http.Client
    maxRetries int
    retryDelay time.Duration
}

func (c *ResilientClient) Get(url string) (*http.Response, error) {
    var lastErr error

    for attempt := 0; attempt <= c.maxRetries; attempt++ {
        resp, err := c.client.Get(url)

        if err == nil && resp.StatusCode < 500 {
            return resp, nil
        }

        if resp != nil {
            resp.Body.Close()
        }

        lastErr = err
        if lastErr == nil {
            lastErr = errors.New("server error")
        }

        if attempt < c.maxRetries {
            time.Sleep(c.retryDelay * time.Duration(attempt+1))
        }
    }

    return nil, lastErr
}

// NewCircuitBreaker, ErrCircuitOpen, StateOpen come from the Exercise 9 solution (same package).
func TestChaosResilience(t *testing.T) {
    // Test circuit breaker under chaos
    cb := NewCircuitBreaker(5, 10*time.Second)

    var successCount, failureCount int32

    chaosFunc := func() error {
        if rand.Float64() < 0.3 { // 30% failure rate
            atomic.AddInt32(&failureCount, 1)
            return errors.New("chaos failure")
        }
        atomic.AddInt32(&successCount, 1)
        return nil
    }

    // Execute many requests
    for i := 0; i < 100; i++ {
        err := cb.Call(context.Background(), chaosFunc)

        // Circuit should open after consecutive failures
        if err == ErrCircuitOpen {
            t.Logf("Circuit opened after %d total calls", i+1)
            break
        }

        time.Sleep(10 * time.Millisecond)
    }

    assert.Equal(t, StateOpen, cb.GetState(), "circuit should be open")
    t.Logf("Successes: %d, Failures: %d", successCount, failureCount)
}

func TestRetryResilience(t *testing.T) {
    client := &ResilientClient{
        client:     &http.Client{Timeout: 5 * time.Second},
        maxRetries: 3,
        retryDelay: 100 * time.Millisecond,
    }

    server := httptest.NewServer(ChaosMiddleware(0.5)(
        http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
            w.Write([]byte(`{"status": "ok"}`))
        }),
    ))
    defer server.Close()

    successCount := 0
    totalRequests := 50

    for i := 0; i < totalRequests; i++ {
        resp, err := client.Get(server.URL)
        if err == nil && resp.StatusCode == 200 {
            successCount++
            resp.Body.Close()
        }
    }

    successRate := float64(successCount) / float64(totalRequests)
    t.Logf("Success rate with retries: %.2f%%", successRate*100)

    // With 50% chaos and 3 retries, success rate should be high
    assert.Greater(t, successRate, 0.85, "retry logic should achieve >85% success")
}
Explanation
Chaos Testing Principles:
- Inject random failures to test resilience
- Verify error handling and recovery
- Measure system behavior under failure
Chaos Middleware:
- Randomly fails percentage of requests
- Simulates service instability
- Tests downstream resilience
Verification:
- Circuit breaker opens after failures
- Retry logic improves success rate
- System degrades gracefully
Key Takeaways
- Chaos testing validates resilience mechanisms
- Random failures reveal hidden bugs
- Measure success rates under chaos to set SLOs
Comprehensive Key Takeaways
Congratulations on completing all 15 production engineering exercises! You've gained hands-on experience with:
Cloud-Native Development
- Docker multi-stage builds for secure, minimal images
- Kubernetes deployments with health checks and autoscaling
- Service mesh traffic management with Istio
- Serverless functions with cold start optimization
Microservices & Communication
- gRPC services with Protocol Buffers for type-safe APIs
- Event streaming with Kafka for asynchronous communication
- Redis caching strategies for performance optimization
- Rate limiting algorithms for API protection
Observability
- Prometheus metrics for monitoring system health
- OpenTelemetry distributed tracing across services
- Correlation IDs for request tracking
- Structured logging for debugging
Resilience & Testing
- Circuit breaker pattern for preventing cascade failures
- Integration testing with real dependencies
- Load testing to identify performance bottlenecks
- Chaos testing to verify resilience under failure
Production Patterns
- Infrastructure as code with Kubernetes manifests
- Graceful shutdown and health checks
- Resource management and autoscaling
- Security hardening and non-root containers
Next Steps
You've now completed exercises covering all four sections of The Modern Go Tutorial:
- ✓ The Go Language - Fundamentals and syntax
- ✓ Standard Library - Essential packages and patterns
- ✓ Advanced Topics - Generics, reflection, design patterns, performance
- ✓ Production Engineering - Cloud-native, observability, testing
Continue your learning:
- Build the Section Project: Apply these concepts in the Cloud-Native E-Commerce Platform - a comprehensive microservices system
- Explore Capstone Projects: Tackle expert-level projects in Section 7: Capstone Projects
- Apply to Real Projects: Use these patterns in production systems
- Contribute to Open Source: Practice production engineering in real-world Go projects
You're now equipped with production-ready Go engineering skills. Keep building!