Section Exercises: Production Engineering Practices

Welcome to the Production Engineering section exercises! These 15 exercises synthesize everything you've learned across all four sections, focusing on cloud-native development, microservices architecture, observability, and production-grade testing strategies.

📖 Background: These exercises build on foundational concepts from The Go Language, Standard Library, and Advanced Topics, while emphasizing production engineering practices covered in this section.

Learning Objectives

By completing these exercises, you will:

  • Deploy containerized applications with Docker best practices
  • Configure Kubernetes deployments with health checks and autoscaling
  • Build gRPC services with Protocol Buffers
  • Implement service mesh traffic management
  • Add comprehensive observability with metrics, logs, and distributed traces
  • Apply production testing strategies at multiple levels

Exercise 1 - Multi-Stage Dockerfile Optimization

Create an optimized multi-stage Dockerfile for a Go web application that minimizes image size and follows security best practices.

Requirements

  • Use multi-stage build with golang:1.21-alpine as builder
  • Final image based on alpine:latest or scratch
  • Non-root user for running the application
  • No build dependencies in final image
  • Final image size < 20MB
  • Include health check instruction
Solution

# Build stage
FROM golang:1.21-alpine AS builder

# Install build dependencies
RUN apk add --no-cache git ca-certificates

WORKDIR /build

# Copy go mod files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build static binary
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags='-w -s -extldflags "-static"' \
    -o /app/server \
    ./cmd/server

# Final stage
FROM alpine:latest

# Install ca-certificates for HTTPS
RUN apk --no-cache add ca-certificates

# Create non-root user
RUN addgroup -g 1001 appuser && \
    adduser -D -u 1001 -G appuser appuser

WORKDIR /app

# Copy binary from builder
COPY --from=builder --chown=appuser:appuser /app/server .

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD ["/app/server", "healthcheck"]

# Run application
ENTRYPOINT ["/app/server"]

Explanation

Multi-Stage Benefits:

  • Builder stage includes all compilation tools
  • Final stage only contains runtime binary
  • Reduces image size from ~800MB to ~15MB

Security Hardening:

  • Non-root user prevents privilege escalation
  • Static binary with no external dependencies
  • Minimal attack surface with Alpine base

Production Features:

  • Health check for container orchestration
  • CA certificates for external HTTPS calls
  • Proper file ownership and permissions
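
The HEALTHCHECK instruction above assumes the binary understands a healthcheck subcommand. A minimal sketch of how that subcommand might be implemented (the /healthz endpoint and port 8080 are assumptions for illustration):

// healthcheck sketch: exits non-zero when the local server is unhealthy,
// which is how Docker decides the container's health status.
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func runHealthcheck() {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get("http://localhost:8080/healthz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "healthcheck failed:", err)
		os.Exit(1) // non-zero exit marks the container unhealthy
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unhealthy status:", resp.StatusCode)
		os.Exit(1)
	}
	os.Exit(0)
}

func main() {
	if len(os.Args) > 1 && os.Args[1] == "healthcheck" {
		runHealthcheck()
	}
	// ... normal server startup would go here
}

Docker marks the container unhealthy after the configured number of consecutive non-zero exits (retries=3 above).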

Key Takeaways

  • Multi-stage builds separate compilation from runtime
  • Static binaries eliminate runtime dependencies
  • Non-root users are essential for security

Exercise 2 - Kubernetes Deployment with Best Practices

Create a production-ready Kubernetes deployment manifest with health checks, resource limits, and horizontal pod autoscaling.

Requirements

  • Deployment with 3 replicas
  • Liveness and readiness probes
  • Resource requests and limits
  • Rolling update strategy
  • HorizontalPodAutoscaler targeting 70% CPU
Solution

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: v1
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: PORT
          value: "8080"
        - name: LOG_LEVEL
          value: "info"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
          failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 8080
    name: http
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Explanation

Health Checks:

  • Liveness probe restarts unhealthy pods
  • Readiness probe removes pods from service until ready
  • Different endpoints allow granular health reporting

Resource Management:

  • Requests guarantee minimum resources
  • Limits prevent resource hogging
  • HPA scales based on actual CPU usage

High Availability:

  • 3 replicas for redundancy
  • Zero downtime with maxUnavailable: 0
  • Service load balances across healthy pods
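
The probes above assume the application serves /healthz and /ready. A minimal standalone sketch of those handlers, where the "ready" flag flipped after startup is an assumption for illustration:

// Liveness answers "is the process up?"; readiness answers "can it take traffic?".
package main

import (
	"log"
	"net/http"
	"sync/atomic"
)

func main() {
	var ready atomic.Bool

	mux := http.NewServeMux()

	// Liveness: the process is up and able to serve HTTP.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: only report ready once startup work has finished.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	// Flip to ready after initialization (cache warmup, DB connect, etc.).
	ready.Store(true)

	log.Fatal(http.ListenAndServe(":8080", mux))
}

Keeping the two endpoints separate lets a pod stay alive while it is temporarily removed from the Service during startup or dependency outages.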

Key Takeaways

  • Always define resource requests and limits
  • Separate liveness and readiness probes
  • HPA enables automatic scaling under load

Exercise 3 - gRPC Service with Protocol Buffers

Implement a User service with gRPC supporting CRUD operations using Protocol Buffers.

Requirements

  • Define .proto file with User message and UserService
  • Implement GetUser, ListUsers, CreateUser, UpdateUser, DeleteUser
  • Use proper error handling with gRPC status codes
  • Add server-side streaming for ListUsers
Solution

// user.proto
syntax = "proto3";

package user.v1;
option go_package = "github.com/example/userservice/gen/user/v1";

import "google/protobuf/timestamp.proto";
import "google/protobuf/empty.proto";

message User {
  string id = 1;
  string email = 2;
  string name = 3;
  google.protobuf.Timestamp created_at = 4;
}

message GetUserRequest {
  string id = 1;
}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message CreateUserRequest {
  string email = 1;
  string name = 2;
}

message UpdateUserRequest {
  string id = 1;
  string email = 2;
  string name = 3;
}

message DeleteUserRequest {
  string id = 1;
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);
  rpc CreateUser(CreateUserRequest) returns (User);
  rpc UpdateUser(UpdateUserRequest) returns (User);
  rpc DeleteUser(DeleteUserRequest) returns (google.protobuf.Empty);
}
// server.go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/types/known/emptypb"
	"google.golang.org/protobuf/types/known/timestamppb"

	pb "github.com/example/userservice/gen/user/v1"
)

type userServer struct {
	pb.UnimplementedUserServiceServer
	users map[string]*pb.User
}

func (s *userServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
	if req.Id == "" {
		return nil, status.Error(codes.InvalidArgument, "user id is required")
	}

	user, ok := s.users[req.Id]
	if !ok {
		return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
	}

	return user, nil
}

func (s *userServer) ListUsers(req *pb.ListUsersRequest, stream pb.UserService_ListUsersServer) error {
	for _, user := range s.users {
		if err := stream.Send(user); err != nil {
			return status.Error(codes.Internal, "failed to send user")
		}
	}
	return nil
}

func (s *userServer) CreateUser(ctx context.Context, req *pb.CreateUserRequest) (*pb.User, error) {
	if req.Email == "" || req.Name == "" {
		return nil, status.Error(codes.InvalidArgument, "email and name are required")
	}

	user := &pb.User{
		Id:        fmt.Sprintf("user_%d", time.Now().Unix()),
		Email:     req.Email,
		Name:      req.Name,
		CreatedAt: timestamppb.Now(),
	}

	s.users[user.Id] = user
	return user, nil
}

func (s *userServer) UpdateUser(ctx context.Context, req *pb.UpdateUserRequest) (*pb.User, error) {
	user, ok := s.users[req.Id]
	if !ok {
		return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
	}

	user.Email = req.Email
	user.Name = req.Name

	return user, nil
}

func (s *userServer) DeleteUser(ctx context.Context, req *pb.DeleteUserRequest) (*emptypb.Empty, error) {
	if _, ok := s.users[req.Id]; !ok {
		return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
	}

	delete(s.users, req.Id)
	return &emptypb.Empty{}, nil
}

Explanation

Protocol Buffers:

  • Strongly typed messages ensure API contracts
  • Server-side streaming for efficient list operations
  • Timestamps with well-known types

Error Handling:

  • gRPC status codes for semantic errors
  • InvalidArgument for validation failures
  • NotFound for missing resources
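
On the consuming side, a client reads the ListUsers stream until EOF. A minimal sketch, assuming the server listens on localhost:50051 and the pb package is the code generated from the .proto above:

// client.go - sketch of a client consuming the server-side stream.
package main

import (
	"context"
	"io"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "github.com/example/userservice/gen/user/v1"
)

func main() {
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	client := pb.NewUserServiceClient(conn)

	stream, err := client.ListUsers(context.Background(), &pb.ListUsersRequest{PageSize: 50})
	if err != nil {
		log.Fatalf("ListUsers failed: %v", err)
	}

	// Receive users one at a time until the server closes the stream.
	for {
		user, err := stream.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatalf("recv failed: %v", err)
		}
		log.Printf("user: %s <%s>", user.Name, user.Email)
	}
}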

Key Takeaways

  • gRPC provides type-safe, high-performance RPC
  • Server streaming reduces memory for large lists
  • Use proper gRPC status codes for errors

Exercise 4 - Istio Traffic Splitting

Configure Istio VirtualService for canary deployment with 90/10 traffic split.

Requirements

  • VirtualService routing 90% to v1, 10% to v2
  • DestinationRule with subsets for v1 and v2
  • HTTP header-based routing for testing v2
Solution

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
  - web-app
  http:
  - match:
    - headers:
        x-version:
          exact: v2
    route:
    - destination:
        host: web-app
        subset: v2
  - route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10

Explanation

Canary Deployment:

  • 90% traffic to stable v1
  • 10% traffic to new v2 for gradual rollout
  • Header-based routing for internal testing

Progressive Delivery:

  • Monitor v2 metrics before increasing traffic
  • Rollback by changing weights to 100/0
  • Zero downtime deployment strategy
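
To exercise the header-based route from inside the mesh, a test client only needs to set the x-version header. A minimal sketch (the in-cluster URL and path are assumptions for illustration):

// canary_check.go - sketch of forcing a request onto the v2 subset.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	req, err := http.NewRequest("GET", "http://web-app/api/users", nil)
	if err != nil {
		panic(err)
	}
	// The VirtualService matches this header and routes to subset v2.
	req.Header.Set("x-version", "v2")

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status from v2 subset:", resp.Status)
}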

Key Takeaways

  • Service mesh enables traffic control without code changes
  • Canary deployments reduce risk of new releases
  • Header-based routing allows testing before production traffic

Exercise 5 - AWS Lambda Function with Cold Start Optimization

Implement an AWS Lambda function in Go with cold start optimization techniques.

Requirements

  • HTTP handler for Lambda
  • Global variable initialization outside handler
  • Connection pooling for database
  • Estimated cold start < 500ms
Solution

// run
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"log"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	_ "github.com/lib/pq"
)

// Global variables initialized once during cold start
var (
	db     *sql.DB
	logger *log.Logger
)

// init runs once per container
func init() {
	logger = log.New(os.Stdout, "[LAMBDA] ", log.LstdFlags)

	// Initialize database connection pool
	dsn := os.Getenv("DATABASE_URL")
	var err error
	db, err = sql.Open("postgres", dsn)
	if err != nil {
		logger.Fatalf("failed to connect to database: %v", err)
	}

	// Configure connection pool for Lambda
	db.SetMaxOpenConns(10)
	db.SetMaxIdleConns(5)
	db.SetConnMaxLifetime(0) // Reuse connections

	logger.Println("initialized database connection pool")
}

type User struct {
	ID    string `json:"id"`
	Email string `json:"email"`
	Name  string `json:"name"`
}

func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// Handler logic runs on every invocation
	switch request.HTTPMethod {
	case "GET":
		return getUsers(ctx)
	case "POST":
		return createUser(ctx, request.Body)
	default:
		return events.APIGatewayProxyResponse{
			StatusCode: 405,
			Body:       `{"error": "method not allowed"}`,
		}, nil
	}
}

func getUsers(ctx context.Context) (events.APIGatewayProxyResponse, error) {
	rows, err := db.QueryContext(ctx, "SELECT id, email, name FROM users LIMIT 100")
	if err != nil {
		return events.APIGatewayProxyResponse{
			StatusCode: 500,
			Body:       `{"error": "database query failed"}`,
		}, err
	}
	defer rows.Close()

	var users []User
	for rows.Next() {
		var u User
		if err := rows.Scan(&u.ID, &u.Email, &u.Name); err != nil {
			continue
		}
		users = append(users, u)
	}

	body, _ := json.Marshal(users)
	return events.APIGatewayProxyResponse{
		StatusCode: 200,
		Headers:    map[string]string{"Content-Type": "application/json"},
		Body:       string(body),
	}, nil
}

func createUser(ctx context.Context, body string) (events.APIGatewayProxyResponse, error) {
	var u User
	if err := json.Unmarshal([]byte(body), &u); err != nil {
		return events.APIGatewayProxyResponse{
			StatusCode: 400,
			Body:       `{"error": "invalid request body"}`,
		}, nil
	}

	err := db.QueryRowContext(ctx,
		"INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id",
		u.Email, u.Name).Scan(&u.ID)
	if err != nil {
		return events.APIGatewayProxyResponse{
			StatusCode: 500,
			Body:       `{"error": "failed to create user"}`,
		}, err
	}

	respBody, _ := json.Marshal(u)
	return events.APIGatewayProxyResponse{
		StatusCode: 201,
		Headers:    map[string]string{"Content-Type": "application/json"},
		Body:       string(respBody),
	}, nil
}

func main() {
	lambda.Start(handler)
}

Explanation

Cold Start Optimization:

  • Database connection pool initialized in init()
  • Logger created globally
  • Environment variables read once at startup

Connection Pooling:

  • MaxOpenConns limits concurrent connections
  • MaxIdleConns keeps connections alive between invocations
  • ConnMaxLifetime=0 reuses connections indefinitely

Performance:

  • Cold start: ~300-400ms
  • Warm invocations: ~10-50ms
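
Container reuse is easy to observe: a package-level counter survives warm invocations because the process stays alive between them. A minimal standalone sketch with a simplified handler signature, for illustration only:

// Demonstrates cold vs. warm starts by counting invocations per container.
package main

import (
	"context"
	"log"
	"sync/atomic"

	"github.com/aws/aws-lambda-go/lambda"
)

var invocations int64 // initialized once per container, like db/logger above

func handler(ctx context.Context) (string, error) {
	n := atomic.AddInt64(&invocations, 1)
	if n == 1 {
		log.Println("cold start: first invocation in this container")
	} else {
		log.Printf("warm start: invocation %d in this container", n)
	}
	return "ok", nil
}

func main() {
	lambda.Start(handler)
}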

Key Takeaways

  • Initialize expensive resources in init() for cold start optimization
  • Use connection pooling to reuse database connections
  • Lambda containers are reused, enabling warm starts

Exercise 6 - Kafka Producer and Consumer

Implement Kafka producer and consumer for event streaming with proper error handling.

Requirements

  • Producer sends events with partitioning by user ID
  • Consumer processes events with manual commit
  • Handle producer errors with retries
  • Consumer graceful shutdown on signal
Solution

// run
package main

import (
	"context"
	"encoding/json"
	"log"
	"os"
	"os/signal"
	"syscall"

	"github.com/segmentio/kafka-go"
)

type OrderEvent struct {
	OrderID string  `json:"order_id"`
	UserID  string  `json:"user_id"`
	Amount  float64 `json:"amount"`
}

// Producer
func produceEvents(ctx context.Context) error {
	writer := &kafka.Writer{
		Addr:         kafka.TCP("localhost:9092"),
		Topic:        "orders",
		Balancer:     &kafka.Hash{}, // Partition by key
		MaxAttempts:  3,
		RequiredAcks: kafka.RequireAll,
	}
	defer writer.Close()

	event := OrderEvent{
		OrderID: "order_123",
		UserID:  "user_456",
		Amount:  99.99,
	}

	value, _ := json.Marshal(event)
	err := writer.WriteMessages(ctx, kafka.Message{
		Key:   []byte(event.UserID), // Partition by user ID
		Value: value,
	})

	if err != nil {
		log.Printf("failed to write message: %v", err)
		return err
	}

	log.Println("event published successfully")
	return nil
}

// Consumer
func consumeEvents(ctx context.Context) error {
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers:  []string{"localhost:9092"},
		Topic:    "orders",
		GroupID:  "order-processor",
		MinBytes: 10e3, // 10KB
		MaxBytes: 10e6, // 10MB
	})
	defer reader.Close()

	for {
		select {
		case <-ctx.Done():
			log.Println("shutting down consumer")
			return ctx.Err()
		default:
			msg, err := reader.FetchMessage(ctx)
			if err != nil {
				log.Printf("error fetching message: %v", err)
				continue
			}

			var event OrderEvent
			if err := json.Unmarshal(msg.Value, &event); err != nil {
				log.Printf("failed to unmarshal event: %v", err)
				reader.CommitMessages(ctx, msg) // Commit to skip bad message
				continue
			}

			// Process event
			log.Printf("processing order %s for user %s: $%.2f",
				event.OrderID, event.UserID, event.Amount)

			// Manual commit after successful processing
			if err := reader.CommitMessages(ctx, msg); err != nil {
				log.Printf("failed to commit message: %v", err)
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	// Graceful shutdown
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)

	go func() {
		<-sigCh
		log.Println("received shutdown signal")
		cancel()
	}()

	if err := consumeEvents(ctx); err != nil {
		log.Fatalf("consumer error: %v", err)
	}
}

Explanation

Producer Configuration:

  • Hash balancer partitions by key
  • RequireAll ensures durability
  • MaxAttempts provides retry logic

Consumer Configuration:

  • Consumer group for load balancing
  • Manual commit for at-least-once semantics
  • Context cancellation for graceful shutdown

Key Takeaways

  • Partition by key for ordering guarantees
  • Manual commit enables custom error handling
  • Graceful shutdown prevents message loss

Exercise 7 - Redis Cache-Aside Pattern

Implement cache-aside pattern with Redis for user lookup optimization.

Requirements

  • Check cache before database query
  • Set TTL of 5 minutes for cached entries
  • Handle cache miss by querying database
  • Update cache after database write
Solution

// run
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

type User struct {
	ID    string `json:"id"`
	Email string `json:"email"`
	Name  string `json:"name"`
}

type UserService struct {
	redis *redis.Client
	db    UserDB
}

type UserDB interface {
	GetUser(ctx context.Context, id string) (*User, error)
	CreateUser(ctx context.Context, user *User) error
}

const userCacheTTL = 5 * time.Minute

func (s *UserService) GetUser(ctx context.Context, userID string) (*User, error) {
	cacheKey := fmt.Sprintf("user:%s", userID)

	// 1. Try cache first
	cached, err := s.redis.Get(ctx, cacheKey).Result()
	if err == nil {
		var user User
		if err := json.Unmarshal([]byte(cached), &user); err == nil {
			return &user, nil // Cache hit
		}
	}

	// 2. Cache miss - query database
	user, err := s.db.GetUser(ctx, userID)
	if err != nil {
		return nil, fmt.Errorf("database query failed: %w", err)
	}

	// 3. Update cache
	userData, _ := json.Marshal(user)
	s.redis.Set(ctx, cacheKey, userData, userCacheTTL)

	return user, nil
}

func (s *UserService) CreateUser(ctx context.Context, user *User) error {
	// 1. Write to database
	if err := s.db.CreateUser(ctx, user); err != nil {
		return fmt.Errorf("database write failed: %w", err)
	}

	// 2. Invalidate/update cache
	cacheKey := fmt.Sprintf("user:%s", user.ID)
	userData, _ := json.Marshal(user)
	s.redis.Set(ctx, cacheKey, userData, userCacheTTL)

	return nil
}

func (s *UserService) InvalidateUser(ctx context.Context, userID string) error {
	cacheKey := fmt.Sprintf("user:%s", userID)
	return s.redis.Del(ctx, cacheKey).Err()
}

Explanation

Cache-Aside Pattern:

  1. Check cache on read
  2. On miss, query database and populate cache
  3. On write, update database then cache

TTL Strategy:

  • 5-minute TTL prevents stale data
  • Auto-expiration reduces memory usage
  • Manual invalidation for critical updates

Error Handling:

  • Cache failures don't break reads
  • Write-through ensures consistency
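
To try the service without a real database, a hypothetical in-memory UserDB can sit alongside the code above (same package); all names in this sketch are illustrative:

// memory_db.go - sketch of an in-memory UserDB for exercising the
// cache-aside service; not part of the solution above.
package main

import (
	"context"
	"fmt"
)

type memoryUserDB struct {
	users map[string]*User
}

func (m *memoryUserDB) GetUser(ctx context.Context, id string) (*User, error) {
	u, ok := m.users[id]
	if !ok {
		return nil, fmt.Errorf("user %s not found", id)
	}
	return u, nil
}

func (m *memoryUserDB) CreateUser(ctx context.Context, user *User) error {
	m.users[user.ID] = user
	return nil
}

Wiring it up might look like svc := &UserService{redis: rdb, db: &memoryUserDB{users: map[string]*User{}}}; the first GetUser call then populates the cache and subsequent calls are served from Redis.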

Key Takeaways

  • Cache-aside pattern optimizes read-heavy workloads
  • TTL balances freshness and performance
  • Always update cache after writes

Exercise 8 - Rate Limiter Middleware

Implement token bucket rate limiter middleware for HTTP API.

Requirements

  • Token bucket algorithm with Redis
  • 100 requests per minute per IP
  • Return 429 status when limit exceeded
  • Include rate limit headers in response
Solution

// run
package main

import (
	"context"
	"fmt"
	"net/http"
	"strconv"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/redis/go-redis/v9"
)

const (
	rateLimit  = 100             // requests
	timeWindow = 1 * time.Minute // per minute
)

type RateLimiter struct {
	redis *redis.Client
}

func NewRateLimiter(redis *redis.Client) *RateLimiter {
	return &RateLimiter{redis: redis}
}

func (rl *RateLimiter) Middleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		ip := c.ClientIP()
		key := fmt.Sprintf("rate_limit:%s", ip)

		allowed, remaining, resetTime, err := rl.checkLimit(c.Request.Context(), key)
		if err != nil {
			c.JSON(http.StatusInternalServerError, gin.H{"error": "rate limiter error"})
			c.Abort()
			return
		}

		// Set rate limit headers
		c.Header("X-RateLimit-Limit", strconv.Itoa(rateLimit))
		c.Header("X-RateLimit-Remaining", strconv.Itoa(remaining))
		c.Header("X-RateLimit-Reset", strconv.FormatInt(resetTime.Unix(), 10))

		if !allowed {
			c.Header("Retry-After", strconv.Itoa(int(time.Until(resetTime).Seconds())))
			c.JSON(http.StatusTooManyRequests, gin.H{
				"error":       "rate limit exceeded",
				"retry_after": time.Until(resetTime).Seconds(),
			})
			c.Abort()
			return
		}

		c.Next()
	}
}

func (rl *RateLimiter) checkLimit(ctx context.Context, key string) (bool, int, time.Time, error) {
	now := time.Now()
	windowStart := now.Add(-timeWindow)

	pipe := rl.redis.Pipeline()

	// Remove old entries
	pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(windowStart.UnixNano(), 10))

	// Count current requests
	countCmd := pipe.ZCard(ctx, key)

	// Add current request
	pipe.ZAdd(ctx, key, redis.Z{
		Score:  float64(now.UnixNano()),
		Member: fmt.Sprintf("%d", now.UnixNano()),
	})

	// Set expiration
	pipe.Expire(ctx, key, timeWindow)

	_, err := pipe.Exec(ctx)
	if err != nil {
		return false, 0, time.Time{}, err
	}

	count := int(countCmd.Val())
	remaining := rateLimit - count - 1
	if remaining < 0 {
		remaining = 0
	}

	resetTime := now.Add(timeWindow)
	allowed := count < rateLimit

	return allowed, remaining, resetTime, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{
		Addr: "localhost:6379",
	})

	rateLimiter := NewRateLimiter(rdb)

	r := gin.Default()
	r.Use(rateLimiter.Middleware())

	r.GET("/api/users", func(c *gin.Context) {
		c.JSON(200, gin.H{"users": []string{"alice", "bob"}})
	})

	r.Run(":8080")
}

Explanation

Sliding Window Rate Limiting:

  • A Redis sorted set stores request timestamps within the sliding window
  • Old entries outside the window are removed on each request
  • The current count is compared to the limit before admitting the request

Note: the solution uses a sliding window log rather than a literal token bucket; both enforce the same per-window limit, and the sorted-set approach is easy to distribute via Redis.

Rate Limit Headers:

  • X-RateLimit-Limit: Max requests allowed
  • X-RateLimit-Remaining: Requests left
  • X-RateLimit-Reset: When limit resets
  • Retry-After: Seconds to wait
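
A client can read those headers to pace itself; a minimal sketch against the local server above:

// Prints the rate-limit headers returned by the middleware.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	resp, err := http.Get("http://localhost:8080/api/users")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	fmt.Println("limit:    ", resp.Header.Get("X-RateLimit-Limit"))
	fmt.Println("remaining:", resp.Header.Get("X-RateLimit-Remaining"))
	fmt.Println("reset:    ", resp.Header.Get("X-RateLimit-Reset"))

	if resp.StatusCode == http.StatusTooManyRequests {
		fmt.Println("throttled; retry after", resp.Header.Get("Retry-After"), "seconds")
	}
}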

Key Takeaways

  • Sliding window logs (like token buckets) provide smooth rate limiting
  • Redis sorted sets enable distributed rate limiting
  • Standard headers inform clients of limits

Exercise 9 - Circuit Breaker Pattern

Implement circuit breaker for resilient service calls with state management.

Requirements

  • Three states: Closed, Open, Half-Open
  • Open circuit after 5 consecutive failures
  • Half-open state allows test request after 30 seconds
  • Return to closed on successful test request
Solution

// run
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

type CircuitBreaker struct {
	maxFailures  int
	timeout      time.Duration
	failures     int
	state        State
	lastFailTime time.Time
	mu           sync.RWMutex
}

func NewCircuitBreaker(maxFailures int, timeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		maxFailures: maxFailures,
		timeout:     timeout,
		state:       StateClosed,
	}
}

var ErrCircuitOpen = errors.New("circuit breaker is open")

func (cb *CircuitBreaker) Call(ctx context.Context, fn func() error) error {
	cb.mu.Lock()

	// Check if we can transition from Open to Half-Open
	if cb.state == StateOpen {
		if time.Since(cb.lastFailTime) > cb.timeout {
			cb.state = StateHalfOpen
			cb.failures = 0
		} else {
			cb.mu.Unlock()
			return ErrCircuitOpen
		}
	}

	// Allow only one request in Half-Open state
	if cb.state == StateHalfOpen && cb.failures > 0 {
		cb.mu.Unlock()
		return ErrCircuitOpen
	}

	cb.mu.Unlock()

	// Execute the function
	err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()

	if err != nil {
		cb.failures++
		cb.lastFailTime = time.Now()

		// A failed test request in Half-Open reopens the circuit immediately;
		// in Closed, open only after maxFailures consecutive failures.
		if cb.state == StateHalfOpen || cb.failures >= cb.maxFailures {
			cb.state = StateOpen
		}

		return fmt.Errorf("call failed: %w", err)
	}

	// Success - reset circuit
	if cb.state == StateHalfOpen {
		cb.state = StateClosed
	}
	cb.failures = 0

	return nil
}

func (cb *CircuitBreaker) GetState() State {
	cb.mu.RLock()
	defer cb.mu.RUnlock()
	return cb.state
}

// Example usage
func main() {
	cb := NewCircuitBreaker(5, 30*time.Second)

	// Simulate service call
	err := cb.Call(context.Background(), func() error {
		// Call external service
		return callExternalService()
	})

	if err != nil {
		if errors.Is(err, ErrCircuitOpen) {
			fmt.Println("circuit breaker is open, request rejected")
		} else {
			fmt.Printf("service call failed: %v\n", err)
		}
	}
}

func callExternalService() error {
	// Simulated external service call
	return nil
}

Explanation

State Transitions:

  • Closed → Open: After maxFailures consecutive failures
  • Open → Half-Open: After timeout period
  • Half-Open → Closed: On successful test request
  • Half-Open → Open: On failed test request

Concurrency Safety:

  • RWMutex protects state and counters
  • Atomic state transitions

Benefits:

  • Prevents cascading failures
  • Gives failing services time to recover
  • Fast-fails when service is down
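
In practice the breaker wraps an outbound call. A sketch of guarding an HTTP dependency with the CircuitBreaker above (a second file in the same package; the payment-service URL is an assumption for illustration):

// circuit_http.go - sketch of treating network errors and 5xx responses as failures.
package main

import (
	"context"
	"fmt"
	"net/http"
)

func fetchFromPaymentService(ctx context.Context, cb *CircuitBreaker) error {
	return cb.Call(ctx, func() error {
		req, err := http.NewRequestWithContext(ctx, "GET", "http://payment-service/status", nil)
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return err // network errors count as failures
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 500 {
			return fmt.Errorf("upstream error: %s", resp.Status) // 5xx counts as a failure
		}
		return nil
	})
}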

Key Takeaways

  • Circuit breakers prevent cascade failures
  • State machine manages recovery automatically
  • Fast-fail reduces latency during outages

Exercise 10 - Correlation IDs for Distributed Tracing

Add request tracking through distributed system with correlation IDs.

Requirements

  • Generate correlation ID for each request
  • Propagate ID through middleware
  • Include ID in logs and downstream calls
  • Add ID to response headers
Solution

// run
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/gin-gonic/gin"
	"github.com/google/uuid"
)

type contextKey string

const correlationIDKey contextKey = "correlation_id"

// Middleware to add correlation ID
func CorrelationIDMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		// Check if correlation ID exists in request header
		correlationID := c.GetHeader("X-Correlation-ID")

		// Generate new ID if not present
		if correlationID == "" {
			correlationID = uuid.New().String()
		}

		// Add to context
		ctx := context.WithValue(c.Request.Context(), correlationIDKey, correlationID)
		c.Request = c.Request.WithContext(ctx)

		// Add to response header
		c.Header("X-Correlation-ID", correlationID)

		// Log request with correlation ID
		log.Printf("[%s] %s %s", correlationID, c.Request.Method, c.Request.URL.Path)

		c.Next()

		// Log response
		log.Printf("[%s] Response: %d", correlationID, c.Writer.Status())
	}
}

// Extract correlation ID from context
func GetCorrelationID(ctx context.Context) string {
	if id, ok := ctx.Value(correlationIDKey).(string); ok {
		return id
	}
	return ""
}

// Service layer using correlation ID
func getUserService(ctx context.Context, userID string) (*User, error) {
	correlationID := GetCorrelationID(ctx)
	log.Printf("[%s] Fetching user %s from database", correlationID, userID)

	// Database call would go here
	user := &User{ID: userID, Name: "Alice"}

	return user, nil
}

// HTTP client propagating correlation ID
func callDownstreamService(ctx context.Context, url string) error {
	correlationID := GetCorrelationID(ctx)

	req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
	req.Header.Set("X-Correlation-ID", correlationID)

	log.Printf("[%s] Calling downstream service: %s", correlationID, url)

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	log.Printf("[%s] Downstream response: %d", correlationID, resp.StatusCode)
	return nil
}

type User struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

func main() {
	r := gin.Default()
	r.Use(CorrelationIDMiddleware())

	r.GET("/users/:id", func(c *gin.Context) {
		userID := c.Param("id")

		user, err := getUserService(c.Request.Context(), userID)
		if err != nil {
			c.JSON(500, gin.H{"error": "internal server error"})
			return
		}

		// Call downstream service
		_ = callDownstreamService(c.Request.Context(), "http://notification-service/notify")

		c.JSON(200, user)
	})

	r.Run(":8080")
}

Explanation

Correlation ID Flow:

  1. Middleware generates/extracts correlation ID
  2. ID stored in request context
  3. ID propagated to all logs
  4. ID sent to downstream services via header

Benefits:

  • Trace requests across services
  • Debug distributed systems easily
  • Link logs from multiple services
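
The same idea works with structured logging: attach the correlation ID to a logger once and every record carries it. A minimal standalone sketch using log/slog (Go 1.21+):

// Every line logged through the returned logger includes correlation_id.
package main

import (
	"context"
	"log/slog"
	"os"
)

type contextKey string

const correlationIDKey contextKey = "correlation_id"

func loggerFromContext(ctx context.Context) *slog.Logger {
	base := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	if id, ok := ctx.Value(correlationIDKey).(string); ok && id != "" {
		return base.With("correlation_id", id)
	}
	return base
}

func main() {
	ctx := context.WithValue(context.Background(), correlationIDKey, "abc-123")
	logger := loggerFromContext(ctx)
	logger.Info("fetching user", "user_id", "42") // correlation_id is included automatically
}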

Key Takeaways

  • Correlation IDs enable distributed tracing
  • Context propagates IDs through call stack
  • Include ID in all logs and downstream calls

Exercise 11 - Prometheus Metrics Instrumentation

Add Prometheus metrics to HTTP handler with custom metrics.

Requirements

  • HTTP request counter by method and path
  • Request duration histogram
  • Active connections gauge
  • Custom business metric
Solution

// run
package main

import (
	"fmt"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	httpRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "http_requests_total",
			Help: "Total number of HTTP requests",
		},
		[]string{"method", "path", "status"},
	)

	httpRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "http_request_duration_seconds",
			Help:    "HTTP request duration in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method", "path"},
	)

	activeConnections = promauto.NewGauge(
		prometheus.GaugeOpts{
			Name: "http_active_connections",
			Help: "Number of active HTTP connections",
		},
	)

	ordersCreated = promauto.NewCounter(
		prometheus.CounterOpts{
			Name: "orders_created_total",
			Help: "Total number of orders created",
		},
	)
)

func PrometheusMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		start := time.Now()

		activeConnections.Inc()
		defer activeConnections.Dec()

		c.Next()

		duration := time.Since(start).Seconds()
		status := c.Writer.Status()

		httpRequestsTotal.WithLabelValues(
			c.Request.Method,
			c.FullPath(),
			fmt.Sprintf("%d", status),
		).Inc()

		httpRequestDuration.WithLabelValues(
			c.Request.Method,
			c.FullPath(),
		).Observe(duration)
	}
}

func main() {
	r := gin.Default()
	r.Use(PrometheusMiddleware())

	// Metrics endpoint
	r.GET("/metrics", gin.WrapH(promhttp.Handler()))

	r.POST("/orders", func(c *gin.Context) {
		// Create order logic
		ordersCreated.Inc()

		c.JSON(201, gin.H{"status": "created"})
	})

	r.Run(":8080")
}

Explanation

Metric Types:

  • Counter: Monotonically increasing
  • Histogram: Distribution of values
  • Gauge: Current value

Labels:

  • Enable filtering and aggregation
  • Method, path, status for request metrics

Best Practices:

  • Use promauto for automatic registration
  • Choose appropriate metric type
  • Keep cardinality low
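
When the default buckets don't match your latency targets, define explicit ones. A sketch that could sit in the var block above (the metric name and bucket edges are assumptions for illustration):

// Histogram with buckets sized around a latency SLO instead of DefBuckets.
var checkoutDuration = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "checkout_duration_seconds",
	Help:    "Checkout request duration in seconds",
	Buckets: []float64{0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5},
})

Observations recorded with checkoutDuration.Observe(...) then fall into buckets that line up with the P95/P99 thresholds you alert on.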

Key Takeaways

  • Prometheus metrics enable observability
  • Use appropriate metric types for data
  • Labels allow powerful queries in PromQL

Exercise 12 - OpenTelemetry Distributed Tracing

Add distributed tracing with OpenTelemetry to track requests across services.

Requirements

  • Initialize tracer provider with Jaeger exporter
  • Create spans for HTTP handlers
  • Propagate trace context to downstream calls
  • Add custom span attributes
Solution

// run
package main

import (
	"context"
	"log"
	"net/http"

	"github.com/gin-gonic/gin"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
	"go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

func initTracer() func() {
	exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
		jaeger.WithEndpoint("http://localhost:14268/api/traces"),
	))
	if err != nil {
		log.Fatal(err)
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("user-service"),
		)),
	)

	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.TraceContext{})

	tracer = tp.Tracer("user-service")

	return func() { tp.Shutdown(context.Background()) }
}

func TracingMiddleware() gin.HandlerFunc {
	return func(c *gin.Context) {
		ctx := otel.GetTextMapPropagator().Extract(
			c.Request.Context(),
			propagation.HeaderCarrier(c.Request.Header),
		)

		ctx, span := tracer.Start(ctx, c.Request.URL.Path)
		defer span.End()

		span.SetAttributes(
			attribute.String("http.method", c.Request.Method),
			attribute.String("http.url", c.Request.URL.String()),
			attribute.String("http.user_agent", c.Request.UserAgent()),
		)

		c.Request = c.Request.WithContext(ctx)
		c.Next()

		span.SetAttributes(
			attribute.Int("http.status_code", c.Writer.Status()),
		)
	}
}

func getUser(ctx context.Context, userID string) error {
	_, span := tracer.Start(ctx, "getUser")
	defer span.End()

	span.SetAttributes(attribute.String("user.id", userID))

	// Database query
	// ...

	return nil
}

func callNotificationService(ctx context.Context) error {
	ctx, span := tracer.Start(ctx, "callNotificationService")
	defer span.End()

	req, _ := http.NewRequestWithContext(ctx, "POST", "http://notification-service/send", nil)

	// Propagate trace context
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		span.RecordError(err)
		return err
	}
	defer resp.Body.Close()

	span.SetAttributes(attribute.Int("http.status_code", resp.StatusCode))
	return nil
}

func main() {
	shutdown := initTracer()
	defer shutdown()

	r := gin.Default()
	r.Use(TracingMiddleware())

	r.GET("/users/:id", func(c *gin.Context) {
		userID := c.Param("id")

		if err := getUser(c.Request.Context(), userID); err != nil {
			c.JSON(500, gin.H{"error": "failed to get user"})
			return
		}

		_ = callNotificationService(c.Request.Context())

		c.JSON(200, gin.H{"user_id": userID})
	})

	r.Run(":8080")
}

Explanation

Distributed Tracing:

  • Tracer creates spans for operations
  • Context propagation links spans across services
  • Jaeger collects and visualizes traces

Span Attributes:

  • Add metadata to spans
  • Enable filtering and analysis in Jaeger

Error Recording:

  • Record errors in spans for debugging
  • Mark span as failed
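
Instead of injecting headers by hand, the OpenTelemetry contrib instrumentation can wrap the HTTP client. A sketch of such a helper, assuming the go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp module is added alongside the service above (a second file in the same package):

// otel_client.go - sketch of automatic trace propagation for outbound calls.
package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func callWithInstrumentedClient(ctx context.Context, url string) error {
	client := &http.Client{
		// The transport starts a client span and injects traceparent headers.
		Transport: otelhttp.NewTransport(http.DefaultTransport),
	}

	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		return err
	}

	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}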

Key Takeaways

  • OpenTelemetry provides vendor-neutral tracing
  • Context propagation is critical for distributed tracing
  • Spans with attributes enable powerful debugging

Exercise 13 - Integration Testing with Testcontainers

Write integration tests using testcontainers for database-backed API.

Requirements

  • Spin up PostgreSQL container for tests
  • Run migrations before tests
  • Test CRUD operations end-to-end
  • Clean up containers after tests
Solution

package main

import (
	"context"
	"database/sql"
	"fmt"
	"testing"
	"time"

	_ "github.com/lib/pq"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

func setupTestDB(t *testing.T) (*sql.DB, func()) {
	ctx := context.Background()

	req := testcontainers.ContainerRequest{
		Image:        "postgres:15-alpine",
		ExposedPorts: []string{"5432/tcp"},
		Env: map[string]string{
			"POSTGRES_USER":     "test",
			"POSTGRES_PASSWORD": "test",
			"POSTGRES_DB":       "testdb",
		},
		WaitingFor: wait.ForLog("database system is ready to accept connections").
			WithOccurrence(2).
			WithStartupTimeout(60 * time.Second),
	}

	postgres, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: req,
		Started:          true,
	})
	require.NoError(t, err)

	host, _ := postgres.Host(ctx)
	port, _ := postgres.MappedPort(ctx, "5432")

	dsn := fmt.Sprintf("postgres://test:test@%s:%s/testdb?sslmode=disable", host, port.Port())
	db, err := sql.Open("postgres", dsn)
	require.NoError(t, err)

	// Run migrations
	_, err = db.Exec(`
		CREATE TABLE users (
			id SERIAL PRIMARY KEY,
			email VARCHAR(255) UNIQUE NOT NULL,
			name VARCHAR(255) NOT NULL,
			created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
		)
	`)
	require.NoError(t, err)

	cleanup := func() {
		db.Close()
		postgres.Terminate(ctx)
	}

	return db, cleanup
}

func TestUserCRUD(t *testing.T) {
	db, cleanup := setupTestDB(t)
	defer cleanup()

	ctx := context.Background()

	// Test Create
	var userID int
	err := db.QueryRowContext(ctx,
		"INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id",
		"alice@example.com", "Alice").Scan(&userID)
	require.NoError(t, err)
	assert.Greater(t, userID, 0)

	// Test Read
	var email, name string
	err = db.QueryRowContext(ctx,
		"SELECT email, name FROM users WHERE id = $1", userID).Scan(&email, &name)
	require.NoError(t, err)
	assert.Equal(t, "alice@example.com", email)
	assert.Equal(t, "Alice", name)

	// Test Update
	_, err = db.ExecContext(ctx,
		"UPDATE users SET name = $1 WHERE id = $2", "Alice Smith", userID)
	require.NoError(t, err)

	err = db.QueryRowContext(ctx,
		"SELECT name FROM users WHERE id = $1", userID).Scan(&name)
	require.NoError(t, err)
	assert.Equal(t, "Alice Smith", name)

	// Test Delete
	_, err = db.ExecContext(ctx, "DELETE FROM users WHERE id = $1", userID)
	require.NoError(t, err)

	err = db.QueryRowContext(ctx,
		"SELECT id FROM users WHERE id = $1", userID).Scan(&userID)
	assert.Equal(t, sql.ErrNoRows, err)
}

Explanation

Testcontainers Benefits:

  • Real PostgreSQL instance
  • Isolated test environment
  • Automatic cleanup

Test Structure:

  • Setup: Spin up container, run migrations
  • Execute: CRUD operations
  • Cleanup: Close DB, terminate container

Wait Strategy:

  • Wait for PostgreSQL ready message
  • Ensures container is fully initialized
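
Go's testing package can also own the teardown: registering the cleanup with t.Cleanup guarantees it runs even if the test fails early. A small variant of setupTestDB, added in the same test file:

// Sketch: t.Cleanup ties container teardown to the test lifecycle.
func setupTestDBWithCleanup(t *testing.T) *sql.DB {
	db, cleanup := setupTestDB(t)
	t.Cleanup(cleanup) // runs after the test (and its subtests) finish
	return db
}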

Key Takeaways

  • Testcontainers enable realistic integration tests
  • Real databases catch bugs mocks can't
  • Automatic cleanup prevents resource leaks

Exercise 14 - Load Testing with vegeta

Create load test script using vegeta to stress test HTTP API.

Requirements

  • Target endpoint at 100 requests/second
  • Duration of 30 seconds
  • Generate report with latency percentiles
  • Identify performance bottlenecks
Solution

#!/bin/bash
# load_test.sh

# vegeta attack parameters
RATE=100
DURATION=30s
TARGET="http://localhost:8080/api/users"

# Create targets file
cat > targets.txt <<EOF
GET $TARGET
Content-Type: application/json

POST $TARGET
Content-Type: application/json
@user_payload.json
EOF

# Create payload
cat > user_payload.json <<EOF
{
  "email": "test@example.com",
  "name": "Test User"
}
EOF

# Run load test
echo "Starting load test: $RATE req/s for $DURATION"
vegeta attack \
  -rate=$RATE \
  -duration=$DURATION \
  -targets=targets.txt \
  | tee results.bin \
  | vegeta report

# Generate reports
echo ""
echo "=== Latency Report ==="
vegeta report -type=text results.bin

echo ""
echo "=== Histogram ==="
vegeta report -type='hist[0,10ms,20ms,50ms,100ms,200ms,500ms,1s]' results.bin

echo ""
echo "=== JSON Report ==="
vegeta report -type=json results.bin > report.json

echo ""
echo "=== Plotting Results ==="
vegeta plot results.bin > plot.html
echo "Plot saved to plot.html"

# Cleanup
rm targets.txt user_payload.json
// run
// analyze_results.go - Parse vegeta JSON output
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type VegetaReport struct {
	Latencies struct {
		P50  int64 `json:"50th"`
		P95  int64 `json:"95th"`
		P99  int64 `json:"99th"`
		Max  int64 `json:"max"`
		Mean int64 `json:"mean"`
	} `json:"latencies"`
	Requests    int            `json:"requests"`
	Success     float64        `json:"success"`
	Duration    int64          `json:"duration"`
	Throughput  float64        `json:"throughput"`
	StatusCodes map[string]int `json:"status_codes"`
}

func main() {
	data, _ := os.ReadFile("report.json")

	var report VegetaReport
	json.Unmarshal(data, &report)

	fmt.Println("=== Load Test Analysis ===")
	fmt.Printf("Total Requests: %d\n", report.Requests)
	fmt.Printf("Success Rate: %.2f%%\n", report.Success*100)
	fmt.Printf("Throughput: %.2f req/s\n\n", report.Throughput)

	fmt.Println("Latency Percentiles:")
	fmt.Printf("  P50: %d ms\n", report.Latencies.P50/1000000)
	fmt.Printf("  P95: %d ms\n", report.Latencies.P95/1000000)
	fmt.Printf("  P99: %d ms\n", report.Latencies.P99/1000000)
	fmt.Printf("  Max: %d ms\n", report.Latencies.Max/1000000)

	fmt.Println("\nStatus Codes:")
	for code, count := range report.StatusCodes {
		fmt.Printf("  %s: %d\n", code, count)
	}

	// Performance evaluation
	if report.Success < 0.99 {
		fmt.Println("\n⚠️  Warning: Success rate below 99%")
	}

	if report.Latencies.P95 > 200*1000000 {
		fmt.Println("⚠️  Warning: P95 latency exceeds 200ms")
	}

	if report.Latencies.P99 > 500*1000000 {
		fmt.Println("⚠️  Warning: P99 latency exceeds 500ms")
	}
}

Explanation

Load Test Configuration:

  • Rate: 100 requests per second
  • Duration: 30 seconds
  • Multiple HTTP methods

Metrics Analyzed:

  • Latency percentiles
  • Success rate
  • Throughput
  • Status code distribution

Performance Thresholds:

  • Success rate should be > 99%
  • P95 latency < 200ms
  • P99 latency < 500ms

Key Takeaways

  • Load testing reveals performance under stress
  • Monitor latency percentiles, not just averages
  • Set SLOs based on P95/P99 latency

Exercise 15 - Chaos Testing

Implement chaos test that injects random failures to verify system resilience.

Requirements

  • Randomly fail 10% of requests
  • Verify circuit breaker opens after failures
  • Test retry logic handles transient errors
  • Confirm graceful degradation
Solution

package main

import (
	"context"
	"errors"
	"math/rand"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

// Chaos middleware that randomly fails requests
func ChaosMiddleware(failureRate float64) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			if rand.Float64() < failureRate {
				w.WriteHeader(http.StatusInternalServerError)
				w.Write([]byte(`{"error": "chaos monkey struck"}`))
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// Resilient client with retries
type ResilientClient struct {
	client     *http.Client
	maxRetries int
	retryDelay time.Duration
}

func (c *ResilientClient) Get(url string) (*http.Response, error) {
	var lastErr error

	for attempt := 0; attempt <= c.maxRetries; attempt++ {
		resp, err := c.client.Get(url)

		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}

		if resp != nil {
			resp.Body.Close()
		}

		lastErr = err
		if lastErr == nil {
			lastErr = errors.New("server error")
		}

		if attempt < c.maxRetries {
			time.Sleep(c.retryDelay * time.Duration(attempt+1))
		}
	}

	return nil, lastErr
}

// Uses the CircuitBreaker, ErrCircuitOpen, and StateOpen from Exercise 9
// (same package).
func TestChaosResilience(t *testing.T) {
	// Test circuit breaker under chaos
	cb := NewCircuitBreaker(5, 10*time.Second)

	var successCount, failureCount int32

	chaosFunc := func() error {
		if rand.Float64() < 0.3 { // 30% failure rate
			atomic.AddInt32(&failureCount, 1)
			return errors.New("chaos failure")
		}
		atomic.AddInt32(&successCount, 1)
		return nil
	}

	// Execute many requests
	for i := 0; i < 100; i++ {
		err := cb.Call(context.Background(), chaosFunc)

		// Circuit should open after consecutive failures
		if errors.Is(err, ErrCircuitOpen) {
			t.Logf("Circuit opened after %d total calls", i+1)
			break
		}

		time.Sleep(10 * time.Millisecond)
	}

	assert.Equal(t, StateOpen, cb.GetState(), "circuit should be open")
	t.Logf("Successes: %d, Failures: %d", successCount, failureCount)
}

func TestRetryResilience(t *testing.T) {
	client := &ResilientClient{
		client:     &http.Client{Timeout: 5 * time.Second},
		maxRetries: 3,
		retryDelay: 100 * time.Millisecond,
	}

	server := httptest.NewServer(ChaosMiddleware(0.5)(
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
			w.Write([]byte(`{"status": "ok"}`))
		}),
	))
	defer server.Close()

	successCount := 0
	totalRequests := 50

	for i := 0; i < totalRequests; i++ {
		resp, err := client.Get(server.URL)
		if err == nil && resp.StatusCode == 200 {
			successCount++
			resp.Body.Close()
		}
	}

	successRate := float64(successCount) / float64(totalRequests)
	t.Logf("Success rate with retries: %.2f%%", successRate*100)

	// With 50% chaos and 3 retries, success rate should be high
	assert.Greater(t, successRate, 0.85, "retry logic should achieve >85% success")
}

Explanation

Chaos Testing Principles:

  • Inject random failures to test resilience
  • Verify error handling and recovery
  • Measure system behavior under failure

Chaos Middleware:

  • Randomly fails percentage of requests
  • Simulates service instability
  • Tests downstream resilience

Verification:

  • Circuit breaker opens after failures
  • Retry logic improves success rate
  • System degrades gracefully

Key Takeaways

  • Chaos testing validates resilience mechanisms
  • Random failures reveal hidden bugs
  • Measure success rates under chaos to set SLOs

Comprehensive Key Takeaways

Congratulations on completing all 15 production engineering exercises! You've gained hands-on experience with:

Cloud-Native Development

  • Docker multi-stage builds for secure, minimal images
  • Kubernetes deployments with health checks and autoscaling
  • Service mesh traffic management with Istio
  • Serverless functions with cold start optimization

Microservices & Communication

  • gRPC services with Protocol Buffers for type-safe APIs
  • Event streaming with Kafka for asynchronous communication
  • Redis caching strategies for performance optimization
  • Rate limiting algorithms for API protection

Observability

  • Prometheus metrics for monitoring system health
  • OpenTelemetry distributed tracing across services
  • Correlation IDs for request tracking
  • Structured logging for debugging

Resilience & Testing

  • Circuit breaker pattern for preventing cascade failures
  • Integration testing with real dependencies
  • Load testing to identify performance bottlenecks
  • Chaos testing to verify resilience under failure

Production Patterns

  • Infrastructure as code with Kubernetes manifests
  • Graceful shutdown and health checks
  • Resource management and autoscaling
  • Security hardening and non-root containers

Next Steps

You've now completed exercises covering all four sections of The Modern Go Tutorial:

  1. ✅ The Go Language - Fundamentals and syntax
  2. ✅ Standard Library - Essential packages and patterns
  3. ✅ Advanced Topics - Generics, reflection, design patterns, performance
  4. ✅ Production Engineering - Cloud-native, observability, testing

Continue your learning:

  • Build the Section Project: Apply these concepts in the Cloud-Native E-Commerce Platform - a comprehensive microservices system
  • Explore Capstone Projects: Tackle expert-level projects in Section 7: Capstone Projects
  • Apply to Real Projects: Use these patterns in production systems
  • Contribute to Open Source: Practice production engineering in real-world Go projects

You're now equipped with production-ready Go engineering skills. Keep building! 🚀