Welcome to the Production Engineering section exercises! These 15 exercises synthesize everything you've learned across all four sections, focusing on cloud-native development, microservices architecture, observability, and production-grade testing strategies.
Background: These exercises build on foundational concepts from The Go Language, Standard Library, and Advanced Topics, while emphasizing production engineering practices covered in this section.
Learning Objectives
By completing these exercises, you will:
- Deploy containerized applications with Docker best practices
- Configure Kubernetes deployments with health checks and autoscaling
- Build gRPC services with Protocol Buffers
- Implement service mesh traffic management
- Add comprehensive observability
- Apply production testing strategies at multiple levels
Exercise 1 - Multi-Stage Dockerfile Optimization
Create an optimized multi-stage Dockerfile for a Go web application that minimizes image size and follows security best practices.
Requirements
- Use multi-stage build with golang:1.21-alpine as the builder
- Final image based on alpine:latest or scratch
- Non-root user for running the application
- No build dependencies in final image
- Final image size < 20MB
- Include health check instruction
Click to see solution
# Build stage
FROM golang:1.21-alpine AS builder

# Install build dependencies
RUN apk add --no-cache git ca-certificates

WORKDIR /build

# Copy go mod files
COPY go.mod go.sum ./
RUN go mod download

# Copy source code
COPY . .

# Build static binary
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
    -ldflags='-w -s -extldflags "-static"' \
    -o /app/server \
    ./cmd/server

# Final stage
FROM alpine:latest

# Install ca-certificates for HTTPS
RUN apk --no-cache add ca-certificates

# Create non-root user
RUN addgroup -g 1001 appuser && \
    adduser -D -u 1001 -G appuser appuser

WORKDIR /app

# Copy binary from builder
COPY --from=builder --chown=appuser:appuser /app/server .

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD ["/app/server", "healthcheck"]

# Run application
ENTRYPOINT ["/app/server"]
Explanation
Multi-Stage Benefits:
- Builder stage includes all compilation tools
- Final stage only contains runtime binary
- Reduces image size from ~800MB to ~15MB
Security Hardening:
- Non-root user prevents privilege escalation
- Static binary with no external dependencies
- Minimal attack surface with Alpine base
Production Features:
- Health check for container orchestration
- CA certificates for external HTTPS calls
- Proper file ownership and permissions
Key Takeaways
- Multi-stage builds separate compilation from runtime
- Static binaries eliminate runtime dependencies
- Non-root users are essential for security
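The HEALTHCHECK instruction assumes the binary understands a healthcheck subcommand. One possible sketch of that subcommand in Go is shown below; the /healthz path and port 8080 are assumptions for illustration, not part of the Dockerfile above.

// Hypothetical healthcheck subcommand invoked by the Dockerfile's HEALTHCHECK.
// It probes the server's own health endpoint and exits non-zero on failure.
package main

import (
    "fmt"
    "net/http"
    "os"
    "time"
)

func runHealthcheck() {
    client := &http.Client{Timeout: 2 * time.Second}
    resp, err := client.Get("http://localhost:8080/healthz") // assumed endpoint
    if err != nil || resp.StatusCode != http.StatusOK {
        fmt.Fprintln(os.Stderr, "health check failed")
        os.Exit(1)
    }
    os.Exit(0)
}

func main() {
    if len(os.Args) > 1 && os.Args[1] == "healthcheck" {
        runHealthcheck()
    }
    // ... start the HTTP server as usual ...
}

Exiting non-zero is what tells Docker (and any orchestrator that honors the container health status) to mark the container unhealthy.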
Exercise 2 - Kubernetes Deployment with Best Practices
Create a production-ready Kubernetes deployment manifest with health checks, resource limits, and horizontal pod autoscaling.
Requirements
- Deployment with 3 replicas
- Liveness and readiness probes
- Resource requests and limits
- Rolling update strategy
- HorizontalPodAutoscaler targeting 70% CPU
Click to see solution
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        version: v1
    spec:
      containers:
      - name: app
        image: myapp:v1.0.0
        ports:
        - containerPort: 8080
          name: http
        env:
        - name: PORT
          value: "8080"
        - name: LOG_LEVEL
          value: "info"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
          failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 8080
    name: http
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Explanation
Health Checks:
- Liveness probe restarts unhealthy pods
- Readiness probe removes pods from service until ready
- Different endpoints allow granular health reporting
Resource Management:
- Requests guarantee minimum resources
- Limits prevent resource hogging
- HPA scales based on actual CPU usage
High Availability:
- 3 replicas for redundancy
- Zero downtime with maxUnavailable: 0
- Service load balances across healthy pods
Key Takeaways
- Always define resource requests and limits
- Separate liveness and readiness probes
- HPA enables automatic scaling under load
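The probes assume the application exposes /healthz and /ready endpoints. A minimal sketch of those handlers follows; the database ping as the readiness check is an illustrative assumption.

// Sketch of liveness and readiness endpoints matching the probe paths above.
package main

import (
    "database/sql"
    "log"
    "net/http"
)

var db *sql.DB // assumed application dependency

func main() {
    // Liveness: the process is up and able to serve requests.
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    })

    // Readiness: dependencies (here, the database) are reachable.
    http.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
        if db == nil || db.PingContext(r.Context()) != nil {
            http.Error(w, "not ready", http.StatusServiceUnavailable)
            return
        }
        w.WriteHeader(http.StatusOK)
    })

    log.Fatal(http.ListenAndServe(":8080", nil))
}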
Exercise 3 - gRPC Service with Protocol Buffers
Implement a User service with gRPC supporting CRUD operations using Protocol Buffers.
Requirements
- Define a .proto file with User message and UserService
- Implement GetUser, ListUsers, CreateUser, UpdateUser, DeleteUser
- Use proper error handling with gRPC status codes
- Add server-side streaming for ListUsers
Click to see solution
// user.proto
syntax = "proto3";

package user.v1;
option go_package = "github.com/example/userservice/gen/user/v1";

import "google/protobuf/timestamp.proto";
import "google/protobuf/empty.proto";

message User {
  string id = 1;
  string email = 2;
  string name = 3;
  google.protobuf.Timestamp created_at = 4;
}

message GetUserRequest {
  string id = 1;
}

message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;
}

message CreateUserRequest {
  string email = 1;
  string name = 2;
}

message UpdateUserRequest {
  string id = 1;
  string email = 2;
  string name = 3;
}

message DeleteUserRequest {
  string id = 1;
}

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);
  rpc CreateUser(CreateUserRequest) returns (User);
  rpc UpdateUser(UpdateUserRequest) returns (User);
  rpc DeleteUser(DeleteUserRequest) returns (google.protobuf.Empty);
}
// server.go
package main

import (
    "context"
    "fmt"
    "time"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
    "google.golang.org/protobuf/types/known/emptypb"
    "google.golang.org/protobuf/types/known/timestamppb"

    pb "github.com/example/userservice/gen/user/v1"
)

type userServer struct {
    pb.UnimplementedUserServiceServer
    users map[string]*pb.User
}

func (s *userServer) GetUser(ctx context.Context, req *pb.GetUserRequest) (*pb.User, error) {
    if req.Id == "" {
        return nil, status.Error(codes.InvalidArgument, "user id is required")
    }

    user, ok := s.users[req.Id]
    if !ok {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
    }

    return user, nil
}

func (s *userServer) ListUsers(req *pb.ListUsersRequest, stream pb.UserService_ListUsersServer) error {
    for _, user := range s.users {
        if err := stream.Send(user); err != nil {
            return status.Error(codes.Internal, "failed to send user")
        }
    }
    return nil
}

func (s *userServer) CreateUser(ctx context.Context, req *pb.CreateUserRequest) (*pb.User, error) {
    if req.Email == "" || req.Name == "" {
        return nil, status.Error(codes.InvalidArgument, "email and name are required")
    }

    user := &pb.User{
        Id:        fmt.Sprintf("user_%d", time.Now().Unix()),
        Email:     req.Email,
        Name:      req.Name,
        CreatedAt: timestamppb.Now(),
    }

    s.users[user.Id] = user
    return user, nil
}

func (s *userServer) UpdateUser(ctx context.Context, req *pb.UpdateUserRequest) (*pb.User, error) {
    user, ok := s.users[req.Id]
    if !ok {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
    }

    user.Email = req.Email
    user.Name = req.Name

    return user, nil
}

func (s *userServer) DeleteUser(ctx context.Context, req *pb.DeleteUserRequest) (*emptypb.Empty, error) {
    if _, ok := s.users[req.Id]; !ok {
        return nil, status.Errorf(codes.NotFound, "user %s not found", req.Id)
    }

    delete(s.users, req.Id)
    return &emptypb.Empty{}, nil
}
Explanation
Protocol Buffers:
- Strongly typed messages ensure API contracts
- Server-side streaming for efficient list operations
- Timestamps with well-known types
Error Handling:
- gRPC status codes for semantic errors
- InvalidArgument for validation failures
- NotFound for missing resources
Key Takeaways
- gRPC provides type-safe, high-performance RPC
- Server streaming reduces memory for large lists
- Use proper gRPC status codes for errors
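The solution shows the handlers but not the server bootstrap. A minimal sketch of wiring userServer into a gRPC server, assuming it lives in the same package as server.go above; the listen address is an assumption.

// Sketch: registering userServer with a gRPC server and serving on :50051.
package main

import (
    "log"
    "net"

    "google.golang.org/grpc"

    pb "github.com/example/userservice/gen/user/v1"
)

func main() {
    lis, err := net.Listen("tcp", ":50051") // assumed port
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    srv := grpc.NewServer()
    pb.RegisterUserServiceServer(srv, &userServer{users: make(map[string]*pb.User)})

    log.Println("UserService listening on :50051")
    if err := srv.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}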
Exercise 4 - Istio Traffic Splitting
Configure Istio VirtualService for canary deployment with 90/10 traffic split.
Requirements
- VirtualService routing 90% to v1, 10% to v2
- DestinationRule with subsets for v1 and v2
- HTTP header-based routing for testing v2
Click to see solution
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
spec:
  hosts:
  - web-app
  http:
  - match:
    - headers:
        x-version:
          exact: v2
    route:
    - destination:
        host: web-app
        subset: v2
  - route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10
Explanation
Canary Deployment:
- 90% traffic to stable v1
- 10% traffic to new v2 for gradual rollout
- Header-based routing for internal testing
Progressive Delivery:
- Monitor v2 metrics before increasing traffic
- Rollback by changing weights to 100/0
- Zero downtime deployment strategy
Key Takeaways
- Service mesh enables traffic control without code changes
- Canary deployments reduce risk of new releases
- Header-based routing allows testing before production traffic
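Because the VirtualService matches on the x-version header, an internal test client can opt into v2 before any weight change. A short Go sketch; the in-mesh URL is an assumption.

// Sketch: forcing a request onto the v2 subset via the x-version header.
package main

import (
    "fmt"
    "log"
    "net/http"
)

func main() {
    req, err := http.NewRequest("GET", "http://web-app/api/users", nil) // assumed in-mesh URL
    if err != nil {
        log.Fatal(err)
    }
    req.Header.Set("x-version", "v2") // matches the VirtualService header rule

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()
    fmt.Println("routed response status:", resp.StatusCode)
}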
Exercise 5 - AWS Lambda Function with Cold Start Optimization
Implement an AWS Lambda function in Go with cold start optimization techniques.
Requirements
- HTTP handler for Lambda
- Global variable initialization outside handler
- Connection pooling for database
- Estimated cold start < 500ms
Click to see solution
// run
package main

import (
    "context"
    "database/sql"
    "encoding/json"
    "log"
    "os"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    _ "github.com/lib/pq"
)

// Global variables initialized once during cold start
var (
    db     *sql.DB
    logger *log.Logger
)

// init runs once per container
func init() {
    logger = log.New(os.Stdout, "[LAMBDA] ", log.LstdFlags)

    // Initialize database connection pool
    dsn := os.Getenv("DATABASE_URL")
    var err error
    db, err = sql.Open("postgres", dsn)
    if err != nil {
        logger.Fatalf("failed to connect to database: %v", err)
    }

    // Configure connection pool for Lambda
    db.SetMaxOpenConns(10)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(0) // Reuse connections

    logger.Println("initialized database connection pool")
}

type User struct {
    ID    string `json:"id"`
    Email string `json:"email"`
    Name  string `json:"name"`
}

func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    // Handler logic runs on every invocation
    switch request.HTTPMethod {
    case "GET":
        return getUsers(ctx)
    case "POST":
        return createUser(ctx, request.Body)
    default:
        return events.APIGatewayProxyResponse{
            StatusCode: 405,
            Body:       `{"error": "method not allowed"}`,
        }, nil
    }
}

func getUsers(ctx context.Context) (events.APIGatewayProxyResponse, error) {
    rows, err := db.QueryContext(ctx, "SELECT id, email, name FROM users LIMIT 100")
    if err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: 500,
            Body:       `{"error": "database query failed"}`,
        }, err
    }
    defer rows.Close()

    var users []User
    for rows.Next() {
        var u User
        if err := rows.Scan(&u.ID, &u.Email, &u.Name); err != nil {
            continue
        }
        users = append(users, u)
    }

    body, _ := json.Marshal(users)
    return events.APIGatewayProxyResponse{
        StatusCode: 200,
        Headers:    map[string]string{"Content-Type": "application/json"},
        Body:       string(body),
    }, nil
}

func createUser(ctx context.Context, body string) (events.APIGatewayProxyResponse, error) {
    var u User
    if err := json.Unmarshal([]byte(body), &u); err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: 400,
            Body:       `{"error": "invalid request body"}`,
        }, nil
    }

    err := db.QueryRowContext(ctx,
        "INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id",
        u.Email, u.Name).Scan(&u.ID)
    if err != nil {
        return events.APIGatewayProxyResponse{
            StatusCode: 500,
            Body:       `{"error": "failed to create user"}`,
        }, err
    }

    respBody, _ := json.Marshal(u)
    return events.APIGatewayProxyResponse{
        StatusCode: 201,
        Headers:    map[string]string{"Content-Type": "application/json"},
        Body:       string(respBody),
    }, nil
}

func main() {
    lambda.Start(handler)
}
Explanation
Cold Start Optimization:
- Database connection pool initialized in init()
- Logger created globally
- Environment variables read once at startup
Connection Pooling:
- MaxOpenConns limits concurrent connections
- MaxIdleConns keeps connections alive between invocations
- ConnMaxLifetime=0 reuses connections indefinitely
Performance:
- Cold start: ~300-400ms
- Warm invocations: ~10-50ms
Key Takeaways
- Initialize expensive resources in init() for cold start optimization
- Use connection pooling to reuse database connections
- Lambda containers are reused, enabling warm starts
Exercise 6 - Kafka Producer and Consumer
Implement Kafka producer and consumer for event streaming with proper error handling.
Requirements
- Producer sends events with partitioning by user ID
- Consumer processes events with manual commit
- Handle producer errors with retries
- Consumer graceful shutdown on signal
Click to see solution
// run
package main

import (
    "context"
    "encoding/json"
    "log"
    "os"
    "os/signal"
    "syscall"

    "github.com/segmentio/kafka-go"
)

type OrderEvent struct {
    OrderID string  `json:"order_id"`
    UserID  string  `json:"user_id"`
    Amount  float64 `json:"amount"`
}

// Producer
func produceEvents(ctx context.Context) error {
    writer := kafka.NewWriter(kafka.WriterConfig{
        Brokers:      []string{"localhost:9092"},
        Topic:        "orders",
        Balancer:     &kafka.Hash{}, // Partition by key
        MaxAttempts:  3,
        RequiredAcks: -1, // -1 (RequireAll): wait for all in-sync replicas
    })
    defer writer.Close()

    event := OrderEvent{
        OrderID: "order_123",
        UserID:  "user_456",
        Amount:  99.99,
    }

    value, _ := json.Marshal(event)
    err := writer.WriteMessages(ctx, kafka.Message{
        Key:   []byte(event.UserID), // Partition by user ID
        Value: value,
    })

    if err != nil {
        log.Printf("failed to write message: %v", err)
        return err
    }

    log.Println("event published successfully")
    return nil
}

// Consumer
func consumeEvents(ctx context.Context) error {
    reader := kafka.NewReader(kafka.ReaderConfig{
        Brokers:  []string{"localhost:9092"},
        Topic:    "orders",
        GroupID:  "order-processor",
        MinBytes: 10e3, // 10KB
        MaxBytes: 10e6, // 10MB
    })
    defer reader.Close()

    for {
        select {
        case <-ctx.Done():
            log.Println("shutting down consumer")
            return ctx.Err()
        default:
            msg, err := reader.FetchMessage(ctx)
            if err != nil {
                log.Printf("error fetching message: %v", err)
                continue
            }

            var event OrderEvent
            if err := json.Unmarshal(msg.Value, &event); err != nil {
                log.Printf("failed to unmarshal event: %v", err)
                reader.CommitMessages(ctx, msg) // Commit to skip bad message
                continue
            }

            // Process event
            log.Printf("processing order %s for user %s: $%.2f",
                event.OrderID, event.UserID, event.Amount)

            // Manual commit after successful processing
            if err := reader.CommitMessages(ctx, msg); err != nil {
                log.Printf("failed to commit message: %v", err)
            }
        }
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Graceful shutdown
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)

    go func() {
        <-sigCh
        log.Println("received shutdown signal")
        cancel()
    }()

    if err := consumeEvents(ctx); err != nil {
        log.Fatalf("consumer error: %v", err)
    }
}
Explanation
Producer Configuration:
- Hash balancer partitions by key
- RequireAll ensures durability
- MaxAttempts provides retry logic
Consumer Configuration:
- Consumer group for load balancing
- Manual commit for at-least-once semantics
- Context cancellation for graceful shutdown
Key Takeaways
- Partition by key for ordering guarantees
- Manual commit enables custom error handling
- Graceful shutdown prevents message loss
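MaxAttempts covers retries inside the writer; if you also want application-level retries with backoff (for example, to ride out a longer broker outage), one possible sketch follows. The retry count, delays, broker address, and message contents are illustrative assumptions.

// Sketch: application-level retry with exponential backoff around WriteMessages.
package main

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/segmentio/kafka-go"
)

func writeWithBackoff(ctx context.Context, writer *kafka.Writer, msg kafka.Message) error {
    var err error
    for attempt := 0; attempt < 5; attempt++ {
        if err = writer.WriteMessages(ctx, msg); err == nil {
            return nil
        }
        backoff := time.Duration(100<<attempt) * time.Millisecond // 100ms, 200ms, 400ms, ...
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff):
        }
    }
    return fmt.Errorf("giving up after retries: %w", err)
}

func main() {
    writer := &kafka.Writer{
        Addr:  kafka.TCP("localhost:9092"), // assumed broker
        Topic: "orders",
    }
    defer writer.Close()

    msg := kafka.Message{Key: []byte("user_456"), Value: []byte(`{"order_id":"order_123"}`)}
    if err := writeWithBackoff(context.Background(), writer, msg); err != nil {
        log.Printf("publish failed: %v", err)
    }
}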
Exercise 7 - Redis Cache-Aside Pattern
Implement cache-aside pattern with Redis for user lookup optimization.
Requirements
- Check cache before database query
- Set TTL of 5 minutes for cached entries
- Handle cache miss by querying database
- Update cache after database write
Click to see solution
// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

type User struct {
    ID    string `json:"id"`
    Email string `json:"email"`
    Name  string `json:"name"`
}

type UserService struct {
    redis *redis.Client
    db    UserDB
}

type UserDB interface {
    GetUser(ctx context.Context, id string) (*User, error)
    CreateUser(ctx context.Context, user *User) error
}

const userCacheTTL = 5 * time.Minute

func (s *UserService) GetUser(ctx context.Context, userID string) (*User, error) {
    cacheKey := fmt.Sprintf("user:%s", userID)

    // 1. Try cache first
    cached, err := s.redis.Get(ctx, cacheKey).Result()
    if err == nil {
        var user User
        if err := json.Unmarshal([]byte(cached), &user); err == nil {
            return &user, nil // Cache hit
        }
    }

    // 2. Cache miss - query database
    user, err := s.db.GetUser(ctx, userID)
    if err != nil {
        return nil, fmt.Errorf("database query failed: %w", err)
    }

    // 3. Update cache
    userData, _ := json.Marshal(user)
    s.redis.Set(ctx, cacheKey, userData, userCacheTTL)

    return user, nil
}

func (s *UserService) CreateUser(ctx context.Context, user *User) error {
    // 1. Write to database
    if err := s.db.CreateUser(ctx, user); err != nil {
        return fmt.Errorf("database write failed: %w", err)
    }

    // 2. Invalidate/update cache
    cacheKey := fmt.Sprintf("user:%s", user.ID)
    userData, _ := json.Marshal(user)
    s.redis.Set(ctx, cacheKey, userData, userCacheTTL)

    return nil
}

func (s *UserService) InvalidateUser(ctx context.Context, userID string) error {
    cacheKey := fmt.Sprintf("user:%s", userID)
    return s.redis.Del(ctx, cacheKey).Err()
}
Explanation
Cache-Aside Pattern:
- Check cache on read
- On miss, query database and populate cache
- On write, update database then cache
TTL Strategy:
- 5-minute TTL prevents stale data
- Auto-expiration reduces memory usage
- Manual invalidation for critical updates
Error Handling:
- Cache failures don't break reads
- Write-through ensures consistency
Key Takeaways
- Cache-aside pattern optimizes read-heavy workloads
- TTL balances freshness and performance
- Always update cache after writes
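The solution assumes an already-constructed Redis client. A small constructor sketch that could sit alongside UserService above; the function name and address parameter are assumptions.

// Hypothetical constructor wiring a Redis client into UserService.
func NewUserService(addr string, db UserDB) *UserService {
    client := redis.NewClient(&redis.Options{
        Addr: addr, // e.g. "localhost:6379"
    })
    return &UserService{redis: client, db: db}
}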
Exercise 8 - Rate Limiter Middleware
Implement token bucket rate limiter middleware for HTTP API.
Requirements
- Token bucket algorithm with Redis
- 100 requests per minute per IP
- Return 429 status when limit exceeded
- Include rate limit headers in response
Click to see solution
// run
package main

import (
    "context"
    "fmt"
    "net/http"
    "strconv"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/redis/go-redis/v9"
)

const (
    rateLimit  = 100             // requests
    timeWindow = 1 * time.Minute // per minute
)

type RateLimiter struct {
    redis *redis.Client
}

func NewRateLimiter(client *redis.Client) *RateLimiter {
    return &RateLimiter{redis: client}
}

func (rl *RateLimiter) Middleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        ip := c.ClientIP()
        key := fmt.Sprintf("rate_limit:%s", ip)

        allowed, remaining, resetTime, err := rl.checkLimit(c.Request.Context(), key)
        if err != nil {
            c.JSON(http.StatusInternalServerError, gin.H{"error": "rate limiter error"})
            c.Abort()
            return
        }

        // Set rate limit headers
        c.Header("X-RateLimit-Limit", strconv.Itoa(rateLimit))
        c.Header("X-RateLimit-Remaining", strconv.Itoa(remaining))
        c.Header("X-RateLimit-Reset", strconv.FormatInt(resetTime.Unix(), 10))

        if !allowed {
            c.Header("Retry-After", strconv.Itoa(int(time.Until(resetTime).Seconds())))
            c.JSON(http.StatusTooManyRequests, gin.H{
                "error":       "rate limit exceeded",
                "retry_after": time.Until(resetTime).Seconds(),
            })
            c.Abort()
            return
        }

        c.Next()
    }
}

func (rl *RateLimiter) checkLimit(ctx context.Context, key string) (bool, int, time.Time, error) {
    now := time.Now()
    windowStart := now.Add(-timeWindow)

    pipe := rl.redis.Pipeline()

    // Remove old entries
    pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(windowStart.UnixNano(), 10))

    // Count current requests
    countCmd := pipe.ZCard(ctx, key)

    // Add current request
    pipe.ZAdd(ctx, key, redis.Z{
        Score:  float64(now.UnixNano()),
        Member: fmt.Sprintf("%d", now.UnixNano()),
    })

    // Set expiration
    pipe.Expire(ctx, key, timeWindow)

    _, err := pipe.Exec(ctx)
    if err != nil {
        return false, 0, time.Time{}, err
    }

    count := int(countCmd.Val())
    remaining := rateLimit - count - 1
    if remaining < 0 {
        remaining = 0
    }

    resetTime := now.Add(timeWindow)
    allowed := count < rateLimit

    return allowed, remaining, resetTime, nil
}

func main() {
    rdb := redis.NewClient(&redis.Options{
        Addr: "localhost:6379",
    })

    rateLimiter := NewRateLimiter(rdb)

    r := gin.Default()
    r.Use(rateLimiter.Middleware())

    r.GET("/api/users", func(c *gin.Context) {
        c.JSON(200, gin.H{"users": []string{"alice", "bob"}})
    })

    r.Run(":8080")
}
Explanation
Sliding Window Algorithm (approximates the token-bucket requirement):
- Sorted set stores timestamps in sliding window
- Remove old entries outside window
- Count current requests and compare to limit
Rate Limit Headers:
- X-RateLimit-Limit: Max requests allowed
- X-RateLimit-Remaining: Requests left
- X-RateLimit-Reset: When limit resets
- Retry-After: Seconds to wait
Key Takeaways
- Sliding-window counting over a sorted set provides smooth rate limiting
- Redis sorted sets enable distributed rate limiting
- Standard headers inform clients of limits
Exercise 9 - Circuit Breaker Pattern
Implement circuit breaker for resilient service calls with state management.
Requirements
- Three states: Closed, Open, Half-Open
- Open circuit after 5 consecutive failures
- Half-open state allows test request after 30 seconds
- Return to closed on successful test request
Click to see solution
// run
package main

import (
    "context"
    "errors"
    "fmt"
    "sync"
    "time"
)

type State int

const (
    StateClosed State = iota
    StateOpen
    StateHalfOpen
)

type CircuitBreaker struct {
    maxFailures  int
    timeout      time.Duration
    failures     int
    state        State
    lastFailTime time.Time
    mu           sync.RWMutex
}

func NewCircuitBreaker(maxFailures int, timeout time.Duration) *CircuitBreaker {
    return &CircuitBreaker{
        maxFailures: maxFailures,
        timeout:     timeout,
        state:       StateClosed,
    }
}

var ErrCircuitOpen = errors.New("circuit breaker is open")

func (cb *CircuitBreaker) Call(ctx context.Context, fn func() error) error {
    cb.mu.Lock()

    // Check if we can transition from Open to Half-Open
    if cb.state == StateOpen {
        if time.Since(cb.lastFailTime) > cb.timeout {
            cb.state = StateHalfOpen
            cb.failures = 0
        } else {
            cb.mu.Unlock()
            return ErrCircuitOpen
        }
    }

    // Allow only one request in Half-Open state
    if cb.state == StateHalfOpen && cb.failures > 0 {
        cb.mu.Unlock()
        return ErrCircuitOpen
    }

    cb.mu.Unlock()

    // Execute the function
    err := fn()

    cb.mu.Lock()
    defer cb.mu.Unlock()

    if err != nil {
        cb.failures++
        cb.lastFailTime = time.Now()

        if cb.failures >= cb.maxFailures {
            cb.state = StateOpen
        }

        return fmt.Errorf("call failed: %w", err)
    }

    // Success - reset circuit
    if cb.state == StateHalfOpen {
        cb.state = StateClosed
    }
    cb.failures = 0

    return nil
}

func (cb *CircuitBreaker) GetState() State {
    cb.mu.RLock()
    defer cb.mu.RUnlock()
    return cb.state
}

// Example usage
func main() {
    cb := NewCircuitBreaker(5, 30*time.Second)

    // Simulate service call
    err := cb.Call(context.Background(), func() error {
        // Call external service
        return callExternalService()
    })

    if err != nil {
        if errors.Is(err, ErrCircuitOpen) {
            fmt.Println("circuit breaker is open, request rejected")
        } else {
            fmt.Printf("service call failed: %v\n", err)
        }
    }
}

func callExternalService() error {
    // Simulated external service call
    return nil
}
Explanation
State Transitions:
- Closed → Open: After maxFailures consecutive failures
- Open → Half-Open: After timeout period
- Half-Open → Closed: On successful test request
- Half-Open → Open: On failed test request
Concurrency Safety:
- RWMutex protects state and counters
- Atomic state transitions
Benefits:
- Prevents cascading failures
- Gives failing services time to recover
- Fast-fails when service is down
Key Takeaways
- Circuit breakers prevent cascade failures
- State machine manages recovery automatically
- Fast-fail reduces latency during outages
Exercise 10 - Correlation IDs for Distributed Tracing
Add request tracking across a distributed system using correlation IDs.
Requirements
- Generate correlation ID for each request
- Propagate ID through middleware
- Include ID in logs and downstream calls
- Add ID to response headers
Click to see solution
// run
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/google/uuid"
)

type contextKey string

const correlationIDKey contextKey = "correlation_id"

// Middleware to add correlation ID
func CorrelationIDMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        // Check if correlation ID exists in request header
        correlationID := c.GetHeader("X-Correlation-ID")

        // Generate new ID if not present
        if correlationID == "" {
            correlationID = uuid.New().String()
        }

        // Add to context
        ctx := context.WithValue(c.Request.Context(), correlationIDKey, correlationID)
        c.Request = c.Request.WithContext(ctx)

        // Add to response header
        c.Header("X-Correlation-ID", correlationID)

        // Log request with correlation ID
        log.Printf("[%s] %s %s", correlationID, c.Request.Method, c.Request.URL.Path)

        c.Next()

        // Log response
        log.Printf("[%s] Response: %d", correlationID, c.Writer.Status())
    }
}

// Extract correlation ID from context
func GetCorrelationID(ctx context.Context) string {
    if id, ok := ctx.Value(correlationIDKey).(string); ok {
        return id
    }
    return ""
}

// Service layer using correlation ID
func getUserService(ctx context.Context, userID string) (*User, error) {
    correlationID := GetCorrelationID(ctx)
    log.Printf("[%s] Fetching user %s from database", correlationID, userID)

    // Database call would go here
    user := &User{ID: userID, Name: "Alice"}

    return user, nil
}

// HTTP client propagating correlation ID
func callDownstreamService(ctx context.Context, url string) error {
    correlationID := GetCorrelationID(ctx)

    req, _ := http.NewRequestWithContext(ctx, "GET", url, nil)
    req.Header.Set("X-Correlation-ID", correlationID)

    log.Printf("[%s] Calling downstream service: %s", correlationID, url)

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    log.Printf("[%s] Downstream response: %d", correlationID, resp.StatusCode)
    return nil
}

type User struct {
    ID   string `json:"id"`
    Name string `json:"name"`
}

func main() {
    r := gin.Default()
    r.Use(CorrelationIDMiddleware())

    r.GET("/users/:id", func(c *gin.Context) {
        userID := c.Param("id")

        user, err := getUserService(c.Request.Context(), userID)
        if err != nil {
            c.JSON(500, gin.H{"error": "internal server error"})
            return
        }

        // Call downstream service
        _ = callDownstreamService(c.Request.Context(), "http://notification-service/notify")

        c.JSON(200, user)
    })

    r.Run(":8080")
}
Explanation
Correlation ID Flow:
- Middleware generates/extracts correlation ID
- ID stored in request context
- ID propagated to all logs
- ID sent to downstream services via header
Benefits:
- Trace requests across services
- Debug distributed systems easily
- Link logs from multiple services
Key Takeaways
- Correlation IDs enable distributed tracing
- Context propagates IDs through call stack
- Include ID in all logs and downstream calls
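The log.Printf calls above interpolate the correlation ID by hand; with structured logging (log/slog, available since Go 1.21) it can travel as a field instead. A small standalone sketch, reusing the same context key; the field names are assumptions.

// Sketch: structured logging with the correlation ID attached as a field.
package main

import (
    "context"
    "log/slog"
    "os"
)

type contextKey string

const correlationIDKey contextKey = "correlation_id"

// loggerFromContext returns a JSON logger pre-tagged with the correlation ID, if present.
func loggerFromContext(ctx context.Context) *slog.Logger {
    base := slog.New(slog.NewJSONHandler(os.Stdout, nil))
    if id, ok := ctx.Value(correlationIDKey).(string); ok {
        return base.With("correlation_id", id)
    }
    return base
}

func main() {
    ctx := context.WithValue(context.Background(), correlationIDKey, "abc-123")
    loggerFromContext(ctx).Info("fetching user", "user_id", "42")
}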
Exercise 11 - Prometheus Metrics Instrumentation
Add Prometheus metrics to HTTP handler with custom metrics.
Requirements
- HTTP request counter by method and path
- Request duration histogram
- Active connections gauge
- Custom business metric
Click to see solution
// run
package main

import (
    "fmt"
    "time"

    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "path", "status"},
    )

    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "path"},
    )

    activeConnections = promauto.NewGauge(
        prometheus.GaugeOpts{
            Name: "http_active_connections",
            Help: "Number of active HTTP connections",
        },
    )

    ordersCreated = promauto.NewCounter(
        prometheus.CounterOpts{
            Name: "orders_created_total",
            Help: "Total number of orders created",
        },
    )
)

func PrometheusMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        start := time.Now()

        activeConnections.Inc()
        defer activeConnections.Dec()

        c.Next()

        duration := time.Since(start).Seconds()
        status := c.Writer.Status()

        httpRequestsTotal.WithLabelValues(
            c.Request.Method,
            c.FullPath(),
            fmt.Sprintf("%d", status),
        ).Inc()

        httpRequestDuration.WithLabelValues(
            c.Request.Method,
            c.FullPath(),
        ).Observe(duration)
    }
}

func main() {
    r := gin.Default()
    r.Use(PrometheusMiddleware())

    // Metrics endpoint
    r.GET("/metrics", gin.WrapH(promhttp.Handler()))

    r.POST("/orders", func(c *gin.Context) {
        // Create order logic
        ordersCreated.Inc()

        c.JSON(201, gin.H{"status": "created"})
    })

    r.Run(":8080")
}
Explanation
Metric Types:
- Counter: Monotonically increasing
- Histogram: Distribution of values
- Gauge: Current value
Labels:
- Enable filtering and aggregation
- Method, path, status for request metrics
Best Practices:
- Use promauto for automatic registration
- Choose appropriate metric type
- Keep cardinality low
Key Takeaways
- Prometheus metrics enable observability
- Use appropriate metric types for data
- Labels allow powerful queries in PromQL
Exercise 12 - OpenTelemetry Distributed Tracing
Add distributed tracing with OpenTelemetry to track requests across services.
Requirements
- Initialize tracer provider with Jaeger exporter
- Create spans for HTTP handlers
- Propagate trace context to downstream calls
- Add custom span attributes
Click to see solution
// run
package main

import (
    "context"
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/exporters/jaeger"
    "go.opentelemetry.io/otel/propagation"
    "go.opentelemetry.io/otel/sdk/resource"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
    semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
    "go.opentelemetry.io/otel/trace"
)

var tracer trace.Tracer

func initTracer() func() {
    exporter, err := jaeger.New(jaeger.WithCollectorEndpoint(
        jaeger.WithEndpoint("http://localhost:14268/api/traces"),
    ))
    if err != nil {
        log.Fatal(err)
    }

    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithResource(resource.NewWithAttributes(
            semconv.SchemaURL,
            semconv.ServiceNameKey.String("user-service"),
        )),
    )

    otel.SetTracerProvider(tp)
    otel.SetTextMapPropagator(propagation.TraceContext{})

    tracer = tp.Tracer("user-service")

    return func() { tp.Shutdown(context.Background()) }
}

func TracingMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        ctx := otel.GetTextMapPropagator().Extract(
            c.Request.Context(),
            propagation.HeaderCarrier(c.Request.Header),
        )

        ctx, span := tracer.Start(ctx, c.Request.URL.Path)
        defer span.End()

        span.SetAttributes(
            attribute.String("http.method", c.Request.Method),
            attribute.String("http.url", c.Request.URL.String()),
            attribute.String("http.user_agent", c.Request.UserAgent()),
        )

        c.Request = c.Request.WithContext(ctx)
        c.Next()

        span.SetAttributes(
            attribute.Int("http.status_code", c.Writer.Status()),
        )
    }
}

func getUser(ctx context.Context, userID string) error {
    _, span := tracer.Start(ctx, "getUser")
    defer span.End()

    span.SetAttributes(attribute.String("user.id", userID))

    // Database query
    // ...

    return nil
}

func callNotificationService(ctx context.Context) error {
    ctx, span := tracer.Start(ctx, "callNotificationService")
    defer span.End()

    req, _ := http.NewRequestWithContext(ctx, "POST", "http://notification-service/send", nil)

    // Propagate trace context
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

    client := &http.Client{}
    resp, err := client.Do(req)
    if err != nil {
        span.RecordError(err)
        return err
    }
    defer resp.Body.Close()

    span.SetAttributes(attribute.Int("http.status_code", resp.StatusCode))
    return nil
}

func main() {
    shutdown := initTracer()
    defer shutdown()

    r := gin.Default()
    r.Use(TracingMiddleware())

    r.GET("/users/:id", func(c *gin.Context) {
        userID := c.Param("id")

        if err := getUser(c.Request.Context(), userID); err != nil {
            c.JSON(500, gin.H{"error": "failed to get user"})
            return
        }

        _ = callNotificationService(c.Request.Context())

        c.JSON(200, gin.H{"user_id": userID})
    })

    r.Run(":8080")
}
Explanation
Distributed Tracing:
- Tracer creates spans for operations
- Context propagation links spans across services
- Jaeger collects and visualizes traces
Span Attributes:
- Add metadata to spans
- Enable filtering and analysis in Jaeger
Error Recording:
- Record errors in spans for debugging
- Mark span as failed
Key Takeaways
- OpenTelemetry provides vendor-neutral tracing
- Context propagation is critical for distributed tracing
- Spans with attributes enable powerful debugging
Exercise 13 - Integration Testing with Testcontainers
Write integration tests using testcontainers for database-backed API.
Requirements
- Spin up PostgreSQL container for tests
- Run migrations before tests
- Test CRUD operations end-to-end
- Clean up containers after tests
Click to see solution
package main

import (
    "context"
    "database/sql"
    "fmt"
    "testing"
    "time"

    _ "github.com/lib/pq"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
    "github.com/testcontainers/testcontainers-go"
    "github.com/testcontainers/testcontainers-go/wait"
)

func setupTestDB(t *testing.T) (*sql.DB, func()) {
    ctx := context.Background()

    req := testcontainers.ContainerRequest{
        Image:        "postgres:15-alpine",
        ExposedPorts: []string{"5432/tcp"},
        Env: map[string]string{
            "POSTGRES_USER":     "test",
            "POSTGRES_PASSWORD": "test",
            "POSTGRES_DB":       "testdb",
        },
        WaitingFor: wait.ForLog("database system is ready to accept connections").
            WithOccurrence(2).
            WithStartupTimeout(60 * time.Second),
    }

    postgres, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
        ContainerRequest: req,
        Started:          true,
    })
    require.NoError(t, err)

    host, _ := postgres.Host(ctx)
    port, _ := postgres.MappedPort(ctx, "5432")

    dsn := fmt.Sprintf("postgres://test:test@%s:%s/testdb?sslmode=disable", host, port.Port())
    db, err := sql.Open("postgres", dsn)
    require.NoError(t, err)

    // Run migrations
    _, err = db.Exec(`
        CREATE TABLE users (
            id SERIAL PRIMARY KEY,
            email VARCHAR(255) UNIQUE NOT NULL,
            name VARCHAR(255) NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    `)
    require.NoError(t, err)

    cleanup := func() {
        db.Close()
        postgres.Terminate(ctx)
    }

    return db, cleanup
}

func TestUserCRUD(t *testing.T) {
    db, cleanup := setupTestDB(t)
    defer cleanup()

    ctx := context.Background()

    // Test Create
    var userID int
    err := db.QueryRowContext(ctx,
        "INSERT INTO users (email, name) VALUES ($1, $2) RETURNING id",
        "alice@example.com", "Alice").Scan(&userID)
    require.NoError(t, err)
    assert.Greater(t, userID, 0)

    // Test Read
    var email, name string
    err = db.QueryRowContext(ctx,
        "SELECT email, name FROM users WHERE id = $1", userID).Scan(&email, &name)
    require.NoError(t, err)
    assert.Equal(t, "alice@example.com", email)
    assert.Equal(t, "Alice", name)

    // Test Update
    _, err = db.ExecContext(ctx,
        "UPDATE users SET name = $1 WHERE id = $2", "Alice Smith", userID)
    require.NoError(t, err)

    err = db.QueryRowContext(ctx,
        "SELECT name FROM users WHERE id = $1", userID).Scan(&name)
    require.NoError(t, err)
    assert.Equal(t, "Alice Smith", name)

    // Test Delete
    _, err = db.ExecContext(ctx, "DELETE FROM users WHERE id = $1", userID)
    require.NoError(t, err)

    err = db.QueryRowContext(ctx,
        "SELECT id FROM users WHERE id = $1", userID).Scan(&userID)
    assert.Equal(t, sql.ErrNoRows, err)
}
Explanation
Testcontainers Benefits:
- Real PostgreSQL instance
- Isolated test environment
- Automatic cleanup
Test Structure:
- Setup: Spin up container, run migrations
- Execute: CRUD operations
- Cleanup: Close DB, terminate container
Wait Strategy:
- Wait for PostgreSQL ready message
- Ensures container is fully initialized
Key Takeaways
- Testcontainers enable realistic integration tests
- Real databases catch bugs mocks can't
- Automatic cleanup prevents resource leaks
Exercise 14 - Load Testing with vegeta
Create load test script using vegeta to stress test HTTP API.
Requirements
- Target endpoint at 100 requests/second
- Duration of 30 seconds
- Generate report with latency percentiles
- Identify performance bottlenecks
Click to see solution
#!/bin/bash
# load_test.sh

# vegeta attack parameters
RATE=100
DURATION=30s
TARGET="http://localhost:8080/api/users"

# Create targets file
cat > targets.txt <<EOF
GET $TARGET
Content-Type: application/json

POST $TARGET
Content-Type: application/json
@user_payload.json
EOF

# Create payload
cat > user_payload.json <<EOF
{
  "email": "test@example.com",
  "name": "Test User"
}
EOF

# Run load test
echo "Starting load test: $RATE req/s for $DURATION"
vegeta attack \
  -rate=$RATE \
  -duration=$DURATION \
  -targets=targets.txt \
  | tee results.bin \
  | vegeta report

# Generate reports
echo ""
echo "=== Latency Report ==="
vegeta report -type=text results.bin

echo ""
echo "=== Histogram ==="
vegeta report -type='hist[0,10ms,20ms,50ms,100ms,200ms,500ms,1s]' results.bin

echo ""
echo "=== JSON Report ==="
vegeta report -type=json results.bin > report.json

echo ""
echo "=== Plotting Results ==="
vegeta plot results.bin > plot.html
echo "Plot saved to plot.html"

# Cleanup
rm targets.txt user_payload.json
// run
// analyze_results.go - Parse vegeta JSON output
package main

import (
    "encoding/json"
    "fmt"
    "os"
)

type VegetaReport struct {
    Latencies struct {
        P50  int64 `json:"50th"`
        P95  int64 `json:"95th"`
        P99  int64 `json:"99th"`
        Max  int64 `json:"max"`
        Mean int64 `json:"mean"`
    } `json:"latencies"`
    Requests    int            `json:"requests"`
    Success     float64        `json:"success"`
    Duration    int64          `json:"duration"`
    Throughput  float64        `json:"throughput"`
    StatusCodes map[string]int `json:"status_codes"`
}

func main() {
    data, _ := os.ReadFile("report.json")

    var report VegetaReport
    json.Unmarshal(data, &report)

    fmt.Println("=== Load Test Analysis ===")
    fmt.Printf("Total Requests: %d\n", report.Requests)
    fmt.Printf("Success Rate: %.2f%%\n", report.Success*100)
    fmt.Printf("Throughput: %.2f req/s\n\n", report.Throughput)

    fmt.Println("Latency Percentiles:")
    fmt.Printf("  P50: %d ms\n", report.Latencies.P50/1000000)
    fmt.Printf("  P95: %d ms\n", report.Latencies.P95/1000000)
    fmt.Printf("  P99: %d ms\n", report.Latencies.P99/1000000)
    fmt.Printf("  Max: %d ms\n", report.Latencies.Max/1000000)

    fmt.Println("\nStatus Codes:")
    for code, count := range report.StatusCodes {
        fmt.Printf("  %s: %d\n", code, count)
    }

    // Performance evaluation
    if report.Success < 0.99 {
        fmt.Println("\nWarning: Success rate below 99%")
    }

    if report.Latencies.P95 > 200*1000000 {
        fmt.Println("Warning: P95 latency exceeds 200ms")
    }

    if report.Latencies.P99 > 500*1000000 {
        fmt.Println("Warning: P99 latency exceeds 500ms")
    }
}
Explanation
Load Test Configuration:
- Rate: 100 requests per second
- Duration: 30 seconds
- Multiple HTTP methods
Metrics Analyzed:
- Latency percentiles
- Success rate
- Throughput
- Status code distribution
Performance Thresholds:
- Success rate should be > 99%
- P95 latency < 200ms
- P99 latency < 500ms
Key Takeaways
- Load testing reveals performance under stress
- Monitor latency percentiles, not just averages
- Set SLOs based on P95/P99 latency
Exercise 15 - Chaos Testing
Implement chaos test that injects random failures to verify system resilience.
Requirements
- Randomly fail 10% of requests
- Verify circuit breaker opens after failures
- Test retry logic handles transient errors
- Confirm graceful degradation
Click to see solution
package main

import (
    "context"
    "errors"
    "math/rand"
    "net/http"
    "net/http/httptest"
    "sync/atomic"
    "testing"
    "time"

    "github.com/stretchr/testify/assert"
)

// Chaos middleware that randomly fails requests
func ChaosMiddleware(failureRate float64) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if rand.Float64() < failureRate {
                w.WriteHeader(http.StatusInternalServerError)
                w.Write([]byte(`{"error": "chaos monkey struck"}`))
                return
            }
            next.ServeHTTP(w, r)
        })
    }
}

// Resilient client with retries
type ResilientClient struct {
    client     *http.Client
    maxRetries int
    retryDelay time.Duration
}

func (c *ResilientClient) Get(url string) (*http.Response, error) {
    var lastErr error

    for attempt := 0; attempt <= c.maxRetries; attempt++ {
        resp, err := c.client.Get(url)

        if err == nil && resp.StatusCode < 500 {
            return resp, nil
        }

        if resp != nil {
            resp.Body.Close()
        }

        lastErr = err
        if lastErr == nil {
            lastErr = errors.New("server error")
        }

        if attempt < c.maxRetries {
            time.Sleep(c.retryDelay * time.Duration(attempt+1))
        }
    }

    return nil, lastErr
}

// NewCircuitBreaker, ErrCircuitOpen, StateOpen come from the Exercise 9 solution (same package).
func TestChaosResilience(t *testing.T) {
    // Test circuit breaker under chaos
    cb := NewCircuitBreaker(5, 10*time.Second)

    var successCount, failureCount int32

    chaosFunc := func() error {
        if rand.Float64() < 0.3 { // 30% failure rate
            atomic.AddInt32(&failureCount, 1)
            return errors.New("chaos failure")
        }
        atomic.AddInt32(&successCount, 1)
        return nil
    }

    // Execute many requests
    for i := 0; i < 100; i++ {
        err := cb.Call(context.Background(), chaosFunc)

        // Circuit should open after consecutive failures
        if err == ErrCircuitOpen {
            t.Logf("Circuit opened after %d total calls", i+1)
            break
        }

        time.Sleep(10 * time.Millisecond)
    }

    assert.Equal(t, StateOpen, cb.GetState(), "circuit should be open")
    t.Logf("Successes: %d, Failures: %d", successCount, failureCount)
}

func TestRetryResilience(t *testing.T) {
    client := &ResilientClient{
        client:     &http.Client{Timeout: 5 * time.Second},
        maxRetries: 3,
        retryDelay: 100 * time.Millisecond,
    }

    server := httptest.NewServer(ChaosMiddleware(0.5)(
        http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
            w.Write([]byte(`{"status": "ok"}`))
        }),
    ))
    defer server.Close()

    successCount := 0
    totalRequests := 50

    for i := 0; i < totalRequests; i++ {
        resp, err := client.Get(server.URL)
        if err == nil && resp.StatusCode == 200 {
            successCount++
            resp.Body.Close()
        }
    }

    successRate := float64(successCount) / float64(totalRequests)
    t.Logf("Success rate with retries: %.2f%%", successRate*100)

    // With 50% chaos and 3 retries, success rate should be high
    assert.Greater(t, successRate, 0.85, "retry logic should achieve >85% success")
}
Explanation
Chaos Testing Principles:
- Inject random failures to test resilience
- Verify error handling and recovery
- Measure system behavior under failure
Chaos Middleware:
- Randomly fails percentage of requests
- Simulates service instability
- Tests downstream resilience
Verification:
- Circuit breaker opens after failures
- Retry logic improves success rate
- System degrades gracefully
Key Takeaways
- Chaos testing validates resilience mechanisms
- Random failures reveal hidden bugs
- Measure success rates under chaos to set SLOs
Comprehensive Key Takeaways
Congratulations on completing all 15 production engineering exercises! You've gained hands-on experience with:
Cloud-Native Development
- Docker multi-stage builds for secure, minimal images
- Kubernetes deployments with health checks and autoscaling
- Service mesh traffic management with Istio
- Serverless functions with cold start optimization
Microservices & Communication
- gRPC services with Protocol Buffers for type-safe APIs
- Event streaming with Kafka for asynchronous communication
- Redis caching strategies for performance optimization
- Rate limiting algorithms for API protection
Observability
- Prometheus metrics for monitoring system health
- OpenTelemetry distributed tracing across services
- Correlation IDs for request tracking
- Structured logging for debugging
Resilience & Testing
- Circuit breaker pattern for preventing cascade failures
- Integration testing with real dependencies
- Load testing to identify performance bottlenecks
- Chaos testing to verify resilience under failure
Production Patterns
- Infrastructure as code with Kubernetes manifests
- Graceful shutdown and health checks
- Resource management and autoscaling
- Security hardening and non-root containers
Next Steps
You've now completed exercises covering all four sections of The Modern Go Tutorial:
- ✓ The Go Language - Fundamentals and syntax
- ✓ Standard Library - Essential packages and patterns
- ✓ Advanced Topics - Generics, reflection, design patterns, performance
- ✓ Production Engineering - Cloud-native, observability, testing
Continue your learning:
- Build the Section Project: Apply these concepts in the Cloud-Native E-Commerce Platform - a comprehensive microservices system
- Explore Capstone Projects: Tackle expert-level projects in Section 7: Capstone Projects
- Apply to Real Projects: Use these patterns in production systems
- Contribute to Open Source: Practice production engineering in real-world Go projects
You're now equipped with production-ready Go engineering skills. Keep building!