Error Handling in Go

Why This Matters - Building Robust, Reliable Systems

Error handling is not just about catching problems - it's about building reliable, maintainable systems that handle failures gracefully. Go's explicit error handling forces you to think about failure at every step, creating more resilient code.

Real-world impact: Consider a payment processing system. When a database connection fails, does your application crash? Or does it log the error, retry against a replica, and notify operations? The difference affects system reliability, user experience, and operational costs.

Business value: Proper error handling enables you to:

  • Build reliable systems that recover from failures gracefully
  • Provide clear debugging information for faster problem resolution
  • Implement graceful degradation when components fail
  • Create observable systems with comprehensive error tracking
  • Design predictable APIs that clearly communicate failure modes
  • Meet SLAs and reliability targets by anticipating and handling failures

System reliability: Go's error handling philosophy makes failures visible throughout your codebase, which reduces the silent failures that cause production issues.

Learning Objectives

By the end of this tutorial, you will be able to:

  • Understand Go's error handling philosophy and why explicit is better than implicit
  • Master the error interface and create custom error types
  • Implement error wrapping with proper context preservation
  • Use error inspection techniques (errors.Is() and errors.As())
  • Apply production-ready error handling patterns with logging and metrics
  • Design APIs that provide clear, actionable error information
  • Implement graceful degradation and recovery strategies
  • Avoid common error handling pitfalls that lead to production issues
  • Build comprehensive error tracking and monitoring systems

Core Concepts - Understanding Go's Error Philosophy

Explicit vs Implicit Error Handling

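Go deliberately avoids exceptions for ordinary failures. Instead, errors are explicit, ordinary values that functions return and callers are expected to check. (panic and recover exist, but they are reserved for unrecoverable situations such as programmer bugs.)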

The philosophy: Errors are values, not exceptional conditions. This means:

  • Errors are returned as ordinary function return values
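  • Callers check and handle errors explicitly; the compiler does not enforce this, but ignoring a returned error is a visible choice in the code
  • Error handling logic is visible in your code flow
  • There's no hidden control flow like try/catch/finally for ordinary errors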
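
As a minimal sketch of "errors are values", the hypothetical divide function below (not part of the original tutorial) reports failure through a second return value, and the caller decides what to do with it at the call site:

```go
package main

import (
	"errors"
	"fmt"
)

// divide is a hypothetical helper: it reports failure through its second
// return value instead of panicking.
func divide(a, b float64) (float64, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	// The failure path is visible right where the call happens.
	result, err := divide(10, 0)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("result:", result)
}
```

Nothing special happens when divide fails: the error is an ordinary value, and control continues on the next line of the caller.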

Why this matters: Compare how the same workflow looks in a language with exceptions:

// Java: Errors can come from anywhere without warning
try {
    processPayment(amount);
    sendReceipt();
    updateInventory();
    updateAccountBalance();
    // Any of these might throw - you must read documentation!
} catch (Exception e) {
    // What failed? Why? Is this recoverable?
    // What state are we in now?
    handleGenericError(e);
}

Problems with exceptions:

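  • Hidden control flow: Unchecked exceptions can propagate from any call without appearing in the calling code
  • State uncertainty: When an exception is thrown, what state is your data in?
  • All-or-nothing: A single try block treats the whole sequence as one unit, so a late failure (a receipt that fails to send) is hard to tell apart from an early one (a payment that never went through)
  • Generic handling: Catch blocks often handle disparate errors the same way
  • Performance: Throwing an exception typically costs more than returning a value, because the runtime captures a stack trace and unwinds the stack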

Go's explicit approach:

// Go: Each operation's potential for failure is explicit
err := processPayment(amount)
if err != nil {
    return fmt.Errorf("payment processing failed: %w", err)
}

err = sendReceipt()
if err != nil {
    // Payment succeeded but receipt failed - we know exactly where we are
    logError("receipt sending failed", err)
    // Continue or compensate as needed
}

err = updateInventory()
if err != nil {
    return fmt.Errorf("inventory update failed: %w", err)
}

err = updateAccountBalance()
if err != nil {
    return fmt.Errorf("balance update failed: %w", err)
}

Benefits of explicit handling:

  • Clarity: You can see exactly which functions can fail
  • Local handling: Errors are handled where they occur, with full context
  • Predictable flow: No hidden control transfers or stack unwinding
  • Context preservation: Each step can add its own context
  • Fine-grained control: Different errors at different points handled differently
  • State management: You know exactly what succeeded before the error

The Error Interface: Simplicity and Power

Go's error handling is built on a simple, elegant interface:

type error interface {
    Error() string
}

What this means:

  • Any type with an Error() string method is an error
  • No special syntax needed - errors are just values
  • Flexible implementation - create rich error types with additional methods
  • Implicit satisfaction - a type needs no "implements" declaration, only the Error() string method
  • Composition: Errors can wrap other errors, preserving the error chain

Philosophy: Errors are values, not special language constructs. This approach:

  • Eliminates special error handling syntax
  • Allows errors to carry additional data and methods
  • Enables polymorphic error handling through interfaces
  • Keeps the language simple and consistent
  • Makes error handling testable and composable

Standard library error creation:

// Simple error with a fixed message
err1 := errors.New("something went wrong")

// Formatted error with dynamic content (%v flattens originalErr into text)
err2 := fmt.Errorf("failed to process user %s: %v", username, originalErr)

// Error wrapping (Go 1.13+): %w keeps originalErr in the error chain
err3 := fmt.Errorf("failed to connect: %w", originalErr)
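
The difference between %v and %w is easy to miss because both produce the same message. The sketch below, using a hypothetical errTimeout sentinel, shows that only %w keeps the original error reachable by errors.Is:

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout is a hypothetical sentinel error used only for this sketch.
var errTimeout = errors.New("timeout")

// formatted embeds the text of errTimeout but drops the error value.
func formatted() error { return fmt.Errorf("connect: %v", errTimeout) }

// wrapped keeps errTimeout in the chain.
func wrapped() error { return fmt.Errorf("connect: %w", errTimeout) }

// matchesTimeout reports whether err is, or wraps, errTimeout.
func matchesTimeout(err error) bool { return errors.Is(err, errTimeout) }

func main() {
	fmt.Println(formatted(), "| matches:", matchesTimeout(formatted()))
	fmt.Println(wrapped(), "| matches:", matchesTimeout(wrapped()))
}
```

Both lines print the message "connect: timeout", but only the %w version reports a match.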

Error Wrapping and Unwrapping

Go 1.13 added error wrapping to the standard library: the %w verb in fmt.Errorf, plus errors.Is, errors.As, and errors.Unwrap for inspecting the resulting chain.

Error wrapping (%w verb):

if err != nil {
    return fmt.Errorf("database query failed: %w", err)
}

Why wrap errors:

  • Preserve the original error for inspection
  • Add context at each layer of your application
  • Enable error type checking through the chain
  • Build informative error messages with full context

Error inspection:

// Check if error is or wraps a specific error
if errors.Is(err, sql.ErrNoRows) {
    // Handle "no rows" error
}

// Extract specific error type from chain
var netErr *net.OpError
if errors.As(err, &netErr) {
    // Access network-specific error details
    fmt.Println("Operation:", netErr.Op)
    fmt.Println("Network:", netErr.Net)
}
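
For a version of this that runs without a database or network, the sketch below inspects the error from opening a file that does not exist. loadConfig and describe are hypothetical helpers and the file name is made up; os.Open, fs.ErrNotExist, and *fs.PathError come from the standard library:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// loadConfig is a hypothetical helper that wraps the error from os.Open.
func loadConfig(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("load config: %w", err)
	}
	return f.Close()
}

// describe classifies err by looking through the wrapping layers.
func describe(err error) string {
	if err == nil {
		return "ok"
	}
	var pathErr *fs.PathError
	// errors.Is matches the sentinel; errors.As extracts the concrete type.
	if errors.Is(err, fs.ErrNotExist) && errors.As(err, &pathErr) {
		return "missing file: " + pathErr.Path
	}
	return "other: " + err.Error()
}

func main() {
	fmt.Println(describe(loadConfig("no-such-file-for-example.cfg")))
}
```

Even though loadConfig adds its own context, both checks still succeed because %w keeps the underlying *fs.PathError in the chain.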

Practical Examples - From Basics to Production

Example 1: Basic Error Creation and Handling

Let's start with fundamental error handling patterns:

// run
package main

import (
    "errors"
    "fmt"
    "strconv"
    "strings"
)

// Function that can fail in multiple ways
func validateAge(age string) (int, error) {
    // Trim whitespace first so that "   " is treated as empty
    age = strings.TrimSpace(age)

    if age == "" {
        return 0, errors.New("age cannot be empty")
    }

    ageInt, err := strconv.Atoi(age)
    if err != nil {
        return 0, fmt.Errorf("invalid age format '%s': %w", age, err)
    }

    if ageInt < 0 {
        return 0, fmt.Errorf("age cannot be negative: %d", ageInt)
    }

    if ageInt > 120 {
        return 0, fmt.Errorf("age %d seems unrealistic (must be 0-120)", ageInt)
    }

    return ageInt, nil
}

func registerUser(name, ageStr string) error {
    fmt.Printf("Registering user: %s\n", name)

    if name == "" {
        return fmt.Errorf("user registration failed: name cannot be empty")
    }

    // Handle validation error with context
    age, err := validateAge(ageStr)
    if err != nil {
        return fmt.Errorf("user registration failed for %s: %w", name, err)
    }

    fmt.Printf("Successfully registered user: %s, age %d\n", name, age)
    return nil
}

func main() {
    users := []struct {
        name string
        age  string
    }{
        {"Alice", "25"},
        {"Bob", "invalid"},
        {"", "30"},
        {"Charlie", "-5"},
        {"Diana", "150"},
        {"Eve", "  42  "}, // Test whitespace handling
    }

    // Print (not Println) avoids go vet's redundant-newline warning
    fmt.Print("=== User Registration Demo ===\n\n")

    for i, user := range users {
        fmt.Printf("--- Test %d ---\n", i+1)
        err := registerUser(user.name, user.age)
        if err != nil {
            fmt.Printf("Error: %v\n", err)
        } else {
            fmt.Printf("Success!\n")
        }
        fmt.Println()
    }
}

What this demonstrates:

  • Simple error creation with errors.New()
  • Error wrapping with fmt.Errorf() and %w verb
  • Context addition at each level
  • Error handling with if err != nil pattern
  • Multiple error categories: empty input, parsing failures, and range checks
  • Error message formatting with relevant details

Key patterns established:

  • Check errors immediately after calls
  • Add relevant context at each level
  • Use %w to preserve error chains
  • Return early when errors occur
  • Include relevant data in error messages
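
One payoff of using %w in validateAge is that callers can still see why strconv.Atoi failed. The sketch below uses parseAge, a trimmed-down stand-in for validateAge, and a hypothetical classify helper; strconv.ErrSyntax and strconv.ErrRange are the standard library's sentinels:

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// parseAge is a trimmed-down stand-in for validateAge: it only parses
// and wraps the strconv error with context.
func parseAge(s string) (int, error) {
	n, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid age format %q: %w", s, err)
	}
	return n, nil
}

// classify (hypothetical) inspects the wrapped strconv error.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, strconv.ErrSyntax):
		return "not a number"
	case errors.Is(err, strconv.ErrRange):
		return "out of range"
	default:
		return "unknown"
	}
}

func main() {
	for _, in := range []string{"42", "abc", "99999999999999999999"} {
		_, err := parseAge(in)
		fmt.Printf("%s -> %s\n", in, classify(err))
	}
}
```

Had parseAge used %v instead of %w, both failure inputs would fall through to "unknown".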

Example 2: Custom Error Types for Rich Context

Let's create domain-specific error types with additional behavior:

  1// run
  2package main
  3
  4import (
  5    "errors"
  6    "fmt"
  7    "time"
  8)
  9
 10// Domain-specific error codes
 11type ErrorCode string
 12
 13const (
 14    ErrCodeValidation      ErrorCode = "VALIDATION_ERROR"
 15    ErrCodeNetwork         ErrorCode = "NETWORK_ERROR"
 16    ErrCodeAuthentication  ErrorCode = "AUTHENTICATION_ERROR"
 17    ErrCodeAuthorization   ErrorCode = "AUTHORIZATION_ERROR"
 18    ErrCodeRateLimit       ErrorCode = "RATE_LIMIT_ERROR"
 19    ErrCodeResourceNotFound ErrorCode = "RESOURCE_NOT_FOUND"
 20    ErrCodeConflict        ErrorCode = "CONFLICT_ERROR"
 21    ErrCodeInternal        ErrorCode = "INTERNAL_ERROR"
 22)
 23
 24// Custom error type with rich context
 25type ServiceError struct {
 26    Code       ErrorCode
 27    Message    string
 28    Timestamp  time.Time
 29    Retryable  bool
 30    StatusCode int // HTTP status code equivalent
 31    Details    map[string]interface{}
 32    Cause      error // Original error
 33}
 34
 35func (e *ServiceError) Error() string {
 36    if e.Cause != nil {
 37        return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
 38    }
 39    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
 40}
 41
 42func (e *ServiceError) Unwrap() error {
 43    return e.Cause
 44}
 45
 46// Check if error is retryable
 47func (e *ServiceError) IsRetryable() bool {
 48    return e.Retryable
 49}
 50
 51// Get HTTP status code
 52func (e *ServiceError) HTTPStatus() int {
 53    return e.StatusCode
 54}
 55
 56// Error constructors for different scenarios
 57func NewValidationError(field string, value interface{}, reason string) *ServiceError {
 58    return &ServiceError{
 59        Code:       ErrCodeValidation,
 60        Message:    fmt.Sprintf("validation failed for field '%s': %s", field, reason),
 61        Timestamp:  time.Now(),
 62        Retryable:  false,
 63        StatusCode: 400,
 64        Details: map[string]interface{}{
 65            "field":  field,
 66            "value":  value,
 67            "reason": reason,
 68        },
 69    }
 70}
 71
 72func NewAuthenticationError(username string, reason string) *ServiceError {
 73    return &ServiceError{
 74        Code:       ErrCodeAuthentication,
 75        Message:    fmt.Sprintf("authentication failed for user '%s': %s", username, reason),
 76        Timestamp:  time.Now(),
 77        Retryable:  false,
 78        StatusCode: 401,
 79        Details: map[string]interface{}{
 80            "username": username,
 81            "reason":   reason,
 82        },
 83    }
 84}
 85
 86func NewRateLimitError(resource string, limit int, retryAfter time.Duration) *ServiceError {
 87    return &ServiceError{
 88        Code:       ErrCodeRateLimit,
 89        Message:    fmt.Sprintf("rate limit exceeded for %s", resource),
 90        Timestamp:  time.Now(),
 91        Retryable:  true,
 92        StatusCode: 429,
 93        Details: map[string]interface{}{
 94            "resource":    resource,
 95            "limit":       limit,
 96            "retry_after": retryAfter.String(),
 97        },
 98    }
 99}
100
101func NewNotFoundError(resourceType string, identifier string) *ServiceError {
102    return &ServiceError{
103        Code:       ErrCodeResourceNotFound,
104        Message:    fmt.Sprintf("%s not found: %s", resourceType, identifier),
105        Timestamp:  time.Now(),
106        Retryable:  false,
107        StatusCode: 404,
108        Details: map[string]interface{}{
109            "resource_type": resourceType,
110            "identifier":    identifier,
111        },
112    }
113}
114
115// Example service using custom errors
116type UserService struct {
117    users      map[string]string // username -> password (simplified)
118    rateLimits map[string]int    // username -> attempt count
119}
120
121func NewUserService() *UserService {
122    return &UserService{
123        users: map[string]string{
124            "alice": "password123",
125            "bob":   "secure456",
126        },
127        rateLimits: make(map[string]int),
128    }
129}
130
131func (us *UserService) Login(username, password string) error {
132    // Validate input
133    if username == "" {
134        return NewValidationError("username", username, "cannot be empty")
135    }
136
137    if password == "" {
138        return NewValidationError("password", "***", "cannot be empty")
139    }
140
141    if len(password) < 8 {
142        return NewValidationError("password", "***", "must be at least 8 characters")
143    }
144
145    // Check rate limiting
146    attempts := us.rateLimits[username]
147    if attempts >= 3 {
148        return NewRateLimitError("login", 3, time.Minute*5)
149    }
150
151    // Check if user exists
152    storedPassword, exists := us.users[username]
153    if !exists {
154        us.rateLimits[username]++
155        return NewNotFoundError("user", username)
156    }
157
158    // Verify password
159    if password != storedPassword {
160        us.rateLimits[username]++
161        return NewAuthenticationError(username, "invalid credentials")
162    }
163
164    // Reset rate limit on successful login
165    delete(us.rateLimits, username)
166
167    fmt.Printf("Login successful for user: %s\n", username)
168    return nil
169}
170
171// Error handler that uses error type information
172func handleServiceError(err error) {
173    var serviceErr *ServiceError
174    if errors.As(err, &serviceErr) {
175        fmt.Printf("\n=== Service Error Details ===\n")
176        fmt.Printf("Code: %s\n", serviceErr.Code)
177        fmt.Printf("Message: %s\n", serviceErr.Message)
178        fmt.Printf("HTTP Status: %d\n", serviceErr.StatusCode)
179        fmt.Printf("Retryable: %v\n", serviceErr.Retryable)
180        fmt.Printf("Timestamp: %v\n", serviceErr.Timestamp.Format(time.RFC3339))
181
182        if len(serviceErr.Details) > 0 {
183            fmt.Printf("Details:\n")
184            for key, value := range serviceErr.Details {
185                fmt.Printf("  %s: %v\n", key, value)
186            }
187        }
188
189        // Provide actionable suggestions
190        switch serviceErr.Code {
191        case ErrCodeRateLimit:
192            if retryAfter, ok := serviceErr.Details["retry_after"]; ok {
193                fmt.Printf("\nSuggestion: Retry after %v\n", retryAfter)
194            }
195        case ErrCodeAuthentication:
196            fmt.Printf("\nSuggestion: Check credentials and try again\n")
197        case ErrCodeValidation:
198            fmt.Printf("\nSuggestion: Fix validation errors and resubmit\n")
199        case ErrCodeResourceNotFound:
200            fmt.Printf("\nSuggestion: Verify the resource identifier\n")
201        }
202    } else {
203        fmt.Printf("Generic error: %v\n", err)
204    }
205}
206
207func main() {
208    service := NewUserService()
209
210    fmt.Print("=== Custom Error Types Demo ===\n\n")
211
212    testCases := []struct {
213        name     string
214        username string
215        password string
216        desc     string
217    }{
218        {"Empty username", "", "password", "Validation error"},
219        {"Short password", "alice", "short", "Validation error"},
220        {"User not found", "charlie", "password123", "Not found error"},
221        {"Wrong password 1", "alice", "wrongpass1", "Authentication error (attempt 1)"},
222        {"Wrong password 2", "alice", "wrongpass2", "Authentication error (attempt 2)"},
223        {"Wrong password 3", "alice", "wrongpass3", "Authentication error (attempt 3)"},
224        {"Rate limited", "alice", "password123", "Rate limit error"},
225        {"Valid login", "bob", "secure456", "Successful login"},
226    }
227
228    for i, tc := range testCases {
229        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
230        err := service.Login(tc.username, tc.password)
231
232        if err != nil {
233            handleServiceError(err)
234        } else {
235            fmt.Println("Success!")
236        }
237
238        fmt.Println()
239        time.Sleep(time.Millisecond * 100)
240    }
241}

What this demonstrates:

  • Custom error types with rich context and behavior
  • Error constructors for consistent error creation
  • Domain-specific error codes for structured error handling
  • Error methods for accessing error properties (IsRetryable, HTTPStatus)
  • Contextual information including timestamps, details maps, and causes
  • Type matching with errors.As() for specialized error handling, which also works through wrapped errors
  • Actionable error messages with suggestions for resolution

Production-ready patterns:

  • Structured error information for debugging
  • A Retryable flag that callers can use to drive retry decisions
  • Rate limiting information in errors
  • Timestamps for error correlation
  • Business context in errors
  • HTTP status code mapping for web services
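
As a sketch of the HTTP status mapping mentioned above, a web layer can ask any error in the chain for its status through a small interface. The statusCoder interface, notFoundError type, and lookup function are hypothetical; ServiceError from the example would satisfy statusCoder because it has an HTTPStatus() int method:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusCoder is a hypothetical interface for errors that know their
// HTTP status.
type statusCoder interface {
	HTTPStatus() int
}

type notFoundError struct{ id string }

func (e *notFoundError) Error() string   { return "user not found: " + e.id }
func (e *notFoundError) HTTPStatus() int { return http.StatusNotFound }

// lookup is a hypothetical service call: only user "1" exists.
func lookup(id string) error {
	if id != "1" {
		return fmt.Errorf("get profile: %w", &notFoundError{id: id})
	}
	return nil
}

// statusFor walks the chain for a statusCoder and falls back to 500.
func statusFor(err error) int {
	if err == nil {
		return http.StatusOK
	}
	var sc statusCoder
	if errors.As(err, &sc) {
		return sc.HTTPStatus()
	}
	return http.StatusInternalServerError
}

func main() {
	rec := httptest.NewRecorder()
	err := lookup("42")
	// A real handler would likely send a sanitized message, not err.Error().
	http.Error(rec, err.Error(), statusFor(err))
	fmt.Println(rec.Code)
}
```

Matching on an interface rather than a concrete type keeps the HTTP layer independent of the packages that define the errors.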

Example 3: Error Wrapping and Inspection

Error wrapping preserves context while maintaining access to underlying errors:

  1// run
  2package main
  3
  4import (
  5    "errors"
  6    "fmt"
  7    "os"
  8)
  9
 10// Custom error types
 11var (
 12    ErrDatabase     = errors.New("database error")
 13    ErrNotFound     = errors.New("resource not found")
 14    ErrUnauthorized = errors.New("unauthorized access")
 15    ErrInvalidInput = errors.New("invalid input")
 16)
 17
 18// Simulated database layer
 19type Database struct {
 20    data map[string]string
 21}
 22
 23func NewDatabase() *Database {
 24    return &Database{
 25        data: map[string]string{
 26            "user:1": "Alice",
 27            "user:2": "Bob",
 28        },
 29    }
 30}
 31
 32func (db *Database) Get(key string) (string, error) {
 33    value, exists := db.data[key]
 34    if !exists {
 35        return "", fmt.Errorf("key %s: %w", key, ErrNotFound)
 36    }
 37    return value, nil
 38}
 39
 40// Repository layer (wraps database)
 41type UserRepository struct {
 42    db *Database
 43}
 44
 45func NewUserRepository(db *Database) *UserRepository {
 46    return &UserRepository{db: db}
 47}
 48
 49func (r *UserRepository) FindByID(id string) (string, error) {
 50    key := fmt.Sprintf("user:%s", id)
 51    name, err := r.db.Get(key)
 52    if err != nil {
 53        return "", fmt.Errorf("repository: failed to find user %s: %w", id, err)
 54    }
 55    return name, nil
 56}
 57
 58// Service layer (wraps repository)
 59type UserService struct {
 60    repo *UserRepository
 61}
 62
 63func NewUserService(repo *UserRepository) *UserService {
 64    return &UserService{repo: repo}
 65}
 66
 67func (s *UserService) GetUser(id string) (string, error) {
 68    if id == "" {
 69        return "", fmt.Errorf("service: %w", ErrInvalidInput)
 70    }
 71
 72    name, err := s.repo.FindByID(id)
 73    if err != nil {
 74        return "", fmt.Errorf("service: failed to get user: %w", err)
 75    }
 76
 77    return name, nil
 78}
 79
 80// Error inspection and handling
 81func handleError(err error) {
 82    fmt.Printf("\n=== Error Analysis ===\n")
 83    fmt.Printf("Full error message: %v\n\n", err)
 84
 85    // Check for specific sentinel errors
 86    if errors.Is(err, ErrNotFound) {
 87        fmt.Println("✓ Error is or wraps ErrNotFound")
 88        fmt.Println("  Action: Could return 404 to client")
 89    }
 90
 91    if errors.Is(err, ErrInvalidInput) {
 92        fmt.Println("✓ Error is or wraps ErrInvalidInput")
 93        fmt.Println("  Action: Could return 400 to client")
 94    }
 95
 96    if errors.Is(err, ErrUnauthorized) {
 97        fmt.Println("✓ Error is or wraps ErrUnauthorized")
 98        fmt.Println("  Action: Could return 401 to client")
 99    }
100
101    if errors.Is(err, os.ErrNotExist) {
102        fmt.Println("✓ Error is or wraps os.ErrNotExist")
103        fmt.Println("  Action: File system issue")
104    }
105
106    // Unwrap the error chain manually
107    fmt.Println("\nError chain:")
108    currentErr := err
109    depth := 0
110    for currentErr != nil {
111        fmt.Printf("  %d: %v\n", depth, currentErr)
112        currentErr = errors.Unwrap(currentErr)
113        depth++
114    }
115}
116
117func main() {
118    db := NewDatabase()
119    repo := NewUserRepository(db)
120    service := NewUserService(repo)
121
122    fmt.Println("=== Error Wrapping and Inspection Demo ===")
123
124    // Test 1: Successful retrieval
125    fmt.Println("\n--- Test 1: Successful retrieval ---")
126    name, err := service.GetUser("1")
127    if err != nil {
128        handleError(err)
129    } else {
130        fmt.Printf("Success: Found user: %s\n", name)
131    }
132
133    // Test 2: User not found (wrapped through multiple layers)
134    fmt.Println("\n--- Test 2: User not found ---")
135    name, err = service.GetUser("999")
136    if err != nil {
137        handleError(err)
138    }
139
140    // Test 3: Invalid input
141    fmt.Println("\n--- Test 3: Invalid input ---")
142    name, err = service.GetUser("")
143    if err != nil {
144        handleError(err)
145    }
146
147    // Demonstrate error wrapping depth
148    fmt.Println("\n--- Test 4: Multiple wrapping layers ---")
149
150    // Create deeply wrapped error
151    baseErr := errors.New("network timeout")
152    layer1 := fmt.Errorf("connection failed: %w", baseErr)
153    layer2 := fmt.Errorf("database query failed: %w", layer1)
154    layer3 := fmt.Errorf("user lookup failed: %w", layer2)
155
156    fmt.Println("Deeply wrapped error:")
157    handleError(layer3)
158}

What this demonstrates:

  • Error wrapping through multiple application layers
  • Context preservation from database → repository → service
  • Error inspection using errors.Is() for sentinel errors
  • Error unwrapping to traverse the error chain
  • Layered architecture with appropriate error handling at each level
  • Actionable error handling based on error type inspection

Key concepts:

  • Each layer adds its own context to errors
  • Original error remains accessible through the chain
  • errors.Is() works through wrapped errors
  • Error messages build a complete story of what failed
  • Different layers can make different decisions based on error types
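
Sentinel values are not the only thing errors.Is can match. If an error type defines an Is(target error) bool method, errors.Is consults it while walking the chain. The sketch below uses a hypothetical codedError type that treats two errors as equivalent when their codes match:

```go
package main

import (
	"errors"
	"fmt"
)

// codedError is a hypothetical error carrying a machine-readable code.
type codedError struct {
	Code string
	Msg  string
}

func (e *codedError) Error() string { return e.Code + ": " + e.Msg }

// Is makes errors.Is compare by Code instead of pointer identity.
func (e *codedError) Is(target error) bool {
	t, ok := target.(*codedError)
	return ok && t.Code == e.Code
}

// errNotFound acts as a matching template, not as the returned value.
var errNotFound = &codedError{Code: "NOT_FOUND"}

// findUser always fails in this sketch, with a freshly allocated error.
func findUser(id string) error {
	return fmt.Errorf("service: %w", &codedError{Code: "NOT_FOUND", Msg: "user " + id})
}

func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

func main() {
	fmt.Println(isNotFound(findUser("7")))
}
```

This lets each failure carry its own message and details while callers still match on a stable category.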

Example 4: Advanced Error Handling with Retry Logic

Let's implement production-ready error handling with retry patterns:

  1// run
  2package main
  3
  4import (
  5    "context"
  6    "errors"
  7    "fmt"
  8    "math/rand"
  9    "time"
 10)
 11
 12// Error types for different failure scenarios
 13type RetryableError struct {
 14    Attempt   int
 15    Cause     error
 16    Timestamp time.Time
 17}
 18
 19func (e *RetryableError) Error() string {
 20    return fmt.Sprintf("attempt %d failed at %v: %v",
 21        e.Attempt, e.Timestamp.Format("15:04:05"), e.Cause)
 22}
 23
 24func (e *RetryableError) Unwrap() error {
 25    return e.Cause
 26}
 27
 28type TemporaryError struct {
 29    Reason string
 30}
 31
 32func (e *TemporaryError) Error() string {
 33    return fmt.Sprintf("temporary failure: %s", e.Reason)
 34}
 35
 36func (e *TemporaryError) Temporary() bool {
 37    return true
 38}
 39
 40type PermanentError struct {
 41    Reason string
 42}
 43
 44func (e *PermanentError) Error() string {
 45    return fmt.Sprintf("permanent failure: %s", e.Reason)
 46}
 47
 48// Retry configuration
 49type RetryConfig struct {
 50    MaxAttempts   int
 51    InitialDelay  time.Duration
 52    MaxDelay      time.Duration
 53    BackoffFactor float64
 54    Timeout       time.Duration
 55}
 56
 57func DefaultRetryConfig() RetryConfig {
 58    return RetryConfig{
 59        MaxAttempts:   3,
 60        InitialDelay:  100 * time.Millisecond,
 61        MaxDelay:      10 * time.Second,
 62        BackoffFactor: 2.0,
 63        Timeout:       30 * time.Second,
 64    }
 65}
 66
 67// Check if error is retryable
 68func isRetryable(err error) bool {
 69    // Permanent errors win, even when wrapped by RetryableError
 70    var permErr *PermanentError
 71    if errors.As(err, &permErr) {
 72        return false
 73    }
 74
 75    // Check for temporary interface
 76    type temporary interface {
 77        Temporary() bool
 78    }
 79
 80    var tempErr temporary
 81    if errors.As(err, &tempErr) {
 82        return tempErr.Temporary()
 83    }
 84
 85    // Check for specific retryable error types
 86    var retryErr *RetryableError
 87    if errors.As(err, &retryErr) {
 88        return true
 89    }
 90
 91    var tempError *TemporaryError
 92    if errors.As(err, &tempError) {
 93        return true
 94    }
 95
 96    // Default: assume retryable
 97    return true
 98}
 99
100// Retry with exponential backoff
101func RetryWithBackoff(ctx context.Context, operation func() error, config RetryConfig) error {
102    var lastErr error
103    delay := config.InitialDelay
104
105    // Create timeout context
106    timeoutCtx, cancel := context.WithTimeout(ctx, config.Timeout)
107    defer cancel()
108
109    for attempt := 1; attempt <= config.MaxAttempts; attempt++ {
110        // Check context cancellation
111        select {
112        case <-timeoutCtx.Done():
113            return fmt.Errorf("operation timeout after %d attempts: %w", attempt-1, timeoutCtx.Err())
114        default:
115        }
116
117        // Execute operation
118        lastErr = operation()
119
120        if lastErr == nil {
121            if attempt > 1 {
122                fmt.Printf("✓ Operation succeeded on attempt %d\n", attempt)
123            }
124            return nil
125        }
126
127        // Check if error is retryable
128        if !isRetryable(lastErr) {
129            fmt.Printf("✗ Non-retryable error on attempt %d: %v\n", attempt, lastErr)
130            return &RetryableError{
131                Attempt:   attempt,
132                Cause:     lastErr,
133                Timestamp: time.Now(),
134            }
135        }
136
137        // Don't sleep after last attempt
138        if attempt == config.MaxAttempts {
139            break
140        }
141
142        // Log retry
143        fmt.Printf("⚠ Attempt %d/%d failed, retrying in %v: %v\n",
144            attempt, config.MaxAttempts, delay, lastErr)
145
146        // Wait with exponential backoff
147        select {
148        case <-time.After(delay):
149            // Continue to next attempt
150        case <-timeoutCtx.Done():
151            return fmt.Errorf("timeout during backoff: %w", timeoutCtx.Err())
152        }
153
154        // Calculate next delay
155        delay = time.Duration(float64(delay) * config.BackoffFactor)
156        if delay > config.MaxDelay {
157            delay = config.MaxDelay
158        }
159    }
160
161    return &RetryableError{
162        Attempt:   config.MaxAttempts,
163        Cause:     lastErr,
164        Timestamp: time.Now(),
165    }
166}
167
168// Simulated operations with different failure patterns
169type Service struct {
170    failureRate float64
171}
172
173func NewService(failureRate float64) *Service {
174    return &Service{failureRate: failureRate}
175}
176
177func (s *Service) TemporaryFailure() error {
178    if rand.Float64() < s.failureRate {
179        return &TemporaryError{Reason: "network timeout"}
180    }
181    return nil
182}
183
184func (s *Service) PermanentFailure() error {
185    if rand.Float64() < s.failureRate {
186        return &PermanentError{Reason: "invalid API key"}
187    }
188    return nil
189}
190
191func (s *Service) RandomFailure() error {
192    r := rand.Float64()
193    if r < s.failureRate/2 {
194        return &TemporaryError{Reason: "connection reset"}
195    } else if r < s.failureRate {
196        return errors.New("unknown error")
197    }
198    return nil
199}
200
201func main() {
202    // math/rand is seeded automatically since Go 1.20; rand.Seed is deprecated
203    ctx := context.Background()
204
205    fmt.Print("=== Advanced Error Handling with Retry ===\n\n")
206
207    // Example 1: Temporary failures with retry
208    fmt.Println("--- Example 1: Temporary Failures (60% failure rate) ---")
209    service1 := NewService(0.6)
210    config := DefaultRetryConfig()
211
212    err := RetryWithBackoff(ctx, service1.TemporaryFailure, config)
213    if err != nil {
214        fmt.Printf("Final error: %v\n", err)
215    } else {
216        fmt.Println("Operation succeeded!")
217    }
218
219    time.Sleep(time.Second)
220
221    // Example 2: Permanent failure (no retry)
222    fmt.Println("\n--- Example 2: Permanent Failure ---")
223    service2 := NewService(1.0) // Always fail
224
225    err = RetryWithBackoff(ctx, service2.PermanentFailure, config)
226    if err != nil {
227        fmt.Printf("Final error: %v\n", err)
228
229        // Check error type
230        var retryErr *RetryableError
231        if errors.As(err, &retryErr) {
232            fmt.Printf("Failed after %d attempts\n", retryErr.Attempt)
233        }
234    }
235
236    time.Sleep(time.Second)
237
238    // Example 3: Timeout scenario
239    fmt.Println("\n--- Example 3: Operation Timeout ---")
240    shortConfig := config
241    shortConfig.Timeout = 500 * time.Millisecond
242    shortConfig.InitialDelay = 200 * time.Millisecond
243
244    service3 := NewService(1.0) // Always fail
245    err = RetryWithBackoff(ctx, service3.TemporaryFailure, shortConfig)
246    if err != nil {
247        fmt.Printf("Final error: %v\n", err)
248    }
249
250    time.Sleep(time.Second)
251
252    // Example 4: Success after retries
253    fmt.Println("\n--- Example 4: Success After Retries (30% failure rate) ---")
254    service4 := NewService(0.3)
255
256    err = RetryWithBackoff(ctx, service4.RandomFailure, config)
257    if err != nil {
258        fmt.Printf("Final error: %v\n", err)
259    } else {
260        fmt.Println("Operation succeeded!")
261    }
262
263    // Example 5: Exponential backoff demonstration
264    fmt.Println("\n--- Example 5: Exponential Backoff Calculation ---")
265    fmt.Println("Demonstrating backoff timing:")
266
267    delay := config.InitialDelay
268    for i := 1; i <= 5; i++ {
269        nextDelay := time.Duration(float64(delay) * config.BackoffFactor)
270        if nextDelay > config.MaxDelay {
271            nextDelay = config.MaxDelay
272        }
273        fmt.Printf("Attempt %d: delay = %v, next = %v\n", i, delay, nextDelay)
274        delay = nextDelay
275    }
276}

What this demonstrates:

  • Sophisticated retry logic with exponential backoff
  • Error classification for retryable vs non-retryable errors
  • Context integration for timeouts and cancellation
  • Temporary error detection using custom interfaces
  • Backoff calculations to prevent overwhelming services
  • Detailed logging of retry attempts and outcomes

Production patterns:

  1. Exponential backoff prevents service overload
  2. Context-aware operations respect timeouts
  3. Error classification guides retry decisions
  4. Comprehensive logging for debugging
  5. Configurable retry parameters
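
The backoff arithmetic is easier to test when it is pulled out into a pure function. The sketch below is one way to do that; backoffSchedule and demoSchedule are hypothetical names, and the parameters mirror InitialDelay, MaxDelay, and BackoffFactor from RetryConfig:

```go
package main

import (
	"fmt"
	"time"
)

// backoffSchedule returns the wait before each of n retries: the delay
// starts at initial, is multiplied by factor each time, and is capped
// at maxDelay.
func backoffSchedule(initial, maxDelay time.Duration, factor float64, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		delays = append(delays, d)
		d = time.Duration(float64(d) * factor)
		if d > maxDelay {
			d = maxDelay
		}
	}
	return delays
}

// demoSchedule uses illustrative values: 100ms initial, 1s cap, factor 2.
func demoSchedule() []time.Duration {
	return backoffSchedule(100*time.Millisecond, time.Second, 2.0, 6)
}

func main() {
	fmt.Println(demoSchedule()) // prints [100ms 200ms 400ms 800ms 1s 1s]
}
```

Production systems often add random jitter to each delay so that many clients do not retry in lockstep; that is left out here to keep the output deterministic.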

Example 5: Production Error Handling System

Let's build a comprehensive error handling framework for production:

// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "math/rand"
    "sync"
    "time"
)

// Error severity levels
type Severity int

const (
    SeverityDebug Severity = iota
    SeverityInfo
    SeverityWarning
    SeverityError
    SeverityCritical
)

// String returns the severity name, or "UNKNOWN" for out-of-range values.
func (s Severity) String() string {
    names := []string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
    if s < 0 || int(s) >= len(names) {
        return "UNKNOWN"
    }
    return names[s]
}

// Structured error for production systems
type ProductionError struct {
    ID         string                 `json:"id"`
    Timestamp  time.Time              `json:"timestamp"`
    Severity   Severity               `json:"severity"`
    Service    string                 `json:"service"`
    Operation  string                 `json:"operation"`
    Message    string                 `json:"message"`
    Code       string                 `json:"code"`
    Context    map[string]interface{} `json:"context,omitempty"`
    Cause      error                  `json:"-"`
    UserID     string                 `json:"user_id,omitempty"`
    RequestID  string                 `json:"request_id,omitempty"`
    StackTrace []string               `json:"stack_trace,omitempty"`
}

func (e *ProductionError) Error() string {
    if e.Cause != nil {
        return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
    }
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

func (e *ProductionError) Unwrap() error {
    return e.Cause
}

// Error tracking system
type ErrorTracker struct {
    errors  chan *ProductionError
    metrics map[string]int64
    mu      sync.RWMutex
}

func NewErrorTracker() *ErrorTracker {
    return &ErrorTracker{
        errors:  make(chan *ProductionError, 100),
        metrics: make(map[string]int64),
    }
}

// Track queues an error without blocking; if the queue is full the error is dropped.
func (et *ErrorTracker) Track(err *ProductionError) {
    select {
    case et.errors <- err:
    default:
        fmt.Printf("Warning: Error tracking queue full, dropping error: %v\n", err)
    }
}

// Start processes queued errors until the context is cancelled.
func (et *ErrorTracker) Start(ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()

    for {
        select {
        case <-ctx.Done():
            return
        case err := <-et.errors:
            et.processError(err)
        }
    }
}

func (et *ErrorTracker) processError(err *ProductionError) {
    // Update metrics
    et.mu.Lock()
    et.metrics[err.Code]++
    et.metrics["total"]++
    if err.Severity >= SeverityError {
        et.metrics["error_count"]++
    }
    et.mu.Unlock()

    // Log structured error
    et.logError(err)

    // Send critical alerts
    if err.Severity >= SeverityCritical {
        et.sendAlert(err)
    }
}

func (et *ErrorTracker) logError(err *ProductionError) {
    logEntry := map[string]interface{}{
        "timestamp": err.Timestamp.Format(time.RFC3339),
        "severity":  err.Severity.String(),
        "service":   err.Service,
        "operation": err.Operation,
        "message":   err.Message,
        "code":      err.Code,
        "error_id":  err.ID,
    }

    if len(err.Context) > 0 {
        logEntry["context"] = err.Context
    }

    if err.UserID != "" {
        logEntry["user_id"] = err.UserID
    }

    if err.RequestID != "" {
        logEntry["request_id"] = err.RequestID
    }

    // Context may hold values that cannot be marshaled, so check the error.
    jsonData, marshalErr := json.Marshal(logEntry)
    if marshalErr != nil {
        fmt.Printf("LOG ERROR: failed to marshal log entry for %s: %v\n", err.ID, marshalErr)
        return
    }
    fmt.Printf("LOG: %s\n", string(jsonData))
}

func (et *ErrorTracker) sendAlert(err *ProductionError) {
    fmt.Printf("ALERT: Critical error %s - %s\n", err.ID, err.Message)
    // In production: send to PagerDuty, Slack, email, etc.
}

// GetMetrics returns a copy of the metrics so callers cannot mutate tracker state.
func (et *ErrorTracker) GetMetrics() map[string]int64 {
    et.mu.RLock()
    defer et.mu.RUnlock()

    metrics := make(map[string]int64)
    for k, v := range et.metrics {
        metrics[k] = v
    }
    return metrics
}

// Service with integrated error handling
type PaymentService struct {
    tracker *ErrorTracker
    mu      sync.Mutex
}

func NewPaymentService(tracker *ErrorTracker) *PaymentService {
    return &PaymentService{
        tracker: tracker,
    }
}

func (ps *PaymentService) ProcessPayment(userID string, amount float64, requestID string) error {
    // Input validation
    if amount <= 0 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "invalid payment amount",
            Code:      "INVALID_AMOUNT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount": amount,
            },
        }
        ps.tracker.Track(err)
        return err
    }

    if amount > 10000 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityWarning,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "large payment requires additional verification",
            Code:      "LARGE_PAYMENT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount":    amount,
                "threshold": 10000,
            },
        }
        ps.tracker.Track(err)
        return err
    }

    // Simulate processing (20% failure rate)
    if rand.Intn(10) < 2 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "payment gateway timeout",
            Code:      "GATEWAY_TIMEOUT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount":  amount,
                "gateway": "stripe",
            },
        }
        ps.tracker.Track(err)
        return err
    }

    fmt.Printf("✓ Payment processed: user=%s, amount=%.2f, request=%s\n",
        userID, amount, requestID)
    return nil
}

func generateRequestID() string {
    return fmt.Sprintf("req_%d", time.Now().UnixNano())
}

func main() {
    // Explicit seeding is only needed before Go 1.20; newer versions seed automatically.
    rand.Seed(time.Now().UnixNano())

    fmt.Println("=== Production Error Handling System ===")
    fmt.Println()

    tracker := NewErrorTracker()
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    var wg sync.WaitGroup

    // Start error tracker
    wg.Add(1)
    go tracker.Start(ctx, &wg)

    // Give tracker time to start
    time.Sleep(time.Millisecond * 100)

    paymentService := NewPaymentService(tracker)

    // Test scenarios
    testCases := []struct {
        userID string
        amount float64
        desc   string
    }{
        {"user1", 100.0, "Valid payment"},
        {"user2", -50.0, "Invalid amount (negative)"},
        {"user3", 0.0, "Invalid amount (zero)"},
        {"user4", 15000.0, "Large payment (requires verification)"},
        {"user5", 200.0, "Valid payment (may fail randomly)"},
        {"user6", 300.0, "Valid payment (may fail randomly)"},
        {"user7", 150.0, "Valid payment (may fail randomly)"},
    }

    fmt.Println("Processing payments...")
    fmt.Println()

    for i, tc := range testCases {
        requestID := generateRequestID()
        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
        fmt.Printf("User: %s, Amount: $%.2f\n", tc.userID, tc.amount)

        err := paymentService.ProcessPayment(tc.userID, tc.amount, requestID)
        if err != nil {
            fmt.Printf("Error: %v\n", err)
        }

        fmt.Println()
        time.Sleep(time.Millisecond * 100)
    }

    // Let error tracker process remaining errors
    time.Sleep(time.Second)

    // Show error metrics
    fmt.Println("=== Error Metrics ===")
    metrics := tracker.GetMetrics()

    jsonData, err := json.MarshalIndent(metrics, "", "  ")
    if err != nil {
        fmt.Printf("Failed to marshal metrics: %v\n", err)
    } else {
        fmt.Printf("%s\n", string(jsonData))
    }

    // Shutdown
    cancel()

    // Wait for error tracker to stop
    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        fmt.Println("\nError tracker stopped gracefully")
    case <-time.After(2 * time.Second):
        fmt.Println("\nError tracker shutdown timed out")
    }
}

What this demonstrates:

  • Comprehensive error tracking with structured data
  • Multi-level severity handling for appropriate alerting
  • Metrics collection for error analysis and monitoring
  • Context preservation across service boundaries
  • Production-ready logging with structured JSON output
  • Graceful shutdown for error tracking system
  • Concurrent error processing with channels

Enterprise patterns implemented:

  1. Structured error IDs for correlation
  2. Severity-based routing and alerting
  3. Real-time metrics collection
  4. Context-rich error information
  5. Background error processing
  6. Integration hooks for external systems
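
The point about context-rich errors surviving across layers can be sketched as follows. This is an illustration under stated assumptions rather than part of the framework above: `trackedError` is a hypothetical, trimmed-down stand-in for `ProductionError`, and `charge`, `checkout`, `codeOf`, and `causedByGateway` are names invented for this example. Because the structured type implements `Unwrap` and the caller wraps with `%w`, both `errors.As` and `errors.Is` still see through the added context.

```go
package main

import (
	"errors"
	"fmt"
)

// trackedError is a hypothetical, reduced version of a structured error.
type trackedError struct {
	Code      string
	RequestID string
	Cause     error
}

func (e *trackedError) Error() string {
	return fmt.Sprintf("[%s] request %s: %v", e.Code, e.RequestID, e.Cause)
}

// Unwrap exposes the cause so errors.Is and errors.As can keep walking the chain.
func (e *trackedError) Unwrap() error { return e.Cause }

// errGatewayDown is an illustrative sentinel for the root cause.
var errGatewayDown = errors.New("gateway unreachable")

// charge simulates a lower layer returning a structured error.
func charge(requestID string) error {
	return &trackedError{Code: "GATEWAY_TIMEOUT", RequestID: requestID, Cause: errGatewayDown}
}

// checkout adds its own context with %w, which keeps the chain intact.
func checkout(requestID string) error {
	if err := charge(requestID); err != nil {
		return fmt.Errorf("checkout: %w", err)
	}
	return nil
}

// codeOf extracts the structured code from anywhere in the chain,
// or returns "UNKNOWN" when no trackedError is present.
func codeOf(err error) string {
	var te *trackedError
	if errors.As(err, &te) {
		return te.Code
	}
	return "UNKNOWN"
}

// causedByGateway reports whether the root cause is errGatewayDown.
func causedByGateway(err error) bool {
	return errors.Is(err, errGatewayDown)
}

func main() {
	err := checkout("req_42")
	fmt.Println(err)
	fmt.Println(codeOf(err))
	fmt.Println(causedByGateway(err))
}
```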

Common Pitfalls and How to Avoid Them

Pitfall 1: Silent Failures

The most dangerous error handling mistake:

// run
package main

import (
    "fmt"
    "os"
)

// ❌ WRONG: Silent failures that hide problems
func badFileRead(filename string) string {
    data, _ := os.ReadFile(filename) // Error ignored!
    return string(data)
}

// ✅ CORRECT: Proper error handling
func goodFileRead(filename string) (string, error) {
    data, err := os.ReadFile(filename)
    if err != nil {
        return "", fmt.Errorf("failed to read file %s: %w", filename, err)
    }
    return string(data), nil
}

func main() {
    fmt.Println("=== Silent Failure Pitfall ===")
    fmt.Println()

    // Bad example
    fmt.Println("--- Bad Example (error ignored) ---")
    result1 := badFileRead("nonexistent.txt")
    fmt.Printf("Result: '%s' (empty because error was ignored)\n", result1)

    // Good example
    fmt.Println("\n--- Good Example (error handled) ---")
    result2, err := goodFileRead("nonexistent.txt")
    if err != nil {
        fmt.Printf("Error: %v\n", err)
    } else {
        fmt.Printf("Result: %s\n", result2)
    }
}

Pitfall 2: Losing Context

// run
package main

import (
    "errors"
    "fmt"
)

// ❌ WRONG: Context lost in error chain
func badProcessing(input string) error {
    if input == "" {
        return errors.New("invalid input") // What input? Where?
    }

    if len(input) > 1000 {
        return errors.New("input too long") // How long?
    }

    return nil
}

// ✅ CORRECT: Preserve context throughout error chain
func goodProcessing(operation string, input string) error {
    if input == "" {
        return fmt.Errorf("%s failed: input cannot be empty", operation)
    }

    if len(input) > 1000 {
        return fmt.Errorf("%s failed: input too long (%d chars, max 1000)",
            operation, len(input))
    }

    return nil
}

func main() {
    fmt.Println("=== Context Loss Pitfall ===")
    fmt.Println()

    // Bad example
    fmt.Println("--- Bad Example (no context) ---")
    err1 := badProcessing("")
    fmt.Printf("Error: %v (no context about what or where)\n", err1)

    // Good example
    fmt.Println("\n--- Good Example (with context) ---")
    err2 := goodProcessing("user validation", "")
    fmt.Printf("Error: %v (clear context)\n", err2)

    err3 := goodProcessing("data import", string(make([]byte, 2000)))
    fmt.Printf("Error: %v (includes details)\n", err3)
}
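
A related way to lose context is to add a message but flatten the underlying error. The sketch below, which uses hypothetical names (`errEmptyInput`, `importFlattened`, `importWrapped`, `isEmptyInput`), contrasts `%v` with `%w` in `fmt.Errorf`: both produce the same text, but only `%w` keeps the original error reachable for `errors.Is`.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyInput is an illustrative sentinel returned by validate.
var errEmptyInput = errors.New("input cannot be empty")

func validate(input string) error {
	if input == "" {
		return errEmptyInput
	}
	return nil
}

// importFlattened formats the cause with %v: the message survives,
// but the link to errEmptyInput is lost.
func importFlattened(input string) error {
	if err := validate(input); err != nil {
		return fmt.Errorf("data import failed: %v", err)
	}
	return nil
}

// importWrapped formats the cause with %w: same message, and the
// original error stays in the chain.
func importWrapped(input string) error {
	if err := validate(input); err != nil {
		return fmt.Errorf("data import failed: %w", err)
	}
	return nil
}

// isEmptyInput reports whether errEmptyInput is anywhere in the chain.
func isEmptyInput(err error) bool {
	return errors.Is(err, errEmptyInput)
}

func main() {
	fmt.Println(importFlattened(""))               // data import failed: input cannot be empty
	fmt.Println(importWrapped(""))                 // data import failed: input cannot be empty
	fmt.Println(isEmptyInput(importFlattened(""))) // false
	fmt.Println(isEmptyInput(importWrapped("")))   // true
}
```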

Pitfall 3: Panic for Recoverable Errors

// run
package main

import (
    "fmt"
)

// ❌ WRONG: Panic for recoverable errors
func badDivision(a, b float64) float64 {
    if b == 0 {
        panic("division by zero") // Should return error!
    }
    return a / b
}

// ✅ CORRECT: Return errors for recoverable conditions
func goodDivision(a, b float64) (float64, error) {
    if b == 0 {
        return 0, fmt.Errorf("division by zero: %.2f / %.2f", a, b)
    }
    return a / b, nil
}

func main() {
    fmt.Println("=== Panic vs Error Pitfall ===")
    fmt.Println()

    // Bad example (wrapped in recover to prevent crash)
    fmt.Println("--- Bad Example (panics) ---")
    func() {
        defer func() {
            if r := recover(); r != nil {
                fmt.Printf("Recovered from panic: %v\n", r)
                fmt.Println("(This crashes the program without recover)")
            }
        }()
        result := badDivision(10, 0)
        fmt.Printf("Result: %.2f\n", result)
    }()

    // Good example
    fmt.Println("\n--- Good Example (returns error) ---")
    result, err := goodDivision(10, 0)
    if err != nil {
        fmt.Printf("Error: %v (handled gracefully)\n", err)
    } else {
        fmt.Printf("Result: %.2f\n", result)
    }

    // Successful operation
    result, err = goodDivision(10, 2)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
    } else {
        fmt.Printf("Success: 10 / 2 = %.2f\n", result)
    }
}
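
When code you do not control may panic, one common approach is to convert the panic into an ordinary error at a single boundary. The sketch below is one possible shape for that, with hypothetical names (`safeCall`, `indexOutOfRange`, `noop`). It is a last line of defense, not a replacement for returning errors from your own functions.

```go
package main

import "fmt"

// safeCall runs fn and converts any panic into an error.
// The named return value lets the deferred function set the result.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

// indexOutOfRange deliberately triggers a runtime panic.
func indexOutOfRange() {
	values := []int{1, 2, 3}
	i := 5
	_ = values[i]
}

// noop completes without panicking.
func noop() {}

func main() {
	if err := safeCall(indexOutOfRange); err != nil {
		fmt.Println("Error:", err)
	}
	if err := safeCall(noop); err == nil {
		fmt.Println("noop completed without error")
	}
}
```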

Practice Exercises

Exercise 1: Basic Error Handling

Learning Objectives: Master fundamental error handling patterns including error creation, checking, and context addition.

Difficulty: Beginner

Real-World Context: Input validation is critical in all applications. This exercise teaches you to provide clear, actionable error messages that help users correct their input.

Task: Create a user registration function that validates:

  1. Username (not empty, 3-20 characters, alphanumeric)
  2. Email (not empty, contains @ and .)
  3. Age (not empty, valid number, 18-100)
  4. Return descriptive errors for each validation failure

Requirements:

  • Create separate validation functions for each field
  • Use error wrapping to preserve context
  • Provide clear error messages
  • Test with valid and invalid inputs

Solution:

// run
package main

import (
    "errors"
    "fmt"
    "regexp"
    "strconv"
    "strings"
)

var (
    ErrInvalidUsername = errors.New("invalid username")
    ErrInvalidEmail    = errors.New("invalid email")
    ErrInvalidAge      = errors.New("invalid age")
)

// usernamePattern is compiled once at startup instead of on every call.
var usernamePattern = regexp.MustCompile("^[a-zA-Z0-9]+$")

func validateUsername(username string) error {
    if username == "" {
        return fmt.Errorf("%w: cannot be empty", ErrInvalidUsername)
    }

    if len(username) < 3 || len(username) > 20 {
        return fmt.Errorf("%w: must be 3-20 characters (got %d)",
            ErrInvalidUsername, len(username))
    }

    if !usernamePattern.MatchString(username) {
        return fmt.Errorf("%w: must be alphanumeric only", ErrInvalidUsername)
    }

    return nil
}

func validateEmail(email string) error {
    if email == "" {
        return fmt.Errorf("%w: cannot be empty", ErrInvalidEmail)
    }

    if !strings.Contains(email, "@") || !strings.Contains(email, ".") {
        return fmt.Errorf("%w: must contain @ and .", ErrInvalidEmail)
    }

    parts := strings.Split(email, "@")
    if len(parts) != 2 {
        return fmt.Errorf("%w: invalid format", ErrInvalidEmail)
    }

    if len(parts[0]) == 0 || len(parts[1]) == 0 {
        return fmt.Errorf("%w: missing local or domain part", ErrInvalidEmail)
    }

    return nil
}

func validateAge(ageStr string) (int, error) {
    if ageStr == "" {
        return 0, fmt.Errorf("%w: cannot be empty", ErrInvalidAge)
    }

    age, err := strconv.Atoi(strings.TrimSpace(ageStr))
    if err != nil {
        return 0, fmt.Errorf("%w: must be a valid number: %v", ErrInvalidAge, err)
    }

    if age < 18 {
        return 0, fmt.Errorf("%w: must be at least 18 (got %d)", ErrInvalidAge, age)
    }

    if age > 100 {
        return 0, fmt.Errorf("%w: must be at most 100 (got %d)", ErrInvalidAge, age)
    }

    return age, nil
}

func registerUser(username, email, ageStr string) error {
    // Validate username
    if err := validateUsername(username); err != nil {
        return fmt.Errorf("registration failed: %w", err)
    }

    // Validate email
    if err := validateEmail(email); err != nil {
        return fmt.Errorf("registration failed: %w", err)
    }

    // Validate age
    age, err := validateAge(ageStr)
    if err != nil {
        return fmt.Errorf("registration failed: %w", err)
    }

    fmt.Printf("✓ Successfully registered: %s (%s), age %d\n", username, email, age)
    return nil
}

func main() {
    fmt.Println("=== User Registration Validation ===")
    fmt.Println()

    testCases := []struct {
        username string
        email    string
        age      string
        desc     string
    }{
        {"alice", "alice@example.com", "25", "Valid user"},
        {"", "bob@example.com", "30", "Empty username"},
        {"ab", "charlie@example.com", "22", "Username too short"},
        {"verylongusernamethatexceedslimit", "diana@example.com", "28", "Username too long"},
        {"user@123", "eve@example.com", "35", "Username with special chars"},
        {"frank", "", "40", "Empty email"},
        {"grace", "invalidemail", "45", "Invalid email format"},
        {"henry", "henry@example.com", "", "Empty age"},
        {"iris", "iris@example.com", "invalid", "Invalid age format"},
        {"jack", "jack@example.com", "15", "Age too young"},
        {"kate", "kate@example.com", "150", "Age too old"},
        {"leo", "leo@example.com", "30", "Valid user"},
    }

    for i, tc := range testCases {
        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
        fmt.Printf("Input: username=%s, email=%s, age=%s\n", tc.username, tc.email, tc.age)

        err := registerUser(tc.username, tc.email, tc.age)
        if err != nil {
            fmt.Printf("✗ Error: %v\n", err)

            // Check error types
            if errors.Is(err, ErrInvalidUsername) {
                fmt.Println("  Type: Username validation error")
            } else if errors.Is(err, ErrInvalidEmail) {
                fmt.Println("  Type: Email validation error")
            } else if errors.Is(err, ErrInvalidAge) {
                fmt.Println("  Type: Age validation error")
            }
        }

        fmt.Println()
    }
}

Key Concepts:

  • Sentinel errors for error type checking
  • Error wrapping with %w
  • Clear, actionable error messages
  • Error inspection with errors.Is()
  • Input validation patterns
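
The solution stops at the first failing field. If you would rather report every problem at once, one option is `errors.Join`, available since Go 1.20. The sketch below is a variation under that assumption, with simplified validators and hypothetical helper names (`validateAll`, `failedBoth`); `errors.Is` still matches each sentinel inside the joined error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrInvalidUsername = errors.New("invalid username")
	ErrInvalidEmail    = errors.New("invalid email")
)

// validateUsername is a simplified check for this sketch.
func validateUsername(username string) error {
	if len(username) < 3 {
		return fmt.Errorf("%w: must be at least 3 characters (got %d)",
			ErrInvalidUsername, len(username))
	}
	return nil
}

// validateEmail is a simplified check for this sketch.
func validateEmail(email string) error {
	if !strings.Contains(email, "@") {
		return fmt.Errorf("%w: must contain @", ErrInvalidEmail)
	}
	return nil
}

// validateAll runs every check and joins the failures. errors.Join
// discards nil values and returns nil when every check passes.
func validateAll(username, email string) error {
	return errors.Join(validateUsername(username), validateEmail(email))
}

// failedBoth reports whether both sentinels are present in err.
func failedBoth(err error) bool {
	return errors.Is(err, ErrInvalidUsername) && errors.Is(err, ErrInvalidEmail)
}

func main() {
	err := validateAll("ab", "no-at-sign")
	fmt.Println(err) // one message per line
	fmt.Println(failedBoth(err))
	fmt.Println(validateAll("alice", "alice@example.com") == nil)
}
```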

Exercise 2: Custom Error Types

Learning Objectives: Create custom error types with rich context and implement error type assertions.

Difficulty: Intermediate

Real-World Context: APIs need to return structured errors with HTTP status codes, error codes, and detailed information. This exercise demonstrates building production-ready error types.

Task: Build an API error system that:

  1. Defines custom error types for different HTTP status codes
  2. Includes error codes, messages, and status codes
  3. Implements retryability checking
  4. Provides structured error information

Requirements:

  • Create at least 4 different error types (400, 401, 404, 500)
  • Include methods for accessing error properties
  • Implement error wrapping
  • Add detailed context to errors

Solution:

// run
package main

import (
    "errors"
    "fmt"
    "time"
)

// Base API error type
type APIError struct {
    StatusCode int
    Code       string
    Message    string
    Retryable  bool
    Timestamp  time.Time
    Details    map[string]interface{}
}

func (e *APIError) Error() string {
    return fmt.Sprintf("[HTTP %d] %s: %s", e.StatusCode, e.Code, e.Message)
}

// Specific error types
type BadRequestError struct {
    *APIError
}

func NewBadRequestError(code, message string, details map[string]interface{}) *BadRequestError {
    return &BadRequestError{
        APIError: &APIError{
            StatusCode: 400,
            Code:       code,
            Message:    message,
            Retryable:  false,
            Timestamp:  time.Now(),
            Details:    details,
        },
    }
}

type UnauthorizedError struct {
    *APIError
}

func NewUnauthorizedError(message string) *UnauthorizedError {
    return &UnauthorizedError{
        APIError: &APIError{
            StatusCode: 401,
            Code:       "UNAUTHORIZED",
            Message:    message,
            Retryable:  false,
            Timestamp:  time.Now(),
        },
    }
}

type NotFoundError struct {
    *APIError
    ResourceType string
    ResourceID   string
}

func NewNotFoundError(resourceType, resourceID string) *NotFoundError {
    return &NotFoundError{
        APIError: &APIError{
            StatusCode: 404,
            Code:       "NOT_FOUND",
            Message:    fmt.Sprintf("%s not found: %s", resourceType, resourceID),
            Retryable:  false,
            Timestamp:  time.Now(),
            Details: map[string]interface{}{
                "resource_type": resourceType,
                "resource_id":   resourceID,
            },
        },
        ResourceType: resourceType,
        ResourceID:   resourceID,
    }
}

type InternalServerError struct {
    *APIError
    Cause error
}

func NewInternalServerError(message string, cause error) *InternalServerError {
    return &InternalServerError{
        APIError: &APIError{
            StatusCode: 500,
            Code:       "INTERNAL_ERROR",
            Message:    message,
            Retryable:  true,
            Timestamp:  time.Now(),
        },
        Cause: cause,
    }
}

func (e *InternalServerError) Unwrap() error {
    return e.Cause
}

// API service
type UserAPI struct {
    users map[string]string
}

func NewUserAPI() *UserAPI {
    return &UserAPI{
        users: map[string]string{
            "1": "Alice",
            "2": "Bob",
        },
    }
}

func (api *UserAPI) GetUser(id, token string) (string, error) {
    // Check authentication
    if token == "" {
        return "", NewUnauthorizedError("missing authentication token")
    }

    if token != "valid-token" {
        return "", NewUnauthorizedError("invalid authentication token")
    }

    // Validate input
    if id == "" {
        return "", NewBadRequestError(
            "INVALID_ID",
            "user ID cannot be empty",
            map[string]interface{}{
                "field": "id",
                "value": id,
            },
        )
    }

    // Simulate internal error. This check must run before the lookup;
    // otherwise ID "999" would be reported as not found.
    if id == "999" {
        return "", NewInternalServerError(
            "database connection failed",
            errors.New("connection timeout"),
        )
    }

    // Check if user exists
    name, exists := api.users[id]
    if !exists {
        return "", NewNotFoundError("User", id)
    }

    return name, nil
}

func handleAPIError(err error) {
    fmt.Println("\n=== Error Details ===")

    // Try specific error types
    var badReq *BadRequestError
    var unauth *UnauthorizedError
    var notFound *NotFoundError
    var internal *InternalServerError

    switch {
    case errors.As(err, &badReq):
        fmt.Printf("Type: Bad Request\n")
        fmt.Printf("Status: %d\n", badReq.StatusCode)
        fmt.Printf("Code: %s\n", badReq.Code)
        fmt.Printf("Message: %s\n", badReq.Message)
        fmt.Printf("Retryable: %v\n", badReq.Retryable)
        if len(badReq.Details) > 0 {
            fmt.Printf("Details: %v\n", badReq.Details)
        }

    case errors.As(err, &unauth):
        fmt.Printf("Type: Unauthorized\n")
        fmt.Printf("Status: %d\n", unauth.StatusCode)
        fmt.Printf("Message: %s\n", unauth.Message)
        fmt.Println("Action: Provide valid authentication")

    case errors.As(err, &notFound):
        fmt.Printf("Type: Not Found\n")
        fmt.Printf("Status: %d\n", notFound.StatusCode)
        fmt.Printf("Resource: %s (ID: %s)\n", notFound.ResourceType, notFound.ResourceID)
        fmt.Println("Action: Check resource identifier")

    case errors.As(err, &internal):
        fmt.Printf("Type: Internal Server Error\n")
        fmt.Printf("Status: %d\n", internal.StatusCode)
        fmt.Printf("Message: %s\n", internal.Message)
        fmt.Printf("Retryable: %v\n", internal.Retryable)
        if internal.Cause != nil {
            fmt.Printf("Cause: %v\n", internal.Cause)
        }
        fmt.Println("Action: Retry operation")

    default:
        fmt.Printf("Unknown error: %v\n", err)
    }
}

func main() {
    fmt.Println("=== API Error Types Demo ===")

    api := NewUserAPI()

    testCases := []struct {
        id    string
        token string
        desc  string
    }{
        {"1", "valid-token", "Valid request"},
        {"1", "", "Missing token"},
        {"1", "invalid", "Invalid token"},
        {"", "valid-token", "Empty ID"},
        {"999", "valid-token", "Internal error"},
        {"nonexistent", "valid-token", "User not found"},
    }

    for i, tc := range testCases {
        fmt.Printf("\n--- Test %d: %s ---\n", i+1, tc.desc)
        fmt.Printf("Request: GET /users/%s (token: %s)\n", tc.id, tc.token)

        name, err := api.GetUser(tc.id, tc.token)
        if err != nil {
            handleAPIError(err)
        } else {
            fmt.Printf("\n✓ Success: User found: %s\n", name)
        }
    }
}

Key Concepts:

  • Custom error types with embedded base type
  • HTTP status code mapping
  • Error type assertions with errors.As()
  • Retryability flags for error handling
  • Structured error details
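
One consequence of embedding worth noting: `errors.As` matches on the concrete type, so a target of the embedded base pointer type does not match the outer wrapper types. If a caller only needs the status code, a small interface is one way to handle every API error generically. The sketch below is an assumption-laden illustration, not part of the solution: `statusCoder`, `apiError`, `notFoundError`, and `statusOf` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// statusCoder is a hypothetical interface for errors that carry an HTTP status.
type statusCoder interface {
	error
	HTTPStatus() int
}

// apiError is a reduced base type for this sketch.
type apiError struct {
	status  int
	message string
}

func (e *apiError) Error() string   { return fmt.Sprintf("[HTTP %d] %s", e.status, e.message) }
func (e *apiError) HTTPStatus() int { return e.status }

// notFoundError embeds *apiError, so Error and HTTPStatus are promoted
// and notFoundError satisfies statusCoder without extra code.
type notFoundError struct {
	*apiError
	resourceID string
}

func newNotFound(id string) error {
	return &notFoundError{
		apiError:   &apiError{status: 404, message: "user not found: " + id},
		resourceID: id,
	}
}

// statusOf returns the status of the first error in the chain that
// implements statusCoder, defaulting to 500 for anything else.
func statusOf(err error) int {
	var sc statusCoder
	if errors.As(err, &sc) {
		return sc.HTTPStatus()
	}
	return 500
}

// wrappedNotFound and plainError build sample errors for the demo.
func wrappedNotFound() error { return fmt.Errorf("handler: %w", newNotFound("42")) }
func plainError() error      { return errors.New("boom") }

func main() {
	fmt.Println(statusOf(wrappedNotFound())) // 404
	fmt.Println(statusOf(plainError()))      // 500
}
```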

Exercise 3: Error Wrapping Chain

Learning Objectives: Master error wrapping through multiple application layers and implement error chain inspection.

Difficulty: Intermediate

Real-World Context: Multi-tier applications need to preserve context as errors propagate from database → repository → service → controller. This exercise demonstrates layered error handling.

Task: Build a 3-layer application (database, repository, service) where:

  1. Each layer wraps errors with its own context
  2. Sentinel errors are defined at the database layer
  3. Error inspection works through the entire chain
  4. Error messages build a complete story

Requirements:

  • Implement 3 distinct layers
  • Use error wrapping with %w
  • Create sentinel errors for common cases
  • Test error chain inspection

Solution:

// run
package main

import (
    "errors"
    "fmt"
)

// Sentinel errors at database layer
var (
    ErrNotFound     = errors.New("record not found")
    ErrDuplicateKey = errors.New("duplicate key")
    ErrConnection   = errors.New("database connection failed")
)

// Database layer
type Database struct {
    records map[string]string
}

func NewDatabase() *Database {
    return &Database{
        records: map[string]string{
            "1": "Alice",
            "2": "Bob",
        },
    }
}

func (db *Database) Get(id string) (string, error) {
    record, exists := db.records[id]
    if !exists {
        return "", fmt.Errorf("database: %w (id: %s)", ErrNotFound, id)
    }
    return record, nil
}

func (db *Database) Insert(id, name string) error {
    if _, exists := db.records[id]; exists {
        return fmt.Errorf("database: %w (id: %s)", ErrDuplicateKey, id)
    }
    db.records[id] = name
    return nil
}

// Repository layer
type UserRepository struct {
    db *Database
}

func NewUserRepository(db *Database) *UserRepository {
    return &UserRepository{db: db}
}

func (r *UserRepository) FindByID(id string) (string, error) {
    name, err := r.db.Get(id)
    if err != nil {
        return "", fmt.Errorf("repository: failed to find user %s: %w", id, err)
    }
    return name, nil
}

func (r *UserRepository) Create(id, name string) error {
    if err := r.db.Insert(id, name); err != nil {
        return fmt.Errorf("repository: failed to create user %s: %w", id, err)
    }
    return nil
}

// Service layer
type UserService struct {
    repo *UserRepository
}

func NewUserService(repo *UserRepository) *UserService {
    return &UserService{repo: repo}
}

func (s *UserService) GetUser(id string) (string, error) {
    if id == "" {
        return "", fmt.Errorf("service: invalid user ID")
    }

    name, err := s.repo.FindByID(id)
    if err != nil {
        return "", fmt.Errorf("service: failed to get user: %w", err)
    }

    return name, nil
}

func (s *UserService) CreateUser(id, name string) error {
    if id == "" || name == "" {
        return fmt.Errorf("service: invalid input (id: %s, name: %s)", id, name)
    }

    if err := s.repo.Create(id, name); err != nil {
        return fmt.Errorf("service: failed to create user: %w", err)
    }

    return nil
}

// Error analysis
func analyzeError(err error) {
    fmt.Println("\n=== Error Analysis ===")
    fmt.Printf("Full error message:\n%v\n\n", err)

    // Check for sentinel errors
    checks := map[string]error{
        "ErrNotFound":     ErrNotFound,
        "ErrDuplicateKey": ErrDuplicateKey,
        "ErrConnection":   ErrConnection,
    }

    fmt.Println("Sentinel error checks:")
    for name, sentinel := range checks {
        if errors.Is(err, sentinel) {
            fmt.Printf("✓ Error chain contains %s\n", name)
        }
    }

    // Unwrap the error chain
    fmt.Println("\nError chain (from outermost to innermost):")
    currentErr := err
    depth := 0
    for currentErr != nil {
        indent := ""
        for i := 0; i < depth; i++ {
            indent += "  "
        }
        fmt.Printf("%s%d: %v\n", indent, depth, currentErr)
        currentErr = errors.Unwrap(currentErr)
        depth++
    }
}

func main() {
    fmt.Println("=== Error Wrapping Chain Demo ===")

    db := NewDatabase()
    repo := NewUserRepository(db)
    service := NewUserService(repo)

    // Test 1: Successful operation
    fmt.Println("\n--- Test 1: Successful GetUser ---")
    name, err := service.GetUser("1")
    if err != nil {
        analyzeError(err)
    } else {
        fmt.Printf("✓ Success: Found user: %s\n", name)
    }

    // Test 2: Not found error (wrapped through all layers)
    fmt.Println("\n--- Test 2: User Not Found ---")
    _, err = service.GetUser("999")
    if err != nil {
        analyzeError(err)

        // Demonstrate error-based logic
        if errors.Is(err, ErrNotFound) {
            fmt.Println("\nAction: Could return HTTP 404")
        }
    }

    // Test 3: Duplicate key error
    fmt.Println("\n--- Test 3: Duplicate Key ---")
    err = service.CreateUser("1", "Charlie") // ID 1 already exists
    if err != nil {
        analyzeError(err)

        if errors.Is(err, ErrDuplicateKey) {
            fmt.Println("\nAction: Could return HTTP 409 Conflict")
        }
    }

    // Test 4: Invalid input (no sentinel error)
    fmt.Println("\n--- Test 4: Invalid Input ---")
    err = service.CreateUser("", "")
    if err != nil {
        analyzeError(err)
        fmt.Println("\nAction: Could return HTTP 400 Bad Request")
    }

    // Test 5: Successful create
    fmt.Println("\n--- Test 5: Successful CreateUser ---")
    err = service.CreateUser("3", "Charlie")
    if err != nil {
        analyzeError(err)
    } else {
        fmt.Println("✓ Success: User created")

        // Verify
        name, _ = service.GetUser("3")
        fmt.Printf("✓ Verification: Found user: %s\n", name)
    }
}

Key Concepts:

  • Multi-layer error wrapping
  • Sentinel errors for common cases
  • Error chain preservation
  • Error inspection with errors.Is()
  • Context-rich error messages
  • Layer-specific error handling
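
The "Action: Could return HTTP 404" lines in the solution hint at how a handler might turn an error chain into a response. The sketch below shows one way to do that with errors.Is. The helper name statusFor and the sentinel messages are assumptions made for illustration; they are not part of the solution above.

```go
package main

import (
    "errors"
    "fmt"
    "net/http"
)

// Sentinel errors named like the ones in the solution (messages are assumed).
var (
    ErrNotFound     = errors.New("not found")
    ErrDuplicateKey = errors.New("duplicate key")
    ErrConnection   = errors.New("connection failed")
)

// statusFor is a hypothetical helper that maps an error chain to an HTTP status.
// errors.Is looks through every %w layer, so the wrapping depth does not matter.
func statusFor(err error) int {
    switch {
    case err == nil:
        return http.StatusOK
    case errors.Is(err, ErrNotFound):
        return http.StatusNotFound
    case errors.Is(err, ErrDuplicateKey):
        return http.StatusConflict
    case errors.Is(err, ErrConnection):
        return http.StatusServiceUnavailable
    default:
        return http.StatusInternalServerError
    }
}

func main() {
    // Wrap the sentinel twice, the way the repository and service layers do.
    err := fmt.Errorf("service: failed to get user: %w",
        fmt.Errorf("repository: user 999: %w", ErrNotFound))
    fmt.Println(statusFor(err)) // 404
}
```

Validation failures in the solution carry no sentinel, so under this mapping they would fall through to 500. Returning 400 for them would require a dedicated validation sentinel or error type.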

Exercise 4: Retry Logic with Error Classification

Learning Objectives: Implement retry logic that classifies errors as retryable or permanent, with exponential backoff.

Difficulty: Advanced

Real-World Context: External API calls, database operations, and network requests often fail temporarily. This exercise teaches you to build resilient systems that retry transient failures while failing fast on permanent errors.

Task: Build a retry system that:

  1. Classifies errors as temporary or permanent
  2. Implements exponential backoff for retries
  3. Respects context timeouts
  4. Tracks retry attempts and timing
  5. Provides detailed logging

Requirements:

  • Support configurable retry parameters
  • Implement exponential backoff calculation
  • Handle context cancellation
  • Track success/failure statistics

Solution:

// run
package main

import (
    "context"
    "errors"
    "fmt"
    "math/rand"
    "time"
)

// TemporaryError marks a failure that may succeed on a later attempt.
type TemporaryError struct {
    Msg string
}

func (e *TemporaryError) Error() string {
    return fmt.Sprintf("temporary: %s", e.Msg)
}

// Temporary lets isRetryable detect this type through an interface check.
func (e *TemporaryError) Temporary() bool {
    return true
}

// PermanentError marks a failure that retrying cannot fix.
type PermanentError struct {
    Msg string
}

func (e *PermanentError) Error() string {
    return fmt.Sprintf("permanent: %s", e.Msg)
}

// RetryConfig controls how many attempts are made and how long to wait between them.
type RetryConfig struct {
    MaxAttempts  int
    InitialDelay time.Duration
    MaxDelay     time.Duration
    Multiplier   float64
}

func DefaultRetryConfig() RetryConfig {
    return RetryConfig{
        MaxAttempts:  5,
        InitialDelay: 100 * time.Millisecond,
        MaxDelay:     5 * time.Second,
        Multiplier:   2.0,
    }
}

// RetryStats records what happened during one RetryWithBackoff call.
type RetryStats struct {
    TotalAttempts int
    Successes     int
    Failures      int
    TotalDelay    time.Duration
}

// isRetryable reports whether err is worth another attempt. Anything in the
// chain with Temporary() == true is retryable, a *PermanentError is not, and
// unclassified errors are treated as retryable.
func isRetryable(err error) bool {
    type temporary interface {
        Temporary() bool
    }

    var tempErr temporary
    if errors.As(err, &tempErr) && tempErr.Temporary() {
        return true
    }

    var permErr *PermanentError
    if errors.As(err, &permErr) {
        return false
    }

    // Default: assume retryable
    return true
}

// RetryWithBackoff runs operation until it succeeds, returns a permanent
// error, exhausts config.MaxAttempts, or ctx is done.
func RetryWithBackoff(ctx context.Context, operation func() error, config RetryConfig) (*RetryStats, error) {
    stats := &RetryStats{}
    if config.MaxAttempts < 1 {
        return stats, errors.New("retry: MaxAttempts must be at least 1")
    }
    delay := config.InitialDelay

    for attempt := 1; attempt <= config.MaxAttempts; attempt++ {
        // Do not start an attempt if the context is already done.
        if err := ctx.Err(); err != nil {
            stats.Failures++
            return stats, fmt.Errorf("retry aborted by context: %w", err)
        }
        stats.TotalAttempts++

        // Execute operation
        startTime := time.Now()
        err := operation()
        elapsed := time.Since(startTime)

        if err == nil {
            stats.Successes++
            if attempt > 1 {
                fmt.Printf("✓ Success on attempt %d/%d (took %v)\n",
                    attempt, config.MaxAttempts, elapsed)
            }
            return stats, nil
        }

        // Check if error is retryable
        if !isRetryable(err) {
            stats.Failures++
            fmt.Printf("✗ Permanent error on attempt %d: %v\n", attempt, err)
            return stats, err
        }

        // Check if this was the last attempt
        if attempt == config.MaxAttempts {
            stats.Failures++
            fmt.Printf("✗ All %d attempts exhausted: %v\n", config.MaxAttempts, err)
            return stats, fmt.Errorf("max retries exceeded: %w", err)
        }

        // Log retry
        fmt.Printf("⚠ Attempt %d/%d failed, retrying in %v: %v\n",
            attempt, config.MaxAttempts, delay, err)

        // Wait for the backoff delay, or stop early if the context ends.
        timer := time.NewTimer(delay)
        select {
        case <-timer.C:
            stats.TotalDelay += delay
        case <-ctx.Done():
            timer.Stop()
            stats.Failures++
            return stats, fmt.Errorf("retry aborted by context: %w", ctx.Err())
        }

        // Calculate next delay
        nextDelay := time.Duration(float64(delay) * config.Multiplier)
        if nextDelay > config.MaxDelay {
            nextDelay = config.MaxDelay
        }
        delay = nextDelay
    }

    // Unreachable: every path through the loop body returns on the last attempt.
    stats.Failures++
    return stats, errors.New("retry logic error: should not reach here")
}

// simulatedOperation returns an operation that fails with probability
// failureRate, using the error kind named by failureType.
func simulatedOperation(failureRate float64, failureType string) func() error {
    attempts := 0

    return func() error {
        attempts++

        if rand.Float64() < failureRate {
            switch failureType {
            case "temporary":
                return &TemporaryError{Msg: fmt.Sprintf("network timeout (attempt %d)", attempts)}
            case "permanent":
                return &PermanentError{Msg: "authentication failed"}
            default:
                return fmt.Errorf("unknown error (attempt %d)", attempts)
            }
        }

        return nil
    }
}

func main() {
    // math/rand seeds itself since Go 1.20. On older versions, call
    // rand.Seed(time.Now().UnixNano()) here to get varying results.
    ctx := context.Background()

    fmt.Println("=== Retry Logic with Error Classification ===")
    fmt.Println()

    config := DefaultRetryConfig()

    // Test 1: Temporary failures that usually end in success
    fmt.Println("--- Test 1: Temporary Failures (50% rate) ---")
    op1 := simulatedOperation(0.5, "temporary")
    stats1, err := RetryWithBackoff(ctx, op1, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats1.TotalAttempts)
    fmt.Printf("  Successes: %d\n", stats1.Successes)
    fmt.Printf("  Failures: %d\n", stats1.Failures)
    fmt.Printf("  Total delay: %v\n", stats1.TotalDelay)
    if err != nil {
        fmt.Printf("  Final error: %v\n", err)
    }

    time.Sleep(time.Second)

    // Test 2: Permanent failure (immediate stop)
    fmt.Println("\n--- Test 2: Permanent Failure ---")
    op2 := simulatedOperation(1.0, "permanent")
    stats2, err := RetryWithBackoff(ctx, op2, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats2.TotalAttempts)
    fmt.Printf("  Failures: %d\n", stats2.Failures)
    fmt.Printf("  Final error: %v\n", err)

    time.Sleep(time.Second)

    // Test 3: All attempts fail
    fmt.Println("\n--- Test 3: All Attempts Fail (100% temporary failure) ---")
    op3 := simulatedOperation(1.0, "temporary")
    stats3, err := RetryWithBackoff(ctx, op3, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats3.TotalAttempts)
    fmt.Printf("  Failures: %d\n", stats3.Failures)
    fmt.Printf("  Total delay: %v\n", stats3.TotalDelay)
    fmt.Printf("  Final error: %v\n", err)

    time.Sleep(time.Second)

    // Test 4: Context timeout
    fmt.Println("\n--- Test 4: Context Timeout ---")
    timeoutCtx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()

    op4 := simulatedOperation(1.0, "temporary")
    stats4, err := RetryWithBackoff(timeoutCtx, op4, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats4.TotalAttempts)
    fmt.Printf("  Failures: %d\n", stats4.Failures)
    fmt.Printf("  Total delay: %v\n", stats4.TotalDelay)
    fmt.Printf("  Final error: %v\n", err)

    // Test 5: Demonstrate backoff calculation
    fmt.Println("\n--- Test 5: Backoff Timing Demo ---")
    fmt.Println("Demonstrating exponential backoff delays:")

    // There is one wait after each failed attempt except the last.
    currentDelay := config.InitialDelay
    for i := 1; i < config.MaxAttempts; i++ {
        fmt.Printf("After failed attempt %d: wait %v\n", i, currentDelay)

        nextDelay := time.Duration(float64(currentDelay) * config.Multiplier)
        if nextDelay > config.MaxDelay {
            nextDelay = config.MaxDelay
        }
        currentDelay = nextDelay
    }
}

Key Concepts:

  • Error classification (temporary vs permanent)
  • Exponential backoff algorithm
  • Context-aware retry logic
  • Retry statistics tracking
  • Configurable retry behavior
  • Early termination on permanent errors
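
The solution computes each delay from the previous one. The same schedule can also be written in closed form as min(MaxDelay, InitialDelay × Multiplier^(n-1)) for the n-th wait. The sketch below is one way to express that; the function name backoffDelay and its parameter names are illustrative and do not appear in the solution.

```go
package main

import (
    "fmt"
    "math"
    "time"
)

// backoffDelay returns the n-th wait (1-based) of an exponential backoff
// schedule: initial * multiplier^(n-1), capped at maxDelay.
func backoffDelay(n int, initial, maxDelay time.Duration, multiplier float64) time.Duration {
    if n < 1 {
        n = 1
    }
    d := float64(initial) * math.Pow(multiplier, float64(n-1))
    if d > float64(maxDelay) {
        return maxDelay // also covers overflow to +Inf for large n
    }
    return time.Duration(d)
}

func main() {
    // Same parameters as DefaultRetryConfig in the solution.
    for n := 1; n <= 8; n++ {
        fmt.Printf("wait %d: %v\n", n, backoffDelay(n, 100*time.Millisecond, 5*time.Second, 2.0))
    }
}
```

With these parameters the waits are 100ms, 200ms, 400ms, 800ms, 1.6s and 3.2s, after which the 5s cap applies. Production retry loops often add random jitter to each wait so that many clients do not retry in lockstep; that is left out here to keep the formula visible.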

Exercise 5: Comprehensive Error Monitoring System

Learning Objectives: Build a production-ready error monitoring system with structured logging, metrics collection, and alerting.

Difficulty: Advanced

Real-World Context: Production systems need comprehensive error tracking for debugging, monitoring, and alerting. This exercise builds a small, in-process version of such a system so that each moving part stays visible.

Task: Implement an error monitoring system that:

  1. Tracks all errors with structured metadata
  2. Collects metrics (error counts, rates, types)
  3. Implements severity-based handling
  4. Provides error analytics and reporting
  5. Simulates alerting for critical errors

Requirements:

  • Structured error logging with JSON
  • Real-time metrics collection
  • Severity-based routing
  • Background error processing
  • Graceful shutdown handling

Solution:

// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "math/rand"
    "sort"
    "sync"
    "time"
)

// Severity levels
type Severity int

const (
    SeverityDebug Severity = iota
    SeverityInfo
    SeverityWarning
    SeverityError
    SeverityCritical
)

// String returns the level name and never panics on an out-of-range value.
func (s Severity) String() string {
    names := []string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
    if s < 0 || int(s) >= len(names) {
        return fmt.Sprintf("Severity(%d)", int(s))
    }
    return names[s]
}

// Structured error
type MonitoredError struct {
    ID        string                 `json:"id"`
    Timestamp time.Time              `json:"timestamp"`
    Severity  Severity               `json:"severity"`
    Service   string                 `json:"service"`
    Operation string                 `json:"operation"`
    Message   string                 `json:"message"`
    Code      string                 `json:"code"`
    Context   map[string]interface{} `json:"context,omitempty"`
    UserID    string                 `json:"user_id,omitempty"`
}

// Error makes *MonitoredError satisfy the error interface, which ProcessOrder
// relies on when it returns one as an error.
func (e *MonitoredError) Error() string {
    return fmt.Sprintf("[%s] %s/%s: %s (%s)",
        e.Severity, e.Service, e.Operation, e.Message, e.Code)
}

// Error monitoring system
type ErrorMonitor struct {
    errors  chan *MonitoredError
    metrics map[string]int64
    mu      sync.RWMutex
}

func NewErrorMonitor() *ErrorMonitor {
    return &ErrorMonitor{
        errors:  make(chan *MonitoredError, 100),
        metrics: make(map[string]int64),
    }
}

// Track queues err for background processing. It never blocks the caller:
// if the queue is full, the error is dropped with a warning.
func (em *ErrorMonitor) Track(err *MonitoredError) {
    select {
    case em.errors <- err:
    default:
        fmt.Printf("Warning: Error queue full, dropping error %s\n", err.ID)
    }
}

// Start processes queued errors until ctx is done, then drains the queue.
func (em *ErrorMonitor) Start(ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()

    for {
        select {
        case <-ctx.Done():
            // Process whatever is still queued so that shutdown loses nothing.
            for {
                select {
                case err := <-em.errors:
                    em.processError(err)
                default:
                    fmt.Println("Error monitor shutting down...")
                    return
                }
            }
        case err := <-em.errors:
            em.processError(err)
        }
    }
}

func (em *ErrorMonitor) processError(err *MonitoredError) {
    // Update metrics. Codes and severity names share one map, so an error
    // code must not reuse a severity name such as "ERROR".
    em.mu.Lock()
    em.metrics["total"]++
    em.metrics[err.Code]++
    em.metrics[err.Severity.String()]++
    em.mu.Unlock()

    // Log error
    em.logError(err)

    // Alert on critical errors
    if err.Severity >= SeverityCritical {
        em.sendAlert(err)
    }
}

func (em *ErrorMonitor) logError(err *MonitoredError) {
    logEntry := map[string]interface{}{
        "id":        err.ID,
        "timestamp": err.Timestamp.Format(time.RFC3339),
        "severity":  err.Severity.String(),
        "service":   err.Service,
        "operation": err.Operation,
        "message":   err.Message,
        "code":      err.Code,
    }

    if err.Context != nil {
        logEntry["context"] = err.Context
    }

    if err.UserID != "" {
        logEntry["user_id"] = err.UserID
    }

    jsonData, marshalErr := json.Marshal(logEntry)
    if marshalErr != nil {
        fmt.Printf("LOG: failed to encode error %s: %v\n", err.ID, marshalErr)
        return
    }
    fmt.Printf("LOG: %s\n", jsonData)
}

// sendAlert simulates paging; a real system would call an alerting service.
func (em *ErrorMonitor) sendAlert(err *MonitoredError) {
    fmt.Printf("ALERT: [%s] %s - %s (Error ID: %s)\n",
        err.Severity.String(), err.Service, err.Message, err.ID)
}

// GetMetrics returns a copy of the counters so callers cannot race with updates.
func (em *ErrorMonitor) GetMetrics() map[string]int64 {
    em.mu.RLock()
    defer em.mu.RUnlock()

    metrics := make(map[string]int64, len(em.metrics))
    for k, v := range em.metrics {
        metrics[k] = v
    }
    return metrics
}

func (em *ErrorMonitor) GetReport() string {
    em.mu.RLock()
    defer em.mu.RUnlock()

    total := em.metrics["total"]
    if total == 0 {
        return "No errors recorded"
    }

    report := "=== Error Report ===\n"
    report += fmt.Sprintf("Total errors: %d\n\n", total)

    report += "By severity:\n"
    severityNames := make(map[string]bool)
    for i := SeverityDebug; i <= SeverityCritical; i++ {
        severityNames[i.String()] = true
        count := em.metrics[i.String()]
        if count > 0 {
            pct := float64(count) / float64(total) * 100
            report += fmt.Sprintf("  %s: %d (%.1f%%)\n", i.String(), count, pct)
        }
    }

    // Every key that is neither "total" nor a severity name is an error code.
    report += "\nTop error codes:\n"
    type codeCount struct {
        code  string
        count int64
    }
    var codes []codeCount
    for k, v := range em.metrics {
        if k != "total" && !severityNames[k] {
            codes = append(codes, codeCount{code: k, count: v})
        }
    }

    // Sort by count (descending), then by code so that ties print in a stable
    // order. Map iteration order is random, so sorting is what makes this "top".
    sort.Slice(codes, func(i, j int) bool {
        if codes[i].count != codes[j].count {
            return codes[i].count > codes[j].count
        }
        return codes[i].code < codes[j].code
    })

    // Print top 5 codes
    if len(codes) > 5 {
        codes = codes[:5]
    }
    for _, c := range codes {
        pct := float64(c.count) / float64(total) * 100
        report += fmt.Sprintf("  %s: %d (%.1f%%)\n", c.code, c.count, pct)
    }

    return report
}

// Application service
type OrderService struct {
    monitor *ErrorMonitor
}

func NewOrderService(monitor *ErrorMonitor) *OrderService {
    return &OrderService{monitor: monitor}
}

func (s *OrderService) ProcessOrder(orderID, userID string, amount float64) error {
    // Generate request ID
    requestID := fmt.Sprintf("req_%d", time.Now().UnixNano())

    // newError fills in the fields shared by every error raised by this call.
    newError := func(sev Severity, code, msg string) *MonitoredError {
        return &MonitoredError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  sev,
            Service:   "order-service",
            Operation: "process-order",
            Message:   msg,
            Code:      code,
            UserID:    userID,
            Context: map[string]interface{}{
                "order_id": orderID,
                "amount":   amount,
            },
        }
    }

    // Validation error
    if amount <= 0 {
        err := newError(SeverityError, "INVALID_AMOUNT", "invalid order amount")
        s.monitor.Track(err)
        return err
    }

    // Warning for large orders (tracked, but the order continues)
    if amount > 1000 {
        s.monitor.Track(newError(SeverityWarning, "LARGE_ORDER", "large order detected"))
    }

    // Simulate random failures
    r := rand.Float64()
    if r < 0.1 {
        // Critical error (10%)
        err := newError(SeverityCritical, "GATEWAY_DOWN", "payment gateway unavailable")
        s.monitor.Track(err)
        return err
    } else if r < 0.3 {
        // Regular error (20%)
        err := newError(SeverityError, "INVENTORY_ERROR", "inventory check failed")
        s.monitor.Track(err)
        return err
    }

    // Success
    fmt.Printf("✓ Order processed: %s (user: %s, amount: $%.2f)\n", orderID, userID, amount)
    return nil
}

func main() {
    // math/rand seeds itself since Go 1.20. On older versions, call
    // rand.Seed(time.Now().UnixNano()) here to get varying results.

    fmt.Println("=== Comprehensive Error Monitoring System ===")
    fmt.Println()

    monitor := NewErrorMonitor()
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    var wg sync.WaitGroup

    // Start error monitor
    wg.Add(1)
    go monitor.Start(ctx, &wg)

    orderService := NewOrderService(monitor)

    // Simulate various orders
    fmt.Println("Processing orders...")
    fmt.Println()

    orders := []struct {
        id     string
        userID string
        amount float64
    }{
        {"order1", "user1", 100},
        {"order2", "user2", -50},  // Invalid
        {"order3", "user3", 1500}, // Large
        {"order4", "user4", 200},
        {"order5", "user5", 300},
        {"order6", "user6", 150},
        {"order7", "user7", 2000}, // Large
        {"order8", "user8", 250},
        {"order9", "user9", 0}, // Invalid
        {"order10", "user10", 175},
    }

    for _, order := range orders {
        if err := orderService.ProcessOrder(order.id, order.userID, order.amount); err != nil {
            fmt.Printf("✗ Order %s failed: %v\n", order.id, err)
        }
        time.Sleep(100 * time.Millisecond)
    }

    // Allow time for error processing
    time.Sleep(time.Second)

    // Print metrics
    fmt.Println("\n" + monitor.GetReport())

    // Detailed metrics, sorted by key for stable output
    fmt.Println("\n=== Detailed Metrics ===")
    metrics := monitor.GetMetrics()
    keys := make([]string, 0, len(metrics))
    for key := range metrics {
        keys = append(keys, key)
    }
    sort.Strings(keys)
    for _, key := range keys {
        fmt.Printf("%s: %d\n", key, metrics[key])
    }

    // Shutdown
    fmt.Println("\nShutting down...")
    cancel()

    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        fmt.Println("Shutdown complete")
    case <-time.After(2 * time.Second):
        fmt.Println("Shutdown timeout")
    }
}

Key Concepts:

  • Structured error logging with JSON
  • Real-time metrics collection
  • Severity-based error handling
  • Background error processing
  • Error analytics and reporting
  • Graceful shutdown
  • Production monitoring patterns
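
Severity-based handling usually has to work on errors that other layers have already wrapped. The sketch below shows how a caller might recover a structured error from such a chain with errors.As and decide whether to page someone. CodedError and shouldPage are hypothetical names, and CodedError is a trimmed-down stand-in for a type like MonitoredError rather than a copy of it.

```go
package main

import (
    "errors"
    "fmt"
)

// CodedError is a hypothetical structured error carrying a code and a
// critical flag.
type CodedError struct {
    Code     string
    Critical bool
    Message  string
}

func (e *CodedError) Error() string { return e.Code + ": " + e.Message }

// shouldPage reports whether err, or any error it wraps, is a critical CodedError.
func shouldPage(err error) bool {
    var ce *CodedError
    return errors.As(err, &ce) && ce.Critical
}

func main() {
    base := &CodedError{Code: "GATEWAY_DOWN", Critical: true, Message: "payment gateway unavailable"}
    wrapped := fmt.Errorf("checkout: %w", base)

    fmt.Println(shouldPage(wrapped))                   // true
    fmt.Println(shouldPage(errors.New("plain error"))) // false
}
```

Because errors.As walks the whole chain, callers can keep adding context with %w without hiding the structured error from the alerting logic.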

Summary

Key Takeaways

Error handling philosophy:

  • Explicit over implicit: Errors are returned as values and must be checked, or deliberately propagated, at each call site
  • Values over exceptions: Errors are ordinary values, not special constructs
  • Context preservation: Each layer adds relevant context while preserving the cause
  • Graceful degradation: Handle failures without crashing when possible

Essential patterns:

  • Immediate checking: Check if err != nil right after every call that can fail
  • Error wrapping: Use fmt.Errorf with the %w verb to add context while preserving the error chain
  • Custom error types: Create structured errors with business context
  • Error inspection: Use errors.Is() to match sentinel values and errors.As() to extract typed errors, rather than comparing error strings
  • Retry logic: Implement exponential backoff for transient failures
  • Context integration: Use context for timeouts and cancellation
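
As a small illustration of the context item, the sketch below wraps ctx.Err() with %w and then tells a timeout apart from a cancellation with errors.Is. The functions slowOp, classify, and runWithTimeout are hypothetical helpers written for this example.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// slowOp is a hypothetical operation that needs `work` to finish and gives up
// early when ctx ends, wrapping the context error so callers can inspect it.
func slowOp(ctx context.Context, work time.Duration) error {
    timer := time.NewTimer(work)
    defer timer.Stop()
    select {
    case <-timer.C:
        return nil
    case <-ctx.Done():
        return fmt.Errorf("slowOp: %w", ctx.Err())
    }
}

// classify turns an error chain into a short label.
func classify(err error) string {
    switch {
    case err == nil:
        return "ok"
    case errors.Is(err, context.DeadlineExceeded):
        return "timeout"
    case errors.Is(err, context.Canceled):
        return "canceled"
    default:
        return "failed"
    }
}

// runWithTimeout runs slowOp under a deadline and classifies the outcome.
func runWithTimeout(timeout, work time.Duration) string {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()
    return classify(slowOp(ctx, work))
}

func main() {
    fmt.Println(runWithTimeout(20*time.Millisecond, time.Second)) // timeout
    fmt.Println(runWithTimeout(time.Second, 5*time.Millisecond))  // ok
}
```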

Production considerations:

  • Structured logging: Log errors with context, timestamps, and correlation IDs
  • Metrics collection: Track error rates, types, and patterns
  • Severity levels: Route errors appropriately (debug, info, warning, error, critical)
  • Alerting: Escalate critical errors to operations teams
  • Error analytics: Build dashboards and reports from error data
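
For structured logging, one option on Go 1.21 or later is the standard library's log/slog package. The sketch below renders a failure as a single JSON line. The helper failureLine and the field names request_id and operation are assumptions chosen for illustration, not a required schema.

```go
package main

import (
    "bytes"
    "errors"
    "fmt"
    "log/slog"
)

// errGateway is a sample error used only by this example.
var errGateway = errors.New("payment gateway unavailable")

// failureLine renders one failed operation as a JSON log line.
// It writes to an in-memory buffer so that the result is easy to inspect.
func failureLine(requestID, op string, err error) string {
    var buf bytes.Buffer
    logger := slog.New(slog.NewJSONHandler(&buf, nil))
    logger.Error("operation failed",
        slog.String("request_id", requestID), // correlation ID
        slog.String("operation", op),
        slog.String("error", err.Error()),
    )
    return buf.String()
}

func main() {
    fmt.Print(failureLine("req_123", "process-order", errGateway))
}
```

In a running service the handler would more likely write to os.Stderr or a log shipper, and the logger would be created once and shared rather than built per call.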

Next Steps

Continue your Go learning journey with these topics:

  1. Testing Error Paths - Master testing patterns for error handling
  2. Distributed Error Handling - Handle errors in microservices
  3. Monitoring and Observability - Build comprehensive error tracking
  4. Resilience Patterns - Circuit breakers, bulkheads, retries
  5. Error Design Principles - Design clear, actionable error APIs

Production Readiness

You now have the foundation for building production-ready Go applications with robust error handling. The patterns covered here are used in:

  • Web services and APIs with clear error responses
  • Microservice architectures with proper error propagation
  • Database systems with transaction error handling
  • File processing systems with graceful failure handling
  • Background job systems with comprehensive error tracking

Remember: Good error handling is not about preventing errors - it's about handling them gracefully. Master Go's explicit error handling model, and you'll build systems that are more reliable, debuggable, and maintainable.