Error Handling in Go | The Modern Go Tutorial

Why This Matters - Building Robust, Reliable Systems

Error handling is not just about catching problems - it's about building reliable, maintainable systems that handle failures gracefully. Go's explicit error handling forces you to think about failure at every step, creating more resilient code.

Real-world impact: Think about a payment processing system. When a database connection fails, does your application crash? Or does it log the error, retry against another database, and notify operations? The difference affects system reliability, user experience, and operational costs.

Business value: Proper error handling enables you to:

Build reliable systems that recover from failures gracefully
Provide clear debugging information for faster problem resolution
Implement graceful degradation when components fail
Create observable systems with comprehensive error tracking
Design predictable APIs that clearly communicate failure modes
Meet SLAs and reliability targets by anticipating and handling failures

System reliability: Go's error handling philosophy keeps failure paths visible throughout your codebase, which makes silent failures easier to spot in review before they cause production issues.

Learning Objectives

By the end of this tutorial, you will be able to:

Understand Go's error handling philosophy and why explicit is better than implicit
Master the error interface and create custom error types
Implement error wrapping with proper context preservation
Use error inspection techniques (errors.Is() and errors.As())
Apply production-ready error handling patterns with logging and metrics
Design APIs that provide clear, actionable error information
Implement graceful degradation and recovery strategies
Avoid common error handling pitfalls that lead to production issues
Build comprehensive error tracking and monitoring systems

Core Concepts - Understanding Go's Error Philosophy

Explicit vs Implicit Error Handling

Go deliberately avoids exceptions for ordinary failures (panic is reserved for truly unrecoverable situations). Instead, it makes errors explicit, ordinary values that callers are expected to check.

The philosophy: Errors are values, not exceptional conditions. This means:

Errors are returned as ordinary function return values
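You check and handle errors explicitly at the call site (the compiler does not enforce this, so unchecked errors are a code review and linting concern)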
Error handling logic is visible in your code flow
There's no hidden control flow like try/catch/finally

Why this matters: In languages with exceptions:

// Java: Errors can come from anywhere without warning
try {
    processPayment(amount);
    sendReceipt();
    updateInventory();
    updateAccountBalance();
    // Any of these might throw - you must read documentation!
} catch (Exception e) {
    // What failed? Why? Is this recoverable?
    // What state are we in now?
    handleGenericError(e);
}

Problems with exceptions:

Hidden control flow: Any function might throw, but you can't see it in the code
State uncertainty: When an exception is thrown, what state is your data in?
All-or-nothing: Either everything succeeds or the whole operation fails
Generic handling: Catch blocks often handle disparate errors the same way
Performance: Throwing an exception and unwinding the stack usually costs far more than returning a value

Go's explicit approach:

// Go: Each operation's potential for failure is explicit
err := processPayment(amount)
if err != nil {
    return fmt.Errorf("payment processing failed: %w", err)
}

err = sendReceipt()
if err != nil {
    // Payment succeeded but receipt failed - we know exactly where we are
    logError("receipt sending failed", err)
    // Continue or compensate as needed
}

err = updateInventory()
if err != nil {
    return fmt.Errorf("inventory update failed: %w", err)
}

err = updateAccountBalance()
if err != nil {
    return fmt.Errorf("balance update failed: %w", err)
}

Benefits of explicit handling:

Clarity: You can see exactly which functions can fail
Local handling: Errors are handled where they occur, with full context
Predictable flow: No hidden control transfers or stack unwinding
Context preservation: Each step can add its own context
Fine-grained control: Different errors at different points handled differently
State management: You know exactly what succeeded before the error

The Error Interface: Simplicity and Power

Go's error handling is built on a simple, elegant interface:

type error interface {
    Error() string
}

What this means:

Any type with an Error() string method is an error
No special syntax needed - errors are just values
Flexible implementation - create rich error types with additional methods
Interface satisfaction - your types conform implicitly, with no "implements" declaration
Composition: Errors can wrap other errors, preserving the error chain

Philosophy: Errors are values, not special language constructs. This approach:

Eliminates special error handling syntax
Allows errors to carry additional data and methods
Enables polymorphic error handling through interfaces
Keeps the language simple and consistent
Makes error handling testable and composable
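
To make the points above concrete, here is a minimal sketch of a type that becomes an error simply by having an Error() string method. The names InsufficientFundsError, Shortfall, and withdraw are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// InsufficientFundsError is a hypothetical error type for this sketch.
type InsufficientFundsError struct {
    Balance   int
    Requested int
}

// Error makes *InsufficientFundsError satisfy the built-in error interface.
func (e *InsufficientFundsError) Error() string {
    return fmt.Sprintf("insufficient funds: balance %d, requested %d", e.Balance, e.Requested)
}

// Shortfall is an extra method that plain error strings could not offer.
func (e *InsufficientFundsError) Shortfall() int {
    return e.Requested - e.Balance
}

// withdraw returns the new balance, or an error value when the request
// cannot be met. The failure travels as an ordinary return value.
func withdraw(balance, amount int) (int, error) {
    if amount > balance {
        return balance, &InsufficientFundsError{Balance: balance, Requested: amount}
    }
    return balance - amount, nil
}

func main() {
    _, err := withdraw(50, 80)
    if err != nil {
        fmt.Println("error:", err)
    }

    // Because the error is just a value, a type assertion recovers the
    // concrete type. Once errors are wrapped, errors.As is the safer choice.
    if fundsErr, ok := err.(*InsufficientFundsError); ok {
        fmt.Println("shortfall:", fundsErr.Shortfall())
    }
}
```

No registration or special syntax is involved: the method set alone makes the type usable anywhere an error is expected.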

Standard library error creation:

// Simple error with a fixed message
err := errors.New("something went wrong")

// Formatted error with dynamic content (%v records only the text of originalErr)
err = fmt.Errorf("failed to process user %s: %v", username, originalErr)

// Error wrapping (Go 1.13+): %w keeps originalErr in the error chain
err = fmt.Errorf("failed to connect: %w", originalErr)

Error Wrapping and Unwrapping

Go 1.13 introduced error wrapping, a powerful feature for preserving error chains:

Error wrapping (%w verb):

if err != nil {
    return fmt.Errorf("database query failed: %w", err)
}

Why wrap errors?

Preserve the original error for inspection
Add context at each layer of your application
Enable error type checking through the chain
Build informative error messages with full context
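
The practical difference between %v and %w is easy to miss because both produce the same message text. The sketch below, which uses a hypothetical ErrTimeout sentinel and hypothetical helper names, shows that only %w keeps the original error reachable for inspection.

```go
package main

import (
    "errors"
    "fmt"
)

// ErrTimeout is a hypothetical sentinel error for this sketch.
var ErrTimeout = errors.New("timeout")

// annotate adds context with %v, which flattens the cause into text.
func annotate(err error) error {
    return fmt.Errorf("query failed: %v", err)
}

// wrap adds context with %w, which keeps the cause in the error chain.
func wrap(err error) error {
    return fmt.Errorf("query failed: %w", err)
}

// matchesTimeout reports whether err is, or wraps, ErrTimeout.
func matchesTimeout(err error) bool {
    return errors.Is(err, ErrTimeout)
}

func main() {
    flat := annotate(ErrTimeout)
    wrapped := wrap(ErrTimeout)

    fmt.Println(flat)    // query failed: timeout
    fmt.Println(wrapped) // query failed: timeout

    fmt.Println(matchesTimeout(flat))    // false: the chain was lost
    fmt.Println(matchesTimeout(wrapped)) // true: the chain was preserved
}
```

Using %v is still a legitimate choice when a package does not want callers to depend on an underlying error; %w makes that error part of the function's contract.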

Error inspection:

// Check if error is or wraps a specific error
if errors.Is(err, sql.ErrNoRows) {
    // Handle "no rows" error
}

// Extract specific error type from chain
var netErr *net.OpError
if errors.As(err, &netErr) {
    // Access network-specific error details
    fmt.Println("Operation:", netErr.Op)
    fmt.Println("Network:", netErr.Net)
}
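
For a version that can be run without a database or network, the following sketch applies the same two calls to a file system error. The loadConfig and describe functions and the file path are assumptions made for illustration; os.Open, fs.PathError, and fs.ErrNotExist come from the standard library.

```go
package main

import (
    "errors"
    "fmt"
    "io/fs"
    "os"
)

// loadConfig is a hypothetical helper that wraps the error from os.Open.
func loadConfig(path string) error {
    f, err := os.Open(path)
    if err != nil {
        return fmt.Errorf("load config: %w", err)
    }
    return f.Close()
}

// describe inspects err with errors.As (to reach the *fs.PathError in the
// chain) and errors.Is (to compare against the fs.ErrNotExist sentinel).
func describe(err error) string {
    if err == nil {
        return "no error"
    }
    var pathErr *fs.PathError
    if errors.As(err, &pathErr) {
        if errors.Is(err, fs.ErrNotExist) {
            return fmt.Sprintf("%s %q: file does not exist", pathErr.Op, pathErr.Path)
        }
        return fmt.Sprintf("%s %q: %v", pathErr.Op, pathErr.Path, pathErr.Err)
    }
    return "other error: " + err.Error()
}

func main() {
    err := loadConfig("/definitely/missing/config.yaml")
    fmt.Println(err)
    fmt.Println(describe(err))
}
```

Both checks succeed even though loadConfig added its own context, because %w left the original error in the chain.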

Practical Examples - From Basics to Production

Example 1: Basic Error Creation and Handling

Let's start with fundamental error handling patterns:

package main

import (
    "errors"
    "fmt"
    "strconv"
    "strings"
)

// Function that can fail in multiple ways
func validateAge(age string) (int, error) {
    // Trim whitespace first so that blank input is reported as empty
    age = strings.TrimSpace(age)

    if age == "" {
        return 0, errors.New("age cannot be empty")
    }

    ageInt, err := strconv.Atoi(age)
    if err != nil {
        return 0, fmt.Errorf("invalid age format '%s': %w", age, err)
    }

    if ageInt < 0 {
        return 0, fmt.Errorf("age cannot be negative: %d", ageInt)
    }

    if ageInt > 120 {
        return 0, fmt.Errorf("age %d seems unrealistic (must be 0-120)", ageInt)
    }

    return ageInt, nil
}

func registerUser(name, ageStr string) error {
    fmt.Printf("Registering user: %s\n", name)

    if name == "" {
        return errors.New("user registration failed: name cannot be empty")
    }

    // Handle validation error with context
    age, err := validateAge(ageStr)
    if err != nil {
        return fmt.Errorf("user registration failed for %s: %w", name, err)
    }

    fmt.Printf("Successfully registered user: %s, age %d\n", name, age)
    return nil
}

func main() {
    users := []struct {
        name string
        age  string
    }{
        {"Alice", "25"},
        {"Bob", "invalid"},
        {"", "30"},
        {"Charlie", "-5"},
        {"Diana", "150"},
        {"Eve", "  42  "}, // Test whitespace handling
    }

    fmt.Println("=== User Registration Demo ===")
    fmt.Println()

    for i, user := range users {
        fmt.Printf("--- Test %d ---\n", i+1)
        err := registerUser(user.name, user.age)
        if err != nil {
            fmt.Printf("Error: %v\n", err)
        } else {
            fmt.Printf("Success!\n")
        }
        fmt.Println()
    }
}

What this demonstrates:

Simple error creation with errors.New()
Error wrapping with fmt.Errorf() and %w verb
Context addition at each level
Error handling with if err != nil pattern
Multiple failure categories: validation errors, parsing errors, business logic errors
Error message formatting with relevant details

Key patterns established:

Check errors immediately after calls
Add relevant context at each level
Use %w to preserve error chains
Return early when errors occur
Include relevant data in error messages
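
One payoff of using %w in a function like validateAge is that callers can still tell why the parse failed. The sketch below is a trimmed-down, hypothetical stand-in (parseAge and explain are illustrative names) that inspects the strconv sentinel errors through the added context.

```go
package main

import (
    "errors"
    "fmt"
    "strconv"
)

// parseAge is a hypothetical, reduced version of validateAge that only
// parses the input and wraps any failure with context.
func parseAge(s string) (int, error) {
    n, err := strconv.Atoi(s)
    if err != nil {
        return 0, fmt.Errorf("invalid age %q: %w", s, err)
    }
    return n, nil
}

// explain classifies a failure by looking through the wrapped chain for
// the sentinel errors that strconv documents.
func explain(err error) string {
    switch {
    case err == nil:
        return "ok"
    case errors.Is(err, strconv.ErrSyntax):
        return "not a number"
    case errors.Is(err, strconv.ErrRange):
        return "number out of range"
    default:
        return "unknown failure"
    }
}

func main() {
    for _, in := range []string{"42", "abc", "99999999999999999999"} {
        _, err := parseAge(in)
        fmt.Printf("%q -> %s\n", in, explain(err))
    }
}
```

The caller never parses the error message; it relies only on the error values preserved in the chain.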

Example 2: Custom Error Types for Rich Context

Let's create domain-specific error types with additional behavior:

package main

import (
    "errors"
    "fmt"
    "time"
)

// Domain-specific error codes
type ErrorCode string

const (
    ErrCodeValidation       ErrorCode = "VALIDATION_ERROR"
    ErrCodeNetwork          ErrorCode = "NETWORK_ERROR"
    ErrCodeAuthentication   ErrorCode = "AUTHENTICATION_ERROR"
    ErrCodeAuthorization    ErrorCode = "AUTHORIZATION_ERROR"
    ErrCodeRateLimit        ErrorCode = "RATE_LIMIT_ERROR"
    ErrCodeResourceNotFound ErrorCode = "RESOURCE_NOT_FOUND"
    ErrCodeConflict         ErrorCode = "CONFLICT_ERROR"
    ErrCodeInternal         ErrorCode = "INTERNAL_ERROR"
)

// Custom error type with rich context
type ServiceError struct {
    Code       ErrorCode
    Message    string
    Timestamp  time.Time
    Retryable  bool
    StatusCode int // HTTP status code equivalent
    Details    map[string]interface{}
    Cause      error // Original error
}

func (e *ServiceError) Error() string {
    if e.Cause != nil {
        return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
    }
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

// Unwrap exposes the cause to errors.Is and errors.As
func (e *ServiceError) Unwrap() error {
    return e.Cause
}

// Check if error is retryable
func (e *ServiceError) IsRetryable() bool {
    return e.Retryable
}

// Get HTTP status code
func (e *ServiceError) HTTPStatus() int {
    return e.StatusCode
}

// Error constructors for different scenarios
func NewValidationError(field string, value interface{}, reason string) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeValidation,
        Message:    fmt.Sprintf("validation failed for field '%s': %s", field, reason),
        Timestamp:  time.Now(),
        Retryable:  false,
        StatusCode: 400,
        Details: map[string]interface{}{
            "field":  field,
            "value":  value,
            "reason": reason,
        },
    }
}

func NewAuthenticationError(username string, reason string) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeAuthentication,
        Message:    fmt.Sprintf("authentication failed for user '%s': %s", username, reason),
        Timestamp:  time.Now(),
        Retryable:  false,
        StatusCode: 401,
        Details: map[string]interface{}{
            "username": username,
            "reason":   reason,
        },
    }
}

func NewRateLimitError(resource string, limit int, retryAfter time.Duration) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeRateLimit,
        Message:    fmt.Sprintf("rate limit exceeded for %s", resource),
        Timestamp:  time.Now(),
        Retryable:  true,
        StatusCode: 429,
        Details: map[string]interface{}{
            "resource":    resource,
            "limit":       limit,
            "retry_after": retryAfter.String(),
        },
    }
}

func NewNotFoundError(resourceType string, identifier string) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeResourceNotFound,
        Message:    fmt.Sprintf("%s not found: %s", resourceType, identifier),
        Timestamp:  time.Now(),
        Retryable:  false,
        StatusCode: 404,
        Details: map[string]interface{}{
            "resource_type": resourceType,
            "identifier":    identifier,
        },
    }
}

// Example service using custom errors
type UserService struct {
    users      map[string]string // username -> password (simplified)
    rateLimits map[string]int    // username -> attempt count
}

func NewUserService() *UserService {
    return &UserService{
        users: map[string]string{
            "alice": "password123",
            "bob":   "secure456",
        },
        rateLimits: make(map[string]int),
    }
}

func (us *UserService) Login(username, password string) error {
    // Validate input
    if username == "" {
        return NewValidationError("username", username, "cannot be empty")
    }

    if password == "" {
        return NewValidationError("password", "***", "cannot be empty")
    }

    if len(password) < 8 {
        return NewValidationError("password", "***", "must be at least 8 characters")
    }

    // Check rate limiting
    attempts := us.rateLimits[username]
    if attempts >= 3 {
        return NewRateLimitError("login", 3, time.Minute*5)
    }

    // Check if user exists
    storedPassword, exists := us.users[username]
    if !exists {
        us.rateLimits[username]++
        return NewNotFoundError("user", username)
    }

    // Verify password
    if password != storedPassword {
        us.rateLimits[username]++
        return NewAuthenticationError(username, "invalid credentials")
    }

    // Reset rate limit on successful login
    delete(us.rateLimits, username)

    fmt.Printf("Login successful for user: %s\n", username)
    return nil
}

// Error handler that uses error type information
func handleServiceError(err error) {
    var serviceErr *ServiceError
    if errors.As(err, &serviceErr) {
        fmt.Printf("\n=== Service Error Details ===\n")
        fmt.Printf("Code: %s\n", serviceErr.Code)
        fmt.Printf("Message: %s\n", serviceErr.Message)
        fmt.Printf("HTTP Status: %d\n", serviceErr.StatusCode)
        fmt.Printf("Retryable: %v\n", serviceErr.Retryable)
        fmt.Printf("Timestamp: %v\n", serviceErr.Timestamp.Format(time.RFC3339))

        if len(serviceErr.Details) > 0 {
            fmt.Printf("Details:\n")
            for key, value := range serviceErr.Details {
                fmt.Printf("  %s: %v\n", key, value)
            }
        }

        // Provide actionable suggestions
        switch serviceErr.Code {
        case ErrCodeRateLimit:
            if retryAfter, ok := serviceErr.Details["retry_after"]; ok {
                fmt.Printf("\nSuggestion: Retry after %v\n", retryAfter)
            }
        case ErrCodeAuthentication:
            fmt.Printf("\nSuggestion: Check credentials and try again\n")
        case ErrCodeValidation:
            fmt.Printf("\nSuggestion: Fix validation errors and resubmit\n")
        case ErrCodeResourceNotFound:
            fmt.Printf("\nSuggestion: Verify the resource identifier\n")
        }
    } else {
        fmt.Printf("Generic error: %v\n", err)
    }
}

func main() {
    service := NewUserService()

    fmt.Println("=== Custom Error Types Demo ===")
    fmt.Println()

    // The wrong passwords are at least 8 characters long so that they pass
    // validation and reach the credential check (and the attempt counter).
    testCases := []struct {
        name     string
        username string
        password string
        desc     string
    }{
        {"Empty username", "", "password", "Validation error"},
        {"Short password", "alice", "short", "Validation error"},
        {"User not found", "charlie", "password123", "Not found error"},
        {"Wrong password 1", "alice", "wrongpass1", "Authentication error (attempt 1)"},
        {"Wrong password 2", "alice", "wrongpass2", "Authentication error (attempt 2)"},
        {"Wrong password 3", "alice", "wrongpass3", "Authentication error (attempt 3)"},
        {"Rate limited", "alice", "password123", "Rate limit error"},
        {"Valid login", "bob", "secure456", "Successful login"},
    }

    for i, tc := range testCases {
        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
        err := service.Login(tc.username, tc.password)

        if err != nil {
            handleServiceError(err)
        } else {
            fmt.Println("Success!")
        }

        fmt.Println()
        time.Sleep(time.Millisecond * 100)
    }
}

What this demonstrates:

Custom error types with rich context and behavior
Error constructors for consistent error creation
Domain-specific error codes for structured error handling
Error methods for accessing error properties (IsRetryable, HTTPStatus)
Contextual information including timestamps, details maps, and causes
Type matching through the error chain with errors.As() for specialized error handling
Actionable error messages with suggestions for resolution

Production-ready patterns:

Structured error information for debugging
Retryability flags that callers can use to drive retry logic
Rate limiting information in errors
Timestamps for error correlation
Business context in errors
HTTP status code mapping for web services
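
As a sketch of how a caller might act on the retryability information, the code below depends on a small behavior interface rather than on a concrete error type. The retryable interface, apiError, wrapLogin, and shouldRetry are all hypothetical names; the ServiceError type above would satisfy the same interface through its IsRetryable method.

```go
package main

import (
    "errors"
    "fmt"
)

// retryable is a hypothetical behavior interface. Any error type with an
// IsRetryable method satisfies it implicitly.
type retryable interface {
    IsRetryable() bool
}

// apiError is a minimal stand-in for a richer type such as ServiceError.
type apiError struct {
    msg   string
    retry bool
}

func (e *apiError) Error() string     { return e.msg }
func (e *apiError) IsRetryable() bool { return e.retry }

// wrapLogin adds caller context the way an upper layer would.
func wrapLogin(err error) error {
    return fmt.Errorf("login: %w", err)
}

// shouldRetry reports whether the first error in the chain that
// implements retryable declares itself retryable.
func shouldRetry(err error) bool {
    var r retryable
    if errors.As(err, &r) {
        return r.IsRetryable()
    }
    // Assumption for this sketch: unknown errors are treated as permanent.
    return false
}

func main() {
    rateLimited := wrapLogin(&apiError{msg: "rate limit exceeded", retry: true})
    badInput := wrapLogin(&apiError{msg: "username cannot be empty"})

    fmt.Println(shouldRetry(rateLimited))        // true
    fmt.Println(shouldRetry(badInput))           // false
    fmt.Println(shouldRetry(errors.New("boom"))) // false
}
```

Matching on behavior keeps the calling code decoupled from the package that defines the error type, at the cost of a slightly less discoverable contract.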

Example 3: Error Wrapping and Inspection

Error wrapping preserves context while maintaining access to underlying errors:

package main

import (
    "errors"
    "fmt"
    "os"
)

// Sentinel errors that callers can match with errors.Is
var (
    ErrDatabase     = errors.New("database error")
    ErrNotFound     = errors.New("resource not found")
    ErrUnauthorized = errors.New("unauthorized access")
    ErrInvalidInput = errors.New("invalid input")
)

// Simulated database layer
type Database struct {
    data map[string]string
}

func NewDatabase() *Database {
    return &Database{
        data: map[string]string{
            "user:1": "Alice",
            "user:2": "Bob",
        },
    }
}

func (db *Database) Get(key string) (string, error) {
    value, exists := db.data[key]
    if !exists {
        return "", fmt.Errorf("key %s: %w", key, ErrNotFound)
    }
    return value, nil
}

// Repository layer (wraps database)
type UserRepository struct {
    db *Database
}

func NewUserRepository(db *Database) *UserRepository {
    return &UserRepository{db: db}
}

func (r *UserRepository) FindByID(id string) (string, error) {
    key := fmt.Sprintf("user:%s", id)
    name, err := r.db.Get(key)
    if err != nil {
        return "", fmt.Errorf("repository: failed to find user %s: %w", id, err)
    }
    return name, nil
}

// Service layer (wraps repository)
type UserService struct {
    repo *UserRepository
}

func NewUserService(repo *UserRepository) *UserService {
    return &UserService{repo: repo}
}

func (s *UserService) GetUser(id string) (string, error) {
    if id == "" {
        return "", fmt.Errorf("service: %w", ErrInvalidInput)
    }

    name, err := s.repo.FindByID(id)
    if err != nil {
        return "", fmt.Errorf("service: failed to get user: %w", err)
    }

    return name, nil
}

// Error inspection and handling
func handleError(err error) {
    fmt.Printf("\n=== Error Analysis ===\n")
    fmt.Printf("Full error message: %v\n\n", err)

    // Check for specific sentinel errors
    if errors.Is(err, ErrNotFound) {
        fmt.Println("✓ Error is or wraps ErrNotFound")
        fmt.Println("  Action: Could return 404 to client")
    }

    if errors.Is(err, ErrInvalidInput) {
        fmt.Println("✓ Error is or wraps ErrInvalidInput")
        fmt.Println("  Action: Could return 400 to client")
    }

    if errors.Is(err, ErrUnauthorized) {
        fmt.Println("✓ Error is or wraps ErrUnauthorized")
        fmt.Println("  Action: Could return 401 to client")
    }

    if errors.Is(err, os.ErrNotExist) {
        fmt.Println("✓ Error is or wraps os.ErrNotExist")
        fmt.Println("  Action: File system issue")
    }

    // Unwrap the error chain manually
    fmt.Println("\nError chain:")
    currentErr := err
    depth := 0
    for currentErr != nil {
        fmt.Printf("  %d: %v\n", depth, currentErr)
        currentErr = errors.Unwrap(currentErr)
        depth++
    }
}

func main() {
    db := NewDatabase()
    repo := NewUserRepository(db)
    service := NewUserService(repo)

    fmt.Println("=== Error Wrapping and Inspection Demo ===")

    // Test 1: Successful retrieval
    fmt.Println("\n--- Test 1: Successful retrieval ---")
    name, err := service.GetUser("1")
    if err != nil {
        handleError(err)
    } else {
        fmt.Printf("Success: Found user: %s\n", name)
    }

    // Test 2: User not found (wrapped through multiple layers)
    fmt.Println("\n--- Test 2: User not found ---")
    _, err = service.GetUser("999")
    if err != nil {
        handleError(err)
    }

    // Test 3: Invalid input
    fmt.Println("\n--- Test 3: Invalid input ---")
    _, err = service.GetUser("")
    if err != nil {
        handleError(err)
    }

    // Demonstrate error wrapping depth
    fmt.Println("\n--- Test 4: Multiple wrapping layers ---")

    // Create deeply wrapped error
    baseErr := errors.New("network timeout")
    layer1 := fmt.Errorf("connection failed: %w", baseErr)
    layer2 := fmt.Errorf("database query failed: %w", layer1)
    layer3 := fmt.Errorf("user lookup failed: %w", layer2)

    fmt.Println("Deeply wrapped error:")
    handleError(layer3)
}

What this demonstrates:

Error wrapping through multiple application layers
Context preservation from database → repository → service
Error inspection using errors.Is() for sentinel errors
Error unwrapping to traverse the error chain
Layered architecture with appropriate error handling at each level
Actionable error handling based on error type inspection

Key concepts:

Each layer adds its own context to errors
Original error remains accessible through the chain
errors.Is() works through wrapped errors
Error messages build a complete story of what failed
Different layers can make different decisions based on error types
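
At the outermost layer, the decisions hinted at in handleError usually collapse into a single mapping. The following sketch shows one way an HTTP handler layer might translate sentinel errors into status codes; statusFor and wrapLayers are hypothetical names, and the sentinels are redeclared here only to keep the example self-contained.

```go
package main

import (
    "errors"
    "fmt"
    "net/http"
)

// Sentinel errors, redeclared so this sketch compiles on its own.
var (
    ErrNotFound     = errors.New("resource not found")
    ErrUnauthorized = errors.New("unauthorized access")
    ErrInvalidInput = errors.New("invalid input")
)

// statusFor maps an error chain to an HTTP status code. errors.Is sees
// through every layer of %w wrapping, so the depth does not matter.
func statusFor(err error) int {
    switch {
    case err == nil:
        return http.StatusOK
    case errors.Is(err, ErrInvalidInput):
        return http.StatusBadRequest
    case errors.Is(err, ErrUnauthorized):
        return http.StatusUnauthorized
    case errors.Is(err, ErrNotFound):
        return http.StatusNotFound
    default:
        // Anything unrecognized is treated as a server-side failure.
        return http.StatusInternalServerError
    }
}

// wrapLayers imitates the repository and service layers adding context.
func wrapLayers(err error) error {
    repoErr := fmt.Errorf("repository: failed to find user: %w", err)
    return fmt.Errorf("service: failed to get user: %w", repoErr)
}

func main() {
    fmt.Println(statusFor(wrapLayers(ErrNotFound)))         // 404
    fmt.Println(statusFor(wrapLayers(ErrInvalidInput)))     // 400
    fmt.Println(statusFor(wrapLayers(errors.New("boom")))) // 500
    fmt.Println(statusFor(nil))                             // 200
}
```

Keeping the mapping in one function means the inner layers never need to know about HTTP at all.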

Example 4: Advanced Error Handling with Retry Logic

Let's implement production-ready error handling with retry patterns:

package main

import (
    "context"
    "errors"
    "fmt"
    "math/rand"
    "time"
)

// Error types for different failure scenarios

// RetryableError records which attempt produced the final failure
type RetryableError struct {
    Attempt   int
    Cause     error
    Timestamp time.Time
}

func (e *RetryableError) Error() string {
    return fmt.Sprintf("attempt %d failed at %v: %v",
        e.Attempt, e.Timestamp.Format("15:04:05"), e.Cause)
}

func (e *RetryableError) Unwrap() error {
    return e.Cause
}

type TemporaryError struct {
    Reason string
}

func (e *TemporaryError) Error() string {
    return fmt.Sprintf("temporary failure: %s", e.Reason)
}

func (e *TemporaryError) Temporary() bool {
    return true
}

type PermanentError struct {
    Reason string
}

func (e *PermanentError) Error() string {
    return fmt.Sprintf("permanent failure: %s", e.Reason)
}

// Retry configuration
type RetryConfig struct {
    MaxAttempts   int
    InitialDelay  time.Duration
    MaxDelay      time.Duration
    BackoffFactor float64
    Timeout       time.Duration
}

func DefaultRetryConfig() RetryConfig {
    return RetryConfig{
        MaxAttempts:   3,
        InitialDelay:  100 * time.Millisecond,
        MaxDelay:      10 * time.Second,
        BackoffFactor: 2.0,
        Timeout:       30 * time.Second,
    }
}

// Check if error is retryable
func isRetryable(err error) bool {
    // Permanent errors are checked first so that they are never retried,
    // even when wrapped in a RetryableError
    var permErr *PermanentError
    if errors.As(err, &permErr) {
        return false
    }

    // Errors that implement Temporary() decide for themselves
    // (this covers *TemporaryError)
    type temporary interface {
        Temporary() bool
    }

    var tempErr temporary
    if errors.As(err, &tempErr) {
        return tempErr.Temporary()
    }

    // Check for specific retryable error types
    var retryErr *RetryableError
    if errors.As(err, &retryErr) {
        return true
    }

    // Default: assume retryable
    return true
}

// Retry with exponential backoff
func RetryWithBackoff(ctx context.Context, operation func() error, config RetryConfig) error {
    var lastErr error
    delay := config.InitialDelay

    // Create timeout context
    timeoutCtx, cancel := context.WithTimeout(ctx, config.Timeout)
    defer cancel()

    for attempt := 1; attempt <= config.MaxAttempts; attempt++ {
        // Check context cancellation
        select {
        case <-timeoutCtx.Done():
            return fmt.Errorf("operation timeout after %d attempts: %w", attempt-1, timeoutCtx.Err())
        default:
        }

        // Execute operation
        lastErr = operation()

        if lastErr == nil {
            if attempt > 1 {
                fmt.Printf("✓ Operation succeeded on attempt %d\n", attempt)
            }
            return nil
        }

        // Check if error is retryable
        if !isRetryable(lastErr) {
            fmt.Printf("✗ Non-retryable error on attempt %d: %v\n", attempt, lastErr)
            // Stop retrying, but still record which attempt failed
            return &RetryableError{
                Attempt:   attempt,
                Cause:     lastErr,
                Timestamp: time.Now(),
            }
        }

        // Don't sleep after last attempt
        if attempt == config.MaxAttempts {
            break
        }

        // Log retry
        fmt.Printf("⚠ Attempt %d/%d failed, retrying in %v: %v\n",
            attempt, config.MaxAttempts, delay, lastErr)

        // Wait with exponential backoff
        select {
        case <-time.After(delay):
            // Continue to next attempt
        case <-timeoutCtx.Done():
            return fmt.Errorf("timeout during backoff: %w", timeoutCtx.Err())
        }

        // Calculate next delay
        delay = time.Duration(float64(delay) * config.BackoffFactor)
        if delay > config.MaxDelay {
            delay = config.MaxDelay
        }
    }

    return &RetryableError{
        Attempt:   config.MaxAttempts,
        Cause:     lastErr,
        Timestamp: time.Now(),
    }
}

// Simulated operations with different failure patterns
type Service struct {
    failureRate float64
}

func NewService(failureRate float64) *Service {
    return &Service{failureRate: failureRate}
}

func (s *Service) TemporaryFailure() error {
    if rand.Float64() < s.failureRate {
        return &TemporaryError{Reason: "network timeout"}
    }
    return nil
}

func (s *Service) PermanentFailure() error {
    if rand.Float64() < s.failureRate {
        return &PermanentError{Reason: "invalid API key"}
    }
    return nil
}

func (s *Service) RandomFailure() error {
    r := rand.Float64()
    if r < s.failureRate/2 {
        return &TemporaryError{Reason: "connection reset"}
    } else if r < s.failureRate {
        return errors.New("unknown error")
    }
    return nil
}

func main() {
    // Since Go 1.20 the global math/rand generator is seeded automatically,
    // so the deprecated rand.Seed call is not needed here.
    ctx := context.Background()

    fmt.Println("=== Advanced Error Handling with Retry ===")
    fmt.Println()

    // Example 1: Temporary failures with retry
    fmt.Println("--- Example 1: Temporary Failures (60% failure rate) ---")
    service1 := NewService(0.6)
    config := DefaultRetryConfig()

    err := RetryWithBackoff(ctx, service1.TemporaryFailure, config)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)
    } else {
        fmt.Println("Operation succeeded!")
    }

    time.Sleep(time.Second)

    // Example 2: Permanent failure (no retry)
    fmt.Println("\n--- Example 2: Permanent Failure ---")
    service2 := NewService(1.0) // Always fail

    err = RetryWithBackoff(ctx, service2.PermanentFailure, config)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)

        // Check error type
        var retryErr *RetryableError
        if errors.As(err, &retryErr) {
            fmt.Printf("Failed after %d attempts\n", retryErr.Attempt)
        }
    }

    time.Sleep(time.Second)

    // Example 3: Timeout scenario
    fmt.Println("\n--- Example 3: Operation Timeout ---")
    shortConfig := config
    shortConfig.Timeout = 500 * time.Millisecond
    shortConfig.InitialDelay = 200 * time.Millisecond

    service3 := NewService(1.0) // Always fail
    err = RetryWithBackoff(ctx, service3.TemporaryFailure, shortConfig)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)
    }

    time.Sleep(time.Second)

    // Example 4: Success after retries
    fmt.Println("\n--- Example 4: Success After Retries (30% failure rate) ---")
    service4 := NewService(0.3)

    err = RetryWithBackoff(ctx, service4.RandomFailure, config)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)
    } else {
        fmt.Println("Operation succeeded!")
    }

    // Example 5: Exponential backoff demonstration
    fmt.Println("\n--- Example 5: Exponential Backoff Calculation ---")
    fmt.Println("Demonstrating backoff timing:")

    delay := config.InitialDelay
    for i := 1; i <= 5; i++ {
        nextDelay := time.Duration(float64(delay) * config.BackoffFactor)
        if nextDelay > config.MaxDelay {
            nextDelay = config.MaxDelay
        }
        fmt.Printf("Attempt %d: delay = %v, next = %v\n", i, delay, nextDelay)
        delay = nextDelay
    }
}

What this demonstrates:

Sophisticated retry logic with exponential backoff
Error classification for retryable vs non-retryable errors
Context integration for timeouts and cancellation
Temporary error detection using custom interfaces
Backoff calculations to prevent overwhelming services
Detailed logging of retry attempts and outcomes

Production patterns:

Exponential backoff reduces the load that retries place on a struggling service
Context-aware operations respect timeouts
Error classification guides retry decisions
Comprehensive logging for debugging
Configurable retry parameters
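
When tuning these parameters it helps to see the whole wait schedule before running anything. The sketch below extracts the backoff arithmetic into a pure function; backoffSchedule is a hypothetical helper and is not part of the example above, although it follows the same multiply-then-cap rule.

```go
package main

import (
    "fmt"
    "time"
)

// backoffSchedule returns the waits between attempts. With n attempts
// there are n-1 waits, because nothing follows the final attempt.
func backoffSchedule(initial, maxDelay time.Duration, factor float64, attempts int) []time.Duration {
    var waits []time.Duration
    delay := initial
    for i := 1; i < attempts; i++ {
        waits = append(waits, delay)

        // Grow the delay geometrically, then cap it at maxDelay.
        delay = time.Duration(float64(delay) * factor)
        if delay > maxDelay {
            delay = maxDelay
        }
    }
    return waits
}

func main() {
    waits := backoffSchedule(100*time.Millisecond, 10*time.Second, 2.0, 10)

    var total time.Duration
    for i, w := range waits {
        total += w
        fmt.Printf("wait %d: %v (cumulative %v)\n", i+1, w, total)
    }
}
```

Comparing the cumulative wait against the overall timeout shows whether all configured attempts can actually happen. This sketch adds no random jitter, which production systems often include to avoid synchronized retries.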

Example 5: Production Error Handling System

Let's build a comprehensive error handling framework for production:

// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "math/rand"
    "sync"
    "time"
)

// Error severity levels
type Severity int

const (
    SeverityDebug Severity = iota
    SeverityInfo
    SeverityWarning
    SeverityError
    SeverityCritical
)

// String returns the severity name, or "UNKNOWN" for out-of-range values.
func (s Severity) String() string {
    names := []string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
    if s < 0 || int(s) >= len(names) {
        return "UNKNOWN"
    }
    return names[s]
}

// Structured error for production systems
type ProductionError struct {
    ID         string                 `json:"id"`
    Timestamp  time.Time              `json:"timestamp"`
    Severity   Severity               `json:"severity"`
    Service    string                 `json:"service"`
    Operation  string                 `json:"operation"`
    Message    string                 `json:"message"`
    Code       string                 `json:"code"`
    Context    map[string]interface{} `json:"context,omitempty"`
    Cause      error                  `json:"-"`
    UserID     string                 `json:"user_id,omitempty"`
    RequestID  string                 `json:"request_id,omitempty"`
    StackTrace []string               `json:"stack_trace,omitempty"`
}

func (e *ProductionError) Error() string {
    if e.Cause != nil {
        return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
    }
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

// Unwrap exposes the cause to errors.Is and errors.As.
func (e *ProductionError) Unwrap() error {
    return e.Cause
}

// Error tracking system
type ErrorTracker struct {
    errors  chan *ProductionError
    metrics map[string]int64
    mu      sync.RWMutex
}

func NewErrorTracker() *ErrorTracker {
    return &ErrorTracker{
        errors:  make(chan *ProductionError, 100),
        metrics: make(map[string]int64),
    }
}

// Track queues an error without blocking; if the queue is full the error is dropped.
func (et *ErrorTracker) Track(err *ProductionError) {
    select {
    case et.errors <- err:
    default:
        fmt.Printf("Warning: Error tracking queue full, dropping error: %v\n", err)
    }
}

// Start processes queued errors until the context is cancelled.
func (et *ErrorTracker) Start(ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()

    for {
        select {
        case <-ctx.Done():
            return
        case err := <-et.errors:
            et.processError(err)
        }
    }
}

func (et *ErrorTracker) processError(err *ProductionError) {
    // Update metrics
    et.mu.Lock()
    et.metrics[err.Code]++
    et.metrics["total"]++
    if err.Severity >= SeverityError {
        et.metrics["error_count"]++
    }
    et.mu.Unlock()

    // Log structured error
    et.logError(err)

    // Send critical alerts
    if err.Severity >= SeverityCritical {
        et.sendAlert(err)
    }
}

func (et *ErrorTracker) logError(err *ProductionError) {
    logEntry := map[string]interface{}{
        "timestamp": err.Timestamp.Format(time.RFC3339),
        "severity":  err.Severity.String(),
        "service":   err.Service,
        "operation": err.Operation,
        "message":   err.Message,
        "code":      err.Code,
        "error_id":  err.ID,
    }

    if len(err.Context) > 0 {
        logEntry["context"] = err.Context
    }

    if err.UserID != "" {
        logEntry["user_id"] = err.UserID
    }

    if err.RequestID != "" {
        logEntry["request_id"] = err.RequestID
    }

    // Context may hold values that cannot be encoded, so check the error.
    jsonData, marshalErr := json.Marshal(logEntry)
    if marshalErr != nil {
        fmt.Printf("LOG: failed to encode error %s: %v\n", err.ID, marshalErr)
        return
    }
    fmt.Printf("LOG: %s\n", jsonData)
}

func (et *ErrorTracker) sendAlert(err *ProductionError) {
    fmt.Printf("ALERT: Critical error %s - %s\n", err.ID, err.Message)
    // In production: send to PagerDuty, Slack, email, etc.
}

// GetMetrics returns a copy of the counters so callers cannot mutate internal state.
func (et *ErrorTracker) GetMetrics() map[string]int64 {
    et.mu.RLock()
    defer et.mu.RUnlock()

    metrics := make(map[string]int64)
    for k, v := range et.metrics {
        metrics[k] = v
    }
    return metrics
}

// Service with integrated error handling
type PaymentService struct {
    tracker *ErrorTracker
    mu      sync.Mutex
}

func NewPaymentService(tracker *ErrorTracker) *PaymentService {
    return &PaymentService{
        tracker: tracker,
    }
}

func (ps *PaymentService) ProcessPayment(userID string, amount float64, requestID string) error {
    // Input validation
    if amount <= 0 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "invalid payment amount",
            Code:      "INVALID_AMOUNT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount": amount,
            },
        }
        ps.tracker.Track(err)
        return err
    }

    if amount > 10000 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityWarning,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "large payment requires additional verification",
            Code:      "LARGE_PAYMENT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount":    amount,
                "threshold": 10000,
            },
        }
        ps.tracker.Track(err)
        return err
    }

    // Simulate processing (20% failure rate)
    if rand.Intn(10) < 2 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "payment gateway timeout",
            Code:      "GATEWAY_TIMEOUT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount":  amount,
                "gateway": "stripe",
            },
        }
        ps.tracker.Track(err)
        return err
    }

    fmt.Printf("✓ Payment processed: user=%s, amount=%.2f, request=%s\n",
        userID, amount, requestID)
    return nil
}

func generateRequestID() string {
    return fmt.Sprintf("req_%d", time.Now().UnixNano())
}

func main() {
    // Seeding is only needed before Go 1.20; newer versions seed the
    // global source automatically.
    rand.Seed(time.Now().UnixNano())

    fmt.Println("=== Production Error Handling System ===")
    fmt.Println()

    tracker := NewErrorTracker()
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    var wg sync.WaitGroup

    // Start error tracker
    wg.Add(1)
    go tracker.Start(ctx, &wg)

    // Give tracker time to start
    time.Sleep(time.Millisecond * 100)

    paymentService := NewPaymentService(tracker)

    // Test scenarios
    testCases := []struct {
        userID string
        amount float64
        desc   string
    }{
        {"user1", 100.0, "Valid payment"},
        {"user2", -50.0, "Invalid amount (negative)"},
        {"user3", 0.0, "Invalid amount (zero)"},
        {"user4", 15000.0, "Large payment (requires verification)"},
        {"user5", 200.0, "Valid payment (may fail randomly)"},
        {"user6", 300.0, "Valid payment (may fail randomly)"},
        {"user7", 150.0, "Valid payment (may fail randomly)"},
    }

    fmt.Println("Processing payments...")
    fmt.Println()

    for i, tc := range testCases {
        requestID := generateRequestID()
        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
        fmt.Printf("User: %s, Amount: $%.2f\n", tc.userID, tc.amount)

        err := paymentService.ProcessPayment(tc.userID, tc.amount, requestID)
        if err != nil {
            fmt.Printf("Error: %v\n", err)
        }

        fmt.Println()
        time.Sleep(time.Millisecond * 100)
    }

    // Let error tracker process remaining errors
    time.Sleep(time.Second)

    // Show error metrics
    fmt.Println("=== Error Metrics ===")
    metrics := tracker.GetMetrics()

    jsonData, err := json.MarshalIndent(metrics, "", "  ")
    if err != nil {
        fmt.Printf("Failed to encode metrics: %v\n", err)
    } else {
        fmt.Printf("%s\n", jsonData)
    }

    // Shutdown
    cancel()

    // Wait for error tracker to stop
    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        fmt.Println("\nError tracker stopped gracefully")
    case <-time.After(2 * time.Second):
        fmt.Println("\nError tracker shutdown timed out")
    }
}

What this demonstrates:

Comprehensive error tracking with structured data
Multi-level severity handling for appropriate alerting
Metrics collection for error analysis and monitoring
Context preservation across service boundaries
Production-ready logging with structured JSON output
Graceful shutdown of the error tracking system
Concurrent error processing with channels

Enterprise patterns implemented:

Structured error IDs for correlation
Severity-based routing and alerting
Real-time metrics collection
Context-rich error information
Background error processing
Integration hooks for external systems
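
Severity-based routing depends on recovering the structured error even after other layers have wrapped it. The following is a minimal sketch of that step, assuming a reduced stand-in type: `trackedError` and `route` are hypothetical names for this illustration and are not part of the example above.

```go
package main

import (
	"errors"
	"fmt"
)

// Severity is a trimmed version of the levels used in the example.
type Severity int

const (
	SeverityWarning Severity = iota
	SeverityError
	SeverityCritical
)

// trackedError is a hypothetical, reduced stand-in for ProductionError.
type trackedError struct {
	Code     string
	Severity Severity
	Cause    error
}

func (e *trackedError) Error() string {
	if e.Cause != nil {
		return fmt.Sprintf("[%s] %v", e.Code, e.Cause)
	}
	return "[" + e.Code + "]"
}

// Unwrap keeps the cause reachable for errors.Is and errors.As.
func (e *trackedError) Unwrap() error { return e.Cause }

// route decides what to do with an error that may wrap a trackedError.
func route(err error) string {
	var te *trackedError
	if !errors.As(err, &te) {
		return "log" // unstructured error: log it and move on
	}
	if te.Severity >= SeverityCritical {
		return "alert"
	}
	return "metric:" + te.Code
}

func main() {
	base := &trackedError{
		Code:     "GATEWAY_TIMEOUT",
		Severity: SeverityError,
		Cause:    errors.New("deadline exceeded"),
	}
	wrapped := fmt.Errorf("checkout: %w", base)

	fmt.Println(route(wrapped))                                                  // metric:GATEWAY_TIMEOUT
	fmt.Println(route(&trackedError{Code: "DB_DOWN", Severity: SeverityCritical})) // alert
	fmt.Println(route(errors.New("plain failure")))                              // log
}
```

The design choice here is that callers never type-assert directly; `errors.As` walks the chain, so adding context with `%w` in intermediate layers does not break routing.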

Common Pitfalls and How to Avoid Them

Pitfall 1: Silent Failures

The most dangerous error handling mistake:

// run
package main

import (
    "fmt"
    "os"
)

// ❌ WRONG: Silent failures that hide problems
func badFileRead(filename string) string {
    data, _ := os.ReadFile(filename) // Error ignored!
    return string(data)
}

// ✅ CORRECT: Proper error handling
func goodFileRead(filename string) (string, error) {
    data, err := os.ReadFile(filename)
    if err != nil {
        return "", fmt.Errorf("failed to read file %s: %w", filename, err)
    }
    return string(data), nil
}

func main() {
    fmt.Println("=== Silent Failure Pitfall ===")
    fmt.Println()

    // Bad example
    fmt.Println("--- Bad Example (error ignored) ---")
    result1 := badFileRead("nonexistent.txt")
    fmt.Printf("Result: '%s' (empty because error was ignored)\n", result1)

    // Good example
    fmt.Println("\n--- Good Example (error handled) ---")
    result2, err := goodFileRead("nonexistent.txt")
    if err != nil {
        fmt.Printf("Error: %v\n", err)
    } else {
        fmt.Printf("Result: %s\n", result2)
    }
}
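
Handling an error does not always mean returning it; what matters is that the decision is deliberate and visible. As a rough sketch, a caller might treat a missing file as an expected case with a default while still reporting every other failure. The function `readConfig` and the file names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// readConfig returns the file contents, or fallback when the file does not
// exist. Any other failure (permissions, I/O errors) is returned to the caller.
func readConfig(path, fallback string) (string, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return fallback, nil // expected case, handled explicitly
	}
	if err != nil {
		return "", fmt.Errorf("read config %s: %w", path, err)
	}
	return string(data), nil
}

func main() {
	cfg, err := readConfig("definitely-missing.conf", "port=8080")
	fmt.Println(cfg, err)
}
```

Unlike `badFileRead`, this version ignores nothing: the one error it tolerates is named in the code, and all others still reach the caller with context.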

Pitfall 2: Losing Context

// run
package main

import (
    "errors"
    "fmt"
)

// ❌ WRONG: Context lost in error chain
func badProcessing(input string) error {
    if input == "" {
        return errors.New("invalid input") // What input? Where?
    }

    if len(input) > 1000 {
        return errors.New("input too long") // How long?
    }

    return nil
}

// ✅ CORRECT: Preserve context throughout error chain
func goodProcessing(operation string, input string) error {
    if input == "" {
        return fmt.Errorf("%s failed: input cannot be empty", operation)
    }

    if len(input) > 1000 {
        return fmt.Errorf("%s failed: input too long (%d chars, max 1000)",
            operation, len(input))
    }

    return nil
}

func main() {
    fmt.Println("=== Context Loss Pitfall ===")
    fmt.Println()

    // Bad example
    fmt.Println("--- Bad Example (no context) ---")
    err1 := badProcessing("")
    fmt.Printf("Error: %v (no context about what or where)\n", err1)

    // Good example
    fmt.Println("\n--- Good Example (with context) ---")
    err2 := goodProcessing("user validation", "")
    fmt.Printf("Error: %v (clear context)\n", err2)

    err3 := goodProcessing("data import", string(make([]byte, 2000)))
    fmt.Printf("Error: %v (includes details)\n", err3)
}
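
Context can also be lost in the opposite direction: a layer adds a helpful message but formats the underlying error with `%v`, which discards its identity. The sketch below contrasts `%v` with `%w`; the sentinel `ErrEmptyInput` and the helper names are assumptions made for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmptyInput is a hypothetical sentinel used only in this sketch.
var ErrEmptyInput = errors.New("input cannot be empty")

func validate(input string) error {
	if input == "" {
		return ErrEmptyInput
	}
	return nil
}

// flattened adds context with %v, which turns the cause into plain text.
func flattened(input string) error {
	if err := validate(input); err != nil {
		return fmt.Errorf("user validation failed: %v", err)
	}
	return nil
}

// wrapped adds the same context with %w, keeping the cause in the chain.
func wrapped(input string) error {
	if err := validate(input); err != nil {
		return fmt.Errorf("user validation failed: %w", err)
	}
	return nil
}

// isEmptyInput reports whether err has ErrEmptyInput anywhere in its chain.
func isEmptyInput(err error) bool {
	return errors.Is(err, ErrEmptyInput)
}

func main() {
	// Both messages read the same, but only the wrapped one can be inspected.
	fmt.Println(flattened(""), "| matches sentinel:", isEmptyInput(flattened("")))
	fmt.Println(wrapped(""), "| matches sentinel:", isEmptyInput(wrapped("")))
}
```

The printed messages are identical; the difference only shows up when a caller needs to branch on the cause.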

Pitfall 3: Panic for Recoverable Errors

// run
package main

import "fmt"

// ❌ WRONG: Panic for recoverable errors
func badDivision(a, b float64) float64 {
    if b == 0 {
        panic("division by zero") // Should return error!
    }
    return a / b
}

// ✅ CORRECT: Return errors for recoverable conditions
func goodDivision(a, b float64) (float64, error) {
    if b == 0 {
        return 0, fmt.Errorf("division by zero: %.2f / %.2f", a, b)
    }
    return a / b, nil
}

func main() {
    fmt.Println("=== Panic vs Error Pitfall ===")
    fmt.Println()

    // Bad example (wrapped in recover to prevent crash)
    fmt.Println("--- Bad Example (panics) ---")
    func() {
        defer func() {
            if r := recover(); r != nil {
                fmt.Printf("Recovered from panic: %v\n", r)
                fmt.Println("(This crashes the program without recover)")
            }
        }()
        result := badDivision(10, 0)
        fmt.Printf("Result: %.2f\n", result)
    }()

    // Good example
    fmt.Println("\n--- Good Example (returns error) ---")
    result, err := goodDivision(10, 0)
    if err != nil {
        fmt.Printf("Error: %v (handled gracefully)\n", err)
    } else {
        fmt.Printf("Result: %.2f\n", result)
    }

    // Successful operation
    result, err = goodDivision(10, 2)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
    } else {
        fmt.Printf("Success: 10 / 2 = %.2f\n", result)
    }
}
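
Panics still occur for genuine programming bugs, such as indexing past the end of a slice. One common arrangement, sketched here under the assumption of a hypothetical helper named `safeCall`, is to recover at a boundary and convert the panic into an ordinary error so the rest of the program keeps using error values.

```go
package main

import "fmt"

// safeCall runs fn and converts a panic into an error. The named return
// value err lets the deferred function set the result after a panic.
func safeCall(fn func() int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn(), nil
}

func main() {
	_, err := safeCall(func() int {
		var items []int
		return items[3] // out-of-range index: a bug, so the runtime panics
	})
	fmt.Println("error:", err)

	n, err := safeCall(func() int { return 42 })
	fmt.Println(n, err) // 42 <nil>
}
```

This is a containment measure for boundaries such as request handlers or worker goroutines, not a substitute for returning errors from functions like `goodDivision`.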

Practice Exercises

Exercise 1: Basic Error Handling

Learning Objectives: Master fundamental error handling patterns including error creation, checking, and context addition.

Difficulty: Beginner

Real-World Context: Input validation is critical in all applications. This exercise teaches you to provide clear, actionable error messages that help users correct their input.

Task: Create a user registration function that validates:

Username (not empty, 3-20 characters, alphanumeric)
Email (not empty, contains @ and .)
Age (not empty, valid number, 18-100)
Return descriptive errors for each validation failure

Requirements:

Create separate validation functions for each field
Use error wrapping to preserve context
Provide clear error messages
Test with valid and invalid inputs

Solution:

// run
package main

import (
    "errors"
    "fmt"
    "regexp"
    "strconv"
    "strings"
)

var (
    ErrInvalidUsername = errors.New("invalid username")
    ErrInvalidEmail    = errors.New("invalid email")
    ErrInvalidAge      = errors.New("invalid age")
)

// usernamePattern is compiled once at startup, so there is no runtime
// compile error to ignore on each call.
var usernamePattern = regexp.MustCompile("^[a-zA-Z0-9]+$")

func validateUsername(username string) error {
    if username == "" {
        return fmt.Errorf("%w: cannot be empty", ErrInvalidUsername)
    }

    if len(username) < 3 || len(username) > 20 {
        return fmt.Errorf("%w: must be 3-20 characters (got %d)",
            ErrInvalidUsername, len(username))
    }

    if !usernamePattern.MatchString(username) {
        return fmt.Errorf("%w: must be alphanumeric only", ErrInvalidUsername)
    }

    return nil
}

func validateEmail(email string) error {
    if email == "" {
        return fmt.Errorf("%w: cannot be empty", ErrInvalidEmail)
    }

    if !strings.Contains(email, "@") || !strings.Contains(email, ".") {
        return fmt.Errorf("%w: must contain @ and .", ErrInvalidEmail)
    }

    parts := strings.Split(email, "@")
    if len(parts) != 2 {
        return fmt.Errorf("%w: invalid format", ErrInvalidEmail)
    }

    if len(parts[0]) == 0 || len(parts[1]) == 0 {
        return fmt.Errorf("%w: missing local or domain part", ErrInvalidEmail)
    }

    return nil
}

func validateAge(ageStr string) (int, error) {
    if ageStr == "" {
        return 0, fmt.Errorf("%w: cannot be empty", ErrInvalidAge)
    }

    age, err := strconv.Atoi(strings.TrimSpace(ageStr))
    if err != nil {
        return 0, fmt.Errorf("%w: must be a valid number: %v", ErrInvalidAge, err)
    }

    if age < 18 {
        return 0, fmt.Errorf("%w: must be at least 18 (got %d)", ErrInvalidAge, age)
    }

    if age > 100 {
        return 0, fmt.Errorf("%w: must be at most 100 (got %d)", ErrInvalidAge, age)
    }

    return age, nil
}

func registerUser(username, email, ageStr string) error {
    // Validate username
    if err := validateUsername(username); err != nil {
        return fmt.Errorf("registration failed: %w", err)
    }

    // Validate email
    if err := validateEmail(email); err != nil {
        return fmt.Errorf("registration failed: %w", err)
    }

    // Validate age
    age, err := validateAge(ageStr)
    if err != nil {
        return fmt.Errorf("registration failed: %w", err)
    }

    fmt.Printf("✓ Successfully registered: %s (%s), age %d\n", username, email, age)
    return nil
}

func main() {
    fmt.Println("=== User Registration Validation ===")
    fmt.Println()

    testCases := []struct {
        username string
        email    string
        age      string
        desc     string
    }{
        {"alice", "alice@example.com", "25", "Valid user"},
        {"", "bob@example.com", "30", "Empty username"},
        {"ab", "charlie@example.com", "22", "Username too short"},
        {"verylongusernamethatexceedslimit", "diana@example.com", "28", "Username too long"},
        {"user@123", "eve@example.com", "35", "Username with special chars"},
        {"frank", "", "40", "Empty email"},
        {"grace", "invalidemail", "45", "Invalid email format"},
        {"henry", "henry@example.com", "", "Empty age"},
        {"iris", "iris@example.com", "invalid", "Invalid age format"},
        {"jack", "jack@example.com", "15", "Age too young"},
        {"kate", "kate@example.com", "150", "Age too old"},
        {"leo", "leo@example.com", "30", "Valid user"},
    }

    for i, tc := range testCases {
        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
        fmt.Printf("Input: username=%s, email=%s, age=%s\n", tc.username, tc.email, tc.age)

        err := registerUser(tc.username, tc.email, tc.age)
        if err != nil {
            fmt.Printf("✗ Error: %v\n", err)

            // Check error types
            if errors.Is(err, ErrInvalidUsername) {
                fmt.Println("  Type: Username validation error")
            } else if errors.Is(err, ErrInvalidEmail) {
                fmt.Println("  Type: Email validation error")
            } else if errors.Is(err, ErrInvalidAge) {
                fmt.Println("  Type: Age validation error")
            }
        }

        fmt.Println()
    }
}

Key Concepts:

Sentinel errors for error type checking
Error wrapping with %w
Clear, actionable error messages
Error inspection with errors.Is()
Input validation patterns
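
The solution stops at the first failing field. A possible variation, sketched here on the assumption that Go 1.20 or later is available for `errors.Join`, reports every failing field in one error while keeping each sentinel inspectable. `validateAll` and `failedFields` are hypothetical names, and the checks are simplified versions of the ones above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var (
	ErrInvalidUsername = errors.New("invalid username")
	ErrInvalidEmail    = errors.New("invalid email")
)

// validateAll reports every failing field at once instead of stopping
// at the first one. The rules are deliberately reduced for this sketch.
func validateAll(username, email string) error {
	var errs []error
	if len(username) < 3 {
		errs = append(errs, fmt.Errorf("%w: must be at least 3 characters (got %d)",
			ErrInvalidUsername, len(username)))
	}
	if !strings.Contains(email, "@") {
		errs = append(errs, fmt.Errorf("%w: must contain @", ErrInvalidEmail))
	}
	return errors.Join(errs...) // nil when errs is empty
}

// failedFields lists which sentinel errors are present in err.
func failedFields(err error) []string {
	var fields []string
	if errors.Is(err, ErrInvalidUsername) {
		fields = append(fields, "username")
	}
	if errors.Is(err, ErrInvalidEmail) {
		fields = append(fields, "email")
	}
	return fields
}

func main() {
	err := validateAll("ab", "not-an-email")
	fmt.Println(err)
	fmt.Println("failed fields:", failedFields(err))

	fmt.Println(validateAll("alice", "alice@example.com") == nil) // true
}
```

Whether to fail fast or collect everything is a design choice: collecting is friendlier for form input, while failing fast is simpler when later checks depend on earlier ones.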

Exercise 2: Custom Error Types

Learning Objectives: Create custom error types with rich context and implement error type assertions.

Difficulty: Intermediate

Real-World Context: APIs need to return structured errors with HTTP status codes, error codes, and detailed information. This exercise demonstrates building production-ready error types.

Task: Build an API error system that:

Defines custom error types for different HTTP status codes
Includes error codes, messages, and status codes
Implements retryability checking
Provides structured error information

Requirements:

Create at least 4 different error types (400, 401, 404, 500)
Include methods for accessing error properties
Implement error wrapping
Add detailed context to errors

Solution:

// run
package main

import (
    "errors"
    "fmt"
    "time"
)

// Base API error type
type APIError struct {
    StatusCode int
    Code       string
    Message    string
    Retryable  bool
    Timestamp  time.Time
    Details    map[string]interface{}
}

func (e *APIError) Error() string {
    return fmt.Sprintf("[HTTP %d] %s: %s", e.StatusCode, e.Code, e.Message)
}

// Specific error types
type BadRequestError struct {
    *APIError
}

func NewBadRequestError(code, message string, details map[string]interface{}) *BadRequestError {
    return &BadRequestError{
        APIError: &APIError{
            StatusCode: 400,
            Code:       code,
            Message:    message,
            Retryable:  false,
            Timestamp:  time.Now(),
            Details:    details,
        },
    }
}

type UnauthorizedError struct {
    *APIError
}

func NewUnauthorizedError(message string) *UnauthorizedError {
    return &UnauthorizedError{
        APIError: &APIError{
            StatusCode: 401,
            Code:       "UNAUTHORIZED",
            Message:    message,
            Retryable:  false,
            Timestamp:  time.Now(),
        },
    }
}

type NotFoundError struct {
    *APIError
    ResourceType string
    ResourceID   string
}

func NewNotFoundError(resourceType, resourceID string) *NotFoundError {
    return &NotFoundError{
        APIError: &APIError{
            StatusCode: 404,
            Code:       "NOT_FOUND",
            Message:    fmt.Sprintf("%s not found: %s", resourceType, resourceID),
            Retryable:  false,
            Timestamp:  time.Now(),
            Details: map[string]interface{}{
                "resource_type": resourceType,
                "resource_id":   resourceID,
            },
        },
        ResourceType: resourceType,
        ResourceID:   resourceID,
    }
}

type InternalServerError struct {
    *APIError
    Cause error
}

func NewInternalServerError(message string, cause error) *InternalServerError {
    return &InternalServerError{
        APIError: &APIError{
            StatusCode: 500,
            Code:       "INTERNAL_ERROR",
            Message:    message,
            Retryable:  true,
            Timestamp:  time.Now(),
        },
        Cause: cause,
    }
}

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *InternalServerError) Unwrap() error {
    return e.Cause
}

// API service
type UserAPI struct {
    users map[string]string
}

func NewUserAPI() *UserAPI {
    return &UserAPI{
        users: map[string]string{
            "1": "Alice",
            "2": "Bob",
        },
    }
}

func (api *UserAPI) GetUser(id, token string) (string, error) {
    // Check authentication
    if token == "" {
        return "", NewUnauthorizedError("missing authentication token")
    }

    if token != "valid-token" {
        return "", NewUnauthorizedError("invalid authentication token")
    }

    // Validate input
    if id == "" {
        return "", NewBadRequestError(
            "INVALID_ID",
            "user ID cannot be empty",
            map[string]interface{}{
                "field": "id",
                "value": id,
            },
        )
    }

    // Simulate internal error. This check must come before the lookup:
    // ID 999 is not in the map, so it would otherwise be reported as 404.
    if id == "999" {
        return "", NewInternalServerError(
            "database connection failed",
            errors.New("connection timeout"),
        )
    }

    // Check if user exists
    name, exists := api.users[id]
    if !exists {
        return "", NewNotFoundError("User", id)
    }

    return name, nil
}

func handleAPIError(err error) {
    fmt.Println("\n=== Error Details ===")

    // Try specific error types
    var badReq *BadRequestError
    var unauth *UnauthorizedError
    var notFound *NotFoundError
    var internal *InternalServerError

    switch {
    case errors.As(err, &badReq):
        fmt.Println("Type: Bad Request")
        fmt.Printf("Status: %d\n", badReq.StatusCode)
        fmt.Printf("Code: %s\n", badReq.Code)
        fmt.Printf("Message: %s\n", badReq.Message)
        fmt.Printf("Retryable: %v\n", badReq.Retryable)
        if len(badReq.Details) > 0 {
            fmt.Printf("Details: %v\n", badReq.Details)
        }

    case errors.As(err, &unauth):
        fmt.Println("Type: Unauthorized")
        fmt.Printf("Status: %d\n", unauth.StatusCode)
        fmt.Printf("Message: %s\n", unauth.Message)
        fmt.Println("Action: Provide valid authentication")

    case errors.As(err, &notFound):
        fmt.Println("Type: Not Found")
        fmt.Printf("Status: %d\n", notFound.StatusCode)
        fmt.Printf("Resource: %s (ID: %s)\n", notFound.ResourceType, notFound.ResourceID)
        fmt.Println("Action: Check resource identifier")

    case errors.As(err, &internal):
        fmt.Println("Type: Internal Server Error")
        fmt.Printf("Status: %d\n", internal.StatusCode)
        fmt.Printf("Message: %s\n", internal.Message)
        fmt.Printf("Retryable: %v\n", internal.Retryable)
        if internal.Cause != nil {
            fmt.Printf("Cause: %v\n", internal.Cause)
        }
        fmt.Println("Action: Retry operation")

    default:
        fmt.Printf("Unknown error: %v\n", err)
    }
}

func main() {
    fmt.Println("=== API Error Types Demo ===")

    api := NewUserAPI()

    testCases := []struct {
        id    string
        token string
        desc  string
    }{
        {"1", "valid-token", "Valid request"},
        {"1", "", "Missing token"},
        {"1", "invalid", "Invalid token"},
        {"", "valid-token", "Empty ID"},
        {"999", "valid-token", "Internal error"},
        {"nonexistent", "valid-token", "User not found"},
    }

    for i, tc := range testCases {
        fmt.Printf("\n--- Test %d: %s ---\n", i+1, tc.desc)
        fmt.Printf("Request: GET /users/%s (token: %s)\n", tc.id, tc.token)

        name, err := api.GetUser(tc.id, tc.token)
        if err != nil {
            handleAPIError(err)
        } else {
            fmt.Printf("\n✓ Success: User found: %s\n", name)
        }
    }
}

Key Concepts:

Custom error types with embedded base type
HTTP status code mapping
Error type assertions with errors.As()
Retryability flags for error handling
Structured error details
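
One consequence of embedding is worth seeing in isolation: embedding is not inheritance, so `errors.As` with a `*APIError` target does not match a `*NotFoundError`. A small interface can recover the shared data instead. In this sketch the `HTTPStatus` method, the `statusCoder` interface, and `statusOf` are hypothetical additions, and the types are reduced copies of the ones in the solution.

```go
package main

import (
	"errors"
	"fmt"
)

// APIError is a reduced copy of the base type from the solution.
type APIError struct {
	StatusCode int
	Message    string
}

func (e *APIError) Error() string {
	return fmt.Sprintf("[HTTP %d] %s", e.StatusCode, e.Message)
}

// HTTPStatus is a hypothetical accessor added for this sketch. It is
// promoted to every type that embeds *APIError.
func (e *APIError) HTTPStatus() int { return e.StatusCode }

type NotFoundError struct {
	*APIError
	ResourceID string
}

// statusCoder is a hypothetical interface satisfied by all embedding types.
type statusCoder interface {
	HTTPStatus() int
}

// statusOf returns the HTTP status carried anywhere in the error chain,
// or 500 for errors that carry none.
func statusOf(err error) int {
	var sc statusCoder
	if errors.As(err, &sc) {
		return sc.HTTPStatus()
	}
	return 500
}

func main() {
	err := fmt.Errorf("handler: %w", &NotFoundError{
		APIError:   &APIError{StatusCode: 404, Message: "User not found: 42"},
		ResourceID: "42",
	})

	var base *APIError
	fmt.Println(errors.As(err, &base))        // false: *NotFoundError is not a *APIError
	fmt.Println(statusOf(err))                // 404
	fmt.Println(statusOf(errors.New("boom"))) // 500
}
```

With an interface target, a handler can map any of the four error types to a status code without a type switch per type.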

Exercise 3: Error Wrapping Chain

Learning Objectives: Master error wrapping through multiple application layers and implement error chain inspection.

Difficulty: Intermediate

Real-World Context: Multi-tier applications need to preserve context as errors propagate from database → repository → service → controller. This exercise demonstrates layered error handling.

Task: Build a 3-layer application (database, repository, service) where:

Each layer wraps errors with its own context
Sentinel errors are defined at the database layer
Error inspection works through the entire chain
Error messages build a complete story

Requirements:

Implement 3 distinct layers
Use error wrapping with %w
Create sentinel errors for common cases
Test error chain inspection

Solution:

// run
package main

import (
    "errors"
    "fmt"
    "strings"
)

// Sentinel errors at database layer
var (
    ErrNotFound     = errors.New("record not found")
    ErrDuplicateKey = errors.New("duplicate key")
    ErrConnection   = errors.New("database connection failed")
)

// Database layer
type Database struct {
    records map[string]string
}

func NewDatabase() *Database {
    return &Database{
        records: map[string]string{
            "1": "Alice",
            "2": "Bob",
        },
    }
}

func (db *Database) Get(id string) (string, error) {
    record, exists := db.records[id]
    if !exists {
        return "", fmt.Errorf("database: %w (id: %s)", ErrNotFound, id)
    }
    return record, nil
}

func (db *Database) Insert(id, name string) error {
    if _, exists := db.records[id]; exists {
        return fmt.Errorf("database: %w (id: %s)", ErrDuplicateKey, id)
    }
    db.records[id] = name
    return nil
}

// Repository layer
type UserRepository struct {
    db *Database
}

func NewUserRepository(db *Database) *UserRepository {
    return &UserRepository{db: db}
}

func (r *UserRepository) FindByID(id string) (string, error) {
    name, err := r.db.Get(id)
    if err != nil {
        return "", fmt.Errorf("repository: failed to find user %s: %w", id, err)
    }
    return name, nil
}

func (r *UserRepository) Create(id, name string) error {
    if err := r.db.Insert(id, name); err != nil {
        return fmt.Errorf("repository: failed to create user %s: %w", id, err)
    }
    return nil
}

// Service layer
type UserService struct {
    repo *UserRepository
}

func NewUserService(repo *UserRepository) *UserService {
    return &UserService{repo: repo}
}

func (s *UserService) GetUser(id string) (string, error) {
    if id == "" {
        return "", fmt.Errorf("service: invalid user ID")
    }

    name, err := s.repo.FindByID(id)
    if err != nil {
        return "", fmt.Errorf("service: failed to get user: %w", err)
    }

    return name, nil
}

func (s *UserService) CreateUser(id, name string) error {
    if id == "" || name == "" {
        return fmt.Errorf("service: invalid input (id: %s, name: %s)", id, name)
    }

    if err := s.repo.Create(id, name); err != nil {
        return fmt.Errorf("service: failed to create user: %w", err)
    }

    return nil
}

// Error analysis
func analyzeError(err error) {
    fmt.Println("\n=== Error Analysis ===")
    fmt.Printf("Full error message:\n%v\n\n", err)

    // Check for sentinel errors
    checks := map[string]error{
        "ErrNotFound":     ErrNotFound,
        "ErrDuplicateKey": ErrDuplicateKey,
        "ErrConnection":   ErrConnection,
    }

    fmt.Println("Sentinel error checks:")
    for name, sentinel := range checks {
        if errors.Is(err, sentinel) {
            fmt.Printf("✓ Error chain contains %s\n", name)
        }
    }

    // Unwrap the error chain one layer at a time
    fmt.Println("\nError chain (from outermost to innermost):")
    currentErr := err
    depth := 0
    for currentErr != nil {
        indent := strings.Repeat("  ", depth)
        fmt.Printf("%s%d: %v\n", indent, depth, currentErr)
        currentErr = errors.Unwrap(currentErr)
        depth++
    }
}

func main() {
    fmt.Println("=== Error Wrapping Chain Demo ===")

    db := NewDatabase()
    repo := NewUserRepository(db)
    service := NewUserService(repo)

    // Test 1: Successful operation
    fmt.Println("\n--- Test 1: Successful GetUser ---")
    name, err := service.GetUser("1")
    if err != nil {
        analyzeError(err)
    } else {
        fmt.Printf("✓ Success: Found user: %s\n", name)
    }

    // Test 2: Not found error (wrapped through all layers)
    fmt.Println("\n--- Test 2: User Not Found ---")
    _, err = service.GetUser("999")
    if err != nil {
        analyzeError(err)

        // Demonstrate error-based logic
        if errors.Is(err, ErrNotFound) {
            fmt.Println("\nAction: Could return HTTP 404")
        }
    }

    // Test 3: Duplicate key error
    fmt.Println("\n--- Test 3: Duplicate Key ---")
    err = service.CreateUser("1", "Charlie") // ID 1 already exists
    if err != nil {
        analyzeError(err)

        if errors.Is(err, ErrDuplicateKey) {
            fmt.Println("\nAction: Could return HTTP 409 Conflict")
        }
    }

    // Test 4: Invalid input (no sentinel error)
    fmt.Println("\n--- Test 4: Invalid Input ---")
    err = service.CreateUser("", "")
    if err != nil {
        analyzeError(err)
        fmt.Println("\nAction: Could return HTTP 400 Bad Request")
    }

    // Test 5: Successful create
    fmt.Println("\n--- Test 5: Successful CreateUser ---")
    err = service.CreateUser("3", "Charlie")
    if err != nil {
        analyzeError(err)
    } else {
        fmt.Println("✓ Success: User created")

        // Verify, checking the error instead of discarding it
        name, err = service.GetUser("3")
        if err != nil {
            analyzeError(err)
        } else {
            fmt.Printf("✓ Verification: Found user: %s\n", name)
        }
    }
}

Key Concepts:

Multi-layer error wrapping
Sentinel errors for common cases
Error chain preservation
Error inspection with errors.Is()
Context-rich error messages
Layer-specific error handling
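
The "Action: Could return HTTP ..." lines in the solution hint at how a handler layer might use the chain. The following is a minimal sketch of that idea, not part of the solution itself: statusFor, lookupUser, and ErrInvalidInput are hypothetical names, and the sentinel messages are assumed. It shows errors.Is matching a sentinel through two layers of %w wrapping.

```go
package main

import (
    "errors"
    "fmt"
    "net/http"
)

// Sentinel errors. The names mirror the solution above; the messages and
// ErrInvalidInput are assumptions made for this sketch.
var (
    ErrNotFound     = errors.New("not found")
    ErrDuplicateKey = errors.New("duplicate key")
    ErrConnection   = errors.New("connection failed")
    ErrInvalidInput = errors.New("invalid input")
)

// statusFor (hypothetical) maps an error chain to an HTTP status code.
// errors.Is walks the whole chain, so wrapped sentinels still match.
func statusFor(err error) int {
    switch {
    case err == nil:
        return http.StatusOK
    case errors.Is(err, ErrInvalidInput):
        return http.StatusBadRequest
    case errors.Is(err, ErrNotFound):
        return http.StatusNotFound
    case errors.Is(err, ErrDuplicateKey):
        return http.StatusConflict
    case errors.Is(err, ErrConnection):
        return http.StatusServiceUnavailable
    default:
        return http.StatusInternalServerError
    }
}

// lookupUser (hypothetical) always fails; it only simulates the
// repository and service layers each adding context with %w.
func lookupUser(id string) error {
    if id == "" {
        return fmt.Errorf("service: empty id: %w", ErrInvalidInput)
    }
    repoErr := fmt.Errorf("repository: failed to find user %s: %w", id, ErrNotFound)
    return fmt.Errorf("service: failed to get user: %w", repoErr)
}

func main() {
    fmt.Println(statusFor(lookupUser("999")))        // 404
    fmt.Println(statusFor(lookupUser("")))           // 400
    fmt.Println(statusFor(errors.New("unexpected"))) // 500
}
```

Keeping the mapping in one function means the inner layers never need to know about HTTP; they only decide which sentinel to wrap.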

Exercise 4: Retry Logic with Error Classification

Learning Objectives: Implement retry logic that classifies errors as retryable or permanent, with exponential backoff.

Difficulty: Advanced

Real-World Context: External API calls, database operations, and network requests often fail temporarily. This exercise teaches you to build resilient systems that retry transient failures while failing fast on permanent errors.

Task: Build a retry system that:

Classifies errors as temporary or permanent
Implements exponential backoff for retries
Respects context timeouts
Tracks retry attempts and timing
Provides detailed logging

Requirements:

Support configurable retry parameters
Implement exponential backoff calculation
Handle context cancellation
Track success/failure statistics
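
Before looking at the solution, it may help to see the backoff requirement in isolation. This sketch assumes a hypothetical helper, backoffDelay, that computes the wait before a given retry as the initial delay multiplied by the multiplier once per earlier retry, capped at a maximum. The parameter values match the defaults used in the solution.

```go
package main

import (
    "fmt"
    "time"
)

// backoffDelay (hypothetical) returns the wait before the given retry,
// counting from 1: initial * multiplier^(retry-1), capped at maxDelay.
func backoffDelay(retry int, initial, maxDelay time.Duration, multiplier float64) time.Duration {
    delay := initial
    for i := 1; i < retry; i++ {
        delay = time.Duration(float64(delay) * multiplier)
        if delay >= maxDelay {
            // Returning early also keeps the value from overflowing.
            return maxDelay
        }
    }
    if delay > maxDelay {
        return maxDelay
    }
    return delay
}

func main() {
    for retry := 1; retry <= 8; retry++ {
        d := backoffDelay(retry, 100*time.Millisecond, 5*time.Second, 2.0)
        fmt.Printf("retry %d: wait %v\n", retry, d)
    }
    // Waits grow 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, then stay at 5s.
}
```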

Solution:

// run
package main

import (
    "context"
    "errors"
    "fmt"
    "math/rand"
    "time"
)

// TemporaryError marks a failure that may succeed if the operation is retried.
type TemporaryError struct {
    Msg string
}

func (e *TemporaryError) Error() string {
    return fmt.Sprintf("temporary: %s", e.Msg)
}

// Temporary reports that the error is transient; isRetryable looks for this method.
func (e *TemporaryError) Temporary() bool {
    return true
}

// PermanentError marks a failure that retrying cannot fix.
type PermanentError struct {
    Msg string
}

func (e *PermanentError) Error() string {
    return fmt.Sprintf("permanent: %s", e.Msg)
}

// RetryConfig holds the tunable retry parameters.
type RetryConfig struct {
    MaxAttempts  int
    InitialDelay time.Duration
    MaxDelay     time.Duration
    Multiplier   float64
}

func DefaultRetryConfig() RetryConfig {
    return RetryConfig{
        MaxAttempts:  5,
        InitialDelay: 100 * time.Millisecond,
        MaxDelay:     5 * time.Second,
        Multiplier:   2.0,
    }
}

// RetryStats records what happened during one RetryWithBackoff call.
type RetryStats struct {
    TotalAttempts int
    Successes     int
    Failures      int
    TotalDelay    time.Duration
}

// isRetryable classifies an error anywhere in the chain as retryable or permanent.
func isRetryable(err error) bool {
    type temporary interface {
        Temporary() bool
    }

    var tempErr temporary
    if errors.As(err, &tempErr) && tempErr.Temporary() {
        return true
    }

    var permErr *PermanentError
    if errors.As(err, &permErr) {
        return false
    }

    // Default: assume retryable
    return true
}

// RetryWithBackoff runs operation until it succeeds, returns a permanent
// error, exhausts config.MaxAttempts, or the context ends.
func RetryWithBackoff(ctx context.Context, operation func() error, config RetryConfig) (*RetryStats, error) {
    stats := &RetryStats{}
    if config.MaxAttempts < 1 {
        return stats, errors.New("retry: MaxAttempts must be at least 1")
    }

    delay := config.InitialDelay

    for attempt := 1; attempt <= config.MaxAttempts; attempt++ {
        // Do not start an attempt if the context has already ended
        if err := ctx.Err(); err != nil {
            stats.Failures++
            return stats, fmt.Errorf("context done before attempt %d: %w", attempt, err)
        }

        stats.TotalAttempts++

        // Execute operation
        startTime := time.Now()
        err := operation()
        elapsed := time.Since(startTime)

        if err == nil {
            stats.Successes++
            if attempt > 1 {
                fmt.Printf("✓ Success on attempt %d/%d (took %v)\n",
                    attempt, config.MaxAttempts, elapsed)
            }
            return stats, nil
        }

        // Check if error is retryable
        if !isRetryable(err) {
            stats.Failures++
            fmt.Printf("✗ Permanent error on attempt %d: %v\n", attempt, err)
            return stats, err
        }

        // Check if this was the last attempt
        if attempt == config.MaxAttempts {
            stats.Failures++
            fmt.Printf("✗ All %d attempts exhausted: %v\n", config.MaxAttempts, err)
            return stats, fmt.Errorf("max retries exceeded: %w", err)
        }

        // Log retry
        fmt.Printf("⚠ Attempt %d/%d failed, retrying in %v: %v\n",
            attempt, config.MaxAttempts, delay, err)

        // Wait with exponential backoff, giving up early if the context ends.
        // An explicit timer can be stopped, unlike the one behind time.After.
        timer := time.NewTimer(delay)
        select {
        case <-timer.C:
            stats.TotalDelay += delay
        case <-ctx.Done():
            timer.Stop()
            stats.Failures++
            return stats, fmt.Errorf("context cancelled during retry: %w", ctx.Err())
        }

        // Calculate next delay
        nextDelay := time.Duration(float64(delay) * config.Multiplier)
        if nextDelay > config.MaxDelay {
            nextDelay = config.MaxDelay
        }
        delay = nextDelay
    }

    // Unreachable: every loop iteration above returns or continues,
    // and the final attempt always returns.
    stats.Failures++
    return stats, errors.New("retry logic error: should not reach here")
}

// simulatedOperation returns an operation that fails with the given
// probability and error type, counting its own attempts.
func simulatedOperation(failureRate float64, failureType string) func() error {
    attempts := 0

    return func() error {
        attempts++

        if rand.Float64() < failureRate {
            switch failureType {
            case "temporary":
                return &TemporaryError{Msg: fmt.Sprintf("network timeout (attempt %d)", attempts)}
            case "permanent":
                return &PermanentError{Msg: "authentication failed"}
            default:
                return fmt.Errorf("unknown error (attempt %d)", attempts)
            }
        }

        return nil
    }
}

func main() {
    // No rand.Seed call: since Go 1.20 the global generator is seeded
    // automatically and rand.Seed is deprecated.
    ctx := context.Background()

    fmt.Println("=== Retry Logic with Error Classification ===")
    fmt.Println()

    config := DefaultRetryConfig()

    // Test 1: Temporary failures with eventual success
    fmt.Println("--- Test 1: Temporary Failures (50% rate) ---")
    op1 := simulatedOperation(0.5, "temporary")
    stats1, err := RetryWithBackoff(ctx, op1, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats1.TotalAttempts)
    fmt.Printf("  Successes: %d\n", stats1.Successes)
    fmt.Printf("  Failures: %d\n", stats1.Failures)
    fmt.Printf("  Total delay: %v\n", stats1.TotalDelay)
    if err != nil {
        fmt.Printf("  Final error: %v\n", err)
    }

    time.Sleep(time.Second)

    // Test 2: Permanent failure (immediate stop)
    fmt.Println("\n--- Test 2: Permanent Failure ---")
    op2 := simulatedOperation(1.0, "permanent")
    stats2, err := RetryWithBackoff(ctx, op2, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats2.TotalAttempts)
    fmt.Printf("  Failures: %d\n", stats2.Failures)
    fmt.Printf("  Final error: %v\n", err)

    time.Sleep(time.Second)

    // Test 3: All attempts fail
    fmt.Println("\n--- Test 3: All Attempts Fail (100% temporary failure) ---")
    op3 := simulatedOperation(1.0, "temporary")
    stats3, err := RetryWithBackoff(ctx, op3, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats3.TotalAttempts)
    fmt.Printf("  Failures: %d\n", stats3.Failures)
    fmt.Printf("  Total delay: %v\n", stats3.TotalDelay)
    fmt.Printf("  Final error: %v\n", err)

    time.Sleep(time.Second)

    // Test 4: Context timeout
    fmt.Println("\n--- Test 4: Context Timeout ---")
    timeoutCtx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()

    op4 := simulatedOperation(1.0, "temporary")
    stats4, err := RetryWithBackoff(timeoutCtx, op4, config)

    fmt.Printf("\nStatistics:\n")
    fmt.Printf("  Total attempts: %d\n", stats4.TotalAttempts)
    fmt.Printf("  Failures: %d\n", stats4.Failures)
    fmt.Printf("  Total delay: %v\n", stats4.TotalDelay)
    fmt.Printf("  Final error: %v\n", err)

    // Test 5: Demonstrate backoff calculation
    fmt.Println("\n--- Test 5: Backoff Timing Demo ---")
    fmt.Println("Demonstrating exponential backoff delays:")

    currentDelay := config.InitialDelay
    for i := 1; i <= config.MaxAttempts; i++ {
        fmt.Printf("Attempt %d: delay = %v\n", i, currentDelay)

        nextDelay := time.Duration(float64(currentDelay) * config.Multiplier)
        if nextDelay > config.MaxDelay {
            nextDelay = config.MaxDelay
        }
        currentDelay = nextDelay
    }
}

Key Concepts:

Error classification (temporary vs permanent)
Exponential backoff algorithm
Context-aware retry logic
Retry statistics tracking
Configurable retry behavior
Early termination on permanent errors
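
The context-aware wait is the part of a retry loop that is easiest to get subtly wrong, so here it is as a standalone sketch. sleepCtx and demo are hypothetical names. The helper uses an explicit time.NewTimer with Stop rather than time.After, which lets the timer be released as soon as the context ends; this mattered more before Go 1.23 changed how unreferenced timers are collected.

```go
package main

import (
    "context"
    "fmt"
    "time"
)

// sleepCtx (hypothetical) waits for d or until ctx is done, whichever
// comes first. It returns ctx.Err() when the context ended the wait.
func sleepCtx(ctx context.Context, d time.Duration) error {
    timer := time.NewTimer(d)
    defer timer.Stop()

    select {
    case <-timer.C:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// demo (hypothetical) runs one wait that fits inside the deadline and one
// that does not, and returns both results.
func demo() (early, late error) {
    ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
    defer cancel()

    early = sleepCtx(ctx, 5*time.Millisecond) // finishes well before the deadline
    late = sleepCtx(ctx, time.Second)         // cut short by the deadline
    return early, late
}

func main() {
    early, late := demo()
    fmt.Println("short wait:", early) // short wait: <nil>
    fmt.Println("long wait:", late)   // long wait: context deadline exceeded
}
```

Returning ctx.Err() unchanged lets the caller wrap it with %w and still test for context.DeadlineExceeded or context.Canceled with errors.Is.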

Exercise 5: Comprehensive Error Monitoring System

Learning Objectives: Build a production-ready error monitoring system with structured logging, metrics collection, and alerting.

Difficulty: Advanced

Real-World Context: Production systems need comprehensive error tracking for debugging, monitoring, and alerting. This exercise builds a simplified, in-process version of such a monitoring system.

Task: Implement an error monitoring system that:

Tracks all errors with structured metadata
Collects metrics (error counts, rates, types)
Implements severity-based handling
Provides error analytics and reporting
Simulates alerting for critical errors

Requirements:

Structured error logging with JSON
Real-time metrics collection
Severity-based routing
Background error processing
Graceful shutdown handling

Solution:

// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "math/rand"
    "sort"
    "sync"
    "time"
)

// Severity levels
type Severity int

const (
    SeverityDebug Severity = iota
    SeverityInfo
    SeverityWarning
    SeverityError
    SeverityCritical
)

// String returns the level name, or a numeric placeholder for values
// outside the defined range instead of panicking.
func (s Severity) String() string {
    names := []string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
    if s < 0 || int(s) >= len(names) {
        return fmt.Sprintf("SEVERITY(%d)", int(s))
    }
    return names[s]
}

// Structured error
type MonitoredError struct {
    ID        string                 `json:"id"`
    Timestamp time.Time              `json:"timestamp"`
    Severity  Severity               `json:"severity"`
    Service   string                 `json:"service"`
    Operation string                 `json:"operation"`
    Message   string                 `json:"message"`
    Code      string                 `json:"code"`
    Context   map[string]interface{} `json:"context,omitempty"`
    UserID    string                 `json:"user_id,omitempty"`
}

// Error implements the error interface so that a *MonitoredError can be
// returned wherever an error is expected.
func (e *MonitoredError) Error() string {
    return fmt.Sprintf("[%s] %s/%s: %s (%s)",
        e.Severity, e.Service, e.Operation, e.Message, e.Code)
}

// Error monitoring system
type ErrorMonitor struct {
    errors  chan *MonitoredError
    metrics map[string]int64
    mu      sync.RWMutex
}

func NewErrorMonitor() *ErrorMonitor {
    return &ErrorMonitor{
        errors:  make(chan *MonitoredError, 100),
        metrics: make(map[string]int64),
    }
}

// Track queues an error for background processing without blocking the
// caller; the error is dropped if the queue is full.
func (em *ErrorMonitor) Track(err *MonitoredError) {
    select {
    case em.errors <- err:
    default:
        fmt.Printf("Warning: Error queue full, dropping error %s\n", err.ID)
    }
}

// Start processes queued errors until ctx is cancelled, then drains
// whatever is still queued before returning.
func (em *ErrorMonitor) Start(ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()

    for {
        select {
        case <-ctx.Done():
            for {
                select {
                case err := <-em.errors:
                    em.processError(err)
                default:
                    fmt.Println("Error monitor shutting down...")
                    return
                }
            }
        case err := <-em.errors:
            em.processError(err)
        }
    }
}

func (em *ErrorMonitor) processError(err *MonitoredError) {
    // Update metrics
    em.mu.Lock()
    em.metrics["total"]++
    em.metrics[err.Code]++
    em.metrics[err.Severity.String()]++
    em.mu.Unlock()

    // Log error
    em.logError(err)

    // Alert on critical errors
    if err.Severity >= SeverityCritical {
        em.sendAlert(err)
    }
}

func (em *ErrorMonitor) logError(err *MonitoredError) {
    logEntry := map[string]interface{}{
        "id":        err.ID,
        "timestamp": err.Timestamp.Format(time.RFC3339),
        "severity":  err.Severity.String(),
        "service":   err.Service,
        "operation": err.Operation,
        "message":   err.Message,
        "code":      err.Code,
    }

    if err.Context != nil {
        logEntry["context"] = err.Context
    }

    if err.UserID != "" {
        logEntry["user_id"] = err.UserID
    }

    // Context can hold values that JSON cannot encode, so check the result
    jsonData, marshalErr := json.Marshal(logEntry)
    if marshalErr != nil {
        fmt.Printf("LOG: failed to encode error %s: %v\n", err.ID, marshalErr)
        return
    }
    fmt.Printf("LOG: %s\n", string(jsonData))
}

func (em *ErrorMonitor) sendAlert(err *MonitoredError) {
    fmt.Printf("ALERT: [%s] %s - %s (Error ID: %s)\n",
        err.Severity.String(), err.Service, err.Message, err.ID)
}

// GetMetrics returns a copy of the counters so callers cannot race with
// the background goroutine.
func (em *ErrorMonitor) GetMetrics() map[string]int64 {
    em.mu.RLock()
    defer em.mu.RUnlock()

    metrics := make(map[string]int64)
    for k, v := range em.metrics {
        metrics[k] = v
    }
    return metrics
}

func (em *ErrorMonitor) GetReport() string {
    em.mu.RLock()
    defer em.mu.RUnlock()

    total := em.metrics["total"]
    if total == 0 {
        return "No errors recorded"
    }

    report := "=== Error Report ===\n"
    report += fmt.Sprintf("Total errors: %d\n\n", total)

    report += "By severity:\n"
    for i := SeverityDebug; i <= SeverityCritical; i++ {
        count := em.metrics[i.String()]
        if count > 0 {
            pct := float64(count) / float64(total) * 100
            report += fmt.Sprintf("  %s: %d (%.1f%%)\n", i.String(), count, pct)
        }
    }

    report += "\nTop error codes:\n"
    type codeCount struct {
        code  string
        count int64
    }
    var codes []codeCount
    for k, v := range em.metrics {
        if k != "total" && k != SeverityDebug.String() &&
            k != SeverityInfo.String() && k != SeverityWarning.String() &&
            k != SeverityError.String() && k != SeverityCritical.String() {
            codes = append(codes, codeCount{code: k, count: v})
        }
    }

    // Sort by count (highest first), then by code, so "top" is meaningful
    // and the output is stable; map iteration order is random.
    sort.Slice(codes, func(i, j int) bool {
        if codes[i].count != codes[j].count {
            return codes[i].count > codes[j].count
        }
        return codes[i].code < codes[j].code
    })

    // Print top 5 codes
    if len(codes) > 5 {
        codes = codes[:5]
    }
    for _, c := range codes {
        pct := float64(c.count) / float64(total) * 100
        report += fmt.Sprintf("  %s: %d (%.1f%%)\n", c.code, c.count, pct)
    }

    return report
}

// Application service
type OrderService struct {
    monitor *ErrorMonitor
}

func NewOrderService(monitor *ErrorMonitor) *OrderService {
    return &OrderService{monitor: monitor}
}

func (s *OrderService) ProcessOrder(orderID, userID string, amount float64) error {
    // Generate request ID
    requestID := fmt.Sprintf("req_%d", time.Now().UnixNano())

    // Validation error
    if amount <= 0 {
        err := &MonitoredError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "order-service",
            Operation: "process-order",
            Message:   "invalid order amount",
            Code:      "INVALID_AMOUNT",
            UserID:    userID,
            Context: map[string]interface{}{
                "order_id": orderID,
                "amount":   amount,
            },
        }
        s.monitor.Track(err)
        return err
    }

    // Warning for large orders (tracked, but the order still proceeds)
    if amount > 1000 {
        err := &MonitoredError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityWarning,
            Service:   "order-service",
            Operation: "process-order",
            Message:   "large order detected",
            Code:      "LARGE_ORDER",
            UserID:    userID,
            Context: map[string]interface{}{
                "order_id": orderID,
                "amount":   amount,
            },
        }
        s.monitor.Track(err)
    }

    // Simulate random failures
    r := rand.Float64()
    if r < 0.1 {
        // Critical error (10%)
        err := &MonitoredError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityCritical,
            Service:   "order-service",
            Operation: "process-order",
            Message:   "payment gateway unavailable",
            Code:      "GATEWAY_DOWN",
            UserID:    userID,
            Context: map[string]interface{}{
                "order_id": orderID,
                "amount":   amount,
            },
        }
        s.monitor.Track(err)
        return err
    } else if r < 0.3 {
        // Regular error (20%)
        err := &MonitoredError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "order-service",
            Operation: "process-order",
            Message:   "inventory check failed",
            Code:      "INVENTORY_ERROR",
            UserID:    userID,
            Context: map[string]interface{}{
                "order_id": orderID,
                "amount":   amount,
            },
        }
        s.monitor.Track(err)
        return err
    }

    // Success
    fmt.Printf("✓ Order processed: %s (user: %s, amount: $%.2f)\n", orderID, userID, amount)
    return nil
}

func main() {
    // No rand.Seed call: since Go 1.20 the global generator is seeded
    // automatically and rand.Seed is deprecated.
    fmt.Println("=== Comprehensive Error Monitoring System ===")
    fmt.Println()

    monitor := NewErrorMonitor()
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    var wg sync.WaitGroup

    // Start error monitor
    wg.Add(1)
    go monitor.Start(ctx, &wg)

    time.Sleep(time.Millisecond * 100)

    orderService := NewOrderService(monitor)

    // Simulate various orders
    fmt.Println("Processing orders...")
    fmt.Println()

    orders := []struct {
        id     string
        userID string
        amount float64
    }{
        {"order1", "user1", 100},
        {"order2", "user2", -50},  // Invalid
        {"order3", "user3", 1500}, // Large
        {"order4", "user4", 200},
        {"order5", "user5", 300},
        {"order6", "user6", 150},
        {"order7", "user7", 2000}, // Large
        {"order8", "user8", 250},
        {"order9", "user9", 0}, // Invalid
        {"order10", "user10", 175},
    }

    for _, order := range orders {
        if err := orderService.ProcessOrder(order.id, order.userID, order.amount); err != nil {
            fmt.Printf("✗ Order %s failed: %v\n", order.id, err)
        }
        time.Sleep(time.Millisecond * 100)
    }

    // Allow time for error processing
    time.Sleep(time.Second)

    // Print metrics
    fmt.Println("\n" + monitor.GetReport())

    // Detailed metrics, printed in sorted key order for stable output
    fmt.Println("\n=== Detailed Metrics ===")
    metrics := monitor.GetMetrics()
    keys := make([]string, 0, len(metrics))
    for key := range metrics {
        keys = append(keys, key)
    }
    sort.Strings(keys)
    for _, key := range keys {
        fmt.Printf("%s: %d\n", key, metrics[key])
    }

    // Shutdown
    fmt.Println("\nShutting down...")
    cancel()

    done := make(chan struct{})
    go func() {
        wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        fmt.Println("Shutdown complete")
    case <-time.After(2 * time.Second):
        fmt.Println("Shutdown timeout")
    }
}

Key Concepts:

Structured error logging with JSON
Real-time metrics collection
Severity-based error handling
Background error processing
Error analytics and reporting
Graceful shutdown
Production monitoring patterns
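
Graceful shutdown for a background processor usually means finishing the queued work before exiting. One common way to get that guarantee, shown here as a sketch with a hypothetical worker type, is to close the queue and let the goroutine range over it until it is empty. This differs from the context-based shutdown in the solution and is offered only as an alternative shape for the same idea.

```go
package main

import (
    "fmt"
    "sync"
)

// worker (hypothetical) processes queued events on a background goroutine.
type worker struct {
    events    chan string
    wg        sync.WaitGroup
    processed int // written only by run, read only after wg.Wait returns
}

func newWorker(buffer int) *worker {
    w := &worker{events: make(chan string, buffer)}
    w.wg.Add(1)
    go w.run()
    return w
}

// run handles events until the channel is closed and fully drained.
func (w *worker) run() {
    defer w.wg.Done()
    for range w.events {
        w.processed++
    }
}

// Shutdown closes the queue, waits for the backlog to be handled, and
// returns how many events were processed. Nothing may be sent after
// Shutdown, because sending on a closed channel panics.
func (w *worker) Shutdown() int {
    close(w.events)
    w.wg.Wait()
    return w.processed
}

func main() {
    w := newWorker(16)
    for i := 1; i <= 5; i++ {
        w.events <- fmt.Sprintf("event-%d", i)
    }
    fmt.Println("processed:", w.Shutdown()) // processed: 5
}
```

The trade-off is ownership: closing the channel is only safe when a single owner controls all senders, whereas context cancellation works with many independent producers but needs an explicit drain step.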

Summary

Key Takeaways

Error handling philosophy:

Explicit over implicit: Errors are checked where they occur, then handled or returned to the caller
Values over exceptions: Errors are ordinary values, not special constructs
Context preservation: Each layer adds relevant context while preserving the cause
Graceful degradation: Handle failures without crashing when possible

Essential patterns:

Immediate checking: Check if err != nil right after every call that can fail
Error wrapping: Use fmt.Errorf with %w to add context while preserving the error chain
Custom error types: Create structured errors with business context
Error inspection: Use errors.Is() to match sentinel values and errors.As() to extract typed errors from a chain
Retry logic: Implement exponential backoff for transient failures
Context integration: Use context for timeouts and cancellation
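
The wrapping and inspection patterns fit together in a few lines. This is a minimal sketch: ValidationError, load, and describe are hypothetical names chosen for illustration, and load always fails so that both paths can be shown.

```go
package main

import (
    "errors"
    "fmt"
)

// ErrNotFound is a sentinel error, matched by value with errors.Is.
var ErrNotFound = errors.New("not found")

// ValidationError (hypothetical) is a custom type, matched with errors.As.
type ValidationError struct {
    Field string
}

func (e *ValidationError) Error() string { return "invalid field: " + e.Field }

// load (hypothetical) always fails; it wraps the underlying cause with %w
// so callers can still inspect it.
func load(id string) error {
    if id == "" {
        return fmt.Errorf("load: %w", &ValidationError{Field: "id"})
    }
    return fmt.Errorf("load %q: %w", id, ErrNotFound)
}

// describe inspects the chain rather than comparing error strings.
func describe(err error) string {
    var ve *ValidationError
    switch {
    case errors.As(err, &ve):
        return "validation failed on " + ve.Field
    case errors.Is(err, ErrNotFound):
        return "not found"
    default:
        return "unexpected"
    }
}

func main() {
    fmt.Println(describe(load("")))   // validation failed on id
    fmt.Println(describe(load("42"))) // not found
    fmt.Println(load("42"))           // load "42": not found
}
```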

Production considerations:

Structured logging: Log errors with context, timestamps, and correlation IDs
Metrics collection: Track error rates, types, and patterns
Severity levels: Route errors appropriately (debug, info, warning, error, critical)
Alerting: Escalate critical errors to operations teams
Error analytics: Build dashboards and reports from error data
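
For structured logging, the standard library's log/slog package (Go 1.21 and later) can stand in for a hand-built JSON map. The sketch below assumes hypothetical helpers newLogger, logFailure, and captureLog, and the field names are illustrative rather than a required schema.

```go
package main

import (
    "bytes"
    "encoding/json"
    "errors"
    "fmt"
    "io"
    "log/slog"
    "os"
)

// errGateway is a sample error used only for this sketch.
var errGateway = errors.New("payment gateway unavailable")

// newLogger (hypothetical) builds a logger that writes one JSON object per line.
func newLogger(w io.Writer) *slog.Logger {
    return slog.New(slog.NewJSONHandler(w, nil))
}

// logFailure (hypothetical) records an error with a correlation ID.
// The handler adds the time, level, and msg fields itself.
func logFailure(logger *slog.Logger, requestID string, err error) {
    logger.Error("order processing failed",
        slog.String("service", "order-service"),
        slog.String("request_id", requestID),
        slog.String("error", err.Error()),
    )
}

// captureLog (hypothetical) logs into a buffer and decodes the line, which
// makes the emitted fields easy to inspect.
func captureLog(requestID string, err error) (map[string]any, error) {
    var buf bytes.Buffer
    logFailure(newLogger(&buf), requestID, err)

    var fields map[string]any
    if decodeErr := json.Unmarshal(buf.Bytes(), &fields); decodeErr != nil {
        return nil, decodeErr
    }
    return fields, nil
}

func main() {
    // Write one entry to standard output.
    logFailure(newLogger(os.Stdout), "req-42", errGateway)

    // Decode the same entry to show individual fields.
    fields, err := captureLog("req-42", errGateway)
    if err != nil {
        fmt.Println("decode failed:", err)
        return
    }
    fmt.Println(fields["level"], fields["request_id"]) // ERROR req-42
}
```

Passing the same request_id to every log call within a request is what makes the entries correlatable later.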

Next Steps

Continue your Go learning journey with these topics:

Testing Error Paths - Master testing patterns for error handling
Distributed Error Handling - Handle errors in microservices
Monitoring and Observability - Build comprehensive error tracking
Resilience Patterns - Circuit breakers, bulkheads, retries
Error Design Principles - Design clear, actionable error APIs

Production Readiness

You now have the foundation for building production-ready Go applications with robust error handling. The patterns covered here are used in:

Web services and APIs with clear error responses
Microservice architectures with proper error propagation
Database systems with transaction error handling
File processing systems with graceful failure handling
Background job systems with comprehensive error tracking

Remember: Good error handling is not about preventing errors - it's about handling them gracefully. Master Go's explicit error handling model, and you'll build systems that are more reliable, debuggable, and maintainable.