Why This Matters - Building Robust, Reliable Systems
Error handling is not just about catching problems - it's about building reliable, maintainable systems that handle failures gracefully. Go's explicit error handling forces you to think about failure at every step, creating more resilient code.
Real-world impact: Think about a payment processing system. When a database connection fails, does your application crash with an unhandled exception? Or does it log the error, retry with a different database, and notify operations? The difference impacts system reliability, user experience, and operational costs.
Business value: Proper error handling enables you to:
- Build reliable systems that recover from failures gracefully
- Provide clear debugging information for faster problem resolution
- Implement graceful degradation when components fail
- Create observable systems with comprehensive error tracking
- Design predictable APIs that clearly communicate failure modes
- Meet SLAs and reliability targets by anticipating and handling failures
System reliability: Go's error handling philosophy makes failures visible throughout your codebase, which makes silent failures harder to introduce (an error can still be discarded, but only by explicitly ignoring the returned value).
Learning Objectives
By the end of this tutorial, you will be able to:
- Understand Go's error handling philosophy and why explicit is better than implicit
- Master the error interface and create custom error types
- Implement error wrapping with proper context preservation
- Use error inspection techniques (errors.Is() and errors.As())
- Apply production-ready error handling patterns with logging and metrics
- Design APIs that provide clear, actionable error information
- Implement graceful degradation and recovery strategies
- Avoid common error handling pitfalls that lead to production issues
- Build comprehensive error tracking and monitoring systems
Core Concepts - Understanding Go's Error Philosophy
Explicit vs Implicit Error Handling
Go deliberately avoids exceptions and implicit error handling. Instead, it makes errors explicit, ordinary values that you must handle.
The philosophy: Errors are values, not exceptional conditions. This means:
- Errors are returned as ordinary function return values
- You must explicitly check and handle errors
- Error handling logic is visible in your code flow
- There's no hidden control flow like try/catch/finally
Why this matters: In languages with exceptions:
// Java: Errors can come from anywhere without warning
try {
    processPayment(amount);
    sendReceipt();
    updateInventory();
    updateAccountBalance();
    // Any of these might throw - you must read documentation!
} catch (Exception e) {
    // What failed? Why? Is this recoverable?
    // What state are we in now?
    handleGenericError(e);
}
Problems with exceptions:
- Hidden control flow: Any function might throw, but you can't see it in the code
- State uncertainty: When an exception is thrown, what state is your data in?
- All-or-nothing: Either everything succeeds or the whole operation fails
- Generic handling: Catch blocks often handle disparate errors the same way
- Performance: Throwing an exception and unwinding the stack can add runtime overhead
Go's explicit approach:
// Go: Each operation's potential for failure is explicit
err := processPayment(amount)
if err != nil {
    return fmt.Errorf("payment processing failed: %w", err)
}

err = sendReceipt()
if err != nil {
    // Payment succeeded but receipt failed - we know exactly where we are
    logError("receipt sending failed", err)
    // Continue or compensate as needed
}

err = updateInventory()
if err != nil {
    return fmt.Errorf("inventory update failed: %w", err)
}

err = updateAccountBalance()
if err != nil {
    return fmt.Errorf("balance update failed: %w", err)
}
Benefits of explicit handling:
- Clarity: You can see exactly which functions can fail
- Local handling: Errors are handled where they occur, with full context
- Predictable flow: No hidden control transfers or stack unwinding
- Context preservation: Each step can add its own context
- Fine-grained control: Different errors at different points handled differently
- State management: You know exactly what succeeded before the error
The Error Interface: Simplicity and Power
Go's error handling is built on a simple, elegant interface:
type error interface {
    Error() string
}
What this means:
- Any type with an Error() string method is an error
- No special syntax needed - errors are just values
- Flexible implementation - create rich error types with additional methods
- Interface satisfaction - your types can conform naturally
- Composition: Errors can wrap other errors, preserving the error chain
Philosophy: Errors are values, not special language constructs. This approach:
- Eliminates special error handling syntax
- Allows errors to carry additional data and methods
- Enables polymorphic error handling through interfaces
- Keeps the language simple and consistent
- Makes error handling testable and composable
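To make the interface concrete, here is a minimal sketch of a custom type that satisfies error and carries extra data and a method. The names InsufficientFundsError, withdraw, and shortfallOf are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// InsufficientFundsError is a hypothetical error type. It satisfies the
// error interface because it has an Error() string method.
type InsufficientFundsError struct {
    Balance   int
    Requested int
}

func (e *InsufficientFundsError) Error() string {
    return fmt.Sprintf("insufficient funds: balance %d, requested %d", e.Balance, e.Requested)
}

// Shortfall is an additional method carried by the error value.
func (e *InsufficientFundsError) Shortfall() int {
    return e.Requested - e.Balance
}

// withdraw returns the new balance, or the unchanged balance and an error.
func withdraw(balance, amount int) (int, error) {
    if amount > balance {
        return balance, &InsufficientFundsError{Balance: balance, Requested: amount}
    }
    return balance - amount, nil
}

// shortfallOf reports the missing amount, or 0 if err is not an
// *InsufficientFundsError. A plain type assertion is enough here because
// the error has not been wrapped.
func shortfallOf(err error) int {
    if fundsErr, ok := err.(*InsufficientFundsError); ok {
        return fundsErr.Shortfall()
    }
    return 0
}

func main() {
    _, err := withdraw(100, 250)
    fmt.Println(err)
    fmt.Println("shortfall:", shortfallOf(err))
}
```

Because the error is an ordinary value, callers can pass it around, compare it, and call its methods like any other struct pointer.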
Standard library error creation:
// Simple error with fixed message
err := errors.New("something went wrong")

// Formatted error with dynamic content (%v keeps only the text of originalErr)
err = fmt.Errorf("failed to process user %s: %v", username, originalErr)

// Error wrapping (Go 1.13+): %w keeps originalErr in the error chain
err = fmt.Errorf("failed to connect: %w", originalErr)
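The difference between %v and %w is easy to miss because both produce the same message text. The sketch below shows it in isolation; ErrTimeout, formatOnly, wrap, and matchesTimeout are hypothetical names used only for illustration.

```go
package main

import (
    "errors"
    "fmt"
)

// ErrTimeout is a hypothetical sentinel error for this illustration.
var ErrTimeout = errors.New("timeout")

// formatOnly embeds the cause's text with %v; the link to the cause is lost.
func formatOnly(cause error) error {
    return fmt.Errorf("failed to connect: %v", cause)
}

// wrap embeds the cause with %w; the cause stays reachable in the chain.
func wrap(cause error) error {
    return fmt.Errorf("failed to connect: %w", cause)
}

// matchesTimeout reports whether err is, or wraps, ErrTimeout.
func matchesTimeout(err error) bool {
    return errors.Is(err, ErrTimeout)
}

func main() {
    a := formatOnly(ErrTimeout)
    b := wrap(ErrTimeout)

    fmt.Println(a.Error() == b.Error()) // true: identical message text
    fmt.Println(matchesTimeout(a))      // false: %v dropped the chain
    fmt.Println(matchesTimeout(b))      // true: %w preserved it
}
```

Use %v deliberately when the underlying error is an implementation detail callers should not depend on, and %w when callers are expected to inspect it.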
Error Wrapping and Unwrapping
Go 1.13 introduced error wrapping, a powerful feature for preserving error chains:
Error wrapping (%w verb):
if err != nil {
    return fmt.Errorf("database query failed: %w", err)
}
Why wrap errors:
- Preserve the original error for inspection
- Add context at each layer of your application
- Enable error type checking through the chain
- Build informative error messages with full context
Error inspection:
// Check if error is or wraps a specific error
if errors.Is(err, sql.ErrNoRows) {
    // Handle "no rows" error
}

// Extract specific error type from chain
var netErr *net.OpError
if errors.As(err, &netErr) {
    // Access network-specific error details
    fmt.Println("Operation:", netErr.Op)
    fmt.Println("Network:", netErr.Net)
}
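The same two inspection calls can be tried end to end with nothing but the standard library. This is a small sketch, assuming the relative path used in main does not exist on your machine; loadConfig, isMissing, and failedPath are hypothetical helpers.

```go
package main

import (
    "errors"
    "fmt"
    "os"
)

// loadConfig is a hypothetical helper that wraps whatever os.Open returns.
func loadConfig(path string) error {
    f, err := os.Open(path)
    if err != nil {
        return fmt.Errorf("load config: %w", err)
    }
    return f.Close()
}

// isMissing reports whether err is, or wraps, os.ErrNotExist.
func isMissing(err error) bool {
    return errors.Is(err, os.ErrNotExist)
}

// failedPath extracts the path from a *os.PathError anywhere in the chain,
// or returns "" if the chain contains none.
func failedPath(err error) string {
    var pathErr *os.PathError
    if errors.As(err, &pathErr) {
        return pathErr.Path
    }
    return ""
}

func main() {
    err := loadConfig("no-such-dir/app.conf") // assumed not to exist
    fmt.Println("error:", err)
    fmt.Println("missing:", isMissing(err))
    fmt.Println("path:", failedPath(err))
}
```

errors.Is answers "is this particular error value in the chain?", while errors.As answers "is there an error of this type in the chain, and if so, give it to me".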
Practical Examples - From Basics to Production
Example 1: Basic Error Creation and Handling
Let's start with fundamental error handling patterns:
// run
package main

import (
    "errors"
    "fmt"
    "strconv"
    "strings"
)

// Function that can fail in multiple ways
func validateAge(age string) (int, error) {
    // Trim whitespace first so that whitespace-only input counts as empty
    age = strings.TrimSpace(age)

    if age == "" {
        return 0, errors.New("age cannot be empty")
    }

    ageInt, err := strconv.Atoi(age)
    if err != nil {
        return 0, fmt.Errorf("invalid age format '%s': %w", age, err)
    }

    if ageInt < 0 {
        return 0, fmt.Errorf("age cannot be negative: %d", ageInt)
    }

    if ageInt > 120 {
        return 0, fmt.Errorf("age %d seems unrealistic (must be 0-120)", ageInt)
    }

    return ageInt, nil
}

func registerUser(name, ageStr string) error {
    fmt.Printf("Registering user: %s\n", name)

    if name == "" {
        return fmt.Errorf("user registration failed: name cannot be empty")
    }

    // Handle validation error with context
    age, err := validateAge(ageStr)
    if err != nil {
        return fmt.Errorf("user registration failed for %s: %w", name, err)
    }

    fmt.Printf("Successfully registered user: %s, age %d\n", name, age)
    return nil
}

func main() {
    users := []struct {
        name string
        age  string
    }{
        {"Alice", "25"},
        {"Bob", "invalid"},
        {"", "30"},
        {"Charlie", "-5"},
        {"Diana", "150"},
        {"Eve", " 42 "}, // Test whitespace handling
    }

    fmt.Println("=== User Registration Demo ===")
    fmt.Println()

    for i, user := range users {
        fmt.Printf("--- Test %d ---\n", i+1)
        err := registerUser(user.name, user.age)
        if err != nil {
            fmt.Printf("Error: %v\n", err)
        } else {
            fmt.Printf("Success!\n")
        }
        fmt.Println()
    }
}
What this demonstrates:
- Simple error creation with errors.New()
- Error wrapping with fmt.Errorf() and the %w verb
- Context addition at each level
- Error handling with the if err != nil pattern
- Multiple error types: validation errors, parsing errors, business logic errors
- Error message formatting with relevant details
Key patterns established:
- Check errors immediately after calls
- Add relevant context at each level
- Use %w to preserve error chains
- Return early when errors occur
- Include relevant data in error messages
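One payoff of wrapping the strconv error with %w is that callers can still tell a malformed number from an out-of-range one. The following sketch is a stripped-down variant of the validation above; parseAge, isSyntaxError, isRangeError, and rejectedInput are hypothetical names.

```go
package main

import (
    "errors"
    "fmt"
    "strconv"
)

// parseAge wraps the strconv error so the original cause stays inspectable.
func parseAge(s string) (int, error) {
    n, err := strconv.Atoi(s)
    if err != nil {
        return 0, fmt.Errorf("invalid age %q: %w", s, err)
    }
    return n, nil
}

// isSyntaxError reports whether the chain contains strconv.ErrSyntax.
func isSyntaxError(err error) bool {
    return errors.Is(err, strconv.ErrSyntax)
}

// isRangeError reports whether the chain contains strconv.ErrRange.
func isRangeError(err error) bool {
    return errors.Is(err, strconv.ErrRange)
}

// rejectedInput returns the input string recorded by *strconv.NumError,
// or "" if the chain contains no such error.
func rejectedInput(err error) string {
    var numErr *strconv.NumError
    if errors.As(err, &numErr) {
        return numErr.Num
    }
    return ""
}

func main() {
    for _, input := range []string{"abc", "99999999999999999999", "42"} {
        n, err := parseAge(input)
        if err != nil {
            fmt.Printf("%v (syntax=%v, range=%v)\n", err, isSyntaxError(err), isRangeError(err))
            continue
        }
        fmt.Println("parsed:", n)
    }
}
```

Had parseAge used %v instead of %w, both helpers would return false and the caller would be left parsing message strings.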
Example 2: Custom Error Types for Rich Context
Let's create domain-specific error types with additional behavior:
// run
package main

import (
    "errors"
    "fmt"
    "time"
)

// Domain-specific error codes
type ErrorCode string

const (
    ErrCodeValidation       ErrorCode = "VALIDATION_ERROR"
    ErrCodeNetwork          ErrorCode = "NETWORK_ERROR"
    ErrCodeAuthentication   ErrorCode = "AUTHENTICATION_ERROR"
    ErrCodeAuthorization    ErrorCode = "AUTHORIZATION_ERROR"
    ErrCodeRateLimit        ErrorCode = "RATE_LIMIT_ERROR"
    ErrCodeResourceNotFound ErrorCode = "RESOURCE_NOT_FOUND"
    ErrCodeConflict         ErrorCode = "CONFLICT_ERROR"
    ErrCodeInternal         ErrorCode = "INTERNAL_ERROR"
)

// Custom error type with rich context
type ServiceError struct {
    Code       ErrorCode
    Message    string
    Timestamp  time.Time
    Retryable  bool
    StatusCode int // HTTP status code equivalent
    Details    map[string]interface{}
    Cause      error // Original error
}

func (e *ServiceError) Error() string {
    if e.Cause != nil {
        return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
    }
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

func (e *ServiceError) Unwrap() error {
    return e.Cause
}

// Check if error is retryable
func (e *ServiceError) IsRetryable() bool {
    return e.Retryable
}

// Get HTTP status code
func (e *ServiceError) HTTPStatus() int {
    return e.StatusCode
}

// Error constructors for different scenarios
func NewValidationError(field string, value interface{}, reason string) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeValidation,
        Message:    fmt.Sprintf("validation failed for field '%s': %s", field, reason),
        Timestamp:  time.Now(),
        Retryable:  false,
        StatusCode: 400,
        Details: map[string]interface{}{
            "field":  field,
            "value":  value,
            "reason": reason,
        },
    }
}

func NewAuthenticationError(username string, reason string) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeAuthentication,
        Message:    fmt.Sprintf("authentication failed for user '%s': %s", username, reason),
        Timestamp:  time.Now(),
        Retryable:  false,
        StatusCode: 401,
        Details: map[string]interface{}{
            "username": username,
            "reason":   reason,
        },
    }
}

func NewRateLimitError(resource string, limit int, retryAfter time.Duration) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeRateLimit,
        Message:    fmt.Sprintf("rate limit exceeded for %s", resource),
        Timestamp:  time.Now(),
        Retryable:  true,
        StatusCode: 429,
        Details: map[string]interface{}{
            "resource":    resource,
            "limit":       limit,
            "retry_after": retryAfter.String(),
        },
    }
}

func NewNotFoundError(resourceType string, identifier string) *ServiceError {
    return &ServiceError{
        Code:       ErrCodeResourceNotFound,
        Message:    fmt.Sprintf("%s not found: %s", resourceType, identifier),
        Timestamp:  time.Now(),
        Retryable:  false,
        StatusCode: 404,
        Details: map[string]interface{}{
            "resource_type": resourceType,
            "identifier":    identifier,
        },
    }
}

// Example service using custom errors
type UserService struct {
    users      map[string]string // username -> password (simplified)
    rateLimits map[string]int    // username -> attempt count
}

func NewUserService() *UserService {
    return &UserService{
        users: map[string]string{
            "alice": "password123",
            "bob":   "secure456",
        },
        rateLimits: make(map[string]int),
    }
}

func (us *UserService) Login(username, password string) error {
    // Validate input
    if username == "" {
        return NewValidationError("username", username, "cannot be empty")
    }

    if password == "" {
        return NewValidationError("password", "***", "cannot be empty")
    }

    if len(password) < 8 {
        return NewValidationError("password", "***", "must be at least 8 characters")
    }

    // Check rate limiting
    attempts := us.rateLimits[username]
    if attempts >= 3 {
        return NewRateLimitError("login", 3, time.Minute*5)
    }

    // Check if user exists
    storedPassword, exists := us.users[username]
    if !exists {
        us.rateLimits[username]++
        return NewNotFoundError("user", username)
    }

    // Verify password
    if password != storedPassword {
        us.rateLimits[username]++
        return NewAuthenticationError(username, "invalid credentials")
    }

    // Reset rate limit on successful login
    delete(us.rateLimits, username)

    fmt.Printf("Login successful for user: %s\n", username)
    return nil
}

// Error handler that uses error type information
func handleServiceError(err error) {
    var serviceErr *ServiceError
    if errors.As(err, &serviceErr) {
        fmt.Printf("\n=== Service Error Details ===\n")
        fmt.Printf("Code: %s\n", serviceErr.Code)
        fmt.Printf("Message: %s\n", serviceErr.Message)
        fmt.Printf("HTTP Status: %d\n", serviceErr.StatusCode)
        fmt.Printf("Retryable: %v\n", serviceErr.Retryable)
        fmt.Printf("Timestamp: %v\n", serviceErr.Timestamp.Format(time.RFC3339))

        if len(serviceErr.Details) > 0 {
            fmt.Printf("Details:\n")
            for key, value := range serviceErr.Details {
                fmt.Printf("  %s: %v\n", key, value)
            }
        }

        // Provide actionable suggestions
        switch serviceErr.Code {
        case ErrCodeRateLimit:
            if retryAfter, ok := serviceErr.Details["retry_after"]; ok {
                fmt.Printf("\nSuggestion: Retry after %v\n", retryAfter)
            }
        case ErrCodeAuthentication:
            fmt.Printf("\nSuggestion: Check credentials and try again\n")
        case ErrCodeValidation:
            fmt.Printf("\nSuggestion: Fix validation errors and resubmit\n")
        case ErrCodeResourceNotFound:
            fmt.Printf("\nSuggestion: Verify the resource identifier\n")
        }
    } else {
        fmt.Printf("Generic error: %v\n", err)
    }
}

func main() {
    service := NewUserService()

    fmt.Println("=== Custom Error Types Demo ===")
    fmt.Println()

    // The wrong passwords are at least 8 characters long so they pass
    // validation and reach the credential check, which is what increments
    // the rate-limit counter.
    testCases := []struct {
        name     string
        username string
        password string
        desc     string
    }{
        {"Empty username", "", "password", "Validation error"},
        {"Short password", "alice", "short", "Validation error"},
        {"User not found", "charlie", "password123", "Not found error"},
        {"Wrong password 1", "alice", "wrongpass1", "Authentication error (attempt 1)"},
        {"Wrong password 2", "alice", "wrongpass2", "Authentication error (attempt 2)"},
        {"Wrong password 3", "alice", "wrongpass3", "Authentication error (attempt 3)"},
        {"Rate limited", "alice", "password123", "Rate limit error"},
        {"Valid login", "bob", "secure456", "Successful login"},
    }

    for i, tc := range testCases {
        fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
        err := service.Login(tc.username, tc.password)

        if err != nil {
            handleServiceError(err)
        } else {
            fmt.Println("Success!")
        }

        fmt.Println()
        time.Sleep(time.Millisecond * 100)
    }
}
What this demonstrates:
- Custom error types with rich context and behavior
- Error constructors for consistent error creation
- Domain-specific error codes for structured error handling
- Error methods for accessing error properties (IsRetryable, HTTPStatus)
- Contextual information including timestamps, details maps, and causes
- Type-based inspection with errors.As() for specialized error handling
- Actionable error messages with suggestions for resolution
Production-ready patterns:
- Structured error information for debugging
- Retry logic based on error properties
- Rate limiting information in errors
- Timestamps for error correlation
- Business context in errors
- HTTP status code mapping for web services
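"Retry logic based on error properties" can be decoupled from any one concrete type by asking for behavior instead. The sketch below is one possible approach, not the only one; the retryable interface, the two small error types, and shouldRetry are hypothetical names.

```go
package main

import (
    "errors"
    "fmt"
)

// retryable is a hypothetical behavior interface. Any error type in the
// chain that implements it gets a say in the retry decision.
type retryable interface {
    IsRetryable() bool
}

type rateLimitError struct{}

func (rateLimitError) Error() string     { return "rate limit exceeded" }
func (rateLimitError) IsRetryable() bool { return true }

type validationError struct{}

func (validationError) Error() string     { return "validation failed" }
func (validationError) IsRetryable() bool { return false }

// errUnknown stands in for an error that carries no retry information.
var errUnknown = errors.New("unknown failure")

// withContext wraps err the way an intermediate layer would.
func withContext(err error) error {
    return fmt.Errorf("handler: %w", err)
}

// shouldRetry finds the first error in the chain that implements retryable.
// Errors with no opinion are treated as non-retryable, a conservative
// default chosen for this sketch.
func shouldRetry(err error) bool {
    var r retryable
    if errors.As(err, &r) {
        return r.IsRetryable()
    }
    return false
}

func main() {
    fmt.Println(shouldRetry(withContext(rateLimitError{})))
    fmt.Println(shouldRetry(withContext(validationError{})))
    fmt.Println(shouldRetry(withContext(errUnknown)))
}
```

Because errors.As accepts a pointer to an interface type, the caller never needs to import or name the concrete error type.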
Example 3: Error Wrapping and Inspection
Error wrapping preserves context while maintaining access to underlying errors:
// run
package main

import (
    "errors"
    "fmt"
    "os"
)

// Sentinel errors
var (
    ErrDatabase     = errors.New("database error")
    ErrNotFound     = errors.New("resource not found")
    ErrUnauthorized = errors.New("unauthorized access")
    ErrInvalidInput = errors.New("invalid input")
)

// Simulated database layer
type Database struct {
    data map[string]string
}

func NewDatabase() *Database {
    return &Database{
        data: map[string]string{
            "user:1": "Alice",
            "user:2": "Bob",
        },
    }
}

func (db *Database) Get(key string) (string, error) {
    value, exists := db.data[key]
    if !exists {
        return "", fmt.Errorf("key %s: %w", key, ErrNotFound)
    }
    return value, nil
}

// Repository layer (wraps database)
type UserRepository struct {
    db *Database
}

func NewUserRepository(db *Database) *UserRepository {
    return &UserRepository{db: db}
}

func (r *UserRepository) FindByID(id string) (string, error) {
    key := fmt.Sprintf("user:%s", id)
    name, err := r.db.Get(key)
    if err != nil {
        return "", fmt.Errorf("repository: failed to find user %s: %w", id, err)
    }
    return name, nil
}

// Service layer (wraps repository)
type UserService struct {
    repo *UserRepository
}

func NewUserService(repo *UserRepository) *UserService {
    return &UserService{repo: repo}
}

func (s *UserService) GetUser(id string) (string, error) {
    if id == "" {
        return "", fmt.Errorf("service: %w", ErrInvalidInput)
    }

    name, err := s.repo.FindByID(id)
    if err != nil {
        return "", fmt.Errorf("service: failed to get user: %w", err)
    }

    return name, nil
}

// Error inspection and handling
func handleError(err error) {
    fmt.Printf("\n=== Error Analysis ===\n")
    fmt.Printf("Full error message: %v\n\n", err)

    // Check for specific sentinel errors
    if errors.Is(err, ErrNotFound) {
        fmt.Println("✓ Error is or wraps ErrNotFound")
        fmt.Println("  Action: Could return 404 to client")
    }

    if errors.Is(err, ErrInvalidInput) {
        fmt.Println("✓ Error is or wraps ErrInvalidInput")
        fmt.Println("  Action: Could return 400 to client")
    }

    if errors.Is(err, ErrUnauthorized) {
        fmt.Println("✓ Error is or wraps ErrUnauthorized")
        fmt.Println("  Action: Could return 401 to client")
    }

    if errors.Is(err, os.ErrNotExist) {
        fmt.Println("✓ Error is or wraps os.ErrNotExist")
        fmt.Println("  Action: File system issue")
    }

    // Unwrap the error chain manually
    fmt.Println("\nError chain:")
    currentErr := err
    depth := 0
    for currentErr != nil {
        fmt.Printf("  %d: %v\n", depth, currentErr)
        currentErr = errors.Unwrap(currentErr)
        depth++
    }
}

func main() {
    db := NewDatabase()
    repo := NewUserRepository(db)
    service := NewUserService(repo)

    fmt.Println("=== Error Wrapping and Inspection Demo ===")

    // Test 1: Successful retrieval
    fmt.Println("\n--- Test 1: Successful retrieval ---")
    name, err := service.GetUser("1")
    if err != nil {
        handleError(err)
    } else {
        fmt.Printf("Success: Found user: %s\n", name)
    }

    // Test 2: User not found (wrapped through multiple layers)
    fmt.Println("\n--- Test 2: User not found ---")
    name, err = service.GetUser("999")
    if err != nil {
        handleError(err)
    }

    // Test 3: Invalid input
    fmt.Println("\n--- Test 3: Invalid input ---")
    name, err = service.GetUser("")
    if err != nil {
        handleError(err)
    }

    // Demonstrate error wrapping depth
    fmt.Println("\n--- Test 4: Multiple wrapping layers ---")

    // Create deeply wrapped error
    baseErr := errors.New("network timeout")
    layer1 := fmt.Errorf("connection failed: %w", baseErr)
    layer2 := fmt.Errorf("database query failed: %w", layer1)
    layer3 := fmt.Errorf("user lookup failed: %w", layer2)

    fmt.Println("Deeply wrapped error:")
    handleError(layer3)
}
What this demonstrates:
- Error wrapping through multiple application layers
- Context preservation from database → repository → service
- Error inspection using errors.Is() for sentinel errors
- Error unwrapping to traverse the error chain
- Layered architecture with appropriate error handling at each level
- Actionable error handling based on error type inspection
Key concepts:
- Each layer adds its own context to errors
- Original error remains accessible through the chain
- errors.Is() works through wrapped errors
- Error messages build a complete story of what failed
- Different layers can make different decisions based on error types
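At the outermost layer, for example an HTTP handler, the sentinel checks usually collapse into a single mapping function. The following is a minimal sketch under that assumption; statusFor and lookup are hypothetical, and the sentinel errors are redeclared here only to keep the snippet self-contained.

```go
package main

import (
    "errors"
    "fmt"
    "net/http"
)

// Sentinel errors, redeclared so this sketch compiles on its own.
var (
    ErrNotFound     = errors.New("resource not found")
    ErrUnauthorized = errors.New("unauthorized access")
    ErrInvalidInput = errors.New("invalid input")
)

// statusFor maps an error chain to an HTTP status code. errors.Is sees
// through every %w layer, so the mapping works no matter how deep the
// sentinel is buried.
func statusFor(err error) int {
    switch {
    case err == nil:
        return http.StatusOK
    case errors.Is(err, ErrInvalidInput):
        return http.StatusBadRequest
    case errors.Is(err, ErrUnauthorized):
        return http.StatusUnauthorized
    case errors.Is(err, ErrNotFound):
        return http.StatusNotFound
    default:
        return http.StatusInternalServerError
    }
}

// lookup is a hypothetical stand-in for the service/repository/database
// stack: it wraps the sentinel twice before returning it.
func lookup(id string) error {
    if id == "" {
        return fmt.Errorf("service: %w", ErrInvalidInput)
    }
    if id != "1" {
        repoErr := fmt.Errorf("repository: user %s: %w", id, ErrNotFound)
        return fmt.Errorf("service: failed to get user: %w", repoErr)
    }
    return nil
}

func main() {
    for _, id := range []string{"1", "999", ""} {
        err := lookup(id)
        fmt.Printf("id=%q status=%d err=%v\n", id, statusFor(err), err)
    }
}
```

Keeping the mapping in one place means inner layers never need to know about HTTP, and adding a new sentinel is a one-line change.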
Example 4: Advanced Error Handling with Retry Logic
Let's implement production-ready error handling with retry patterns:
// run
package main

import (
    "context"
    "errors"
    "fmt"
    "math/rand"
    "time"
)

// Error types for different failure scenarios
type RetryableError struct {
    Attempt   int
    Cause     error
    Timestamp time.Time
}

func (e *RetryableError) Error() string {
    return fmt.Sprintf("attempt %d failed at %v: %v",
        e.Attempt, e.Timestamp.Format("15:04:05"), e.Cause)
}

func (e *RetryableError) Unwrap() error {
    return e.Cause
}

type TemporaryError struct {
    Reason string
}

func (e *TemporaryError) Error() string {
    return fmt.Sprintf("temporary failure: %s", e.Reason)
}

func (e *TemporaryError) Temporary() bool {
    return true
}

type PermanentError struct {
    Reason string
}

func (e *PermanentError) Error() string {
    return fmt.Sprintf("permanent failure: %s", e.Reason)
}

// Retry configuration
type RetryConfig struct {
    MaxAttempts   int
    InitialDelay  time.Duration
    MaxDelay      time.Duration
    BackoffFactor float64
    Timeout       time.Duration
}

func DefaultRetryConfig() RetryConfig {
    return RetryConfig{
        MaxAttempts:   3,
        InitialDelay:  100 * time.Millisecond,
        MaxDelay:      10 * time.Second,
        BackoffFactor: 2.0,
        Timeout:       30 * time.Second,
    }
}

// Check if error is retryable
func isRetryable(err error) bool {
    // Check for temporary interface
    type temporary interface {
        Temporary() bool
    }

    var tempErr temporary
    if errors.As(err, &tempErr) {
        return tempErr.Temporary()
    }

    // Check for specific retryable error types
    var retryErr *RetryableError
    if errors.As(err, &retryErr) {
        return true
    }

    var tempError *TemporaryError
    if errors.As(err, &tempError) {
        return true
    }

    // Check for permanent errors
    var permErr *PermanentError
    if errors.As(err, &permErr) {
        return false
    }

    // Default: assume retryable
    return true
}

// Retry with exponential backoff
func RetryWithBackoff(ctx context.Context, operation func() error, config RetryConfig) error {
    var lastErr error
    delay := config.InitialDelay

    // Create timeout context
    timeoutCtx, cancel := context.WithTimeout(ctx, config.Timeout)
    defer cancel()

    for attempt := 1; attempt <= config.MaxAttempts; attempt++ {
        // Check context cancellation
        select {
        case <-timeoutCtx.Done():
            return fmt.Errorf("operation timeout after %d attempts: %w", attempt-1, timeoutCtx.Err())
        default:
        }

        // Execute operation
        lastErr = operation()

        if lastErr == nil {
            if attempt > 1 {
                fmt.Printf("✓ Operation succeeded on attempt %d\n", attempt)
            }
            return nil
        }

        // Check if error is retryable
        if !isRetryable(lastErr) {
            fmt.Printf("✗ Non-retryable error on attempt %d: %v\n", attempt, lastErr)
            return &RetryableError{
                Attempt:   attempt,
                Cause:     lastErr,
                Timestamp: time.Now(),
            }
        }

        // Don't sleep after last attempt
        if attempt == config.MaxAttempts {
            break
        }

        // Log retry
        fmt.Printf("⚠ Attempt %d/%d failed, retrying in %v: %v\n",
            attempt, config.MaxAttempts, delay, lastErr)

        // Wait with exponential backoff
        select {
        case <-time.After(delay):
            // Continue to next attempt
        case <-timeoutCtx.Done():
            return fmt.Errorf("timeout during backoff: %w", timeoutCtx.Err())
        }

        // Calculate next delay
        delay = time.Duration(float64(delay) * config.BackoffFactor)
        if delay > config.MaxDelay {
            delay = config.MaxDelay
        }
    }

    return &RetryableError{
        Attempt:   config.MaxAttempts,
        Cause:     lastErr,
        Timestamp: time.Now(),
    }
}

// Simulated operations with different failure patterns
type Service struct {
    failureRate float64
}

func NewService(failureRate float64) *Service {
    return &Service{failureRate: failureRate}
}

func (s *Service) TemporaryFailure() error {
    if rand.Float64() < s.failureRate {
        return &TemporaryError{Reason: "network timeout"}
    }
    return nil
}

func (s *Service) PermanentFailure() error {
    if rand.Float64() < s.failureRate {
        return &PermanentError{Reason: "invalid API key"}
    }
    return nil
}

func (s *Service) RandomFailure() error {
    r := rand.Float64()
    if r < s.failureRate/2 {
        return &TemporaryError{Reason: "connection reset"}
    } else if r < s.failureRate {
        return errors.New("unknown error")
    }
    return nil
}

func main() {
    // rand.Seed is deprecated since Go 1.20, where the global generator is
    // seeded automatically; the call is kept for older toolchains.
    rand.Seed(time.Now().UnixNano())
    ctx := context.Background()

    fmt.Println("=== Advanced Error Handling with Retry ===")
    fmt.Println()

    // Example 1: Temporary failures with retry
    fmt.Println("--- Example 1: Temporary Failures (60% failure rate) ---")
    service1 := NewService(0.6)
    config := DefaultRetryConfig()

    err := RetryWithBackoff(ctx, service1.TemporaryFailure, config)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)
    } else {
        fmt.Println("Operation succeeded!")
    }

    time.Sleep(time.Second)

    // Example 2: Permanent failure (no retry)
    fmt.Println("\n--- Example 2: Permanent Failure ---")
    service2 := NewService(1.0) // Always fail

    err = RetryWithBackoff(ctx, service2.PermanentFailure, config)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)

        // Check error type
        var retryErr *RetryableError
        if errors.As(err, &retryErr) {
            fmt.Printf("Failed after %d attempts\n", retryErr.Attempt)
        }
    }

    time.Sleep(time.Second)

    // Example 3: Timeout scenario
    fmt.Println("\n--- Example 3: Operation Timeout ---")
    shortConfig := config
    shortConfig.Timeout = 500 * time.Millisecond
    shortConfig.InitialDelay = 200 * time.Millisecond

    service3 := NewService(1.0) // Always fail
    err = RetryWithBackoff(ctx, service3.TemporaryFailure, shortConfig)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)
    }

    time.Sleep(time.Second)

    // Example 4: Success after retries
    fmt.Println("\n--- Example 4: Success After Retries (30% failure rate) ---")
    service4 := NewService(0.3)

    err = RetryWithBackoff(ctx, service4.RandomFailure, config)
    if err != nil {
        fmt.Printf("Final error: %v\n", err)
    } else {
        fmt.Println("Operation succeeded!")
    }

    // Example 5: Exponential backoff demonstration
    fmt.Println("\n--- Example 5: Exponential Backoff Calculation ---")
    fmt.Println("Demonstrating backoff timing:")

    delay := config.InitialDelay
    for i := 1; i <= 5; i++ {
        nextDelay := time.Duration(float64(delay) * config.BackoffFactor)
        if nextDelay > config.MaxDelay {
            nextDelay = config.MaxDelay
        }
        fmt.Printf("Attempt %d: delay = %v, next = %v\n", i, delay, nextDelay)
        delay = nextDelay
    }
}
What this demonstrates:
- Sophisticated retry logic with exponential backoff
- Error classification for retryable vs non-retryable errors
- Context integration for timeouts and cancellation
- Temporary error detection using custom interfaces
- Backoff calculations to prevent overwhelming services
- Detailed logging of retry attempts and outcomes
Production patterns:
- Exponential backoff prevents service overload
- Context-aware operations respect timeouts
- Error classification guides retry decisions
- Comprehensive logging for debugging
- Configurable retry parameters
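The backoff arithmetic is easier to reason about, and to unit test, when it is pulled out into a pure function. This is a small sketch of that idea; backoffSchedule is a hypothetical helper that applies the same multiply-then-cap rule as the retry loop above.

```go
package main

import (
    "fmt"
    "time"
)

// backoffSchedule returns the delay to wait before each of the given number
// of retries: start at initial, multiply by factor each time, and never
// exceed max.
func backoffSchedule(initial, max time.Duration, factor float64, retries int) []time.Duration {
    delays := make([]time.Duration, 0, retries)
    delay := initial
    for i := 0; i < retries; i++ {
        delays = append(delays, delay)
        delay = time.Duration(float64(delay) * factor)
        if delay > max {
            delay = max
        }
    }
    return delays
}

func main() {
    schedule := backoffSchedule(100*time.Millisecond, time.Second, 2.0, 6)
    for i, d := range schedule {
        fmt.Printf("retry %d: wait %v\n", i+1, d)
    }
}
```

With an initial delay of 100ms, a factor of 2, and a 1s cap, the delays double until they hit the cap and then stay there. Production systems often add random jitter on top of this so that many clients do not retry in lockstep; that is left out here to keep the function deterministic.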
Example 5: Production Error Handling System
Let's build a comprehensive error handling framework for production:
// run
package main

import (
    "context"
    "encoding/json"
    "fmt"
    "math/rand" // needed for the simulated failure in ProcessPayment
    "sync"
    "time"
)

// Error severity levels
type Severity int

const (
    SeverityDebug Severity = iota
    SeverityInfo
    SeverityWarning
    SeverityError
    SeverityCritical
)

func (s Severity) String() string {
    names := []string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
    // Guard against out-of-range values instead of panicking
    if s < 0 || int(s) >= len(names) {
        return "UNKNOWN"
    }
    return names[s]
}

// Structured error for production systems
type ProductionError struct {
    ID         string                 `json:"id"`
    Timestamp  time.Time              `json:"timestamp"`
    Severity   Severity               `json:"severity"`
    Service    string                 `json:"service"`
    Operation  string                 `json:"operation"`
    Message    string                 `json:"message"`
    Code       string                 `json:"code"`
    Context    map[string]interface{} `json:"context,omitempty"`
    Cause      error                  `json:"-"`
    UserID     string                 `json:"user_id,omitempty"`
    RequestID  string                 `json:"request_id,omitempty"`
    StackTrace []string               `json:"stack_trace,omitempty"`
}

func (e *ProductionError) Error() string {
    if e.Cause != nil {
        return fmt.Sprintf("[%s] %s: %v", e.Code, e.Message, e.Cause)
    }
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

func (e *ProductionError) Unwrap() error {
    return e.Cause
}

// Error tracking system
type ErrorTracker struct {
    errors  chan *ProductionError
    metrics map[string]int64
    mu      sync.RWMutex
}

func NewErrorTracker() *ErrorTracker {
    return &ErrorTracker{
        errors:  make(chan *ProductionError, 100),
        metrics: make(map[string]int64),
    }
}

func (et *ErrorTracker) Track(err *ProductionError) {
    select {
    case et.errors <- err:
    default:
        fmt.Printf("Warning: Error tracking queue full, dropping error: %v\n", err)
    }
}

func (et *ErrorTracker) Start(ctx context.Context, wg *sync.WaitGroup) {
    defer wg.Done()

    for {
        select {
        case <-ctx.Done():
            return
        case err := <-et.errors:
            et.processError(err)
        }
    }
}

func (et *ErrorTracker) processError(err *ProductionError) {
    // Update metrics
    et.mu.Lock()
    et.metrics[err.Code]++
    et.metrics["total"]++
    if err.Severity >= SeverityError {
        et.metrics["error_count"]++
    }
    et.mu.Unlock()

    // Log structured error
    et.logError(err)

    // Send critical alerts
    if err.Severity >= SeverityCritical {
        et.sendAlert(err)
    }
}

func (et *ErrorTracker) logError(err *ProductionError) {
    logEntry := map[string]interface{}{
        "timestamp": err.Timestamp.Format(time.RFC3339),
        "severity":  err.Severity.String(),
        "service":   err.Service,
        "operation": err.Operation,
        "message":   err.Message,
        "code":      err.Code,
        "error_id":  err.ID,
    }

    if len(err.Context) > 0 {
        logEntry["context"] = err.Context
    }

    if err.UserID != "" {
        logEntry["user_id"] = err.UserID
    }

    if err.RequestID != "" {
        logEntry["request_id"] = err.RequestID
    }

    jsonData, _ := json.Marshal(logEntry)
    fmt.Printf("LOG: %s\n", string(jsonData))
}

func (et *ErrorTracker) sendAlert(err *ProductionError) {
    fmt.Printf("ALERT: Critical error %s - %s\n", err.ID, err.Message)
    // In production: send to PagerDuty, Slack, email, etc.
}

func (et *ErrorTracker) GetMetrics() map[string]int64 {
    et.mu.RLock()
    defer et.mu.RUnlock()

    metrics := make(map[string]int64)
    for k, v := range et.metrics {
        metrics[k] = v
    }
    return metrics
}

// Service with integrated error handling
type PaymentService struct {
    tracker *ErrorTracker
    mu      sync.Mutex
}

func NewPaymentService(tracker *ErrorTracker) *PaymentService {
    return &PaymentService{
        tracker: tracker,
    }
}

func (ps *PaymentService) ProcessPayment(userID string, amount float64, requestID string) error {
    // Input validation
    if amount <= 0 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "invalid payment amount",
            Code:      "INVALID_AMOUNT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount": amount,
            },
        }
        ps.tracker.Track(err)
        return err
    }

    if amount > 10000 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityWarning,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "large payment requires additional verification",
            Code:      "LARGE_PAYMENT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount":    amount,
                "threshold": 10000,
            },
        }
        ps.tracker.Track(err)
        return err
    }

    // Simulate processing (20% failure rate)
    if rand.Intn(10) < 2 {
        err := &ProductionError{
            ID:        requestID,
            Timestamp: time.Now(),
            Severity:  SeverityError,
            Service:   "payment-service",
            Operation: "process-payment",
            Message:   "payment gateway timeout",
            Code:      "GATEWAY_TIMEOUT",
            UserID:    userID,
            RequestID: requestID,
            Context: map[string]interface{}{
                "amount":  amount,
                "gateway": "stripe",
            },
        }
        ps.tracker.Track(err)
        return err
    }

    fmt.Printf("✓ Payment processed: user=%s, amount=%.2f, request=%s\n",
        userID, amount, requestID)
    return nil
}

func generateRequestID() string {
    return fmt.Sprintf("req_%d", time.Now().UnixNano())
}

func main() {
235 rand.Seed(time.Now().UnixNano())
236
237 fmt.Println("=== Production Error Handling System ===\n")
238
239 tracker := NewErrorTracker()
240 ctx, cancel := context.WithCancel(context.Background())
241 defer cancel()
242
243 var wg sync.WaitGroup
244
245 // Start error tracker
246 wg.Add(1)
247 go tracker.Start(ctx, &wg)
248
249 // Give tracker time to start
250 time.Sleep(time.Millisecond * 100)
251
252 paymentService := NewPaymentService(tracker)
253
254 // Test scenarios
255 testCases := []struct {
256 userID string
257 amount float64
258 desc string
259 }{
260 {"user1", 100.0, "Valid payment"},
261 {"user2", -50.0, "Invalid amount (negative)"},
262 {"user3", 0.0, "Invalid amount (zero)"},
263 {"user4", 15000.0, "Large payment (requires verification)"},
264 {"user5", 200.0, "Valid payment (may fail randomly)"},
265 {"user6", 300.0, "Valid payment (may fail randomly)"},
266 {"user7", 150.0, "Valid payment (may fail randomly)"},
267 }
268
269 fmt.Println("Processing payments...\n")
270
271 for i, tc := range testCases {
272 requestID := generateRequestID()
273 fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
274 fmt.Printf("User: %s, Amount: $%.2f\n", tc.userID, tc.amount)
275
276 err := paymentService.ProcessPayment(tc.userID, tc.amount, requestID)
277 if err != nil {
278 fmt.Printf("Error: %v\n", err)
279 }
280
281 fmt.Println()
282 time.Sleep(time.Millisecond * 100)
283 }
284
285 // Let error tracker process remaining errors
286 time.Sleep(time.Second)
287
288 // Show error metrics
289 fmt.Println("=== Error Metrics ===")
290 metrics := tracker.GetMetrics()
291
292 jsonData, _ := json.MarshalIndent(metrics, "", " ")
293 fmt.Printf("%s\n", string(jsonData))
294
295 // Shutdown
296 cancel()
297
298 // Wait for error tracker to stop
299 done := make(chan struct{})
300 go func() {
301 wg.Wait()
302 close(done)
303 }()
304
305 select {
306 case <-done:
307 fmt.Println("\nError tracker stopped gracefully")
308 case <-time.After(2 * time.Second):
309 fmt.Println("\nError tracker shutdown timed out")
310 }
311}
What this demonstrates:
- Comprehensive error tracking with structured data
- Multi-level severity handling for appropriate alerting
- Metrics collection for error analysis and monitoring
- Context preservation across service boundaries
- Production-ready logging with structured JSON output
- Graceful shutdown for error tracking system
- Concurrent error processing with channels
Enterprise patterns implemented:
- Structured error IDs for correlation
- Severity-based routing and alerting
- Real-time metrics collection
- Context-rich error information
- Background error processing
- Integration hooks for external systems
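One detail worth noting about the graceful-shutdown claim: the Start loop returns as soon as the context is cancelled, so anything still sitting in the queue at that moment is dropped, and the demo relies on a time.Sleep to avoid that. The following is a minimal sketch of one way to drain the queue before exiting. It uses a simplified, hypothetical drainingTracker that carries plain strings instead of *ProductionError, so the names here are illustrative and not part of the tutorial's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// drainingTracker is a hypothetical, simplified stand-in for ErrorTracker.
type drainingTracker struct {
	events    chan string
	mu        sync.Mutex
	processed []string
}

func newDrainingTracker(size int) *drainingTracker {
	return &drainingTracker{events: make(chan string, size)}
}

func (t *drainingTracker) handle(e string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.processed = append(t.processed, e)
}

// run processes events until ctx is cancelled, then drains whatever is
// still buffered in the channel before returning.
func (t *drainingTracker) run(ctx context.Context, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case e := <-t.events:
			t.handle(e)
		case <-ctx.Done():
			for {
				select {
				case e := <-t.events:
					t.handle(e)
				default:
					return // queue is empty
				}
			}
		}
	}
}

// trackAndShutdown queues n events, cancels immediately, and reports how
// many events were still processed.
func trackAndShutdown(n int) int {
	t := newDrainingTracker(n)
	for i := 0; i < n; i++ {
		t.events <- fmt.Sprintf("event-%d", i)
	}

	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel before the worker even starts

	var wg sync.WaitGroup
	wg.Add(1)
	go t.run(ctx, &wg)
	wg.Wait()

	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.processed)
}

func main() {
	fmt.Println("processed:", trackAndShutdown(5)) // processed: 5
}
```

This only helps if producers stop calling Track before the context is cancelled; events queued after the drain loop exits are still lost.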
Common Pitfalls and How to Avoid Them
Pitfall 1: Silent Failures
The most dangerous error handling mistake:
// run
package main

import (
	"fmt"
	"os"
)

// ❌ WRONG: Silent failures that hide problems
func badFileRead(filename string) string {
	data, _ := os.ReadFile(filename) // Error ignored!
	return string(data)
}

// ✅ CORRECT: Proper error handling
func goodFileRead(filename string) (string, error) {
	data, err := os.ReadFile(filename)
	if err != nil {
		return "", fmt.Errorf("failed to read file %s: %w", filename, err)
	}
	return string(data), nil
}

func main() {
	fmt.Println("=== Silent Failure Pitfall ===\n")

	// Bad example
	fmt.Println("--- Bad Example (error ignored) ---")
	result1 := badFileRead("nonexistent.txt")
	fmt.Printf("Result: '%s' (empty because error was ignored)\n", result1)

	// Good example
	fmt.Println("\n--- Good Example (error handled) ---")
	result2, err := goodFileRead("nonexistent.txt")
	if err != nil {
		fmt.Printf("Error: %v\n", err)
	} else {
		fmt.Printf("Result: %s\n", result2)
	}
}
Pitfall 2: Losing Context
// run
package main

import (
	"errors"
	"fmt"
)

// ❌ WRONG: Context lost in error chain
func badProcessing(input string) error {
	if input == "" {
		return errors.New("invalid input") // What input? Where?
	}

	if len(input) > 1000 {
		return errors.New("input too long") // How long?
	}

	return nil
}

// ✅ CORRECT: Preserve context throughout error chain
func goodProcessing(operation string, input string) error {
	if input == "" {
		return fmt.Errorf("%s failed: input cannot be empty", operation)
	}

	if len(input) > 1000 {
		return fmt.Errorf("%s failed: input too long (%d chars, max 1000)",
			operation, len(input))
	}

	return nil
}

func main() {
	fmt.Println("=== Context Loss Pitfall ===\n")

	// Bad example
	fmt.Println("--- Bad Example (no context) ---")
	err1 := badProcessing("")
	fmt.Printf("Error: %v (no context about what or where)\n", err1)

	// Good example
	fmt.Println("\n--- Good Example (with context) ---")
	err2 := goodProcessing("user validation", "")
	fmt.Printf("Error: %v (clear context)\n", err2)

	err3 := goodProcessing("data import", string(make([]byte, 2000)))
	fmt.Printf("Error: %v (includes details)\n", err3)
}
Pitfall 3: Panic for Recoverable Errors
// run
package main

import (
	"fmt"
)

// ❌ WRONG: Panic for recoverable errors
func badDivision(a, b float64) float64 {
	if b == 0 {
		panic("division by zero") // Should return error!
	}
	return a / b
}

// ✅ CORRECT: Return errors for recoverable conditions
func goodDivision(a, b float64) (float64, error) {
	if b == 0 {
		return 0, fmt.Errorf("division by zero: %.2f / %.2f", a, b)
	}
	return a / b, nil
}

func main() {
	fmt.Println("=== Panic vs Error Pitfall ===\n")

	// Bad example (wrapped in recover to prevent crash)
	fmt.Println("--- Bad Example (panics) ---")
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Printf("Recovered from panic: %v\n", r)
				fmt.Println("(This crashes the program without recover)")
			}
		}()
		result := badDivision(10, 0)
		fmt.Printf("Result: %.2f\n", result)
	}()

	// Good example
	fmt.Println("\n--- Good Example (returns error) ---")
	result, err := goodDivision(10, 0)
	if err != nil {
		fmt.Printf("Error: %v (handled gracefully)\n", err)
	} else {
		fmt.Printf("Result: %.2f\n", result)
	}

	// Successful operation
	result, err = goodDivision(10, 2)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
	} else {
		fmt.Printf("Success: 10 / 2 = %.2f\n", result)
	}
}
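When the panicking function cannot be changed (third-party or legacy code, for instance), one common approach is to convert the panic into an error at the boundary. The sketch below assumes a hypothetical safeCall wrapper and a legacyDivide function standing in for badDivision; it relies on a named return value so the deferred function can set the error.

```go
package main

import "fmt"

// safeCall is a hypothetical boundary helper: it runs fn and turns a
// panic into an ordinary error using a named return value.
func safeCall(fn func() float64) (result float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	return fn(), nil
}

// legacyDivide stands in for code that panics and cannot be modified.
func legacyDivide(a, b float64) float64 {
	if b == 0 {
		panic("division by zero")
	}
	return a / b
}

func main() {
	res, err := safeCall(func() float64 { return legacyDivide(10, 0) })
	fmt.Println(res, err) // 0 recovered panic: division by zero

	res, err = safeCall(func() float64 { return legacyDivide(10, 2) })
	fmt.Println(res, err) // 5 <nil>
}
```

This is a containment measure, not a substitute for returning errors in code you control.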
Practice Exercises
Exercise 1: Basic Error Handling
Learning Objectives: Master fundamental error handling patterns including error creation, checking, and context addition.
Difficulty: Beginner
Real-World Context: Input validation is critical in all applications. This exercise teaches you to provide clear, actionable error messages that help users correct their input.
Task: Create a user registration function that validates:
- Username (not empty, 3-20 characters, alphanumeric)
- Email (not empty, contains @ and .)
- Age (not empty, valid number, 18-100)
- Return descriptive errors for each validation failure
Requirements:
- Create separate validation functions for each field
- Use error wrapping to preserve context
- Provide clear error messages
- Test with valid and invalid inputs
Solution:
// run
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

var (
	ErrInvalidUsername = errors.New("invalid username")
	ErrInvalidEmail    = errors.New("invalid email")
	ErrInvalidAge      = errors.New("invalid age")
)

func validateUsername(username string) error {
	if username == "" {
		return fmt.Errorf("%w: cannot be empty", ErrInvalidUsername)
	}

	if len(username) < 3 || len(username) > 20 {
		return fmt.Errorf("%w: must be 3-20 characters (got %d)",
			ErrInvalidUsername, len(username))
	}

	matched, _ := regexp.MatchString("^[a-zA-Z0-9]+$", username)
	if !matched {
		return fmt.Errorf("%w: must be alphanumeric only", ErrInvalidUsername)
	}

	return nil
}

func validateEmail(email string) error {
	if email == "" {
		return fmt.Errorf("%w: cannot be empty", ErrInvalidEmail)
	}

	if !strings.Contains(email, "@") || !strings.Contains(email, ".") {
		return fmt.Errorf("%w: must contain @ and .", ErrInvalidEmail)
	}

	parts := strings.Split(email, "@")
	if len(parts) != 2 {
		return fmt.Errorf("%w: invalid format", ErrInvalidEmail)
	}

	if len(parts[0]) == 0 || len(parts[1]) == 0 {
		return fmt.Errorf("%w: missing local or domain part", ErrInvalidEmail)
	}

	return nil
}

func validateAge(ageStr string) (int, error) {
	if ageStr == "" {
		return 0, fmt.Errorf("%w: cannot be empty", ErrInvalidAge)
	}

	age, err := strconv.Atoi(strings.TrimSpace(ageStr))
	if err != nil {
		return 0, fmt.Errorf("%w: must be a valid number: %v", ErrInvalidAge, err)
	}

	if age < 18 {
		return 0, fmt.Errorf("%w: must be at least 18 (got %d)", ErrInvalidAge, age)
	}

	if age > 100 {
		return 0, fmt.Errorf("%w: must be at most 100 (got %d)", ErrInvalidAge, age)
	}

	return age, nil
}

func registerUser(username, email, ageStr string) error {
	// Validate username
	if err := validateUsername(username); err != nil {
		return fmt.Errorf("registration failed: %w", err)
	}

	// Validate email
	if err := validateEmail(email); err != nil {
		return fmt.Errorf("registration failed: %w", err)
	}

	// Validate age
	age, err := validateAge(ageStr)
	if err != nil {
		return fmt.Errorf("registration failed: %w", err)
	}

	fmt.Printf("✓ Successfully registered: %s (%s), age %d\n", username, email, age)
	return nil
}

func main() {
	fmt.Println("=== User Registration Validation ===\n")

	testCases := []struct {
		username string
		email    string
		age      string
		desc     string
	}{
		{"alice", "alice@example.com", "25", "Valid user"},
		{"", "bob@example.com", "30", "Empty username"},
		{"ab", "charlie@example.com", "22", "Username too short"},
		{"verylongusernamethatexceedslimit", "diana@example.com", "28", "Username too long"},
		{"user@123", "eve@example.com", "35", "Username with special chars"},
		{"frank", "", "40", "Empty email"},
		{"grace", "invalidemail", "45", "Invalid email format"},
		{"henry", "henry@example.com", "", "Empty age"},
		{"iris", "iris@example.com", "invalid", "Invalid age format"},
		{"jack", "jack@example.com", "15", "Age too young"},
		{"kate", "kate@example.com", "150", "Age too old"},
		{"leo", "leo@example.com", "30", "Valid user"},
	}

	for i, tc := range testCases {
		fmt.Printf("--- Test %d: %s ---\n", i+1, tc.desc)
		fmt.Printf("Input: username=%s, email=%s, age=%s\n", tc.username, tc.email, tc.age)

		err := registerUser(tc.username, tc.email, tc.age)
		if err != nil {
			fmt.Printf("✗ Error: %v\n", err)

			// Check error types
			if errors.Is(err, ErrInvalidUsername) {
				fmt.Println(" Type: Username validation error")
			} else if errors.Is(err, ErrInvalidEmail) {
				fmt.Println(" Type: Email validation error")
			} else if errors.Is(err, ErrInvalidAge) {
				fmt.Println(" Type: Age validation error")
			}
		}

		fmt.Println()
	}
}
Key Concepts:
- Sentinel errors for error type checking
- Error wrapping with %w
- Clear, actionable error messages
- Error inspection with errors.Is()
- Input validation patterns
Exercise 2: Custom Error Types
Learning Objectives: Create custom error types with rich context and implement error type assertions.
Difficulty: Intermediate
Real-World Context: APIs need to return structured errors with HTTP status codes, error codes, and detailed information. This exercise demonstrates building production-ready error types.
Task: Build an API error system that:
- Defines custom error types for different HTTP status codes
- Includes error codes, messages, and status codes
- Implements retryability checking
- Provides structured error information
Requirements:
- Create at least 4 different error types (400, 401, 404, 500)
- Include methods for accessing error properties
- Implement error wrapping
- Add detailed context to errors
Solution:
// run
package main

import (
	"errors"
	"fmt"
	"time"
)

// Base API error type
type APIError struct {
	StatusCode int
	Code       string
	Message    string
	Retryable  bool
	Timestamp  time.Time
	Details    map[string]interface{}
}

func (e *APIError) Error() string {
	return fmt.Sprintf("[HTTP %d] %s: %s", e.StatusCode, e.Code, e.Message)
}

// Specific error types
type BadRequestError struct {
	*APIError
}

func NewBadRequestError(code, message string, details map[string]interface{}) *BadRequestError {
	return &BadRequestError{
		APIError: &APIError{
			StatusCode: 400,
			Code:       code,
			Message:    message,
			Retryable:  false,
			Timestamp:  time.Now(),
			Details:    details,
		},
	}
}

type UnauthorizedError struct {
	*APIError
}

func NewUnauthorizedError(message string) *UnauthorizedError {
	return &UnauthorizedError{
		APIError: &APIError{
			StatusCode: 401,
			Code:       "UNAUTHORIZED",
			Message:    message,
			Retryable:  false,
			Timestamp:  time.Now(),
		},
	}
}

type NotFoundError struct {
	*APIError
	ResourceType string
	ResourceID   string
}

func NewNotFoundError(resourceType, resourceID string) *NotFoundError {
	return &NotFoundError{
		APIError: &APIError{
			StatusCode: 404,
			Code:       "NOT_FOUND",
			Message:    fmt.Sprintf("%s not found: %s", resourceType, resourceID),
			Retryable:  false,
			Timestamp:  time.Now(),
			Details: map[string]interface{}{
				"resource_type": resourceType,
				"resource_id":   resourceID,
			},
		},
		ResourceType: resourceType,
		ResourceID:   resourceID,
	}
}

type InternalServerError struct {
	*APIError
	Cause error
}

func NewInternalServerError(message string, cause error) *InternalServerError {
	return &InternalServerError{
		APIError: &APIError{
			StatusCode: 500,
			Code:       "INTERNAL_ERROR",
			Message:    message,
			Retryable:  true,
			Timestamp:  time.Now(),
		},
		Cause: cause,
	}
}

func (e *InternalServerError) Unwrap() error {
	return e.Cause
}

// API service
type UserAPI struct {
	users map[string]string
}

func NewUserAPI() *UserAPI {
	return &UserAPI{
		users: map[string]string{
			"1": "Alice",
			"2": "Bob",
		},
	}
}

func (api *UserAPI) GetUser(id, token string) (string, error) {
	// Check authentication
	if token == "" {
		return "", NewUnauthorizedError("missing authentication token")
	}

	if token != "valid-token" {
		return "", NewUnauthorizedError("invalid authentication token")
	}

	// Validate input
	if id == "" {
		return "", NewBadRequestError(
			"INVALID_ID",
			"user ID cannot be empty",
			map[string]interface{}{
				"field": "id",
				"value": id,
			},
		)
	}

	// Simulate internal error. This check must run before the lookup:
	// ID "999" is not in the map, so it would otherwise be reported as
	// not found and the internal error path would never be exercised.
	if id == "999" {
		return "", NewInternalServerError(
			"database connection failed",
			errors.New("connection timeout"),
		)
	}

	// Check if user exists
	name, exists := api.users[id]
	if !exists {
		return "", NewNotFoundError("User", id)
	}

	return name, nil
}

func handleAPIError(err error) {
	fmt.Println("\n=== Error Details ===")

	// Try specific error types
	var badReq *BadRequestError
	var unauth *UnauthorizedError
	var notFound *NotFoundError
	var internal *InternalServerError

	switch {
	case errors.As(err, &badReq):
		fmt.Printf("Type: Bad Request\n")
		fmt.Printf("Status: %d\n", badReq.StatusCode)
		fmt.Printf("Code: %s\n", badReq.Code)
		fmt.Printf("Message: %s\n", badReq.Message)
		fmt.Printf("Retryable: %v\n", badReq.Retryable)
		if len(badReq.Details) > 0 {
			fmt.Printf("Details: %v\n", badReq.Details)
		}

	case errors.As(err, &unauth):
		fmt.Printf("Type: Unauthorized\n")
		fmt.Printf("Status: %d\n", unauth.StatusCode)
		fmt.Printf("Message: %s\n", unauth.Message)
		fmt.Println("Action: Provide valid authentication")

	case errors.As(err, &notFound):
		fmt.Printf("Type: Not Found\n")
		fmt.Printf("Status: %d\n", notFound.StatusCode)
		fmt.Printf("Resource: %s (ID: %s)\n", notFound.ResourceType, notFound.ResourceID)
		fmt.Println("Action: Check resource identifier")

	case errors.As(err, &internal):
		fmt.Printf("Type: Internal Server Error\n")
		fmt.Printf("Status: %d\n", internal.StatusCode)
		fmt.Printf("Message: %s\n", internal.Message)
		fmt.Printf("Retryable: %v\n", internal.Retryable)
		if internal.Cause != nil {
			fmt.Printf("Cause: %v\n", internal.Cause)
		}
		fmt.Println("Action: Retry operation")

	default:
		fmt.Printf("Unknown error: %v\n", err)
	}
}

func main() {
	fmt.Println("=== API Error Types Demo ===")

	api := NewUserAPI()

	testCases := []struct {
		id    string
		token string
		desc  string
	}{
		{"1", "valid-token", "Valid request"},
		{"1", "", "Missing token"},
		{"1", "invalid", "Invalid token"},
		{"", "valid-token", "Empty ID"},
		{"999", "valid-token", "Internal error"},
		{"nonexistent", "valid-token", "User not found"},
	}

	for i, tc := range testCases {
		fmt.Printf("\n--- Test %d: %s ---\n", i+1, tc.desc)
		fmt.Printf("Request: GET /users/%s (token: %s)\n", tc.id, tc.token)

		name, err := api.GetUser(tc.id, tc.token)
		if err != nil {
			handleAPIError(err)
		} else {
			fmt.Printf("\n✓ Success: User found: %s\n", name)
		}
	}
}
Key Concepts:
- Custom error types with embedded base type
- HTTP status code mapping
- Error type assertions with errors.As()
- Retryability flags for error handling
- Structured error details
Exercise 3: Error Wrapping Chain
Learning Objectives: Master error wrapping through multiple application layers and implement error chain inspection.
Difficulty: Intermediate
Real-World Context: Multi-tier applications need to preserve context as errors propagate from database → repository → service → controller. This exercise demonstrates layered error handling.
Task: Build a 3-layer application (database, repository, service) where:
- Each layer wraps errors with its own context
- Sentinel errors are defined at the database layer
- Error inspection works through the entire chain
- Error messages build a complete story
Requirements:
- Implement 3 distinct layers
- Use error wrapping with %w
- Create sentinel errors for common cases
- Test error chain inspection
Solution:
// run
package main

import (
	"errors"
	"fmt"
)

// Sentinel errors at database layer
var (
	ErrNotFound     = errors.New("record not found")
	ErrDuplicateKey = errors.New("duplicate key")
	ErrConnection   = errors.New("database connection failed")
)

// Database layer
type Database struct {
	records map[string]string
}

func NewDatabase() *Database {
	return &Database{
		records: map[string]string{
			"1": "Alice",
			"2": "Bob",
		},
	}
}

func (db *Database) Get(id string) (string, error) {
	record, exists := db.records[id]
	if !exists {
		return "", fmt.Errorf("database: %w (id: %s)", ErrNotFound, id)
	}
	return record, nil
}

func (db *Database) Insert(id, name string) error {
	if _, exists := db.records[id]; exists {
		return fmt.Errorf("database: %w (id: %s)", ErrDuplicateKey, id)
	}
	db.records[id] = name
	return nil
}

// Repository layer
type UserRepository struct {
	db *Database
}

func NewUserRepository(db *Database) *UserRepository {
	return &UserRepository{db: db}
}

func (r *UserRepository) FindByID(id string) (string, error) {
	name, err := r.db.Get(id)
	if err != nil {
		return "", fmt.Errorf("repository: failed to find user %s: %w", id, err)
	}
	return name, nil
}

func (r *UserRepository) Create(id, name string) error {
	if err := r.db.Insert(id, name); err != nil {
		return fmt.Errorf("repository: failed to create user %s: %w", id, err)
	}
	return nil
}

// Service layer
type UserService struct {
	repo *UserRepository
}

func NewUserService(repo *UserRepository) *UserService {
	return &UserService{repo: repo}
}

func (s *UserService) GetUser(id string) (string, error) {
	if id == "" {
		return "", fmt.Errorf("service: invalid user ID")
	}

	name, err := s.repo.FindByID(id)
	if err != nil {
		return "", fmt.Errorf("service: failed to get user: %w", err)
	}

	return name, nil
}

func (s *UserService) CreateUser(id, name string) error {
	if id == "" || name == "" {
		return fmt.Errorf("service: invalid input (id: %s, name: %s)", id, name)
	}

	if err := s.repo.Create(id, name); err != nil {
		return fmt.Errorf("service: failed to create user: %w", err)
	}

	return nil
}

// Error analysis
func analyzeError(err error) {
	fmt.Println("\n=== Error Analysis ===")
	fmt.Printf("Full error message:\n%v\n\n", err)

	// Check for sentinel errors
	checks := map[string]error{
		"ErrNotFound":     ErrNotFound,
		"ErrDuplicateKey": ErrDuplicateKey,
		"ErrConnection":   ErrConnection,
	}

	fmt.Println("Sentinel error checks:")
	for name, sentinel := range checks {
		if errors.Is(err, sentinel) {
			fmt.Printf("✓ Error chain contains %s\n", name)
		}
	}

	// Unwrap the error chain
	fmt.Println("\nError chain (from outermost to innermost):")
	currentErr := err
	depth := 0
	for currentErr != nil {
		indent := ""
		for i := 0; i < depth; i++ {
			indent += " "
		}
		fmt.Printf("%s%d: %v\n", indent, depth, currentErr)
		currentErr = errors.Unwrap(currentErr)
		depth++
	}
}

func main() {
	fmt.Println("=== Error Wrapping Chain Demo ===")

	db := NewDatabase()
	repo := NewUserRepository(db)
	service := NewUserService(repo)

	// Test 1: Successful operation
	fmt.Println("\n--- Test 1: Successful GetUser ---")
	name, err := service.GetUser("1")
	if err != nil {
		analyzeError(err)
	} else {
		fmt.Printf("✓ Success: Found user: %s\n", name)
	}

	// Test 2: Not found error (wrapped through all layers)
	fmt.Println("\n--- Test 2: User Not Found ---")
	name, err = service.GetUser("999")
	if err != nil {
		analyzeError(err)

		// Demonstrate error-based logic
		if errors.Is(err, ErrNotFound) {
			fmt.Println("\nAction: Could return HTTP 404")
		}
	}

	// Test 3: Duplicate key error
	fmt.Println("\n--- Test 3: Duplicate Key ---")
	err = service.CreateUser("1", "Charlie") // ID 1 already exists
	if err != nil {
		analyzeError(err)

		if errors.Is(err, ErrDuplicateKey) {
			fmt.Println("\nAction: Could return HTTP 409 Conflict")
		}
	}

	// Test 4: Invalid input (no sentinel error)
	fmt.Println("\n--- Test 4: Invalid Input ---")
	err = service.CreateUser("", "")
	if err != nil {
		analyzeError(err)
		fmt.Println("\nAction: Could return HTTP 400 Bad Request")
	}

	// Test 5: Successful create
	fmt.Println("\n--- Test 5: Successful CreateUser ---")
	err = service.CreateUser("3", "Charlie")
	if err != nil {
		analyzeError(err)
	} else {
		fmt.Println("✓ Success: User created")

		// Verify
		name, _ = service.GetUser("3")
		fmt.Printf("✓ Verification: Found user: %s\n", name)
	}
}
Key Concepts:
- Multi-layer error wrapping
- Sentinel errors for common cases
- Error chain preservation
- Error inspection with errors.Is()
- Context-rich error messages
- Layer-specific error handling
Exercise 4: Retry Logic with Error Classification
Learning Objectives: Implement retry logic that classifies errors as retryable or permanent, with exponential backoff.
Difficulty: Advanced
Real-World Context: External API calls, database operations, and network requests often fail temporarily. This exercise teaches you to build resilient systems that retry transient failures while failing fast on permanent errors.
Task: Build a retry system that:
- Classifies errors as temporary or permanent
- Implements exponential backoff for retries
- Respects context timeouts
- Tracks retry attempts and timing
- Provides detailed logging
Requirements:
- Support configurable retry parameters
- Implement exponential backoff calculation
- Handle context cancellation
- Track success/failure statistics
Solution:
// run
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Error types
type TemporaryError struct {
	Msg string
}

func (e *TemporaryError) Error() string {
	return fmt.Sprintf("temporary: %s", e.Msg)
}

func (e *TemporaryError) Temporary() bool {
	return true
}

type PermanentError struct {
	Msg string
}

func (e *PermanentError) Error() string {
	return fmt.Sprintf("permanent: %s", e.Msg)
}

// Retry configuration
type RetryConfig struct {
	MaxAttempts  int
	InitialDelay time.Duration
	MaxDelay     time.Duration
	Multiplier   float64
}

func DefaultRetryConfig() RetryConfig {
	return RetryConfig{
		MaxAttempts:  5,
		InitialDelay: 100 * time.Millisecond,
		MaxDelay:     5 * time.Second,
		Multiplier:   2.0,
	}
}

// Retry statistics
type RetryStats struct {
	TotalAttempts int
	Successes     int
	Failures      int
	TotalDelay    time.Duration
}

// Check if error is retryable
func isRetryable(err error) bool {
	type temporary interface {
		Temporary() bool
	}

	var tempErr temporary
	if errors.As(err, &tempErr) && tempErr.Temporary() {
		return true
	}

	var permErr *PermanentError
	if errors.As(err, &permErr) {
		return false
	}

	// Default: assume retryable
	return true
}

// Retry with exponential backoff
func RetryWithBackoff(ctx context.Context, operation func() error, config RetryConfig) (*RetryStats, error) {
	stats := &RetryStats{}
	delay := config.InitialDelay

	for attempt := 1; attempt <= config.MaxAttempts; attempt++ {
		stats.TotalAttempts++

		// Execute operation
		startTime := time.Now()
		err := operation()
		elapsed := time.Since(startTime)

		if err == nil {
			stats.Successes++
			if attempt > 1 {
				fmt.Printf("✓ Success on attempt %d/%d (took %v)\n",
					attempt, config.MaxAttempts, elapsed)
			}
			return stats, nil
		}

		// Check if error is retryable
		if !isRetryable(err) {
			stats.Failures++
			fmt.Printf("✗ Permanent error on attempt %d: %v\n", attempt, err)
			return stats, err
		}

		// Check if this was the last attempt
		if attempt == config.MaxAttempts {
			stats.Failures++
			fmt.Printf("✗ All %d attempts exhausted: %v\n", config.MaxAttempts, err)
			return stats, fmt.Errorf("max retries exceeded: %w", err)
		}

		// Log retry
		fmt.Printf("⚠ Attempt %d/%d failed, retrying in %v: %v\n",
			attempt, config.MaxAttempts, delay, err)

		// Wait with exponential backoff
		select {
		case <-time.After(delay):
			stats.TotalDelay += delay
		case <-ctx.Done():
			stats.Failures++
			return stats, fmt.Errorf("context cancelled during retry: %w", ctx.Err())
		}

		// Calculate next delay
		nextDelay := time.Duration(float64(delay) * config.Multiplier)
		if nextDelay > config.MaxDelay {
			nextDelay = config.MaxDelay
		}
		delay = nextDelay
	}

	stats.Failures++
	return stats, errors.New("retry logic error: should not reach here")
}

// Simulated operations
func simulatedOperation(failureRate float64, failureType string) func() error {
	attempts := 0

	return func() error {
		attempts++

		if rand.Float64() < failureRate {
			switch failureType {
			case "temporary":
				return &TemporaryError{Msg: fmt.Sprintf("network timeout (attempt %d)", attempts)}
			case "permanent":
				return &PermanentError{Msg: "authentication failed"}
			default:
				return fmt.Errorf("unknown error (attempt %d)", attempts)
			}
		}

		return nil
	}
}

func main() {
	rand.Seed(time.Now().UnixNano())
	ctx := context.Background()

	fmt.Println("=== Retry Logic with Error Classification ===\n")

	config := DefaultRetryConfig()

	// Test 1: Temporary failures with eventual success
	fmt.Println("--- Test 1: Temporary Failures (50% rate) ---")
	op1 := simulatedOperation(0.5, "temporary")
	stats1, err := RetryWithBackoff(ctx, op1, config)

	fmt.Printf("\nStatistics:\n")
	fmt.Printf(" Total attempts: %d\n", stats1.TotalAttempts)
	fmt.Printf(" Successes: %d\n", stats1.Successes)
	fmt.Printf(" Failures: %d\n", stats1.Failures)
	fmt.Printf(" Total delay: %v\n", stats1.TotalDelay)
	if err != nil {
		fmt.Printf(" Final error: %v\n", err)
	}

	time.Sleep(time.Second)

	// Test 2: Permanent failure (immediate stop)
	fmt.Println("\n--- Test 2: Permanent Failure ---")
	op2 := simulatedOperation(1.0, "permanent")
	stats2, err := RetryWithBackoff(ctx, op2, config)

	fmt.Printf("\nStatistics:\n")
	fmt.Printf(" Total attempts: %d\n", stats2.TotalAttempts)
	fmt.Printf(" Failures: %d\n", stats2.Failures)
	fmt.Printf(" Final error: %v\n", err)

	time.Sleep(time.Second)

	// Test 3: All attempts fail
	fmt.Println("\n--- Test 3: All Attempts Fail (100% temporary failure) ---")
	op3 := simulatedOperation(1.0, "temporary")
	stats3, err := RetryWithBackoff(ctx, op3, config)

	fmt.Printf("\nStatistics:\n")
	fmt.Printf(" Total attempts: %d\n", stats3.TotalAttempts)
	fmt.Printf(" Failures: %d\n", stats3.Failures)
	fmt.Printf(" Total delay: %v\n", stats3.TotalDelay)
	fmt.Printf(" Final error: %v\n", err)

	time.Sleep(time.Second)

	// Test 4: Context timeout
	fmt.Println("\n--- Test 4: Context Timeout ---")
	timeoutCtx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()

	op4 := simulatedOperation(1.0, "temporary")
	stats4, err := RetryWithBackoff(timeoutCtx, op4, config)

	fmt.Printf("\nStatistics:\n")
	fmt.Printf(" Total attempts: %d\n", stats4.TotalAttempts)
	fmt.Printf(" Failures: %d\n", stats4.Failures)
	fmt.Printf(" Total delay: %v\n", stats4.TotalDelay)
	fmt.Printf(" Final error: %v\n", err)

	// Test 5: Demonstrate backoff calculation
	fmt.Println("\n--- Test 5: Backoff Timing Demo ---")
	fmt.Println("Demonstrating exponential backoff delays:")

	currentDelay := config.InitialDelay
	for i := 1; i <= config.MaxAttempts; i++ {
		fmt.Printf("Attempt %d: delay = %v\n", i, currentDelay)

		nextDelay := time.Duration(float64(currentDelay) * config.Multiplier)
		if nextDelay > config.MaxDelay {
			nextDelay = config.MaxDelay
		}
		currentDelay = nextDelay
	}
}
Key Concepts:
- Error classification (temporary vs permanent)
- Exponential backoff algorithm
- Context-aware retry logic
- Retry statistics tracking
- Configurable retry behavior
- Early termination on permanent errors
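The backoff schedule that Test 5 prints step by step can also be written in closed form: the delay for attempt n is the initial delay times the multiplier raised to (n-1), capped at the maximum delay. The sketch below assumes a hypothetical backoffDelay helper and illustrative values (100ms initial delay, multiplier 2.0, 2s cap); none of these are taken from DefaultRetryConfig.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// backoffDelay is a hypothetical helper, not part of the exercise solution.
// It returns the delay for the given 1-based attempt:
// initial * multiplier^(attempt-1), capped at maxDelay.
func backoffDelay(initial time.Duration, multiplier float64, maxDelay time.Duration, attempt int) time.Duration {
	if attempt < 1 {
		attempt = 1
	}
	d := float64(initial) * math.Pow(multiplier, float64(attempt-1))
	if d > float64(maxDelay) {
		return maxDelay
	}
	return time.Duration(d)
}

func main() {
	// Illustrative values only; they are assumptions for this sketch.
	for attempt := 1; attempt <= 6; attempt++ {
		delay := backoffDelay(100*time.Millisecond, 2.0, 2*time.Second, attempt)
		fmt.Printf("Attempt %d: delay = %v\n", attempt, delay)
	}
}
```

Comparing against float64(maxDelay) before converting back to time.Duration keeps very large attempt numbers from overflowing the duration.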
Exercise 5: Comprehensive Error Monitoring System
Learning Objectives: Build a production-ready error monitoring system with structured logging, metrics collection, and alerting.
Difficulty: Advanced
Real-World Context: Production systems need comprehensive error tracking for debugging, monitoring, and alerting. This exercise demonstrates building an enterprise-grade error monitoring system.
Task: Implement an error monitoring system that:
- Tracks all errors with structured metadata
- Collects metrics (error counts, rates, types)
- Implements severity-based handling
- Provides error analytics and reporting
- Simulates alerting for critical errors
Requirements:
- Structured error logging with JSON
- Real-time metrics collection
- Severity-based routing
- Background error processing
- Graceful shutdown handling
Solution:
// run
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"math/rand"
	"sort"
	"sync"
	"time"
)

// Severity levels, ordered from least to most severe.
type Severity int

const (
	SeverityDebug Severity = iota
	SeverityInfo
	SeverityWarning
	SeverityError
	SeverityCritical
)

var severityNames = []string{"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}

// String returns the level name, or a numeric placeholder for unknown values
// instead of panicking on an out-of-range index.
func (s Severity) String() string {
	if s < SeverityDebug || s > SeverityCritical {
		return fmt.Sprintf("Severity(%d)", int(s))
	}
	return severityNames[s]
}

// isSeverityName reports whether a metrics key is a severity counter.
func isSeverityName(key string) bool {
	for _, name := range severityNames {
		if key == name {
			return true
		}
	}
	return false
}

// MonitoredError is a structured error carrying monitoring metadata.
type MonitoredError struct {
	ID        string                 `json:"id"`
	Timestamp time.Time              `json:"timestamp"`
	Severity  Severity               `json:"severity"`
	Service   string                 `json:"service"`
	Operation string                 `json:"operation"`
	Message   string                 `json:"message"`
	Code      string                 `json:"code"`
	Context   map[string]interface{} `json:"context,omitempty"`
	UserID    string                 `json:"user_id,omitempty"`
}

// Error implements the error interface so a *MonitoredError can be returned
// wherever an error is expected.
func (e *MonitoredError) Error() string {
	return fmt.Sprintf("[%s] %s/%s: %s (%s)", e.Severity, e.Service, e.Operation, e.Message, e.Code)
}

// ErrorMonitor queues errors and processes them in the background.
type ErrorMonitor struct {
	errors  chan *MonitoredError
	metrics map[string]int64
	mu      sync.RWMutex
}

func NewErrorMonitor() *ErrorMonitor {
	return &ErrorMonitor{
		errors:  make(chan *MonitoredError, 100),
		metrics: make(map[string]int64),
	}
}

// Track queues an error without blocking the caller; if the queue is full
// the error is dropped and a warning is printed.
func (em *ErrorMonitor) Track(err *MonitoredError) {
	select {
	case em.errors <- err:
	default:
		fmt.Printf("Warning: Error queue full, dropping error %s\n", err.ID)
	}
}

// Start processes queued errors until ctx is cancelled.
func (em *ErrorMonitor) Start(ctx context.Context, wg *sync.WaitGroup) {
	defer wg.Done()

	for {
		select {
		case <-ctx.Done():
			// Drain anything still queued so tracked errors are not lost on shutdown.
			for {
				select {
				case err := <-em.errors:
					em.processError(err)
				default:
					fmt.Println("Error monitor shutting down...")
					return
				}
			}
		case err := <-em.errors:
			em.processError(err)
		}
	}
}

func (em *ErrorMonitor) processError(err *MonitoredError) {
	// Update metrics
	em.mu.Lock()
	em.metrics["total"]++
	em.metrics[err.Code]++
	em.metrics[err.Severity.String()]++
	em.mu.Unlock()

	// Log error
	em.logError(err)

	// Alert on critical errors
	if err.Severity >= SeverityCritical {
		em.sendAlert(err)
	}
}

func (em *ErrorMonitor) logError(err *MonitoredError) {
	logEntry := map[string]interface{}{
		"id":        err.ID,
		"timestamp": err.Timestamp.Format(time.RFC3339),
		"severity":  err.Severity.String(),
		"service":   err.Service,
		"operation": err.Operation,
		"message":   err.Message,
		"code":      err.Code,
	}

	if err.Context != nil {
		logEntry["context"] = err.Context
	}

	if err.UserID != "" {
		logEntry["user_id"] = err.UserID
	}

	jsonData, marshalErr := json.Marshal(logEntry)
	if marshalErr != nil {
		// Context may hold values JSON cannot encode; fall back to a plain line.
		fmt.Printf("LOG: failed to encode error %s: %v\n", err.ID, marshalErr)
		return
	}
	fmt.Printf("LOG: %s\n", jsonData)
}

// sendAlert simulates paging an operations team.
func (em *ErrorMonitor) sendAlert(err *MonitoredError) {
	fmt.Printf("ALERT: [%s] %s - %s (Error ID: %s)\n",
		err.Severity.String(), err.Service, err.Message, err.ID)
}

// GetMetrics returns a copy of the counters so callers cannot race with the monitor.
func (em *ErrorMonitor) GetMetrics() map[string]int64 {
	em.mu.RLock()
	defer em.mu.RUnlock()

	metrics := make(map[string]int64, len(em.metrics))
	for k, v := range em.metrics {
		metrics[k] = v
	}
	return metrics
}

func (em *ErrorMonitor) GetReport() string {
	em.mu.RLock()
	defer em.mu.RUnlock()

	total := em.metrics["total"]
	if total == 0 {
		return "No errors recorded"
	}

	report := "=== Error Report ===\n"
	report += fmt.Sprintf("Total errors: %d\n\n", total)

	report += "By severity:\n"
	for i := SeverityDebug; i <= SeverityCritical; i++ {
		count := em.metrics[i.String()]
		if count > 0 {
			pct := float64(count) / float64(total) * 100
			report += fmt.Sprintf(" %s: %d (%.1f%%)\n", i.String(), count, pct)
		}
	}

	report += "\nTop error codes:\n"

	// Collect per-code counters, skipping the total and the severity counters.
	type codeCount struct {
		code  string
		count int64
	}
	var codes []codeCount
	for k, v := range em.metrics {
		if k == "total" || isSeverityName(k) {
			continue
		}
		codes = append(codes, codeCount{code: k, count: v})
	}

	// Map iteration order is unspecified, so sort by count (descending,
	// ties by name) before taking the top 5.
	sort.Slice(codes, func(i, j int) bool {
		if codes[i].count != codes[j].count {
			return codes[i].count > codes[j].count
		}
		return codes[i].code < codes[j].code
	})
	if len(codes) > 5 {
		codes = codes[:5]
	}
	for _, c := range codes {
		pct := float64(c.count) / float64(total) * 100
		report += fmt.Sprintf(" %s: %d (%.1f%%)\n", c.code, c.count, pct)
	}

	return report
}

// Application service
type OrderService struct {
	monitor *ErrorMonitor
}

func NewOrderService(monitor *ErrorMonitor) *OrderService {
	return &OrderService{monitor: monitor}
}

func (s *OrderService) ProcessOrder(orderID, userID string, amount float64) error {
	// Generate request ID
	requestID := fmt.Sprintf("req_%d", time.Now().UnixNano())

	// report builds a MonitoredError for this request, queues it with the
	// monitor, and returns it so the caller can propagate it.
	report := func(severity Severity, code, message string) *MonitoredError {
		e := &MonitoredError{
			ID:        requestID,
			Timestamp: time.Now(),
			Severity:  severity,
			Service:   "order-service",
			Operation: "process-order",
			Message:   message,
			Code:      code,
			UserID:    userID,
			Context: map[string]interface{}{
				"order_id": orderID,
				"amount":   amount,
			},
		}
		s.monitor.Track(e)
		return e
	}

	// Validation error
	if amount <= 0 {
		return report(SeverityError, "INVALID_AMOUNT", "invalid order amount")
	}

	// Warning for large orders (tracked, but processing continues)
	if amount > 1000 {
		report(SeverityWarning, "LARGE_ORDER", "large order detected")
	}

	// Simulate random failures
	r := rand.Float64()
	if r < 0.1 {
		// Critical error (10%)
		return report(SeverityCritical, "GATEWAY_DOWN", "payment gateway unavailable")
	} else if r < 0.3 {
		// Regular error (20%)
		return report(SeverityError, "INVENTORY_ERROR", "inventory check failed")
	}

	// Success
	fmt.Printf("✓ Order processed: %s (user: %s, amount: $%.2f)\n", orderID, userID, amount)
	return nil
}

func main() {
	// No rand.Seed call: since Go 1.20 the global generator is seeded
	// automatically and rand.Seed is deprecated.

	fmt.Println("=== Comprehensive Error Monitoring System ===")
	fmt.Println()

	monitor := NewErrorMonitor()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var wg sync.WaitGroup

	// Start error monitor
	wg.Add(1)
	go monitor.Start(ctx, &wg)

	time.Sleep(time.Millisecond * 100)

	orderService := NewOrderService(monitor)

	// Simulate various orders
	fmt.Println("Processing orders...")
	fmt.Println()

	orders := []struct {
		id     string
		userID string
		amount float64
	}{
		{"order1", "user1", 100},
		{"order2", "user2", -50}, // Invalid
		{"order3", "user3", 1500}, // Large
		{"order4", "user4", 200},
		{"order5", "user5", 300},
		{"order6", "user6", 150},
		{"order7", "user7", 2000}, // Large
		{"order8", "user8", 250},
		{"order9", "user9", 0}, // Invalid
		{"order10", "user10", 175},
	}

	for _, order := range orders {
		// Check the returned error instead of discarding it.
		if err := orderService.ProcessOrder(order.id, order.userID, order.amount); err != nil {
			fmt.Printf("✗ Order failed: %s (%v)\n", order.id, err)
		}
		time.Sleep(time.Millisecond * 100)
	}

	// Allow time for error processing
	time.Sleep(time.Second)

	// Print metrics
	fmt.Println("\n" + monitor.GetReport())

	// Detailed metrics (map iteration order is unspecified)
	fmt.Println("\n=== Detailed Metrics ===")
	metrics := monitor.GetMetrics()
	for key, value := range metrics {
		fmt.Printf("%s: %d\n", key, value)
	}

	// Shutdown
	fmt.Println("\nShutting down...")
	cancel()

	// Wait for the monitor goroutine, but give up after a timeout.
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("Shutdown complete")
	case <-time.After(2 * time.Second):
		fmt.Println("Shutdown timeout")
	}
}
Key Concepts:
- Structured error logging with JSON
- Real-time metrics collection
- Severity-based error handling
- Background error processing
- Error analytics and reporting
- Graceful shutdown
- Production monitoring patterns
Summary
Key Takeaways
Error handling philosophy:
- Explicit over implicit: Every error is checked at the call site, then handled or returned with added context
- Values over exceptions: Errors are ordinary values, not special constructs
- Context preservation: Each layer adds relevant context while preserving the cause
- Graceful degradation: Handle failures without crashing when possible
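As a small illustration of graceful degradation, the sketch below falls back to a static default when a dependency fails instead of failing the whole request. The names fetchRecommendations, recommendationsOrDefault, and errServiceDown are hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errServiceDown is a hypothetical sentinel error for this sketch.
var errServiceDown = errors.New("recommendation service unavailable")

// fetchRecommendations stands in for a call to a remote dependency.
// The healthy flag only simulates success or failure for the demo.
func fetchRecommendations(healthy bool) ([]string, error) {
	if !healthy {
		return nil, errServiceDown
	}
	return []string{"book", "lamp"}, nil
}

// recommendationsOrDefault degrades instead of failing: on error it reports
// the problem and returns a static fallback list plus a degraded flag, so
// the caller can still render a useful response.
func recommendationsOrDefault(healthy bool) ([]string, bool) {
	items, err := fetchRecommendations(healthy)
	if err != nil {
		fmt.Printf("warning: using fallback recommendations: %v\n", err)
		return []string{"bestsellers"}, true
	}
	return items, false
}

func main() {
	for _, healthy := range []bool{true, false} {
		items, degraded := recommendationsOrDefault(healthy)
		fmt.Printf("items=%v degraded=%t\n", items, degraded)
	}
}
```

The error is still made visible (here through a printed warning) rather than swallowed, which keeps the degradation observable.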
Essential patterns:
- Immediate checking: if err != nil after every function that can fail
- Error wrapping: Use %w to preserve error chains with fmt.Errorf
- Custom error types: Create structured errors with business context
- Error inspection: Use errors.Is() and errors.As() for type-safe handling
- Retry logic: Implement exponential backoff for transient failures
- Context integration: Use context for timeouts and cancellation
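The first four patterns fit together in a few lines. The following is a minimal sketch, assuming a hypothetical ErrNotFound sentinel, ValidationError type, and findUser/loadProfile/classify functions invented for illustration: the lower layer returns an error, the middle layer wraps it with %w, and the top layer inspects the chain with errors.Is() and errors.As() rather than comparing strings.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error, matched with errors.Is.
var ErrNotFound = errors.New("not found")

// ValidationError is a hypothetical custom error type with business context.
type ValidationError struct {
	Field  string
	Reason string
}

func (e *ValidationError) Error() string {
	return fmt.Sprintf("invalid %s: %s", e.Field, e.Reason)
}

// findUser stands in for a lower layer that can fail in two different ways.
func findUser(id int) error {
	if id <= 0 {
		return &ValidationError{Field: "id", Reason: "must be positive"}
	}
	if id != 1 {
		return ErrNotFound
	}
	return nil
}

// loadProfile checks the error immediately and wraps it with %w, adding
// context while keeping the original cause inspectable.
func loadProfile(id int) error {
	if err := findUser(id); err != nil {
		return fmt.Errorf("load profile %d: %w", id, err)
	}
	return nil
}

// classify walks the error chain instead of matching on message text.
func classify(err error) string {
	var ve *ValidationError
	switch {
	case err == nil:
		return "ok"
	case errors.As(err, &ve):
		return "bad request: " + ve.Field
	case errors.Is(err, ErrNotFound):
		return "not found"
	default:
		return "internal error"
	}
}

func main() {
	for _, id := range []int{1, 42, -5} {
		err := loadProfile(id)
		fmt.Printf("id=%d -> %s (err: %v)\n", id, classify(err), err)
	}
}
```

Because loadProfile wraps with %w rather than %v, classify still recognizes both the sentinel and the custom type even though the message has gained a prefix.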
Production considerations:
- Structured logging: Log errors with context, timestamps, and correlation IDs
- Metrics collection: Track error rates, types, and patterns
- Severity levels: Route errors appropriately (debug, info, warning, error, critical)
- Alerting: Escalate critical errors to operations teams
- Error analytics: Build dashboards and reports from error data
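For structured logging with correlation IDs, the standard library's log/slog package (available since Go 1.21) is one option alongside the hand-built JSON used in Exercise 5. The sketch below is only an illustration: newRequestLogger, logOrderFailure, captureEntry, errGatewayDown, and the field names are assumptions made for this example, not part of the exercise solution.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"io"
	"log/slog"
	"os"
)

// errGatewayDown is a hypothetical error used only for this demo.
var errGatewayDown = errors.New("payment gateway unavailable")

// newRequestLogger returns a JSON logger that stamps every entry with the
// request's correlation ID, so entries from one request can be grouped.
func newRequestLogger(w io.Writer, correlationID string) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil)).With("correlation_id", correlationID)
}

// logOrderFailure emits one ERROR-level entry; slog adds the timestamp
// and level fields automatically.
func logOrderFailure(logger *slog.Logger, orderID string, err error) {
	logger.Error("order failed", "order_id", orderID, "error", err)
}

// captureEntry logs into a buffer and decodes the resulting JSON line,
// which makes the emitted fields easy to inspect.
func captureEntry(correlationID, orderID string, err error) (map[string]any, error) {
	var buf bytes.Buffer
	logOrderFailure(newRequestLogger(&buf, correlationID), orderID, err)

	entry := make(map[string]any)
	if decodeErr := json.Unmarshal(buf.Bytes(), &entry); decodeErr != nil {
		return nil, decodeErr
	}
	return entry, nil
}

func main() {
	logOrderFailure(newRequestLogger(os.Stdout, "req-123"), "order7", errGatewayDown)
}
```

Attaching the correlation ID once with With, instead of passing it on every call, keeps individual log statements short and makes it hard to forget the field.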
Next Steps
Continue your Go learning journey with these topics:
- Testing Error Paths - Master testing patterns for error handling
- Distributed Error Handling - Handle errors in microservices
- Monitoring and Observability - Build comprehensive error tracking
- Resilience Patterns - Circuit breakers, bulkheads, retries
- Error Design Principles - Design clear, actionable error APIs
Production Readiness
You now have the foundation for building production-ready Go applications with robust error handling. The patterns covered here are used in:
- Web services and APIs with clear error responses
- Microservice architectures with proper error propagation
- Database systems with transaction error handling
- File processing systems with graceful failure handling
- Background job systems with comprehensive error tracking
Remember: Good error handling is not about preventing errors - it's about handling them gracefully. Master Go's explicit error handling model, and you'll build systems that are more reliable, debuggable, and maintainable.