Exercise: Retry Mechanism with Exponential Backoff
Difficulty - Intermediate
Learning Objectives
- Implement exponential backoff retry logic
- Add jitter to prevent thundering herd problems
- Handle context cancellation during retries
- Build circuit breaker patterns
- Create retry strategies for different failure types
Problem Statement
Create a robust retry mechanism that handles transient failures gracefully using exponential backoff and jitter. Your implementation should support context cancellation, configurable retry policies, and intelligent failure classification to determine retry eligibility.
Requirements
1. Basic Retry Function
Implement a retry function that:
- Accepts a context, configuration, and a function to execute
- Retries the function up to MaxAttempts times on failure
- Uses exponential backoff between retries
- Caps the maximum delay to prevent excessive waiting
- Respects context cancellation at any point
- Returns the last error if all attempts fail
Example Usage:
cfg := RetryConfig{
    MaxAttempts:  5,
    InitialDelay: 100 * time.Millisecond,
    MaxDelay:     10 * time.Second,
    Multiplier:   2.0,
}

err := Retry(ctx, cfg, func(ctx context.Context) error {
    return makeAPICall(ctx)
})
2. Retry with Jitter
Add randomized jitter to prevent thundering herd:
- Implements full jitter: delay = random(0, exponentialDelay)
- Implements decorrelated jitter for better distribution
- Prevents multiple clients from retrying simultaneously
- Configurable jitter strategy
Example Usage:
cfg := RetryConfig{
    MaxAttempts:  3,
    InitialDelay: 1 * time.Second,
    MaxDelay:     30 * time.Second,
    Multiplier:   2.0,
    Jitter:       FullJitter,
}

err := Retry(ctx, cfg, func(ctx context.Context) error {
    return fetchFromDatabase(ctx)
})
3. Retryable Error Classification
Create a system to classify errors as retryable or permanent:
- Implements an IsRetryable(error) bool function
- Supports custom retryable error predicates
- Handles wrapped errors correctly
- Includes common retryable error types
- Stops retrying immediately on permanent errors
Example Usage:
classifier := NewErrorClassifier()
classifier.AddRetryable(IsNetworkError)
classifier.AddRetryable(IsTemporaryDatabaseError)

cfg := RetryConfig{
    MaxAttempts:     5,
    InitialDelay:    100 * time.Millisecond,
    MaxDelay:        10 * time.Second,
    Multiplier:      2.0,
    ErrorClassifier: classifier,
}

err := Retry(ctx, cfg, func(ctx context.Context) error {
    return doWork(ctx)
})
// Only retries if error is classified as retryable
4. Retry with Callback Hooks
Support lifecycle hooks for monitoring and debugging:
- OnRetry(attempt int, err error, delay time.Duration) - called before each retry
- OnSuccess(attempt int) - called when the operation succeeds
- OnFailure(err error) - called when all retries are exhausted
- Useful for logging, metrics, and alerting
Example Usage:
cfg := RetryConfig{
    MaxAttempts:  3,
    InitialDelay: 1 * time.Second,
    MaxDelay:     5 * time.Second,
    Multiplier:   2.0,
    OnRetry: func(attempt int, err error, delay time.Duration) {
        log.Printf("Retry %d after %v: %v", attempt, delay, err)
    },
    OnSuccess: func(attempt int) {
        log.Printf("Success on attempt %d", attempt)
    },
}
5. Circuit Breaker Integration
Implement a circuit breaker to prevent retry storms:
- Opens circuit after N consecutive failures
- Stays open for a cooldown period
- Transitions to half-open state to test recovery
- Closes circuit after successful operations in half-open state
- Prevents retries when circuit is open
Example Usage:
breaker := NewCircuitBreaker(CircuitBreakerConfig{
    FailureThreshold: 5,
    SuccessThreshold: 2,
    Timeout:          30 * time.Second,
})

cfg := RetryConfig{
    MaxAttempts:    3,
    InitialDelay:   1 * time.Second,
    CircuitBreaker: breaker,
}

err := Retry(ctx, cfg, func(ctx context.Context) error {
    if breaker.State() == CircuitOpen {
        return ErrCircuitOpen
    }
    return callExternalService(ctx)
})
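To make the thresholds concrete, here is a hypothetical walk through the state machine implied by this configuration (the threshold and timeout values below are chosen only for illustration):

breaker := NewCircuitBreaker(CircuitBreakerConfig{
    FailureThreshold: 2,
    SuccessThreshold: 1,
    Timeout:          100 * time.Millisecond,
})

breaker.RecordFailure()
breaker.RecordFailure()            // reaches FailureThreshold
// breaker.State() == CircuitOpen: new calls are rejected

time.Sleep(150 * time.Millisecond) // cooldown (Timeout) elapses
// breaker.State() == CircuitHalfOpen: a trial call is allowed

breaker.RecordSuccess()            // reaches SuccessThreshold
// breaker.State() == CircuitClosed: normal operation resumes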
Function Signatures
package retry

import (
    "context"
    "errors"
    "time"
)

// RetryConfig configures retry behavior
type RetryConfig struct {
    MaxAttempts     int
    InitialDelay    time.Duration
    MaxDelay        time.Duration
    Multiplier      float64
    Jitter          JitterStrategy
    ErrorClassifier *ErrorClassifier
    CircuitBreaker  *CircuitBreaker
    OnRetry         func(attempt int, err error, delay time.Duration)
    OnSuccess       func(attempt int)
    OnFailure       func(err error)
}

// JitterStrategy defines how jitter is applied
type JitterStrategy int

const (
    NoJitter JitterStrategy = iota
    FullJitter
    EqualJitter
    DecorrelatedJitter
)

// Retry executes fn with exponential backoff retry logic
func Retry(ctx context.Context, cfg RetryConfig, fn func(context.Context) error) error

// ErrorClassifier determines if errors are retryable
type ErrorClassifier struct {
    predicates []func(error) bool
}

// NewErrorClassifier creates a new error classifier
func NewErrorClassifier() *ErrorClassifier

// AddRetryable adds a predicate to classify errors as retryable
func (ec *ErrorClassifier) AddRetryable(predicate func(error) bool)

// IsRetryable checks if an error should trigger a retry
func (ec *ErrorClassifier) IsRetryable(err error) bool

// CircuitBreakerState represents the state of a circuit breaker
type CircuitBreakerState int

const (
    CircuitClosed CircuitBreakerState = iota
    CircuitOpen
    CircuitHalfOpen
)

// CircuitBreakerConfig configures circuit breaker behavior
type CircuitBreakerConfig struct {
    FailureThreshold int
    SuccessThreshold int
    Timeout          time.Duration
}

// CircuitBreaker implements the circuit breaker pattern
type CircuitBreaker struct {
    config          CircuitBreakerConfig
    state           CircuitBreakerState
    failures        int
    successes       int
    lastFailureTime time.Time
}

// NewCircuitBreaker creates a new circuit breaker
func NewCircuitBreaker(cfg CircuitBreakerConfig) *CircuitBreaker

// State returns the current circuit breaker state
func (cb *CircuitBreaker) State() CircuitBreakerState

// RecordSuccess records a successful operation
func (cb *CircuitBreaker) RecordSuccess()

// RecordFailure records a failed operation
func (cb *CircuitBreaker) RecordFailure()
Test Cases
Your implementation should pass these test scenarios:
// Test successful retry after failures
func TestRetrySuccessAfterFailures(t *testing.T) {
    attempt := 0
    cfg := RetryConfig{
        MaxAttempts:  3,
        InitialDelay: 10 * time.Millisecond,
        MaxDelay:     1 * time.Second,
        Multiplier:   2.0,
    }

    err := Retry(context.Background(), cfg, func(ctx context.Context) error {
        attempt++
        if attempt < 3 {
            return errors.New("temporary error")
        }
        return nil
    })

    assert.NoError(t, err)
    assert.Equal(t, 3, attempt)
}

// Test context cancellation stops retries
func TestRetryContextCancellation(t *testing.T) {
    ctx, cancel := context.WithCancel(context.Background())

    cfg := RetryConfig{
        MaxAttempts:  10,
        InitialDelay: 100 * time.Millisecond,
        MaxDelay:     1 * time.Second,
        Multiplier:   2.0,
    }

    attempt := 0
    go func() {
        time.Sleep(150 * time.Millisecond)
        cancel()
    }()

    err := Retry(ctx, cfg, func(ctx context.Context) error {
        attempt++
        return errors.New("always fails")
    })

    assert.Error(t, err)
    assert.True(t, errors.Is(err, context.Canceled))
    assert.Less(t, attempt, 10) // Should stop before max attempts
}

// Test exponential backoff timing
func TestExponentialBackoff(t *testing.T) {
    cfg := RetryConfig{
        MaxAttempts:  4,
        InitialDelay: 100 * time.Millisecond,
        MaxDelay:     10 * time.Second,
        Multiplier:   2.0,
    }

    start := time.Now()
    delays := []time.Duration{}
    attempt := 0

    Retry(context.Background(), cfg, func(ctx context.Context) error {
        if attempt > 0 {
            delays = append(delays, time.Since(start))
        }
        attempt++
        start = time.Now()
        return errors.New("fail")
    })

    // Expected delays: ~100ms, ~200ms, ~400ms
    assert.InDelta(t, 100, delays[0].Milliseconds(), 50)
    assert.InDelta(t, 200, delays[1].Milliseconds(), 50)
    assert.InDelta(t, 400, delays[2].Milliseconds(), 50)
}

// Test non-retryable errors stop immediately
func TestNonRetryableError(t *testing.T) {
    classifier := NewErrorClassifier()
    classifier.AddRetryable(func(err error) bool {
        return err.Error() != "permanent error"
    })

    cfg := RetryConfig{
        MaxAttempts:     5,
        InitialDelay:    10 * time.Millisecond,
        ErrorClassifier: classifier,
    }

    attempt := 0
    err := Retry(context.Background(), cfg, func(ctx context.Context) error {
        attempt++
        return errors.New("permanent error")
    })

    assert.Error(t, err)
    assert.Equal(t, 1, attempt) // Should not retry
}
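If you also implement jitter, the OnRetry hook is a convenient place to observe each computed delay. A possible additional test, assuming the attempt value passed to OnRetry is 1-based as in the examples above:

// Test that full jitter never exceeds the un-jittered exponential delay
func TestFullJitterBounds(t *testing.T) {
    initial := 100 * time.Millisecond
    cfg := RetryConfig{
        MaxAttempts:  5,
        InitialDelay: initial,
        MaxDelay:     10 * time.Second,
        Multiplier:   2.0,
        Jitter:       FullJitter,
    }
    cfg.OnRetry = func(attempt int, err error, delay time.Duration) {
        // With full jitter, each delay is drawn from [0, initial*2^(attempt-1))
        upper := time.Duration(float64(initial) * math.Pow(2, float64(attempt-1)))
        assert.LessOrEqual(t, delay, upper)
    }

    Retry(context.Background(), cfg, func(ctx context.Context) error {
        return errors.New("always fails")
    })
}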
Common Pitfalls
⚠️ Watch out for these common mistakes:
- Thundering herd: Without jitter, all clients retry at the same time, overwhelming the service
- Infinite retries: Always have a maximum attempt limit to prevent infinite loops
- Not respecting context: Check ctx.Done() during delays, not just in the function
- Retrying permanent errors: Don't retry 4xx errors or validation failures
- Delay calculation overflow: Cap delays to prevent time.Duration overflow with large multipliers (see the sketch after this list)
- Missing defer in circuit breaker: Always record the result in the circuit breaker
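For the overflow pitfall, one option is to do the backoff math in float64 and clamp before converting back to a time.Duration. A minimal sketch (safeDelay is an illustrative helper name, not part of the required API):

// safeDelay computes initial * multiplier^attempt in float64 and clamps to max
// before converting back, so the result cannot overflow the int64 nanoseconds
// inside time.Duration even for large multipliers or attempt counts.
func safeDelay(initial, max time.Duration, multiplier float64, attempt int) time.Duration {
    d := float64(initial) * math.Pow(multiplier, float64(attempt))
    if d >= float64(max) {
        return max
    }
    return time.Duration(d)
}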
Hints
💡 Hint 1: Exponential Backoff Calculation
Calculate exponential backoff with capping:
delay := time.Duration(float64(cfg.InitialDelay) * math.Pow(cfg.Multiplier, float64(attempt)))
if delay > cfg.MaxDelay {
    delay = cfg.MaxDelay
}
💡 Hint 2: Full Jitter Implementation
Full jitter uses random value from 0 to calculated delay:
func applyFullJitter(delay time.Duration) time.Duration {
    if delay <= 0 {
        return 0
    }
    return time.Duration(rand.Int63n(int64(delay)))
}
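Decorrelated jitter (also listed in requirement 2) is stateful: each delay is drawn from a range based on the previous delay rather than the attempt number, so the retry loop must carry the last delay forward. One possible sketch, following the common formulation and assuming InitialDelay is positive (the reference solution below uses a simpler stateless variant):

// applyDecorrelatedJitter draws the next delay from [initial, prev*3], capped
// at max. Assumes initial > 0 so rand.Int63n always gets a positive argument.
func applyDecorrelatedJitter(prev, initial, max time.Duration) time.Duration {
    if prev < initial {
        prev = initial
    }
    next := initial + time.Duration(rand.Int63n(int64(prev*3-initial)+1))
    if next > max {
        next = max
    }
    return next
}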
💡 Hint 3: Context-Aware Sleep
Use select to respect context cancellation during delays:
select {
case <-time.After(delay):
    // Continue to next retry
case <-ctx.Done():
    return ctx.Err()
}
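Note that time.After keeps its timer alive until it fires, which can add up when delays are long and cancellations frequent. If that matters, the same wait can use an explicit timer that is stopped on cancellation:

timer := time.NewTimer(delay)
select {
case <-timer.C:
    // Continue to next retry
case <-ctx.Done():
    timer.Stop() // release the timer rather than letting it fire later
    return ctx.Err()
}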
💡 Hint 4: Error Wrapping Detection
Use errors.Is and errors.As to check wrapped errors:
func (ec *ErrorClassifier) IsRetryable(err error) bool {
    for _, predicate := range ec.predicates {
        // Check the error and everything it wraps
        for e := err; e != nil; e = errors.Unwrap(e) {
            if predicate(e) {
                return true
            }
        }
    }
    return false
}
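Individual predicates can also use errors.As to match error types anywhere in the wrapped chain. For example, a timeout predicate built on the standard library's net.Error interface (a sketch; it requires importing the net package):

// IsTimeoutError reports whether err, or anything it wraps, is a network timeout.
func IsTimeoutError(err error) bool {
    var netErr net.Error
    return errors.As(err, &netErr) && netErr.Timeout()
}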
Solution
package retry

import (
    "context"
    "errors"
    "math"
    "math/rand"
    "sync"
    "time"
)

// RetryConfig configures retry behavior
type RetryConfig struct {
    MaxAttempts     int
    InitialDelay    time.Duration
    MaxDelay        time.Duration
    Multiplier      float64
    Jitter          JitterStrategy
    ErrorClassifier *ErrorClassifier
    CircuitBreaker  *CircuitBreaker
    OnRetry         func(attempt int, err error, delay time.Duration)
    OnSuccess       func(attempt int)
    OnFailure       func(err error)
}

// JitterStrategy defines how jitter is applied
type JitterStrategy int

const (
    NoJitter JitterStrategy = iota
    FullJitter
    EqualJitter
    DecorrelatedJitter
)

var ErrCircuitOpen = errors.New("circuit breaker is open")

// Retry executes fn with exponential backoff retry logic
func Retry(ctx context.Context, cfg RetryConfig, fn func(context.Context) error) error {
    var lastErr error

    for attempt := 0; attempt < cfg.MaxAttempts; attempt++ {
        // Check circuit breaker
        if cfg.CircuitBreaker != nil && cfg.CircuitBreaker.State() == CircuitOpen {
            return ErrCircuitOpen
        }

        // Execute function
        err := fn(ctx)

        // Success
        if err == nil {
            if cfg.CircuitBreaker != nil {
                cfg.CircuitBreaker.RecordSuccess()
            }
            if cfg.OnSuccess != nil {
                cfg.OnSuccess(attempt + 1)
            }
            return nil
        }

        lastErr = err

        // Record failure in circuit breaker
        if cfg.CircuitBreaker != nil {
            cfg.CircuitBreaker.RecordFailure()
        }

        // Check if error is retryable
        if cfg.ErrorClassifier != nil && !cfg.ErrorClassifier.IsRetryable(err) {
            if cfg.OnFailure != nil {
                cfg.OnFailure(err)
            }
            return err
        }

        // Don't sleep after last attempt
        if attempt == cfg.MaxAttempts-1 {
            break
        }

        // Calculate delay with exponential backoff
        delay := calculateDelay(cfg, attempt)

        // Call OnRetry hook
        if cfg.OnRetry != nil {
            cfg.OnRetry(attempt+1, err, delay)
        }

        // Wait with context awareness
        select {
        case <-time.After(delay):
            // Continue to next attempt
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    if cfg.OnFailure != nil {
        cfg.OnFailure(lastErr)
    }

    return lastErr
}

func calculateDelay(cfg RetryConfig, attempt int) time.Duration {
    // Exponential backoff computed in float64 and capped before converting
    // back, so large multipliers cannot overflow time.Duration
    backoff := float64(cfg.InitialDelay) * math.Pow(cfg.Multiplier, float64(attempt))
    delay := cfg.MaxDelay
    if backoff < float64(cfg.MaxDelay) {
        delay = time.Duration(backoff)
    }

    // Apply jitter
    switch cfg.Jitter {
    case FullJitter:
        delay = applyFullJitter(delay)
    case EqualJitter:
        delay = applyEqualJitter(delay)
    case DecorrelatedJitter:
        delay = applyDecorrelatedJitter(delay, cfg.InitialDelay)
    }

    return delay
}

func applyFullJitter(delay time.Duration) time.Duration {
    if delay <= 0 {
        return 0
    }
    return time.Duration(rand.Int63n(int64(delay)))
}

func applyEqualJitter(delay time.Duration) time.Duration {
    if delay <= 0 {
        return 0
    }
    half := delay / 2
    if half <= 0 {
        return delay
    }
    jitter := time.Duration(rand.Int63n(int64(half)))
    return half + jitter
}

func applyDecorrelatedJitter(delay, initial time.Duration) time.Duration {
    if delay <= 0 {
        return initial
    }
    return time.Duration(rand.Int63n(int64(delay*3))) + initial
}

// ErrorClassifier determines if errors are retryable
type ErrorClassifier struct {
    predicates []func(error) bool
}

// NewErrorClassifier creates a new error classifier
func NewErrorClassifier() *ErrorClassifier {
    return &ErrorClassifier{
        predicates: make([]func(error) bool, 0),
    }
}

// AddRetryable adds a predicate to classify errors as retryable
func (ec *ErrorClassifier) AddRetryable(predicate func(error) bool) {
    ec.predicates = append(ec.predicates, predicate)
}

// IsRetryable checks if an error should trigger a retry
func (ec *ErrorClassifier) IsRetryable(err error) bool {
    if err == nil {
        return false
    }

    for _, predicate := range ec.predicates {
        if predicate(err) {
            return true
        }
    }

    return false
}

// CircuitBreakerState represents the state of a circuit breaker
type CircuitBreakerState int

const (
    CircuitClosed CircuitBreakerState = iota
    CircuitOpen
    CircuitHalfOpen
)

// CircuitBreakerConfig configures circuit breaker behavior
type CircuitBreakerConfig struct {
    FailureThreshold int
    SuccessThreshold int
    Timeout          time.Duration
}

// CircuitBreaker implements the circuit breaker pattern
type CircuitBreaker struct {
    mu              sync.RWMutex
    config          CircuitBreakerConfig
    state           CircuitBreakerState
    failures        int
    successes       int
    lastFailureTime time.Time
}

// NewCircuitBreaker creates a new circuit breaker
func NewCircuitBreaker(cfg CircuitBreakerConfig) *CircuitBreaker {
    return &CircuitBreaker{
        config: cfg,
        state:  CircuitClosed,
    }
}

// State returns the current circuit breaker state
func (cb *CircuitBreaker) State() CircuitBreakerState {
    // A full lock is needed because this method can mutate state
    cb.mu.Lock()
    defer cb.mu.Unlock()

    // Check if we should transition from open to half-open
    if cb.state == CircuitOpen {
        if time.Since(cb.lastFailureTime) > cb.config.Timeout {
            cb.state = CircuitHalfOpen
            cb.successes = 0
        }
    }

    return cb.state
}

// RecordSuccess records a successful operation
func (cb *CircuitBreaker) RecordSuccess() {
    cb.mu.Lock()
    defer cb.mu.Unlock()

    cb.failures = 0

    if cb.state == CircuitHalfOpen {
        cb.successes++
        if cb.successes >= cb.config.SuccessThreshold {
            cb.state = CircuitClosed
            cb.successes = 0
        }
    }
}

// RecordFailure records a failed operation
func (cb *CircuitBreaker) RecordFailure() {
    cb.mu.Lock()
    defer cb.mu.Unlock()

    cb.failures++
    cb.lastFailureTime = time.Now()

    if cb.state == CircuitHalfOpen {
        cb.state = CircuitOpen
        cb.successes = 0
    } else if cb.failures >= cb.config.FailureThreshold {
        cb.state = CircuitOpen
    }
}

// Common retryable error predicates

// IsNetworkError checks if an error is a network error
func IsNetworkError(err error) bool {
    if err == nil {
        return false
    }
    // In a real implementation, check for specific network error types
    return errors.Is(err, context.DeadlineExceeded) ||
        errors.Is(err, context.Canceled)
}

// IsTemporaryError checks if an error is temporary
func IsTemporaryError(err error) bool {
    type temporary interface {
        Temporary() bool
    }

    var te temporary
    return errors.As(err, &te) && te.Temporary()
}
Key Takeaways
- Exponential backoff prevents overwhelming failing services during recovery
- Jitter prevents thundering herd problems when multiple clients retry simultaneously
- Context cancellation is critical for graceful shutdown and timeout handling
- Error classification ensures you don't retry permanent failures
- Circuit breakers prevent cascade failures and give systems time to recover
- Callback hooks enable observability and debugging of retry behavior
- Maximum delay caps prevent excessive waiting times