Go Runtime Internals

Why This Matters

🎯 Real-World Impact: Understanding Go's runtime internals is the difference between writing code that works and writing code that scales. At Google, understanding runtime internals enabled their ad serving system to handle 10x more traffic on the same hardware. At Dropbox, runtime optimization reduced file synchronization latency by 60%, enabling real-time collaboration features.

Think of the Go runtime as an invisible orchestra conductor managing thousands of musicians playing in harmony on a limited number of instruments. You don't see it, but it ensures everyone gets their turn to play at the right time.

πŸ” The Problem: Go makes concurrency look easy with go statements, but underneath lies a sophisticated system that determines whether your application scales or crashes. Misunderstanding the runtime leads to:

  • Memory leaks from goroutine mismanagement
  • CPU waste from poor scheduling decisions
  • Latency spikes from garbage collection pauses
  • Scaling bottlenecks that appear only under load

💡 The Opportunity: Mastering runtime internals transforms you from a Go developer into a Go performance expert. You'll be able to diagnose mysterious slowdowns, optimize for specific hardware, and write code that scales gracefully from 10 to 10,000,000 requests per second.

Learning Objectives

By the end of this article, you will master:

  1. GMP Scheduler Mechanics - Understand how Go's work-stealing scheduler distributes millions of goroutines across CPU cores
  2. Performance Tuning - Learn to optimize GOMAXPROCS, GC settings, and memory allocation patterns
  3. Production Debugging - Master runtime metrics, profiling, and troubleshooting techniques
  4. Advanced Patterns - Build high-performance systems using goroutine pools, memory allocators, and lock-free data structures

Prerequisite Check: You should be comfortable with goroutines, channels, and basic Go concurrency patterns. If you've never wondered why your concurrent application isn't scaling as expected, you're ready for this deep dive.

Core Concepts - The GMP Scheduler Model

Before diving into code, let's understand the fundamental architecture that makes Go's concurrency efficient.

The M:N Threading Revolution

Traditional threading models have limitations:

  • 1:1 Threading: one OS thread per task; at roughly 1MB of stack each plus kernel context-switch overhead, practical limits sit in the low thousands
  • N:1 Threading: all tasks multiplexed onto one thread; no true parallelism, and one blocking call stalls everything
  • Go's M:N Threading: M goroutines multiplexed over N OS threads - the best of both worlds
// run
package main

import (
    "fmt"
    "runtime"
    "time"
)

// Demonstration of M:N efficiency
func main() {
    fmt.Println("=== M:N Threading Demonstration ===")

    // Show available CPUs
    numCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(numCPU)

    fmt.Printf("System: %d CPUs\n", numCPU)
    fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))

    // Launch many more goroutines than threads
    const numGoroutines = 10000
    fmt.Printf("Launching %d goroutines\n", numGoroutines)

    start := time.Now()

    // Channel to signal completion
    done := make(chan struct{}, numGoroutines)

    // Launch M goroutines that will run on N threads
    for i := 0; i < numGoroutines; i++ {
        go func(id int) {
            // Simulate work
            sum := 0
            for j := 0; j < 1000; j++ {
                sum += j * j
            }

            // Signal completion
            done <- struct{}{}
        }(i)
    }

    // Wait for all goroutines
    for i := 0; i < numGoroutines; i++ {
        <-done
    }

    elapsed := time.Since(start)
    fmt.Printf("Completed %d goroutines in %v\n", numGoroutines, elapsed)
    fmt.Printf("Throughput: %.0f goroutines/sec\n", float64(numGoroutines)/elapsed.Seconds())
}

💡 Key Insight: Go successfully scheduled 10,000 goroutines across typically 4-8 CPU cores, achieving massive concurrency with minimal overhead. This M:N model is why Go can handle millions of concurrent connections efficiently.

The GMP Components

Go's scheduler uses three key components working in harmony:

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Visualize GMP components in action
func main() {
    fmt.Println("=== GMP Component Visualization ===")

    // Configure for demonstration
    runtime.GOMAXPROCS(2) // Use 2 processors for clear demonstration
    numCPU := runtime.NumCPU()

    fmt.Printf("Available CPUs: %d\n", numCPU)
    fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
    fmt.Printf("Initial goroutines: %d\n\n", runtime.NumGoroutine())

    var wg sync.WaitGroup

    // Create CPU-bound goroutines
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            fmt.Printf("Goroutine %d: Starting\n", id)

            // CPU-intensive work
            sum := 0
            for j := 0; j < 5000000; j++ {
                sum += j * j % 1000
            }

            fmt.Printf("Goroutine %d: Completed (checksum %d)\n", id, sum%10000)
        }(i)
    }

    // Monitor runtime behavior
    done := make(chan struct{})
    go func() {
        ticker := time.NewTicker(100 * time.Millisecond)
        defer ticker.Stop()

        for {
            select {
            case <-ticker.C:
                fmt.Printf("Runtime: %d goroutines alive\n", runtime.NumGoroutine())
            case <-done:
                return
            }
        }
    }()

    wg.Wait()
    close(done)

    fmt.Printf("\nFinal: %d goroutines\n", runtime.NumGoroutine())
}

The GMP model enables:

  • Efficient work stealing - idle processors can "steal" work from busy ones
  • Minimal overhead - goroutine context switches are ~10x faster than thread switches
  • Automatic load balancing - the runtime distributes work across available CPU cores
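You can get a rough feel for that context-switch cost yourself. The sketch below (not from the original text) bounces a token between two goroutines over unbuffered channels, which forces a scheduler switch on every send and receive; it is an approximation, since channel overhead is included in the measurement, and the absolute numbers depend on your hardware and Go version.

// run
package main

import (
    "fmt"
    "time"
)

// Rough estimate of goroutine switch cost: every send/receive on an
// unbuffered channel forces the scheduler to switch between two goroutines.
func main() {
    const rounds = 100000
    ping := make(chan struct{})
    pong := make(chan struct{})

    go func() {
        for i := 0; i < rounds; i++ {
            <-ping             // wait for the other goroutine
            pong <- struct{}{} // hand control back
        }
    }()

    start := time.Now()
    for i := 0; i < rounds; i++ {
        ping <- struct{}{}
        <-pong
    }
    elapsed := time.Since(start)

    // Each round trip involves two switches (there and back)
    fmt.Printf("%d round trips in %v (~%v per switch)\n",
        rounds, elapsed, elapsed/time.Duration(rounds*2))
}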

Practical Examples - Optimizing for Performance

Now let's apply this understanding to real-world scenarios where runtime knowledge makes the difference.

Example 1: Work Stealing in Action

The scheduler's work-stealing mechanism is crucial for load balancing. Let's see it in action:

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Demonstrate work stealing with imbalanced workloads
func main() {
    fmt.Println("=== Work Stealing Demonstration ===")

    // Use multiple processors to enable work stealing
    numCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(numCPU)

    fmt.Printf("Using %d CPUs for work stealing\n", numCPU)

    var wg sync.WaitGroup

    // Scenario 1: Balanced workload
    fmt.Println("\n--- Scenario 1: Balanced Workload ---")
    start := time.Now()

    for i := 0; i < numCPU; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            // Equal work distribution
            for j := 0; j < 10000000; j++ {
                _ = j * j
            }
            fmt.Printf("Worker %d: Balanced work completed\n", id)
        }(i)
    }

    wg.Wait()
    balancedTime := time.Since(start)
    fmt.Printf("Balanced workload: %v\n", balancedTime)

    // Scenario 2: Imbalanced workload
    fmt.Println("\n--- Scenario 2: Imbalanced Workload ---")
    start = time.Now()

    for i := 0; i < numCPU; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            var workAmount int
            if id == 0 {
                // One goroutine gets 80% of the work
                workAmount = 8000000
                fmt.Printf("Worker %d: Heavy workload (%d iterations)\n", id, workAmount)
            } else {
                // The others split the remaining 20%
                // (this branch only runs when numCPU >= 2)
                workAmount = 2000000 / (numCPU - 1)
                fmt.Printf("Worker %d: Light workload (%d iterations)\n", id, workAmount)
            }

            for j := 0; j < workAmount; j++ {
                _ = j * j
            }
        }(i)
    }

    wg.Wait()
    imbalancedTime := time.Since(start)
    fmt.Printf("Imbalanced workload: %v\n", imbalancedTime)

    fmt.Printf("\nImbalanced/balanced time ratio: %.2fx\n",
        float64(imbalancedTime)/float64(balancedTime))
}

Example 2: Goroutine Lifecycle Management

Understanding goroutine states helps prevent leaks and optimize resource usage:

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Goroutine lifecycle patterns
func main() {
    fmt.Println("=== Goroutine Lifecycle Management ===")

    showGoroutineCount("Start")

    // Good pattern: Controlled goroutine lifecycle
    fmt.Println("\n--- Good Pattern: Worker Pool ---")
    goodWorkerPool()
    showGoroutineCount("After worker pool")

    // Bad pattern: Uncontrolled goroutine creation
    fmt.Println("\n--- Bad Pattern: Goroutine Leak ---")
    badGoroutineLeak()
    showGoroutineCount("After goroutine leak")

    // Leaked goroutines are never garbage collected - the count stays elevated
    time.Sleep(100 * time.Millisecond)
    showGoroutineCount("After waiting")
}

func showGoroutineCount(label string) {
    fmt.Printf("%-20s: %d goroutines\n", label, runtime.NumGoroutine())
}

// Good pattern: Controlled worker pool
func goodWorkerPool() {
    const numWorkers = 4
    const numJobs = 20

    var wg sync.WaitGroup
    jobs := make(chan int, numJobs)
    results := make(chan int, numJobs)

    // Start fixed number of workers
    for i := 0; i < numWorkers; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            fmt.Printf("Worker %d: Started\n", id)

            for job := range jobs {
                result := job * job
                results <- result
                fmt.Printf("Worker %d: Processed job %d\n", id, job)
            }

            fmt.Printf("Worker %d: Finished\n", id)
        }(i)
    }

    // Send jobs
    for j := 0; j < numJobs; j++ {
        jobs <- j
    }
    close(jobs)

    // Collect results
    for r := 0; r < numJobs; r++ {
        <-results
    }

    wg.Wait()
    fmt.Printf("Worker pool: All %d jobs completed with %d workers\n",
        numJobs, numWorkers)
}

// Bad pattern: Goroutine leak
func badGoroutineLeak() {
    // Create goroutines with no exit condition
    for i := 0; i < 10; i++ {
        go func(id int) {
            ticker := time.NewTicker(time.Second)
            defer ticker.Stop()

            // Only goroutine 5 ever exits; the other nine tick forever
            for range ticker.C {
                if id == 5 {
                    fmt.Printf("Goroutine %d: exiting - the rest are leaked\n", id)
                    break
                }
            }
        }(i)
    }

    fmt.Println("Created 10 goroutines, most of which will leak")
}

Example 3: Production-Ready Runtime Monitoring

// run
package main

import (
    "context"
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

// Production-grade runtime monitoring
type RuntimeMonitor struct {
    samples     []RuntimeStats
    sampleCount int64
    mu          sync.RWMutex

    // Alert thresholds
    maxGoroutines int
    maxMemoryMB   int

    // Performance metrics
    startTime time.Time
}

type RuntimeStats struct {
    Timestamp      time.Time
    Goroutines     int
    MemoryAllocMB  uint64
    HeapSysMB      uint64
    NumGC          uint32
    GCCyclesPerSec float64
}

func NewRuntimeMonitor(maxGoroutines, maxMemoryMB int) *RuntimeMonitor {
    return &RuntimeMonitor{
        samples:       make([]RuntimeStats, 0, 1000),
        maxGoroutines: maxGoroutines,
        maxMemoryMB:   maxMemoryMB,
        startTime:     time.Now(),
    }
}

func (rm *RuntimeMonitor) Start(ctx context.Context, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            rm.collectSample()
        }
    }
}

func (rm *RuntimeMonitor) collectSample() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    stats := RuntimeStats{
        Timestamp:     time.Now(),
        Goroutines:    runtime.NumGoroutine(),
        MemoryAllocMB: m.Alloc / 1024 / 1024,
        HeapSysMB:     m.HeapSys / 1024 / 1024,
        NumGC:         m.NumGC,
    }

    // Calculate GC rate
    rm.mu.RLock()
    if len(rm.samples) > 0 {
        last := rm.samples[len(rm.samples)-1]
        timeDiff := stats.Timestamp.Sub(last.Timestamp).Seconds()
        gcDiff := stats.NumGC - last.NumGC
        stats.GCCyclesPerSec = float64(gcDiff) / timeDiff
    }
    rm.mu.RUnlock()

    // Store sample
    rm.mu.Lock()
    rm.samples = append(rm.samples, stats)
    if len(rm.samples) > 1000 {
        rm.samples = rm.samples[1:] // Keep only recent samples
    }
    atomic.AddInt64(&rm.sampleCount, 1)
    rm.mu.Unlock()

    // Check for alerts
    rm.checkAlerts(stats)
}

func (rm *RuntimeMonitor) checkAlerts(stats RuntimeStats) {
    alerts := []string{}

    if stats.Goroutines > rm.maxGoroutines {
        alerts = append(alerts, fmt.Sprintf("HIGH GOROUTINE COUNT: %d > %d",
            stats.Goroutines, rm.maxGoroutines))
    }

    if int(stats.MemoryAllocMB) > rm.maxMemoryMB {
        alerts = append(alerts, fmt.Sprintf("HIGH MEMORY USAGE: %dMB > %dMB",
            stats.MemoryAllocMB, rm.maxMemoryMB))
    }

    if stats.GCCyclesPerSec > 100 {
        alerts = append(alerts, fmt.Sprintf("HIGH GC RATE: %.1f cycles/sec",
            stats.GCCyclesPerSec))
    }

    if len(alerts) > 0 {
        fmt.Printf("\n🚨 RUNTIME ALERTS at %s:\n",
            stats.Timestamp.Format("15:04:05"))
        for _, alert := range alerts {
            fmt.Printf("   %s\n", alert)
        }
    }
}

func (rm *RuntimeMonitor) Report() {
    rm.mu.RLock()
    defer rm.mu.RUnlock()

    if len(rm.samples) == 0 {
        fmt.Println("No samples collected")
        return
    }

    latest := rm.samples[len(rm.samples)-1]
    uptime := time.Since(rm.startTime)

    fmt.Printf("\n=== Runtime Monitor Report ===\n")
    fmt.Printf("Uptime: %v\n", uptime.Round(time.Second))
    fmt.Printf("Samples collected: %d\n", atomic.LoadInt64(&rm.sampleCount))
    fmt.Printf("\nCurrent State:\n")
    fmt.Printf("  Goroutines: %d\n", latest.Goroutines)
    fmt.Printf("  Memory: %d MB allocated, %d MB system\n",
        latest.MemoryAllocMB, latest.HeapSysMB)
    fmt.Printf("  GC: %d total cycles, %.1f cycles/sec\n",
        latest.NumGC, latest.GCCyclesPerSec)

    // Analyze trends
    if len(rm.samples) >= 10 {
        rm.analyzeTrends()
    }
}

func (rm *RuntimeMonitor) analyzeTrends() {
    // Get recent samples for trend analysis (caller holds rm.mu)
    recent := rm.samples[len(rm.samples)-10:]

    var goroutineTrend, memoryTrend float64
    var gcTrend float64

    for i := 1; i < len(recent); i++ {
        goroutineTrend += float64(recent[i].Goroutines - recent[i-1].Goroutines)
        memoryTrend += float64(int(recent[i].MemoryAllocMB) - int(recent[i-1].MemoryAllocMB))
        gcTrend += recent[i].GCCyclesPerSec - recent[i-1].GCCyclesPerSec
    }

    fmt.Printf("\nTrends (last 10 samples):\n")
    if goroutineTrend > 0 {
        fmt.Printf("  Goroutines: INCREASING (+%.0f)\n", goroutineTrend)
    } else {
        fmt.Printf("  Goroutines: stable/decreasing (%.0f)\n", goroutineTrend)
    }

    if memoryTrend > 0 {
        fmt.Printf("  Memory: INCREASING (+%.0f MB)\n", memoryTrend)
    } else {
        fmt.Printf("  Memory: stable/decreasing (%.0f MB)\n", memoryTrend)
    }

    if gcTrend > 0 {
        fmt.Printf("  GC Rate: INCREASING (+%.1f cycles/sec)\n", gcTrend)
    } else {
        fmt.Printf("  GC Rate: stable/decreasing (%.1f cycles/sec)\n", gcTrend)
    }
}

func main() {
    fmt.Println("=== Production Runtime Monitoring ===")

    // Create monitor with alert thresholds
    monitor := NewRuntimeMonitor(100, 100) // Alert on 100 goroutines or 100MB

    // Start monitoring
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    go monitor.Start(ctx, time.Second)

    // Simulate various workloads
    fmt.Println("Simulating workload patterns...")

    // Phase 1: Normal workload
    time.Sleep(5 * time.Second)

    // Phase 2: Burst of goroutines
    var wg sync.WaitGroup
    for i := 0; i < 50; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            time.Sleep(10 * time.Second)
        }()
    }

    fmt.Println("Created burst of 50 goroutines")
    time.Sleep(10 * time.Second)

    // Phase 3: Memory allocation burst
    data := make([][]byte, 20)
    for i := range data {
        data[i] = make([]byte, 1024*1024) // 1MB each
        time.Sleep(100 * time.Millisecond)
    }

    fmt.Println("Allocated 20MB of memory")
    time.Sleep(10 * time.Second)

    wg.Wait()
    fmt.Println("All goroutines completed")

    // Final report
    monitor.Report()
}

Common Patterns and Pitfalls

Pattern 1: Goroutine Pool Pattern

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Efficient goroutine pool implementation
type WorkerPool struct {
    workers   int
    taskQueue chan func()
    wg        sync.WaitGroup
    quit      chan struct{}
}

func NewWorkerPool(workers int) *WorkerPool {
    return &WorkerPool{
        workers:   workers,
        taskQueue: make(chan func(), workers*2),
        quit:      make(chan struct{}),
    }
}

func (wp *WorkerPool) Start() {
    for i := 0; i < wp.workers; i++ {
        wp.wg.Add(1)
        go wp.worker(i)
    }
}

func (wp *WorkerPool) worker(id int) {
    defer wp.wg.Done()

    fmt.Printf("Worker %d: Started\n", id)

    for {
        select {
        case task := <-wp.taskQueue:
            if task != nil {
                task()
            }
        case <-wp.quit:
            fmt.Printf("Worker %d: Shutting down\n", id)
            return
        }
    }
}

func (wp *WorkerPool) Submit(task func()) {
    wp.taskQueue <- task
}

func (wp *WorkerPool) Stop() {
    close(wp.quit)
    wp.wg.Wait()
}

func main() {
    fmt.Println("=== Goroutine Pool Pattern ===")

    // Compare: direct goroutines vs pool
    const tasks = 1000

    // Method 1: Direct goroutine creation
    fmt.Println("\n--- Direct Goroutine Creation ---")
    start := time.Now()

    var wg1 sync.WaitGroup
    for i := 0; i < tasks; i++ {
        wg1.Add(1)
        go func(id int) {
            defer wg1.Done()
            // Simulate work
            sum := 0
            for j := 0; j < 1000; j++ {
                sum += j * j
            }
            if id%100 == 0 {
                fmt.Printf("Direct goroutine %d completed\n", id)
            }
        }(i)
    }

    wg1.Wait()
    directTime := time.Since(start)

    // Give time for cleanup
    runtime.GC()
    time.Sleep(100 * time.Millisecond)

    // Method 2: Worker pool
    fmt.Println("\n--- Worker Pool Approach ---")
    pool := NewWorkerPool(runtime.NumCPU())
    pool.Start()
    defer pool.Stop()

    start = time.Now()

    var wg2 sync.WaitGroup
    for i := 0; i < tasks; i++ {
        wg2.Add(1)
        pool.Submit(func() {
            defer wg2.Done()
            // Simulate same work
            sum := 0
            for j := 0; j < 1000; j++ {
                sum += j * j
            }
        })
    }

    wg2.Wait()
    poolTime := time.Since(start)

    fmt.Printf("\nPerformance Comparison:\n")
    fmt.Printf("Direct goroutines: %v\n", directTime)
    fmt.Printf("Worker pool:       %v\n", poolTime)
    fmt.Printf("Improvement:       %.2fx\n", float64(directTime)/float64(poolTime))
}

Pattern 2: Context-Based Cancellation

// run
package main

import (
    "context"
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Context-based cancellation for clean shutdown
type Server struct {
    workers   int
    taskQueue chan Task
    ctx       context.Context
    cancel    context.CancelFunc
    wg        sync.WaitGroup
}

type Task struct {
    id   int
    work func() error
}

func NewServer(workers int) *Server {
    ctx, cancel := context.WithCancel(context.Background())

    return &Server{
        workers:   workers,
        taskQueue: make(chan Task, workers*2),
        ctx:       ctx,
        cancel:    cancel,
    }
}

func (s *Server) Start() {
    for i := 0; i < s.workers; i++ {
        s.wg.Add(1)
        go s.worker(i)
    }
}

func (s *Server) worker(id int) {
    defer s.wg.Done()

    fmt.Printf("Worker %d: Started\n", id)

    for {
        select {
        case <-s.ctx.Done():
            fmt.Printf("Worker %d: Context cancelled, shutting down\n", id)
            return

        case task := <-s.taskQueue:
            if task.work != nil {
                if err := task.work(); err != nil {
                    fmt.Printf("Worker %d: Task %d failed: %v\n", id, task.id, err)
                } else {
                    fmt.Printf("Worker %d: Task %d completed\n", id, task.id)
                }
            }
        }
    }
}

func (s *Server) Submit(task Task) {
    select {
    case s.taskQueue <- task:
        fmt.Printf("Server: Task %d submitted\n", task.id)
    case <-s.ctx.Done():
        fmt.Printf("Server: Rejecting task %d - server shutting down\n", task.id)
    }
}

func (s *Server) Shutdown(timeout time.Duration) error {
    fmt.Printf("Server: Initiating shutdown with %v timeout\n", timeout)

    // Cancel context to signal workers
    s.cancel()

    // Wait for workers with timeout
    done := make(chan struct{})
    go func() {
        s.wg.Wait()
        close(done)
    }()

    select {
    case <-done:
        fmt.Println("Server: All workers shut down gracefully")
        return nil
    case <-time.After(timeout):
        fmt.Println("Server: Shutdown timeout - workers may still be running")
        return fmt.Errorf("shutdown timeout after %v", timeout)
    }
}

func main() {
    fmt.Println("=== Context-Based Cancellation ===")

    server := NewServer(3)
    server.Start()
    defer server.Shutdown(5 * time.Second)

    // Submit some tasks
    for i := 0; i < 10; i++ {
        id := i // capture the loop variable for the closure (pre-Go 1.22 semantics)
        task := Task{
            id: id,
            work: func() error {
                // Simulate work with occasional errors
                if id%7 == 0 {
                    return fmt.Errorf("simulated error for task %d", id)
                }
                time.Sleep(100 * time.Millisecond)
                return nil
            },
        }

        server.Submit(task)
        time.Sleep(50 * time.Millisecond)
    }

    fmt.Printf("\nActive goroutines: %d\n", runtime.NumGoroutine())
    time.Sleep(2 * time.Second)
}

Common Pitfalls to Avoid

Pitfall 1: Goroutine Leaks

// ❌ BAD: Goroutine leak
func badLeak() {
    for i := 0; i < 100; i++ {
        go func() {
            // This goroutine never exits!
            for {
                time.Sleep(time.Second)
            }
        }()
    }
}

// ✅ GOOD: Proper lifecycle management
func goodNoLeak(ctx context.Context) {
    for i := 0; i < 100; i++ {
        go func(id int) {
            for {
                select {
                case <-ctx.Done():
                    return // Clean exit
                case <-time.After(time.Second):
                    fmt.Printf("Worker %d: Still working\n", id)
                }
            }
        }(i)
    }
}

Pitfall 2: Ignoring GC Pressure

// ❌ BAD: Excessive allocation in hot path
func badAllocation(data []int) int {
    sum := 0
    for _, v := range data {
        // Allocates a new slice on every iteration!
        temp := []int{v, v * 2, v * 3}
        sum += len(temp)
    }
    return sum
}

// ✅ GOOD: Reuse buffers or use primitive types
func goodAllocation(data []int) int {
    sum := 0
    for _, v := range data {
        // No allocation, just arithmetic
        temp := v + v*2 + v*3
        sum += temp
    }
    return sum
}
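Avoiding needless allocation is half the battle; the collector's pacing can also be tuned at runtime. Below is a minimal sketch using the standard debug.SetGCPercent API - the thresholds and allocation counts here are illustrative choices, not recommendations:

// run
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
)

var sink []byte // package-level sink forces heap allocation

// Count how many GC cycles a fixed allocation churn triggers under different
// GOGC settings. Lower GOGC => more frequent, smaller collections.
func churn(label string) {
    var before, after runtime.MemStats
    runtime.ReadMemStats(&before)

    // Allocate and discard ~256MB in 64KB chunks
    for i := 0; i < 4096; i++ {
        sink = make([]byte, 64*1024)
    }

    runtime.ReadMemStats(&after)
    fmt.Printf("%s: %d GC cycles during churn\n", label, after.NumGC-before.NumGC)
}

func main() {
    old := debug.SetGCPercent(50) // aggressive: collect when heap grows 50%
    churn("GOGC=50 ")

    debug.SetGCPercent(200) // lazy: fewer cycles, higher peak heap
    churn("GOGC=200")

    debug.SetGCPercent(old) // restore the previous setting
}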

Pitfall 3: Blocking Operations Without Context

// ❌ BAD: Blocking call without timeout
func badBlocking() {
    result := make(chan string)
    go func() {
        time.Sleep(10 * time.Second)
        result <- "done"
    }()

    fmt.Println(<-result) // If the send never happens, this blocks forever
}

// ✅ GOOD: Context-aware operations
func goodWithContext(ctx context.Context) error {
    // Buffered so the worker can always send and exit, even if we time out
    result := make(chan string, 1)

    go func() {
        select {
        case <-ctx.Done():
            return
        case <-time.After(10 * time.Second):
            result <- "done"
        }
    }()

    select {
    case res := <-result:
        fmt.Println(res)
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

Integration and Mastery - Building High-Performance Systems

Now let's integrate all these concepts into a complete, production-ready system.

// run
package main

import (
    "context"
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

// High-performance request handler integrating all runtime concepts.
// Reuses RuntimeMonitor from the monitoring example above.
type RequestHandler struct {
    // Worker pool for request processing
    pool *EnhancedWorkerPool

    // Runtime monitoring
    monitor *RuntimeMonitor

    // Lifecycle management
    ctx    context.Context
    cancel context.CancelFunc

    // Performance metrics
    requests   int64
    errors     int64
    latencySum int64
}

type Request struct {
    id      int
    payload []byte
    result  chan Response
}

type Response struct {
    data   []byte
    err    error
    timeMs int64
}

// Enhanced WorkerPool with metrics
type EnhancedWorkerPool struct {
    workers    int
    taskQueue  chan Request
    workersOut int64 // Atomic counter for completed tasks
    ctx        context.Context
    wg         sync.WaitGroup
}

func NewEnhancedWorkerPool(ctx context.Context, workers int) *EnhancedWorkerPool {
    return &EnhancedWorkerPool{
        workers:   workers,
        taskQueue: make(chan Request, workers*4),
        ctx:       ctx,
    }
}

func (p *EnhancedWorkerPool) Start() {
    for i := 0; i < p.workers; i++ {
        p.wg.Add(1)
        go p.worker(i)
    }
}

func (p *EnhancedWorkerPool) worker(id int) {
    defer p.wg.Done()

    for {
        select {
        case <-p.ctx.Done():
            return
        case req := <-p.taskQueue:
            start := time.Now()

            // Simulate request processing
            time.Sleep(time.Duration(len(req.payload)%10) * time.Millisecond)

            // Simulate occasional errors
            var err error
            if req.id%100 == 0 {
                err = fmt.Errorf("simulated error for request %d", req.id)
            }

            response := Response{
                data:   []byte(fmt.Sprintf("processed-%d", req.id)),
                err:    err,
                timeMs: time.Since(start).Milliseconds(),
            }

            req.result <- response
            atomic.AddInt64(&p.workersOut, 1)
        }
    }
}

func (p *EnhancedWorkerPool) Submit(req Request) {
    select {
    case p.taskQueue <- req:
    case <-p.ctx.Done():
        req.result <- Response{err: fmt.Errorf("pool shutting down")}
    }
}

func (p *EnhancedWorkerPool) Shutdown() {
    p.wg.Wait()
}

func (p *EnhancedWorkerPool) Stats() int64 {
    return atomic.LoadInt64(&p.workersOut)
}

func NewRequestHandler(workers int) *RequestHandler {
    ctx, cancel := context.WithCancel(context.Background())

    handler := &RequestHandler{
        pool:    NewEnhancedWorkerPool(ctx, workers),
        monitor: NewRuntimeMonitor(1000, 500), // Alert on 1000 goroutines or 500MB
        ctx:     ctx,
        cancel:  cancel,
    }

    return handler
}

func (rh *RequestHandler) Start() {
    // Start worker pool
    rh.pool.Start()

    // Start runtime monitoring
    go rh.monitor.Start(rh.ctx, time.Second)

    fmt.Println("RequestHandler: Started with runtime monitoring")
}

func (rh *RequestHandler) ProcessRequest(req Request) Response {
    atomic.AddInt64(&rh.requests, 1)

    // Submit to worker pool
    rh.pool.Submit(req)

    // Wait for response with timeout
    select {
    case resp := <-req.result:
        if resp.err != nil {
            atomic.AddInt64(&rh.errors, 1)
        }
        atomic.AddInt64(&rh.latencySum, resp.timeMs)
        return resp
    case <-time.After(5 * time.Second):
        atomic.AddInt64(&rh.errors, 1)
        return Response{err: fmt.Errorf("request timeout")}
    case <-rh.ctx.Done():
        return Response{err: fmt.Errorf("handler shutting down")}
    }
}

func (rh *RequestHandler) Shutdown() {
    fmt.Println("RequestHandler: Initiating shutdown...")

    // Cancel context to stop monitoring and workers
    rh.cancel()

    // Shutdown worker pool
    rh.pool.Shutdown()

    // Final report
    rh.printFinalReport()
}

func (rh *RequestHandler) printFinalReport() {
    requests := atomic.LoadInt64(&rh.requests)
    errors := atomic.LoadInt64(&rh.errors)
    latencySum := atomic.LoadInt64(&rh.latencySum)

    fmt.Printf("\n=== RequestHandler Final Report ===\n")
    fmt.Printf("Total requests:    %d\n", requests)
    fmt.Printf("Total errors:      %d\n", errors)
    fmt.Printf("Error rate:        %.2f%%\n", float64(errors)/float64(requests)*100)

    if requests > 0 {
        avgLatency := latencySum / requests
        fmt.Printf("Average latency:   %dms\n", avgLatency)
    }

    fmt.Printf("Worker throughput: %d tasks\n", rh.pool.Stats())

    // Runtime statistics
    rh.monitor.Report()
}

func main() {
    fmt.Println("=== High-Performance Request Handler ===")

    // Create handler with worker pool
    handler := NewRequestHandler(runtime.NumCPU() * 2)
    handler.Start()
    defer handler.Shutdown()

    // Simulate request load
    const totalRequests = 1000
    const requestRate = 100 // requests per second

    fmt.Printf("\nProcessing %d requests at %d req/sec...\n", totalRequests, requestRate)

    ticker := time.NewTicker(time.Second / time.Duration(requestRate))
    defer ticker.Stop()

    var wg sync.WaitGroup

    for i := 0; i < totalRequests; i++ {
        <-ticker.C

        wg.Add(1)
        go func(reqID int) {
            defer wg.Done()

            // Create request
            req := Request{
                id:      reqID,
                payload: make([]byte, reqID%1000), // Variable payload size
                result:  make(chan Response, 1),
            }

            // Process request
            resp := handler.ProcessRequest(req)

            if reqID%50 == 0 {
                if resp.err != nil {
                    fmt.Printf("Request %d failed: %v\n", reqID, resp.err)
                } else {
                    fmt.Printf("Request %d completed in %dms\n", reqID, resp.timeMs)
                }
            }
        }(i)
    }

    wg.Wait()

    fmt.Printf("\nAll %d requests processed\n", totalRequests)

    // Let monitoring run a bit more to collect stats
    time.Sleep(5 * time.Second)
}

Exercise 1: Scheduler Analysis

🎯 Learning Objectives:

  • Understand Go's GMP scheduler behavior and performance characteristics
  • Master runtime metrics collection and analysis
  • Learn to optimize GOMAXPROCS for different workload types
  • Practice systematic performance measurement and comparison

🌍 Real-World Context:
Understanding scheduler behavior is crucial for high-performance Go services. At Google, optimizing GOMAXPROCS for different workload types improved their ad serving system throughput by 35%. Netflix uses scheduler analysis to determine optimal resource allocation for their video processing pipelines, while Cloudflare tunes their CDN edge servers for mixed HTTP/TCP workloads.

⏱️ Time Estimate: 75-90 minutes
📊 Difficulty: Advanced

Create a program that creates goroutines with different characteristics, monitors scheduler behavior using runtime metrics, experiments with various GOMAXPROCS values, and reports comprehensive analysis of goroutine distribution and execution times. This will help you understand how to optimize Go applications for different workload patterns.

Solution
  1// run (see corrected listing below)
package main

import (
    "context"
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

type WorkloadType int

const (
    CPUBound WorkloadType = iota
    IOBound
    Mixed
)

type WorkloadStats struct {
    completed    atomic.Int64
    totalTime    atomic.Int64
    workloadType WorkloadType
}

func (ws *WorkloadStats) recordCompletion(duration time.Duration) {
    ws.completed.Add(1)
    ws.totalTime.Add(int64(duration))
}

func (ws *WorkloadStats) avgDuration() time.Duration {
    completed := ws.completed.Load()
    if completed == 0 {
        return 0
    }
    return time.Duration(ws.totalTime.Load() / completed)
}

type SchedulerAnalyzer struct {
    cpuStats   *WorkloadStats
    ioStats    *WorkloadStats
    mixedStats *WorkloadStats
}

func NewSchedulerAnalyzer() *SchedulerAnalyzer {
    return &SchedulerAnalyzer{
        cpuStats:   &WorkloadStats{workloadType: CPUBound},
        ioStats:    &WorkloadStats{workloadType: IOBound},
        mixedStats: &WorkloadStats{workloadType: Mixed},
    }
}

func (sa *SchedulerAnalyzer) runCPUBound(ctx context.Context) {
    start := time.Now()
    // Wrap in a closure so the duration is measured at return time,
    // not when the defer statement is evaluated
    defer func() { sa.cpuStats.recordCompletion(time.Since(start)) }()

    sum := 0
    for i := 0; i < 1000000; i++ {
        sum += i * i

        // Check for cancellation (a real implementation would check
        // periodically rather than on every iteration)
        select {
        case <-ctx.Done():
            return
        default:
        }
    }
}

func (sa *SchedulerAnalyzer) runIOBound(ctx context.Context) {
    start := time.Now()
    defer func() { sa.ioStats.recordCompletion(time.Since(start)) }()

    select {
    case <-time.After(10 * time.Millisecond):
    case <-ctx.Done():
    }
}

func (sa *SchedulerAnalyzer) runMixed(ctx context.Context) {
    start := time.Now()
    defer func() { sa.mixedStats.recordCompletion(time.Since(start)) }()

    // Some CPU work
    sum := 0
    for i := 0; i < 100000; i++ {
        sum += i
    }

    // Some I/O wait
    select {
    case <-time.After(5 * time.Millisecond):
    case <-ctx.Done():
    }
}

func (sa *SchedulerAnalyzer) runWorkload(ctx context.Context, workloadType WorkloadType, count int) {
    var wg sync.WaitGroup

    for i := 0; i < count; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()

            switch workloadType {
            case CPUBound:
                sa.runCPUBound(ctx)
            case IOBound:
                sa.runIOBound(ctx)
            case Mixed:
                sa.runMixed(ctx)
            }
        }()
    }

    wg.Wait()
}

func (sa *SchedulerAnalyzer) analyze(procs int) {
    runtime.GOMAXPROCS(procs)

    fmt.Printf("\n=== Analysis with GOMAXPROCS=%d ===\n", procs)

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    start := time.Now()

    // Run all three workload types concurrently
    var wg sync.WaitGroup

    wg.Add(3)
    go func() {
        defer wg.Done()
        sa.runWorkload(ctx, CPUBound, 50)
    }()
    go func() {
        defer wg.Done()
        sa.runWorkload(ctx, IOBound, 50)
    }()
    go func() {
        defer wg.Done()
        sa.runWorkload(ctx, Mixed, 50)
    }()

    wg.Wait()
    totalTime := time.Since(start)

    fmt.Printf("Total execution time: %v\n", totalTime)
    fmt.Printf("CPU-bound tasks: %d completed, avg %v\n",
        sa.cpuStats.completed.Load(), sa.cpuStats.avgDuration())
    fmt.Printf("I/O-bound tasks: %d completed, avg %v\n",
        sa.ioStats.completed.Load(), sa.ioStats.avgDuration())
    fmt.Printf("Mixed tasks: %d completed, avg %v\n",
        sa.mixedStats.completed.Load(), sa.mixedStats.avgDuration())
}

func main() {
    fmt.Println("=== Scheduler Analysis ===")
    fmt.Printf("System CPUs: %d\n", runtime.NumCPU())

    // Test with different GOMAXPROCS values
    for _, procs := range []int{1, 2, 4, runtime.NumCPU()} {
        analyzer := NewSchedulerAnalyzer()
        analyzer.analyze(procs)
    }
}

The GMP Scheduler Model

🎯 Practical Analogy: Think of GMP as a workshop with workers, workbenches, and tasks. Each worker can use any workbench, and each workbench has a queue of tasks. Workers who finish their tasks can "steal" tasks from other workbenches to keep busy.

Go's scheduler implements an M:N threading model where M goroutines run on N OS threads. This means thousands of goroutines can efficiently run on just a handful of OS threads, enabling massive concurrency without massive resource usage.

Core Components

G: Lightweight execution unit containing:

  • Stack pointer
  • Program counter
  • Status
  • Associated M
  • Cost: ~2KB starting stack size vs ~1MB for OS threads

M: OS thread that executes goroutines:

  • Has a G0 goroutine for scheduling
  • Can be bound to or released from Ps
  • Handles system calls without blocking other Ms
  • Real-world constraint: capped by the runtime at 10,000 threads by default (adjustable via debug.SetMaxThreads)

P: Execution context with resources:

  • Local run queue
  • Memory cache for fast allocations
  • Count determined by GOMAXPROCS
  • Key insight: P represents a virtual CPU core for goroutines

⚠️ Important: The M:N model is what makes Go's concurrency so efficient. Traditional 1:1 threading would limit you to hundreds of concurrent goroutines. Go's M:N model supports millions!
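A quick way to see the per-goroutine cost on your own machine is to park a large batch of goroutines and compare stack memory before and after. The sketch below (not from the original text) uses runtime.MemStats.StackSys as a rough proxy; exact numbers vary by Go version:

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// Rough per-goroutine memory cost: park 100k goroutines on a channel and
// compare stack memory obtained from the OS before and after.
func main() {
    const n = 100000

    var before, after runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&before)

    block := make(chan struct{})
    var wg sync.WaitGroup
    wg.Add(n)
    for i := 0; i < n; i++ {
        go func() {
            wg.Done()
            <-block // park in _Gwaiting
        }()
    }
    wg.Wait() // all goroutines have started

    runtime.ReadMemStats(&after)
    fmt.Printf("%d goroutines: ~%d KB of stack memory, ~%d bytes each\n",
        n, (after.StackSys-before.StackSys)/1024,
        (after.StackSys-before.StackSys)/n)

    close(block) // release them
}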

Understanding Goroutine States

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Visualize goroutine lifecycle states
func main() {
    fmt.Println("=== Goroutine State Visualization ===")

    var wg sync.WaitGroup

    // Track different goroutine states
    fmt.Printf("Initial goroutines: %d\n", runtime.NumGoroutine())

    // 1. Grunnable -> Grunning
    wg.Add(1)
    go func() {
        defer wg.Done()
        fmt.Println("Goroutine 1: Started")

        // 2. Grunning -> Gwaiting (blocked on channel)
        ch := make(chan int)
        wg.Add(1)
        go func() {
            defer wg.Done()
            time.Sleep(100 * time.Millisecond)
            ch <- 42
            fmt.Println("Goroutine 2: Sent data")
        }()

        fmt.Println("Goroutine 1: Waiting for channel")
        val := <-ch
        fmt.Printf("Goroutine 1: Received %d\n", val)

        // 3. Grunning -> Gwaiting again (time.Sleep parks on a timer;
        // actual syscalls such as file I/O would enter Gsyscall instead)
        fmt.Println("Goroutine 1: Sleeping")
        time.Sleep(50 * time.Millisecond)
        fmt.Println("Goroutine 1: Woke up")

        // 4. Grunning -> Gdead
        fmt.Println("Goroutine 1: Finishing")
    }()

    // Monitor goroutine count during transitions
    done := make(chan struct{})
    go func() {
        ticker := time.NewTicker(50 * time.Millisecond)
        defer ticker.Stop()

        for {
            select {
            case <-ticker.C:
                fmt.Printf("Active goroutines: %d\n", runtime.NumGoroutine())
            case <-done:
                return
            }
        }
    }()

    wg.Wait()
    close(done)

    fmt.Printf("Final goroutines: %d\n", runtime.NumGoroutine())
}

Optimizing GOMAXPROCS

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// GOMAXPROCS optimization for different workloads
func main() {
    fmt.Println("=== GOMAXPROCS Optimization ===")

    numCPU := runtime.NumCPU()
    fmt.Printf("System has %d CPU cores\n", numCPU)

    // Test different GOMAXPROCS values
    gomaxprocsValues := []int{1, 2, numCPU, numCPU * 2}

    for _, procs := range gomaxprocsValues {
        runtime.GOMAXPROCS(procs)
        fmt.Printf("\n--- Testing with GOMAXPROCS=%d ---\n", procs)

        // Test CPU-bound workload
        cpuTime := benchmarkCPUWorkload()
        fmt.Printf("CPU-bound: %v\n", cpuTime)

        // Test I/O-bound workload
        ioTime := benchmarkIOWorkload()
        fmt.Printf("I/O-bound: %v\n", ioTime)

        // Test mixed workload
        mixedTime := benchmarkMixedWorkload()
        fmt.Printf("Mixed:     %v\n", mixedTime)
    }

    // Reset to default
    runtime.GOMAXPROCS(numCPU)
    fmt.Printf("\nReset GOMAXPROCS to %d\n", numCPU)
}

func benchmarkCPUWorkload() time.Duration {
    const tasks = 100
    var wg sync.WaitGroup

    start := time.Now()

    for i := 0; i < tasks; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // CPU-intensive work
            sum := 0
            for j := 0; j < 100000; j++ {
                sum += j * j
            }
            _ = sum
        }()
    }

    wg.Wait()
    return time.Since(start)
}

func benchmarkIOWorkload() time.Duration {
    const tasks = 100
    var wg sync.WaitGroup

    start := time.Now()

    for i := 0; i < tasks; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // I/O simulation
            time.Sleep(10 * time.Millisecond)
        }()
    }

    wg.Wait()
    return time.Since(start)
}

func benchmarkMixedWorkload() time.Duration {
    const tasks = 100
    var wg sync.WaitGroup

    start := time.Now()

    for i := 0; i < tasks; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Mix of CPU and I/O
            sum := 0
            for j := 0; j < 50000; j++ {
                sum += j * j
            }
            _ = sum
            time.Sleep(5 * time.Millisecond)
        }()
    }

    wg.Wait()
    return time.Since(start)
}

The same levers combine naturally with live goroutine monitoring:

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Visualizing the GMP Model
func main() {
    // Set GOMAXPROCS to number of CPUs
    numCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(numCPU)

    fmt.Printf("System Info:\n")
    fmt.Printf("- CPUs: %d\n", numCPU)
    fmt.Printf("- GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
    fmt.Printf("- NumGoroutine: %d\n", runtime.NumGoroutine())

    // Create multiple goroutines
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            // CPU-bound work
            sum := 0
            for j := 0; j < 1000000; j++ {
                sum += j
            }

            fmt.Printf("Goroutine %d done (sum=%d)\n", id, sum)
        }(i)
    }

    // Monitor goroutine count
    ticker := time.NewTicker(100 * time.Millisecond)
    done := make(chan struct{})

    go func() {
        for {
            select {
            case <-ticker.C:
                fmt.Printf("Active goroutines: %d\n", runtime.NumGoroutine())
            case <-done:
                ticker.Stop()
                return
            }
        }
    }()

    wg.Wait()
    close(done)

    fmt.Printf("Final goroutine count: %d\n", runtime.NumGoroutine())
}

Real-world Example: Google's search backend uses this exact model. Millions of search queries are processed on just a few hundred cores, with efficient scheduling ensuring fast response times even under massive load.

Scheduler Data Structures

// run
package main

import (
    "fmt"
    "runtime"
    "runtime/pprof"
    "sync"
    "sync/atomic"
    "time"
)

// SchedulerStats provides insight into scheduler behavior
type SchedulerStats struct {
    Gs int // Number of goroutines
    Ms int // OS threads created so far (approximation)
    Ps int // Number of processors
}

func GetSchedulerStats() SchedulerStats {
    return SchedulerStats{
        Gs: runtime.NumGoroutine(),
        // The runtime does not expose a live thread count; the threadcreate
        // profile counts threads ever created, which serves as an upper bound
        Ms: pprof.Lookup("threadcreate").Count(),
        Ps: runtime.GOMAXPROCS(0),
    }
}

// Production Example: Monitor scheduler pressure
type SchedulerMonitor struct {
    mu          sync.Mutex
    samples     []SchedulerStats
    maxGs       atomic.Int64
    sampleCount atomic.Int64
}

func NewSchedulerMonitor() *SchedulerMonitor {
    return &SchedulerMonitor{
        samples: make([]SchedulerStats, 0, 1000),
    }
}

func (sm *SchedulerMonitor) Start(interval time.Duration) chan struct{} {
    stop := make(chan struct{})

    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()

        for {
            select {
            case <-ticker.C:
                stats := GetSchedulerStats()

                // Track maximum goroutines
                if int64(stats.Gs) > sm.maxGs.Load() {
                    sm.maxGs.Store(int64(stats.Gs))
                }

                sm.mu.Lock()
                sm.samples = append(sm.samples, stats)
                sm.mu.Unlock()
                sm.sampleCount.Add(1)

            case <-stop:
                return
            }
        }
    }()

    return stop
}

func (sm *SchedulerMonitor) Report() {
    fmt.Printf("\n=== Scheduler Monitor Report ===\n")
    fmt.Printf("Samples collected: %d\n", sm.sampleCount.Load())
    fmt.Printf("Peak goroutines: %d\n", sm.maxGs.Load())

    sm.mu.Lock()
    defer sm.mu.Unlock()
    if len(sm.samples) > 0 {
        last := sm.samples[len(sm.samples)-1]
        fmt.Printf("Current state:\n")
        fmt.Printf("  - Goroutines: %d\n", last.Gs)
        fmt.Printf("  - Threads created: %d\n", last.Ms)
        fmt.Printf("  - Processors: %d\n", last.Ps)
    }
}

func main() {
    monitor := NewSchedulerMonitor()
    stop := monitor.Start(50 * time.Millisecond)

    // Simulate workload
    var done atomic.Bool
    for i := 0; i < 100; i++ {
        go func(id int) {
            for !done.Load() {
                // Simulate work
                time.Sleep(10 * time.Millisecond)
            }
        }(i)
    }

    time.Sleep(500 * time.Millisecond)
    done.Store(true)
    time.Sleep(100 * time.Millisecond)

    close(stop)
    monitor.Report()
}

Goroutine Lifecycle

🎯 Practical Analogy: A goroutine's lifecycle is like a worker's day at the factory: starts idle, gets scheduled, works actively, takes breaks, and goes home.

States

A goroutine transitions through several states:

  1. _Gidle: Just allocated, not initialized
  2. _Grunnable: On a run queue, ready to execute
  3. _Grunning: Currently executing on an M
  4. _Gsyscall: Executing a system call
  5. _Gwaiting: Blocked
  6. _Gdead: Execution completed

💡 Key Takeaway: Most performance problems occur when too many goroutines are in the _Gwaiting state, indicating I/O bottlenecks, or when the run queues are too long, indicating CPU saturation.

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Demonstrating goroutine state transitions
func main() {
    var wg sync.WaitGroup
    ch := make(chan int)

    // Goroutine lifecycle example
    wg.Add(1)
    go func() {
        defer wg.Done()

        // State: Grunnable -> Grunning
        fmt.Println("Goroutine started")

        // State: Grunning -> Gwaiting
        fmt.Println("Waiting on channel...")
        val := <-ch

        // State: Gwaiting -> Grunnable -> Grunning
        fmt.Printf("Received: %d\n", val)

        // State: Grunning -> Gwaiting
        time.Sleep(100 * time.Millisecond)

        fmt.Println("Goroutine completing...")
        // State: Grunning -> Gdead
    }()

    // Let goroutine reach waiting state
    time.Sleep(50 * time.Millisecond)

    fmt.Printf("Active goroutines: %d\n", runtime.NumGoroutine())

    // Wake up waiting goroutine
    ch <- 42

    wg.Wait()
    fmt.Println("All goroutines completed")
}

Goroutine Creation and Destruction

// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Understanding goroutine overhead
type GoroutinePool struct {
    workers int
    jobs    chan func()
    wg      sync.WaitGroup
}

func NewGoroutinePool(workers int) *GoroutinePool {
    pool := &GoroutinePool{
        workers: workers,
        jobs:    make(chan func(), workers*2),
    }

    // Pre-create goroutines
    for i := 0; i < workers; i++ {
        pool.wg.Add(1)
        go pool.worker()
    }

    return pool
}

func (p *GoroutinePool) worker() {
    defer p.wg.Done()

    for job := range p.jobs {
        job()
    }
}

func (p *GoroutinePool) Submit(job func()) {
    p.jobs <- job
}

func (p *GoroutinePool) Shutdown() {
    close(p.jobs)
    p.wg.Wait()
}

// Compare: Pool vs On-demand goroutines
func BenchmarkGoroutineCreation() {
    const tasks = 10000

    // Method 1: Create goroutine per task
    start := time.Now()
    var wg sync.WaitGroup
    for i := 0; i < tasks; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            // Minimal work
            _ = 1 + 1
        }()
    }
    wg.Wait()
    elapsed1 := time.Since(start)

    // Method 2: Use goroutine pool
    start = time.Now()
    pool := NewGoroutinePool(runtime.NumCPU())
    var wg2 sync.WaitGroup
    for i := 0; i < tasks; i++ {
        wg2.Add(1)
        pool.Submit(func() {
            defer wg2.Done()
            _ = 1 + 1
        })
    }
    wg2.Wait()
    pool.Shutdown()
    elapsed2 := time.Since(start)

    fmt.Printf("On-demand goroutines: %v\n", elapsed1)
    fmt.Printf("Goroutine pool: %v\n", elapsed2)
    fmt.Printf("Speedup: %.2fx\n", float64(elapsed1)/float64(elapsed2))
}

func main() {
    fmt.Println("Initial goroutines:", runtime.NumGoroutine())
    BenchmarkGoroutineCreation()

    // Give the runtime a moment to recycle exited goroutines
    runtime.GC()
    time.Sleep(10 * time.Millisecond)
    fmt.Println("Final goroutines:", runtime.NumGoroutine())
}

Work Stealing and Load Balancing

🎯 Practical Analogy: Work stealing is like having multiple checkout counters in a supermarket. If one cashier finishes their queue and another has a long line, the free cashier can "steal" customers from the busy one. This keeps all cashiers productive and prevents long waits.

The scheduler implements work stealing to balance load across Ps. This is crucial for preventing situations where one processor is overloaded while others sit idle.

Work Stealing Algorithm

💡 Key Takeaway: Work stealing makes Go's scheduler extremely efficient at load balancing. Without it, you could have one processor struggling with 1000 goroutines while another sits idle with nothing to do.

// run
package main

import (
    "fmt"
    "math/rand"
    "runtime"
    "sync"
    "sync/atomic"
    "time"
)

// Simulating work stealing behavior
type WorkStealingScheduler struct {
    numProcs  int
    localQs   []chan func() // Local run queues
    globalQ   chan func()   // Global run queue
    done      chan struct{}
    processed atomic.Int64
}

func NewWorkStealingScheduler(numProcs int) *WorkStealingScheduler {
    ws := &WorkStealingScheduler{
        numProcs: numProcs,
        localQs:  make([]chan func(), numProcs),
        globalQ:  make(chan func(), 256),
        done:     make(chan struct{}),
    }

    for i := range ws.localQs {
        ws.localQs[i] = make(chan func(), 256)
    }

    return ws
}

func (ws *WorkStealingScheduler) Start() {
    for p := 0; p < ws.numProcs; p++ {
        go ws.processor(p)
    }
}

func (ws *WorkStealingScheduler) processor(pid int) {
    localQ := ws.localQs[pid]

    for {
        select {
        case <-ws.done:
            return

        case job := <-localQ:
            // Execute from local queue
            job()
            ws.processed.Add(1)

        case job := <-ws.globalQ:
            // Execute from global queue
            job()
            ws.processed.Add(1)

        default:
            // Try stealing from a neighbor's queue
            // (the real scheduler picks random victims)
            victim := (pid + 1) % ws.numProcs
            select {
            case job := <-ws.localQs[victim]:
                // Stole work!
                job()
                ws.processed.Add(1)
            case <-time.After(time.Microsecond):
                // No work available
            }
        }
    }
}

func (ws *WorkStealingScheduler) Submit(job func()) {
    // Try local queue of random P
    pid := rand.Intn(ws.numProcs)
    select {
    case ws.localQs[pid] <- job:
        return
    default:
        // Local queue full, use global queue
        ws.globalQ <- job
    }
}

func (ws *WorkStealingScheduler) Stop() {
    close(ws.done)
}

func main() {
    runtime.GOMAXPROCS(4)

    scheduler := NewWorkStealingScheduler(4)
    scheduler.Start()

    // Submit unbalanced workload
    const totalJobs = 1000

    start := time.Now()
    var wg sync.WaitGroup

    for i := 0; i < totalJobs; i++ {
        wg.Add(1)
        scheduler.Submit(func() {
            defer wg.Done()
            // Variable work duration
            time.Sleep(time.Duration(rand.Intn(10)) * time.Microsecond)
        })
    }

    wg.Wait()
    elapsed := time.Since(start)
    scheduler.Stop()

    fmt.Printf("Processed %d jobs in %v\n", scheduler.processed.Load(), elapsed)
    fmt.Printf("Throughput: %.2f jobs/ms\n",
        float64(totalJobs)/float64(elapsed.Milliseconds()))
}

Real-world Example: Netflix's video encoding service uses work stealing to balance encoding tasks across available CPU cores. When one core finishes its video chunk, it can steal work from overloaded cores, ensuring consistent encoding throughput across their massive video library.

Global Run Queue

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Understanding global vs local run queue behavior
func demonstrateRunQueues() {
    runtime.GOMAXPROCS(2) // Use 2 Ps

    var wg sync.WaitGroup

    // Scenario 1: Balanced load
    fmt.Println("=== Scenario 1: Balanced Load ===")
    start := time.Now()
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            time.Sleep(time.Millisecond)
        }(i)

        if i % 10 == 0 {
            time.Sleep(time.Millisecond) // Give scheduler time
        }
    }
    wg.Wait()
    fmt.Printf("Balanced load completed in: %v\n\n", time.Since(start))

    // Scenario 2: Burst load
    fmt.Println("=== Scenario 2: Burst Load ===")
    start = time.Now()
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            time.Sleep(time.Millisecond)
        }(i)
        // No sleep - create burst
    }
    wg.Wait()
    fmt.Printf("Burst load completed in: %v\n\n", time.Since(start))
}

// Production pattern: Rate limiting goroutine creation
type RateLimitedSpawner struct {
    limiter  chan struct{}
    maxBurst int
}

func NewRateLimitedSpawner(maxConcurrent int) *RateLimitedSpawner {
    return &RateLimitedSpawner{
        limiter:  make(chan struct{}, maxConcurrent),
        maxBurst: maxConcurrent,
    }
}

func (r *RateLimitedSpawner) Spawn(job func()) {
    r.limiter <- struct{}{} // Acquire token

    go func() {
        defer func() { <-r.limiter }() // Release token
        job()
    }()
}

func main() {
    demonstrateRunQueues()

    // Use rate-limited spawner
    fmt.Println("=== Rate-Limited Spawner ===")
    spawner := NewRateLimitedSpawner(runtime.NumCPU() * 2)

    var wg sync.WaitGroup
    start := time.Now()

    for i := 0; i < 100; i++ {
        wg.Add(1)
        spawner.Spawn(func() {
            defer wg.Done()
            time.Sleep(time.Millisecond)
        })
    }

    wg.Wait()
    fmt.Printf("Rate-limited completed in: %v\n", time.Since(start))
}
// run
```

### Preemption Mechanisms

**🎯 Practical Analogy**: Preemption is like a teacher making sure no student hogs the blackboard. Some students naturally share, but occasionally the teacher needs to interrupt a student who's been writing too long.

Go combines cooperative and asynchronous preemption to keep any single goroutine from monopolizing a CPU. This ensures that even badly behaved goroutines don't starve others.

#### Cooperative Preemption

**💡 Key Takeaway**: Before Go 1.14, preemption was cooperative only: goroutines had to voluntarily yield control, which meant a CPU-bound goroutine could starve others if it never made a function call.

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
    "time"
)

// Before Go 1.14: cooperative preemption only.
// Goroutines must yield.

func demonstrateCooperativePreemption() {
    runtime.GOMAXPROCS(1) // Single P

    done := make(chan bool)
    counter := atomic.Int64{}

    // CPU-bound goroutine with function calls
    go func() {
        for i := 0; i < 10; i++ {
            // This function call is a preemption point
            counter.Add(1)
            runtime.Gosched() // Explicit yield
        }
        done <- true
    }()

    // Another goroutine
    go func() {
        for i := 0; i < 10; i++ {
            counter.Add(1)
            runtime.Gosched()
        }
        done <- true
    }()

    <-done
    <-done

    fmt.Printf("Final counter: %d\n", counter.Load())
}

// Tight loop without preemption points
func problematicTightLoop() {
    fmt.Println("\n=== Tight Loop ===")
    runtime.GOMAXPROCS(1)

    start := time.Now()
    timeout := time.After(100 * time.Millisecond)

    go func() {
        // Tight loop - no function calls, no preemption points
        sum := 0
        // In Go 1.14+, async preemption saves us
        for i := 0; i < 100000000; i++ {
            sum += i
        }
        fmt.Printf("Tight loop done: sum=%d\n", sum)
    }()

    go func() {
        <-timeout
        fmt.Printf("Timeout goroutine ran after %v\n", time.Since(start))
    }()

    time.Sleep(200 * time.Millisecond)
}

func main() {
    demonstrateCooperativePreemption()
    problematicTightLoop()
}
// run
```

#### Asynchronous Preemption

**⚠️ Important**: Asynchronous preemption, added in Go 1.14, was a game-changer! It solved the "tight loop problem" where CPU-bound goroutines could starve others. The runtime now sends signals to interrupt goroutines that have been running too long.

**💡 Key Takeaway**: With async preemption, even tight loops without function calls will be preempted. This makes Go's scheduler much more robust and fair.

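You can observe the difference directly: the documented GODEBUG setting asyncpreemptoff=1 disables asynchronous preemption, restoring the pre-1.14 behavior. A small experiment to run both ways (the loop bound and sleep duration are arbitrary choices):

```go
// run
// Try: GODEBUG=asyncpreemptoff=1 go run main.go
// With async preemption off and a single P, main may not wake from its
// sleep until the tight loop finishes; with it on, main wakes on time.
package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1)

    go func() {
        sum := 0
        for i := 0; i < 2_000_000_000; i++ { // tight loop, no function calls
            sum += i
        }
        fmt.Println("loop done:", sum)
    }()

    start := time.Now()
    time.Sleep(50 * time.Millisecond) // waking up requires getting the P back
    fmt.Printf("main woke after %v\n", time.Since(start))
}
```
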
```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
    "time"
)

// Go 1.14+ uses signals for asynchronous preemption.
// Goroutines can be preempted even in tight loops.

func demonstrateAsyncPreemption() {
    runtime.GOMAXPROCS(1) // Single P to show preemption

    counter := atomic.Int64{}
    done := make(chan struct{})

    // CPU-bound goroutine WITHOUT explicit yields
    go func() {
        // Tight loop - will be preempted asynchronously
        for i := 0; i < 1000000; i++ {
            counter.Add(1)
            // No runtime.Gosched() needed!
        }
        done <- struct{}{}
    }()

    // Monitor goroutine
    ticker := time.NewTicker(10 * time.Millisecond)
    defer ticker.Stop()

    samples := 0
    for {
        select {
        case <-ticker.C:
            samples++
            fmt.Printf("Monitor tick %d: counter=%d\n",
                samples, counter.Load())

        case <-done:
            fmt.Printf("Completed with %d monitor samples\n", samples)
            return
        }
    }
}

// Production example: CPU-bound task with progress reporting
type CPUBoundTask struct {
    progress atomic.Int64
    total    int64
}

func (t *CPUBoundTask) Run() {
    // Simulate CPU-intensive work
    for i := int64(0); i < t.total; i++ {
        // Busy-work calculation (the exact formula is illustrative)
        result := 1
        for j := 0; j < 100; j++ {
            result = (result * (j + 1)) % 1000000007
        }

        t.progress.Add(1)
    }
}

func (t *CPUBoundTask) Progress() float64 {
    return float64(t.progress.Load()) / float64(t.total) * 100
}

func main() {
    fmt.Println("=== Asynchronous Preemption Demo ===")
    demonstrateAsyncPreemption()

    fmt.Println("\n=== CPU-Bound Task with Progress ===")
    task := &CPUBoundTask{total: 1000}

    done := make(chan struct{})
    go func() {
        task.Run()
        close(done)
    }()

    // Progress reporter
    ticker := time.NewTicker(50 * time.Millisecond)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            fmt.Printf("Progress: %.1f%%\n", task.Progress())
        case <-done:
            fmt.Printf("Completed: %.1f%%\n", task.Progress())
            return
        }
    }
}
// run
```

**Real-world Example**: Before Go 1.14, Bitcoin mining software written in Go had to add explicit runtime.Gosched() calls to prevent mining goroutines from starving other parts of the system. Now async preemption handles this automatically.


## Garbage Collection

**🎯 Practical Analogy**: Go's garbage collector is like an automated cleaning crew that continuously monitors your workspace, identifies what you're no longer using, and cleans it up while you continue working. This is unlike languages where you have to manually throw things away or stop everything to clean.

Go uses a concurrent, tri-color mark-and-sweep garbage collector that runs in parallel with your program, minimizing pause times.

### Tricolor Marking Algorithm

**πŸ’‘ Key Takeaway**: The tri-color algorithm allows the GC to run concurrently with your program. Objects are colored white, gray, or black. This enables incremental collection without stopping the world.

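To make the colors concrete, here is a toy mark phase over a hand-built object graph. This only simulates the invariant; the object type, color values, and graph are illustrative and have nothing to do with the runtime's actual representation:

```go
// run
package main

import "fmt"

type color int

const (
    white color = iota // not yet visited (garbage if it stays white)
    gray               // reachable, children not yet scanned
    black              // reachable, fully scanned
)

type object struct {
    name     string
    children []*object
    c        color
}

// mark simulates the tri-color mark phase: roots are shaded gray,
// gray objects are scanned (shading their white children gray),
// and fully scanned objects turn black.
func mark(roots []*object) {
    var grayQ []*object
    for _, r := range roots {
        r.c = gray
        grayQ = append(grayQ, r)
    }

    for len(grayQ) > 0 {
        obj := grayQ[0]
        grayQ = grayQ[1:]

        for _, child := range obj.children {
            if child.c == white {
                child.c = gray
                grayQ = append(grayQ, child)
            }
        }
        obj.c = black
    }
}

func main() {
    leaked := &object{name: "unreachable"}
    b := &object{name: "b"}
    a := &object{name: "a", children: []*object{b}}
    _ = leaked

    mark([]*object{a})

    names := []string{"white", "gray", "black"}
    for _, obj := range []*object{a, b, leaked} {
        fmt.Printf("%-12s -> %s\n", obj.name, names[obj.c])
    }
    fmt.Println("white objects are garbage and would be swept")
}
// run
```
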
```go
// run
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
    "time"
)

// Understanding GC phases and timing
func demonstrateGCPhases() {
    // Configure GC
    debug.SetGCPercent(100) // Default: trigger at 100% heap growth

    var m runtime.MemStats

    // Allocate memory
    fmt.Println("=== Phase 1: Allocation ===")
    runtime.ReadMemStats(&m)
    fmt.Printf("Alloc before: %d MB\n", m.Alloc/1024/1024)

    // Allocate large objects
    data := make([][]byte, 100)
    for i := range data {
        data[i] = make([]byte, 1024*1024) // 1 MB each
    }

    runtime.ReadMemStats(&m)
    fmt.Printf("Alloc after: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("NumGC: %d\n\n", m.NumGC)

    // Force GC
    fmt.Println("=== Phase 2: Forced GC ===")
    start := time.Now()
    runtime.GC()
    gcTime := time.Since(start)

    runtime.ReadMemStats(&m)
    fmt.Printf("GC completed in: %v\n", gcTime)
    fmt.Printf("Alloc after GC: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("NumGC: %d\n", m.NumGC)
    fmt.Printf("PauseNs: %v\n\n", time.Duration(m.PauseNs[(m.NumGC+255)%256]))

    // Keep data alive
    _ = data
}

// GC pacing and tuning
type GCMonitor struct {
    lastGC     uint32
    gcCount    int
    pauseTimes []time.Duration
}

func NewGCMonitor() *GCMonitor {
    return &GCMonitor{
        pauseTimes: make([]time.Duration, 0, 100),
    }
}

func (g *GCMonitor) Check() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    if m.NumGC > g.lastGC {
        // New GC occurred
        newGCs := int(m.NumGC - g.lastGC)
        g.gcCount += newGCs

        // Record pause times (PauseNs is a circular buffer of 256 entries)
        for i := 0; i < newGCs; i++ {
            idx := (m.NumGC - uint32(i) + 255) % 256
            pause := time.Duration(m.PauseNs[idx])
            g.pauseTimes = append(g.pauseTimes, pause)
        }

        g.lastGC = m.NumGC
    }
}

func (g *GCMonitor) Report() {
    if len(g.pauseTimes) == 0 {
        fmt.Println("No GC events recorded")
        return
    }

    var total time.Duration
    var max time.Duration

    for _, p := range g.pauseTimes {
        total += p
        if p > max {
            max = p
        }
    }

    avg := total / time.Duration(len(g.pauseTimes))

    fmt.Printf("\n=== GC Monitor Report ===\n")
    fmt.Printf("Total GC cycles: %d\n", g.gcCount)
    fmt.Printf("Pause times:\n")
    fmt.Printf("  - Average: %v\n", avg)
    fmt.Printf("  - Maximum: %v\n", max)
    fmt.Printf("  - Total: %v\n", total)
}

func main() {
    demonstrateGCPhases()

    // Monitor GC during workload
    monitor := NewGCMonitor()

    fmt.Println("=== Running Workload with GC Monitoring ===")
    for i := 0; i < 50; i++ {
        // Allocate and discard
        _ = make([]byte, 1024*1024)

        if i%10 == 0 {
            monitor.Check()
        }
    }

    runtime.GC()
    monitor.Check()
    monitor.Report()
}
// run
```

**⚠️ Important**: Go's GC is designed for low latency rather than maximum throughput. This makes it ideal for web services and real-time systems where pause times matter more than raw collection speed.

**Real-world Example**: Uber's ride-sharing service relies on Go's low-latency GC to handle millions of concurrent ride requests without noticeable pauses that could affect ride matching and pricing algorithms.


### Write Barriers

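Conceptually, a write barrier is a small hook around every pointer store that runs while the collector is marking. The sketch below is simplified pseudocode in the style of a Dijkstra insertion barrier, shown as comments because the real barrier is emitted by the compiler and lives inside the runtime; all names here are illustrative:

```go
// Pseudocode sketch - NOT the runtime's actual implementation.
//
// func writePointer(slot **object, ptr *object) {
//     if gcPhase == marking {
//         shade(ptr) // color the new target gray so marking cannot miss it
//     }
//     *slot = ptr // the actual pointer store
// }
```
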
```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Write barriers protect heap pointers during concurrent marking
// They ensure the GC doesn't miss newly created references

type Node struct {
    Value int
    Next  *Node
}

// Demonstrating pointer updates during GC
func demonstrateWriteBarriers() {
    runtime.GOMAXPROCS(2)

    // Create linked list
    head := &Node{Value: 1}
    current := head

    for i := 2; i <= 100; i++ {
        current.Next = &Node{Value: i}
        current = current.Next
    }

    var wg sync.WaitGroup

    // Goroutine 1: Modify list
    wg.Add(1)
    go func() {
        defer wg.Done()

        current := head
        for current != nil {
            // Update pointer
            if current.Next != nil {
                current.Next = &Node{
                    Value: current.Next.Value * 2,
                    Next:  current.Next.Next,
                }
            }
            current = current.Next
            time.Sleep(time.Microsecond)
        }
    }()

    // Goroutine 2: Trigger GCs
    wg.Add(1)
    go func() {
        defer wg.Done()

        for i := 0; i < 10; i++ {
            runtime.GC()
            time.Sleep(time.Millisecond)
        }
    }()

    wg.Wait()

    // Verify list integrity
    count := 0
    current = head
    for current != nil {
        count++
        current = current.Next
    }

    fmt.Printf("List nodes after concurrent modification: %d\n", count)
}

// Production pattern: GC-friendly data structures
type GCFriendlyCache struct {
    mu    sync.RWMutex
    items map[string]*cacheEntry
}

type cacheEntry struct {
    value      interface{}
    expiration time.Time
}

func NewGCFriendlyCache() *GCFriendlyCache {
    cache := &GCFriendlyCache{
        items: make(map[string]*cacheEntry),
    }

    // Periodic cleanup to help GC
    go cache.cleanup()

    return cache
}

func (c *GCFriendlyCache) Set(key string, value interface{}, ttl time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()

    c.items[key] = &cacheEntry{
        value:      value,
        expiration: time.Now().Add(ttl),
    }
}

func (c *GCFriendlyCache) Get(key string) (interface{}, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()

    entry, ok := c.items[key]
    if !ok || time.Now().After(entry.expiration) {
        return nil, false
    }

    return entry.value, true
}

func (c *GCFriendlyCache) cleanup() {
    ticker := time.NewTicker(time.Minute)
    defer ticker.Stop()

    for range ticker.C {
        c.mu.Lock()
        now := time.Now()

        // Remove expired entries to reduce GC pressure
        for key, entry := range c.items {
            if now.After(entry.expiration) {
                delete(c.items, key)
            }
        }

        c.mu.Unlock()

        // Give GC a chance to run
        runtime.Gosched()
    }
}

func main() {
    demonstrateWriteBarriers()

    // Test GC-friendly cache
    fmt.Println("\n=== GC-Friendly Cache ===")
    cache := NewGCFriendlyCache()

    // Add items
    for i := 0; i < 1000; i++ {
        cache.Set(fmt.Sprintf("key%d", i), i, time.Second)
    }

    // Trigger GC
    runtime.GC()

    // Verify
    if val, ok := cache.Get("key500"); ok {
        fmt.Printf("Retrieved value: %v\n", val)
    }
}
// run
```

### GC Tuning

```go
// run
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
    "time"
)

// GOGC tuning strategies
func demonstrateGOGCTuning() {
    scenarios := []struct {
        name  string
        gogc  int
        memMB int
    }{
        {"Aggressive GC", 50, 100}, // GC at 50% growth
        {"Default GC", 100, 100},   // GC at 100% growth
        {"Relaxed GC", 200, 100},   // GC at 200% growth
        {"Minimal GC", 500, 100},   // GC at 500% growth
    }

    for _, scenario := range scenarios {
        fmt.Printf("\n=== %s (GOGC=%d) ===\n", scenario.name, scenario.gogc)

        // Reset GC
        debug.SetGCPercent(scenario.gogc)
        runtime.GC()

        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        startGC := m.NumGC

        start := time.Now()

        // Allocate memory
        data := make([][]byte, scenario.memMB)
        for i := 0; i < scenario.memMB; i++ {
            data[i] = make([]byte, 1024*1024)
        }

        elapsed := time.Since(start)

        runtime.ReadMemStats(&m)
        gcCount := m.NumGC - startGC

        fmt.Printf("Allocation time: %v\n", elapsed)
        fmt.Printf("GC cycles: %d\n", gcCount)
        fmt.Printf("Alloc: %d MB\n", m.Alloc/1024/1024)

        // Keep data alive
        _ = data[0]
    }

    // Restore default
    debug.SetGCPercent(100)
}

// GOMEMLIMIT - soft memory limit (Go 1.19+)
func demonstrateMemoryLimit() {
    fmt.Println("\n=== Memory Limit ===")

    // Set soft memory limit to 100 MB
    // In production: export GOMEMLIMIT=100MiB
    limit := int64(100 * 1024 * 1024)
    debug.SetMemoryLimit(limit)

    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    fmt.Printf("Memory limit: %d MB\n", limit/1024/1024)
    fmt.Printf("Current alloc: %d MB\n", m.Alloc/1024/1024)

    // Try to allocate beyond the limit
    data := make([][]byte, 0, 200)

    for i := 0; i < 200; i++ {
        data = append(data, make([]byte, 1024*1024))

        if i%20 == 0 {
            runtime.ReadMemStats(&m)
            fmt.Printf("Allocated %d MB, GC cycles: %d\n",
                i+1, m.NumGC)
        }
    }

    runtime.ReadMemStats(&m)
    fmt.Printf("Final alloc: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("Total GC cycles: %d\n", m.NumGC)

    _ = data[0]
}

// Production pattern: Memory-aware caching
type MemoryAwareCache struct {
    items     map[string][]byte
    maxMemory uint64
}

func NewMemoryAwareCache(maxMemoryMB uint64) *MemoryAwareCache {
    return &MemoryAwareCache{
        items:     make(map[string][]byte),
        maxMemory: maxMemoryMB * 1024 * 1024,
    }
}

func (c *MemoryAwareCache) Set(key string, value []byte) bool {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    if m.Alloc >= c.maxMemory {
        // Memory pressure - reject
        return false
    }

    c.items[key] = value
    return true
}

func (c *MemoryAwareCache) Get(key string) ([]byte, bool) {
    val, ok := c.items[key]
    return val, ok
}

func main() {
    demonstrateGOGCTuning()
    demonstrateMemoryLimit()

    // Test memory-aware cache
    fmt.Println("\n=== Memory-Aware Cache ===")
    cache := NewMemoryAwareCache(50) // 50 MB limit

    for i := 0; i < 100; i++ {
        data := make([]byte, 1024*1024)
        if !cache.Set(fmt.Sprintf("key%d", i), data) {
            fmt.Printf("Cache rejected at item %d\n", i)
            break
        }
    }
}
// run
```

## Memory Allocator

Go's memory allocator is based on TCMalloc and optimized for concurrent allocation.

### Allocator Components

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
)

// Memory allocator hierarchy:
// 1. mcache - Per-P cache
// 2. mcentral - Central cache per size class
// 3. mheap - Global heap

func demonstrateAllocatorBehavior() {
    var m runtime.MemStats

    // Small object allocation
    fmt.Println("=== Small Object Allocation ===")
    runtime.ReadMemStats(&m)
    baseAlloc := m.TotalAlloc

    smallObjects := make([][]byte, 1000)
    for i := range smallObjects {
        smallObjects[i] = make([]byte, 1024) // 1 KB
    }

    runtime.ReadMemStats(&m)
    fmt.Printf("Allocated: %d KB\n", (m.TotalAlloc-baseAlloc)/1024)
    fmt.Printf("Mallocs: %d\n", m.Mallocs)
    fmt.Printf("HeapObjects: %d\n\n", m.HeapObjects)

    // Large object allocation
    fmt.Println("=== Large Object Allocation ===")
    runtime.ReadMemStats(&m)
    baseAlloc = m.TotalAlloc

    largeObjects := make([][]byte, 100)
    for i := range largeObjects {
        largeObjects[i] = make([]byte, 64*1024) // 64 KB
    }

    runtime.ReadMemStats(&m)
    fmt.Printf("Allocated: %d KB\n", (m.TotalAlloc-baseAlloc)/1024)
    fmt.Printf("Mallocs: %d\n", m.Mallocs)
    fmt.Printf("HeapObjects: %d\n", m.HeapObjects)

    // Keep alive
    _ = smallObjects[0]
    _ = largeObjects[0]
}

// Size classes demonstration
func demonstrateSizeClasses() {
    fmt.Println("\n=== Size Class Efficiency ===")

    sizes := []int{8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192}

    for _, size := range sizes {
        var m1, m2 runtime.MemStats
        runtime.GC()
        runtime.ReadMemStats(&m1)

        // Allocate many objects of this size
        objects := make([][]byte, 1000)
        for i := range objects {
            objects[i] = make([]byte, size)
        }

        runtime.ReadMemStats(&m2)

        actualSize := (m2.TotalAlloc - m1.TotalAlloc) / 1000
        overhead := float64(actualSize-uint64(size)) / float64(size) * 100

        fmt.Printf("Size %5d: actual %5d bytes (%.1f%% overhead)\n",
            size, actualSize, overhead)

        _ = objects[0]
    }
}

// Production pattern: Object pooling
type Buffer struct {
    data []byte
}

var bufferPool = sync.Pool{
    New: func() interface{} {
        return &Buffer{
            data: make([]byte, 4096),
        }
    },
}

func GetBuffer() *Buffer {
    return bufferPool.Get().(*Buffer)
}

func PutBuffer(b *Buffer) {
    // Reset before returning to pool
    b.data = b.data[:0]
    bufferPool.Put(b)
}

// Compare pooled vs unpooled allocation
func benchmarkBufferAllocation() {
    fmt.Println("\n=== Buffer Allocation Benchmark ===")

    const iterations = 10000

    // Without pooling
    var m1 runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&m1)

    for i := 0; i < iterations; i++ {
        buf := &Buffer{data: make([]byte, 4096)}
        _ = buf
    }

    var m2 runtime.MemStats
    runtime.ReadMemStats(&m2)

    fmt.Printf("Without pooling:\n")
    fmt.Printf("  Allocations: %d\n", m2.Mallocs-m1.Mallocs)
    fmt.Printf("  Memory: %d KB\n", (m2.TotalAlloc-m1.TotalAlloc)/1024)

    // With pooling
    runtime.GC()
    runtime.ReadMemStats(&m1)

    for i := 0; i < iterations; i++ {
        buf := GetBuffer()
        PutBuffer(buf)
    }

    runtime.ReadMemStats(&m2)

    fmt.Printf("With pooling:\n")
    fmt.Printf("  Allocations: %d\n", m2.Mallocs-m1.Mallocs)
    fmt.Printf("  Memory: %d KB\n", (m2.TotalAlloc-m1.TotalAlloc)/1024)
}

func main() {
    demonstrateAllocatorBehavior()
    demonstrateSizeClasses()
    benchmarkBufferAllocation()
}
// run
```

## Stack Management

Goroutine stacks start small (about 2 KB) and grow as needed.

### Stack Growth

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Stack growth demonstration
func recursiveFunction(depth int, maxDepth int) int {
    if depth >= maxDepth {
        // Capture stack size info
        var buf [64]byte
        n := runtime.Stack(buf[:], false)
        fmt.Printf("Stack trace size at depth %d: %d bytes\n", depth, n)
        return depth
    }

    // Allocate on stack
    var localArray [1024]byte
    localArray[0] = byte(depth)

    return recursiveFunction(depth+1, maxDepth) + int(localArray[0])
}

func demonstrateStackGrowth() {
    fmt.Println("=== Stack Growth ===")

    var m1, m2 runtime.MemStats
    runtime.ReadMemStats(&m1)

    // Deep recursion forces stack growth
    result := recursiveFunction(0, 100)

    runtime.ReadMemStats(&m2)

    fmt.Printf("Result: %d\n", result)
    fmt.Printf("StackSys: %d bytes\n", m2.StackSys)
}

// Stack copying during growth
func demonstrateStackCopying() {
    fmt.Println("\n=== Stack Copying ===")

    var wg sync.WaitGroup

    // Create goroutines with different stack needs
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            // Variable depth recursion
            depth := id * 20
            result := recursiveFunction(0, depth)
            fmt.Printf("Goroutine %d: depth=%d result=%d\n",
                id, depth, result)
        }(i)
    }

    wg.Wait()
}

// Production pattern: Tail call optimization workaround
type StackFrame struct {
    value int
    next  *StackFrame
}

// Instead of recursion, use iteration
func iterativeSum(start, end int) int {
    sum := 0
    for i := start; i <= end; i++ {
        sum += i
    }
    return sum
}

// Compare stack usage
func compareStackUsage() {
    fmt.Println("\n=== Stack Usage Comparison ===")

    // Recursive approach
    var recursiveSum func(int, int) int
    recursiveSum = func(start, end int) int {
        if start > end {
            return 0
        }
        return start + recursiveSum(start+1, end)
    }

    const limit = 10000

    // Iterative
    result1 := iterativeSum(1, limit)
    fmt.Printf("Iterative sum(1..%d) = %d\n", limit, result1)

    // Recursive - be careful with large N
    // result2 := recursiveSum(1, limit) // Grows the stack aggressively
    result2 := recursiveSum(1, 100) // Safe smaller N
    fmt.Printf("Recursive sum(1..100) = %d\n", result2)
}

// Stack size monitoring
func monitorStackSize() {
    fmt.Println("\n=== Stack Size Monitoring ===")

    var m runtime.MemStats

    runtime.ReadMemStats(&m)
    fmt.Printf("Initial StackInuse: %d KB\n", m.StackInuse/1024)
    fmt.Printf("Initial StackSys: %d KB\n", m.StackSys/1024)

    // Create goroutines
    var wg sync.WaitGroup
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()

            // Some stack usage
            var buf [4096]byte
            buf[0] = 1

            // Block to keep goroutine alive
            <-time.After(time.Millisecond)

            _ = buf
        }()
    }

    runtime.ReadMemStats(&m)
    fmt.Printf("With 1000 goroutines:\n")
    fmt.Printf("  StackInuse: %d KB\n", m.StackInuse/1024)
    fmt.Printf("  StackSys: %d KB\n", m.StackSys/1024)

    wg.Wait()

    runtime.GC()
    runtime.ReadMemStats(&m)
    fmt.Printf("After goroutines exit:\n")
    fmt.Printf("  StackInuse: %d KB\n", m.StackInuse/1024)
}

func main() {
    demonstrateStackGrowth()
    demonstrateStackCopying()
    compareStackUsage()
    monitorStackSize()
}
// run
```

## Runtime Metrics

### MemStats Deep Dive

```go
// run
package main

import (
    "fmt"
    "runtime"
    "time"
)

// Comprehensive memory statistics
func printDetailedMemStats() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    fmt.Println("=== Memory Statistics ===")

    // General
    fmt.Printf("\nGeneral:\n")
    fmt.Printf("  Alloc: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("  TotalAlloc: %d MB\n", m.TotalAlloc/1024/1024)
    fmt.Printf("  Sys: %d MB\n", m.Sys/1024/1024)
    fmt.Printf("  NumGC: %d\n", m.NumGC)

    // Heap
    fmt.Printf("\nHeap:\n")
    fmt.Printf("  HeapAlloc: %d MB\n", m.HeapAlloc/1024/1024)
    fmt.Printf("  HeapSys: %d MB\n", m.HeapSys/1024/1024)
    fmt.Printf("  HeapIdle: %d MB\n", m.HeapIdle/1024/1024)
    fmt.Printf("  HeapInuse: %d MB\n", m.HeapInuse/1024/1024)
    fmt.Printf("  HeapReleased: %d MB\n", m.HeapReleased/1024/1024)
    fmt.Printf("  HeapObjects: %d\n", m.HeapObjects)

    // Stack
    fmt.Printf("\nStack:\n")
    fmt.Printf("  StackInuse: %d KB\n", m.StackInuse/1024)
    fmt.Printf("  StackSys: %d KB\n", m.StackSys/1024)

    // GC
    fmt.Printf("\nGC:\n")
    fmt.Printf("  NextGC: %d MB\n", m.NextGC/1024/1024)
    fmt.Printf("  LastGC: %v ago\n", time.Since(time.Unix(0, int64(m.LastGC))))
    fmt.Printf("  PauseTotalNs: %v\n", time.Duration(m.PauseTotalNs))

    if m.NumGC > 0 {
        fmt.Printf("  Last pause: %v\n",
            time.Duration(m.PauseNs[(m.NumGC+255)%256]))
    }

    // Allocation rates
    fmt.Printf("\nAllocation:\n")
    fmt.Printf("  Mallocs: %d\n", m.Mallocs)
    fmt.Printf("  Frees: %d\n", m.Frees)
    fmt.Printf("  Live objects: %d\n", m.Mallocs-m.Frees)
}

// Production pattern: Metrics collector
type RuntimeMetrics struct {
    timestamp    time.Time
    goroutines   int
    allocMB      uint64
    totalAllocMB uint64
    heapAllocMB  uint64
    numGC        uint32
    pauseTotalMs uint64
}

func CollectMetrics() RuntimeMetrics {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    return RuntimeMetrics{
        timestamp:    time.Now(),
        goroutines:   runtime.NumGoroutine(),
        allocMB:      m.Alloc / 1024 / 1024,
        totalAllocMB: m.TotalAlloc / 1024 / 1024,
        heapAllocMB:  m.HeapAlloc / 1024 / 1024,
        numGC:        m.NumGC,
        pauseTotalMs: m.PauseTotalNs / 1000000,
    }
}

type MetricsHistory struct {
    samples []RuntimeMetrics
}

func NewMetricsHistory() *MetricsHistory {
    return &MetricsHistory{
        samples: make([]RuntimeMetrics, 0, 100),
    }
}

func (h *MetricsHistory) Record() {
    h.samples = append(h.samples, CollectMetrics())
}

func (h *MetricsHistory) Report() {
    if len(h.samples) < 2 {
        fmt.Println("Not enough samples")
        return
    }

    first := h.samples[0]
    last := h.samples[len(h.samples)-1]
    duration := last.timestamp.Sub(first.timestamp)

    fmt.Printf("\n=== Runtime Metrics Report ===\n")
    fmt.Printf("Duration: %v\n", duration)
    fmt.Printf("Samples: %d\n\n", len(h.samples))

    fmt.Printf("Goroutines: %d -> %d\n",
        first.goroutines, last.goroutines)
    fmt.Printf("HeapAlloc: %d MB -> %d MB\n",
        first.heapAllocMB, last.heapAllocMB)
    fmt.Printf("TotalAlloc: %d MB (%.1f MB/sec)\n",
        last.totalAllocMB,
        float64(last.totalAllocMB-first.totalAllocMB)/duration.Seconds())
    fmt.Printf("GC cycles: %d (%.2f/sec)\n",
        last.numGC-first.numGC,
        float64(last.numGC-first.numGC)/duration.Seconds())
    fmt.Printf("GC pause: %d ms total\n",
        last.pauseTotalMs-first.pauseTotalMs)
}

func main() {
    printDetailedMemStats()

    // Collect metrics during workload
    history := NewMetricsHistory()
    history.Record()

    // Simulate workload
    for i := 0; i < 10; i++ {
        // Allocate
        data := make([]byte, 10*1024*1024)
        _ = data

        time.Sleep(100 * time.Millisecond)
        history.Record()
    }

    history.Report()
}
// run
```
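
MemStats is the classic interface, but ReadMemStats briefly stops the world. Newer Go versions (1.16+) expose named counters through the runtime/metrics package, which is cheaper to sample. A minimal sketch; the exact metric set varies by Go version, and metrics.All() lists what your toolchain supports:

```go
// run
package main

import (
    "fmt"
    "runtime/metrics"
)

func main() {
    // Sample a few well-known metrics by name
    samples := []metrics.Sample{
        {Name: "/sched/goroutines:goroutines"},
        {Name: "/gc/heap/allocs:bytes"},
        {Name: "/gc/cycles/total:gc-cycles"},
    }
    metrics.Read(samples)

    for _, s := range samples {
        switch s.Value.Kind() {
        case metrics.KindUint64:
            fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
        default:
            fmt.Printf("%s = (unsupported kind)\n", s.Name)
        }
    }
}
// run
```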

## Performance Tuning

### GOMAXPROCS Tuning

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Understanding GOMAXPROCS impact
func benchmarkGOMAXPROCS() {
    values := []int{1, 2, 4, 8}

    for _, procs := range values {
        runtime.GOMAXPROCS(procs)

        start := time.Now()
        var wg sync.WaitGroup

        // CPU-bound work
        for i := 0; i < 100; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()

                sum := 0
                for j := 0; j < 1000000; j++ {
                    sum += j
                }
            }()
        }

        wg.Wait()
        elapsed := time.Since(start)

        fmt.Printf("GOMAXPROCS=%d: %v\n", procs, elapsed)
    }
}

// Production pattern: Dynamic GOMAXPROCS
func optimizeGOMAXPROCS() {
    numCPU := runtime.NumCPU()

    // Rule of thumb:
    // - CPU-bound: GOMAXPROCS = NumCPU
    // - I/O-bound: GOMAXPROCS = NumCPU * 2
    // - Mixed: Start with NumCPU, tune based on profiling
    cpuBound := numCPU
    ioBound := numCPU * 2

    fmt.Printf("CPUs: %d\n", numCPU)
    fmt.Printf("Recommended for CPU-bound: %d\n", cpuBound)
    fmt.Printf("Recommended for I/O-bound: %d\n", ioBound)

    runtime.GOMAXPROCS(cpuBound)
}

func main() {
    fmt.Println("=== GOMAXPROCS Benchmark ===")
    benchmarkGOMAXPROCS()

    fmt.Println("\n=== GOMAXPROCS Optimization ===")
    optimizeGOMAXPROCS()
}
// run
```
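
A container-related caveat: runtime.NumCPU reports the host's CPU count, not the container's CPU quota, so services on Kubernetes often start with GOMAXPROCS set far too high. A common remedy, assuming you are willing to take a dependency, is Uber's go.uber.org/automaxprocs package, which adjusts GOMAXPROCS from the cgroup quota at startup:

```go
package main

import (
    "fmt"
    "runtime"

    // Blank import: sets GOMAXPROCS to the Linux container CPU quota, if any.
    _ "go.uber.org/automaxprocs"
)

func main() {
    fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
}
```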

### Profiling Integration

```go
// run
package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/pprof"
    "time"
)

// CPU profiling
func runWithCPUProfile(filename string, work func()) {
    f, err := os.Create(filename)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }
    defer pprof.StopCPUProfile()

    work()
}

// Memory profiling
func writeMemProfile(filename string) {
    f, err := os.Create(filename)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    runtime.GC() // Get up-to-date statistics
    if err := pprof.WriteHeapProfile(f); err != nil {
        panic(err)
    }
}

// Goroutine profiling
func writeGoroutineProfile(filename string) {
    f, err := os.Create(filename)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := pprof.Lookup("goroutine").WriteTo(f, 0); err != nil {
        panic(err)
    }
}

// Example workload
func cpuIntensiveWork() {
    sum := 0
    for i := 0; i < 10000000; i++ {
        sum += i * i
    }
}

func memoryIntensiveWork() {
    data := make([][]byte, 1000)
    for i := range data {
        data[i] = make([]byte, 1024*1024)
    }
    _ = data[0]
}

func main() {
    fmt.Println("=== Profiling Example ===")

    // CPU profile
    fmt.Println("Running CPU profile...")
    runWithCPUProfile("cpu.prof", func() {
        for i := 0; i < 10; i++ {
            cpuIntensiveWork()
        }
    })
    fmt.Println("CPU profile written to cpu.prof")
    fmt.Println("Analyze with: go tool pprof cpu.prof")

    // Memory profile
    fmt.Println("\nRunning memory profile...")
    memoryIntensiveWork()
    writeMemProfile("mem.prof")
    fmt.Println("Memory profile written to mem.prof")
    fmt.Println("Analyze with: go tool pprof mem.prof")

    // Goroutine profile
    fmt.Println("\nCreating goroutines...")
    for i := 0; i < 100; i++ {
        go func() {
            time.Sleep(time.Hour)
        }()
    }

    writeGoroutineProfile("goroutine.prof")
    fmt.Println("Goroutine profile written to goroutine.prof")
    fmt.Println("Analyze with: go tool pprof goroutine.prof")
}
// run
```
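
In long-running services you rarely write profile files by hand. The standard net/http/pprof package registers HTTP handlers for all of these profiles; a minimal sketch (the port choice is arbitrary):

```go
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
    // Then, for example:
    //   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
    //   go tool pprof http://localhost:6060/debug/pprof/heap
    log.Println(http.ListenAndServe("localhost:6060", nil))
}
```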

## Advanced Runtime Debugging

### GODEBUG Environment Variable

```go
// run
package main

import (
    "fmt"
    "os"
    "runtime"
)

// GODEBUG provides runtime debugging options
func demonstrateGODEBUG() {
    fmt.Printf("=== GODEBUG Options ===\n\n")

    // Common GODEBUG settings:
    fmt.Println("Available GODEBUG settings:")
    fmt.Println("  gctrace=1          - Print GC events")
    fmt.Println("  schedtrace=X       - Print scheduler events every X ms")
    fmt.Println("  scheddetail=1      - Detailed scheduler trace")
    fmt.Println("  allocfreetrace=1   - Print allocation/free events")
    fmt.Println("  madvdontneed=1     - Memory advising behavior")

    // Check current GODEBUG
    if godebug := os.Getenv("GODEBUG"); godebug != "" {
        fmt.Printf("\nCurrent GODEBUG: %s\n", godebug)
    } else {
        fmt.Println("\nGODEBUG not set")
        fmt.Println("\nTo enable GC tracing:")
        fmt.Println("  export GODEBUG=gctrace=1")
        fmt.Println("  go run main.go")
    }
}

// Scheduler tracing
func demonstrateSchedulerTrace() {
    fmt.Println("\n=== Scheduler Information ===")

    fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
    fmt.Printf("NumCPU: %d\n", runtime.NumCPU())
    fmt.Printf("NumGoroutine: %d\n", runtime.NumGoroutine())
    fmt.Printf("Version: %s\n", runtime.Version())

    fmt.Println("\nTo see scheduler trace:")
    fmt.Println("  export GODEBUG=schedtrace=1000")
    fmt.Println("  go run main.go")
}

func main() {
    demonstrateGODEBUG()
    demonstrateSchedulerTrace()
}
// run
```

### Runtime Tracing

```go
// run
package main

import (
    "context"
    "fmt"
    "os"
    "runtime/trace"
    "sync"
    "time"
)

// Execution tracing for detailed analysis
func runWithTrace(filename string, work func()) {
    f, err := os.Create(filename)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    if err := trace.Start(f); err != nil {
        panic(err)
    }
    defer trace.Stop()

    work()
}

// Example traced workload
func tracedWorkload() {
    var wg sync.WaitGroup

    // Create multiple goroutines
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()

            // Create trace region
            ctx := context.Background()
            defer trace.StartRegion(ctx, fmt.Sprintf("worker-%d", id)).End()

            // Simulate work
            time.Sleep(time.Duration(id) * 10 * time.Millisecond)

            // CPU work
            sum := 0
            for j := 0; j < 1000000; j++ {
                sum += j
            }
        }(i)
    }

    wg.Wait()
}

func main() {
    fmt.Println("=== Runtime Tracing Example ===")

    fmt.Println("Generating execution trace...")
    runWithTrace("trace.out", tracedWorkload)

    fmt.Println("Trace written to trace.out")
    fmt.Println("\nAnalyze with:")
    fmt.Println("  go tool trace trace.out")
    fmt.Println("\nThis opens a web browser with:")
    fmt.Println("  - Goroutine timeline")
    fmt.Println("  - Scheduler events")
    fmt.Println("  - GC events")
    fmt.Println("  - User-defined regions")
}
// run
```
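
Regions can also be grouped under tasks, which the trace viewer aggregates per logical operation. A short sketch using the same runtime/trace API; the task and region names here are illustrative:

```go
package main

import (
    "context"
    "os"
    "runtime/trace"
)

func main() {
    f, _ := os.Create("task-trace.out")
    defer f.Close()
    trace.Start(f)
    defer trace.Stop()

    // A task groups related regions across goroutines in the trace UI
    ctx, task := trace.NewTask(context.Background(), "handleRequest")
    defer task.End()

    trace.WithRegion(ctx, "decode", func() {
        // ... decode work ...
    })
    trace.WithRegion(ctx, "process", func() {
        // ... processing work ...
    })
    trace.Log(ctx, "status", "ok") // attach a key/value event to the task
}
```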

## Production Best Practices

### Runtime Monitoring

```go
// run
package main

import (
    "context"
    "fmt"
    "runtime"
    "time"
)

// Production-ready runtime monitor
type RuntimeMonitor struct {
    interval time.Duration
    ctx      context.Context
    cancel   context.CancelFunc
}

func NewRuntimeMonitor(interval time.Duration) *RuntimeMonitor {
    ctx, cancel := context.WithCancel(context.Background())

    return &RuntimeMonitor{
        interval: interval,
        ctx:      ctx,
        cancel:   cancel,
    }
}

func (rm *RuntimeMonitor) Start() {
    go rm.monitor()
}

func (rm *RuntimeMonitor) Stop() {
    rm.cancel()
}

func (rm *RuntimeMonitor) monitor() {
    ticker := time.NewTicker(rm.interval)
    defer ticker.Stop()

    for {
        select {
        case <-rm.ctx.Done():
            return

        case <-ticker.C:
            rm.collectAndReport()
        }
    }
}

func (rm *RuntimeMonitor) collectAndReport() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    // Log key metrics
    fmt.Printf("[%s] Runtime Metrics:\n", time.Now().Format("15:04:05"))
    fmt.Printf("  Goroutines: %d\n", runtime.NumGoroutine())
    fmt.Printf("  Memory: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("  GC: %d cycles, %v total pause\n",
        m.NumGC, time.Duration(m.PauseTotalNs))

    // Alert on thresholds
    if runtime.NumGoroutine() > 10000 {
        fmt.Println("  ⚠️  HIGH GOROUTINE COUNT!")
    }

    if m.Alloc > 1024*1024*1024 { // 1 GB
        fmt.Println("  ⚠️  HIGH MEMORY USAGE!")
    }
}

// Health check endpoint data
type HealthStatus struct {
    Goroutines int    `json:"goroutines"`
    MemoryMB   uint64 `json:"memory_mb"`
    NumGC      uint32 `json:"num_gc"`
    Uptime     string `json:"uptime"`
}

var startTime = time.Now()

func GetHealthStatus() HealthStatus {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    return HealthStatus{
        Goroutines: runtime.NumGoroutine(),
        MemoryMB:   m.Alloc / 1024 / 1024,
        NumGC:      m.NumGC,
        Uptime:     time.Since(startTime).String(),
    }
}

func main() {
    fmt.Printf("=== Production Runtime Monitoring ===\n\n")

    // Start monitor
    monitor := NewRuntimeMonitor(2 * time.Second)
    monitor.Start()
    defer monitor.Stop()

    // Simulate application work
    for i := 0; i < 5; i++ {
        go func() {
            time.Sleep(10 * time.Second)
        }()

        // Allocate some memory
        _ = make([]byte, 10*1024*1024)

        time.Sleep(time.Second)
    }

    // Wait for monitoring samples
    time.Sleep(6 * time.Second)

    // Health status
    status := GetHealthStatus()
    fmt.Printf("\n=== Health Status ===\n")
    fmt.Printf("Goroutines: %d\n", status.Goroutines)
    fmt.Printf("Memory: %d MB\n", status.MemoryMB)
    fmt.Printf("GC Cycles: %d\n", status.NumGC)
    fmt.Printf("Uptime: %s\n", status.Uptime)
}
// run
```
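
Since HealthStatus already carries JSON tags, exposing it over HTTP takes only a handler. A self-contained sketch; the /healthz route and port are arbitrary choices:

```go
package main

import (
    "encoding/json"
    "net/http"
    "runtime"
    "time"
)

type HealthStatus struct {
    Goroutines int    `json:"goroutines"`
    MemoryMB   uint64 `json:"memory_mb"`
    NumGC      uint32 `json:"num_gc"`
    Uptime     string `json:"uptime"`
}

var startTime = time.Now()

func healthHandler(w http.ResponseWriter, r *http.Request) {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(HealthStatus{
        Goroutines: runtime.NumGoroutine(),
        MemoryMB:   m.Alloc / 1024 / 1024,
        NumGC:      m.NumGC,
        Uptime:     time.Since(startTime).String(),
    })
}

func main() {
    http.HandleFunc("/healthz", healthHandler)
    http.ListenAndServe(":8080", nil)
}
```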

## Advanced Memory Management Patterns

Understanding memory management at the runtime level enables you to build systems that perform optimally under load. This section explores advanced patterns for memory optimization, custom allocators, and zero-allocation techniques.

### Memory Pooling and Object Reuse

Memory pooling reduces GC pressure by reusing objects instead of allocating new ones. sync.Pool provides a per-P cache of temporary objects, though the runtime may drop pooled objects during garbage collection, so it suits objects that are cheap to recreate.
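
That caveat is worth seeing directly: pooled objects may be dropped at GC, so a pool is a cache, not a free list. A tiny sketch; whether the second Get returns the pooled value depends on the runtime version and GC timing:

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    pool := sync.Pool{
        New: func() interface{} { return "fresh" },
    }

    pool.Put("pooled")
    fmt.Println(pool.Get()) // usually "pooled"

    pool.Put("pooled")
    runtime.GC()            // the GC is allowed to drop pooled objects
    fmt.Println(pool.Get()) // may be "fresh" again - never rely on retention
}
// run
```
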

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

// Large buffer type that's expensive to allocate
type Buffer struct {
    data []byte
}

// Pool-based buffer manager
type BufferPool struct {
    pool sync.Pool
    size int
}

func NewBufferPool(size int) *BufferPool {
    return &BufferPool{
        size: size,
        pool: sync.Pool{
            New: func() interface{} {
                return &Buffer{
                    data: make([]byte, size),
                }
            },
        },
    }
}

func (bp *BufferPool) Get() *Buffer {
    return bp.pool.Get().(*Buffer)
}

func (bp *BufferPool) Put(buf *Buffer) {
    // Clear sensitive data before returning to pool
    for i := range buf.data {
        buf.data[i] = 0
    }
    bp.pool.Put(buf)
}

// Benchmark without pooling
func allocateWithoutPool(n int, size int) time.Duration {
    start := time.Now()

    for i := 0; i < n; i++ {
        buf := make([]byte, size)
        _ = buf
    }

    return time.Since(start)
}

// Benchmark with pooling
func allocateWithPool(n int, pool *BufferPool) time.Duration {
    start := time.Now()

    for i := 0; i < n; i++ {
        buf := pool.Get()
        // Use buffer
        _ = buf.data
        pool.Put(buf)
    }

    return time.Since(start)
}

func main() {
    fmt.Printf("=== Memory Pooling Performance ===\n\n")

    const (
        iterations = 10000
        bufferSize = 64 * 1024 // 64KB buffers
    )

    // Force GC before benchmarks
    runtime.GC()

    // Measure memory stats before
    var m1 runtime.MemStats
    runtime.ReadMemStats(&m1)

    // Without pooling
    fmt.Println("Testing without pooling...")
    durationNoPool := allocateWithoutPool(iterations, bufferSize)

    var m2 runtime.MemStats
    runtime.ReadMemStats(&m2)

    fmt.Printf("  Time: %v\n", durationNoPool)
    fmt.Printf("  Allocations: %d\n", m2.Mallocs-m1.Mallocs)
    fmt.Printf("  GC cycles: %d\n", m2.NumGC-m1.NumGC)

    // Force GC and reset
    runtime.GC()
    time.Sleep(100 * time.Millisecond)

    var m3 runtime.MemStats
    runtime.ReadMemStats(&m3)

    // With pooling
    fmt.Println("\nTesting with pooling...")
    pool := NewBufferPool(bufferSize)
    durationPool := allocateWithPool(iterations, pool)

    var m4 runtime.MemStats
    runtime.ReadMemStats(&m4)

    fmt.Printf("  Time: %v\n", durationPool)
    fmt.Printf("  Allocations: %d\n", m4.Mallocs-m3.Mallocs)
    fmt.Printf("  GC cycles: %d\n", m4.NumGC-m3.NumGC)

    // Performance improvement
    improvement := float64(durationNoPool) / float64(durationPool)
    fmt.Printf("\n=== Results ===\n")
    fmt.Printf("Speedup: %.2fx faster\n", improvement)
    fmt.Printf("Allocation reduction: %.1f%%\n",
        (1.0-float64(m4.Mallocs-m3.Mallocs)/float64(m2.Mallocs-m1.Mallocs))*100)
}
// run
```

### Zero-Allocation String Building

String concatenation can cause excessive allocations. Understanding how to build strings efficiently is crucial for high-performance systems.

```go
// run
package main

import (
    "fmt"
    "runtime"
    "strings"
    "time"
)

// Measure allocations for a function
func measureAllocs(name string, fn func()) {
    // Force GC
    runtime.GC()

    var m1, m2 runtime.MemStats
    runtime.ReadMemStats(&m1)

    start := time.Now()
    fn()
    duration := time.Since(start)

    runtime.ReadMemStats(&m2)

    fmt.Printf("%s:\n", name)
    fmt.Printf("  Time: %v\n", duration)
    fmt.Printf("  Allocations: %d\n", m2.Mallocs-m1.Mallocs)
    fmt.Printf("  Bytes allocated: %d\n", m2.TotalAlloc-m1.TotalAlloc)
    fmt.Println()
}

func main() {
    fmt.Printf("=== Zero-Allocation String Building ===\n\n")

    const iterations = 10000
    parts := []string{"Hello", " ", "World", "!", " ", "Go", " ", "Runtime"}

    // Method 1: Naive concatenation (worst)
    measureAllocs("Naive concatenation", func() {
        for i := 0; i < iterations; i++ {
            result := ""
            for _, part := range parts {
                result += part
            }
            _ = result
        }
    })

    // Method 2: strings.Join (better)
    measureAllocs("strings.Join", func() {
        for i := 0; i < iterations; i++ {
            result := strings.Join(parts, "")
            _ = result
        }
    })

    // Method 3: strings.Builder (best)
    measureAllocs("strings.Builder", func() {
        for i := 0; i < iterations; i++ {
            var builder strings.Builder
            builder.Grow(50) // Pre-allocate capacity
            for _, part := range parts {
                builder.WriteString(part)
            }
            result := builder.String()
            _ = result
        }
    })

    // Method 4: Reused strings.Builder (optimal)
    measureAllocs("Reused strings.Builder", func() {
        var builder strings.Builder
        builder.Grow(50)

        for i := 0; i < iterations; i++ {
            builder.Reset()
            for _, part := range parts {
                builder.WriteString(part)
            }
            result := builder.String()
            _ = result
        }
    })
}
// run
```

### Custom Memory Allocators

For specialized workloads, custom allocators can dramatically improve performance by reducing GC pressure and fragmentation.

```go
// run
package main

import (
    "fmt"
    "runtime"
    "sync"
    "unsafe"
)

// Arena allocator for fixed-size allocations
type Arena struct {
    blockSize int
    blocks    [][]byte
    current   int
    offset    int
    mu        sync.Mutex
}

func NewArena(blockSize, initialBlocks int) *Arena {
    a := &Arena{
        blockSize: blockSize,
        blocks:    make([][]byte, 0, initialBlocks),
    }

    // Allocate initial block
    a.blocks = append(a.blocks, make([]byte, blockSize))

    return a
}

func (a *Arena) Alloc(size int) []byte {
    a.mu.Lock()
    defer a.mu.Unlock()

    // Check if allocation fits in current block
    if a.offset+size > a.blockSize {
        // Need new block
        a.current++
        a.offset = 0

        if a.current >= len(a.blocks) {
            // Allocate new block
            a.blocks = append(a.blocks, make([]byte, a.blockSize))
        }
    }

    // Allocate from current block
    block := a.blocks[a.current]
    ptr := block[a.offset : a.offset+size]
    a.offset += size

    return ptr
}

func (a *Arena) Reset() {
    a.mu.Lock()
    defer a.mu.Unlock()

    a.current = 0
    a.offset = 0
}

func (a *Arena) Stats() (blocks int, used int, capacity int) {
    a.mu.Lock()
    defer a.mu.Unlock()

    blocks = len(a.blocks)
    used = a.current*a.blockSize + a.offset
    capacity = len(a.blocks) * a.blockSize

    return
}

// Slab allocator for specific object types
type SlabAllocator struct {
    objectSize int
    slabs      [][]byte
    freeList   []unsafe.Pointer
    mu         sync.Mutex
}

func NewSlabAllocator(objectSize, objectsPerSlab int) *SlabAllocator {
    sa := &SlabAllocator{
        objectSize: objectSize,
        slabs:      make([][]byte, 0),
        freeList:   make([]unsafe.Pointer, 0),
    }

    // Allocate initial slab
    sa.addSlab(objectsPerSlab)

    return sa
}

func (sa *SlabAllocator) addSlab(objects int) {
    slab := make([]byte, sa.objectSize*objects)
    sa.slabs = append(sa.slabs, slab)

    // Add all objects to free list
    for i := 0; i < objects; i++ {
        offset := i * sa.objectSize
        ptr := unsafe.Pointer(&slab[offset])
        sa.freeList = append(sa.freeList, ptr)
    }
}

func (sa *SlabAllocator) Alloc() unsafe.Pointer {
    sa.mu.Lock()
    defer sa.mu.Unlock()

    if len(sa.freeList) == 0 {
        // Need more objects
        sa.addSlab(100)
    }

    // Pop from free list
    ptr := sa.freeList[len(sa.freeList)-1]
    sa.freeList = sa.freeList[:len(sa.freeList)-1]

    return ptr
}

func (sa *SlabAllocator) Free(ptr unsafe.Pointer) {
    sa.mu.Lock()
    defer sa.mu.Unlock()

    sa.freeList = append(sa.freeList, ptr)
}

func main() {
    fmt.Printf("=== Custom Memory Allocators ===\n\n")

    // Arena allocator demo
    fmt.Println("Arena Allocator:")
    arena := NewArena(4096, 2)

    // Allocate various sizes
    allocations := []int{64, 128, 256, 512, 1024}
    for _, size := range allocations {
        buf := arena.Alloc(size)
        fmt.Printf("  Allocated %d bytes: %p\n", size, &buf[0])
    }

    blocks, used, capacity := arena.Stats()
    fmt.Printf("  Stats: %d blocks, %d/%d bytes used (%.1f%%)\n",
        blocks, used, capacity, float64(used)/float64(capacity)*100)

    // Reset and reuse
    arena.Reset()
    blocks, used, capacity = arena.Stats()
    fmt.Printf("  After reset: %d blocks, %d/%d bytes used\n\n",
        blocks, used, capacity)

    // Slab allocator demo
    fmt.Println("Slab Allocator:")
    type Object struct {
        id   int64
        data [56]byte // Total 64 bytes
    }

    slab := NewSlabAllocator(int(unsafe.Sizeof(Object{})), 100)

    // Allocate objects
    objects := make([]unsafe.Pointer, 5)
    for i := range objects {
        objects[i] = slab.Alloc()
        obj := (*Object)(objects[i])
        obj.id = int64(i)
        fmt.Printf("  Allocated object %d at %p\n", i, objects[i])
    }

    // Free objects
    for i, ptr := range objects {
        slab.Free(ptr)
        fmt.Printf("  Freed object %d\n", i)
    }

    // Verify reuse
    fmt.Println("\n  Reallocating (should reuse freed memory):")
    for i := 0; i < 3; i++ {
        ptr := slab.Alloc()
        fmt.Printf("  Allocated object at %p\n", ptr)
    }

    // Memory comparison
    fmt.Println("\n=== Memory Impact ===")
    runtime.GC()

    var m1 runtime.MemStats
    runtime.ReadMemStats(&m1)

    fmt.Printf("Heap objects: %d\n", m1.HeapObjects)
    fmt.Printf("Heap allocation: %d KB\n", m1.HeapAlloc/1024)
    fmt.Printf("Custom allocator overhead: ~%d KB\n",
        (arena.blockSize*len(arena.blocks)+len(slab.slabs)*len(slab.slabs[0]))/1024)
}
// run
```

### Memory-Efficient Data Structures

Choosing the right data structure can significantly impact memory usage and GC behavior.

```go
// run
package main

import (
    "fmt"
    "runtime"
    "time"
)

// Memory-inefficient: slice of pointers
type IneffClientList struct {
    clients []*Client
}

// Memory-efficient: slice of values
type EffClientList struct {
    clients []Client
}

type Client struct {
    ID        int64
    Name      [32]byte
    Score     float64
    Timestamp int64
}

func measureMemory(name string, fn func()) {
    runtime.GC()
    time.Sleep(10 * time.Millisecond)

    var m1 runtime.MemStats
    runtime.ReadMemStats(&m1)

    start := time.Now()
    fn()
    duration := time.Since(start)

    runtime.GC()
    time.Sleep(10 * time.Millisecond)

    var m2 runtime.MemStats
    runtime.ReadMemStats(&m2)

    fmt.Printf("%s:\n", name)
    fmt.Printf("  Duration: %v\n", duration)
    fmt.Printf("  Memory allocated: %d KB\n", (m2.TotalAlloc-m1.TotalAlloc)/1024)
    fmt.Printf("  Heap objects: %d\n", m2.HeapObjects-m1.HeapObjects)
    fmt.Printf("  GC runs: %d\n", m2.NumGC-m1.NumGC)
    fmt.Println()
}

func main() {
    fmt.Printf("=== Memory-Efficient Data Structures ===\n\n")

    const numClients = 100000

    // Inefficient: slice of pointers
    measureMemory("Slice of pointers", func() {
        list := &IneffClientList{
            clients: make([]*Client, 0, numClients),
        }

        for i := 0; i < numClients; i++ {
            client := &Client{
                ID:        int64(i),
                Score:     float64(i) * 1.5,
                Timestamp: time.Now().Unix(),
            }
            copy(client.Name[:], fmt.Sprintf("Client%d", i))
            list.clients = append(list.clients, client)
        }

        // Keep reference to prevent premature GC
        _ = list
    })

    // Efficient: slice of values
    measureMemory("Slice of values", func() {
        list := &EffClientList{
            clients: make([]Client, 0, numClients),
        }

        for i := 0; i < numClients; i++ {
            client := Client{
                ID:        int64(i),
                Score:     float64(i) * 1.5,
                Timestamp: time.Now().Unix(),
            }
            copy(client.Name[:], fmt.Sprintf("Client%d", i))
            list.clients = append(list.clients, client)
        }

        // Keep reference to prevent premature GC
        _ = list
    })

    // Comparison
    fmt.Println("=== Key Insights ===")
    fmt.Println("Slice of values advantages:")
    fmt.Println("  - Better cache locality (contiguous memory)")
    fmt.Println("  - Fewer heap allocations (1 vs N+1)")
    fmt.Println("  - Reduced GC pressure (fewer pointers to trace)")
    fmt.Println("  - Lower memory overhead (no pointer indirection)")
    fmt.Println("\nUse pointers only when:")
    fmt.Println("  - Objects are large and frequently copied")
    fmt.Println("  - Sharing/mutation semantics are required")
    fmt.Println("  - nil values are meaningful")
}
// run
```

## Goroutine Lifecycle and Optimization

Understanding the complete lifecycle of goroutines and their interaction with the scheduler enables you to write concurrent code that scales efficiently. This section covers goroutine creation, scheduling, optimization, and termination patterns.

### Goroutine Creation and Startup Cost

Every goroutine has a creation cost. Understanding this cost helps you make informed decisions about goroutine usage patterns.

  1// run
  2package main
  3
  4import (
  5    "fmt"
  6    "runtime"
  7    "sync"
  8    "time"
  9)
 10
 11// Measure goroutine creation overhead
 12func measureCreation(n int) (duration time.Duration, memBytes uint64) {
 13    runtime.GC()
 14    time.Sleep(10 * time.Millisecond)
 15
 16    var m1 runtime.MemStats
 17    runtime.ReadMemStats(&m1)
 18
 19    start := time.Now()
 20
 21    var wg sync.WaitGroup
 22    for i := 0; i < n; i++ {
 23        wg.Add(1)
 24        go func() {
 25            defer wg.Done()
 26            // Minimal work
 27            _ = i * 2
 28        }()
 29    }
 30    wg.Wait()
 31
 32    duration = time.Since(start)
 33
 34    var m2 runtime.MemStats
 35    runtime.ReadMemStats(&m2)
 36
 37    memBytes = m2.TotalAlloc - m1.TotalAlloc
 38
 39    return
 40}
 41
 42// Goroutine pool pattern
 43type WorkerPool struct {
 44    workers int
 45    jobs    chan func()
 46    wg      sync.WaitGroup
 47}
 48
 49func NewWorkerPool(workers int) *WorkerPool {
 50    p := &WorkerPool{
 51        workers: workers,
 52        jobs:    make(chan func(), workers*2),
 53    }
 54
 55    // Start workers
 56    for i := 0; i < workers; i++ {
 57        p.wg.Add(1)
 58        go p.worker()
 59    }
 60
 61    return p
 62}
 63
 64func (p *WorkerPool) worker() {
 65    defer p.wg.Done()
 66
 67    for job := range p.jobs {
 68        job()
 69    }
 70}
 71
 72func (p *WorkerPool) Submit(job func()) {
 73    p.jobs <- job
 74}
 75
 76func (p *WorkerPool) Close() {
 77    close(p.jobs)
 78    p.wg.Wait()
 79}
 80
 81// Measure pool performance
 82func measurePool(pool *WorkerPool, n int) time.Duration {
 83    start := time.Now()
 84
 85    var wg sync.WaitGroup
 86    for i := 0; i < n; i++ {
 87        wg.Add(1)
 88        pool.Submit(func() {
 89            defer wg.Done()
 90            _ = i * 2 // i is per-iteration since Go 1.22; pass a copy on older versions
 91        })
 92    }
 93    wg.Wait()
 94
 95    return time.Since(start)
 96}
 97
 98func main() {
 99    fmt.Println("=== Goroutine Creation and Optimization ===\n")
100
101    // Test different scales
102    scales := []int{100, 1000, 10000}
103
104    for _, n := range scales {
105        duration, memBytes := measureCreation(n)
106
107        fmt.Printf("Creating %d goroutines:\n", n)
108        fmt.Printf("  Total time: %v\n", duration)
109        fmt.Printf("  Time per goroutine: %v\n", duration/time.Duration(n))
110        fmt.Printf("  Memory: %d KB (%.2f KB per goroutine)\n",
111            memBytes/1024, float64(memBytes)/float64(n)/1024)
112        fmt.Printf("  Throughput: %.0f goroutines/sec\n\n",
113            float64(n)/duration.Seconds())
114    }
115
116    // Compare with worker pool
117    fmt.Println("=== Worker Pool Comparison ===\n")
118
119    const jobs = 10000
120    numWorkers := []int{10, 50, 100, 500}
121
122    for _, workers := range numWorkers {
123        pool := NewWorkerPool(workers)
124        duration := measurePool(pool, jobs)
125        pool.Close()
126
127        fmt.Printf("%d workers processing %d jobs:\n", workers, jobs)
128        fmt.Printf("  Duration: %v\n", duration)
129        fmt.Printf("  Throughput: %.0f jobs/sec\n\n",
130            float64(jobs)/duration.Seconds())
131    }
132
133    // Recommendations
134    fmt.Println("=== Optimization Guidelines ===")
135    fmt.Println("Goroutine creation cost: ~2KB stack + scheduling overhead")
136    fmt.Println("\nUse goroutines per request when:")
137    fmt.Println("  - Jobs are long-running (>1ms)")
138    fmt.Println("  - Request rate is manageable (<10k/sec)")
139    fmt.Println("  - Simplicity is more important than efficiency")
140    fmt.Println("\nUse worker pools when:")
141    fmt.Println("  - Jobs are short (<1ms)")
142    fmt.Println("  - High request rates (>10k/sec)")
143    fmt.Println("  - Resource limits are important")
144    fmt.Println("  - Predictable resource usage is required")
145}
146// run
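
When the per-request style is attractive but you still need a hard ceiling on concurrency, a counting semaphore built from a buffered channel sits between the two patterns above: no long-lived workers, but a bounded number of goroutines in flight. A minimal stdlib-only sketch (the limit of 8 is an arbitrary example):

 1// run
 2package main
 3
 4import (
 5    "fmt"
 6    "sync"
 7)
 8
 9func main() {
10    const maxConcurrent = 8
11    // Buffered channel as a counting semaphore: at most
12    // maxConcurrent goroutines hold a slot at any time.
13    sem := make(chan struct{}, maxConcurrent)
14
15    var wg sync.WaitGroup
16    for i := 0; i < 100; i++ {
17        wg.Add(1)
18        sem <- struct{}{} // acquire a slot; blocks at the limit
19        go func(id int) {
20            defer wg.Done()
21            defer func() { <-sem }() // release the slot
22            _ = id * id              // placeholder work
23        }(i)
24    }
25    wg.Wait()
26
27    fmt.Printf("processed 100 jobs with at most %d in flight\n", maxConcurrent)
28}
29// run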

Goroutine Scheduling Patterns

Understanding how goroutines are scheduled helps you write code that cooperates well with the runtime scheduler.

  1// run
  2package main
  3
  4import (
  5    "fmt"
  6    "runtime"
  7    "sync"
  8    "sync/atomic"
  9    "time"
 10)
 11
 12// Cooperative scheduling example
 13func cooperativeWork(id int, counter *int64) {
 14    for i := 0; i < 1000; i++ {
 15        atomic.AddInt64(counter, 1)
 16
 17        // Yield to scheduler periodically
 18        if i%100 == 0 {
 19            runtime.Gosched()
 20        }
 21    }
 22}
 23
 24// CPU-bound work without yielding
 25func cpuBoundWork(id int, counter *int64) {
 26    for i := 0; i < 1000; i++ {
 27        atomic.AddInt64(counter, 1)
 28        // No explicit yield; relies on async preemption (Go 1.14+)
 29    }
 30}
 31
 32// Measure scheduling fairness
 33func measureFairness(name string, workFn func(int, *int64), numGoroutines int) {
 34    fmt.Printf("\n%s (%d goroutines):\n", name, numGoroutines)
 35
 36    var counter int64
 37    var wg sync.WaitGroup
 38
 39    start := time.Now()
 40
 41    for i := 0; i < numGoroutines; i++ {
 42        wg.Add(1)
 43        go func(id int) {
 44            defer wg.Done()
 45            workFn(id, &counter)
 46        }(i)
 47    }
 48
 49    wg.Wait()
 50    duration := time.Since(start)
 51
 52    fmt.Printf("  Duration: %v\n", duration)
 53    fmt.Printf("  Total operations: %d\n", counter)
 54    fmt.Printf("  Throughput: %.0f ops/sec\n", float64(counter)/duration.Seconds())
 55}
 56
 57// Goroutine state tracking
 58type GoroutineTracker struct {
 59    states map[string]int64
 60    mu     sync.Mutex
 61}
 62
 63func NewGoroutineTracker() *GoroutineTracker {
 64    return &GoroutineTracker{
 65        states: make(map[string]int64),
 66    }
 67}
 68
 69func (gt *GoroutineTracker) Track(state string) {
 70    gt.mu.Lock()
 71    gt.states[state]++
 72    gt.mu.Unlock()
 73}
 74
 75func (gt *GoroutineTracker) Report() {
 76    gt.mu.Lock()
 77    defer gt.mu.Unlock()
 78
 79    fmt.Println("\nGoroutine State Distribution:")
 80    for state, count := range gt.states {
 81        fmt.Printf("  %s: %d\n", state, count)
 82    }
 83}
 84
 85// Simulate different goroutine states
 86func simulateStates(tracker *GoroutineTracker, wg *sync.WaitGroup) {
 87    defer wg.Done()
 88
 89    // Running
 90    tracker.Track("running")
 91    for i := 0; i < 1000; i++ {
 92        _ = i * i
 93    }
 94
 95    // Syscall (simulated with sleep)
 96    tracker.Track("syscall")
 97    time.Sleep(time.Millisecond)
 98
 99    // Waiting on channel
100    tracker.Track("waiting")
101    ch := make(chan int, 1)
102    ch <- 1
103    <-ch
104
105    // Runnable (yield)
106    tracker.Track("runnable")
107    runtime.Gosched()
108}
109
110func main() {
111    fmt.Println("=== Goroutine Scheduling Patterns ===")
112
113    // Set GOMAXPROCS to control parallelism
114    numCPU := runtime.NumCPU()
115    runtime.GOMAXPROCS(numCPU)
116    fmt.Printf("Running on %d CPUs\n", numCPU)
117
118    // Test cooperative vs non-cooperative scheduling
119    measureFairness("Cooperative scheduling",
120        cooperativeWork, 100)
121
122    measureFairness("CPU-bound scheduling",
123        cpuBoundWork, 100)
124
125    // Track goroutine states
126    fmt.Println("\n=== Goroutine State Tracking ===")
127
128    tracker := NewGoroutineTracker()
129    var wg sync.WaitGroup
130
131    for i := 0; i < 50; i++ {
132        wg.Add(1)
133        go simulateStates(tracker, &wg)
134    }
135
136    wg.Wait()
137    tracker.Report()
138
139    // Scheduler insights
140    fmt.Println("\n=== Scheduler Insights ===")
141    fmt.Println("Goroutine states:")
142    fmt.Println("  Running: Executing on a CPU")
143    fmt.Println("  Runnable: Waiting for CPU time")
144    fmt.Println("  Waiting: Blocked on I/O, channel, or sync")
145    fmt.Println("  Syscall: In system call")
146    fmt.Println("\nPreemption:")
147    fmt.Println("  Go 1.14+: Non-cooperative preemption via signals")
148    fmt.Println("  Goroutines can be preempted even in tight loops")
149    fmt.Println("\nBest Practices:")
150    fmt.Println("  - Avoid tight CPU loops without I/O")
151    fmt.Println("  - Use runtime.Gosched() for long computations")
152    fmt.Println("  - Channel operations naturally yield")
153    fmt.Println("  - System calls release P to other goroutines")
154}
155// run
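
The insights above describe runnable goroutines as waiting for CPU time. Since Go 1.17 the runtime/metrics package exposes that wait directly as the /sched/latencies:seconds histogram; a minimal sketch of reading it after generating some contention:

 1// run
 2package main
 3
 4import (
 5    "fmt"
 6    "runtime/metrics"
 7)
 8
 9func main() {
10    // Generate some scheduler activity first.
11    done := make(chan struct{})
12    for i := 0; i < 100; i++ {
13        go func() {
14            sum := 0
15            for j := 0; j < 100000; j++ {
16                sum += j
17            }
18            _ = sum
19            done <- struct{}{}
20        }()
21    }
22    for i := 0; i < 100; i++ {
23        <-done
24    }
25
26    // Read the scheduling-latency histogram (time spent runnable).
27    s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
28    metrics.Read(s)
29
30    if s[0].Value.Kind() != metrics.KindFloat64Histogram {
31        fmt.Println("metric unavailable on this Go version")
32        return
33    }
34    h := s[0].Value.Float64Histogram()
35    var total uint64
36    for _, c := range h.Counts {
37        total += c
38    }
39    fmt.Printf("scheduling-latency samples: %d across %d buckets\n",
40        total, len(h.Counts))
41}
42// run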

Goroutine Leak Detection and Prevention

Goroutine leaks are a common source of production issues. This section covers detection and prevention strategies.

  1// run
  2package main
  3
  4import (
  5    "context"
  6    "fmt"
  7    "runtime"
  8    "sync"
  9    "time"
 10)
 11
 12// Leak detector tracks goroutine counts
 13type LeakDetector struct {
 14    baseline int
 15    mu       sync.Mutex
 16}
 17
 18func NewLeakDetector() *LeakDetector {
 19    runtime.GC()
 20    time.Sleep(10 * time.Millisecond)
 21
 22    return &LeakDetector{
 23        baseline: runtime.NumGoroutine(),
 24    }
 25}
 26
 27func (ld *LeakDetector) Check() (leaked int, ok bool) {
 28    ld.mu.Lock()
 29    defer ld.mu.Unlock()
 30
 31    runtime.GC()
 32    time.Sleep(10 * time.Millisecond)
 33
 34    current := runtime.NumGoroutine()
 35    leaked = current - ld.baseline
 36    ok = leaked == 0
 37
 38    return
 39}
 40
 41// Example 1: Leaked goroutine (missing context)
 42func leakyWorker(job <-chan int) {
 43    for {
 44        select {
 45        case n := <-job:
 46            _ = n * 2
 47        // Missing case for cancellation!
 48        }
 49    }
 50}
 51
 52// Example 2: Fixed with context
 53func fixedWorker(ctx context.Context, job <-chan int) {
 54    for {
 55        select {
 56        case <-ctx.Done():
 57            return
 58        case n := <-job:
 59            _ = n * 2
 60        }
 61    }
 62}
 63
 64// Example 3: Leaked goroutine (unbuffered channel)
 65func leakyProcessor() error {
 66    result := make(chan int) // Unbuffered!
 67
 68    go func() {
 69        // If caller doesn't read, this goroutine leaks
 70        result <- computeExpensiveValue()
 71    }()
 72
 73    // Simulate early return
 74    if time.Now().Unix()%2 == 0 {
 75        return fmt.Errorf("random error")
 76    }
 77
 78    return nil
 79}
 80
 81// Example 4: Fixed with buffered channel or context
 82func fixedProcessor(ctx context.Context) error {
 83    result := make(chan int, 1) // Buffered!
 84
 85    go func() {
 86        select {
 87        case <-ctx.Done():
 88            return
 89        case result <- computeExpensiveValue():
 90        }
 91    }()
 92
 93    // Simulate early return
 94    if time.Now().Unix()%2 == 0 {
 95        return fmt.Errorf("random error")
 96    }
 97
 98    select {
 99    case <-ctx.Done():
100        return ctx.Err()
101    case <-result:
102        return nil
103    }
104}
105
106func computeExpensiveValue() int {
107    time.Sleep(time.Millisecond)
108    return 42
109}
110
111// Production-ready goroutine manager
112type GoroutineManager struct {
113    ctx    context.Context
114    cancel context.CancelFunc
115    wg     sync.WaitGroup
116}
117
118func NewGoroutineManager() *GoroutineManager {
119    ctx, cancel := context.WithCancel(context.Background())
120
121    return &GoroutineManager{
122        ctx:    ctx,
123        cancel: cancel,
124    }
125}
126
127func (gm *GoroutineManager) Go(fn func(context.Context)) {
128    gm.wg.Add(1)
129    go func() {
130        defer gm.wg.Done()
131        fn(gm.ctx)
132    }()
133}
134
135func (gm *GoroutineManager) Shutdown(timeout time.Duration) error {
136    // Signal cancellation
137    gm.cancel()
138
139    // Wait with timeout
140    done := make(chan struct{})
141    go func() {
142        gm.wg.Wait()
143        close(done)
144    }()
145
146    select {
147    case <-done:
148        return nil
149    case <-time.After(timeout):
150        return fmt.Errorf("shutdown timeout exceeded")
151    }
152}
153
154func main() {
155    fmt.Println("=== Goroutine Leak Detection ===\n")
156
157    // Test 1: Detect leaky worker
158    fmt.Println("Test 1: Leaky worker")
159    detector1 := NewLeakDetector()
160
161    jobs := make(chan int)
162    go leakyWorker(jobs)
163    jobs <- 1
164
165    time.Sleep(100 * time.Millisecond)
166
167    leaked, ok := detector1.Check()
168    fmt.Printf("  Goroutines leaked: %d (ok=%v)\n\n", leaked, ok)
169
170    // Test 2: Fixed worker
171    fmt.Println("Test 2: Fixed worker with context")
172    detector2 := NewLeakDetector()
173
174    ctx, cancel := context.WithCancel(context.Background())
175    jobs2 := make(chan int)
176    go fixedWorker(ctx, jobs2)
177    jobs2 <- 1
178
179    time.Sleep(100 * time.Millisecond)
180    cancel()
181    time.Sleep(100 * time.Millisecond)
182
183    leaked, ok = detector2.Check()
184    fmt.Printf("  Goroutines leaked: %d (ok=%v)\n\n", leaked, ok)
185
186    // Test 3: Goroutine manager
187    fmt.Println("Test 3: Production goroutine manager")
188    detector3 := NewLeakDetector()
189
190    manager := NewGoroutineManager()
191
192    // Launch managed goroutines
193    for i := 0; i < 5; i++ {
194        manager.Go(func(ctx context.Context) {
195            ticker := time.NewTicker(100 * time.Millisecond)
196            defer ticker.Stop()
197
198            for {
199                select {
200                case <-ctx.Done():
201                    return
202                case <-ticker.C:
203                    // Do work
204                }
205            }
206        })
207    }
208
209    time.Sleep(500 * time.Millisecond)
210
211    // Graceful shutdown
212    if err := manager.Shutdown(time.Second); err != nil {
213        fmt.Printf("  Shutdown error: %v\n", err)
214    } else {
215        fmt.Println("  Shutdown successful")
216    }
217
218    leaked, ok = detector3.Check()
219    fmt.Printf("  Goroutines leaked: %d (ok=%v)\n\n", leaked, ok)
220
221    // Best practices
222    fmt.Println("=== Leak Prevention Best Practices ===")
223    fmt.Println("1. Always provide cancellation mechanism (context)")
224    fmt.Println("2. Use buffered channels when sender shouldn't block")
225    fmt.Println("3. Ensure all goroutine exit paths are covered")
226    fmt.Println("4. Implement graceful shutdown with timeouts")
227    fmt.Println("5. Monitor goroutine counts in production")
228    fmt.Println("6. Use goroutine managers for complex systems")
229    fmt.Println("7. Test with leak detectors in integration tests")
230}
231// run
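
Best practice 7 above deserves a concrete shape. Below is a minimal sketch of a leak check suited to integration tests: it polls rather than sleeping a fixed amount, since background goroutines can take a moment to unwind. (Production test suites often use a library such as go.uber.org/goleak instead.)

 1// run
 2package main
 3
 4import (
 5    "fmt"
 6    "runtime"
 7    "time"
 8)
 9
10// verifyNoLeaks polls until the goroutine count returns to the
11// baseline or the grace period expires.
12func verifyNoLeaks(baseline int, grace time.Duration) error {
13    deadline := time.Now().Add(grace)
14    for time.Now().Before(deadline) {
15        if runtime.NumGoroutine() <= baseline {
16            return nil
17        }
18        time.Sleep(10 * time.Millisecond)
19    }
20    return fmt.Errorf("goroutine leak: %d running, baseline %d",
21        runtime.NumGoroutine(), baseline)
22}
23
24func main() {
25    baseline := runtime.NumGoroutine()
26
27    done := make(chan struct{})
28    go func() { <-done }() // well-behaved goroutine with an exit path
29    close(done)
30
31    if err := verifyNoLeaks(baseline, time.Second); err != nil {
32        fmt.Println(err)
33        return
34    }
35    fmt.Println("no leaks detected")
36}
37// run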

Runtime Debugging Techniques

Mastering runtime debugging techniques enables you to diagnose and fix complex production issues quickly. This section covers advanced debugging tools and methodologies.

Stack Trace Analysis

Stack traces are your first tool for understanding goroutine behavior and debugging deadlocks or panics.

  1// run
  2package main
  3
  4import (
  5    "bytes"
  6    "fmt"
  7    "runtime"
  8    "runtime/debug"
  9    "strings"
 10    "sync"
 11    "time"
 12)
 13
 14// Capture and analyze stack traces
 15type StackAnalyzer struct {
 16    traces map[string]int
 17    mu     sync.Mutex
 18}
 19
 20func NewStackAnalyzer() *StackAnalyzer {
 21    return &StackAnalyzer{
 22        traces: make(map[string]int),
 23    }
 24}
 25
 26func (sa *StackAnalyzer) Capture() {
 27    // Get stack trace for current goroutine
 28    buf := make([]byte, 4096)
 29    n := runtime.Stack(buf, false)
 30    stack := string(buf[:n])
 31
 32    sa.mu.Lock()
 33    sa.traces[stack]++
 34    sa.mu.Unlock()
 35}
 36
 37func (sa *StackAnalyzer) CaptureAll() {
 38    // Get stacks for all goroutines
 39    buf := make([]byte, 1024*1024) // 1MB buffer
 40    n := runtime.Stack(buf, true)
 41
 42    // Note: runtime.Stack with all=true briefly stops the world;
 43    // avoid calling it at high frequency on production hot paths.
 44
 45
 46    // Parse and count unique stacks
 47    stacks := string(buf[:n])
 48    sa.parseStacks(stacks)
 49}
 50
 51func (sa *StackAnalyzer) parseStacks(allStacks string) {
 52    // Split by goroutine boundaries
 53    goroutines := strings.Split(allStacks, "\n\n")
 54
 55    for _, g := range goroutines {
 56        if len(g) == 0 {
 57            continue
 58        }
 59
 60        // Extract the stack signature (function names)
 61        lines := strings.Split(g, "\n")
 62        signature := ""
 63        for _, line := range lines {
 64            if strings.Contains(line, "(") && strings.Contains(line, ")") {
 65                signature += line + "\n"
 66            }
 67        }
 68
 69        sa.mu.Lock()
 70        sa.traces[signature]++
 71        sa.mu.Unlock()
 72    }
 73}
 74
 75func (sa *StackAnalyzer) Report() {
 76    sa.mu.Lock()
 77    defer sa.mu.Unlock()
 78
 79    fmt.Println("=== Stack Trace Analysis ===")
 80
 81    for stack, count := range sa.traces {
 82        if count > 1 {
 83            fmt.Printf("\nFound %d goroutines with similar stack:\n", count)
 84            fmt.Println(stack)
 85        }
 86    }
 87}
 88
 89// Debug helper: print goroutine info
 90func printGoroutineInfo() {
 91    fmt.Println("=== Goroutine Information ===")
 92    fmt.Printf("Total goroutines: %d\n\n", runtime.NumGoroutine())
 93
 94    // Get stack traces
 95    buf := make([]byte, 1024*1024)
 96    n := runtime.Stack(buf, true)
 97
 98    fmt.Println("Stack traces:")
 99    fmt.Println(string(buf[:n]))
100}
101
102// Deadlock detector
103type DeadlockDetector struct {
104    timeout time.Duration
105}
106
107func NewDeadlockDetector(timeout time.Duration) *DeadlockDetector {
108    return &DeadlockDetector{timeout: timeout}
109}
110
111func (dd *DeadlockDetector) Watch(name string, fn func()) error {
112    done := make(chan struct{})
113
114    go func() {
115        fn()
116        close(done)
117    }()
118
119    select {
120    case <-done:
121        return nil
122    case <-time.After(dd.timeout):
123        // Potential deadlock detected
124        fmt.Printf("\n⚠️  Potential deadlock in %s\n", name)
125        fmt.Println("Current goroutines:")
126
127        buf := make([]byte, 1024*1024)
128        n := runtime.Stack(buf, true)
129        fmt.Println(string(buf[:n]))
130
131        return fmt.Errorf("operation timeout: potential deadlock")
132    }
133}
134
135// Panic recovery with stack trace
136func recoverWithStack(name string) {
137    if r := recover(); r != nil {
138        fmt.Printf("\n⚠️  Panic in %s: %v\n", name, r)
139        fmt.Println("Stack trace:")
140        debug.PrintStack()
141    }
142}
143
144func main() {
145    fmt.Println("=== Runtime Debugging Techniques ===\n")
146
147    // Example 1: Stack trace analysis
148    fmt.Println("Example 1: Stack Trace Analysis")
149    analyzer := NewStackAnalyzer()
150
151    // Create multiple goroutines with similar stacks
152    var wg sync.WaitGroup
153    for i := 0; i < 5; i++ {
154        wg.Add(1)
155        go func() {
156            defer wg.Done()
157            analyzer.Capture()
158            time.Sleep(100 * time.Millisecond)
159        }()
160    }
161    wg.Wait()
162
163    analyzer.CaptureAll()
164    analyzer.Report()
165
166    // Example 2: Deadlock detection
167    fmt.Println("\nExample 2: Deadlock Detection")
168    detector := NewDeadlockDetector(2 * time.Second)
169
170    // Non-blocking operation
171    err := detector.Watch("quick-operation", func() {
172        time.Sleep(100 * time.Millisecond)
173    })
174
175    if err != nil {
176        fmt.Printf("Error: %v\n", err)
177    } else {
178        fmt.Println("Operation completed successfully")
179    }
180
181    // Example 3: Panic recovery
182    fmt.Println("\nExample 3: Panic Recovery with Stack Trace")
183
184    func() {
185        defer recoverWithStack("example-function")
186
187        // Simulate panic
188        var buf []byte
189        _ = buf[100] // Index out of range
190    }()
191
192    fmt.Println("\nProgram continues after panic recovery")
193
194    // Example 4: Custom debug info
195    fmt.Println("\nExample 4: Custom Debug Information")
196    debugInfo := collectDebugInfo()
197    fmt.Println(debugInfo)
198}
199
200// Collect comprehensive debug information
201func collectDebugInfo() string {
202    var buf bytes.Buffer
203
204    buf.WriteString("=== Debug Information ===\n\n")
205
206    // Runtime info
207    buf.WriteString("Runtime:\n")
208    buf.WriteString(fmt.Sprintf("  Go version: %s\n", runtime.Version()))
209    buf.WriteString(fmt.Sprintf("  OS/Arch: %s/%s\n", runtime.GOOS, runtime.GOARCH))
210    buf.WriteString(fmt.Sprintf("  CPUs: %d\n", runtime.NumCPU()))
211    buf.WriteString(fmt.Sprintf("  GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0)))
212    buf.WriteString(fmt.Sprintf("  Goroutines: %d\n\n", runtime.NumGoroutine()))
213
214    // Memory stats
215    var m runtime.MemStats
216    runtime.ReadMemStats(&m)
217
218    buf.WriteString("Memory:\n")
219    buf.WriteString(fmt.Sprintf("  Alloc: %d MB\n", m.Alloc/1024/1024))
220    buf.WriteString(fmt.Sprintf("  TotalAlloc: %d MB\n", m.TotalAlloc/1024/1024))
221    buf.WriteString(fmt.Sprintf("  Sys: %d MB\n", m.Sys/1024/1024))
222    buf.WriteString(fmt.Sprintf("  NumGC: %d\n\n", m.NumGC))
223
224    // Build info
225    if info, ok := debug.ReadBuildInfo(); ok {
226        buf.WriteString("Build:\n")
227        buf.WriteString(fmt.Sprintf("  Path: %s\n", info.Path))
228        buf.WriteString(fmt.Sprintf("  Main: %s %s\n", info.Main.Path, info.Main.Version))
229    }
230
231    return buf.String()
232}
233// run

Runtime Metrics Dashboard

Building a real-time metrics dashboard helps you understand runtime behavior in production.

  1// run
  2package main
  3
  4import (
  5    "fmt"
  6    "runtime"
  7    "sync"
  8    "time"
  9)
 10
 11// Metrics collector
 12type RuntimeMetrics struct {
 13    timestamp     time.Time
 14    goroutines    int
 15    cgoCalls      int64
 16    memAlloc      uint64
 17    memTotal      uint64
 18    memSys        uint64
 19    numGC         uint32
 20    gcPauseTotal  uint64
 21    gcPauseLast   uint64
 22}
 23
 24// Metrics dashboard
 25type MetricsDashboard struct {
 26    metrics []RuntimeMetrics
 27    mu      sync.RWMutex
 28    maxSize int
 29}
 30
 31func NewMetricsDashboard(maxSize int) *MetricsDashboard {
 32    return &MetricsDashboard{
 33        metrics: make([]RuntimeMetrics, 0, maxSize),
 34        maxSize: maxSize,
 35    }
 36}
 37
 38func (md *MetricsDashboard) Collect() {
 39    var m runtime.MemStats
 40    runtime.ReadMemStats(&m)
 41
 42    metric := RuntimeMetrics{
 43        timestamp:    time.Now(),
 44        goroutines:   runtime.NumGoroutine(),
 45        cgoCalls:     runtime.NumCgoCall(),
 46        memAlloc:     m.Alloc,
 47        memTotal:     m.TotalAlloc,
 48        memSys:       m.Sys,
 49        numGC:        m.NumGC,
 50        gcPauseTotal: m.PauseTotalNs,
 51    }
 52
 53    if m.NumGC > 0 {
 54        metric.gcPauseLast = m.PauseNs[(m.NumGC+255)%256]
 55    }
 56
 57    md.mu.Lock()
 58    md.metrics = append(md.metrics, metric)
 59    if len(md.metrics) > md.maxSize {
 60        md.metrics = md.metrics[1:]
 61    }
 62    md.mu.Unlock()
 63}
 64
 65func (md *MetricsDashboard) Display() {
 66    md.mu.RLock()
 67    defer md.mu.RUnlock()
 68
 69    if len(md.metrics) == 0 {
 70        fmt.Println("No metrics collected yet")
 71        return
 72    }
 73
 74    latest := md.metrics[len(md.metrics)-1]
 75
 76    fmt.Println("\n╔════════════════════════════════════════════════════╗")
 77    fmt.Println("β•‘          Runtime Metrics Dashboard                β•‘")
 78    fmt.Println("╠════════════════════════════════════════════════════╣")
 79    fmt.Printf("β•‘ Time: %-43s β•‘\n", latest.timestamp.Format("2006-01-02 15:04:05"))
 80    fmt.Println("╠════════════════════════════════════════════════════╣")
 81    fmt.Printf("β•‘ Goroutines:        %7d                        β•‘\n", latest.goroutines)
 82    fmt.Printf("β•‘ CGo Calls:         %7d                        β•‘\n", latest.cgoCalls)
 83    fmt.Println("╠════════════════════════════════════════════════════╣")
 84    fmt.Printf("β•‘ Memory Alloc:      %7d MB                     β•‘\n", latest.memAlloc/1024/1024)
 85    fmt.Printf("β•‘ Memory Total:      %7d MB                     β•‘\n", latest.memTotal/1024/1024)
 86    fmt.Printf("β•‘ Memory Sys:        %7d MB                     β•‘\n", latest.memSys/1024/1024)
 87    fmt.Println("╠════════════════════════════════════════════════════╣")
 88    fmt.Printf("β•‘ GC Cycles:         %7d                        β•‘\n", latest.numGC)
 89    fmt.Printf("β•‘ GC Pause Total:    %7.2f ms                   β•‘\n",
 90        float64(latest.gcPauseTotal)/1e6)
 91    fmt.Printf("β•‘ GC Pause Last:     %7.2f ms                   β•‘\n",
 92        float64(latest.gcPauseLast)/1e6)
 93    fmt.Println("β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•")
 94
 95    // Show trends if we have enough data
 96    if len(md.metrics) >= 2 {
 97        md.displayTrends()
 98    }
 99}
100
101func (md *MetricsDashboard) displayTrends() {
102    first := md.metrics[0]
103    latest := md.metrics[len(md.metrics)-1]
104
105    fmt.Println("\n=== Trends (since monitoring started) ===")
106
107    goroutineDelta := latest.goroutines - first.goroutines
108    fmt.Printf("Goroutines: %+d (%s)\n",
109        goroutineDelta, trendSymbol(goroutineDelta))
110
111    memDelta := int64(latest.memAlloc) - int64(first.memAlloc)
112    fmt.Printf("Memory: %+d MB (%s)\n",
113        memDelta/1024/1024, trendSymbol(int(memDelta)))
114
115    gcDelta := int(latest.numGC) - int(first.numGC)
116    fmt.Printf("GC Cycles: %+d (%s)\n",
117        gcDelta, trendSymbol(gcDelta))
118}
119
120func trendSymbol(delta int) string {
121    if delta > 0 {
122        return "↑"
123    } else if delta < 0 {
124        return "↓"
125    }
126    return "β†’"
127}
128
129// Simulate application workload
130func simulateWorkload(duration time.Duration) {
131    var wg sync.WaitGroup
132    deadline := time.After(duration)
133    for {
134        select {
135        case <-deadline:
136            wg.Wait() // let in-flight workers finish before returning
137            return
138        default:
139            // Spawn goroutines
140            for i := 0; i < 10; i++ {
141                wg.Add(1)
142                go func() {
143                    defer wg.Done()
144
145                    // Allocate memory
146                    data := make([]byte, 1024*100) // 100KB
147
148                    // Do some work
149                    for j := range data {
150                        data[j] = byte(j % 256)
151                    }
152
153                    time.Sleep(time.Millisecond * 10)
154                }()
155            }
156
157            time.Sleep(time.Millisecond * 100)
158        }
159    }
160}
161
162func main() {
163    fmt.Println("=== Runtime Metrics Dashboard ===")
164    fmt.Println("Starting real-time monitoring...")
165
166    dashboard := NewMetricsDashboard(100)
167
168    // Start metrics collection
169    done := make(chan struct{})
170    go func() {
171        ticker := time.NewTicker(time.Second)
172        defer ticker.Stop()
173
174        for {
175            select {
176            case <-done:
177                return
178            case <-ticker.C:
179                dashboard.Collect()
180                dashboard.Display()
181            }
182        }
183    }()
184
185    // Run workload
186    fmt.Println("\nRunning workload simulation...")
187    simulateWorkload(5 * time.Second)
188
189    // Final collection and display
190    time.Sleep(time.Second)
191    dashboard.Collect()
192    dashboard.Display()
193
194    close(done)
195
196    fmt.Println("\n=== Monitoring Complete ===")
197}
198// run

Practice Exercises

Exercise 1: Scheduler Analysis

🎯 Learning Objectives:

  • Understand Go's GMP scheduler behavior and performance characteristics
  • Master runtime metrics collection and analysis
  • Learn to optimize GOMAXPROCS for different workload types
  • Practice systematic performance measurement and comparison

🌍 Real-World Context:
Understanding scheduler behavior is crucial for high-performance Go services. At Google, optimizing GOMAXPROCS for different workload types improved their ad serving system throughput by 35%. Netflix uses scheduler analysis to determine optimal resource allocation for their video processing pipelines, while Cloudflare tunes their CDN edge servers for mixed HTTP/TCP workloads.

⏱️ Time Estimate: 75-90 minutes
πŸ“Š Difficulty: Advanced

Create a program that creates goroutines with different characteristics, monitors scheduler behavior using runtime metrics, experiments with various GOMAXPROCS values, and reports comprehensive analysis of goroutine distribution and execution times. This will help you understand how to optimize Go applications for different workload patterns.

Solution
  1// run
  2package main
  3
  4import (
  5    "context"
  6    "fmt"
  7    "runtime"
  8    "sync"
  9    "sync/atomic"
 10    "time"
 11)
 12
 13type WorkloadType int
 14
 15const (
 16    CPUBound WorkloadType = iota
 17    IOBound
 18    Mixed
 19)
 20
 21type WorkloadStats struct {
 22    completed    atomic.Int64
 23    totalTime    atomic.Int64
 24    workloadType WorkloadType
 25}
 26
 27func (ws *WorkloadStats) recordCompletion(duration time.Duration) {
 28    ws.completed.Add(1)
 29    ws.totalTime.Add(int64(duration))
 30}
 31
 32func (ws *WorkloadStats) avgDuration() time.Duration {
 33    completed := ws.completed.Load()
 34    if completed == 0 {
 35        return 0
 36    }
 37    return time.Duration(ws.totalTime.Load() / completed)
 38}
 39
 40type SchedulerAnalyzer struct {
 41    cpuStats   *WorkloadStats
 42    ioStats    *WorkloadStats
 43    mixedStats *WorkloadStats
 44}
 45
 46func NewSchedulerAnalyzer() *SchedulerAnalyzer {
 47    return &SchedulerAnalyzer{
 48        cpuStats:   &WorkloadStats{workloadType: CPUBound},
 49        ioStats:    &WorkloadStats{workloadType: IOBound},
 50        mixedStats: &WorkloadStats{workloadType: Mixed},
 51    }
 52}
 53
 54func (sa *SchedulerAnalyzer) runCPUBound(ctx context.Context) {
 55    start := time.Now()
 56    defer sa.cpuStats.recordCompletion(time.Since(start))
 57
 58    sum := 0
 59    for i := 0; i < 1000000; i++ {
 60        sum += i * i
 61
 62        select {
 63        case <-ctx.Done():
 64            return
 65        default:
 66        }
 67    }
 68}
 69
 70func (sa *SchedulerAnalyzer) runIOBound(ctx context.Context) {
 71    start := time.Now()
 72    defer sa.ioStats.recordCompletion(time.Since(start))
 73
 74    select {
 75    case <-time.After(10 * time.Millisecond):
 76    case <-ctx.Done():
 77    }
 78}
 79
 80func (sa *SchedulerAnalyzer) runMixed(ctx context.Context) {
 81    start := time.Now()
 82    defer sa.mixedStats.recordCompletion(time.Since(start))
 83
 84    // Some CPU work
 85    sum := 0
 86    for i := 0; i < 100000; i++ {
 87        sum += i
 88    }
 89
 90    // Some I/O wait
 91    select {
 92    case <-time.After(5 * time.Millisecond):
 93    case <-ctx.Done():
 94    }
 95}
 96
 97func (sa *SchedulerAnalyzer) runWorkload(ctx context.Context, workloadType WorkloadType, count int) {
 98    var wg sync.WaitGroup
 99
100    for i := 0; i < count; i++ {
101        wg.Add(1)
102        go func() {
103            defer wg.Done()
104
105            switch workloadType {
106            case CPUBound:
107                sa.runCPUBound(ctx)
108            case IOBound:
109                sa.runIOBound(ctx)
110            case Mixed:
111                sa.runMixed(ctx)
112            }
113        }()
114    }
115
116    wg.Wait()
117}
118
119func (sa *SchedulerAnalyzer) analyze(procs int) {
120    runtime.GOMAXPROCS(procs)
121
122    fmt.Printf("\n=== Analysis with GOMAXPROCS=%d ===\n", procs)
123
124    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
125    defer cancel()
126
127    start := time.Now()
128
129    // Run mixed workload
130    var wg sync.WaitGroup
131
132    wg.Add(3)
133    go func() {
134        defer wg.Done()
135        sa.runWorkload(ctx, CPUBound, 50)
136    }()
137    go func() {
138        defer wg.Done()
139        sa.runWorkload(ctx, IOBound, 50)
140    }()
141    go func() {
142        defer wg.Done()
143        sa.runWorkload(ctx, Mixed, 50)
144    }()
145
146    wg.Wait()
147    totalTime := time.Since(start)
148
149    fmt.Printf("Total execution time: %v\n", totalTime)
150    fmt.Printf("CPU-bound tasks: %d\n",
151        sa.cpuStats.completed.Load(), sa.cpuStats.avgDuration())
152    fmt.Printf("I/O-bound tasks: %d\n",
153        sa.ioStats.completed.Load(), sa.ioStats.avgDuration())
154    fmt.Printf("Mixed tasks: %d\n",
155        sa.mixedStats.completed.Load(), sa.mixedStats.avgDuration())
156}
157
158func main() {
159    fmt.Println("=== Scheduler Analysis ===")
160    fmt.Printf("System CPUs: %d\n", runtime.NumCPU())
161
162    // Test with different GOMAXPROCS values
163    for _, procs := range []int{1, 2, 4, runtime.NumCPU()} {
164        analyzer := NewSchedulerAnalyzer()
165        analyzer.analyze(procs)
166    }
167}
168// run

Exercise 2: GC Optimization

🎯 Learning Objectives:

  • Master Go's garbage collection tuning and optimization techniques
  • Learn to implement memory-efficient data structures with object pooling
  • Understand GC pressure patterns and how to mitigate them
  • Practice memory performance monitoring and analysis

🌍 Real-World Context:
Garbage collection optimization is critical for high-throughput services. At Uber, GC optimization reduced API latency by 60% during peak traffic. Discord uses sophisticated object pooling to handle millions of concurrent chat connections without GC-induced latency spikes. Facebook's graph service team reduced memory usage by 40% through careful GC tuning and allocation patterns.

⏱️ Time Estimate: 90-120 minutes
πŸ“Š Difficulty: Advanced

Build a high-performance cache system that continuously monitors GC behavior, implements multiple strategies to reduce GC pressure, uses sync.Pool effectively for frequently allocated objects, and provides comprehensive memory and GC statistics for analysis and optimization.

Solution
  1// run
  2package main
  3
  4import (
  5    "context"
  6    "fmt"
  7    "runtime"
  8    "runtime/debug"
  9    "sync"
 10    "time"
 11)
 12
 13type CacheEntry struct {
 14    key        string
 15    value      []byte
 16    expiration time.Time
 17}
 18
 19var entryPool = sync.Pool{
 20    New: func() interface{} {
 21        return &CacheEntry{
 22            value: make([]byte, 0, 4096),
 23        }
 24    },
 25}
 26
 27type GCOptimizedCache struct {
 28    mu      sync.RWMutex
 29    entries map[string]*CacheEntry
 30
 31    hits      uint64
 32    misses    uint64
 33    evictions uint64
 34}
 35
 36func NewGCOptimizedCache() *GCOptimizedCache {
 37    cache := &GCOptimizedCache{
 38        entries: make(map[string]*CacheEntry),
 39    }
 40
 41    go cache.cleanup()
 42
 43    return cache
 44}
 45
 46func (c *GCOptimizedCache) Set(key string, value []byte, ttl time.Duration) {
 47    entry := entryPool.Get().(*CacheEntry)
 48    entry.key = key
 49    entry.value = append(entry.value[:0], value...)
 50    entry.expiration = time.Now().Add(ttl)
 51
 52    c.mu.Lock()
 53    if old, exists := c.entries[key]; exists {
 54        c.returnEntry(old)
 55    }
 56    c.entries[key] = entry
 57    c.mu.Unlock()
 58}
 59
 60func (c *GCOptimizedCache) Get(key string) ([]byte, bool) {
 61    c.mu.Lock() // full lock keeps the hit/miss counters race-free
 62    defer c.mu.Unlock()
 63    entry, ok := c.entries[key]
 64    if !ok {
 65        c.misses++
 66        return nil, false
 67    }
 68
 69    if time.Now().After(entry.expiration) {
 70        delete(c.entries, key)
 71        c.returnEntry(entry)
 72        c.evictions++
 73        c.misses++
 74        return nil, false
 75    }
 76
 77    c.hits++
 78    // Return a copy so callers cannot mutate pooled memory
 79    result := make([]byte, len(entry.value))
 80    copy(result, entry.value)
 81    return result, true
 82}
 83
 84func (c *GCOptimizedCache) Delete(key string) {
 85    c.mu.Lock()
 86    if entry, ok := c.entries[key]; ok {
 87        delete(c.entries, key)
 88        c.returnEntry(entry)
 89        c.evictions++
 90    }
 91    c.mu.Unlock()
 92}
 93
 94func (c *GCOptimizedCache) returnEntry(entry *CacheEntry) {
 95    entry.key = ""
 96    entry.value = entry.value[:0]
 97    entryPool.Put(entry)
 98}
 99
100func (c *GCOptimizedCache) cleanup() {
101    ticker := time.NewTicker(time.Minute)
102    defer ticker.Stop()
103
104    for range ticker.C {
105        c.mu.Lock()
106        now := time.Now()
107
108        for key, entry := range c.entries {
109            if now.After(entry.expiration) {
110                delete(c.entries, key)
111                c.returnEntry(entry)
112                c.evictions++
113            }
114        }
115
116        c.mu.Unlock()
117
118        // Encourage GC after cleanup
119        runtime.GC()
120    }
121}
122
123func (c *GCOptimizedCache) Stats() string {
124    c.mu.RLock()
125    defer c.mu.RUnlock()
126
127    total := c.hits + c.misses
128    hitRate := 0.0
129    if total > 0 {
130        hitRate = float64(c.hits) / float64(total) * 100
131    }
132
133    return fmt.Sprintf("Entries: %d, Hits: %d, Misses: %d, Hit Rate: %.2f%%, Evictions: %d",
134        len(c.entries), c.hits, c.misses, hitRate, c.evictions)
135}
136
137// GC monitor
138type GCMonitor struct {
139    startGC     uint32
140    startPause  uint64
141    startAlloc  uint64
142}
143
144func NewGCMonitor() *GCMonitor {
145    var m runtime.MemStats
146    runtime.ReadMemStats(&m)
147
148    return &GCMonitor{
149        startGC:    m.NumGC,
150        startPause: m.PauseTotalNs,
151        startAlloc: m.TotalAlloc,
152    }
153}
154
155func (g *GCMonitor) Report() {
156    var m runtime.MemStats
157    runtime.ReadMemStats(&m)
158
159    gcCount := m.NumGC - g.startGC
160    pauseTime := time.Duration(m.PauseTotalNs - g.startPause)
161    allocated := (m.TotalAlloc - g.startAlloc) / 1024 / 1024
162
163    fmt.Printf("\n=== GC Statistics ===\n")
164    fmt.Printf("GC Cycles: %d\n", gcCount)
165    fmt.Printf("Total Pause: %v\n", pauseTime)
166    fmt.Printf("Allocated: %d MB\n", allocated)
167    fmt.Printf("Current Heap: %d MB\n", m.HeapAlloc/1024/1024)
168}
169
170func main() {
171    fmt.Println("=== GC-Optimized Cache Test ===")
172
173    // Configure GC
174    debug.SetGCPercent(100)
175
176    gcMonitor := NewGCMonitor()
177    cache := NewGCOptimizedCache()
178
179    // Simulate workload
180    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
181    defer cancel()
182
183    var wg sync.WaitGroup
184
185    // Writers
186    for i := 0; i < 5; i++ {
187        wg.Add(1)
188        go func(id int) {
189            defer wg.Done()
190
191            for {
192                select {
193                case <-ctx.Done():
194                    return
195                default:
196                    key := fmt.Sprintf("key-%d", id)
197                    value := make([]byte, 1024)
198                    cache.Set(key, value, 5*time.Second)
199                    time.Sleep(10 * time.Millisecond)
200                }
201            }
202        }(i)
203    }
204
205    // Readers
206    for i := 0; i < 10; i++ {
207        wg.Add(1)
208        go func(id int) {
209            defer wg.Done()
210
211            for {
212                select {
213                case <-ctx.Done():
214                    return
215                default:
216                    key := fmt.Sprintf("key-%d", id%5)
217                    _, _ = cache.Get(key)
218                    time.Sleep(5 * time.Millisecond)
219                }
220            }
221        }(i)
222    }
223
224    wg.Wait()
225
226    fmt.Println("\n" + cache.Stats())
227    gcMonitor.Report()
228}
229// run
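
Both GC knobs used in this exercise can also be set programmatically rather than via environment variables. A minimal sketch (debug.SetMemoryLimit requires Go 1.19+; the 512 MiB figure is an arbitrary example):

 1// run
 2package main
 3
 4import (
 5    "fmt"
 6    "runtime/debug"
 7)
 8
 9func main() {
10    // Equivalent to GOGC=200: collect when the heap grows 200%
11    // over the live set. Returns the previous setting.
12    prevGOGC := debug.SetGCPercent(200)
13
14    // Equivalent to GOMEMLIMIT=512MiB: a soft limit the GC paces
15    // against. Returns the previous limit.
16    prevLimit := debug.SetMemoryLimit(512 << 20)
17
18    fmt.Printf("GOGC: %d -> 200\n", prevGOGC)
19    fmt.Printf("memory limit: %d -> %d bytes\n", prevLimit, int64(512<<20))
20}
21// run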

Exercise 3: GOMAXPROCS Optimizer

🎯 Learning Objectives:

  • Master GOMAXPROCS tuning for different workload characteristics
  • Learn systematic benchmarking methodology for performance optimization
  • Understand the relationship between CPU cores, goroutines, and throughput
  • Practice automated performance analysis and recommendation systems

🌍 Real-World Context:
GOMAXPROCS optimization is essential for maximizing hardware utilization. Stripe's payment processing system saw 45% throughput improvement by tuning GOMAXPROCS for their mixed I/O and CPU workloads. Airbnb uses automated GOMAXPROCS optimization to handle seasonal traffic patterns, while Robinhood's trading systems require precise tuning to achieve microsecond-level response times during market volatility.

⏱️ Time Estimate: 75-90 minutes
πŸ“Š Difficulty: Expert

Create a comprehensive GOMAXPROCS optimization tool that systematically benchmarks different GOMAXPROCS values for CPU-bound and I/O-bound workloads, measures throughput and latency characteristics, analyzes performance patterns, and automatically recommends optimal settings based on workload type and system characteristics. This tool will help you make data-driven decisions for production deployment.

Solution with Explanation
  1// run
  2package main
  3
  4import (
  5    "context"
  6    "fmt"
  7    "runtime"
  8    "sync"
  9    "sync/atomic"
 10    "time"
 11)
 12
 13type WorkloadType string
 14
 15const (
 16    CPUBound WorkloadType = "CPU-Bound"
 17    IOBound  WorkloadType = "I/O-Bound"
 18    Mixed    WorkloadType = "Mixed"
 19)
 20
 21type BenchmarkResult struct {
 22    GOMAXPROCS   int
 23    WorkloadType WorkloadType
 24    Duration     time.Duration
 25    Throughput   float64
 26}
 27
 28type GOMAXPROCSOptimizer struct {
 29    results []BenchmarkResult
 30}
 31
 32func NewOptimizer() *GOMAXPROCSOptimizer {
 33    return &GOMAXPROCSOptimizer{
 34        results: make([]BenchmarkResult, 0, 20),
 35    }
 36}
 37
 38// CPU-bound workload
 39func (o *GOMAXPROCSOptimizer) runCPUBound(ctx context.Context, procs int) time.Duration {
 40    runtime.GOMAXPROCS(procs)
 41
 42    const numTasks = 100
 43    var wg sync.WaitGroup
 44    var completed atomic.Int64
 45
 46    start := time.Now()
 47
 48    for i := 0; i < numTasks; i++ {
 49        wg.Add(1)
 50        go func() {
 51            defer wg.Done()
 52
 53            // CPU-intensive work
 54            sum := 0
 55            for j := 0; j < 1000000; j++ {
 56                sum += j * j
 57            }
 58
 59            completed.Add(1)
 60        }()
 61    }
 62
 63    wg.Wait()
 64    return time.Since(start)
 65}
 66
 67// I/O-bound workload
 68func (o *GOMAXPROCSOptimizer) runIOBound(ctx context.Context, procs int) time.Duration {
 69    runtime.GOMAXPROCS(procs)
 70
 71    const numTasks = 100
 72    var wg sync.WaitGroup
 73
 74    start := time.Now()
 75
 76    for i := 0; i < numTasks; i++ {
 77        wg.Add(1)
 78        go func() {
 79            defer wg.Done()
 80
 81            // I/O simulation
 82            time.Sleep(10 * time.Millisecond)
 83        }()
 84    }
 85
 86    wg.Wait()
 87    return time.Since(start)
 88}
 89
 90// Mixed workload
 91func (o *GOMAXPROCSOptimizer) runMixed(ctx context.Context, procs int) time.Duration {
 92    runtime.GOMAXPROCS(procs)
 93
 94    const numTasks = 100
 95    var wg sync.WaitGroup
 96
 97    start := time.Now()
 98
 99    for i := 0; i < numTasks; i++ {
100        wg.Add(1)
101        go func(id int) {
102            defer wg.Done()
103
104            // Alternate between CPU and I/O
105            if id%2 == 0 {
106                // CPU work
107                sum := 0
108                for j := 0; j < 500000; j++ {
109                    sum += j
110                }
111            } else {
112                // I/O work
113                time.Sleep(5 * time.Millisecond)
114            }
115        }(i)
116    }
117
118    wg.Wait()
119    return time.Since(start)
120}
121
122func (o *GOMAXPROCSOptimizer) Benchmark(workloadType WorkloadType, procsValues []int) {
123    fmt.Printf("\n=== Benchmarking %s Workload ===\n", workloadType)
124
125    ctx := context.Background()
126
127    for _, procs := range procsValues {
128        var duration time.Duration
129
130        switch workloadType {
131        case CPUBound:
132            duration = o.runCPUBound(ctx, procs)
133        case IOBound:
134            duration = o.runIOBound(ctx, procs)
135        case Mixed:
136            duration = o.runMixed(ctx, procs)
137        }
138
139        throughput := 100.0 / duration.Seconds()
140
141        result := BenchmarkResult{
142            GOMAXPROCS:   procs,
143            WorkloadType: workloadType,
144            Duration:     duration,
145            Throughput:   throughput,
146        }
147
148        o.results = append(o.results, result)
149
150        fmt.Printf("GOMAXPROCS=%d: %v\n",
151            procs, duration, throughput)
152    }
153}
154
155func (o *GOMAXPROCSOptimizer) Recommend() {
156    fmt.Println("\n=== Recommendations ===\n")
157
158    // Group results by workload type
159    byWorkload := make(map[WorkloadType][]BenchmarkResult)
160    for _, r := range o.results {
161        byWorkload[r.WorkloadType] = append(byWorkload[r.WorkloadType], r)
162    }
163
164    for workloadType, results := range byWorkload {
165        // Find best performing GOMAXPROCS
166        var best BenchmarkResult
167        bestThroughput := 0.0
168
169        for _, r := range results {
170            if r.Throughput > bestThroughput {
171                bestThroughput = r.Throughput
172                best = r
173            }
174        }
175
176        fmt.Printf("%s workload:\n", workloadType)
177        fmt.Printf("  Recommended GOMAXPROCS: %d\n", best.GOMAXPROCS)
178        fmt.Printf("  Best duration: %v\n", best.Duration)
179        fmt.Printf("  Throughput: %.2f tasks/sec\n\n", best.Throughput)
180    }
181
182    // General recommendations
183    numCPU := runtime.NumCPU()
184    fmt.Println("General Guidelines:")
185    fmt.Printf("- System CPUs: %d\n", numCPU)
186    fmt.Printf("- CPU-bound: GOMAXPROCS = NumCPU\n", numCPU)
187    fmt.Printf("- I/O-bound: GOMAXPROCS = NumCPU * 2-4\n",
188        numCPU*2, numCPU*4)
189    fmt.Printf("- Mixed: Start with NumCPU, tune based on profiling\n", numCPU)
190}
191
192func main() {
193    fmt.Println("=== GOMAXPROCS Optimizer ===")
194
195    numCPU := runtime.NumCPU()
196    fmt.Printf("System CPUs: %d\n", numCPU)
197
198    optimizer := NewOptimizer()
199
200    // Test different GOMAXPROCS values
201    procsValues := []int{1, 2, numCPU, numCPU * 2}
202
203    // Benchmark each workload type
204    optimizer.Benchmark(CPUBound, procsValues)
205    optimizer.Benchmark(IOBound, procsValues)
206    optimizer.Benchmark(Mixed, procsValues)
207
208    // Provide recommendations
209    optimizer.Recommend()
210
211    // Reset to default
212    runtime.GOMAXPROCS(numCPU)
213    fmt.Printf("\nReset GOMAXPROCS to %d\n", numCPU)
214}
215// run

Explanation:

This optimizer benchmarks different GOMAXPROCS settings:

  1. Workload Types:

    • CPU-Bound: Pure computation, benefits from parallelization up to NumCPU
    • I/O-Bound: Mostly waiting, can benefit from more goroutines
    • Mixed: Combination of both patterns
  2. Benchmarking Strategy:

    • Test same workload with different GOMAXPROCS values
    • Measure total duration and throughput
    • Compare results to find optimal setting
  3. Key Findings:

    • CPU-bound: Performance plateaus at NumCPU
    • I/O-bound: Can handle more than NumCPU due to blocking
    • Mixed: Optimal value depends on CPU/IO ratio
  4. Production Recommendations:

    • Start with the default (GOMAXPROCS = NumCPU)
    • Profile under realistic load
    • Adjust based on workload characteristics
    • Monitor scheduler metrics in production

Real-World Impact: Incorrect GOMAXPROCS can waste CPU or underutilize cores. This tool helps find the sweet spot.
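
One deployment caveat worth adding to these recommendations: on older Go versions (pre-1.25), the default GOMAXPROCS follows the host's CPU count rather than a container's CPU quota, so an explicit override at startup is common. A minimal sketch of applying an operator-tuned value (APP_GOMAXPROCS is a hypothetical variable name; go.uber.org/automaxprocs automates the cgroup-aware case):

 1// run
 2package main
 3
 4import (
 5    "fmt"
 6    "os"
 7    "runtime"
 8    "strconv"
 9)
10
11// applyProcs installs a GOMAXPROCS override from the environment,
12// falling back to the runtime default.
13func applyProcs() int {
14    if v := os.Getenv("APP_GOMAXPROCS"); v != "" {
15        if n, err := strconv.Atoi(v); err == nil && n > 0 {
16            runtime.GOMAXPROCS(n)
17        }
18    }
19    return runtime.GOMAXPROCS(0) // an argument of 0 queries without changing
20}
21
22func main() {
23    fmt.Printf("effective GOMAXPROCS: %d (NumCPU=%d)\n",
24        applyProcs(), runtime.NumCPU())
25}
26// run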

Exercise 4: GC Statistics Monitor

🎯 Learning Objectives:

  • Implement production-grade GC monitoring and analysis systems
  • Master GC tuning parameters and their effects
  • Learn to identify GC performance bottlenecks and optimization opportunities
  • Practice building automated performance alerting and recommendation engines

🌍 Real-World Context:
GC monitoring is critical for latency-sensitive applications. The New York Times reduced article load times by 30% by implementing comprehensive GC monitoring and tuning. Twitter uses GC statistics to proactively tune their timeline service during breaking news events, while Dropbox's file synchronization service maintains sub-100ms response times through continuous GC optimization based on monitoring data.

⏱️ Time Estimate: 90-120 minutes
πŸ“Š Difficulty: Expert

Build a production-ready GC monitoring system that continuously tracks pause times, frequency, and heap growth patterns, provides intelligent recommendations for GC tuning, includes alerting for anomalous GC behavior, and generates detailed reports with actionable insights for performance optimization. This system will help you maintain optimal GC performance in production environments.

Solution with Explanation
  1// run
  2package main
  3
  4import (
  5    "context"
  6    "fmt"
  7    "runtime"
  8    "runtime/debug"
  9    "sync"
 10    "time"
 11)
 12
 13type GCStats struct {
 14    Timestamp    time.Time
 15    NumGC        uint32
 16    PauseNs      uint64
 17    HeapAllocMB  uint64
 18    HeapSysMB    uint64
 19    NextGCMB     uint64
 20}
 21
 22type GCMonitor struct {
 23    stats     []GCStats
 24    mu        sync.Mutex
 25    startTime time.Time
 26    startGC   uint32
 27}
 28
 29func NewGCMonitor() *GCMonitor {
 30    var m runtime.MemStats
 31    runtime.ReadMemStats(&m)
 32
 33    return &GCMonitor{
 34        stats:     make([]GCStats, 0, 100),
 35        startTime: time.Now(),
 36        startGC:   m.NumGC,
 37    }
 38}
 39
 40func (g *GCMonitor) Collect() {
 41    var m runtime.MemStats
 42    runtime.ReadMemStats(&m)
 43
 44    stat := GCStats{
 45        Timestamp:    time.Now(),
 46        NumGC:        m.NumGC,
 47        PauseNs:      m.PauseNs[(m.NumGC+255)%256],
 48        HeapAllocMB:  m.HeapAlloc / 1024 / 1024,
 49        HeapSysMB:    m.HeapSys / 1024 / 1024,
 50        NextGCMB:     m.NextGC / 1024 / 1024,
 51    }
 52
 53    g.mu.Lock()
 54    g.stats = append(g.stats, stat)
 55    g.mu.Unlock()
 56}
 57
 58func (g *GCMonitor) Start(ctx context.Context, interval time.Duration) {
 59    ticker := time.NewTicker(interval)
 60    defer ticker.Stop()
 61
 62    for {
 63        select {
 64        case <-ctx.Done():
 65            return
 66        case <-ticker.C:
 67            g.Collect()
 68        }
 69    }
 70}
 71
 72func (g *GCMonitor) Report() {
 73    g.mu.Lock()
 74    defer g.mu.Unlock()
 75
 76    if len(g.stats) < 2 {
 77        fmt.Println("Not enough data collected")
 78        return
 79    }
 80
 81    first := g.stats[0]
 82    last := g.stats[len(g.stats)-1]
 83
 84    gcCount := last.NumGC - first.NumGC
 85    duration := last.Timestamp.Sub(first.Timestamp)
 86
 87    fmt.Println("\n=== GC Statistics Report ===")
 88    fmt.Printf("Duration: %v\n", duration)
 89    fmt.Printf("Total GC cycles: %d\n", gcCount)
 90    fmt.Printf("GC frequency: %.2f GC/sec\n\n", float64(gcCount)/duration.Seconds())
 91
 92    // Pause-time analysis (per sample; a value repeats if no GC ran between samples)
 93    var totalPause time.Duration
 94    var maxPause time.Duration
 95    var minPause time.Duration = time.Hour
 96
 97    for _, stat := range g.stats {
 98        pause := time.Duration(stat.PauseNs)
 99        totalPause += pause
100
101        if pause > maxPause {
102            maxPause = pause
103        }
104        if pause < minPause && pause > 0 {
105            minPause = pause
106        }
107    }
108
109    avgPause := totalPause / time.Duration(len(g.stats))
110
111    fmt.Println("Pause Times:")
112    fmt.Printf("  Average: %v\n", avgPause)
113    fmt.Printf("  Maximum: %v\n", maxPause)
114    fmt.Printf("  Minimum: %v\n", minPause)
115    fmt.Printf("  Total: %v\n\n", totalPause)
116
117    // Memory analysis
118    fmt.Println("Memory Statistics:")
119    fmt.Printf("  Current heap: %d MB\n", last.HeapAllocMB)
120    fmt.Printf("  Heap system: %d MB\n", last.HeapSysMB)
121    fmt.Printf("  Next GC at: %d MB\n\n", last.NextGCMB)
122
123    // Provide recommendations
124    g.recommend(gcCount, duration, avgPause, last.HeapAllocMB)
125}
126
127func (g *GCMonitor) recommend(gcCount uint32, duration time.Duration, avgPause time.Duration, heapMB uint64) {
128    fmt.Println("=== Tuning Recommendations ===\n")
129
130    gcFreq := float64(gcCount) / duration.Seconds()
131
132    // GOGC recommendations
133    currentGOGC := debug.SetGCPercent(-1)
134    debug.SetGCPercent(currentGOGC)
135
136    fmt.Printf("Current GOGC: %d%%\n", currentGOGC)
137
138    if gcFreq > 10 {
139        fmt.Println("\nHigh GC frequency detected!")
140        fmt.Println("Recommendations:")
141        fmt.Println("1. Increase GOGC to 200-300")
142        fmt.Println("   export GOGC=200")
143        fmt.Println("2. Reduce allocation rate")
144        fmt.Println("3. Consider increasing memory limit")
145    } else if gcFreq < 0.1 {
146        fmt.Println("\nLow GC frequency detected!")
147        fmt.Println("Recommendations:")
148        fmt.Println("1. Decrease GOGC to 50-75")
149        fmt.Println("   export GOGC=50")
150        fmt.Println("2. This will reduce memory usage")
151    } else {
152        fmt.Println("\nGC frequency is normal")
153    }
154
155    // Pause time recommendations
156    fmt.Println()
157    if avgPause > 10*time.Millisecond {
158        fmt.Println("High GC pause times detected!")
159        fmt.Println("Recommendations:")
160        fmt.Println("1. Reduce live heap size")
161        fmt.Println("2. Use sync.Pool for frequently allocated objects")
162        fmt.Println("3. Optimize data structures to reduce pointers")
163        fmt.Println("4. Consider using GOMEMLIMIT for better pacing")
164    } else if avgPause < time.Millisecond {
165        fmt.Println("Low GC pause times - excellent!")
166    } else {
167        fmt.Println("GC pause times are acceptable")
168    }
169
170    // GOMEMLIMIT recommendations
171    fmt.Println()
172    fmt.Println("GOMEMLIMIT:")
173    recommendedLimit := heapMB * 4 / 3 // 1.33x current heap
174    fmt.Printf("Suggested: %d MB\n", recommendedLimit)
175    fmt.Println("  export GOMEMLIMIT=", recommendedLimit, "MiB")
176}
177
178// Simulate workload
179func simulateWorkload(ctx context.Context) {
180    for {
181        select {
182        case <-ctx.Done():
183            return
184        default:
185            // Allocate and discard memory
186            data := make([]byte, 1024*1024)
187            _ = data
188            time.Sleep(50 * time.Millisecond)
189        }
190    }
191}
192
193func main() {
194    fmt.Println("=== GC Statistics Monitor ===\n")
195
196    // Set initial GOGC
197    currentGOGC := 100
198    debug.SetGCPercent(currentGOGC)
199    fmt.Printf("GOGC: %d%%\n", currentGOGC)
200
201    monitor := NewGCMonitor()
202
203    // Start monitoring
204    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
205    defer cancel()
206
207    go monitor.Start(ctx, 200*time.Millisecond)
208
209    // Simulate workload
210    var wg sync.WaitGroup
211    for i := 0; i < 5; i++ {
212        wg.Add(1)
213        go func() {
214            defer wg.Done()
215            simulateWorkload(ctx)
216        }()
217    }
218
219    wg.Wait()
220
221    // Generate report
222    monitor.Report()
223}
224// run

Explanation:

This GC monitor provides production-ready monitoring:

  1. Statistics Collection:

    • Pause times per GC cycle
    • Heap allocation and system memory
    • GC frequency and next GC threshold
    • Continuous monitoring with configurable interval
  2. Analysis:

    • Average, min, max pause times
    • GC frequency
    • Memory growth patterns
    • Trend analysis
  3. Recommendations:

    • High GC Frequency: Increase GOGC to reduce overhead
    • High Pause Times: Optimize allocations, use object pooling
    • GOMEMLIMIT: Suggests soft memory limit based on usage
  4. Production Integration:

    • Export metrics to monitoring systems (see the sketch below)
    • Set up alerts for high pause times or frequency
    • Continuous tuning based on workload

Real-World Impact: Proper GC tuning can reduce pause times by 10-100x and improve throughput by 20-50% in allocation-heavy workloads.
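
For the metric-export step in item 4, a minimal sketch using the runtime/metrics package (Go 1.16+), whose stable metric names map cleanly onto monitoring systems:

 1// run
 2package main
 3
 4import (
 5    "fmt"
 6    "runtime/metrics"
 7)
 8
 9func main() {
10    samples := []metrics.Sample{
11        {Name: "/gc/pauses:seconds"},           // histogram of GC pause times
12        {Name: "/sched/goroutines:goroutines"}, // current goroutine count
13    }
14    metrics.Read(samples)
15
16    if samples[1].Value.Kind() == metrics.KindUint64 {
17        fmt.Println("goroutines:", samples[1].Value.Uint64())
18    }
19    if samples[0].Value.Kind() == metrics.KindFloat64Histogram {
20        h := samples[0].Value.Float64Histogram()
21        var n uint64
22        for _, c := range h.Counts {
23            n += c
24        }
25        fmt.Printf("GC pause samples: %d in %d buckets\n", n, len(h.Counts))
26    }
27}
28// run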

Exercise 5: Goroutine Leak Detector

🎯 Learning Objectives:

  • Master goroutine lifecycle management and leak detection techniques
  • Learn to identify common goroutine leak patterns and root causes
  • Practice building automated monitoring and alerting systems
  • Understand memory and resource implications of goroutine leaks

🌍 Real-World Context:
Goroutine leaks are a common cause of production outages in Go services. At Lyft, goroutine leaks caused multiple service crashes before implementing automated leak detection. Pinterest's image processing service reduced memory usage by 60% by identifying and fixing goroutine leaks. Reddit uses comprehensive goroutine monitoring to prevent their comment ranking system from leaking memory during high-traffic events like AMA sessions.

⏱️ Time Estimate: 90-120 minutes
πŸ“Š Difficulty: Expert

Create a comprehensive goroutine leak detector that continuously monitors goroutine counts, identifies stuck or leaked goroutines by analyzing growth patterns, tracks creation stack traces for debugging, provides automated recommendations for cleanup, and includes integration with Go's runtime/pprof for detailed analysis. This tool will help you prevent memory leaks and resource exhaustion in production.

Solution with Explanation
  1// run
  2package main
  3
  4import (
  5    "context"
  6    "fmt"
  7    "runtime"
  8    "sync"
  9    "sync/atomic"
 10    "time"
 11)
 12
 13type GoroutineSnapshot struct {
 14    Timestamp time.Time
 15    Count     int
 16    Delta     int
 17}
 18
 19type LeakDetector struct {
 20    snapshots      []GoroutineSnapshot
 21    mu             sync.Mutex
 22    baseline       int
 23    alertThreshold int
 24    leakDetected   atomic.Bool
 25}
 26
 27func NewLeakDetector(alertThreshold int) *LeakDetector {
 28    return &LeakDetector{
 29        snapshots:      make([]GoroutineSnapshot, 0, 100),
 30        baseline:       runtime.NumGoroutine(),
 31        alertThreshold: alertThreshold,
 32    }
 33}
 34
 35func (ld *LeakDetector) Snapshot() {
 36    current := runtime.NumGoroutine()
 37    delta := current - ld.baseline
 38
 39    snapshot := GoroutineSnapshot{
 40        Timestamp: time.Now(),
 41        Count:     current,
 42        Delta:     delta,
 43    }
 44
 45    ld.mu.Lock()
 46    ld.snapshots = append(ld.snapshots, snapshot)
 47    ld.mu.Unlock()
 48
 49    if delta > ld.alertThreshold {
 50        if !ld.leakDetected.Load() {
 51            ld.leakDetected.Store(true)
 52            fmt.Printf("\n!!! GOROUTINE LEAK DETECTED !!!\n")
 53            fmt.Printf("Current: %d, Baseline: %d, Delta: +%d\n",
 54                current, ld.baseline, delta)
 55        }
 56    }
 57}
 58
 59func Monitor(ctx context.Context, interval time.Duration) {
 60    ticker := time.NewTicker(interval)
 61    defer ticker.Stop()
 62
 63    for {
 64        select {
 65        case <-ctx.Done():
 66            return
 67        case <-ticker.C:
 68            ld.Snapshot()
 69        }
 70    }
 71}
 72
 73func Report() {
 74    ld.mu.Lock()
 75    defer ld.mu.Unlock()
 76
 77    if len(ld.snapshots) == 0 {
 78        fmt.Println("No snapshots collected")
 79        return
 80    }
 81
 82    fmt.Println("\n=== Goroutine Leak Detector Report ===\n")
 83
 84    first := ld.snapshots[0]
 85    last := ld.snapshots[len(ld.snapshots)-1]
 86
 87    fmt.Printf("Baseline: %d goroutines\n", ld.baseline)
 88    fmt.Printf("Final: %d goroutines\n", last.Count)
 89    fmt.Printf("Delta: %+d goroutines\n", last.Delta)
 90    fmt.Printf("Duration: %v\n\n", last.Timestamp.Sub(first.Timestamp))
 91
 92    // Trend analysis
 93    increasing := 0
 94    stable := 0
 95    decreasing := 0
 96
 97    for i := 1; i < len(ld.snapshots); i++ {
 98        diff := ld.snapshots[i].Count - ld.snapshots[i-1].Count
 99        if diff > 0 {
100            increasing++
101        } else if diff < 0 {
102            decreasing++
103        } else {
104            stable++
105        }
106    }
107
108    fmt.Println("Trend Analysis:")
109    fmt.Printf("  Increasing: %d samples\n", increasing)
110    fmt.Printf("  Stable: %d samples\n", stable)
111    fmt.Printf("  Decreasing: %d samples\n\n", decreasing)
112
113    // Leak assessment
114    if increasing > / 2) {
115        fmt.Println("VERDICT: Likely goroutine leak!")
116        fmt.Println("Goroutines are consistently increasing over time.\n")
117        ld.printRecommendations()
118    } else if stable > * 3 / 4) {
119        fmt.Println("VERDICT: No leak detected")
120        fmt.Println("Goroutine count is stable.\n")
121    } else {
122        fmt.Println("VERDICT: Inconclusive")
123        fmt.Println("Goroutine count is fluctuating. Monitor for longer.\n")
124    }
125
126    // Show timeline
127    fmt.Println("Timeline:")
128    for i, snap := range ld.snapshots {
129        marker := ""
130        if snap.Delta > ld.alertThreshold {
131            marker = " ⚠️  ALERT"
132        }
133
134        if i%5 == 0 || i == len(ld.snapshots)-1 {
135            fmt.Printf("  [%s] Count: %d%s\n",
136                snap.Timestamp.Format("15:04:05"), snap.Count, snap.Delta, marker)
137        }
138    }
139}
140
141func printRecommendations() {
142    fmt.Println("=== Leak Detection Recommendations ===\n")
143
144    fmt.Println("1. Enable goroutine profiling:")
145    fmt.Println("   import _ \"net/http/pprof\"")
146    fmt.Println("   http.ListenAndServe(\":6060\", nil)")
147    fmt.Println("   Visit: http://localhost:6060/debug/pprof/goroutine\n")
148
149    fmt.Println("2. Check for missing context cancellation:")
150    fmt.Println("   - Are all contexts properly cancelled?")
151    fmt.Println("   - Use defer cancel() immediately after context creation\n")
152
153    fmt.Println("3. Check for channel deadlocks:")
154    fmt.Println("   - Are goroutines waiting on unbuffered channels?")
155    fmt.Println("   - Are channels properly closed?\n")
156
157    fmt.Println("4. Check for infinite loops:")
158    fmt.Println("   - Do all goroutines have exit conditions?")
159    fmt.Println("   - Are select statements missing default or timeout cases?\n")
160
161    fmt.Println("5. Use runtime.Stack() to get stack traces:")
162    fmt.Println("   buf := make([]byte, 1<<16)")
163    fmt.Println("   runtime.Stack(buf, true)")
164    fmt.Println("   fmt.Printf(\"%s\", buf)\n")
165}
166
167// Intentional leak examples
168func leakyFunction() {
169    ch := make(chan int)
170
171    // Goroutine waiting on channel that never gets data
172    go func() {
173        <-ch // Will block forever
174    }()
175}
176
177func leakyTimer() {
178    // Timer not stopped
179    go func() {
180        time.Sleep(time.Hour)
181    }()
182}
183
184func goodPattern(ctx context.Context) {
185    ch := make(chan int)
186
187    // Properly cancelled goroutine
188    go func() {
189        select {
190        case <-ch:
191            return
192        case <-ctx.Done():
193            return
194        }
195    }()
196
197    // Clean up
198    close(ch)
199}
200
201func main() {
202    fmt.Println("=== Goroutine Leak Detector ===\n")
203
204    detector := NewLeakDetector(10)
205
206    // Start monitoring
207    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
208    defer cancel()
209
210    go detector.Monitor(ctx, 200*time.Millisecond)
211
212    // Simulate leaky code
213    fmt.Println("Creating intentional leaks...")
214    for i := 0; i < 20; i++ {
215        leakyFunction()
216        time.Sleep(100 * time.Millisecond)
217    }
218
219    // Wait for monitoring to complete
220    <-ctx.Done()
221    time.Sleep(300 * time.Millisecond)
222
223    // Generate report
224    detector.Report()
225
226    // Show current goroutine count
227    fmt.Printf("\n\nCurrent goroutines: %d\n", runtime.NumGoroutine())
228}

Explanation:

This leak detector identifies goroutine leaks in production:

  1. Detection Strategy:

    • Establish baseline goroutine count
    • Continuous monitoring at intervals
    • Track delta from baseline
    • Alert when threshold exceeded
  2. Trend Analysis:

    • Count increasing/stable/decreasing samples
    • Determine if leak is persistent or transient
    • Provide confidence in verdict
  3. Common Leak Patterns:

    • Channel Deadlock: Goroutine waiting on channel that's never written to
    • Missing Context Cancellation: Goroutines that never receive done signal
    • Infinite Loops: No exit condition
    • Timer/Ticker Leaks: Not properly stopped
  4. Debugging Tools (see the sketch after this list):

    • Use pprof goroutine profiler
    • Examine stack traces
    • Check for missing defer cancel()
    • Look for unbuffered channels without sender
  5. Prevention:

    • Always use context for cancellation
    • Close channels when done
    • Use defer for cleanup
    • Set timeouts on select statements
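
As a concrete starting point for the debugging-tools item above, Go's runtime/pprof package can dump every live goroutine's stack without standing up the HTTP endpoint. A minimal sketch:

// run
package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// Write all goroutine stacks to stdout. debug=1 groups identical
	// stacks and prints a count per group; debug=2 prints every
	// goroutine individually, like an unrecovered panic does.
	pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
}

In a leaking process, the stack with an unusually high count is almost always the leak site.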

Production Impact: Goroutine leaks cause memory growth and eventually crash the application. Early detection prevents outages.
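
Leak detection can also shift left into tests. A sketch using the third-party go.uber.org/goleak package (an assumed dependency, not part of the standard library), which fails the test run if unexpected goroutines survive:

// Hypothetical test file for a package under test
package myservice_test

import (
	"testing"

	"go.uber.org/goleak"
)

// TestMain wraps the package's tests; goleak.VerifyTestMain fails the
// run if any non-standard goroutines are still alive after all tests
// complete, catching leaks before they reach production.
func TestMain(m *testing.M) {
	goleak.VerifyTestMain(m)
}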

Summary

πŸ’‘ Key Takeaway: Understanding Go's runtime internals transforms you from just writing Go code to writing efficient Go code. The runtime is your invisible partnerβ€”knowing how it works helps you work with it, not against it.

Go's runtime provides sophisticated mechanisms for managing goroutines, memory, and garbage collection. The guide and checklist below distill how to match those mechanisms to the problems you actually observe.

When to Optimize What: Runtime Tuning Guide

🎯 Practical Guide: Match your optimization strategy to your performance goals:

| Performance Issue | Runtime Symptom | Solution |
| --- | --- | --- |
| High CPU usage | Long run queues, work stealing active | Increase GOMAXPROCS, profile CPU bottlenecks |
| High latency | Long GC pauses, high goroutine count | Reduce allocations, use object pools, tune GOGC |
| Memory bloat | Heap growth, low GC frequency | Decrease GOGC, implement memory limits |
| Poor scalability | Uneven processor utilization | Check goroutine placement, reduce blocking calls |
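
Most of the symptoms in the table can be read directly from the runtime. A minimal sketch of the counters worth polling (alert thresholds are left out, since they are workload-specific):

// run
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)

	// A climbing goroutine count hints at leaks or blocked work.
	fmt.Printf("Goroutines:  %d\n", runtime.NumGoroutine())
	// Unbounded heap growth points at memory bloat.
	fmt.Printf("Heap alloc:  %d MiB\n", m.HeapAlloc>>20)
	// Frequent GC cycles and long pauses point at allocation pressure.
	fmt.Printf("GC cycles:   %d\n", m.NumGC)
	fmt.Printf("Total pause: %v\n", time.Duration(m.PauseTotalNs))
}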

Production Checklist

⚠️ Important: These runtime checks should be part of every production Go service:

  • βœ… Monitor goroutine count: Sudden increases indicate leaks or blocking
  • βœ… Track GC pause times: >10ms pauses need investigation
  • βœ… Watch heap growth: Unbounded growth = memory leak or inefficiency
  • βœ… Profile regularly: Use pprof to find hot spots
  • βœ… Set GOMEMLIMIT: Prevent OOM in containerized environments
  • βœ… Test under load: Runtime issues often appear only under stress

Real-World Impact: At Twitter, proper runtime tuning reduced API response times by 40% and infrastructure costs by 25%. At Dropbox, understanding GC patterns allowed them to handle 10x more traffic with the same hardware.

Common Pitfalls to Avoid

  1. Ignoring goroutine leaks: They consume memory and scheduler capacity
  2. Setting GOMAXPROCS too high: Context switching overhead hurts performance
  3. Forgetting GC tuning: Default settings may not match your workload
  4. Not profiling: You can't optimize what you can't measure

The Runtime Mindset

Final Thought: Think of the runtime as a sophisticated resource manager. Your job is not to micromanage it, but to understand its capabilities and limitations. When you work with the runtime's strengths and avoid its weaknesses, you get Go's legendary performance and simplicity.

Production Best Practices:

  • Monitor runtime metrics
  • Tune GOGC based on workload characteristics
  • Use GOMEMLIMIT for memory-constrained environments
  • Profile regularly to identify optimization opportunities
  • Implement health checks that expose runtime statistics (see the sketch after this list)
  • Be mindful of goroutine lifecycle and cleanup
  • Use sync.Pool for frequently allocated/deallocated objects
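
For the health-check item above, a minimal sketch of an endpoint that exposes runtime statistics as JSON (the /debug/runtime route and field names are illustrative choices, not a standard):

package main

import (
	"encoding/json"
	"log"
	"net/http"
	"runtime"
)

func main() {
	http.HandleFunc("/debug/runtime", func(w http.ResponseWriter, r *http.Request) {
		var m runtime.MemStats
		runtime.ReadMemStats(&m)

		// Expose the counters your monitoring system should scrape.
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]uint64{
			"goroutines":     uint64(runtime.NumGoroutine()),
			"heap_alloc":     m.HeapAlloc,
			"heap_sys":       m.HeapSys,
			"num_gc":         uint64(m.NumGC),
			"pause_total_ns": m.PauseTotalNs,
		})
	})

	// In production this would sit behind the service's admin port.
	log.Fatal(http.ListenAndServe(":8080", nil))
}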

Understanding runtime internals helps you write more efficient Go programs and debug performance issues effectively.