Why Memory Optimization Matters
Think of memory management like running a restaurant. If you keep ordering new ingredients for every dish instead of reusing what's in your pantry, you'll waste money, create waste, and slow down service. Memory optimization is the art of keeping your pantry organized and reusing ingredients efficiently.
In Go applications, smart memory management can reduce cloud costs by 50-90% and eliminate performance bottlenecks. It's the difference between services that scale smoothly to millions of users and services that crash under load.
💡 Key Takeaway: Memory optimization isn't just about using less memory—it's about using memory more efficiently to get better performance and lower costs.
Real-World Impact:
Discord - Reduced memory by 90% with one change:
- Before: 5GB per instance
- After: 500MB per instance
- Savings: $1M+/year in infrastructure costs
Twitch - GC optimization for live streaming:
- Problem: 100ms GC pauses during peak
- Solution: Escape analysis + memory pooling
- Result: GC pauses < 1ms, zero buffering
Memory vs Performance:
Impact of Memory Allocations:
├─ Zero allocations: 50ms
├─ 1 alloc/op: 150ms
├─ 10 allocs/op: 800ms
└─ 100 allocs/op: 5,000ms
GC Impact on Latency:
├─ 100MB heap: p99 = 5ms
├─ 1GB heap: p99 = 50ms
├─ 10GB heap: p99 = 500ms
└─ Optimization: Keep heap <1GB per service
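The figures above are illustrative rather than measured; in your own code, the standard way to see allocations per operation is a Go benchmark with b.ReportAllocs(), run as go test -bench . -benchmem. A minimal sketch (file and function names are placeholders):

package mem_test

import (
    "fmt"
    "strconv"
    "testing"
)

// Allocates on every iteration: fmt.Sprintf boxes its argument and builds a new string.
func BenchmarkSprintf(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = fmt.Sprintf("%d", i)
    }
}

// Reuses one buffer: strconv.AppendInt writes into pre-allocated space, keeping allocs/op at zero.
func BenchmarkAppendInt(b *testing.B) {
    b.ReportAllocs()
    buf := make([]byte, 0, 64)
    for i := 0; i < b.N; i++ {
        buf = strconv.AppendInt(buf[:0], int64(i), 10)
    }
}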
Learning Objectives
By the end of this tutorial, you will master:
Core Concepts:
- Go's memory allocation model and garbage collection
- Stack vs heap allocation and escape analysis
- Memory pooling and object reuse patterns
- GC tuning and performance optimization
Practical Skills:
- Identifying and eliminating unnecessary allocations
- Implementing efficient memory pools
- Using pprof to find memory bottlenecks
- Optimizing data structures for memory efficiency
Production Patterns:
- Zero-allocation techniques for hot paths
- Memory-aware caching strategies
- Performance testing and benchmarking
- Real-world memory optimization case studies
Core Concepts
Understanding Go's Memory Model
Memory management is critical for building high-performance Go applications. While Go's garbage collector handles most memory management automatically, understanding how memory works and applying optimization techniques can dramatically improve application performance, reduce memory footprint, and minimize GC pressure—often by 10-100x.
Memory Allocation Regions:
Go manages memory in three primary regions:
Stack Storage:
- Function-local variables
- Automatically cleaned up when function returns
- Extremely fast allocation and deallocation
- Starts small (a few kilobytes per goroutine) and grows dynamically as needed
Heap Storage:
- Long-lived data that survives function calls
- Managed by garbage collector
- Slower allocation and requires cleanup
- Much larger capacity
Static Storage:
- Global variables and constants
- Allocated at program startup
- Fixed size for entire program duration
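A minimal sketch of the three regions in one program; the stack/heap comments describe the typical outcome, but the final placement is always the compiler's escape-analysis decision:

package main

import "fmt"

// Static storage: package-level variable, lives for the whole program run.
var requestCount int64

func handle() *int64 {
    // Stack storage: local value, reclaimed automatically when handle returns.
    local := int64(10)

    // Heap storage: total outlives this call because its address is returned.
    total := requestCount + local
    return &total
}

func main() {
    requestCount = 5
    fmt.Println(*handle())
}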
Memory Allocation Lifecycle
Understanding the complete lifecycle of memory allocation helps you write more efficient code:
package main

import (
    "fmt"
    "runtime"
)

func demonstrateLifecycle() {
    // 1. Stack Allocation Phase
    x := 42 // Allocated on stack, instant

    // 2. Heap Allocation Phase
    ptr := new(int) // Allocated on heap
    *ptr = 100

    // 3. Usage Phase
    fmt.Printf("Stack value: %d, Heap value: %d\n", x, *ptr)

    // 4. Cleanup Phase
    // x: Cleaned up automatically when function returns
    // ptr: Becomes garbage when no longer referenced
    //      GC will eventually collect it
}

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("Before: Heap %d MB\n", m.Alloc/1024/1024)

    demonstrateLifecycle()

    runtime.GC() // Force garbage collection
    runtime.ReadMemStats(&m)
    fmt.Printf("After GC: Heap %d MB\n", m.Alloc/1024/1024)
}
Why Memory Optimization Matters
Performance: Fewer allocations = faster code
Latency: Reduced GC pauses improve response times
Scalability: Lower memory footprint = more capacity
Cost: Less memory usage = lower infrastructure costs
The Golden Rule: Keep as much data on the stack as possible, reuse heap objects when you can't, and profile to find where it matters most.
Memory Management Patterns
Different memory patterns suit different use cases. Understanding when to use each pattern is crucial for optimization:
package main

import (
    "fmt"
    "sync"
)

// Pattern 1: Short-lived temporary data (use stack)
func computeTemporary() int {
    temp := make([]int, 100) // Stack allocated if small enough
    sum := 0
    for i := range temp {
        temp[i] = i
        sum += temp[i]
    }
    return sum // temp discarded after return
}

// Pattern 2: Long-lived shared data (use heap)
type Cache struct {
    data map[string][]byte
}

func newCache() *Cache {
    return &Cache{
        data: make(map[string][]byte), // Heap allocated
    }
}

// Pattern 3: High-frequency temporary objects (use pool)
var bufferPool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 4096)
    },
}

func processWithPool(data []byte) {
    buf := bufferPool.Get().([]byte)
    defer bufferPool.Put(buf)
    // Use buf...
}

// Pattern 4: Fixed-size ring buffer (pre-allocated)
type RingBuffer struct {
    data  [1024]byte // Pre-allocated, no GC pressure
    head  int
    tail  int
    count int
}

func main() {
    // Demonstrate different patterns
    fmt.Println("Temporary:", computeTemporary())

    cache := newCache()
    cache.data["key"] = []byte("value")

    processWithPool([]byte("test"))

    rb := &RingBuffer{}
    fmt.Printf("Ring buffer ready: %d bytes\n", len(rb.data))
}
Practical Examples
Getting Started with Memory Analysis
Let's start by learning how to measure memory usage and identify allocation patterns in your Go applications.
package main

import (
    "fmt"
    "runtime"
    "time"
)

// run
func main() {
    // Get current memory statistics
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    fmt.Printf("=== Current Memory Usage ===\n")
    fmt.Printf("Heap allocated: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("Total allocated: %d MB\n", m.TotalAlloc/1024/1024)
    fmt.Printf("System memory: %d MB\n", m.Sys/1024/1024)
    fmt.Printf("GC cycles: %d\n", m.NumGC)
    fmt.Printf("Number of goroutines: %d\n", runtime.NumGoroutine())

    // Simulate some allocations
    fmt.Println("\n=== Allocating 100MB ===")
    data := make([]byte, 100*1024*1024)
    _ = data

    runtime.ReadMemStats(&m)
    fmt.Printf("Heap after allocation: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("Total allocated: %d MB\n", m.TotalAlloc/1024/1024)

    // Force GC and measure
    fmt.Println("\n=== After GC ===")
    data = nil // Release reference
    runtime.GC()
    time.Sleep(100 * time.Millisecond) // Give GC time to complete

    runtime.ReadMemStats(&m)
    fmt.Printf("Heap after GC: %d MB\n", m.Alloc/1024/1024)
    fmt.Printf("GC cycles: %d\n", m.NumGC)

    // Key metrics to understand:
    // - Alloc: Memory currently in use
    // - TotalAlloc: Total memory ever allocated
    // - Sys: Total memory obtained from OS
    // - NumGC: How many garbage collections have run
}
Advanced Memory Statistics
package main

import (
    "fmt"
    "runtime"
    "time"
)

func printDetailedMemStats() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)

    fmt.Printf("=== Detailed Memory Statistics ===\n")

    // General statistics
    fmt.Printf("\nGeneral:\n")
    fmt.Printf("  Alloc:         %10d bytes (%d MB)\n", m.Alloc, m.Alloc/1024/1024)
    fmt.Printf("  TotalAlloc:    %10d bytes (%d MB)\n", m.TotalAlloc, m.TotalAlloc/1024/1024)
    fmt.Printf("  Sys:           %10d bytes (%d MB)\n", m.Sys, m.Sys/1024/1024)
    fmt.Printf("  Lookups:       %10d\n", m.Lookups)
    fmt.Printf("  Mallocs:       %10d\n", m.Mallocs)
    fmt.Printf("  Frees:         %10d\n", m.Frees)

    // Heap statistics
    fmt.Printf("\nHeap:\n")
    fmt.Printf("  HeapAlloc:     %10d bytes (%d MB)\n", m.HeapAlloc, m.HeapAlloc/1024/1024)
    fmt.Printf("  HeapSys:       %10d bytes (%d MB)\n", m.HeapSys, m.HeapSys/1024/1024)
    fmt.Printf("  HeapIdle:      %10d bytes (%d MB)\n", m.HeapIdle, m.HeapIdle/1024/1024)
    fmt.Printf("  HeapInuse:     %10d bytes (%d MB)\n", m.HeapInuse, m.HeapInuse/1024/1024)
    fmt.Printf("  HeapReleased:  %10d bytes (%d MB)\n", m.HeapReleased, m.HeapReleased/1024/1024)
    fmt.Printf("  HeapObjects:   %10d\n", m.HeapObjects)

    // GC statistics
    fmt.Printf("\nGarbage Collection:\n")
    fmt.Printf("  NumGC:         %10d\n", m.NumGC)
    fmt.Printf("  NumForcedGC:   %10d\n", m.NumForcedGC)
    fmt.Printf("  GCCPUFraction: %10.6f\n", m.GCCPUFraction)

    if m.NumGC > 0 {
        lastPause := time.Duration(m.PauseNs[(m.NumGC+255)%256])
        fmt.Printf("  LastPause:     %10v\n", lastPause)

        // Calculate average pause time
        var totalPause time.Duration
        numSamples := uint32(256)
        if m.NumGC < 256 {
            numSamples = m.NumGC
        }
        for i := uint32(0); i < numSamples; i++ {
            totalPause += time.Duration(m.PauseNs[i])
        }
        avgPause := totalPause / time.Duration(numSamples)
        fmt.Printf("  AvgPause:      %10v\n", avgPause)
    }

    // Stack statistics
    fmt.Printf("\nStack:\n")
    fmt.Printf("  StackInuse:    %10d bytes (%d MB)\n", m.StackInuse, m.StackInuse/1024/1024)
    fmt.Printf("  StackSys:      %10d bytes (%d MB)\n", m.StackSys, m.StackSys/1024/1024)
}

func main() {
    printDetailedMemStats()
}
Understanding Stack vs Heap Allocation
Think of stack vs heap allocation like cooking in your kitchen:
- Stack: Using a temporary cutting board that gets cleaned up automatically when you're done
- Heap: Storing ingredients in the main pantry where someone needs to organize them later
Stack allocation is fast and automatically cleaned up when the function returns:
package main

import "fmt"

// run
// Stack allocation - value doesn't escape
func stackAllocation() int {
    x := 42 // Allocated on stack
    return x
}

// Heap allocation - value escapes via pointer
func heapAllocation() *int {
    x := 42   // Allocated on heap
    return &x // Someone else needs this later!
}

// Demonstrating the difference
func compareAllocations() {
    // Stack: fast, no GC pressure
    for i := 0; i < 1000; i++ {
        _ = stackAllocation()
    }

    // Heap: slower, creates GC pressure
    pointers := make([]*int, 1000)
    for i := 0; i < 1000; i++ {
        pointers[i] = heapAllocation()
    }
    _ = pointers
}

func main() {
    // Stack allocation: fast, no GC pressure
    a := stackAllocation()
    fmt.Printf("Stack allocated value: %d\n", a)

    // Heap allocation: slower, adds GC pressure
    b := heapAllocation()
    fmt.Printf("Heap allocated value: %d\n", *b)

    compareAllocations()
    fmt.Println("Allocation comparison complete")
}
Real-world Example: A web server processing 1M requests per second reduced latency by 40% after moving temporary buffers from heap to stack allocation.
Stack vs Heap Decision Factors
The compiler uses sophisticated analysis to decide whether a variable belongs on the stack or heap:
package main

import "fmt"

type Data struct {
    values [100]int
}

// Stays on stack - value returned
func returnsValue() Data {
    return Data{} // Entire struct on stack
}

// Goes to heap - pointer returned
func returnsPointer() *Data {
    return &Data{} // Escapes to heap
}

// Stays on stack - pointer not leaked
func internalPointer() int {
    d := Data{}
    ptr := &d // Pointer doesn't escape
    return ptr.values[0]
}

// Goes to heap - stored in slice
func storedInSlice(slice []*Data) {
    d := Data{}
    slice = append(slice, &d) // d escapes
}

// Size matters
func largeAllocation() {
    // Large objects go to heap regardless
    var large [1000000]int // Too large for stack
    _ = large
}

// Compile-time unknown size
func dynamicSize(n int) {
    // Dynamic size means heap
    slice := make([]int, n) // Size unknown at compile time
    _ = slice
}

func main() {
    // Different allocation patterns
    v := returnsValue()
    fmt.Printf("Value: %v\n", v.values[0])

    p := returnsPointer()
    fmt.Printf("Pointer: %v\n", p.values[0])

    fmt.Printf("Internal: %d\n", internalPointer())

    slice := make([]*Data, 0)
    storedInSlice(slice)

    largeAllocation()
    dynamicSize(100)
}
Memory Alignment for Efficiency
Memory alignment is like packing items in a box. If you throw items in randomly, you waste space. If you pack them thoughtfully, you fit more in the same box.
package main

import (
    "fmt"
    "unsafe"
)

// run
// Poorly aligned struct
// Like packing: [small item] [big item] [small item] = wasted space
type BadStruct struct {
    a bool  // 1 byte + 7 padding bytes
    b int64 // 8 bytes
    c bool  // 1 byte + 3 padding bytes
    d int32 // 4 bytes
}

// Well aligned struct
// Like packing: [big items first] [medium] [small items] = efficient
type GoodStruct struct {
    b int64 // 8 bytes
    d int32 // 4 bytes
    a bool  // 1 byte
    c bool  // 1 byte + 2 padding bytes
}

// Optimal alignment
type OptimalStruct struct {
    // 8-byte aligned fields first
    b int64 // 8 bytes

    // 4-byte aligned fields
    d int32 // 4 bytes

    // Smaller fields packed together
    a bool // 1 byte
    c bool // 1 byte
    e bool // 1 byte
    f bool // 1 byte (no padding needed!)
}

func main() {
    fmt.Printf("BadStruct size: %d bytes\n", unsafe.Sizeof(BadStruct{}))
    fmt.Printf("GoodStruct size: %d bytes\n", unsafe.Sizeof(GoodStruct{}))
    fmt.Printf("OptimalStruct size: %d bytes\n", unsafe.Sizeof(OptimalStruct{}))

    // Calculate savings
    badSize := unsafe.Sizeof(BadStruct{})
    goodSize := unsafe.Sizeof(GoodStruct{})
    optimalSize := unsafe.Sizeof(OptimalStruct{})

    fmt.Printf("\nSavings:\n")
    fmt.Printf("  Good vs Bad:    %d bytes (%.1f%% reduction)\n",
        badSize-goodSize,
        float64(badSize-goodSize)/float64(badSize)*100)
    fmt.Printf("  Optimal vs Bad: %d bytes (%.1f%% reduction)\n",
        badSize-optimalSize,
        float64(badSize-optimalSize)/float64(badSize)*100)

    // Impact on arrays
    const arraySize = 10000
    fmt.Printf("\nArray of %d structs:\n", arraySize)
    fmt.Printf("  BadStruct array:     %d KB\n", badSize*arraySize/1024)
    fmt.Printf("  GoodStruct array:    %d KB\n", goodSize*arraySize/1024)
    fmt.Printf("  OptimalStruct array: %d KB\n", optimalSize*arraySize/1024)
}
Key Points:
- Order struct fields by size (largest first)
- Group similar-sized fields together
- This can reduce memory usage by 30-50%
- Critical for arrays of structs
- Impacts cache locality and performance
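You don't have to reorder fields by hand: the fieldalignment analyzer from golang.org/x/tools reports structs whose field order wastes padding and can rewrite them for you. The commands below reflect the tool's usual usage; treat them as a starting point and check its help output for your version:

go install golang.org/x/tools/go/analysis/passes/fieldalignment/cmd/fieldalignment@latest
fieldalignment ./...       # report structs with avoidable padding
fieldalignment -fix ./...  # rewrite field order in place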
Practical Alignment Examples
package main

import (
    "fmt"
    "unsafe"
)

// Real-world example: User struct
type UserBad struct {
    active    bool  // 1 byte + 7 padding
    id        int64 // 8 bytes
    verified  bool  // 1 byte + 3 padding
    age       int32 // 4 bytes
    premium   bool  // 1 byte + 7 padding
    timestamp int64 // 8 bytes
}

type UserGood struct {
    id        int64 // 8 bytes
    timestamp int64 // 8 bytes
    age       int32 // 4 bytes
    active    bool  // 1 byte
    verified  bool  // 1 byte
    premium   bool  // 1 byte
    _         byte  // 1 byte padding (explicit)
}

// Cache line optimization (64 bytes)
type CacheLineOptimized struct {
    // Hot fields (frequently accessed together)
    id        int64    // 8 bytes
    counter   int64    // 8 bytes
    timestamp int64    // 8 bytes
    active    bool     // 1 byte
    _         [39]byte // Padding to 64 bytes

    // Cold fields (separate cache line)
    metadata string
    extra    map[string]interface{}
}

func main() {
    fmt.Printf("UserBad size: %d bytes\n", unsafe.Sizeof(UserBad{}))
    fmt.Printf("UserGood size: %d bytes\n", unsafe.Sizeof(UserGood{}))
    fmt.Printf("CacheLineOptimized size: %d bytes\n", unsafe.Sizeof(CacheLineOptimized{}))

    // Memory savings for 1 million users
    const million = 1000000
    badTotal := unsafe.Sizeof(UserBad{}) * million
    goodTotal := unsafe.Sizeof(UserGood{}) * million

    fmt.Printf("\nFor 1 million users:\n")
    fmt.Printf("  Bad design:  %.1f MB\n", float64(badTotal)/1024/1024)
    fmt.Printf("  Good design: %.1f MB\n", float64(goodTotal)/1024/1024)
    fmt.Printf("  Savings:     %.1f MB\n", float64(badTotal-goodTotal)/1024/1024)
}
Zero-Allocation String Operations
String concatenation is a common source of unnecessary allocations. Here's how to avoid it:
package main

import (
    "fmt"
    "strings"
    "time"
)

// run
// BAD: Creates multiple allocations in loop
func badConcat(parts []string) string {
    result := ""
    for _, part := range parts {
        result += part // Each += creates new string!
    }
    return result
}

// GOOD: Single allocation with pre-allocation
func goodConcat(parts []string) string {
    // Calculate total length first
    totalLen := 0
    for _, part := range parts {
        totalLen += len(part)
    }

    // Pre-allocate and build
    var b strings.Builder
    b.Grow(totalLen) // Pre-allocate exact size

    for _, part := range parts {
        b.WriteString(part)
    }
    return b.String()
}

// BEST: Using Join for simple cases
func bestConcat(parts []string) string {
    return strings.Join(parts, "")
}

func benchmark() {
    parts := make([]string, 100)
    for i := range parts {
        parts[i] = "part"
    }

    // Benchmark bad version
    start := time.Now()
    for i := 0; i < 1000; i++ {
        _ = badConcat(parts)
    }
    badTime := time.Since(start)

    // Benchmark good version
    start = time.Now()
    for i := 0; i < 1000; i++ {
        _ = goodConcat(parts)
    }
    goodTime := time.Since(start)

    // Benchmark best version
    start = time.Now()
    for i := 0; i < 1000; i++ {
        _ = bestConcat(parts)
    }
    bestTime := time.Since(start)

    fmt.Printf("Bad version:  %v\n", badTime)
    fmt.Printf("Good version: %v (%.1fx faster)\n", goodTime, float64(badTime)/float64(goodTime))
    fmt.Printf("Best version: %v (%.1fx faster)\n", bestTime, float64(badTime)/float64(bestTime))
}

func main() {
    parts := []string{"Hello", " ", "World", "!"}

    // Test all versions
    result1 := badConcat(parts)
    result2 := goodConcat(parts)
    result3 := bestConcat(parts)

    fmt.Printf("Bad version result: %s\n", result1)
    fmt.Printf("Good version result: %s\n", result2)
    fmt.Printf("Best version result: %s\n", result3)

    fmt.Println("\nRunning benchmark...")
    benchmark()
}
Performance Impact:
- Bad version: O(n) allocations and O(n²) bytes copied for n parts
- Good version: O(1) allocations
- Real-world impact: 10-100x faster for large numbers of parts
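To verify these claims on your machine, the same three functions can be wrapped in standard benchmarks instead of hand-rolled timing; go test -bench . -benchmem then reports both time and allocations per operation. A sketch (assumes badConcat, goodConcat, and bestConcat live in the same package, with this in a _test.go file):

package main // same package as the functions above, in a _test.go file

import "testing"

var benchParts = func() []string {
    p := make([]string, 100)
    for i := range p {
        p[i] = "part"
    }
    return p
}()

func BenchmarkBadConcat(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = badConcat(benchParts)
    }
}

func BenchmarkGoodConcat(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = goodConcat(benchParts)
    }
}

func BenchmarkBestConcat(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = bestConcat(benchParts)
    }
}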
String Builder Advanced Techniques
package main

import (
    "fmt"
    "strings"
    "sync"
)

// Technique 1: Efficient string building with size hint
func buildHTML(items []string) string {
    // Estimate size: tag overhead + content
    estimatedSize := len(items) * 20 // Rough estimate
    for _, item := range items {
        estimatedSize += len(item)
    }

    var b strings.Builder
    b.Grow(estimatedSize)

    b.WriteString("<ul>\n")
    for _, item := range items {
        b.WriteString("  <li>")
        b.WriteString(item)
        b.WriteString("</li>\n")
    }
    b.WriteString("</ul>")

    return b.String()
}

// Technique 2: Reusable builder
type StringBuilderPool struct {
    pool *sync.Pool
}

func NewStringBuilderPool() *StringBuilderPool {
    return &StringBuilderPool{
        pool: &sync.Pool{
            New: func() interface{} {
                return &strings.Builder{}
            },
        },
    }
}

func (p *StringBuilderPool) Get() *strings.Builder {
    return p.pool.Get().(*strings.Builder)
}

func (p *StringBuilderPool) Put(b *strings.Builder) {
    b.Reset()
    p.pool.Put(b)
}

// Technique 3: Fast integer to string
func fastIntToString(n int) string {
    if n == 0 {
        return "0"
    }

    var b strings.Builder
    b.Grow(20) // Max digits for int64

    if n < 0 {
        b.WriteByte('-')
        n = -n
    }

    // Convert to string efficiently
    var digits [20]byte
    i := len(digits)
    for n > 0 {
        i--
        digits[i] = byte('0' + n%10)
        n /= 10
    }

    b.Write(digits[i:])
    return b.String()
}

func main() {
    // HTML generation example
    items := []string{"Apple", "Banana", "Cherry", "Date"}
    html := buildHTML(items)
    fmt.Println(html)

    // String builder pool example
    pool := NewStringBuilderPool()
    builder := pool.Get()
    builder.WriteString("Hello, ")
    builder.WriteString("World!")
    result := builder.String()
    pool.Put(builder)
    fmt.Println(result)

    // Fast int to string
    fmt.Println(fastIntToString(12345))
    fmt.Println(fastIntToString(-67890))
}
Object Pooling for Performance
Object pooling reuses expensive-to-create objects instead of allocating new ones each time.
package main

import (
    "bytes"
    "fmt"
    "sync"
    "time"
)

// run
// Global pool for buffers
var bufferPool = sync.Pool{
    New: func() interface{} {
        // Create new buffer when pool is empty
        return new(bytes.Buffer)
    },
}

// BAD: Allocate new buffer every time
func badProcess(data []byte) string {
    buf := bytes.NewBuffer(data) // New allocation every call
    buf.WriteString(" - processed")
    return buf.String()
}

// GOOD: Reuse buffers from pool
func goodProcess(data []byte) string {
    // Get buffer from pool
    buf := bufferPool.Get().(*bytes.Buffer)
    defer bufferPool.Put(buf) // Return to pool when done

    // Reset and use buffer
    buf.Reset()
    buf.Write(data)
    buf.WriteString(" - processed")
    return buf.String()
}

func benchmark() {
    data := []byte("Hello, World!")

    // Benchmark bad version
    start := time.Now()
    for i := 0; i < 10000; i++ {
        _ = badProcess(data)
    }
    badDuration := time.Since(start)

    // Benchmark good version
    start = time.Now()
    for i := 0; i < 10000; i++ {
        _ = goodProcess(data)
    }
    goodDuration := time.Since(start)

    fmt.Printf("Bad version:  %v\n", badDuration)
    fmt.Printf("Good version: %v\n", goodDuration)
    fmt.Printf("Speedup: %.2fx\n", float64(badDuration)/float64(goodDuration))
}

func main() {
    data := []byte("Test data")

    fmt.Println("Bad version:", badProcess(data))
    fmt.Println("Good version:", goodProcess(data))

    fmt.Println("\nRunning benchmark...")
    benchmark()
}
Real-World Impact: Discord reduced memory usage by 90% by implementing proper object pooling patterns.
Common Patterns and Pitfalls
Escape Analysis
Escape analysis is like a smart assistant that decides whether your data should stay on your temporary workbench or need to be stored in the main warehouse. The compiler analyzes how you use your variables to make this decision automatically.
What's Happening: The compiler performs escape analysis during compilation to determine the lifetime and scope of variables. If a variable's lifetime is confined to a function, it can be allocated on the stack—which is fast and automatically cleaned up. If the variable might outlive the function, it must be allocated on the heap and managed by the garbage collector.
⚠️ Important: Understanding escape analysis helps you write code that naturally stays on the stack, dramatically improving performance without extra effort.
Viewing Escape Analysis
Think of the -m flag like a window into the compiler's mind—it shows you exactly why it made stack vs heap decisions. This is incredibly valuable for optimization.
go build -gcflags='-m' main.go
package main

import "fmt"

// Does NOT escape - stays on stack
func noEscape() {
    x := make([]int, 100)
    fmt.Println(len(x))
}

// DOES escape - moves to heap
func escapes() *[]int {
    x := make([]int, 100)
    return &x // Escaping to heap - someone needs this later
}

// Escapes via interface
func escapesInterface() {
    x := 42
    fmt.Println(x) // x escapes to heap for fmt.Println
}

func main() {
    noEscape()
    ptr := escapes()
    fmt.Println(len(*ptr))
    escapesInterface()
}

// Compiler output with -gcflags='-m':
// ./main.go:7:11: make([]int, 100) does not escape
// ./main.go:13:11: make([]int, 100) escapes to heap
// ./main.go:14:9: &x escapes to heap
// ./main.go:19:14: x escapes to heap
💡 Key Takeaway: Look for "escapes to heap" messages—each one is an opportunity to potentially improve performance by redesigning your code.
Common Escape Scenarios
Understanding why variables escape helps you design better code. Here are the most common "escape triggers":
package main

import "fmt"

type Data struct {
    values []int
}

// Scenario 1: Return pointer - ESCAPES
func returnPointer() *Data {
    d := Data{values: make([]int, 10)}
    return &d // d escapes to heap - someone needs it later
}

// Scenario 2: Store in global - ESCAPES
var global *Data

func storeInGlobal() {
    d := Data{values: make([]int, 10)}
    global = &d // d escapes to heap - global lives forever
}

// Scenario 3: Send to channel - ESCAPES
func sendToChannel(ch chan *Data) {
    d := Data{values: make([]int, 10)}
    ch <- &d // d escapes to heap - another goroutine will receive it
}

// Scenario 4: Interface conversion - MAY ESCAPE
func useInterface(v interface{}) {
    fmt.Println(v) // v escapes to heap - interface{} forces heap allocation
}

// Scenario 5: Large allocation - ESCAPES
func largeAllocation() {
    // Large arrays escape to heap - stack is typically only a few MB
    arr := [1000000]int{} // Escapes due to size
    _ = arr
}

// Scenario 6: Unknown size - ESCAPES
func dynamicSize(n int) {
    // Non-constant size escapes - compiler doesn't know if it fits on stack
    arr := make([]int, n) // Escapes if n not constant
    _ = arr
}

// Scenario 7: Closure capture - MAY ESCAPE
func closureCapture() func() int {
    x := 42
    return func() int {
        return x // x escapes - closure outlives function
    }
}

// Optimization: Use value receiver to avoid escape
func (d Data) ProcessValue() {
    // d is passed by value - no escape
    for i := range d.values {
        d.values[i]++
    }
}

// Pointer receiver - may cause escape
func (d *Data) ProcessPointer() {
    // d might escape depending on usage
    for i := range d.values {
        d.values[i]++
    }
}

func main() {
    // Demonstrate each scenario
    ptr := returnPointer()
    fmt.Printf("Returned pointer: %v\n", len(ptr.values))

    storeInGlobal()
    fmt.Printf("Global: %v\n", global != nil)

    ch := make(chan *Data, 1)
    sendToChannel(ch)
    data := <-ch
    fmt.Printf("From channel: %v\n", len(data.values))

    useInterface(42)

    largeAllocation()
    dynamicSize(100)

    fn := closureCapture()
    fmt.Printf("Closure: %d\n", fn())
}
⚠️ Important: Parameters typed as interface{} often cause unexpected escapes. If you're in a hot path, avoid interface{} and use concrete types.
Real-World Example: A JSON parsing library was redesigned to avoid interface{} allocations, reducing parsing time by 40% and eliminating 90% of heap allocations.
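The library in that example isn't named, but the underlying pattern is easy to reproduce with the standard encoding/json package: decoding into a concrete struct writes fields directly, while decoding into map[string]interface{} boxes every value. A small sketch:

package main

import (
    "encoding/json"
    "fmt"
)

type Event struct {
    ID   int    `json:"id"`
    Name string `json:"name"`
}

func main() {
    payload := []byte(`{"id": 1, "name": "login"}`)

    // Untyped: every field is boxed into an interface{} value inside a map.
    var generic map[string]interface{}
    _ = json.Unmarshal(payload, &generic)

    // Typed: fields are written straight into the struct, far fewer allocations.
    var ev Event
    _ = json.Unmarshal(payload, &ev)

    fmt.Println(generic["name"], ev.Name)
}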
Preventing Escapes
Now let's learn practical techniques to keep our data on the stack where it belongs.
Common Pitfalls and Solutions:
package main

import (
    "fmt"
    "strings"
)

// Bad: String concatenation in a loop
func badConcat(strs []string) string {
    result := ""
    for _, s := range strs {
        result += s // Each += causes reallocation and escape
    }
    return result
}

// Good: Use strings.Builder to prevent escapes
func goodConcat(strs []string) string {
    var b strings.Builder
    // Pre-allocate capacity if known
    totalLen := 0
    for _, s := range strs {
        totalLen += len(s)
    }
    b.Grow(totalLen)

    for _, s := range strs {
        b.WriteString(s)
    }
    return b.String()
}

// Example: Avoiding escape with fixed-size buffers
func processWithFixedBuffer() {
    const maxSize = 1024
    var buf [maxSize]byte // Stack-allocated

    // Use buf for processing...
    copy(buf[:], "Hello, World!")
    fmt.Printf("Processed: %s\n", buf[:13])
}

// Bad: Dynamic size escapes
func processWithDynamicBuffer(size int) {
    buf := make([]byte, size) // Heap-allocated
    _ = buf
}

// Good: Return value instead of pointer
func createValueNotPointer() Data {
    return Data{values: make([]int, 10)}
}

// Bad: Return pointer (escapes)
func createPointer() *Data {
    return &Data{values: make([]int, 10)}
}

type Data struct {
    values []int
}

func main() {
    strs := []string{"Hello", " ", "World"}

    result1 := badConcat(strs)
    result2 := goodConcat(strs)

    fmt.Println("Bad concat:", result1)
    fmt.Println("Good concat:", result2)

    processWithFixedBuffer()
    processWithDynamicBuffer(1024)

    // Value vs pointer
    val := createValueNotPointer()
    ptr := createPointer()

    fmt.Printf("Value: %v\n", len(val.values))
    fmt.Printf("Pointer: %v\n", len(ptr.values))
}
💡 Key Takeaway: When working with strings in loops, always use strings.Builder with pre-allocated capacity. This single pattern can prevent thousands of allocations in hot paths.
When to use each approach:
- Use string concatenation: For 2-3 strings, simple and readable
- Use strings.Builder: For 4+ strings or loops, much more efficient
- Use fixed arrays: When you know the maximum size at compile time
- Use dynamic slices: When size varies and can't be predicted
Real-World Example: A log processing service was handling 1M log entries per second with string concatenation, causing 50MB/sec of allocations. Switching to strings.Builder with pre-allocation reduced allocations by 95% and improved throughput by 3x.
Advanced Escape Analysis Patterns
package main

import (
    "fmt"
    "sync"
)

// Pattern 1: Escaping through indirect assignment
type Container struct {
    data *Data
}

type Data struct {
    value int
}

func indirectEscape() *Container {
    c := Container{}
    d := Data{value: 42}
    c.data = &d // d escapes through c
    return &c
}

// Pattern 2: Escaping through slice
func sliceEscape() []*Data {
    result := make([]*Data, 10)
    for i := range result {
        d := Data{value: i}
        result[i] = &d // Each d escapes
    }
    return result
}

// Pattern 3: Avoiding escape with value semantics
func noEscapeValue() []Data {
    result := make([]Data, 10)
    for i := range result {
        result[i] = Data{value: i} // No escape
    }
    return result
}

// Pattern 4: Escaping through defer
func deferEscape() {
    x := 42
    defer fmt.Println(&x) // x escapes for defer
}

// Pattern 5: Preventing escape with inline
func processInline(data []int) int {
    sum := 0
    for _, v := range data {
        sum += v // No escape, inlined
    }
    return sum
}

// Pattern 6: Pool to avoid escape
var dataPool = sync.Pool{
    New: func() interface{} {
        return &Data{}
    },
}

func usePool() {
    d := dataPool.Get().(*Data)
    defer dataPool.Put(d)
    d.value = 100
    // Use d without escape
}

func main() {
    c := indirectEscape()
    fmt.Printf("Indirect: %v\n", c.data.value)

    slice1 := sliceEscape()
    slice2 := noEscapeValue()
    fmt.Printf("Slice escape: %d items\n", len(slice1))
    fmt.Printf("Slice no escape: %d items\n", len(slice2))

    deferEscape()

    data := []int{1, 2, 3, 4, 5}
    sum := processInline(data)
    fmt.Printf("Sum: %d\n", sum)

    usePool()
}
Memory Pooling with sync.Pool
sync.Pool provides a way to reuse objects and reduce allocation pressure.
Why This Works: sync.Pool maintains a cache of reusable objects. When you Get() an object, it's pulled from the pool instead of allocating new memory. When you Put() it back, it becomes available for reuse. This dramatically reduces allocation rate in high-throughput scenarios. The pool automatically clears during GC, preventing unbounded growth.
Basic sync.Pool Usage
package main

import (
    "bytes"
    "fmt"
    "sync"
)

// run
var bufferPool = sync.Pool{
    New: func() interface{} {
        // Create new buffer if pool is empty
        return new(bytes.Buffer)
    },
}

func processData(data []byte) {
    // Get buffer from pool
    buf := bufferPool.Get().(*bytes.Buffer)

    // Reset buffer
    buf.Reset()

    // Use the buffer
    buf.Write(data)
    fmt.Println(buf.String())

    // Return buffer to pool
    bufferPool.Put(buf)
}

func main() {
    data := []byte("Hello, World!")

    // Process multiple times - buffers are reused
    for i := 0; i < 3; i++ {
        fmt.Printf("Iteration %d: ", i+1)
        processData(data)
    }

    fmt.Println("All iterations complete - buffers were reused")
}
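One caveat worth demonstrating: pool contents are not durable. Current Go runtimes keep pooled objects in a per-GC "victim" cache, so an object typically survives one collection and is dropped after two; never treat a Pool as a cache with guaranteed retention. A small, implementation-dependent sketch:

package main

import (
    "bytes"
    "fmt"
    "runtime"
    "sync"
)

func main() {
    pool := sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

    first := pool.Get().(*bytes.Buffer)
    pool.Put(first)

    // Immediately after Put, Get normally hands back the same object.
    fmt.Println("reused before GC:", pool.Get().(*bytes.Buffer) == first)
    pool.Put(first)

    // After two collections the pooled object has typically been dropped,
    // so Get falls back to the New function and returns a fresh buffer.
    runtime.GC()
    runtime.GC()
    fmt.Println("reused after GC: ", pool.Get().(*bytes.Buffer) == first)
}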
Pool for Custom Types
package main

import (
    "fmt"
    "sync"
)

type Request struct {
    ID      int
    Payload []byte
    Result  []byte
}

var requestPool = sync.Pool{
    New: func() interface{} {
        return &Request{
            Payload: make([]byte, 0, 4096),
            Result:  make([]byte, 0, 4096),
        }
    },
}

func GetRequest() *Request {
    return requestPool.Get().(*Request)
}

func PutRequest(req *Request) {
    // Reset fields before returning to pool
    req.ID = 0
    req.Payload = req.Payload[:0]
    req.Result = req.Result[:0]

    requestPool.Put(req)
}

func handleRequest(id int, payload []byte) []byte {
    // Acquire from pool
    req := GetRequest()
    defer PutRequest(req) // Return to pool when done

    // Use request
    req.ID = id
    req.Payload = append(req.Payload, payload...)

    // Process...
    req.Result = append(req.Result, []byte("Processed: ")...)
    req.Result = append(req.Result, req.Payload...)

    // Return a copy
    result := make([]byte, len(req.Result))
    copy(result, req.Result)
    return result
}

func main() {
    for i := 0; i < 5; i++ {
        payload := []byte(fmt.Sprintf("Request %d", i))
        result := handleRequest(i, payload)
        fmt.Println(string(result))
    }
}
Pool Best Practices
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

type PooledObject struct {
    data []byte
    refs int32 // Reference counter for debugging
}

// Good pool with proper initialization
var goodPool = sync.Pool{
    New: func() interface{} {
        return &PooledObject{
            data: make([]byte, 0, 1024), // Pre-allocate capacity
        }
    },
}

func useGoodPool() {
    obj := goodPool.Get().(*PooledObject)
    atomic.AddInt32(&obj.refs, 1)

    defer func() {
        // IMPORTANT: Reset state before returning
        obj.data = obj.data[:0]
        atomic.StoreInt32(&obj.refs, 0)
        goodPool.Put(obj)
    }()

    // Use obj...
    obj.data = append(obj.data, []byte("data")...)
    fmt.Printf("Good pool: %s (refs: %d)\n", obj.data, obj.refs)
}

// Common mistake: Not resetting state
var badPool = sync.Pool{
    New: func() interface{} {
        return &PooledObject{
            data: make([]byte, 0, 1024),
        }
    },
}

func useBadPool() {
    obj := badPool.Get().(*PooledObject)
    defer badPool.Put(obj) // BUG: Not resetting state!

    // This appends to whatever was left from previous use
    obj.data = append(obj.data, []byte("data")...)
    fmt.Printf("Bad pool: %s (may contain old data!)\n", obj.data)
}

// Pool size monitoring
type MonitoredPool struct {
    pool    sync.Pool
    gets    int64
    puts    int64
    creates int64
}

func NewMonitoredPool(newFunc func() interface{}) *MonitoredPool {
    mp := &MonitoredPool{}
    mp.pool.New = func() interface{} {
        atomic.AddInt64(&mp.creates, 1)
        return newFunc()
    }
    return mp
}

func (mp *MonitoredPool) Get() interface{} {
    atomic.AddInt64(&mp.gets, 1)
    return mp.pool.Get()
}

func (mp *MonitoredPool) Put(obj interface{}) {
    atomic.AddInt64(&mp.puts, 1)
    mp.pool.Put(obj)
}

func (mp *MonitoredPool) Stats() (gets, puts, creates int64) {
    return atomic.LoadInt64(&mp.gets),
        atomic.LoadInt64(&mp.puts),
        atomic.LoadInt64(&mp.creates)
}

func main() {
    fmt.Println("=== Good Pool Usage ===")
    for i := 0; i < 3; i++ {
        useGoodPool()
    }

    fmt.Println("\n=== Bad Pool Usage (bug demonstration) ===")
    for i := 0; i < 3; i++ {
        useBadPool()
    }

    fmt.Println("\n=== Monitored Pool ===")
    monPool := NewMonitoredPool(func() interface{} {
        return &PooledObject{data: make([]byte, 0, 1024)}
    })

    // Use pool
    for i := 0; i < 10; i++ {
        obj := monPool.Get().(*PooledObject)
        monPool.Put(obj)
    }

    gets, puts, creates := monPool.Stats()
    fmt.Printf("Gets: %d, Puts: %d, Creates: %d\n", gets, puts, creates)
    fmt.Printf("Reuse rate: %.1f%%\n", float64(gets-creates)/float64(gets)*100)
}
Advanced Pool Patterns
package main

import (
    "bytes"
    "fmt"
    "sync"
)

// Pattern 1: Sized pools for different object sizes
type SizedPool struct {
    small  sync.Pool
    medium sync.Pool
    large  sync.Pool
}

func NewSizedPool() *SizedPool {
    return &SizedPool{
        small:  sync.Pool{New: func() interface{} { return make([]byte, 0, 1024) }},
        medium: sync.Pool{New: func() interface{} { return make([]byte, 0, 4096) }},
        large:  sync.Pool{New: func() interface{} { return make([]byte, 0, 16384) }},
    }
}

func (sp *SizedPool) Get(size int) []byte {
    switch {
    case size <= 1024:
        return sp.small.Get().([]byte)
    case size <= 4096:
        return sp.medium.Get().([]byte)
    default:
        return sp.large.Get().([]byte)
    }
}

func (sp *SizedPool) Put(buf []byte) {
    c := cap(buf) // capacity decides which pool the buffer goes back to
    buf = buf[:0] // Reset length

    switch {
    case c <= 1024:
        sp.small.Put(buf)
    case c <= 4096:
        sp.medium.Put(buf)
    case c <= 16384:
        sp.large.Put(buf)
    }
}

// Pattern 2: Type-safe pool wrapper
type TypedPool[T any] struct {
    pool sync.Pool
}

func NewTypedPool[T any](newFunc func() *T) *TypedPool[T] {
    return &TypedPool[T]{
        pool: sync.Pool{
            New: func() interface{} {
                return newFunc()
            },
        },
    }
}

func (tp *TypedPool[T]) Get() *T {
    return tp.pool.Get().(*T)
}

func (tp *TypedPool[T]) Put(obj *T) {
    tp.pool.Put(obj)
}

// Pattern 3: Pool with cleanup function
type CleanablePool struct {
    pool    sync.Pool
    cleanup func(interface{})
}

func NewCleanablePool(newFunc func() interface{}, cleanup func(interface{})) *CleanablePool {
    return &CleanablePool{
        pool:    sync.Pool{New: newFunc},
        cleanup: cleanup,
    }
}

func (cp *CleanablePool) Get() interface{} {
    return cp.pool.Get()
}

func (cp *CleanablePool) Put(obj interface{}) {
    cp.cleanup(obj)
    cp.pool.Put(obj)
}

func main() {
    // Test sized pool
    sp := NewSizedPool()
    buf1 := sp.Get(512)
    buf2 := sp.Get(2048)
    fmt.Printf("Got buffer 1: cap=%d\n", cap(buf1))
    fmt.Printf("Got buffer 2: cap=%d\n", cap(buf2))
    sp.Put(buf1)
    sp.Put(buf2)

    // Test typed pool
    type User struct {
        ID   int
        Name string
    }

    userPool := NewTypedPool(func() *User { return &User{} })
    user := userPool.Get()
    user.ID = 1
    user.Name = "Alice"
    fmt.Printf("User: %+v\n", user)
    userPool.Put(user)

    // Test cleanable pool
    cleanPool := NewCleanablePool(
        func() interface{} {
            return &bytes.Buffer{}
        },
        func(obj interface{}) {
            obj.(*bytes.Buffer).Reset()
        },
    )

    buf := cleanPool.Get().(*bytes.Buffer)
    buf.WriteString("test")
    fmt.Printf("Buffer: %s\n", buf.String())
    cleanPool.Put(buf) // Automatically reset
}
Reducing Allocations
Pre-allocating Slices
package main

import (
    "fmt"
    "time"
)

// run
// Bad: Multiple allocations as slice grows
func badAppend() []int {
    var result []int
    for i := 0; i < 1000; i++ {
        result = append(result, i) // Reallocates many times
    }
    return result
}

// Good: Pre-allocate capacity
func goodAppend() []int {
    result := make([]int, 0, 1000) // Pre-allocate capacity
    for i := 0; i < 1000; i++ {
        result = append(result, i) // No reallocations
    }
    return result
}

// Best: Pre-allocate exact size if known
func bestAppend() []int {
    result := make([]int, 1000) // Pre-allocate with length
    for i := 0; i < 1000; i++ {
        result[i] = i // Direct assignment, no append
    }
    return result
}

func benchmarkAppends() {
    iterations := 1000

    // Benchmark bad
    start := time.Now()
    for i := 0; i < iterations; i++ {
        _ = badAppend()
    }
    badTime := time.Since(start)

    // Benchmark good
    start = time.Now()
    for i := 0; i < iterations; i++ {
        _ = goodAppend()
    }
    goodTime := time.Since(start)

    // Benchmark best
    start = time.Now()
    for i := 0; i < iterations; i++ {
        _ = bestAppend()
    }
    bestTime := time.Since(start)

    fmt.Printf("Bad (no pre-alloc):   %v\n", badTime)
    fmt.Printf("Good (pre-alloc cap): %v (%.1fx faster)\n",
        goodTime, float64(badTime)/float64(goodTime))
    fmt.Printf("Best (pre-alloc len): %v (%.1fx faster)\n",
        bestTime, float64(badTime)/float64(bestTime))
}

func main() {
    // Create each version
    bad := badAppend()
    good := goodAppend()
    best := bestAppend()

    fmt.Printf("Bad result length: %d\n", len(bad))
    fmt.Printf("Good result length: %d\n", len(good))
    fmt.Printf("Best result length: %d\n", len(best))

    fmt.Println("\nRunning benchmarks...")
    benchmarkAppends()

    // Show allocation counts
    fmt.Println("\nApproximate allocations:")
    fmt.Println("Bad: many allocations (capacity grows repeatedly)")
    fmt.Println("Good: 1 allocation")
    fmt.Println("Best: 1 allocation")
}
Slice Pre-allocation Strategies
package main

import "fmt"

// Strategy 1: Known size at compile time
func fixedSize() []int {
    return make([]int, 100) // Exact size known
}

// Strategy 2: Estimated size from input
func estimatedSize(input []string) []int {
    // Estimate: one int per string
    result := make([]int, 0, len(input))
    for _, s := range input {
        result = append(result, len(s))
    }
    return result
}

// Strategy 3: Growing with known upper bound
func boundedGrowth(max int) []int {
    result := make([]int, 0, max)
    for i := 0; i < max; i++ {
        if i%2 == 0 {
            result = append(result, i)
        }
    }
    return result
}

// Strategy 4: Batch allocation for nested structures
func batchAllocation(rows, cols int) [][]int {
    // Allocate all memory at once
    backing := make([]int, rows*cols)
    result := make([][]int, rows)

    for i := range result {
        result[i] = backing[i*cols : (i+1)*cols : (i+1)*cols]
    }

    return result
}

// Strategy 5: Over-allocate for append-heavy workloads
func overAllocate(expectedSize int) []int {
    // Allocate 25% more for growth room
    capacity := expectedSize + expectedSize/4
    return make([]int, 0, capacity)
}

func main() {
    s1 := fixedSize()
    fmt.Printf("Fixed size: len=%d cap=%d\n", len(s1), cap(s1))

    input := []string{"hello", "world", "foo", "bar"}
    s2 := estimatedSize(input)
    fmt.Printf("Estimated size: len=%d cap=%d\n", len(s2), cap(s2))

    s3 := boundedGrowth(100)
    fmt.Printf("Bounded growth: len=%d cap=%d\n", len(s3), cap(s3))

    matrix := batchAllocation(10, 20)
    fmt.Printf("Batch allocation: %d rows x %d cols\n", len(matrix), len(matrix[0]))

    s4 := overAllocate(100)
    fmt.Printf("Over-allocated: len=%d cap=%d\n", len(s4), cap(s4))
}
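Since Go 1.21, the slices package also offers Grow for the case where you only learn mid-stream how much more room is needed; it extends capacity once instead of letting repeated appends reallocate. A short sketch:

package main

import (
    "fmt"
    "slices"
)

func main() {
    batch := make([]int, 0, 8)

    // Mid-stream we learn that ~100 more items are coming:
    // grow once instead of letting append reallocate repeatedly.
    batch = slices.Grow(batch, 100)

    for i := 0; i < 100; i++ {
        batch = append(batch, i) // no further reallocations
    }
    fmt.Printf("len=%d cap=%d\n", len(batch), cap(batch))
}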
String Building Optimization
package main

import (
    "fmt"
    "strings"
    "time"
)

// run
// Bad: Many allocations
func badStringConcat(parts []string) string {
    result := ""
    for _, part := range parts {
        result += part // Each concat allocates new string
    }
    return result
}

// Good: strings.Builder
func goodStringConcat(parts []string) string {
    var b strings.Builder
    for _, part := range parts {
        b.WriteString(part)
    }
    return b.String()
}

// Best: Pre-sized strings.Builder
func bestStringConcat(parts []string) string {
    var b strings.Builder

    // Calculate total length
    totalLen := 0
    for _, part := range parts {
        totalLen += len(part)
    }

    // Pre-allocate
    b.Grow(totalLen)

    // Build string
    for _, part := range parts {
        b.WriteString(part)
    }

    return b.String()
}

func benchmarkStrings() {
    parts := make([]string, 100)
    for i := range parts {
        parts[i] = "part"
    }

    iterations := 1000

    // Bad
    start := time.Now()
    for i := 0; i < iterations; i++ {
        _ = badStringConcat(parts)
    }
    badTime := time.Since(start)

    // Good
    start = time.Now()
    for i := 0; i < iterations; i++ {
        _ = goodStringConcat(parts)
    }
    goodTime := time.Since(start)

    // Best
    start = time.Now()
    for i := 0; i < iterations; i++ {
        _ = bestStringConcat(parts)
    }
    bestTime := time.Since(start)

    fmt.Printf("Bad (concat):     %v\n", badTime)
    fmt.Printf("Good (builder):   %v (%.1fx faster)\n",
        goodTime, float64(badTime)/float64(goodTime))
    fmt.Printf("Best (pre-sized): %v (%.1fx faster)\n",
        bestTime, float64(badTime)/float64(bestTime))
}

func main() {
    parts := []string{"Hello", " ", "World", "!"}

    result1 := badStringConcat(parts)
    result2 := goodStringConcat(parts)
    result3 := bestStringConcat(parts)

    fmt.Printf("Bad: %s\n", result1)
    fmt.Printf("Good: %s\n", result2)
    fmt.Printf("Best: %s\n", result3)

    fmt.Println("\nRunning benchmarks...")
    benchmarkStrings()

    fmt.Println("\nApproximate allocations for 100 parts:")
    fmt.Println("Bad: ~100 allocations (one per concat)")
    fmt.Println("Good: ~3-5 allocations (builder grows)")
    fmt.Println("Best: 1 allocation (pre-sized)")
}
Avoiding Interface Allocations
package main

import "fmt"

// run
// Bad: Allocates on every call
func badPrint(value int) {
    fmt.Println(value) // value escapes to interface{}
}

// Better: Batch operations
func betterPrint(values []int) {
    for _, v := range values {
        fmt.Println(v)
    }
}

// Best: Use specific types when possible
func bestPrint(values []int) {
    for _, v := range values {
        // Use functions that don't require interface{}
        s := fmt.Sprintf("%d", v)
        fmt.Println(s)
    }
}

// Avoid reflection when possible
type Stringer interface {
    String() string
}

type User struct {
    Name string
    Age  int
}

// Implement String() to avoid reflection
func (u User) String() string {
    return fmt.Sprintf("User{Name: %s, Age: %d}", u.Name, u.Age)
}

func main() {
    values := []int{1, 2, 3, 4, 5}

    fmt.Println("Bad version:")
    for _, v := range values {
        badPrint(v)
    }

    fmt.Println("\nBetter version:")
    betterPrint(values)

    fmt.Println("\nBest version:")
    bestPrint(values)

    fmt.Println("\nUser with String():")
    user := User{Name: "Alice", Age: 30}
    fmt.Println(user) // Uses String() method
}
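Note that bestPrint above still goes through fmt.Sprintf, which boxes its arguments into interface{} values. For integer formatting in a genuinely hot path, strconv's Append functions write digits into a caller-supplied buffer and avoid that boxing entirely — a sketch:

package main

import (
    "os"
    "strconv"
)

func main() {
    values := []int{1, 2, 3, 4, 5}

    // One reusable buffer for all formatting: no interface{} boxing,
    // no intermediate string values.
    buf := make([]byte, 0, 64)
    for _, v := range values {
        buf = strconv.AppendInt(buf[:0], int64(v), 10)
        buf = append(buf, '\n')
        os.Stdout.Write(buf)
    }
}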
Slice and Map Reuse
package main

import (
    "fmt"
    "sync"
)

// run
// Reuse slices by reslicing
func reuseSlice() {
    buf := make([]byte, 1024)

    for i := 0; i < 5; i++ {
        // Reuse buf by reslicing
        data := buf[:0] // Clear but keep capacity

        // Fill data...
        msg := fmt.Sprintf("iteration %d data", i)
        data = append(data, []byte(msg)...)

        fmt.Printf("Iteration %d: %s (cap: %d)\n", i, string(data), cap(data))
    }
}

// Reuse maps by clearing
func reuseMap() {
    m := make(map[string]int, 100) // Pre-sized

    for i := 0; i < 5; i++ {
        // Clear map for reuse (Go 1.21+)
        clear(m)

        // Or manually:
        // for k := range m {
        //     delete(m, k)
        // }

        // Refill map...
        m["iteration"] = i
        m["value"] = i * 10

        fmt.Printf("Iteration %d: %v\n", i, m)
    }
}

// Pool maps for reuse
var mapPool = sync.Pool{
    New: func() interface{} {
        return make(map[string]int, 100)
    },
}

func useMapFromPool() {
    m := mapPool.Get().(map[string]int)
    defer func() {
        // Clear before returning
        clear(m)
        mapPool.Put(m)
    }()

    // Use m...
    m["key1"] = 42
    m["key2"] = 100
    fmt.Printf("Map from pool: %v\n", m)
}

func main() {
    fmt.Println("=== Reusing Slices ===")
    reuseSlice()

    fmt.Println("\n=== Reusing Maps ===")
    reuseMap()

    fmt.Println("\n=== Map Pool ===")
    for i := 0; i < 3; i++ {
        useMapFromPool()
    }
}
Memory Profiling with pprof
Enabling Memory Profiling
package main

import (
    "fmt"
    "os"
    "runtime"
    "runtime/pprof"
)

func allocateMemory() {
    // Simulate memory allocation
    for i := 0; i < 1000; i++ {
        _ = make([]byte, 1024*1024) // 1MB each
    }
}

func main() {
    // Create memory profile file
    f, err := os.Create("mem.prof")
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Do work that allocates memory
    allocateMemory()

    // Force GC to get accurate stats
    runtime.GC()

    // Write heap profile
    if err := pprof.WriteHeapProfile(f); err != nil {
        panic(err)
    }

    fmt.Println("Memory profile written to mem.prof")
    fmt.Println("\nAnalyze with:")
    fmt.Println("  go tool pprof mem.prof")
    fmt.Println("\nCommands in pprof:")
    fmt.Println("  top  - Show top memory consumers")
    fmt.Println("  list - Show annotated source")
    fmt.Println("  web  - Generate visualization")
}
HTTP Profiling Endpoint
package main

import (
    "fmt"
    "log"
    "net/http"
    _ "net/http/pprof" // Registers /debug/pprof/ handlers
    "time"
)

func simulateWork() {
    // Simulate some work with allocations
    for {
        data := make([]byte, 1024*1024) // 1MB
        _ = data
        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    // Start background work
    go simulateWork()

    // Start HTTP server with pprof
    fmt.Println("Profiling server running on :6060")
    fmt.Println("\nAccess profiles at:")
    fmt.Println("  http://localhost:6060/debug/pprof/")
    fmt.Println("  http://localhost:6060/debug/pprof/heap")
    fmt.Println("  http://localhost:6060/debug/pprof/allocs")
    fmt.Println("\nDownload and analyze:")
    fmt.Println("  go tool pprof http://localhost:6060/debug/pprof/heap")
    fmt.Println("  go tool pprof -alloc_objects http://localhost:6060/debug/pprof/allocs")

    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
Allocation Profiling
package main

import (
    "fmt"
    "runtime"
)

func measureAllocations(fn func()) {
    var m1, m2 runtime.MemStats

    // GC before measurement
    runtime.GC()
    runtime.ReadMemStats(&m1)

    // Run function
    fn()

    // Measure after
    runtime.ReadMemStats(&m2)

    // Calculate allocations
    allocations := m2.TotalAlloc - m1.TotalAlloc
    numAllocs := m2.Mallocs - m1.Mallocs

    fmt.Printf("Allocated: %d bytes in %d allocations\n",
        allocations, numAllocs)
    fmt.Printf("Average: %.2f bytes per allocation\n",
        float64(allocations)/float64(numAllocs))
}

func testFunction() {
    // Some allocation-heavy code
    for i := 0; i < 1000; i++ {
        _ = make([]int, 100)
    }
}

func optimizedFunction() {
    // Pre-allocate and reuse
    buf := make([]int, 100)
    for i := 0; i < 1000; i++ {
        _ = buf[:0] // Reuse buffer
    }
}

func main() {
    fmt.Println("=== Test Function (heavy allocations) ===")
    measureAllocations(testFunction)

    fmt.Println("\n=== Optimized Function (reuses buffer) ===")
    measureAllocations(optimizedFunction)
}
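For quick, assertion-style checks, the testing package can also count allocations directly: testing.AllocsPerRun calls a function repeatedly and returns the average number of allocations per call. A sketch:

package main

import (
    "fmt"
    "testing"
)

var sink []int // package-level sink forces the allocation to the heap

func main() {
    fresh := testing.AllocsPerRun(1000, func() {
        sink = make([]int, 100) // new allocation every call
    })
    fmt.Printf("allocations per call (fresh):  %.1f\n", fresh)

    buf := make([]int, 100)
    reused := testing.AllocsPerRun(1000, func() {
        sink = buf[:0] // reslices the existing buffer: no allocation
    })
    fmt.Printf("allocations per call (reused): %.1f\n", reused)
}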
GC Tuning Strategies
Understanding GC Behavior
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
    "time"
)

// run
func monitorGC() {
    // Get current GC stats
    var stats debug.GCStats
    debug.ReadGCStats(&stats)

    fmt.Printf("=== GC Statistics ===\n")
    fmt.Printf("Number of GCs: %d\n", stats.NumGC)
    fmt.Printf("Total pause time: %v\n", stats.PauseTotal)

    if len(stats.Pause) > 0 {
        fmt.Printf("Last GC pause: %v\n", stats.Pause[0])

        // Calculate average pause
        var total time.Duration
        for _, pause := range stats.Pause {
            total += pause
        }
        avg := total / time.Duration(len(stats.Pause))
        fmt.Printf("Average pause: %v\n", avg)
    }
}

func trackGCCycles() {
    var m runtime.MemStats

    fmt.Println("\n=== Tracking GC Cycles ===")
    for i := 0; i < 5; i++ {
        // Allocate memory
        _ = make([]byte, 10*1024*1024) // 10MB

        runtime.ReadMemStats(&m)
        lastPause := time.Duration(m.PauseNs[(m.NumGC+255)%256])

        fmt.Printf("Iteration %d:\n", i)
        fmt.Printf("  Heap: %d MB\n", m.HeapAlloc/1024/1024)
        fmt.Printf("  GCs: %d\n", m.NumGC)
        fmt.Printf("  Last Pause: %v\n", lastPause)

        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    monitorGC()
    trackGCCycles()

    // Final stats
    fmt.Println("\n=== Final Statistics ===")
    monitorGC()
}
GOGC Tuning
What's Happening: GOGC controls the GC's aggressiveness. It's a percentage: GOGC=100 means "trigger GC when heap grows 100% since last GC". Higher values = less frequent GC but more memory usage. Lower values = more frequent GC but lower memory footprint. This is a trade-off between CPU and memory overhead.
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
    "time"
)

func demonstrateGOGC() {
    // GOGC default is 100

    // SetGCPercent with a negative value disables GC and returns the previous
    // setting, so we immediately restore it to read the current value.
    currentGOGC := debug.SetGCPercent(-1)
    debug.SetGCPercent(currentGOGC) // Restore

    fmt.Printf("Current GOGC: %d%%\n", currentGOGC)

    // Test with different GOGC values
    testGOGC(50)  // Aggressive
    testGOGC(100) // Default
    testGOGC(200) // Conservative
}

func testGOGC(gogc int) {
    fmt.Printf("\n=== Testing GOGC=%d ===\n", gogc)

    oldGOGC := debug.SetGCPercent(gogc)
    defer debug.SetGCPercent(oldGOGC)

    var m1, m2 runtime.MemStats
    runtime.ReadMemStats(&m1)

    // Allocate memory
    start := time.Now()
    for i := 0; i < 1000; i++ {
        _ = make([]byte, 1024*1024) // 1MB each
    }
    elapsed := time.Since(start)

    runtime.ReadMemStats(&m2)

    fmt.Printf("Time: %v\n", elapsed)
    fmt.Printf("GC runs: %d\n", m2.NumGC-m1.NumGC)
    fmt.Printf("Heap peak: %d MB\n", m2.HeapAlloc/1024/1024)
}

func main() {
    demonstrateGOGC()
}

// Set GOGC via environment variable:
// GOGC=200 go run main.go

// GOGC values:
// - GOGC=100: GC when heap doubles (default)
// - GOGC=200: GC when heap triples
// - GOGC=50:  GC when heap grows 50%
// - GOGC=off: Disable GC (dangerous!)
Controlling GC Frequency
package main

import (
    "fmt"
    "runtime"
    "runtime/debug"
    "time"
)

func lowLatencyGC() {
    fmt.Println("=== Low Latency Configuration ===")

    // For low-latency applications: smaller target, more frequent GC
    debug.SetGCPercent(50) // GC more frequently

    // Increase GC parallelism if available
    runtime.GOMAXPROCS(runtime.NumCPU())

    fmt.Printf("GOGC: 50%% (more frequent GC)\n")
    fmt.Printf("GOMAXPROCS: %d\n", runtime.GOMAXPROCS(0))
}

func highThroughputGC() {
    fmt.Println("\n=== High Throughput Configuration ===")

    // For batch processing: larger target, less frequent GC
    debug.SetGCPercent(200) // GC less frequently

    fmt.Printf("GOGC: 200%% (less frequent GC)\n")

    // Optionally disable GC during critical sections
    fmt.Println("\nDisabling GC for critical section...")
    debug.SetGCPercent(-1) // Disable GC

    // Do intensive work...
    time.Sleep(100 * time.Millisecond)
    fmt.Println("Critical section complete")

    // Manually trigger GC when safe
    runtime.GC()
    fmt.Println("Manual GC triggered")

    // Re-enable automatic GC
    debug.SetGCPercent(100)
    fmt.Println("Automatic GC re-enabled")
}

func memoryConstrainedGC() {
    fmt.Println("\n=== Memory Constrained Configuration ===")

    // For memory-constrained environments
    debug.SetGCPercent(50)                  // More aggressive GC
    debug.SetMemoryLimit(512 * 1024 * 1024) // 512MB limit

    fmt.Printf("GOGC: 50%%\n")
    fmt.Printf("Memory limit: 512 MB\n")

    // Monitor and log memory usage
    go func() {
        ticker := time.NewTicker(2 * time.Second)
        defer ticker.Stop()

        for i := 0; i < 3; i++ {
            <-ticker.C
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            fmt.Printf("  [Monitor] Heap: %d MB, Sys: %d MB, NumGC: %d\n",
                m.HeapAlloc/1024/1024,
                m.Sys/1024/1024,
                m.NumGC)
        }
    }()

    time.Sleep(7 * time.Second)
}

func main() {
    lowLatencyGC()
    time.Sleep(1 * time.Second)

    highThroughputGC()
    time.Sleep(1 * time.Second)

    memoryConstrainedGC()
}
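Both knobs can also be set from the environment without touching code: GOGC controls the growth target, and GOMEMLIMIT (Go 1.19+) is the environment-variable form of debug.SetMemoryLimit. The binary names below are placeholders:

GOGC=200 GOMEMLIMIT=512MiB ./batch-job    # fewer GC cycles, soft-capped near 512 MiB
GOGC=50 ./latency-sensitive-service       # more frequent, smaller collections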
Zero-Allocation Techniques
Advanced techniques to eliminate heap allocations in hot paths.
String Building Without Allocations
Why This Works: In Go, strings are immutable and converting between []byte and string normally requires a copy. The unsafe conversions below bypass this copy by reinterpreting the memory pointer directly. This is extremely fast but dangerous—if the underlying byte slice is modified, the "immutable" string changes too. Use only when you control the lifecycle completely.
1package main
2
3import (
4 "fmt"
5 "strings"
6 "unsafe"
7)
8
9// UnsafeString converts byte slice to string without allocation
10// WARNING: Unsafe! Only use when you control the byte slice lifecycle
11func UnsafeString(b []byte) string {
12 return *(*string)(unsafe.Pointer(&b))
13}
14
15// UnsafeBytes converts string to byte slice without allocation
16// WARNING: Unsafe! Do not modify the returned slice
17func UnsafeBytes(s string) []byte {
18 return *(*[]byte)(unsafe.Pointer(
19 &struct {
20 string
21 Cap int
22 }{s, len(s)},
23 ))
24}
25
26// ZeroAllocConcat concatenates strings efficiently
27func ZeroAllocConcat(parts ...string) string {
28 // Calculate total length
29 n := 0
30 for _, p := range parts {
31 n += len(p)
32 }
33
34 // Single allocation for result
35 var b strings.Builder
36 b.Grow(n)
37
38 for _, p := range parts {
39 b.WriteString(p)
40 }
41
42 return b.String()
43}
44
45func main() {
46 // Safe zero-allocation concatenation
47 result := ZeroAllocConcat("Hello", " ", "World", "!")
48 fmt.Println("Concatenated:", result)
49
50 // Unsafe conversions (use with caution)
51 data := []byte("test data")
52 str := UnsafeString(data)
53 fmt.Println("Unsafe string:", str)
54
55 // WARNING: Modifying data will also modify str!
56 // data[0] = 'T' // This would change str too!
57}
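Since Go 1.20 the standard library also exposes these conversions as unsafe.String, unsafe.StringData, unsafe.Slice, and unsafe.SliceData, which avoid relying on the exact header layout but carry the same aliasing caveats. A minimal, self-contained sketch (the function names here are illustrative):
package main

import (
    "fmt"
    "unsafe"
)

// BytesToString reinterprets b as a string without copying (Go 1.20+).
// The caller must not modify b while the returned string is in use.
func BytesToString(b []byte) string {
    if len(b) == 0 {
        return ""
    }
    return unsafe.String(unsafe.SliceData(b), len(b))
}

// StringToBytes reinterprets s as a byte slice without copying (Go 1.20+).
// The returned slice must never be written to.
func StringToBytes(s string) []byte {
    if len(s) == 0 {
        return nil
    }
    return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
    data := []byte("test data")
    fmt.Println(BytesToString(data))         // zero-copy view of data
    fmt.Println(len(StringToBytes("hello"))) // 5, zero-copy view of the string
}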
Stack-Only Data Structures
1package main
2
3import "fmt"
4
5// run
6// StackArray uses array instead of slice to stay on stack
7type StackArray[T any] struct {
8 data [16]T
9 len int
10}
11
12func (s *StackArray[T]) Append(v T) bool {
13 if s.len >= len(s.data) {
14 return false // Full
15 }
16 s.data[s.len] = v
17 s.len++
18 return true
19}
20
21func (s *StackArray[T]) Get(i int) (T, bool) {
22 var zero T
23 if i < 0 || i >= s.len {
24 return zero, false
25 }
26 return s.data[i], true
27}
28
29func (s *StackArray[T]) Len() int {
30 return s.len
31}
32
33// Example: Processing without heap allocation
34func ProcessItems(items []int) int {
35 var result StackArray[int] // Stays on stack
36
37 for _, item := range items {
38 if item%2 == 0 {
39 result.Append(item * 2)
40 }
41 }
42
43 sum := 0
44 for i := 0; i < result.Len(); i++ {
45 if v, ok := result.Get(i); ok {
46 sum += v
47 }
48 }
49
50 return sum
51}
52
53func main() {
54 items := []int{1, 2, 3, 4, 5, 6, 7, 8}
55 sum := ProcessItems(items)
56 fmt.Printf("Sum of even items * 2: %d\n", sum)
57
58 // Demonstrate StackArray usage
59 var arr StackArray[string]
60 arr.Append("Hello")
61 arr.Append("World")
62 arr.Append("!")
63
64 fmt.Printf("StackArray length: %d\n", arr.Len())
65 for i := 0; i < arr.Len(); i++ {
66 if v, ok := arr.Get(i); ok {
67 fmt.Printf(" [%d]: %s\n", i, v)
68 }
69 }
70}
Interface-Free Code for Hot Paths
1package main
2
3import (
4 "fmt"
5 "time"
6)
7
8// run
9// Avoiding interface{} and type assertions in hot paths
10
11// Bad: Uses interface, causes allocations
12type BadProcessor struct{}
13
14func (p *BadProcessor) Process(data interface{}) interface{} {
15 // Passing an int through interface{} boxes it, which allocates
16 if v, ok := data.(int); ok {
17 return v * 2
18 }
19 return nil
20}
21
22// Good: Type-specific, zero allocations
23type GoodProcessor struct{}
24
25func (p *GoodProcessor) ProcessInt(data int) int {
26 return data * 2 // No allocations
27}
28
29func (p *GoodProcessor) ProcessString(data string) string {
30 return data + data // Single allocation for result
31}
32
33// Benchmark comparison
34func BenchmarkBadProcessor() time.Duration {
35 p := &BadProcessor{}
36 start := time.Now()
37
38 for i := 0; i < 100000; i++ {
39 p.Process(i)
40 }
41
42 return time.Since(start)
43}
44
45func BenchmarkGoodProcessor() time.Duration {
46 p := &GoodProcessor{}
47 start := time.Now()
48
49 for i := 0; i < 100000; i++ {
50 p.ProcessInt(i)
51 }
52
53 return time.Since(start)
54}
55
56func main() {
57 badTime := BenchmarkBadProcessor()
58 goodTime := BenchmarkGoodProcessor()
59
60 fmt.Printf("Bad (interface): %v\n", badTime)
61 fmt.Printf("Good (typed): %v\n", goodTime)
62 fmt.Printf("Speedup: %.2fx\n", float64(badTime)/float64(goodTime))
63
64 fmt.Println("\nKey lesson: Avoid interface{} in hot paths!")
65}
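Hand-rolled timers are fine for illustration, but the allocation difference is easier to quantify with the standard testing package, which reports allocations per operation when run with -benchmem. A standalone _test.go sketch (the small processor copies below just mirror the types above):
package processor

import "testing"

// Minimal copies of the two processors so this benchmark file stands
// alone; run with: go test -bench=. -benchmem
type badProcessor struct{}

func (badProcessor) Process(data interface{}) interface{} {
    if v, ok := data.(int); ok {
        return v * 2
    }
    return nil
}

type goodProcessor struct{}

func (goodProcessor) ProcessInt(data int) int { return data * 2 }

func BenchmarkInterfaceProcess(b *testing.B) {
    var p badProcessor
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = p.Process(i) // boxing the argument and result allocates
    }
}

func BenchmarkTypedProcess(b *testing.B) {
    var p goodProcessor
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        _ = p.ProcessInt(i) // stays allocation-free
    }
}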
Buffer Reuse Patterns
1package main
2
3import (
4 "bytes"
5 "fmt"
6 "sync"
7)
8
9// run
10// BufferPool for zero-allocation buffer reuse
11var bufferPool = sync.Pool{
12 New: func() interface{} {
13 return new(bytes.Buffer)
14 },
15}
16
17func GetBuffer() *bytes.Buffer {
18 return bufferPool.Get().(*bytes.Buffer)
19}
20
21func PutBuffer(buf *bytes.Buffer) {
22 buf.Reset()
23 bufferPool.Put(buf)
24}
25
26// Example: JSON encoding with a pooled, reused buffer (only the final copy allocates in steady state)
27func EncodeJSON(data map[string]string) []byte {
28 buf := GetBuffer()
29 defer PutBuffer(buf)
30
31 buf.WriteByte('{')
32 first := true
33
34 for k, v := range data {
35 if !first {
36 buf.WriteByte(',')
37 }
38 first = false
39
40 buf.WriteByte('"')
41 buf.WriteString(k)
42 buf.WriteString(`":"`)
43 buf.WriteString(v)
44 buf.WriteByte('"')
45 }
46
47 buf.WriteByte('}')
48
49 // Copy result before returning buffer to pool
50 result := make([]byte, buf.Len())
51 copy(result, buf.Bytes())
52
53 return result
54}
55
56func main() {
57 data := map[string]string{
58 "name": "John",
59 "email": "john@example.com",
60 "city": "New York",
61 }
62
63 // Encode multiple times to demonstrate pooling
64 for i := 0; i < 3; i++ {
65 json := EncodeJSON(data)
66 fmt.Printf("Encoding %d: %s\n", i+1, string(json))
67 }
68
69 fmt.Println("\nBuffers were reused from pool!")
70}
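The final copy in EncodeJSON can itself be avoided by following the standard library's append-style convention (as strconv.AppendInt does): the caller owns the destination buffer and the function returns the appended result. A sketch, with AppendJSON as an illustrative name and no JSON escaping:
package main

import "fmt"

// AppendJSON appends a JSON object for data to dst and returns the
// extended slice. Because the caller owns and reuses dst, steady-state
// encoding needs no new allocations once the buffer has grown enough.
// (Keys and values are assumed not to require JSON escaping.)
func AppendJSON(dst []byte, data map[string]string) []byte {
    dst = append(dst, '{')
    first := true
    for k, v := range data {
        if !first {
            dst = append(dst, ',')
        }
        first = false
        dst = append(dst, '"')
        dst = append(dst, k...)
        dst = append(dst, `":"`...)
        dst = append(dst, v...)
        dst = append(dst, '"')
    }
    return append(dst, '}')
}

func main() {
    data := map[string]string{"name": "John", "city": "New York"}
    buf := make([]byte, 0, 256) // allocated once, reused below

    for i := 0; i < 3; i++ {
        buf = AppendJSON(buf[:0], data)
        fmt.Printf("Encoding %d: %s\n", i+1, buf)
    }
}
This mirrors how the standard library's Append functions avoid forcing allocations on their callers.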
Slice Tricks to Avoid Allocations
1package main
2
3import "fmt"
4
5// run
6// InPlaceFilter filters slice without allocations
7func InPlaceFilter(data []int, predicate func(int) bool) []int {
8 n := 0
9 for _, x := range data {
10 if predicate(x) {
11 data[n] = x
12 n++
13 }
14 }
15 return data[:n]
16}
17
18// InPlaceUnique removes adjacent duplicates without allocations (input must be sorted)
19func InPlaceUnique(data []int) []int {
20 if len(data) == 0 {
21 return data
22 }
23
24 j := 0
25 for i := 1; i < len(data); i++ {
26 if data[i] != data[j] {
27 j++
28 data[j] = data[i]
29 }
30 }
31
32 return data[:j+1]
33}
34
35// ReverseInPlace reverses slice without allocations
36func ReverseInPlace(data []int) {
37 for i := 0; i < len(data)/2; i++ {
38 j := len(data) - 1 - i
39 data[i], data[j] = data[j], data[i]
40 }
41}
42
43func main() {
44 // Filter even numbers in place
45 numbers := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
46 fmt.Printf("Original: %v\n", numbers)
47
48 filtered := InPlaceFilter(numbers, func(n int) bool { return n%2 == 0 })
49 fmt.Printf("Filtered (even): %v\n", filtered)
50
51 // Remove duplicates
52 sorted := []int{1, 1, 2, 2, 3, 3, 4, 5, 5}
53 fmt.Printf("\nWith duplicates: %v\n", sorted)
54
55 unique := InPlaceUnique(sorted)
56 fmt.Printf("Unique: %v\n", unique)
57
58 // Reverse in place
59 data := []int{1, 2, 3, 4, 5}
60 fmt.Printf("\nOriginal: %v\n", data)
61
62 ReverseInPlace(data)
63 fmt.Printf("Reversed: %v\n", data)
64
65 fmt.Println("\nAll operations were zero-allocation!")
66}
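One caveat before applying these tricks to slices of pointers: elements beyond the new length still sit in the backing array and keep their targets alive, so the GC cannot reclaim them. Zeroing the tail releases them; a small sketch with a pointer element type:
package main

import "fmt"

type Item struct{ Name string }

// FilterItems filters a []*Item in place and nils out the unused tail so
// the dropped items become collectable as soon as the filter returns.
func FilterItems(items []*Item, keep func(*Item) bool) []*Item {
    n := 0
    for _, it := range items {
        if keep(it) {
            items[n] = it
            n++
        }
    }
    // Without this loop the backing array would keep every filtered-out
    // *Item reachable until the slots are overwritten.
    for i := n; i < len(items); i++ {
        items[i] = nil
    }
    return items[:n]
}

func main() {
    items := []*Item{{"a"}, {"bb"}, {"ccc"}}
    kept := FilterItems(items, func(it *Item) bool { return len(it.Name) > 1 })
    fmt.Printf("Kept %d items\n", len(kept)) // Kept 2 items
}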
Compile-Time String Operations
1package main
2
3import "fmt"
4
5// run
6// Using constants for zero-runtime-cost string operations
7const (
8 Prefix = "user:"
9 Suffix = ":data"
10
11 // Concatenated at compile time
12 UserDataKey = Prefix + "id" + Suffix
13
14 // More complex compile-time strings
15 APIVersion = "v1"
16 APIBasePath = "/api/" + APIVersion
17 UsersPath = APIBasePath + "/users"
18 PostsPath = APIBasePath + "/posts"
19)
20
21// Code generation for repeated patterns
22//go:generate stringer -type=Status
23
24type Status int
25
26const (
27 StatusPending Status = iota
28 StatusActive
29 StatusComplete
30 StatusFailed
31)
32
33func main() {
34 // No allocation - resolved at compile time
35 fmt.Println("User data key:", UserDataKey)
36 fmt.Println("Users path:", UsersPath)
37 fmt.Println("Posts path:", PostsPath)
38
39 // Demonstrate that these are compile-time constants
40 fmt.Printf("\nAll paths computed at compile time!\n")
41
42 // Status example
43 status := StatusActive
44 fmt.Printf("\nCurrent status: %v\n", status)
45}
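If running go generate is not convenient, a fixed lookup array gives an allocation-free String method by hand; a self-contained sketch mirroring the Status type above:
package main

import "fmt"

type Status int

const (
    StatusPending Status = iota
    StatusActive
    StatusComplete
    StatusFailed
)

// statusNames is a package-level array indexed by the Status value;
// returning one of its elements performs no allocation.
var statusNames = [...]string{"Pending", "Active", "Complete", "Failed"}

func (s Status) String() string {
    if s < 0 || int(s) >= len(statusNames) {
        return "Unknown"
    }
    return statusNames[s]
}

func main() {
    fmt.Println(StatusActive) // prints "Active" via the String method
}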
Summary
Memory optimization is like being a good housekeeper for your application's memory. The key is to be mindful of what you allocate, how long you keep it, and when you clean it up.
✅ Memory Optimization Best Practices:
- Use stack allocation whenever possible
- Pre-allocate slices and maps with known capacity
- Reuse buffers and objects with sync.Pool
- Avoid interface{} allocations in hot paths
- Profile memory usage to find optimization opportunities
- Design with garbage collection in mind
⚠️ Common Memory Mistakes:
- Creating unnecessary pointers
- Growing slices without pre-allocation
- String concatenation in loops
- Ignoring escape analysis warnings
- Forgetting to reset pooled objects before reuse
💡 Key Takeaway: The goal isn't to eliminate all allocations—it's to eliminate unnecessary allocations and manage the necessary ones efficiently. Focus on hot paths and large-scale operations where small optimizations compound.
Remember: an optimization that saves 1MB per operation avoids roughly 1GB of allocations every second at 1,000 operations per second.
Further Reading
Official Documentation
- Go Memory Model - Official memory model specification
- Garbage Collector Design - GC implementation details
- Escape Analysis - Official escape analysis documentation
Books and Articles
- The Go Programming Language - Memory management chapters
- High Performance Go - Memory optimization patterns
- Go Proverbs - Memory-related proverbs
Tools
- pprof - Memory profiling
- go-torch - Flame graph generation (deprecated; flame graphs are now built into go tool pprof)
- go tool trace - Memory allocation tracing
Practice Exercises
Exercise 1: Escape Analysis Optimization
Learning Objective: Master escape analysis to eliminate unnecessary heap allocations and improve performance by keeping data on the stack.
Context: In high-performance systems, heap allocations trigger garbage collection pauses that can cause latency spikes. Companies like Discord reduced memory usage by 90% by optimizing escape patterns, transforming user experience from buffering issues to smooth real-time communication.
Difficulty: Intermediate | Time: 15-20 minutes
Identify and fix escape analysis issues in the following code to eliminate heap allocations and reduce GC pressure:
1package main
2
3import "fmt"
4
5type User struct {
6 ID int
7 Name string
8}
9
10func createUser(id int, name string) *User {
11 return &User{ID: id, Name: name}
12}
13
14func processUsers(count int) {
15 for i := 0; i < count; i++ {
16 user := createUser(i, fmt.Sprintf("User%d", i))
17 fmt.Printf("Processing: %s\n", user.Name)
18 }
19}
20
21func main() {
22 processUsers(1000)
23}
Task: Optimize to reduce heap allocations and keep data on the stack where possible.
Solution
1package main
2
3import (
4 "fmt"
5 "strconv"
6)
7
8type User struct {
9 ID int
10 Name string
11}
12
13// Optimization 1: Return value instead of pointer
14func createUser(id int, name string) User {
15 return User{ID: id, Name: name}
16}
17
18// Optimization 2: Avoid fmt.Sprintf
19func formatUserName(id int) string {
20 return "User" + strconv.Itoa(id)
21}
22
23// Optimization 3: Avoid fmt.Printf in hot path
24func processUsers(count int) {
25 for i := 0; i < count; i++ {
26 name := formatUserName(i)
27 user := createUser(i, name)
28 // Plain concatenation avoids Printf's format parsing and interface boxing
29 output := "Processing: " + user.Name + "\n"
30 fmt.Print(output)
31 }
32}
33
34func main() {
35 processUsers(1000)
36}
37
38// Further optimization: Reuse buffer
39func processUsersOptimized(count int) {
40 var user User
41 nameBuffer := make([]byte, 0, 64)
42
43 for i := 0; i < count; i++ {
44 // Reuse user struct
45 user.ID = i
46 nameBuffer = nameBuffer[:0]
47 nameBuffer = append(nameBuffer, []byte("User")...)
48 nameBuffer = strconv.AppendInt(nameBuffer, int64(i), 10)
49 user.Name = string(nameBuffer) // this conversion still allocates once per user
50
51 // Process user...
52 _ = user
53 }
54}
Exercise 2: Implement Efficient Object Pool
Learning Objective: Design and implement high-performance object pooling systems to eliminate allocation overhead in memory-intensive applications.
Context: Object pooling is critical for high-throughput systems where frequent allocations cause GC pressure. Redis and other database systems use sophisticated pooling strategies to handle millions of operations per second without memory fragmentation or GC pauses.
Difficulty: Advanced | Time: 25-30 minutes
Create an efficient object pool for []byte buffers that eliminates allocation overhead in hot paths:
- Pre-allocates buffers of various sizes
- Returns the appropriately sized buffer for each request
- Tracks pool efficiency with hit rate metrics
- Handles concurrent access safely
- Implements automatic cleanup for unused buffers
Solution
1package main
2
3import (
4 "fmt"
5 "sync"
6 "sync/atomic"
7)
8
9type BufferPool struct {
10 pools []*sync.Pool
11 sizes []int
12 hits uint64
13 misses uint64
14}
15
16func NewBufferPool(sizes []int) *BufferPool {
17 bp := &BufferPool{
18 pools: make([]*sync.Pool, len(sizes)),
19 sizes: sizes,
20 }
21
22 for i, size := range sizes {
23 sz := size // Capture for closure
24 bp.pools[i] = &sync.Pool{
25 New: func() interface{} {
26 atomic.AddUint64(&bp.misses, 1)
27 return make([]byte, 0, sz)
28 },
29 }
30 }
31
32 return bp
33}
34
35func (bp *BufferPool) Get(size int) []byte {
36 // Find appropriate pool
37 for i, poolSize := range bp.sizes {
38 if size <= poolSize {
39 buf := bp.pools[i].Get().([]byte)
40 atomic.AddUint64(&bp.hits, 1)
41 return buf[:0] // Reset length but keep capacity
42 }
43 }
44
45 // Size too large, allocate directly
46 atomic.AddUint64(&bp.misses, 1)
47 return make([]byte, 0, size)
48}
49
50func (bp *BufferPool) Put(buf []byte) {
51 capacity := cap(buf)
52
53 // Find appropriate pool
54 for i, poolSize := range bp.sizes {
55 if capacity == poolSize {
56 bp.pools[i].Put(buf)
57 return
58 }
59 }
60
61 // Don't pool buffers that don't match our sizes
62}
63
64func (bp *BufferPool) Stats() (hits, misses uint64, hitRate float64) {
65 hits = atomic.LoadUint64(&bp.hits)
66 misses = atomic.LoadUint64(&bp.misses)
67 total := hits + misses
68 if total > 0 {
69 hitRate = float64(hits) / float64(total)
70 }
71 return
72}
73
74func main() {
75 // Create pool with different buffer sizes
76 pool := NewBufferPool([]int{1024, 4096, 16384})
77
78 // Simulate usage
79 for i := 0; i < 1000; i++ {
80 size := 1024
81 if i%3 == 0 {
82 size = 4096
83 }
84
85 buf := pool.Get(size)
86 // Use buffer...
87 buf = append(buf, []byte("data")...)
88 // Return to pool
89 pool.Put(buf)
90 }
91
92 // Print stats
93 hits, misses, hitRate := pool.Stats()
94 fmt.Printf("Hits: %d, Misses: %d, Hit Rate: %.2f%%\n",
95 hits, misses, hitRate*100)
96}
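One refinement worth knowing: putting a plain []byte into a sync.Pool boxes the three-word slice header into an interface{}, which allocates on every Put (staticcheck reports this as SA6002). Pooling a *[]byte avoids that extra allocation; a minimal sketch of the pattern:
package main

import (
    "fmt"
    "sync"
)

// bufPool stores *[]byte. A pointer fits directly in the interface value
// handed to Put, so returning a buffer to the pool does not allocate.
var bufPool = sync.Pool{
    New: func() interface{} {
        b := make([]byte, 0, 4096)
        return &b
    },
}

// process borrows a buffer, uses it, and returns it to the pool.
func process(payload string) int {
    bp := bufPool.Get().(*[]byte)
    buf := (*bp)[:0] // reuse capacity, reset length

    buf = append(buf, payload...)
    n := len(buf)

    *bp = buf // keep any capacity growth for the next borrower
    bufPool.Put(bp)
    return n
}

func main() {
    fmt.Println(process("hello"))       // 5
    fmt.Println(process("hello world")) // 11, usually served by the same buffer
}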
Exercise 3: Memory-Efficient String Processing
Learning Objective: Master streaming data processing techniques to handle large datasets efficiently without loading entire files into memory.
Context: Processing large files efficiently is crucial for log analysis, data processing pipelines, and ETL systems. Companies like Netflix process terabytes of log data daily using memory-efficient streaming techniques that allow processing files larger than available RAM.
Difficulty: Intermediate | Time: 20-25 minutes
Process a large file line by line without loading the entire file into memory while counting word frequencies efficiently:
- Read files using buffered streaming to avoid loading entire content
- Implement memory-efficient word frequency counting
- Reuse buffers and minimize string allocations
- Handle files larger than available RAM gracefully
- Track memory usage and processing performance
Solution
1package main
2
3import (
4 "bufio"
5 "bytes"
6 "fmt"
7 "os"
8 "strings"
9 "sync"
10)
11
12type WordCounter struct {
13 counts map[string]int
14 pool *sync.Pool
15}
16
17func NewWordCounter() *WordCounter {
18 return &WordCounter{
19 counts: make(map[string]int, 10000), // Pre-size
20 pool: &sync.Pool{
21 New: func() interface{} {
22 return make([]string, 0, 100)
23 },
24 },
25 }
26}
27
28func (wc *WordCounter) processLine(line string) {
29 // Get word buffer from pool
30 words := wc.pool.Get().([]string)
31 words = words[:0] // Clear but keep capacity
32
33 // Split line into words
34 start := 0
35 for i, r := range line {
36 if r == ' ' || r == '\t' || r == '\n' {
37 if i > start {
38 word := strings.ToLower(line[start:i])
39 words = append(words, word)
40 }
41 start = i + 1
42 }
43 }
44 // Last word
45 if start < len(line) {
46 word := strings.ToLower(line[start:])
47 words = append(words, word)
48 }
49
50 // Count words
51 for _, word := range words {
52 wc.counts[word]++
53 }
54
55 // Return buffer to pool
56 wc.pool.Put(words)
57}
58
59func processFile(filename string) error {
60 file, err := os.Open(filename)
61 if err != nil {
62 return err
63 }
64 defer file.Close()
65
66 counter := NewWordCounter()
67
68 // Use buffered scanner for line-by-line reading
69 scanner := bufio.NewScanner(file)
70
71 // Increase buffer size for long lines
72 buf := make([]byte, 0, 1024*1024) // 1MB buffer
73 scanner.Buffer(buf, 10*1024*1024) // 10MB max
74
75 lineCount := 0
76 for scanner.Scan() {
77 line := scanner.Text()
78 counter.processLine(line)
79 lineCount++
80
81 // Progress indicator
82 if lineCount%100000 == 0 {
83 fmt.Printf("Processed %d lines\n", lineCount)
84 }
85 }
86
87 if err := scanner.Err(); err != nil {
88 return err
89 }
90
91 // Print summary statistics
92 fmt.Printf("\nProcessed %d lines total\n", lineCount)
93 fmt.Printf("Unique words: %d\n", len(counter.counts))
94
95 return nil
96}
97
98func main() {
99 // Create test file
100 testFile := "test_large_file.txt"
101 createTestFile(testFile)
102 defer os.Remove(testFile)
103
104 if err := processFile(testFile); err != nil {
105 fmt.Fprintf(os.Stderr, "Error: %v\n", err)
106 os.Exit(1)
107 }
108}
109
110func createTestFile(filename string) {
111 f, _ := os.Create(filename)
112 defer f.Close()
113
114 writer := bufio.NewWriter(f)
115 for i := 0; i < 10000; i++ {
116 fmt.Fprintf(writer, "This is line %d with some test words and data\n", i)
117 }
118 writer.Flush()
119}
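A further refinement for this exercise: scanner.Text() allocates a new string for every line, while scanner.Bytes() returns a view into the scanner's internal buffer that is valid only until the next Scan call. Working on the byte slice removes one allocation per line; a small sketch of the reading loop, with countWords standing in for the word-frequency logic:
package main

import (
    "bufio"
    "bytes"
    "fmt"
    "strings"
)

// countWords stands in for per-line processing that works on bytes.
func countWords(line []byte, counts map[string]int) {
    for _, w := range bytes.Fields(line) {
        // The string conversion here is the remaining per-word allocation.
        counts[strings.ToLower(string(w))]++
    }
}

func main() {
    input := "This is a test\nthis is only a test\n"
    counts := make(map[string]int)

    scanner := bufio.NewScanner(strings.NewReader(input))
    for scanner.Scan() {
        countWords(scanner.Bytes(), counts) // no per-line string allocation
    }

    fmt.Println(counts["test"]) // 2
}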
Exercise 4: GC-Friendly Data Structure
Learning Objective: Design memory-efficient data structures that minimize garbage collection pressure through careful memory layout and reuse patterns.
Context: High-frequency trading systems and real-time data processing pipelines require data structures that don't trigger frequent garbage collections. Trading firms like Citadel use custom ring buffers to process millions of market data updates per second with microsecond latency.
Difficulty: Advanced | Time: 30-35 minutes
Design a ring buffer that minimizes GC pressure by reusing memory and avoiding pointer overhead in performance-critical applications:
- Implement a fixed-size ring buffer with pre-allocated memory
- Use byte arrays instead of pointers to eliminate heap allocations
- Provide thread-safe operations for concurrent access
- Support efficient bulk operations to minimize per-item overhead
- Include metrics to track allocation patterns and GC impact
Solution
1package main
2
3import (
4 "fmt"
5 "sync"
6)
7
8// GC-friendly ring buffer with no pointers in elements
9type RingBuffer struct {
10 buffer []byte // Single backing array
11 stride int // Size of each element
12 head int // Write position
13 tail int // Read position
14 count int // Number of elements
15 cap int // Capacity
16 mu sync.Mutex
17}
18
19func NewRingBuffer(capacity, elementSize int) *RingBuffer {
20 return &RingBuffer{
21 buffer: make([]byte, capacity*elementSize),
22 stride: elementSize,
23 cap: capacity,
24 }
25}
26
27func (rb *RingBuffer) Write(data []byte) bool {
28 rb.mu.Lock()
29 defer rb.mu.Unlock()
30
31 if rb.count >= rb.cap {
32 return false // Buffer full
33 }
34
35 if len(data) != rb.stride {
36 return false // Invalid size
37 }
38
39 // Copy data into buffer
40 offset := rb.head * rb.stride
41 copy(rb.buffer[offset:offset+rb.stride], data)
42
43 rb.head = (rb.head + 1) % rb.cap
44 rb.count++
45
46 return true
47}
48
49func (rb *RingBuffer) Read(data []byte) bool {
50 rb.mu.Lock()
51 defer rb.mu.Unlock()
52
53 if rb.count == 0 {
54 return false // Buffer empty
55 }
56
57 if len(data) != rb.stride {
58 return false // Invalid size
59 }
60
61 // Copy data from buffer
62 offset := rb.tail * rb.stride
63 copy(data, rb.buffer[offset:offset+rb.stride])
64
65 rb.tail = (rb.tail + 1) % rb.cap
66 rb.count--
67
68 return true
69}
70
71func (rb *RingBuffer) Len() int {
72 rb.mu.Lock()
73 defer rb.mu.Unlock()
74 return rb.count
75}
76
77func main() {
78 // Create ring buffer for 1000 64-byte elements
79 rb := NewRingBuffer(1000, 64)
80
81 // Write data
82 data := make([]byte, 64)
83 for i := 0; i < 100; i++ {
84 copy(data, fmt.Sprintf("Message %d", i))
85 if !rb.Write(data) {
86 fmt.Println("Buffer full!")
87 break
88 }
89 }
90
91 fmt.Printf("Written 100 messages, buffer length: %d\n", rb.Len())
92
93 // Read data
94 readBuf := make([]byte, 64)
95 count := 0
96 for rb.Len() > 0 && count < 5 {
97 if rb.Read(readBuf) {
98 fmt.Printf("Read: %s\n", string(readBuf[:20]))
99 count++
100 }
101 }
102
103 fmt.Printf("\nRemaining in buffer: %d\n", rb.Len())
104}
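A common micro-optimization for ring buffers like this is to require a power-of-two capacity so the modulo wrap becomes a bitwise AND, which is cheaper in very hot paths. A sketch of just the index arithmetic, assuming capacity is validated at construction time:
package main

import "fmt"

// maskedRing shows only the index arithmetic; locking and the data copy
// would be the same as in the RingBuffer above.
type maskedRing struct {
    mask int // capacity - 1, valid only for power-of-two capacities
    head int
    tail int
}

func newMaskedRing(capacity int) (*maskedRing, error) {
    if capacity <= 0 || capacity&(capacity-1) != 0 {
        return nil, fmt.Errorf("capacity %d is not a power of two", capacity)
    }
    return &maskedRing{mask: capacity - 1}, nil
}

func (r *maskedRing) advanceHead() { r.head = (r.head + 1) & r.mask }
func (r *maskedRing) advanceTail() { r.tail = (r.tail + 1) & r.mask }

func main() {
    r, err := newMaskedRing(1024)
    if err != nil {
        panic(err)
    }
    for i := 0; i < 1500; i++ {
        r.advanceHead()
    }
    fmt.Println(r.head) // 1500 & 1023 = 476
}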
Exercise 5: Memory-Aware Cache
Learning Objective: Build intelligent caching systems that automatically manage memory usage through pressure-aware eviction strategies.
Context: Memory-aware caching is essential for microservices and distributed systems running in resource-constrained environments. Cloud platforms like AWS Lambda have strict memory limits, and efficient cache management can be the difference between successful function execution and out-of-memory errors.
Difficulty: Advanced | Time: 25-30 minutes
Implement a sophisticated cache that automatically evicts entries when memory usage exceeds configurable thresholds:
- Monitor memory usage in real-time using runtime stats
- Implement multiple eviction strategies (LRU, size-based)
- Provide memory pressure detection and automatic cleanup
- Support configurable memory limits with safety margins
- Include metrics for cache hit rates and memory efficiency
- Handle concurrent access safely with minimal locking overhead
Solution
1package main
2
3import (
4 "fmt"
5 "runtime"
6 "sync"
7 "time"
8)
9
10type CacheEntry struct {
11 key string
12 value []byte
13 size int
14 timestamp time.Time
15}
16
17type MemoryAwareCache struct {
18 entries map[string]*CacheEntry
19 maxBytes int
20 currentBytes int
21 mu sync.RWMutex
22 evictionCount int
23 hitCount int
24 missCount int
25}
26
27func NewMemoryAwareCache(maxBytes int) *MemoryAwareCache {
28 cache := &MemoryAwareCache{
29 entries: make(map[string]*CacheEntry),
30 maxBytes: maxBytes,
31 }
32
33 // Start background memory monitor
34 go cache.monitorMemory()
35
36 return cache
37}
38
39func (c *MemoryAwareCache) Set(key string, value []byte) {
40 c.mu.Lock()
41 defer c.mu.Unlock()
42
43 size := len(key) + len(value)
44
45 // Remove old entry if exists
46 if old, exists := c.entries[key]; exists {
47 c.currentBytes -= old.size
48 }
49
50 // Evict entries if needed
51 for c.currentBytes+size > c.maxBytes && len(c.entries) > 0 {
52 c.evictOldest()
53 }
54
55 // Add new entry
56 c.entries[key] = &CacheEntry{
57 key: key,
58 value: value,
59 size: size,
60 timestamp: time.Now(),
61 }
62 c.currentBytes += size
63}
64
65func (c *MemoryAwareCache) Get(key string) ([]byte, bool) {
66 c.mu.Lock() // Full lock: Get mutates the timestamp and hit/miss counters
67 defer c.mu.Unlock()
68
69 if entry, exists := c.entries[key]; exists {
70 // Update timestamp for LRU
71 entry.timestamp = time.Now()
72 c.hitCount++
73 return entry.value, true
74 }
75
76 c.missCount++
77 return nil, false
78}
79
80func (c *MemoryAwareCache) evictOldest() {
81 var oldest *CacheEntry
82 for _, entry := range c.entries {
83 if oldest == nil || entry.timestamp.Before(oldest.timestamp) {
84 oldest = entry
85 }
86 }
87
88 if oldest != nil {
89 delete(c.entries, oldest.key)
90 c.currentBytes -= oldest.size
91 c.evictionCount++
92 }
93}
94
95func (c *MemoryAwareCache) monitorMemory() {
96 ticker := time.NewTicker(5 * time.Second)
97 defer ticker.Stop()
98
99 for range ticker.C {
100 var m runtime.MemStats
101 runtime.ReadMemStats(&m)
102
103 c.mu.RLock()
104 usage := c.currentBytes
105 count := len(c.entries)
106 evictions := c.evictionCount
107 hits := c.hitCount
108 misses := c.missCount
109 c.mu.RUnlock()
110
111 hitRate := 0.0
112 if hits+misses > 0 {
113 hitRate = float64(hits) / float64(hits+misses) * 100
114 }
115
116 fmt.Printf("[Cache] entries=%d, bytes=%d, evictions=%d, hit-rate=%.1f%%, heap=%dMB\n",
117 count, usage, evictions, hitRate, m.HeapAlloc/1024/1024)
118 }
119}
120
121func main() {
122 // Create cache with 1MB limit
123 cache := NewMemoryAwareCache(1 * 1024 * 1024)
124
125 // Add entries
126 fmt.Println("Adding entries to cache...")
127 for i := 0; i < 200; i++ {
128 key := fmt.Sprintf("key%d", i)
129 value := make([]byte, 10*1024) // 10KB each
130 cache.Set(key, value)
131 }
132
133 fmt.Println("\nRetrieving entries...")
134 // Retrieve some entries
135 for i := 0; i < 50; i++ {
136 key := fmt.Sprintf("key%d", i)
137 if value, ok := cache.Get(key); ok {
138 fmt.Printf("Found %s: %d bytes\n", key, len(value))
139 } else {
140 fmt.Printf("Not found: %s (evicted)\n", key)
141 }
142 }
143
144 // Keep running to see monitoring output
145 fmt.Println("\nMonitoring cache (will run for 20 seconds)...")
146 time.Sleep(20 * time.Second)
147}
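Note that Get above takes the full write lock because it mutates the entry timestamp and the hit/miss counters. If read-path contention ever matters, the counters can be switched to atomics so only the timestamp update needs exclusive access; a small sketch of the counter portion:
package main

import (
    "fmt"
    "sync/atomic"
)

// cacheStats tracks hits and misses with atomics, so a read path can
// update them without holding a write lock.
type cacheStats struct {
    hits   atomic.Int64
    misses atomic.Int64
}

func (s *cacheStats) recordHit()  { s.hits.Add(1) }
func (s *cacheStats) recordMiss() { s.misses.Add(1) }

func (s *cacheStats) hitRate() float64 {
    h, m := s.hits.Load(), s.misses.Load()
    if h+m == 0 {
        return 0
    }
    return float64(h) / float64(h+m)
}

func main() {
    var stats cacheStats
    for i := 0; i < 10; i++ {
        if i%4 == 0 {
            stats.recordMiss()
        } else {
            stats.recordHit()
        }
    }
    fmt.Printf("hit rate: %.0f%%\n", stats.hitRate()*100) // hit rate: 70%
}
Timestamps could be handled the same way with an atomic Unix-nano value if approximate LRU ordering is acceptable.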