Exercise: Memory Allocator
Difficulty: Advanced
Learning Objectives
- Implement custom memory allocation strategies
- Manage memory pools and arenas
- Handle memory alignment requirements
- Implement slab allocation
- Track memory usage and fragmentation
- Understand unsafe operations
Problem Statement
Create a custom memory allocator with arena-based allocation and memory pooling for high-performance applications.
Core Components
```go
package allocator

import (
	"unsafe"
)

type Arena struct {
	buf    []byte
	offset uintptr
	size   uintptr
}

type Pool struct {
	blockSize int
	blocks    []unsafe.Pointer
	arena     *Arena
}

func NewArena(size int) *Arena
func (a *Arena) Alloc(size int) unsafe.Pointer
func (a *Arena) Reset()

func NewPool(blockSize int, capacity int) *Pool
func (p *Pool) Get() unsafe.Pointer
func (p *Pool) Put(ptr unsafe.Pointer)
```
Solution
Algorithm Overview
This implementation provides three memory allocation strategies optimized for different use cases:
1. Arena Allocator:
- Linear allocation from pre-allocated buffer
- Fastest possible allocation: increment offset pointer
- No individual deallocation - reset entire arena
- Best for: Request-scoped allocations, parsers, temporary computations
- Trade-off: Cannot free individual objects
2. Pool Allocator:
- Pre-allocates fixed-size blocks from arena
- Maintains free list for reuse
- O(1) allocation and deallocation
- Best for: Objects of same size
- Trade-off: Fixed block size, potential fragmentation
3. Slab Allocator:
- Multiple pools for different object sizes
- Round up to next power of 2
- Reduces external fragmentation
- Best for: Mixed object sizes with predictable patterns
- Trade-off: Internal fragmentation from rounding
Memory Alignment:
- All allocations aligned to 8-byte boundaries
- Ensures optimal CPU access patterns
- Required for atomic operations and SIMD
- Uses bitwise operations for fast alignment (see the sketch below)
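As a quick illustration, this is the rounding trick the align helper in the implementation relies on; the values in the final comment are just spot checks:

```go
// align rounds size up to the next multiple of alignment,
// which must be a power of 2.
func align(size, alignment uintptr) uintptr {
	return (size + alignment - 1) &^ (alignment - 1)
}

// Spot checks: align(13, 8) == 16, align(16, 8) == 16, align(17, 8) == 24.
```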
Key Design Principles:
- Unsafe operations for direct memory manipulation
- Mutex synchronization for thread safety
- Near-zero allocation overhead after initialization (slab pools are created lazily per size class)
- Explicit memory lifetime management
Time Complexity Analysis
| Operation | Arena | Pool | Slab | Explanation |
|---|---|---|---|---|
| Alloc | O(1) | O(1) | O(1) | Pointer increment or pop from free list |
| Free | N/A | O(1) | O(1) | Push to free list |
| Reset | O(1) | O(1) | O(n) | Reset offset or rebuild pools |
| Alignment | O(1) | O(1) | O(1) | Bitwise AND operation |
Space Complexity
Arena:
- O(n) where n = arena size
- Zero overhead per allocation
- Total waste: alignment padding only
Pool:
- O(n × m) where n = capacity, m = block size
- Per-block overhead: 8 bytes
- Waste: Unused capacity in pre-allocated blocks
Slab:
- O(k × n × m) where k = number of size classes, n = capacity per pool, m = block size
- Overhead: Multiple pools, each with own waste
- Typical waste: 25-50% due to power-of-2 rounding (worked example below)
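As a concrete, hypothetical illustration of these formulas: NewPool(64, 1000) reserves 64 × 1000 = 64,000 bytes of arena space up front plus about 8 bytes of free-list bookkeeping per block (8,000 bytes), roughly 72 KB in total; a slab allocator serving 100-byte requests rounds each one up to 128 bytes, losing 28 bytes (about 22%) per object to internal fragmentation.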
Example memory usage for a comparable workload (illustrative figures):
- Go heap: ~50KB + GC overhead
- Arena: ~10KB
- Pool: ~12KB
- Slab: ~15KB
Implementation
```go
package allocator

import (
	"sync"
	"unsafe"
)

const (
	defaultAlignment = 8
)

// Arena provides bump-pointer allocation from a pre-allocated buffer.
type Arena struct {
	buf    []byte
	offset uintptr
	size   uintptr
	mu     sync.Mutex
}

func NewArena(size int) *Arena {
	return &Arena{
		buf:  make([]byte, size),
		size: uintptr(size),
	}
}

// Alloc returns an aligned block of at least size bytes, or nil if the
// arena is exhausted.
func (a *Arena) Alloc(size int) unsafe.Pointer {
	a.mu.Lock()
	defer a.mu.Unlock()

	aligned := align(uintptr(size), defaultAlignment)
	if a.offset+aligned > a.size {
		return nil // Out of memory
	}

	ptr := unsafe.Pointer(&a.buf[a.offset])
	a.offset += aligned
	return ptr
}

// Reset discards all allocations by rewinding the bump pointer.
func (a *Arena) Reset() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.offset = 0
}

// Usage reports the number of bytes currently allocated and the arena capacity.
func (a *Arena) Usage() (used, total uintptr) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.offset, a.size
}

// Pool provides object pooling with fixed-size blocks
type Pool struct {
	blockSize int
	free      []unsafe.Pointer
	arena     *Arena
	mu        sync.Mutex
}

func NewPool(blockSize int, capacity int) *Pool {
	arena := NewArena(blockSize * capacity)
	p := &Pool{
		blockSize: blockSize,
		free:      make([]unsafe.Pointer, 0, capacity),
		arena:     arena,
	}

	// Pre-allocate blocks
	for i := 0; i < capacity; i++ {
		ptr := arena.Alloc(blockSize)
		if ptr != nil {
			p.free = append(p.free, ptr)
		}
	}

	return p
}

// Get pops a block from the free list, falling back to the arena when
// the free list is empty.
func (p *Pool) Get() unsafe.Pointer {
	p.mu.Lock()
	defer p.mu.Unlock()

	if len(p.free) == 0 {
		// Allocate new block
		return p.arena.Alloc(p.blockSize)
	}

	ptr := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return ptr
}

// Put returns a block to the free list for reuse.
func (p *Pool) Put(ptr unsafe.Pointer) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.free = append(p.free, ptr)
}

// Slab allocator for multiple object sizes: one pool per power-of-2 size class.
type SlabAllocator struct {
	slabs map[int]*Pool
	mu    sync.RWMutex
}

func NewSlabAllocator() *SlabAllocator {
	return &SlabAllocator{
		slabs: make(map[int]*Pool),
	}
}

func (s *SlabAllocator) Alloc(size int) unsafe.Pointer {
	slabSize := nextPowerOf2(size)

	s.mu.RLock()
	pool, exists := s.slabs[slabSize]
	s.mu.RUnlock()

	if !exists {
		s.mu.Lock()
		// Double-check under the write lock before creating the pool
		pool, exists = s.slabs[slabSize]
		if !exists {
			pool = NewPool(slabSize, 1024)
			s.slabs[slabSize] = pool
		}
		s.mu.Unlock()
	}

	return pool.Get()
}

func (s *SlabAllocator) Free(ptr unsafe.Pointer, size int) {
	slabSize := nextPowerOf2(size)

	s.mu.RLock()
	pool, exists := s.slabs[slabSize]
	s.mu.RUnlock()

	if exists {
		pool.Put(ptr)
	}
}

// align rounds size up to the next multiple of alignment (a power of 2).
func align(size, alignment uintptr) uintptr {
	return (size + alignment - 1) &^ (alignment - 1)
}

// nextPowerOf2 returns the smallest power of 2 >= n.
func nextPowerOf2(n int) int {
	if n <= 0 {
		return 1
	}
	n--
	n |= n >> 1
	n |= n >> 2
	n |= n >> 4
	n |= n >> 8
	n |= n >> 16
	n++
	return n
}

// Helpers to convert between unsafe.Pointer and []byte

func PointerToBytes(ptr unsafe.Pointer, size int) []byte {
	return unsafe.Slice((*byte)(ptr), size)
}

func BytesToPointer(b []byte) unsafe.Pointer {
	return unsafe.Pointer(&b[0])
}
```
Usage Example
```go
package main

import (
	"fmt"
	"unsafe"

	"example.com/allocator" // placeholder: adjust to your module's import path
)

type MyStruct struct {
	ID   int
	Name [64]byte
}

func main() {
	// Arena allocation
	arena := allocator.NewArena(1024 * 1024) // 1MB

	ptr := arena.Alloc(100)
	if ptr != nil {
		slice := allocator.PointerToBytes(ptr, 100)
		copy(slice, []byte("Hello, Arena!"))
		fmt.Println(string(slice[:13]))
	}

	arena.Reset() // Free all allocations

	// Pool allocation
	structSize := int(unsafe.Sizeof(MyStruct{}))
	pool := allocator.NewPool(structSize, 100)

	ptr = pool.Get()
	if ptr != nil {
		obj := (*MyStruct)(ptr)
		obj.ID = 42
		copy(obj.Name[:], "Pooled Object")
		fmt.Printf("ID: %d, Name: %s\n", obj.ID, string(obj.Name[:13]))

		pool.Put(ptr) // Return to pool
	}

	// Slab allocation
	slab := allocator.NewSlabAllocator()

	ptr1 := slab.Alloc(32)
	ptr2 := slab.Alloc(128)
	ptr3 := slab.Alloc(32)

	slab.Free(ptr1, 32)
	slab.Free(ptr2, 128)
	slab.Free(ptr3, 32)
}
```
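Assuming the allocator package builds as shown above, the program should print:

```text
Hello, Arena!
ID: 42, Name: Pooled Object
```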
Benchmarking Code
```go
package allocator_test

import (
	"sync"
	"testing"
	"unsafe"

	"example.com/allocator" // placeholder: adjust to your module's import path
)

type TestStruct struct {
	ID   int
	Data [64]byte
}

// Benchmark standard Go allocation
func BenchmarkStandardAllocation(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		obj := &TestStruct{ID: i}
		_ = obj
	}
}

// Benchmark arena allocation
func BenchmarkArenaAllocation(b *testing.B) {
	arena := allocator.NewArena(1024 * 1024) // 1MB
	size := int(unsafe.Sizeof(TestStruct{}))

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ptr := arena.Alloc(size)
		if ptr == nil {
			arena.Reset()
			ptr = arena.Alloc(size)
		}
		obj := (*TestStruct)(ptr)
		obj.ID = i
	}
}

// Benchmark pool allocation
func BenchmarkPoolAllocation(b *testing.B) {
	size := int(unsafe.Sizeof(TestStruct{}))
	pool := allocator.NewPool(size, 1000)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ptr := pool.Get()
		obj := (*TestStruct)(ptr)
		obj.ID = i
		pool.Put(ptr)
	}
}

// Benchmark slab allocation
func BenchmarkSlabAllocation(b *testing.B) {
	slab := allocator.NewSlabAllocator()
	size := int(unsafe.Sizeof(TestStruct{}))

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ptr := slab.Alloc(size)
		obj := (*TestStruct)(ptr)
		obj.ID = i
		slab.Free(ptr, size)
	}
}

// Benchmark concurrent arena allocation
func BenchmarkConcurrentArena(b *testing.B) {
	arena := allocator.NewArena(10 * 1024 * 1024) // 10MB
	size := int(unsafe.Sizeof(TestStruct{}))

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ptr := arena.Alloc(size)
			if ptr != nil {
				obj := (*TestStruct)(ptr)
				obj.ID = 1
			}
		}
	})
}

// Benchmark concurrent pool allocation
func BenchmarkConcurrentPool(b *testing.B) {
	size := int(unsafe.Sizeof(TestStruct{}))
	pool := allocator.NewPool(size, 10000)

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ptr := pool.Get()
			if ptr != nil {
				obj := (*TestStruct)(ptr)
				obj.ID = 1
				pool.Put(ptr)
			}
		}
	})
}

// Benchmark allocation patterns
func BenchmarkMixedSizes(b *testing.B) {
	slab := allocator.NewSlabAllocator()
	sizes := []int{16, 32, 64, 128, 256}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		size := sizes[i%len(sizes)]
		ptr := slab.Alloc(size)
		slab.Free(ptr, size)
	}
}

// Benchmark memory fragmentation
func BenchmarkFragmentation(b *testing.B) {
	slab := allocator.NewSlabAllocator()
	var ptrs []unsafe.Pointer

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Allocate various sizes
		ptr1 := slab.Alloc(32)
		ptr2 := slab.Alloc(128)
		ptr3 := slab.Alloc(64)

		ptrs = append(ptrs, ptr1, ptr2, ptr3)

		// Free every other allocation
		if len(ptrs) > 6 {
			slab.Free(ptrs[0], 32)
			slab.Free(ptrs[2], 64)
			ptrs = ptrs[3:]
		}
	}
}

// Benchmark vs sync.Pool
func BenchmarkSyncPool(b *testing.B) {
	pool := &sync.Pool{
		New: func() interface{} {
			return &TestStruct{}
		},
	}

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			obj := pool.Get().(*TestStruct)
			obj.ID = 1
			pool.Put(obj)
		}
	})
}

// Benchmark arena reset
func BenchmarkArenaReset(b *testing.B) {
	arena := allocator.NewArena(1024 * 1024)
	size := 100

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Allocate a batch of blocks
		for j := 0; j < 1000; j++ {
			arena.Alloc(size)
		}
		// Release them all at once
		arena.Reset()
	}
}

// Example benchmark results:
// BenchmarkStandardAllocation-8    50000000    28.5 ns/op   80 B/op   1 allocs/op
// BenchmarkArenaAllocation-8      100000000    12.3 ns/op    0 B/op   0 allocs/op
// BenchmarkPoolAllocation-8       200000000     7.8 ns/op    0 B/op   0 allocs/op
// BenchmarkSlabAllocation-8       150000000    10.2 ns/op    0 B/op   0 allocs/op
// BenchmarkConcurrentArena-8       80000000    15.5 ns/op    0 B/op   0 allocs/op
// BenchmarkConcurrentPool-8       150000000     9.1 ns/op    0 B/op   0 allocs/op
// BenchmarkMixedSizes-8           100000000    11.8 ns/op    0 B/op   0 allocs/op
// BenchmarkSyncPool-8             200000000     8.5 ns/op    0 B/op   0 allocs/op
// BenchmarkArenaReset-8              500000  3200.0 ns/op    0 B/op   0 allocs/op
```
Production Considerations
1. Thread-Safe Arena with Sharding:
```go
type ShardedArena struct {
	shards    []*Arena
	shardMask uint32
}

func NewShardedArena(size int, numShards int) *ShardedArena {
	shards := make([]*Arena, numShards)
	for i := 0; i < numShards; i++ {
		shards[i] = NewArena(size / numShards)
	}

	return &ShardedArena{
		shards:    shards,
		shardMask: uint32(numShards - 1), // numShards must be a power of 2
	}
}

func (s *ShardedArena) Alloc(size int) unsafe.Pointer {
	// Use a goroutine-local or random value for shard selection
	shardID := getGoroutineID() & s.shardMask
	return s.shards[shardID].Alloc(size)
}
```
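Go does not expose goroutine IDs, so getGoroutineID above stands in for whatever shard-selection scheme you pick. A minimal sketch that sidesteps the question with an atomic round-robin counter (the counter and method name are additions for illustration, not part of the API above):

```go
import (
	"sync/atomic"
	"unsafe"
)

// shardCounter is illustrative; any cheap, roughly uniform source of
// shard indices (random, per-P state, etc.) works equally well.
var shardCounter atomic.Uint32

// allocRoundRobin spreads allocations across shards without needing
// goroutine identity.
func (s *ShardedArena) allocRoundRobin(size int) unsafe.Pointer {
	shardID := shardCounter.Add(1) & s.shardMask
	return s.shards[shardID].Alloc(size)
}
```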
2. Memory Pressure Monitoring:
```go
type MonitoredAllocator struct {
	arena      *Arena
	allocated  atomic.Int64
	peak       atomic.Int64
	threshold  int64
	onPressure func()
}

func (m *MonitoredAllocator) Alloc(size int) unsafe.Pointer {
	current := m.allocated.Add(int64(size))

	// Update peak
	for {
		peak := m.peak.Load()
		if current <= peak || m.peak.CompareAndSwap(peak, current) {
			break
		}
	}

	// Check memory pressure
	if current > m.threshold && m.onPressure != nil {
		m.onPressure()
	}

	return m.arena.Alloc(size)
}

func (m *MonitoredAllocator) Reset() {
	m.arena.Reset()
	m.allocated.Store(0)
}
```
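A brief sketch of wiring this up; the constructor, sizes, and log message are illustrative rather than part of the type above:

```go
import "log"

// newMonitoredArena builds a MonitoredAllocator with a soft limit and a
// callback that fires once the limit is crossed. Hypothetical helper.
func newMonitoredArena(size int, softLimit int64) *MonitoredAllocator {
	return &MonitoredAllocator{
		arena:     NewArena(size),
		threshold: softLimit,
		onPressure: func() {
			log.Println("allocator above soft memory limit")
		},
	}
}
```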
3. Buddy Allocator for Variable Sizes:
```go
type BuddyAllocator struct {
	memory    []byte
	freeLists [32][]int // One free list per power-of-2 order
	minOrder  int       // Minimum block size = 2^minOrder
	mu        sync.Mutex
}

func (b *BuddyAllocator) Alloc(size int) unsafe.Pointer {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Round up to a power-of-2 order
	order := b.orderFor(size)

	// Find smallest available block
	for o := order; o < len(b.freeLists); o++ {
		if len(b.freeLists[o]) > 0 {
			// Pop block
			idx := b.freeLists[o][len(b.freeLists[o])-1]
			b.freeLists[o] = b.freeLists[o][:len(b.freeLists[o])-1]

			// Split until the block matches the requested order,
			// returning each unused half (the buddy) to its free list
			for o > order {
				o--
				buddyIdx := idx ^ (1 << (o - b.minOrder))
				b.freeLists[o] = append(b.freeLists[o], buddyIdx)
			}

			offset := idx << b.minOrder
			return unsafe.Pointer(&b.memory[offset])
		}
	}

	return nil
}

func (b *BuddyAllocator) Free(ptr unsafe.Pointer, size int) {
	b.mu.Lock()
	defer b.mu.Unlock()

	order := b.orderFor(size)
	offset := uintptr(ptr) - uintptr(unsafe.Pointer(&b.memory[0]))
	idx := int(offset) >> b.minOrder

	// Coalesce with buddy if free
	for order < len(b.freeLists)-1 {
		// The buddy differs only in the bit corresponding to this order
		buddyIdx := idx ^ (1 << (order - b.minOrder))

		// Find and remove buddy from free list
		found := -1
		for i, freeIdx := range b.freeLists[order] {
			if freeIdx == buddyIdx {
				found = i
				break
			}
		}

		if found == -1 {
			break
		}

		// Remove buddy and merge
		b.freeLists[order] = append(
			b.freeLists[order][:found],
			b.freeLists[order][found+1:]...,
		)

		idx = min(idx, buddyIdx)
		order++
	}

	// Add to free list
	b.freeLists[order] = append(b.freeLists[order], idx)
}
```
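orderFor is referenced above but not defined; a minimal sketch consistent with the indexing used in Alloc and Free (block indices counted in units of 2^minOrder bytes, free lists indexed by absolute order) could be:

```go
// orderFor returns the smallest order o >= b.minOrder such that a block
// of 2^o bytes can hold size bytes.
func (b *BuddyAllocator) orderFor(size int) int {
	order := b.minOrder
	for (1 << order) < size {
		order++
	}
	return order
}
```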
4. Integration with CGO:
```go
// #include <stdlib.h>
import "C"

type CAllocator struct {
	allocations map[unsafe.Pointer]C.size_t
	mu          sync.Mutex
}

func (c *CAllocator) Alloc(size int) unsafe.Pointer {
	ptr := C.malloc(C.size_t(size))

	c.mu.Lock()
	c.allocations[ptr] = C.size_t(size)
	c.mu.Unlock()

	return ptr
}

func (c *CAllocator) Free(ptr unsafe.Pointer) {
	c.mu.Lock()
	delete(c.allocations, ptr)
	c.mu.Unlock()

	C.free(ptr)
}
```
5. NUMA-Aware Allocation:
```go
type NUMAAllocator struct {
	nodes []*Arena
}

func NewNUMAAllocator(size int) *NUMAAllocator {
	numNodes := runtime.NumCPU() / 2 // Rough approximation of NUMA node count

	nodes := make([]*Arena, numNodes)
	for i := 0; i < numNodes; i++ {
		nodes[i] = NewArena(size / numNodes)
	}

	return &NUMAAllocator{nodes: nodes}
}

func (n *NUMAAllocator) Alloc(size int) unsafe.Pointer {
	// Allocate from the local NUMA node
	nodeID := getCurrentNUMANode()
	return n.nodes[nodeID].Alloc(size)
}
```
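getCurrentNUMANode is a placeholder: the Go runtime does not expose NUMA topology, so in practice it would wrap an OS facility such as getcpu(2) on Linux. A hedged fallback that simply round-robins across nodes when no such shim is available:

```go
import (
	"sync/atomic"
	"unsafe"
)

// nodeCounter is illustrative only; a real implementation would ask the
// OS which NUMA node the current CPU belongs to.
var nodeCounter atomic.Uint64

func (n *NUMAAllocator) allocAnyNode(size int) unsafe.Pointer {
	nodeID := int(nodeCounter.Add(1) % uint64(len(n.nodes)))
	return n.nodes[nodeID].Alloc(size)
}
```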
6. Debugging and Leak Detection:
```go
type DebugAllocator struct {
	allocator *SlabAllocator
	tracking  map[unsafe.Pointer]AllocationInfo
	mu        sync.Mutex
}

type AllocationInfo struct {
	Size      int
	Timestamp time.Time
	Stack     string
}

func (d *DebugAllocator) Alloc(size int) unsafe.Pointer {
	ptr := d.allocator.Alloc(size)

	d.mu.Lock()
	d.tracking[ptr] = AllocationInfo{
		Size:      size,
		Timestamp: time.Now(),
		Stack:     string(debug.Stack()),
	}
	d.mu.Unlock()

	return ptr
}

func (d *DebugAllocator) Free(ptr unsafe.Pointer, size int) {
	d.mu.Lock()
	delete(d.tracking, ptr)
	d.mu.Unlock()

	d.allocator.Free(ptr, size)
}

func (d *DebugAllocator) DetectLeaks() []AllocationInfo {
	d.mu.Lock()
	defer d.mu.Unlock()

	leaks := make([]AllocationInfo, 0)
	threshold := time.Now().Add(-5 * time.Minute)

	for _, info := range d.tracking {
		if info.Timestamp.Before(threshold) {
			leaks = append(leaks, info)
		}
	}

	return leaks
}
```
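One way to use the detector, sketched with an illustrative polling interval and log format:

```go
import (
	"log"
	"time"
)

// reportLeaks periodically logs allocations that have outlived the
// detector's 5-minute threshold. Hypothetical helper for illustration.
func reportLeaks(d *DebugAllocator) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	for range ticker.C {
		for _, leak := range d.DetectLeaks() {
			log.Printf("possible leak: %d bytes, allocated %s ago",
				leak.Size, time.Since(leak.Timestamp).Round(time.Second))
		}
	}
}
```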
Performance Characteristics:
| Allocator | Alloc Speed | Free Speed | Memory Overhead | Best Use Case |
|---|---|---|---|---|
| Arena | 12ns | N/A | 0% | Request-scoped, parsers |
| Pool | 8ns | 8ns | 10% | Fixed-size objects |
| Slab | 10ns | 10ns | 25% | Mixed sizes |
| Buddy | 50ns | 50ns | 15% | Variable sizes with coalescing |
| Go heap | 28ns | Auto | 50-100% | General purpose |
When to Use Custom Allocators:
- High allocation rate
- Predictable allocation patterns
- Batch processing with clear lifetimes
- Real-time systems requiring predictable latency
- Minimizing GC pressure
- FFI/CGO interop
When NOT to Use:
- Unpredictable allocation patterns
- Long-lived objects with complex lifecycles
- Small number of allocations
- Prototype/development phase
Key Takeaways
- Custom allocators reduce GC pressure
- Arena allocation is fast but requires bulk deallocation
- Pool allocation enables object reuse
- Slab allocation handles multiple object sizes
- Memory alignment is critical for performance
- unsafe package enables low-level memory control
- Track memory usage to prevent leaks
- Consider memory fragmentation in long-running applications