Exercise: Memory Allocator
Difficulty: Advanced
Learning Objectives
- Implement custom memory allocation strategies
- Manage memory pools and arenas
- Handle memory alignment requirements
- Implement slab allocation
- Track memory usage and fragmentation
- Understand unsafe operations
Problem Statement
Create a custom memory allocator with arena-based allocation and memory pooling for high-performance applications.
Core Components
```go
package allocator

import (
	"unsafe"
)

type Arena struct {
	buf    []byte
	offset uintptr
	size   uintptr
}

type Pool struct {
	blockSize int
	blocks    []unsafe.Pointer
	arena     *Arena
}

func NewArena(size int) *Arena
func (a *Arena) Alloc(size int) unsafe.Pointer
func (a *Arena) Reset()

func NewPool(blockSize int, capacity int) *Pool
func (p *Pool) Get() unsafe.Pointer
func (p *Pool) Put(ptr unsafe.Pointer)
```
Solution
Algorithm Overview
This implementation provides three memory allocation strategies optimized for different use cases:
1. Arena Allocator:
- Linear allocation from pre-allocated buffer
- Fastest possible allocation: increment offset pointer
- No individual deallocation - reset entire arena
- Best for: Request-scoped allocations, parsers, temporary computations
- Trade-off: Cannot free individual objects
2. Pool Allocator:
- Pre-allocates fixed-size blocks from arena
- Maintains free list for reuse
- O(1) allocation and deallocation
- Best for: Objects of same size
- Trade-off: Fixed block size, potential fragmentation
3. Slab Allocator:
- Multiple pools for different object sizes
- Round up to next power of 2
- Reduces external fragmentation
- Best for: Mixed object sizes with predictable patterns
- Trade-off: Internal fragmentation from rounding
Memory Alignment:
- All allocations aligned to 8-byte boundaries
- Ensures optimal CPU access patterns
- Required for atomic operations and SIMD
- Uses bitwise operations for fast alignment (see the sketch below)
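As a quick illustration, this is the rounding trick the align helper in the implementation relies on; the values in the final comment are just spot checks:

```go
// align rounds size up to the next multiple of alignment,
// which must be a power of 2.
func align(size, alignment uintptr) uintptr {
	return (size + alignment - 1) &^ (alignment - 1)
}

// Spot checks: align(13, 8) == 16, align(16, 8) == 16, align(17, 8) == 24.
```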
Key Design Principles:
- Unsafe operations for direct memory manipulation
- Mutex synchronization for thread safety
- Near-zero allocation overhead after initialization (slab pools are created lazily per size class)
- Explicit memory lifetime management
Time Complexity Analysis
| Operation | Arena | Pool | Slab | Explanation |
|---|---|---|---|---|
| Alloc | O(1) | O(1) | O(1) | Pointer increment or pop from free list |
| Free | N/A | O(1) | O(1) | Push to free list |
| Reset | O(1) | O(1) | O(n) | Reset offset or rebuild pools |
| Alignment | O(1) | O(1) | O(1) | Bitwise AND operation |
Space Complexity
Arena:
- O(n) where n = arena size
- Zero overhead per allocation
- Total waste: alignment padding only
Pool:
- O(n × m) where n = capacity, m = block size
- Per-block overhead: 8 bytes
- Waste: Unused capacity in pre-allocated blocks
Slab:
- O(k × n × m) where k = number of size classes, n = capacity per pool, m = block size
- Overhead: Multiple pools, each with own waste
- Typical waste: 25-50% due to power-of-2 rounding (worked example below)
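As a concrete, hypothetical illustration of these formulas: NewPool(64, 1000) reserves 64 × 1000 = 64,000 bytes of arena space up front plus about 8 bytes of free-list bookkeeping per block (8,000 bytes), roughly 72 KB in total; a slab allocator serving 100-byte requests rounds each one up to 128 bytes, losing 28 bytes (about 22%) per object to internal fragmentation.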
Example memory usage for a comparable workload (illustrative figures):
- Go heap: ~50KB + GC overhead
- Arena: ~10KB
- Pool: ~12KB
- Slab: ~15KB
Implementation
```go
package allocator

import (
	"sync"
	"unsafe"
)

const (
	defaultAlignment = 8
)

// Arena provides bump-pointer allocation from a pre-allocated buffer.
type Arena struct {
	buf    []byte
	offset uintptr
	size   uintptr
	mu     sync.Mutex
}

func NewArena(size int) *Arena {
	return &Arena{
		buf:  make([]byte, size),
		size: uintptr(size),
	}
}

// Alloc returns an aligned block of at least size bytes, or nil if the
// arena is exhausted.
func (a *Arena) Alloc(size int) unsafe.Pointer {
	a.mu.Lock()
	defer a.mu.Unlock()

	aligned := align(uintptr(size), defaultAlignment)
	if a.offset+aligned > a.size {
		return nil // Out of memory
	}

	ptr := unsafe.Pointer(&a.buf[a.offset])
	a.offset += aligned
	return ptr
}

// Reset discards all allocations by rewinding the bump pointer.
func (a *Arena) Reset() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.offset = 0
}

// Usage reports the number of bytes currently allocated and the arena capacity.
func (a *Arena) Usage() (used, total uintptr) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.offset, a.size
}

// Pool provides object pooling with fixed-size blocks
type Pool struct {
	blockSize int
	free      []unsafe.Pointer
	arena     *Arena
	mu        sync.Mutex
}

func NewPool(blockSize int, capacity int) *Pool {
	arena := NewArena(blockSize * capacity)
	p := &Pool{
		blockSize: blockSize,
		free:      make([]unsafe.Pointer, 0, capacity),
		arena:     arena,
	}

	// Pre-allocate blocks
	for i := 0; i < capacity; i++ {
		ptr := arena.Alloc(blockSize)
		if ptr != nil {
			p.free = append(p.free, ptr)
		}
	}

	return p
}

// Get pops a block from the free list, falling back to the arena when
// the free list is empty.
func (p *Pool) Get() unsafe.Pointer {
	p.mu.Lock()
	defer p.mu.Unlock()

	if len(p.free) == 0 {
		// Allocate new block
		return p.arena.Alloc(p.blockSize)
	}

	ptr := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return ptr
}

// Put returns a block to the free list for reuse.
func (p *Pool) Put(ptr unsafe.Pointer) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.free = append(p.free, ptr)
}

// Slab allocator for multiple object sizes: one pool per power-of-2 size class.
type SlabAllocator struct {
	slabs map[int]*Pool
	mu    sync.RWMutex
}

func NewSlabAllocator() *SlabAllocator {
	return &SlabAllocator{
		slabs: make(map[int]*Pool),
	}
}

func (s *SlabAllocator) Alloc(size int) unsafe.Pointer {
	slabSize := nextPowerOf2(size)

	s.mu.RLock()
	pool, exists := s.slabs[slabSize]
	s.mu.RUnlock()

	if !exists {
		s.mu.Lock()
		// Double-check under the write lock before creating the pool
		pool, exists = s.slabs[slabSize]
		if !exists {
			pool = NewPool(slabSize, 1024)
			s.slabs[slabSize] = pool
		}
		s.mu.Unlock()
	}

	return pool.Get()
}

func (s *SlabAllocator) Free(ptr unsafe.Pointer, size int) {
	slabSize := nextPowerOf2(size)

	s.mu.RLock()
	pool, exists := s.slabs[slabSize]
	s.mu.RUnlock()

	if exists {
		pool.Put(ptr)
	}
}

// align rounds size up to the next multiple of alignment (a power of 2).
func align(size, alignment uintptr) uintptr {
	return (size + alignment - 1) &^ (alignment - 1)
}

// nextPowerOf2 returns the smallest power of 2 >= n.
func nextPowerOf2(n int) int {
	if n <= 0 {
		return 1
	}
	n--
	n |= n >> 1
	n |= n >> 2
	n |= n >> 4
	n |= n >> 8
	n |= n >> 16
	n++
	return n
}

// Helpers to convert between unsafe.Pointer and []byte

func PointerToBytes(ptr unsafe.Pointer, size int) []byte {
	return unsafe.Slice((*byte)(ptr), size)
}

func BytesToPointer(b []byte) unsafe.Pointer {
	return unsafe.Pointer(&b[0])
}
```
Usage Example
```go
package main

import (
	"fmt"
	"unsafe"

	"example.com/allocator" // placeholder: adjust to your module's import path
)

type MyStruct struct {
	ID   int
	Name [64]byte
}

func main() {
	// Arena allocation
	arena := allocator.NewArena(1024 * 1024) // 1MB

	ptr := arena.Alloc(100)
	if ptr != nil {
		slice := allocator.PointerToBytes(ptr, 100)
		copy(slice, []byte("Hello, Arena!"))
		fmt.Println(string(slice[:13]))
	}

	arena.Reset() // Free all allocations

	// Pool allocation
	structSize := int(unsafe.Sizeof(MyStruct{}))
	pool := allocator.NewPool(structSize, 100)

	ptr = pool.Get()
	if ptr != nil {
		obj := (*MyStruct)(ptr)
		obj.ID = 42
		copy(obj.Name[:], "Pooled Object")
		fmt.Printf("ID: %d, Name: %s\n", obj.ID, string(obj.Name[:13]))

		pool.Put(ptr) // Return to pool
	}

	// Slab allocation
	slab := allocator.NewSlabAllocator()

	ptr1 := slab.Alloc(32)
	ptr2 := slab.Alloc(128)
	ptr3 := slab.Alloc(32)

	slab.Free(ptr1, 32)
	slab.Free(ptr2, 128)
	slab.Free(ptr3, 32)
}
```
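Assuming the allocator package builds as shown above, the program should print:

```text
Hello, Arena!
ID: 42, Name: Pooled Object
```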
Benchmarking Code
```go
package allocator_test

import (
	"sync"
	"testing"
	"unsafe"

	"example.com/allocator" // placeholder: adjust to your module's import path
)

type TestStruct struct {
	ID   int
	Data [64]byte
}

// Benchmark standard Go allocation
func BenchmarkStandardAllocation(b *testing.B) {
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		obj := &TestStruct{ID: i}
		_ = obj
	}
}

// Benchmark arena allocation
func BenchmarkArenaAllocation(b *testing.B) {
	arena := allocator.NewArena(1024 * 1024) // 1MB
	size := int(unsafe.Sizeof(TestStruct{}))

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ptr := arena.Alloc(size)
		if ptr == nil {
			arena.Reset()
			ptr = arena.Alloc(size)
		}
		obj := (*TestStruct)(ptr)
		obj.ID = i
	}
}

// Benchmark pool allocation
func BenchmarkPoolAllocation(b *testing.B) {
	size := int(unsafe.Sizeof(TestStruct{}))
	pool := allocator.NewPool(size, 1000)

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ptr := pool.Get()
		obj := (*TestStruct)(ptr)
		obj.ID = i
		pool.Put(ptr)
	}
}

// Benchmark slab allocation
func BenchmarkSlabAllocation(b *testing.B) {
	slab := allocator.NewSlabAllocator()
	size := int(unsafe.Sizeof(TestStruct{}))

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ptr := slab.Alloc(size)
		obj := (*TestStruct)(ptr)
		obj.ID = i
		slab.Free(ptr, size)
	}
}

// Benchmark concurrent arena allocation
func BenchmarkConcurrentArena(b *testing.B) {
	arena := allocator.NewArena(10 * 1024 * 1024) // 10MB
	size := int(unsafe.Sizeof(TestStruct{}))

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ptr := arena.Alloc(size)
			if ptr != nil {
				obj := (*TestStruct)(ptr)
				obj.ID = 1
			}
		}
	})
}

// Benchmark concurrent pool allocation
func BenchmarkConcurrentPool(b *testing.B) {
	size := int(unsafe.Sizeof(TestStruct{}))
	pool := allocator.NewPool(size, 10000)

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			ptr := pool.Get()
			if ptr != nil {
				obj := (*TestStruct)(ptr)
				obj.ID = 1
				pool.Put(ptr)
			}
		}
	})
}

// Benchmark allocation patterns
func BenchmarkMixedSizes(b *testing.B) {
	slab := allocator.NewSlabAllocator()
	sizes := []int{16, 32, 64, 128, 256}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		size := sizes[i%len(sizes)]
		ptr := slab.Alloc(size)
		slab.Free(ptr, size)
	}
}

// Benchmark memory fragmentation
func BenchmarkFragmentation(b *testing.B) {
	slab := allocator.NewSlabAllocator()
	var ptrs []unsafe.Pointer

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Allocate various sizes
		ptr1 := slab.Alloc(32)
		ptr2 := slab.Alloc(128)
		ptr3 := slab.Alloc(64)

		ptrs = append(ptrs, ptr1, ptr2, ptr3)

		// Free every other allocation
		if len(ptrs) > 6 {
			slab.Free(ptrs[0], 32)
			slab.Free(ptrs[2], 64)
			ptrs = ptrs[3:]
		}
	}
}

// Benchmark vs sync.Pool
func BenchmarkSyncPool(b *testing.B) {
	pool := &sync.Pool{
		New: func() interface{} {
			return &TestStruct{}
		},
	}

	b.ResetTimer()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			obj := pool.Get().(*TestStruct)
			obj.ID = 1
			pool.Put(obj)
		}
	})
}

// Benchmark arena reset
func BenchmarkArenaReset(b *testing.B) {
	arena := allocator.NewArena(1024 * 1024)
	size := 100

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// Allocate a batch of blocks
		for j := 0; j < 1000; j++ {
			arena.Alloc(size)
		}
		// Release them all at once
		arena.Reset()
	}
}

// Example benchmark results:
// BenchmarkStandardAllocation-8    50000000    28.5 ns/op   80 B/op   1 allocs/op
// BenchmarkArenaAllocation-8      100000000    12.3 ns/op    0 B/op   0 allocs/op
// BenchmarkPoolAllocation-8       200000000     7.8 ns/op    0 B/op   0 allocs/op
// BenchmarkSlabAllocation-8       150000000    10.2 ns/op    0 B/op   0 allocs/op
// BenchmarkConcurrentArena-8       80000000    15.5 ns/op    0 B/op   0 allocs/op
// BenchmarkConcurrentPool-8       150000000     9.1 ns/op    0 B/op   0 allocs/op
// BenchmarkMixedSizes-8           100000000    11.8 ns/op    0 B/op   0 allocs/op
// BenchmarkSyncPool-8             200000000     8.5 ns/op    0 B/op   0 allocs/op
// BenchmarkArenaReset-8              500000  3200.0 ns/op    0 B/op   0 allocs/op
```
Production Considerations
1. Thread-Safe Arena with Sharding:
```go
type ShardedArena struct {
	shards    []*Arena
	shardMask uint32
}

func NewShardedArena(size int, numShards int) *ShardedArena {
	shards := make([]*Arena, numShards)
	for i := 0; i < numShards; i++ {
		shards[i] = NewArena(size / numShards)
	}

	return &ShardedArena{
		shards:    shards,
		shardMask: uint32(numShards - 1), // numShards must be a power of 2
	}
}

func (s *ShardedArena) Alloc(size int) unsafe.Pointer {
	// Use a goroutine-local or random value for shard selection
	shardID := getGoroutineID() & s.shardMask
	return s.shards[shardID].Alloc(size)
}
```
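Go does not expose goroutine IDs, so getGoroutineID above stands in for whatever shard-selection scheme you pick. A minimal sketch that sidesteps the question with an atomic round-robin counter (the counter and method name are additions for illustration, not part of the API above):

```go
import (
	"sync/atomic"
	"unsafe"
)

// shardCounter is illustrative; any cheap, roughly uniform source of
// shard indices (random, per-P state, etc.) works equally well.
var shardCounter atomic.Uint32

// allocRoundRobin spreads allocations across shards without needing
// goroutine identity.
func (s *ShardedArena) allocRoundRobin(size int) unsafe.Pointer {
	shardID := shardCounter.Add(1) & s.shardMask
	return s.shards[shardID].Alloc(size)
}
```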
2. Memory Pressure Monitoring:
```go
type MonitoredAllocator struct {
	arena      *Arena
	allocated  atomic.Int64
	peak       atomic.Int64
	threshold  int64
	onPressure func()
}

func (m *MonitoredAllocator) Alloc(size int) unsafe.Pointer {
	current := m.allocated.Add(int64(size))

	// Update peak
	for {
		peak := m.peak.Load()
		if current <= peak || m.peak.CompareAndSwap(peak, current) {
			break
		}
	}

	// Check memory pressure
	if current > m.threshold && m.onPressure != nil {
		m.onPressure()
	}

	return m.arena.Alloc(size)
}

func (m *MonitoredAllocator) Reset() {
	m.arena.Reset()
	m.allocated.Store(0)
}
```
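A brief sketch of wiring this up; the constructor, sizes, and log message are illustrative rather than part of the type above:

```go
import "log"

// newMonitoredArena builds a MonitoredAllocator with a soft limit and a
// callback that fires once the limit is crossed. Hypothetical helper.
func newMonitoredArena(size int, softLimit int64) *MonitoredAllocator {
	return &MonitoredAllocator{
		arena:     NewArena(size),
		threshold: softLimit,
		onPressure: func() {
			log.Println("allocator above soft memory limit")
		},
	}
}
```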
3. Buddy Allocator for Variable Sizes:
```go
type BuddyAllocator struct {
	memory    []byte
	freeLists [32][]int // One free list per power-of-2 order
	minOrder  int       // Minimum block size = 2^minOrder
	mu        sync.Mutex
}

func (b *BuddyAllocator) Alloc(size int) unsafe.Pointer {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Round up to a power-of-2 order
	order := b.orderFor(size)

	// Find smallest available block
	for o := order; o < len(b.freeLists); o++ {
		if len(b.freeLists[o]) > 0 {
			// Pop block
			idx := b.freeLists[o][len(b.freeLists[o])-1]
			b.freeLists[o] = b.freeLists[o][:len(b.freeLists[o])-1]

			// Split until the block matches the requested order,
			// returning each unused half (the buddy) to its free list
			for o > order {
				o--
				buddyIdx := idx ^ (1 << (o - b.minOrder))
				b.freeLists[o] = append(b.freeLists[o], buddyIdx)
			}

			offset := idx << b.minOrder
			return unsafe.Pointer(&b.memory[offset])
		}
	}

	return nil
}

func (b *BuddyAllocator) Free(ptr unsafe.Pointer, size int) {
	b.mu.Lock()
	defer b.mu.Unlock()

	order := b.orderFor(size)
	offset := uintptr(ptr) - uintptr(unsafe.Pointer(&b.memory[0]))
	idx := int(offset) >> b.minOrder

	// Coalesce with buddy if free
	for order < len(b.freeLists)-1 {
		// The buddy differs only in the bit corresponding to this order
		buddyIdx := idx ^ (1 << (order - b.minOrder))

		// Find and remove buddy from free list
		found := -1
		for i, freeIdx := range b.freeLists[order] {
			if freeIdx == buddyIdx {
				found = i
				break
			}
		}

		if found == -1 {
			break
		}

		// Remove buddy and merge
		b.freeLists[order] = append(
			b.freeLists[order][:found],
			b.freeLists[order][found+1:]...,
		)

		idx = min(idx, buddyIdx)
		order++
	}

	// Add to free list
	b.freeLists[order] = append(b.freeLists[order], idx)
}
```
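orderFor is referenced above but not defined; a minimal sketch consistent with the indexing used in Alloc and Free (block indices counted in units of 2^minOrder bytes, free lists indexed by absolute order) could be:

```go
// orderFor returns the smallest order o >= b.minOrder such that a block
// of 2^o bytes can hold size bytes.
func (b *BuddyAllocator) orderFor(size int) int {
	order := b.minOrder
	for (1 << order) < size {
		order++
	}
	return order
}
```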
4. Integration with CGO:
```go
// #include <stdlib.h>
import "C"

type CAllocator struct {
	allocations map[unsafe.Pointer]C.size_t
	mu          sync.Mutex
}

func (c *CAllocator) Alloc(size int) unsafe.Pointer {
	ptr := C.malloc(C.size_t(size))

	c.mu.Lock()
	c.allocations[ptr] = C.size_t(size)
	c.mu.Unlock()

	return ptr
}

func (c *CAllocator) Free(ptr unsafe.Pointer) {
	c.mu.Lock()
	delete(c.allocations, ptr)
	c.mu.Unlock()

	C.free(ptr)
}
```
5. NUMA-Aware Allocation:
```go
type NUMAAllocator struct {
	nodes []*Arena
}

func NewNUMAAllocator(size int) *NUMAAllocator {
	numNodes := runtime.NumCPU() / 2 // Rough approximation of NUMA node count

	nodes := make([]*Arena, numNodes)
	for i := 0; i < numNodes; i++ {
		nodes[i] = NewArena(size / numNodes)
	}

	return &NUMAAllocator{nodes: nodes}
}

func (n *NUMAAllocator) Alloc(size int) unsafe.Pointer {
	// Allocate from the local NUMA node
	nodeID := getCurrentNUMANode()
	return n.nodes[nodeID].Alloc(size)
}
```
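getCurrentNUMANode is a placeholder: the Go runtime does not expose NUMA topology, so in practice it would wrap an OS facility such as getcpu(2) on Linux. A hedged fallback that simply round-robins across nodes when no such shim is available:

```go
import (
	"sync/atomic"
	"unsafe"
)

// nodeCounter is illustrative only; a real implementation would ask the
// OS which NUMA node the current CPU belongs to.
var nodeCounter atomic.Uint64

func (n *NUMAAllocator) allocAnyNode(size int) unsafe.Pointer {
	nodeID := int(nodeCounter.Add(1) % uint64(len(n.nodes)))
	return n.nodes[nodeID].Alloc(size)
}
```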
6. Debugging and Leak Detection:
```go
type DebugAllocator struct {
	allocator *SlabAllocator
	tracking  map[unsafe.Pointer]AllocationInfo
	mu        sync.Mutex
}

type AllocationInfo struct {
	Size      int
	Timestamp time.Time
	Stack     string
}

func (d *DebugAllocator) Alloc(size int) unsafe.Pointer {
	ptr := d.allocator.Alloc(size)

	d.mu.Lock()
	d.tracking[ptr] = AllocationInfo{
		Size:      size,
		Timestamp: time.Now(),
		Stack:     string(debug.Stack()),
	}
	d.mu.Unlock()

	return ptr
}

func (d *DebugAllocator) Free(ptr unsafe.Pointer, size int) {
	d.mu.Lock()
	delete(d.tracking, ptr)
	d.mu.Unlock()

	d.allocator.Free(ptr, size)
}

func (d *DebugAllocator) DetectLeaks() []AllocationInfo {
	d.mu.Lock()
	defer d.mu.Unlock()

	leaks := make([]AllocationInfo, 0)
	threshold := time.Now().Add(-5 * time.Minute)

	for _, info := range d.tracking {
		if info.Timestamp.Before(threshold) {
			leaks = append(leaks, info)
		}
	}

	return leaks
}
```
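One way to use the detector, sketched with an illustrative polling interval and log format:

```go
import (
	"log"
	"time"
)

// reportLeaks periodically logs allocations that have outlived the
// detector's 5-minute threshold. Hypothetical helper for illustration.
func reportLeaks(d *DebugAllocator) {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()

	for range ticker.C {
		for _, leak := range d.DetectLeaks() {
			log.Printf("possible leak: %d bytes, allocated %s ago",
				leak.Size, time.Since(leak.Timestamp).Round(time.Second))
		}
	}
}
```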
Performance Characteristics:
| Allocator | Alloc Speed | Free Speed | Memory Overhead | Best Use Case |
|---|---|---|---|---|
| Arena | 12ns | N/A | 0% | Request-scoped, parsers |
| Pool | 8ns | 8ns | 10% | Fixed-size objects |
| Slab | 10ns | 10ns | 25% | Mixed sizes |
| Buddy | 50ns | 50ns | 15% | Variable sizes with coalescing |
| Go heap | 28ns | Auto | 50-100% | General purpose |
When to Use Custom Allocators:
- High allocation rate
- Predictable allocation patterns
- Batch processing with clear lifetimes
- Real-time systems requiring predictable latency
- Minimizing GC pressure
- FFI/CGO interop
When NOT to Use:
- Unpredictable allocation patterns
- Long-lived objects with complex lifecycles
- Small number of allocations
- Prototype/development phase
Key Takeaways
- Custom allocators reduce GC pressure
- Arena allocation is fast but requires bulk deallocation
- Pool allocation enables object reuse
- Slab allocation handles multiple object sizes
- Memory alignment is critical for performance
- unsafe package enables low-level memory control
- Track memory usage to prevent leaks
- Consider memory fragmentation in long-running applications