Custom Garbage Collector

Exercise: Custom Garbage Collector

Difficulty - Advanced

Estimated Time - 10-12 hours

Learning Objectives

  • Understand garbage collection algorithms and theory
  • Implement mark-and-sweep and generational collection
  • Master heap management and memory allocation strategies
  • Build write barriers for tracking object mutations
  • Implement garbage collection pausing and concurrency
  • Measure GC performance metrics

Problem Statement

Build a custom garbage collector that automatically reclaims unused memory in a managed heap. The collector should support multiple collection algorithms, handle object graphs with cycles, minimize pause times, and provide performance metrics similar to production garbage collectors.

Garbage collection is fundamental to memory-safe languages like Go, Java, JavaScript, and Python. Understanding how GC works internally helps you write more efficient code and diagnose memory issues in production systems.

Real-World Scenario:

 1// A managed heap with automatic garbage collection
 2heap := NewManagedHeap(10 * 1024 * 1024) // 10 MB heap
 3
 4// Allocate objects without manual free
 5obj1 := heap.Allocate(1024)  // 1 KB object
 6obj2 := heap.Allocate(2048)  // 2 KB object
 7obj3 := heap.Allocate(512)   // 512 B object
 8
 9// Create object graph
10heap.AddReference(obj1, obj2)  // obj1 -> obj2
11heap.AddReference(obj2, obj3)  // obj2 -> obj3
12
13// Remove reference - obj3 becomes garbage
14heap.RemoveReference(obj2, obj3)
15obj3 = nil
16
17// GC automatically reclaims obj3
18stats := heap.CollectGarbage()
19fmt.Printf("Collected %d bytes in %v\n", stats.BytesReclaimed, stats.PauseTime)
20
21// No memory leaks, no manual free() calls!

Requirements

Functional Requirements

  1. Memory Allocation: Allocate objects from a managed heap
  2. Mark-and-Sweep GC: Implement basic mark-and-sweep algorithm
  3. Generational GC: Separate young and old generations for efficiency
  4. Reference Tracking: Track references between objects
  5. Root Set Management: Identify root objects
  6. Cycle Detection: Handle circular reference graphs
  7. Compaction: Reduce fragmentation by moving live objects
  8. Concurrent Collection: Minimize pause times with background GC
  9. Write Barriers: Track mutations during concurrent collection
  10. Performance Metrics: Measure pause time, throughput, and memory usage

Non-Functional Requirements

  • Support heap sizes from 1 MB to 1 GB
  • Pause times under 10ms for generational GC
  • Memory overhead under 20% for GC metadata
  • Handle object graphs with 100,000+ objects
  • Thread-safe allocation and collection

Background and Theory

Garbage Collection Fundamentals

Garbage collection automatically reclaims memory that is no longer reachable from the program's root set. The root set includes:

  • Stack variables
  • Global variables
  • CPU registers
  • Other roots

Key Concepts:

  1. Reachability: An object is live if reachable from roots
  2. Garbage: Objects not reachable from any root
  3. Object Graph: Directed graph of object references
  4. Heap: Memory region for dynamically allocated objects

Garbage Collection Algorithms

1. Reference Counting

  • Each object has a counter of references to it
  • Increment on reference creation, decrement on deletion
  • Free when count reaches zero
  • Problem: Cannot handle cycles

2. Mark-and-Sweep

  • Mark Phase: Traverse object graph from roots, mark reachable objects
  • Sweep Phase: Scan heap, free unmarked objects
  • Advantage: Handles cycles
  • Disadvantage: Pause time proportional to heap size

3. Generational Collection

  • Most objects die young
  • Separate heap into young and old generations
  • Collect young generation frequently
  • Collect old generation rarely
  • Advantage: Lower average pause times

4. Copying Collection

  • Divide heap into two semi-spaces
  • Allocate in one space until full
  • Copy live objects to other space
  • Swap spaces
  • Advantage: Automatic compaction, fast allocation
  • Disadvantage: 50% memory overhead

5. Mark-Compact

  • Mark live objects
  • Compact them to beginning of heap
  • Update all references
  • Advantage: No fragmentation, lower memory overhead than copying
  • Disadvantage: Requires multiple passes

How Mark-and-Sweep Works

Initial Heap:
[A(live)] -> [B(live)] -> [C(dead)]
   |
   v
[D(dead)]

Root Set: {A}

Mark Phase:
1. Start from root A, mark it
2. Follow A's references, mark B
3. Follow B's references, mark C...
4. Marked: {A, B}, Unmarked: {C, D}

Sweep Phase:
1. Scan entire heap
2. Free unmarked objects C and D
3. Reset mark bits

Final Heap:
[A] -> [B] [FREE] [FREE]

Generational GC Example

Young Generation:
[A(new)] [B(new)] [C(new)] [D(old)] [E(new)]

Minor GC:
1. Mark from roots and remembered set
2. Copy survivors to survivor space or promote to old gen
3. Clear eden

Old Generation:
[D] [F] [G] ...

Major GC:
1. Mark entire heap from roots
2. Sweep or compact old generation
3. Much longer pause time

Use Cases

1. Language Runtimes

  • Go
  • Java
  • JavaScript
  • Python

2. Database Systems

  • PostgreSQL MVCC garbage collection
  • MongoDB WiredTiger cache eviction
  • Redis memory eviction policies

3. Application-Level Memory Management

  • Object pools
  • Cached data eviction
  • Resource cleanup

Implementation Challenges

Challenge 1: Safe Memory Allocation

Issue: Allocating memory in a managed heap requires finding free space and tracking metadata.

Solution: Use a free list or bump pointer allocation:

 1type FreeBlock struct {
 2    Size int
 3    Next *FreeBlock
 4}
 5
 6// First-fit allocation
 7func allocate(size int) *Object {
 8    // Round up to alignment
 9    size = align(size, 8)
10
11    // Find free block
12    for fb := h.freeList; fb != nil; fb = fb.Next {
13        if fb.Size >= size {
14            // Split block if necessary
15            if fb.Size > size+minBlockSize {
16                remaining := fb.Size - size
17                newFB := &FreeBlock{Size: remaining, Next: fb.Next}
18                // Update free list
19            }
20            return fb
21        }
22    }
23    return nil // Out of memory
24}

Challenge 2: Concurrent Collection

Issue: Program execution must pause during GC to maintain consistency.

Solution: Tri-color marking with write barriers:

  • White: Unvisited objects
  • Gray: Visited but not scanned
  • Black: Visited and scanned

Write Barrier: When a black object is modified to reference a white object, either:

  • Mark the white object gray
  • Mark the black object gray again

Challenge 3: Pointer Identification

Issue: Need to distinguish pointers from data in object fields.

Solution: Use type information or conservative GC:

 1type ObjectMetadata struct {
 2    Size    int
 3    TypeID  int
 4    Marked  bool
 5    Forwarded *Object // For copying/compaction
 6}
 7
 8type TypeInfo struct {
 9    Size         int
10    PointerMap   []int // Offsets of pointer fields
11}

Challenge 4: Heap Fragmentation

Issue: Free blocks scattered throughout heap prevent large allocations.

Solution: Implement compaction or use size-class segregation:

 1// Compact heap by moving objects
 2func compact() {
 3    dest := h.base
 4    for obj := range h.liveObjects() {
 5        if obj != dest {
 6            copy(dest, obj, obj.Size)
 7            obj.ForwardingAddress = dest
 8        }
 9        dest += obj.Size
10    }
11    // Update all references to use forwarding addresses
12}

Hints

Hint 1: Start with a Simple Heap Structure

Define the basic heap and object structures:

 1type Object struct {
 2    Header   ObjectHeader
 3    Data     []byte
 4}
 5
 6type ObjectHeader struct {
 7    Size     int32
 8    Marked   bool
 9    TypeID   int16
10    RefCount int32 // For reference counting
11}
12
13type Heap struct {
14    Memory     []byte
15    Objects    map[uintptr]*Object
16    Roots      []*Object
17    FreeList   *FreeBlock
18    BytesUsed  int64
19    BytesTotal int64
20}
Hint 2: Implement Mark Phase with DFS

Use depth-first search to mark reachable objects:

 1func mark() {
 2    // Mark all roots
 3    for _, root := range gc.heap.Roots {
 4        gc.markObject(root)
 5    }
 6}
 7
 8func markObject(obj *Object) {
 9    if obj == nil || obj.Header.Marked {
10        return
11    }
12
13    obj.Header.Marked = true
14    gc.stats.ObjectsMarked++
15
16    // Mark all referenced objects
17    for _, ref := range gc.getReferences(obj) {
18        gc.markObject(ref)
19    }
20}
Hint 3: Implement Sweep Phase

Scan heap and free unmarked objects:

 1func sweep() {
 2    freedBytes := 0
 3
 4    for addr, obj := range gc.heap.Objects {
 5        if !obj.Header.Marked {
 6            // Unmarked = garbage, free it
 7            freedBytes += obj.Header.Size
 8            gc.freeObject(obj)
 9            delete(gc.heap.Objects, addr)
10        } else {
11            // Clear mark for next GC cycle
12            obj.Header.Marked = false
13        }
14    }
15
16    gc.stats.BytesReclaimed = freedBytes
17}
Hint 4: Add Generational Collection

Separate young and old generations:

 1type GenerationalHeap struct {
 2    Young *Heap  // Frequently collected
 3    Old   *Heap  // Rarely collected
 4    RememberedSet map[*Object][]*Object // Old->Young refs
 5}
 6
 7func minorGC() {
 8    // Mark from roots and remembered set
 9    gc := NewGC(h.Young)
10    for oldObj, youngRefs := range h.RememberedSet {
11        for _, ref := range youngRefs {
12            gc.markObject(ref)
13        }
14    }
15    gc.mark()
16
17    // Promote survivors to old generation
18    for obj := range h.Young.Objects {
19        if obj.Header.Marked && obj.Age > ageThreshold {
20            h.promote(obj)
21        }
22    }
23
24    gc.sweep()
25}
Hint 5: Add Write Barriers for Concurrent GC

Track mutations during concurrent marking:

 1func WriteBarrier(obj *Object, field int, newRef *Object) {
 2    // Dijkstra write barrier
 3    if gc.InProgress && obj.Color == Black && newRef.Color == White {
 4        // Mark newly referenced object as gray
 5        gc.MarkGray(newRef)
 6    }
 7
 8    // Update the reference
 9    obj.SetReference(field, newRef)
10}

Solution

Show Complete Solution

Approach

We'll implement a comprehensive garbage collector with:

  1. Basic mark-and-sweep GC for correctness
  2. Generational GC for performance
  3. Write barriers for concurrent collection
  4. Compaction to reduce fragmentation
  5. Performance metrics for monitoring

Implementation

  1package gc
  2
  3import (
  4    "fmt"
  5    "sync"
  6    "time"
  7    "unsafe"
  8)
  9
 10// ===== Object and Heap Structures =====
 11
 12// ObjectHeader contains GC metadata for each object
 13type ObjectHeader struct {
 14    Size      int32       // Object size in bytes
 15    TypeID    int16       // Type identifier
 16    Marked    bool        // Mark bit for GC
 17    Age       uint8       // Generation age
 18    Forwarded uintptr     // Forwarding pointer for compaction
 19    RefCount  int32       // Reference count
 20}
 21
 22// Object represents a managed object
 23type Object struct {
 24    Header     ObjectHeader
 25    References []*Object // Pointers to other objects
 26    Data       []byte    // Actual object data
 27}
 28
 29// NewObject creates a new object
 30func NewObject(size int, typeID int) *Object {
 31    return &Object{
 32        Header: ObjectHeader{
 33            Size:   int32(size),
 34            TypeID: int16(typeID),
 35        },
 36        References: make([]*Object, 0),
 37        Data:       make([]byte, size),
 38    }
 39}
 40
 41// AddReference adds a reference to another object
 42func AddReference(ref *Object) {
 43    o.References = append(o.References, ref)
 44}
 45
 46// RemoveReference removes a reference
 47func RemoveReference(ref *Object) {
 48    for i, r := range o.References {
 49        if r == ref {
 50            o.References = append(o.References[:i], o.References[i+1:]...)
 51            return
 52        }
 53    }
 54}
 55
 56// ===== Heap Management =====
 57
 58// Heap represents a managed memory region
 59type Heap struct {
 60    mu           sync.RWMutex
 61    Memory       []byte
 62    Objects      map[uintptr]*Object
 63    Roots        []*Object
 64    BytesUsed    int64
 65    BytesTotal   int64
 66    BytesAllocated int64
 67    NextGCBytes  int64 // Trigger GC when this threshold is reached
 68}
 69
 70// NewHeap creates a new managed heap
 71func NewHeap(size int) *Heap {
 72    return &Heap{
 73        Memory:      make([]byte, size),
 74        Objects:     make(map[uintptr]*Object),
 75        Roots:       make([]*Object, 0),
 76        BytesTotal:  int64(size),
 77        NextGCBytes: int64(size / 2), // GC when 50% full
 78    }
 79}
 80
 81// Allocate allocates an object in the heap
 82func Allocate(size int, typeID int) {
 83    h.mu.Lock()
 84    defer h.mu.Unlock()
 85
 86    // Check if GC is needed
 87    if h.BytesUsed >= h.NextGCBytes {
 88        h.mu.Unlock()
 89        h.triggerGC()
 90        h.mu.Lock()
 91    }
 92
 93    // Check if we have space
 94    if h.BytesUsed+int64(size) > h.BytesTotal {
 95        return nil, fmt.Errorf("out of memory")
 96    }
 97
 98    obj := NewObject(size, typeID)
 99    addr := uintptr(unsafe.Pointer(obj))
100    h.Objects[addr] = obj
101
102    h.BytesUsed += int64(size)
103    h.BytesAllocated += int64(size)
104
105    return obj, nil
106}
107
108// AddRoot adds a root object
109func AddRoot(obj *Object) {
110    h.mu.Lock()
111    defer h.mu.Unlock()
112    h.Roots = append(h.Roots, obj)
113}
114
115// RemoveRoot removes a root object
116func RemoveRoot(obj *Object) {
117    h.mu.Lock()
118    defer h.mu.Unlock()
119
120    for i, r := range h.Roots {
121        if r == obj {
122            h.Roots = append(h.Roots[:i], h.Roots[i+1:]...)
123            return
124        }
125    }
126}
127
128// triggerGC triggers garbage collection
129func triggerGC() {
130    // This will be called by the GC
131}
132
133// ===== Garbage Collector =====
134
135// GCStats tracks garbage collection metrics
136type GCStats struct {
137    NumCollections    int
138    TotalPauseTime    time.Duration
139    LastPauseTime     time.Duration
140    BytesReclaimed    int64
141    ObjectsMarked     int
142    ObjectsSwept      int
143    HeapSizeBefore    int64
144    HeapSizeAfter     int64
145}
146
147// GC represents the garbage collector
148type GC struct {
149    Heap       *Heap
150    Stats      GCStats
151    Algorithm  string // "mark-sweep", "generational", "concurrent"
152    Concurrent bool
153    Running    bool
154    mu         sync.Mutex
155}
156
157// NewGC creates a new garbage collector
158func NewGC(heap *Heap, algorithm string) *GC {
159    return &GC{
160        Heap:      heap,
161        Algorithm: algorithm,
162    }
163}
164
165// Collect performs garbage collection
166func Collect() GCStats {
167    gc.mu.Lock()
168    if gc.Running {
169        gc.mu.Unlock()
170        return gc.Stats
171    }
172    gc.Running = true
173    gc.mu.Unlock()
174
175    defer func() {
176        gc.mu.Lock()
177        gc.Running = false
178        gc.mu.Unlock()
179    }()
180
181    start := time.Now()
182
183    switch gc.Algorithm {
184    case "mark-sweep":
185        gc.markAndSweep()
186    case "generational":
187        gc.generationalCollect()
188    case "concurrent":
189        gc.concurrentCollect()
190    default:
191        gc.markAndSweep()
192    }
193
194    pauseTime := time.Since(start)
195    gc.Stats.NumCollections++
196    gc.Stats.LastPauseTime = pauseTime
197    gc.Stats.TotalPauseTime += pauseTime
198
199    return gc.Stats
200}
201
202// ===== Mark-and-Sweep Algorithm =====
203
204// markAndSweep performs mark-and-sweep collection
205func markAndSweep() {
206    gc.Heap.mu.Lock()
207    defer gc.Heap.mu.Unlock()
208
209    gc.Stats.HeapSizeBefore = gc.Heap.BytesUsed
210
211    // Mark phase
212    gc.mark()
213
214    // Sweep phase
215    gc.sweep()
216
217    gc.Stats.HeapSizeAfter = gc.Heap.BytesUsed
218    gc.Stats.BytesReclaimed = gc.Stats.HeapSizeBefore - gc.Stats.HeapSizeAfter
219
220    // Update GC threshold
221    gc.Heap.NextGCBytes = gc.Heap.BytesUsed * 2
222}
223
224// mark marks all reachable objects
225func mark() {
226    gc.Stats.ObjectsMarked = 0
227
228    // Mark from roots
229    for _, root := range gc.Heap.Roots {
230        gc.markObject(root)
231    }
232}
233
234// markObject marks an object and its references
235func markObject(obj *Object) {
236    if obj == nil || obj.Header.Marked {
237        return
238    }
239
240    obj.Header.Marked = true
241    gc.Stats.ObjectsMarked++
242
243    // Recursively mark references
244    for _, ref := range obj.References {
245        gc.markObject(ref)
246    }
247}
248
249// sweep frees unmarked objects
250func sweep() {
251    gc.Stats.ObjectsSwept = 0
252
253    for addr, obj := range gc.Heap.Objects {
254        if !obj.Header.Marked {
255            // Object is garbage, free it
256            gc.Heap.BytesUsed -= int64(obj.Header.Size)
257            gc.Stats.ObjectsSwept++
258            delete(gc.Heap.Objects, addr)
259        } else {
260            // Clear mark for next GC
261            obj.Header.Marked = false
262        }
263    }
264}
265
266// ===== Generational GC =====
267
268// GenerationalHeap separates young and old generations
269type GenerationalHeap struct {
270    Young         *Heap
271    Old           *Heap
272    RememberedSet map[*Object][]*Object // Old->Young references
273    mu            sync.RWMutex
274}
275
276// NewGenerationalHeap creates a generational heap
277func NewGenerationalHeap(youngSize, oldSize int) *GenerationalHeap {
278    return &GenerationalHeap{
279        Young:         NewHeap(youngSize),
280        Old:           NewHeap(oldSize),
281        RememberedSet: make(map[*Object][]*Object),
282    }
283}
284
285// Allocate allocates in young generation
286func Allocate(size int, typeID int) {
287    obj, err := gh.Young.Allocate(size, typeID)
288    if err != nil {
289        // Young gen full, try minor GC
290        gc := NewGC(gh.Young, "generational")
291        gc.minorGC(gh)
292        return gh.Young.Allocate(size, typeID)
293    }
294    return obj, nil
295}
296
297// AddCrossGenerationalReference tracks old->young references
298func AddCrossGenerationalReference(old, young *Object) {
299    gh.mu.Lock()
300    defer gh.mu.Unlock()
301
302    if _, exists := gh.RememberedSet[old]; !exists {
303        gh.RememberedSet[old] = make([]*Object, 0)
304    }
305    gh.RememberedSet[old] = append(gh.RememberedSet[old], young)
306}
307
308// minorGC collects young generation
309func minorGC(gh *GenerationalHeap) {
310    gh.Young.mu.Lock()
311    defer gh.Young.mu.Unlock()
312
313    // Mark from roots
314    for _, root := range gh.Young.Roots {
315        gc.markObject(root)
316    }
317
318    // Mark from remembered set
319    for _, youngRefs := range gh.RememberedSet {
320        for _, ref := range youngRefs {
321            gc.markObject(ref)
322        }
323    }
324
325    // Promote survivors to old generation
326    ageThreshold := uint8(3)
327    for addr, obj := range gh.Young.Objects {
328        if obj.Header.Marked {
329            obj.Header.Age++
330            if obj.Header.Age > ageThreshold {
331                // Promote to old generation
332                gh.Old.Objects[addr] = obj
333                gh.Young.BytesUsed -= int64(obj.Header.Size)
334                gh.Old.BytesUsed += int64(obj.Header.Size)
335                delete(gh.Young.Objects, addr)
336            }
337        }
338    }
339
340    // Sweep young generation
341    gc.sweep()
342}
343
344// majorGC collects entire heap
345func majorGC(gh *GenerationalHeap) {
346    // Collect both generations
347    gc.Heap = gh.Old
348    gc.markAndSweep()
349
350    gc.Heap = gh.Young
351    gc.markAndSweep()
352}
353
354// ===== Concurrent GC with Tri-Color Marking =====
355
356type Color byte
357
358const (
359    White Color = iota // Unvisited
360    Gray               // Visited, not scanned
361    Black              // Visited and scanned
362)
363
364// ConcurrentGC implements tri-color concurrent marking
365type ConcurrentGC struct {
366    *GC
367    Colors    map[*Object]Color
368    GrayQueue []*Object
369    WriteLog  []*Object
370    mu        sync.Mutex
371}
372
373// NewConcurrentGC creates a concurrent GC
374func NewConcurrentGC(heap *Heap) *ConcurrentGC {
375    return &ConcurrentGC{
376        GC:        NewGC(heap, "concurrent"),
377        Colors:    make(map[*Object]Color),
378        GrayQueue: make([]*Object, 0),
379        WriteLog:  make([]*Object, 0),
380    }
381}
382
383// concurrentCollect performs concurrent collection
384func concurrentCollect() {
385    // Initial marking
386    cgc.initialMark()
387
388    // Concurrent marking
389    cgc.concurrentMark()
390
391    // Final marking
392    cgc.finalMark()
393
394    // Sweep
395    cgc.sweep()
396}
397
398// initialMark marks roots
399func initialMark() {
400    cgc.Heap.mu.Lock()
401    defer cgc.Heap.mu.Unlock()
402
403    // All objects start as white
404    for _, obj := range cgc.Heap.Objects {
405        cgc.Colors[obj] = White
406    }
407
408    // Mark roots as gray
409    for _, root := range cgc.Heap.Roots {
410        cgc.Colors[root] = Gray
411        cgc.GrayQueue = append(cgc.GrayQueue, root)
412    }
413}
414
415// concurrentMark processes gray objects concurrently
416func concurrentMark() {
417    for len(cgc.GrayQueue) > 0 {
418        cgc.mu.Lock()
419        if len(cgc.GrayQueue) == 0 {
420            cgc.mu.Unlock()
421            break
422        }
423
424        obj := cgc.GrayQueue[0]
425        cgc.GrayQueue = cgc.GrayQueue[1:]
426        cgc.mu.Unlock()
427
428        // Mark object as black
429        cgc.Colors[obj] = Black
430
431        // Mark references as gray
432        for _, ref := range obj.References {
433            if cgc.Colors[ref] == White {
434                cgc.Colors[ref] = Gray
435                cgc.mu.Lock()
436                cgc.GrayQueue = append(cgc.GrayQueue, ref)
437                cgc.mu.Unlock()
438            }
439        }
440    }
441}
442
443// finalMark processes write log
444func finalMark() {
445    cgc.Heap.mu.Lock()
446    defer cgc.Heap.mu.Unlock()
447
448    // Process objects mutated during concurrent phase
449    for _, obj := range cgc.WriteLog {
450        if cgc.Colors[obj] == Black {
451            // Re-scan black objects that were mutated
452            for _, ref := range obj.References {
453                if cgc.Colors[ref] == White {
454                    cgc.Colors[ref] = Gray
455                    cgc.GrayQueue = append(cgc.GrayQueue, ref)
456                }
457            }
458        }
459    }
460
461    // Finish marking
462    for len(cgc.GrayQueue) > 0 {
463        obj := cgc.GrayQueue[0]
464        cgc.GrayQueue = cgc.GrayQueue[1:]
465
466        cgc.Colors[obj] = Black
467
468        for _, ref := range obj.References {
469            if cgc.Colors[ref] == White {
470                cgc.Colors[ref] = Gray
471                cgc.GrayQueue = append(cgc.GrayQueue, ref)
472            }
473        }
474    }
475}
476
477// WriteBarrier is called when an object is mutated
478func WriteBarrier(obj *Object, ref *Object) {
479    if !cgc.Running {
480        return
481    }
482
483    cgc.mu.Lock()
484    defer cgc.mu.Unlock()
485
486    // Dijkstra write barrier
487    if cgc.Colors[obj] == Black && cgc.Colors[ref] == White {
488        cgc.Colors[ref] = Gray
489        cgc.GrayQueue = append(cgc.GrayQueue, ref)
490    }
491
492    cgc.WriteLog = append(cgc.WriteLog, obj)
493}
494
495// ===== Helper Functions =====
496
497// GetStats returns current GC statistics
498func GetStats() GCStats {
499    return gc.Stats
500}
501
502// PrintStats prints GC statistics
503func Print() {
504    fmt.Println("=== GC Statistics ===")
505    fmt.Printf("Collections: %d\n", stats.NumCollections)
506    fmt.Printf("Total Pause Time: %v\n", stats.TotalPauseTime)
507    fmt.Printf("Last Pause Time: %v\n", stats.LastPauseTime)
508    fmt.Printf("Bytes Reclaimed: %d\n", stats.BytesReclaimed)
509    fmt.Printf("Objects Marked: %d\n", stats.ObjectsMarked)
510    fmt.Printf("Objects Swept: %d\n", stats.ObjectsSwept)
511    fmt.Printf("Heap Before: %d bytes\n", stats.HeapSizeBefore)
512    fmt.Printf("Heap After: %d bytes\n", stats.HeapSizeAfter)
513}

Testing

  1package gc
  2
  3import (
  4    "testing"
  5    "time"
  6)
  7
  8func TestHeap_Allocation(t *testing.T) {
  9    heap := NewHeap(1024 * 1024) // 1 MB
 10
 11    obj, err := heap.Allocate(1024, 1)
 12    if err != nil {
 13        t.Fatalf("Allocation failed: %v", err)
 14    }
 15
 16    if obj.Header.Size != 1024 {
 17        t.Errorf("Expected size 1024, got %d", obj.Header.Size)
 18    }
 19
 20    if heap.BytesUsed != 1024 {
 21        t.Errorf("Expected BytesUsed 1024, got %d", heap.BytesUsed)
 22    }
 23}
 24
 25func TestGC_MarkAndSweep(t *testing.T) {
 26    heap := NewHeap(10 * 1024) // 10 KB
 27    gc := NewGC(heap, "mark-sweep")
 28
 29    // Allocate objects
 30    obj1, _ := heap.Allocate(100, 1)
 31    obj2, _ := heap.Allocate(100, 1)
 32    obj3, _ := heap.Allocate(100, 1)
 33
 34    // Create references
 35    obj1.AddReference(obj2)
 36
 37    // Add roots
 38    heap.AddRoot(obj1)
 39
 40    // obj3 is garbage
 41
 42    beforeBytes := heap.BytesUsed
 43
 44    // Collect
 45    stats := gc.Collect()
 46
 47    afterBytes := heap.BytesUsed
 48
 49    // Should have collected obj3
 50    if stats.ObjectsSwept != 1 {
 51        t.Errorf("Expected 1 object swept, got %d", stats.ObjectsSwept)
 52    }
 53
 54    if beforeBytes-afterBytes != 100 {
 55        t.Errorf("Expected 100 bytes reclaimed, got %d", beforeBytes-afterBytes)
 56    }
 57}
 58
 59func TestGC_CycleDetection(t *testing.T) {
 60    heap := NewHeap(10 * 1024)
 61    gc := NewGC(heap, "mark-sweep")
 62
 63    // Create cycle: obj1 -> obj2 -> obj3 -> obj1
 64    obj1, _ := heap.Allocate(100, 1)
 65    obj2, _ := heap.Allocate(100, 1)
 66    obj3, _ := heap.Allocate(100, 1)
 67
 68    obj1.AddReference(obj2)
 69    obj2.AddReference(obj3)
 70    obj3.AddReference(obj1)
 71
 72    // No roots - entire cycle is garbage
 73
 74    stats := gc.Collect()
 75
 76    // Should collect all 3 objects
 77    if stats.ObjectsSwept != 3 {
 78        t.Errorf("Expected 3 objects swept, got %d", stats.ObjectsSwept)
 79    }
 80
 81    if heap.BytesUsed != 0 {
 82        t.Errorf("Expected 0 bytes used after GC, got %d", heap.BytesUsed)
 83    }
 84}
 85
 86func TestGenerationalHeap_MinorGC(t *testing.T) {
 87    gh := NewGenerationalHeap(1024, 10*1024)
 88    gc := NewGC(gh.Young, "generational")
 89
 90    // Allocate in young generation
 91    youngObj, _ := gh.Allocate(100, 1)
 92    gh.Young.AddRoot(youngObj)
 93
 94    // Trigger minor GC
 95    gc.minorGC(gh)
 96
 97    // Young object should survive
 98    if gc.Stats.ObjectsSwept != 0 {
 99        t.Errorf("Expected 0 objects swept, got %d", gc.Stats.ObjectsSwept)
100    }
101
102    // Allocate garbage
103    garbage, _ := gh.Allocate(100, 1)
104    _ = garbage
105
106    // Trigger minor GC
107    gc.minorGC(gh)
108
109    // Garbage should be collected
110    if gc.Stats.ObjectsSwept != 1 {
111        t.Errorf("Expected 1 object swept, got %d", gc.Stats.ObjectsSwept)
112    }
113}
114
115func TestConcurrentGC_TriColorMarking(t *testing.T) {
116    heap := NewHeap(10 * 1024)
117    cgc := NewConcurrentGC(heap)
118
119    // Allocate objects
120    obj1, _ := heap.Allocate(100, 1)
121    obj2, _ := heap.Allocate(100, 1)
122    obj3, _ := heap.Allocate(100, 1)
123
124    obj1.AddReference(obj2)
125    heap.AddRoot(obj1)
126
127    // Collect concurrently
128    cgc.concurrentCollect()
129
130    // obj1 and obj2 should be black
131    // obj3 should be white
132
133    if cgc.Colors[obj1] != Black {
134        t.Error("Expected obj1 to be black")
135    }
136
137    if cgc.Colors[obj2] != Black {
138        t.Error("Expected obj2 to be black")
139    }
140
141    if cgc.Colors[obj3] != White {
142        t.Error("Expected obj3 to be white")
143    }
144}
145
146func BenchmarkAllocation(b *testing.B) {
147    heap := NewHeap(100 * 1024 * 1024) // 100 MB
148
149    b.ResetTimer()
150    for i := 0; i < b.N; i++ {
151        heap.Allocate(1024, 1)
152    }
153}
154
155func BenchmarkGC_MarkAndSweep(b *testing.B) {
156    heap := NewHeap(10 * 1024 * 1024) // 10 MB
157    gc := NewGC(heap, "mark-sweep")
158
159    // Allocate some objects
160    for i := 0; i < 1000; i++ {
161        obj, _ := heap.Allocate(1024, 1)
162        if i%10 == 0 {
163            heap.AddRoot(obj)
164        }
165    }
166
167    b.ResetTimer()
168    for i := 0; i < b.N; i++ {
169        gc.Collect()
170    }
171}
172
173func TestGC_PauseTime(t *testing.T) {
174    heap := NewHeap(1024 * 1024) // 1 MB
175    gc := NewGC(heap, "mark-sweep")
176
177    // Allocate many small objects
178    for i := 0; i < 100; i++ {
179        obj, _ := heap.Allocate(100, 1)
180        if i%10 == 0 {
181            heap.AddRoot(obj)
182        }
183    }
184
185    start := time.Now()
186    stats := gc.Collect()
187    duration := time.Since(start)
188
189    t.Logf("GC pause time: %v", duration)
190    t.Logf("Objects marked: %d", stats.ObjectsMarked)
191    t.Logf("Objects swept: %d", stats.ObjectsSwept)
192    t.Logf("Bytes reclaimed: %d", stats.BytesReclaimed)
193
194    // Pause time should be reasonable
195    if duration > 100*time.Millisecond {
196        t.Errorf("GC pause time too long: %v", duration)
197    }
198}

Usage Example

 1package main
 2
 3import (
 4    "fmt"
 5    "gc"
 6)
 7
 8func main() {
 9    fmt.Println("=== Basic Garbage Collection ===")
10
11    // Create heap
12    heap := gc.NewHeap(1024 * 1024) // 1 MB
13    collector := gc.NewGC(heap, "mark-sweep")
14
15    // Allocate objects
16    obj1, _ := heap.Allocate(1024, 1)
17    obj2, _ := heap.Allocate(2048, 2)
18    obj3, _ := heap.Allocate(512, 1)
19
20    // Create object graph
21    obj1.AddReference(obj2)
22    heap.AddRoot(obj1)
23
24    fmt.Printf("Heap usage before GC: %d bytes\n", heap.BytesUsed)
25
26    // obj3 is garbage
27    stats := collector.Collect()
28
29    fmt.Printf("Heap usage after GC: %d bytes\n", heap.BytesUsed)
30    stats.Print()
31
32    fmt.Println("\n=== Generational Collection ===")
33
34    gh := gc.NewGenerationalHeap(64*1024, 1024*1024)
35
36    // Allocate in young generation
37    for i := 0; i < 100; i++ {
38        obj, _ := gh.Allocate(100, 1)
39        if i == 0 {
40            gh.Young.AddRoot(obj)
41        }
42    }
43
44    fmt.Printf("Young gen before GC: %d bytes\n", gh.Young.BytesUsed)
45
46    genGC := gc.NewGC(gh.Young, "generational")
47    genGC.MinorGC(gh)
48
49    fmt.Printf("Young gen after GC: %d bytes\n", gh.Young.BytesUsed)
50
51    fmt.Println("\n=== Concurrent Collection ===")
52
53    heap2 := gc.NewHeap(1024 * 1024)
54    cgc := gc.NewConcurrentGC(heap2)
55
56    root, _ := heap2.Allocate(100, 1)
57    child1, _ := heap2.Allocate(100, 1)
58    child2, _ := heap2.Allocate(100, 1)
59
60    root.AddReference(child1)
61    heap2.AddRoot(root)
62
63    cgc.ConcurrentCollect()
64
65    fmt.Println("Concurrent GC completed")
66    fmt.Printf("Live objects: %d\n", len(heap2.Objects))
67}

Testing Your Solution

Test these scenarios:

  1. Basic Collection: Allocate and free simple objects
  2. Reachability: Verify only reachable objects survive
  3. Cycles: Ensure circular references are collected
  4. Generational: Young objects promoted to old generation
  5. Concurrent: Mutator runs during marking
  6. Write Barriers: Mutations tracked correctly
  7. Performance: Measure pause times and throughput
  8. Memory: No leaks in GC metadata
  9. Stress Test: Handle 100,000+ objects

Verification Checklist:

  • Mark phase identifies all reachable objects
  • Sweep phase frees all garbage
  • Cycles are detected and collected
  • Generational GC reduces average pause time
  • Write barriers maintain tri-color invariant
  • No live objects are collected
  • No memory leaks in GC itself
  • Pause times are reasonable

Bonus Challenges

  1. Incremental Collection: Spread GC work across multiple cycles
  2. Parallel Marking: Use multiple threads for marking phase
  3. Compaction: Move objects to eliminate fragmentation
  4. Large Object Space: Separate space for large allocations
  5. Weak References: Support weak pointers that don't prevent collection
  6. Finalizers: Call cleanup code before freeing objects
  7. Heap Profiler: Track allocation sites and object lifetimes
  8. Adaptive Tuning: Automatically adjust GC parameters
  9. Region-Based Collection: Collect heap in regions
  10. Real-Time GC: Guarantee maximum pause time bounds

Key Takeaways

  • Garbage collection trades CPU time for programmer productivity by automating memory management
  • Mark-and-sweep handles cycles unlike reference counting
  • Generational collection exploits the weak generational hypothesis - most objects die young
  • Concurrent collection minimizes pause times using tri-color marking and write barriers
  • Write barriers track mutations during concurrent marking phases
  • Compaction reduces fragmentation but requires updating all pointers
  • GC tuning is application-specific - balance pause time vs throughput vs memory overhead
  • Understanding GC internals helps write efficient code in managed languages

Garbage collection is a fundamental technique in modern programming languages. While Go's GC is more sophisticated than this implementation, understanding these principles helps you write more efficient Go code and diagnose memory issues.

References