Exercise: Custom Garbage Collector
Difficulty - Advanced
Estimated Time - 10-12 hours
Learning Objectives
- Understand garbage collection algorithms and theory
- Implement mark-and-sweep and generational collection
- Master heap management and memory allocation strategies
- Build write barriers for tracking object mutations
- Implement garbage collection pausing and concurrency
- Measure GC performance metrics
Problem Statement
Build a custom garbage collector that automatically reclaims unused memory in a managed heap. The collector should support multiple collection algorithms, handle object graphs with cycles, minimize pause times, and provide performance metrics similar to production garbage collectors.
Garbage collection is fundamental to memory-safe languages like Go, Java, JavaScript, and Python. Understanding how GC works internally helps you write more efficient code and diagnose memory issues in production systems.
Real-World Scenario:
1// A managed heap with automatic garbage collection
2heap := NewManagedHeap(10 * 1024 * 1024) // 10 MB heap
3
4// Allocate objects without manual free
5obj1 := heap.Allocate(1024) // 1 KB object
6obj2 := heap.Allocate(2048) // 2 KB object
7obj3 := heap.Allocate(512) // 512 B object
8
9// Create object graph
10heap.AddReference(obj1, obj2) // obj1 -> obj2
11heap.AddReference(obj2, obj3) // obj2 -> obj3
12
13// Remove reference - obj3 becomes garbage
14heap.RemoveReference(obj2, obj3)
15obj3 = nil
16
17// GC automatically reclaims obj3
18stats := heap.CollectGarbage()
19fmt.Printf("Collected %d bytes in %v\n", stats.BytesReclaimed, stats.PauseTime)
20
21// No memory leaks, no manual free() calls!
Requirements
Functional Requirements
- Memory Allocation: Allocate objects from a managed heap
- Mark-and-Sweep GC: Implement basic mark-and-sweep algorithm
- Generational GC: Separate young and old generations for efficiency
- Reference Tracking: Track references between objects
- Root Set Management: Identify root objects
- Cycle Detection: Handle circular reference graphs
- Compaction: Reduce fragmentation by moving live objects
- Concurrent Collection: Minimize pause times with background GC
- Write Barriers: Track mutations during concurrent collection
- Performance Metrics: Measure pause time, throughput, and memory usage
Non-Functional Requirements
- Support heap sizes from 1 MB to 1 GB
- Pause times under 10ms for generational GC
- Memory overhead under 20% for GC metadata
- Handle object graphs with 100,000+ objects
- Thread-safe allocation and collection
Background and Theory
Garbage Collection Fundamentals
Garbage collection automatically reclaims memory that is no longer reachable from the program's root set. The root set includes:
- Stack variables
- Global variables
- CPU registers
- Other roots
Key Concepts:
- Reachability: An object is live if reachable from roots
- Garbage: Objects not reachable from any root
- Object Graph: Directed graph of object references
- Heap: Memory region for dynamically allocated objects
Garbage Collection Algorithms
1. Reference Counting
- Each object has a counter of references to it
- Increment on reference creation, decrement on deletion
- Free when count reaches zero
- Problem: Cannot handle cycles
2. Mark-and-Sweep
- Mark Phase: Traverse object graph from roots, mark reachable objects
- Sweep Phase: Scan heap, free unmarked objects
- Advantage: Handles cycles
- Disadvantage: Pause time proportional to heap size
3. Generational Collection
- Most objects die young
- Separate heap into young and old generations
- Collect young generation frequently
- Collect old generation rarely
- Advantage: Lower average pause times
4. Copying Collection
- Divide heap into two semi-spaces
- Allocate in one space until full
- Copy live objects to other space
- Swap spaces
- Advantage: Automatic compaction, fast allocation
- Disadvantage: 50% memory overhead
5. Mark-Compact
- Mark live objects
- Compact them to beginning of heap
- Update all references
- Advantage: No fragmentation, lower memory overhead than copying
- Disadvantage: Requires multiple passes
How Mark-and-Sweep Works
Initial Heap:
[A(live)] -> [B(live)] -> [C(dead)]
|
v
[D(dead)]
Root Set: {A}
Mark Phase:
1. Start from root A, mark it
2. Follow A's references, mark B
3. Follow B's references, mark C...
4. Marked: {A, B}, Unmarked: {C, D}
Sweep Phase:
1. Scan entire heap
2. Free unmarked objects C and D
3. Reset mark bits
Final Heap:
[A] -> [B] [FREE] [FREE]
Generational GC Example
Young Generation:
[A(new)] [B(new)] [C(new)] [D(old)] [E(new)]
Minor GC:
1. Mark from roots and remembered set
2. Copy survivors to survivor space or promote to old gen
3. Clear eden
Old Generation:
[D] [F] [G] ...
Major GC:
1. Mark entire heap from roots
2. Sweep or compact old generation
3. Much longer pause time
Use Cases
1. Language Runtimes
- Go
- Java
- JavaScript
- Python
2. Database Systems
- PostgreSQL MVCC garbage collection
- MongoDB WiredTiger cache eviction
- Redis memory eviction policies
3. Application-Level Memory Management
- Object pools
- Cached data eviction
- Resource cleanup
Implementation Challenges
Challenge 1: Safe Memory Allocation
Issue: Allocating memory in a managed heap requires finding free space and tracking metadata.
Solution: Use a free list or bump pointer allocation:
1type FreeBlock struct {
2 Size int
3 Next *FreeBlock
4}
5
6// First-fit allocation
7func allocate(size int) *Object {
8 // Round up to alignment
9 size = align(size, 8)
10
11 // Find free block
12 for fb := h.freeList; fb != nil; fb = fb.Next {
13 if fb.Size >= size {
14 // Split block if necessary
15 if fb.Size > size+minBlockSize {
16 remaining := fb.Size - size
17 newFB := &FreeBlock{Size: remaining, Next: fb.Next}
18 // Update free list
19 }
20 return fb
21 }
22 }
23 return nil // Out of memory
24}
Challenge 2: Concurrent Collection
Issue: Program execution must pause during GC to maintain consistency.
Solution: Tri-color marking with write barriers:
- White: Unvisited objects
- Gray: Visited but not scanned
- Black: Visited and scanned
Write Barrier: When a black object is modified to reference a white object, either:
- Mark the white object gray
- Mark the black object gray again
Challenge 3: Pointer Identification
Issue: Need to distinguish pointers from data in object fields.
Solution: Use type information or conservative GC:
1type ObjectMetadata struct {
2 Size int
3 TypeID int
4 Marked bool
5 Forwarded *Object // For copying/compaction
6}
7
8type TypeInfo struct {
9 Size int
10 PointerMap []int // Offsets of pointer fields
11}
Challenge 4: Heap Fragmentation
Issue: Free blocks scattered throughout heap prevent large allocations.
Solution: Implement compaction or use size-class segregation:
1// Compact heap by moving objects
2func compact() {
3 dest := h.base
4 for obj := range h.liveObjects() {
5 if obj != dest {
6 copy(dest, obj, obj.Size)
7 obj.ForwardingAddress = dest
8 }
9 dest += obj.Size
10 }
11 // Update all references to use forwarding addresses
12}
Hints
Hint 1: Start with a Simple Heap Structure
Define the basic heap and object structures:
1type Object struct {
2 Header ObjectHeader
3 Data []byte
4}
5
6type ObjectHeader struct {
7 Size int32
8 Marked bool
9 TypeID int16
10 RefCount int32 // For reference counting
11}
12
13type Heap struct {
14 Memory []byte
15 Objects map[uintptr]*Object
16 Roots []*Object
17 FreeList *FreeBlock
18 BytesUsed int64
19 BytesTotal int64
20}
Hint 2: Implement Mark Phase with DFS
Use depth-first search to mark reachable objects:
1func mark() {
2 // Mark all roots
3 for _, root := range gc.heap.Roots {
4 gc.markObject(root)
5 }
6}
7
8func markObject(obj *Object) {
9 if obj == nil || obj.Header.Marked {
10 return
11 }
12
13 obj.Header.Marked = true
14 gc.stats.ObjectsMarked++
15
16 // Mark all referenced objects
17 for _, ref := range gc.getReferences(obj) {
18 gc.markObject(ref)
19 }
20}
Hint 3: Implement Sweep Phase
Scan heap and free unmarked objects:
1func sweep() {
2 freedBytes := 0
3
4 for addr, obj := range gc.heap.Objects {
5 if !obj.Header.Marked {
6 // Unmarked = garbage, free it
7 freedBytes += obj.Header.Size
8 gc.freeObject(obj)
9 delete(gc.heap.Objects, addr)
10 } else {
11 // Clear mark for next GC cycle
12 obj.Header.Marked = false
13 }
14 }
15
16 gc.stats.BytesReclaimed = freedBytes
17}
Hint 4: Add Generational Collection
Separate young and old generations:
1type GenerationalHeap struct {
2 Young *Heap // Frequently collected
3 Old *Heap // Rarely collected
4 RememberedSet map[*Object][]*Object // Old->Young refs
5}
6
7func minorGC() {
8 // Mark from roots and remembered set
9 gc := NewGC(h.Young)
10 for oldObj, youngRefs := range h.RememberedSet {
11 for _, ref := range youngRefs {
12 gc.markObject(ref)
13 }
14 }
15 gc.mark()
16
17 // Promote survivors to old generation
18 for obj := range h.Young.Objects {
19 if obj.Header.Marked && obj.Age > ageThreshold {
20 h.promote(obj)
21 }
22 }
23
24 gc.sweep()
25}
Hint 5: Add Write Barriers for Concurrent GC
Track mutations during concurrent marking:
1func WriteBarrier(obj *Object, field int, newRef *Object) {
2 // Dijkstra write barrier
3 if gc.InProgress && obj.Color == Black && newRef.Color == White {
4 // Mark newly referenced object as gray
5 gc.MarkGray(newRef)
6 }
7
8 // Update the reference
9 obj.SetReference(field, newRef)
10}
Solution
Show Complete Solution
Approach
We'll implement a comprehensive garbage collector with:
- Basic mark-and-sweep GC for correctness
- Generational GC for performance
- Write barriers for concurrent collection
- Compaction to reduce fragmentation
- Performance metrics for monitoring
Implementation
1package gc
2
3import (
4 "fmt"
5 "sync"
6 "time"
7 "unsafe"
8)
9
10// ===== Object and Heap Structures =====
11
12// ObjectHeader contains GC metadata for each object
13type ObjectHeader struct {
14 Size int32 // Object size in bytes
15 TypeID int16 // Type identifier
16 Marked bool // Mark bit for GC
17 Age uint8 // Generation age
18 Forwarded uintptr // Forwarding pointer for compaction
19 RefCount int32 // Reference count
20}
21
22// Object represents a managed object
23type Object struct {
24 Header ObjectHeader
25 References []*Object // Pointers to other objects
26 Data []byte // Actual object data
27}
28
29// NewObject creates a new object
30func NewObject(size int, typeID int) *Object {
31 return &Object{
32 Header: ObjectHeader{
33 Size: int32(size),
34 TypeID: int16(typeID),
35 },
36 References: make([]*Object, 0),
37 Data: make([]byte, size),
38 }
39}
40
41// AddReference adds a reference to another object
42func AddReference(ref *Object) {
43 o.References = append(o.References, ref)
44}
45
46// RemoveReference removes a reference
47func RemoveReference(ref *Object) {
48 for i, r := range o.References {
49 if r == ref {
50 o.References = append(o.References[:i], o.References[i+1:]...)
51 return
52 }
53 }
54}
55
56// ===== Heap Management =====
57
58// Heap represents a managed memory region
59type Heap struct {
60 mu sync.RWMutex
61 Memory []byte
62 Objects map[uintptr]*Object
63 Roots []*Object
64 BytesUsed int64
65 BytesTotal int64
66 BytesAllocated int64
67 NextGCBytes int64 // Trigger GC when this threshold is reached
68}
69
70// NewHeap creates a new managed heap
71func NewHeap(size int) *Heap {
72 return &Heap{
73 Memory: make([]byte, size),
74 Objects: make(map[uintptr]*Object),
75 Roots: make([]*Object, 0),
76 BytesTotal: int64(size),
77 NextGCBytes: int64(size / 2), // GC when 50% full
78 }
79}
80
81// Allocate allocates an object in the heap
82func Allocate(size int, typeID int) {
83 h.mu.Lock()
84 defer h.mu.Unlock()
85
86 // Check if GC is needed
87 if h.BytesUsed >= h.NextGCBytes {
88 h.mu.Unlock()
89 h.triggerGC()
90 h.mu.Lock()
91 }
92
93 // Check if we have space
94 if h.BytesUsed+int64(size) > h.BytesTotal {
95 return nil, fmt.Errorf("out of memory")
96 }
97
98 obj := NewObject(size, typeID)
99 addr := uintptr(unsafe.Pointer(obj))
100 h.Objects[addr] = obj
101
102 h.BytesUsed += int64(size)
103 h.BytesAllocated += int64(size)
104
105 return obj, nil
106}
107
108// AddRoot adds a root object
109func AddRoot(obj *Object) {
110 h.mu.Lock()
111 defer h.mu.Unlock()
112 h.Roots = append(h.Roots, obj)
113}
114
115// RemoveRoot removes a root object
116func RemoveRoot(obj *Object) {
117 h.mu.Lock()
118 defer h.mu.Unlock()
119
120 for i, r := range h.Roots {
121 if r == obj {
122 h.Roots = append(h.Roots[:i], h.Roots[i+1:]...)
123 return
124 }
125 }
126}
127
128// triggerGC triggers garbage collection
129func triggerGC() {
130 // This will be called by the GC
131}
132
133// ===== Garbage Collector =====
134
135// GCStats tracks garbage collection metrics
136type GCStats struct {
137 NumCollections int
138 TotalPauseTime time.Duration
139 LastPauseTime time.Duration
140 BytesReclaimed int64
141 ObjectsMarked int
142 ObjectsSwept int
143 HeapSizeBefore int64
144 HeapSizeAfter int64
145}
146
147// GC represents the garbage collector
148type GC struct {
149 Heap *Heap
150 Stats GCStats
151 Algorithm string // "mark-sweep", "generational", "concurrent"
152 Concurrent bool
153 Running bool
154 mu sync.Mutex
155}
156
157// NewGC creates a new garbage collector
158func NewGC(heap *Heap, algorithm string) *GC {
159 return &GC{
160 Heap: heap,
161 Algorithm: algorithm,
162 }
163}
164
165// Collect performs garbage collection
166func Collect() GCStats {
167 gc.mu.Lock()
168 if gc.Running {
169 gc.mu.Unlock()
170 return gc.Stats
171 }
172 gc.Running = true
173 gc.mu.Unlock()
174
175 defer func() {
176 gc.mu.Lock()
177 gc.Running = false
178 gc.mu.Unlock()
179 }()
180
181 start := time.Now()
182
183 switch gc.Algorithm {
184 case "mark-sweep":
185 gc.markAndSweep()
186 case "generational":
187 gc.generationalCollect()
188 case "concurrent":
189 gc.concurrentCollect()
190 default:
191 gc.markAndSweep()
192 }
193
194 pauseTime := time.Since(start)
195 gc.Stats.NumCollections++
196 gc.Stats.LastPauseTime = pauseTime
197 gc.Stats.TotalPauseTime += pauseTime
198
199 return gc.Stats
200}
201
202// ===== Mark-and-Sweep Algorithm =====
203
204// markAndSweep performs mark-and-sweep collection
205func markAndSweep() {
206 gc.Heap.mu.Lock()
207 defer gc.Heap.mu.Unlock()
208
209 gc.Stats.HeapSizeBefore = gc.Heap.BytesUsed
210
211 // Mark phase
212 gc.mark()
213
214 // Sweep phase
215 gc.sweep()
216
217 gc.Stats.HeapSizeAfter = gc.Heap.BytesUsed
218 gc.Stats.BytesReclaimed = gc.Stats.HeapSizeBefore - gc.Stats.HeapSizeAfter
219
220 // Update GC threshold
221 gc.Heap.NextGCBytes = gc.Heap.BytesUsed * 2
222}
223
224// mark marks all reachable objects
225func mark() {
226 gc.Stats.ObjectsMarked = 0
227
228 // Mark from roots
229 for _, root := range gc.Heap.Roots {
230 gc.markObject(root)
231 }
232}
233
234// markObject marks an object and its references
235func markObject(obj *Object) {
236 if obj == nil || obj.Header.Marked {
237 return
238 }
239
240 obj.Header.Marked = true
241 gc.Stats.ObjectsMarked++
242
243 // Recursively mark references
244 for _, ref := range obj.References {
245 gc.markObject(ref)
246 }
247}
248
249// sweep frees unmarked objects
250func sweep() {
251 gc.Stats.ObjectsSwept = 0
252
253 for addr, obj := range gc.Heap.Objects {
254 if !obj.Header.Marked {
255 // Object is garbage, free it
256 gc.Heap.BytesUsed -= int64(obj.Header.Size)
257 gc.Stats.ObjectsSwept++
258 delete(gc.Heap.Objects, addr)
259 } else {
260 // Clear mark for next GC
261 obj.Header.Marked = false
262 }
263 }
264}
265
266// ===== Generational GC =====
267
268// GenerationalHeap separates young and old generations
269type GenerationalHeap struct {
270 Young *Heap
271 Old *Heap
272 RememberedSet map[*Object][]*Object // Old->Young references
273 mu sync.RWMutex
274}
275
276// NewGenerationalHeap creates a generational heap
277func NewGenerationalHeap(youngSize, oldSize int) *GenerationalHeap {
278 return &GenerationalHeap{
279 Young: NewHeap(youngSize),
280 Old: NewHeap(oldSize),
281 RememberedSet: make(map[*Object][]*Object),
282 }
283}
284
285// Allocate allocates in young generation
286func Allocate(size int, typeID int) {
287 obj, err := gh.Young.Allocate(size, typeID)
288 if err != nil {
289 // Young gen full, try minor GC
290 gc := NewGC(gh.Young, "generational")
291 gc.minorGC(gh)
292 return gh.Young.Allocate(size, typeID)
293 }
294 return obj, nil
295}
296
297// AddCrossGenerationalReference tracks old->young references
298func AddCrossGenerationalReference(old, young *Object) {
299 gh.mu.Lock()
300 defer gh.mu.Unlock()
301
302 if _, exists := gh.RememberedSet[old]; !exists {
303 gh.RememberedSet[old] = make([]*Object, 0)
304 }
305 gh.RememberedSet[old] = append(gh.RememberedSet[old], young)
306}
307
308// minorGC collects young generation
309func minorGC(gh *GenerationalHeap) {
310 gh.Young.mu.Lock()
311 defer gh.Young.mu.Unlock()
312
313 // Mark from roots
314 for _, root := range gh.Young.Roots {
315 gc.markObject(root)
316 }
317
318 // Mark from remembered set
319 for _, youngRefs := range gh.RememberedSet {
320 for _, ref := range youngRefs {
321 gc.markObject(ref)
322 }
323 }
324
325 // Promote survivors to old generation
326 ageThreshold := uint8(3)
327 for addr, obj := range gh.Young.Objects {
328 if obj.Header.Marked {
329 obj.Header.Age++
330 if obj.Header.Age > ageThreshold {
331 // Promote to old generation
332 gh.Old.Objects[addr] = obj
333 gh.Young.BytesUsed -= int64(obj.Header.Size)
334 gh.Old.BytesUsed += int64(obj.Header.Size)
335 delete(gh.Young.Objects, addr)
336 }
337 }
338 }
339
340 // Sweep young generation
341 gc.sweep()
342}
343
344// majorGC collects entire heap
345func majorGC(gh *GenerationalHeap) {
346 // Collect both generations
347 gc.Heap = gh.Old
348 gc.markAndSweep()
349
350 gc.Heap = gh.Young
351 gc.markAndSweep()
352}
353
354// ===== Concurrent GC with Tri-Color Marking =====
355
356type Color byte
357
358const (
359 White Color = iota // Unvisited
360 Gray // Visited, not scanned
361 Black // Visited and scanned
362)
363
364// ConcurrentGC implements tri-color concurrent marking
365type ConcurrentGC struct {
366 *GC
367 Colors map[*Object]Color
368 GrayQueue []*Object
369 WriteLog []*Object
370 mu sync.Mutex
371}
372
373// NewConcurrentGC creates a concurrent GC
374func NewConcurrentGC(heap *Heap) *ConcurrentGC {
375 return &ConcurrentGC{
376 GC: NewGC(heap, "concurrent"),
377 Colors: make(map[*Object]Color),
378 GrayQueue: make([]*Object, 0),
379 WriteLog: make([]*Object, 0),
380 }
381}
382
383// concurrentCollect performs concurrent collection
384func concurrentCollect() {
385 // Initial marking
386 cgc.initialMark()
387
388 // Concurrent marking
389 cgc.concurrentMark()
390
391 // Final marking
392 cgc.finalMark()
393
394 // Sweep
395 cgc.sweep()
396}
397
398// initialMark marks roots
399func initialMark() {
400 cgc.Heap.mu.Lock()
401 defer cgc.Heap.mu.Unlock()
402
403 // All objects start as white
404 for _, obj := range cgc.Heap.Objects {
405 cgc.Colors[obj] = White
406 }
407
408 // Mark roots as gray
409 for _, root := range cgc.Heap.Roots {
410 cgc.Colors[root] = Gray
411 cgc.GrayQueue = append(cgc.GrayQueue, root)
412 }
413}
414
415// concurrentMark processes gray objects concurrently
416func concurrentMark() {
417 for len(cgc.GrayQueue) > 0 {
418 cgc.mu.Lock()
419 if len(cgc.GrayQueue) == 0 {
420 cgc.mu.Unlock()
421 break
422 }
423
424 obj := cgc.GrayQueue[0]
425 cgc.GrayQueue = cgc.GrayQueue[1:]
426 cgc.mu.Unlock()
427
428 // Mark object as black
429 cgc.Colors[obj] = Black
430
431 // Mark references as gray
432 for _, ref := range obj.References {
433 if cgc.Colors[ref] == White {
434 cgc.Colors[ref] = Gray
435 cgc.mu.Lock()
436 cgc.GrayQueue = append(cgc.GrayQueue, ref)
437 cgc.mu.Unlock()
438 }
439 }
440 }
441}
442
443// finalMark processes write log
444func finalMark() {
445 cgc.Heap.mu.Lock()
446 defer cgc.Heap.mu.Unlock()
447
448 // Process objects mutated during concurrent phase
449 for _, obj := range cgc.WriteLog {
450 if cgc.Colors[obj] == Black {
451 // Re-scan black objects that were mutated
452 for _, ref := range obj.References {
453 if cgc.Colors[ref] == White {
454 cgc.Colors[ref] = Gray
455 cgc.GrayQueue = append(cgc.GrayQueue, ref)
456 }
457 }
458 }
459 }
460
461 // Finish marking
462 for len(cgc.GrayQueue) > 0 {
463 obj := cgc.GrayQueue[0]
464 cgc.GrayQueue = cgc.GrayQueue[1:]
465
466 cgc.Colors[obj] = Black
467
468 for _, ref := range obj.References {
469 if cgc.Colors[ref] == White {
470 cgc.Colors[ref] = Gray
471 cgc.GrayQueue = append(cgc.GrayQueue, ref)
472 }
473 }
474 }
475}
476
477// WriteBarrier is called when an object is mutated
478func WriteBarrier(obj *Object, ref *Object) {
479 if !cgc.Running {
480 return
481 }
482
483 cgc.mu.Lock()
484 defer cgc.mu.Unlock()
485
486 // Dijkstra write barrier
487 if cgc.Colors[obj] == Black && cgc.Colors[ref] == White {
488 cgc.Colors[ref] = Gray
489 cgc.GrayQueue = append(cgc.GrayQueue, ref)
490 }
491
492 cgc.WriteLog = append(cgc.WriteLog, obj)
493}
494
495// ===== Helper Functions =====
496
497// GetStats returns current GC statistics
498func GetStats() GCStats {
499 return gc.Stats
500}
501
502// PrintStats prints GC statistics
503func Print() {
504 fmt.Println("=== GC Statistics ===")
505 fmt.Printf("Collections: %d\n", stats.NumCollections)
506 fmt.Printf("Total Pause Time: %v\n", stats.TotalPauseTime)
507 fmt.Printf("Last Pause Time: %v\n", stats.LastPauseTime)
508 fmt.Printf("Bytes Reclaimed: %d\n", stats.BytesReclaimed)
509 fmt.Printf("Objects Marked: %d\n", stats.ObjectsMarked)
510 fmt.Printf("Objects Swept: %d\n", stats.ObjectsSwept)
511 fmt.Printf("Heap Before: %d bytes\n", stats.HeapSizeBefore)
512 fmt.Printf("Heap After: %d bytes\n", stats.HeapSizeAfter)
513}
Testing
1package gc
2
3import (
4 "testing"
5 "time"
6)
7
8func TestHeap_Allocation(t *testing.T) {
9 heap := NewHeap(1024 * 1024) // 1 MB
10
11 obj, err := heap.Allocate(1024, 1)
12 if err != nil {
13 t.Fatalf("Allocation failed: %v", err)
14 }
15
16 if obj.Header.Size != 1024 {
17 t.Errorf("Expected size 1024, got %d", obj.Header.Size)
18 }
19
20 if heap.BytesUsed != 1024 {
21 t.Errorf("Expected BytesUsed 1024, got %d", heap.BytesUsed)
22 }
23}
24
25func TestGC_MarkAndSweep(t *testing.T) {
26 heap := NewHeap(10 * 1024) // 10 KB
27 gc := NewGC(heap, "mark-sweep")
28
29 // Allocate objects
30 obj1, _ := heap.Allocate(100, 1)
31 obj2, _ := heap.Allocate(100, 1)
32 obj3, _ := heap.Allocate(100, 1)
33
34 // Create references
35 obj1.AddReference(obj2)
36
37 // Add roots
38 heap.AddRoot(obj1)
39
40 // obj3 is garbage
41
42 beforeBytes := heap.BytesUsed
43
44 // Collect
45 stats := gc.Collect()
46
47 afterBytes := heap.BytesUsed
48
49 // Should have collected obj3
50 if stats.ObjectsSwept != 1 {
51 t.Errorf("Expected 1 object swept, got %d", stats.ObjectsSwept)
52 }
53
54 if beforeBytes-afterBytes != 100 {
55 t.Errorf("Expected 100 bytes reclaimed, got %d", beforeBytes-afterBytes)
56 }
57}
58
59func TestGC_CycleDetection(t *testing.T) {
60 heap := NewHeap(10 * 1024)
61 gc := NewGC(heap, "mark-sweep")
62
63 // Create cycle: obj1 -> obj2 -> obj3 -> obj1
64 obj1, _ := heap.Allocate(100, 1)
65 obj2, _ := heap.Allocate(100, 1)
66 obj3, _ := heap.Allocate(100, 1)
67
68 obj1.AddReference(obj2)
69 obj2.AddReference(obj3)
70 obj3.AddReference(obj1)
71
72 // No roots - entire cycle is garbage
73
74 stats := gc.Collect()
75
76 // Should collect all 3 objects
77 if stats.ObjectsSwept != 3 {
78 t.Errorf("Expected 3 objects swept, got %d", stats.ObjectsSwept)
79 }
80
81 if heap.BytesUsed != 0 {
82 t.Errorf("Expected 0 bytes used after GC, got %d", heap.BytesUsed)
83 }
84}
85
86func TestGenerationalHeap_MinorGC(t *testing.T) {
87 gh := NewGenerationalHeap(1024, 10*1024)
88 gc := NewGC(gh.Young, "generational")
89
90 // Allocate in young generation
91 youngObj, _ := gh.Allocate(100, 1)
92 gh.Young.AddRoot(youngObj)
93
94 // Trigger minor GC
95 gc.minorGC(gh)
96
97 // Young object should survive
98 if gc.Stats.ObjectsSwept != 0 {
99 t.Errorf("Expected 0 objects swept, got %d", gc.Stats.ObjectsSwept)
100 }
101
102 // Allocate garbage
103 garbage, _ := gh.Allocate(100, 1)
104 _ = garbage
105
106 // Trigger minor GC
107 gc.minorGC(gh)
108
109 // Garbage should be collected
110 if gc.Stats.ObjectsSwept != 1 {
111 t.Errorf("Expected 1 object swept, got %d", gc.Stats.ObjectsSwept)
112 }
113}
114
115func TestConcurrentGC_TriColorMarking(t *testing.T) {
116 heap := NewHeap(10 * 1024)
117 cgc := NewConcurrentGC(heap)
118
119 // Allocate objects
120 obj1, _ := heap.Allocate(100, 1)
121 obj2, _ := heap.Allocate(100, 1)
122 obj3, _ := heap.Allocate(100, 1)
123
124 obj1.AddReference(obj2)
125 heap.AddRoot(obj1)
126
127 // Collect concurrently
128 cgc.concurrentCollect()
129
130 // obj1 and obj2 should be black
131 // obj3 should be white
132
133 if cgc.Colors[obj1] != Black {
134 t.Error("Expected obj1 to be black")
135 }
136
137 if cgc.Colors[obj2] != Black {
138 t.Error("Expected obj2 to be black")
139 }
140
141 if cgc.Colors[obj3] != White {
142 t.Error("Expected obj3 to be white")
143 }
144}
145
146func BenchmarkAllocation(b *testing.B) {
147 heap := NewHeap(100 * 1024 * 1024) // 100 MB
148
149 b.ResetTimer()
150 for i := 0; i < b.N; i++ {
151 heap.Allocate(1024, 1)
152 }
153}
154
155func BenchmarkGC_MarkAndSweep(b *testing.B) {
156 heap := NewHeap(10 * 1024 * 1024) // 10 MB
157 gc := NewGC(heap, "mark-sweep")
158
159 // Allocate some objects
160 for i := 0; i < 1000; i++ {
161 obj, _ := heap.Allocate(1024, 1)
162 if i%10 == 0 {
163 heap.AddRoot(obj)
164 }
165 }
166
167 b.ResetTimer()
168 for i := 0; i < b.N; i++ {
169 gc.Collect()
170 }
171}
172
173func TestGC_PauseTime(t *testing.T) {
174 heap := NewHeap(1024 * 1024) // 1 MB
175 gc := NewGC(heap, "mark-sweep")
176
177 // Allocate many small objects
178 for i := 0; i < 100; i++ {
179 obj, _ := heap.Allocate(100, 1)
180 if i%10 == 0 {
181 heap.AddRoot(obj)
182 }
183 }
184
185 start := time.Now()
186 stats := gc.Collect()
187 duration := time.Since(start)
188
189 t.Logf("GC pause time: %v", duration)
190 t.Logf("Objects marked: %d", stats.ObjectsMarked)
191 t.Logf("Objects swept: %d", stats.ObjectsSwept)
192 t.Logf("Bytes reclaimed: %d", stats.BytesReclaimed)
193
194 // Pause time should be reasonable
195 if duration > 100*time.Millisecond {
196 t.Errorf("GC pause time too long: %v", duration)
197 }
198}
Usage Example
1package main
2
3import (
4 "fmt"
5 "gc"
6)
7
8func main() {
9 fmt.Println("=== Basic Garbage Collection ===")
10
11 // Create heap
12 heap := gc.NewHeap(1024 * 1024) // 1 MB
13 collector := gc.NewGC(heap, "mark-sweep")
14
15 // Allocate objects
16 obj1, _ := heap.Allocate(1024, 1)
17 obj2, _ := heap.Allocate(2048, 2)
18 obj3, _ := heap.Allocate(512, 1)
19
20 // Create object graph
21 obj1.AddReference(obj2)
22 heap.AddRoot(obj1)
23
24 fmt.Printf("Heap usage before GC: %d bytes\n", heap.BytesUsed)
25
26 // obj3 is garbage
27 stats := collector.Collect()
28
29 fmt.Printf("Heap usage after GC: %d bytes\n", heap.BytesUsed)
30 stats.Print()
31
32 fmt.Println("\n=== Generational Collection ===")
33
34 gh := gc.NewGenerationalHeap(64*1024, 1024*1024)
35
36 // Allocate in young generation
37 for i := 0; i < 100; i++ {
38 obj, _ := gh.Allocate(100, 1)
39 if i == 0 {
40 gh.Young.AddRoot(obj)
41 }
42 }
43
44 fmt.Printf("Young gen before GC: %d bytes\n", gh.Young.BytesUsed)
45
46 genGC := gc.NewGC(gh.Young, "generational")
47 genGC.MinorGC(gh)
48
49 fmt.Printf("Young gen after GC: %d bytes\n", gh.Young.BytesUsed)
50
51 fmt.Println("\n=== Concurrent Collection ===")
52
53 heap2 := gc.NewHeap(1024 * 1024)
54 cgc := gc.NewConcurrentGC(heap2)
55
56 root, _ := heap2.Allocate(100, 1)
57 child1, _ := heap2.Allocate(100, 1)
58 child2, _ := heap2.Allocate(100, 1)
59
60 root.AddReference(child1)
61 heap2.AddRoot(root)
62
63 cgc.ConcurrentCollect()
64
65 fmt.Println("Concurrent GC completed")
66 fmt.Printf("Live objects: %d\n", len(heap2.Objects))
67}
Testing Your Solution
Test these scenarios:
- Basic Collection: Allocate and free simple objects
- Reachability: Verify only reachable objects survive
- Cycles: Ensure circular references are collected
- Generational: Young objects promoted to old generation
- Concurrent: Mutator runs during marking
- Write Barriers: Mutations tracked correctly
- Performance: Measure pause times and throughput
- Memory: No leaks in GC metadata
- Stress Test: Handle 100,000+ objects
Verification Checklist:
- Mark phase identifies all reachable objects
- Sweep phase frees all garbage
- Cycles are detected and collected
- Generational GC reduces average pause time
- Write barriers maintain tri-color invariant
- No live objects are collected
- No memory leaks in GC itself
- Pause times are reasonable
Bonus Challenges
- Incremental Collection: Spread GC work across multiple cycles
- Parallel Marking: Use multiple threads for marking phase
- Compaction: Move objects to eliminate fragmentation
- Large Object Space: Separate space for large allocations
- Weak References: Support weak pointers that don't prevent collection
- Finalizers: Call cleanup code before freeing objects
- Heap Profiler: Track allocation sites and object lifetimes
- Adaptive Tuning: Automatically adjust GC parameters
- Region-Based Collection: Collect heap in regions
- Real-Time GC: Guarantee maximum pause time bounds
Key Takeaways
- Garbage collection trades CPU time for programmer productivity by automating memory management
- Mark-and-sweep handles cycles unlike reference counting
- Generational collection exploits the weak generational hypothesis - most objects die young
- Concurrent collection minimizes pause times using tri-color marking and write barriers
- Write barriers track mutations during concurrent marking phases
- Compaction reduces fragmentation but requires updating all pointers
- GC tuning is application-specific - balance pause time vs throughput vs memory overhead
- Understanding GC internals helps write efficient code in managed languages
Garbage collection is a fundamental technique in modern programming languages. While Go's GC is more sophisticated than this implementation, understanding these principles helps you write more efficient Go code and diagnose memory issues.