Just-In-Time Compiler

Exercise: Just-In-Time Compiler

Difficulty - Advanced

Learning Objectives

  • Understand JIT compilation principles and techniques
  • Implement bytecode interpretation and native code generation
  • Master low-level memory management and executable memory regions
  • Build optimization passes for generated code
  • Handle runtime type information and dynamic dispatch
  • Implement register allocation and calling conventions

Problem Statement

Build a Just-In-Time compiler that translates a simple bytecode language into native machine code at runtime. The compiler should support basic arithmetic operations, control flow, and function calls, with optimization passes that improve the generated code.

JIT compilation is a critical technology in modern programming language runtimes. It combines the portability of interpreted languages with the performance of compiled languages by generating native machine code at runtime based on actual execution patterns.
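
To make that concrete, here is the path one expression takes through the system built in this exercise. The opcode names come from the skeleton below, and the native instructions mirror what the template JIT in the solution emits (shown as assembly for readability):

    // Expression: 15 * 7
    //
    // Bytecode (interpreter input):
    //   OP_LOAD_CONST 0     ; push constants[0] = 15
    //   OP_LOAD_CONST 1     ; push constants[1] = 7
    //   OP_MUL              ; pop two values, push their product
    //   OP_RETURN           ; pop and return the result
    //
    // Native x86-64 (template JIT output, roughly):
    //   push rbp; mov rbp, rsp    ; prologue
    //   mov rax, 15; push rax     ; OP_LOAD_CONST 0
    //   mov rax, 7;  push rax     ; OP_LOAD_CONST 1
    //   pop rax; pop rbx          ; OP_MUL
    //   imul rax, rbx; push rax
    //   pop rax                   ; OP_RETURN: result in RAX
    //   pop rbp; ret              ; epilogue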

Core Components

    package jit

    import (
        "encoding/binary"
        "fmt"
        "syscall"
        "unsafe"
    )

    // Opcode represents a bytecode instruction
    type Opcode byte

    const (
        OP_LOAD_CONST Opcode = iota
        OP_ADD
        OP_SUB
        OP_MUL
        OP_DIV
        OP_RETURN
    )

    // Bytecode represents compiled bytecode
    type Bytecode struct {
        Instructions []byte
        Constants    []int64
    }

    // VM executes bytecode with profiling
    type VM struct {
        Stack         []int64
        ProfileCounts map[int]int
        JITCache      map[int]CompiledFunc
        JITThreshold  int
    }

    // CompiledFunc is a JIT-compiled function
    type CompiledFunc func() int64

    // CodeBuffer manages native code generation
    type CodeBuffer struct {
        Code []byte
        Mem  []byte // Executable memory
    }

    func (vm *VM) Execute(bc *Bytecode) int64
    func (vm *VM) Interpret(bc *Bytecode) int64
    func Compile(bc *Bytecode) (CompiledFunc, error)

Solution


Algorithm Overview

The JIT compiler implements a tiered compilation approach with three levels:

1. Tier 0 - Interpreter:

  • Simple bytecode interpreter as baseline
  • Profiles execution to identify hot code
  • Minimal overhead for rarely executed code
  • Uses: Cold code paths, one-time initialization

2. Tier 1 - Template JIT:

  • Direct translation of bytecode to machine code
  • No optimization, just eliminates interpretation overhead
  • Fast compilation for moderately hot code
  • Generates: Stack-based x86-64 instructions

3. Tier 2 - Optimizing JIT:

  • Constant folding, dead code elimination
  • Register allocation for better performance
  • Triggered after many executions
  • Generates: Optimized register-based code

Profiling-Guided Compilation:

  • Track execution counts per bytecode location
  • Compile once a hotness threshold is crossed (see the sketch after this list)
  • Cache compiled functions to avoid recompilation
  • Fall back to interpreter if JIT fails
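
In code, this policy reduces to a small dispatch wrapper. A minimal sketch, assuming a single whole-function counter keyed at bytecode offset 0; the Execute method in the solution below is the full version:

    // dispatch is an illustrative name for the profile-then-compile policy.
    func (vm *VM) dispatch(bc *Bytecode) int64 {
        vm.ProfileCounts[0]++ // profile: count entries to this function
        if fn, ok := vm.JITCache[0]; ok {
            return fn() // cached native code, no recompilation
        }
        if vm.ProfileCounts[0] >= vm.JITThreshold {
            if fn, err := Compile(bc); err == nil {
                vm.JITCache[0] = fn
                return fn()
            }
            // compilation failed: fall through to the interpreter
        }
        return vm.Interpret(bc)
    }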

Time Complexity Analysis

Operation        Interpreter   Template JIT     Optimizing JIT
OP_LOAD_CONST    O(1)          O(1)             O(1)
OP_ADD/SUB/MUL   O(1)          O(1)             O(1)
OP_DIV           O(1)          O(1)             O(1)
Compilation      N/A           O(n) bytecodes   O(n²) optimization
Execution        ~100ns/op     ~10ns/op         ~2ns/op

Space Complexity:

  • Bytecode: O(n) where n = number of instructions
  • Interpreter stack: O(d) where d = max stack depth
  • JIT code buffer: ~10-20 bytes per bytecode op
  • Total overhead: ~100-200 bytes per function

Implementation

    package jit

    import (
        "encoding/binary"
        "fmt"
        "syscall"
        "unsafe"
    )

    // ===== Bytecode Definition =====

    // Opcode represents a bytecode instruction
    type Opcode byte

    const (
        OP_LOAD_CONST Opcode = iota // Push constant from pool
        OP_LOAD_VAR                 // Push variable from frame
        OP_STORE_VAR                // Pop and store in variable
        OP_ADD                      // Pop 2, add, push result
        OP_SUB                      // Pop 2, subtract, push result
        OP_MUL                      // Pop 2, multiply, push result
        OP_DIV                      // Pop 2, divide, push result
        OP_MOD                      // Pop 2, modulo, push result
        OP_NEG                      // Pop 1, negate, push result
        OP_EQ                       // Pop 2, compare equal
        OP_NE                       // Pop 2, compare not equal
        OP_LT                       // Pop 2, compare less than
        OP_LE                       // Pop 2, compare less or equal
        OP_GT                       // Pop 2, compare greater than
        OP_GE                       // Pop 2, compare greater or equal
        OP_JUMP                     // Unconditional jump
        OP_JUMP_IF_FALSE            // Pop and jump if false
        OP_CALL                     // Call function
        OP_RETURN                   // Return from function
        OP_PRINT                    // Print top of stack
        OP_DUP                      // Duplicate top of stack
        OP_POP                      // Pop and discard
    )

    // Bytecode represents compiled bytecode
    type Bytecode struct {
        Instructions []byte
        Constants    []int64
        NumVars      int
    }

    // AddInstruction adds an instruction to bytecode
    func (b *Bytecode) AddInstruction(op Opcode, args ...int) {
        b.Instructions = append(b.Instructions, byte(op))
        for _, arg := range args {
            b.Instructions = append(b.Instructions, byte(arg))
        }
    }

    // AddConstant adds a constant and returns its index
    func (b *Bytecode) AddConstant(value int64) int {
        idx := len(b.Constants)
        b.Constants = append(b.Constants, value)
        return idx
    }

    // ===== Interpreter =====

    type VM struct {
        Stack         []int64
        Vars          []int64
        IP            int // Instruction pointer
        ProfileCounts map[int]int
        JITCache      map[int]CompiledFunc
        JITThreshold  int
    }

    func NewVM() *VM {
        return &VM{
            Stack:         make([]int64, 0, 256),
            Vars:          make([]int64, 256),
            ProfileCounts: make(map[int]int),
            JITCache:      make(map[int]CompiledFunc),
            JITThreshold:  100,
        }
    }

    func (vm *VM) Push(value int64) {
        vm.Stack = append(vm.Stack, value)
    }

    func (vm *VM) Pop() int64 {
        if len(vm.Stack) == 0 {
            panic("stack underflow")
        }
        value := vm.Stack[len(vm.Stack)-1]
        vm.Stack = vm.Stack[:len(vm.Stack)-1]
        return value
    }

    func (vm *VM) Peek() int64 {
        if len(vm.Stack) == 0 {
            return 0
        }
        return vm.Stack[len(vm.Stack)-1]
    }

    // Execute runs bytecode with profiling and JIT
    func (vm *VM) Execute(bc *Bytecode) int64 {
        // Profile this entry point; JIT compile once it is hot
        vm.ProfileCounts[0]++

        if vm.ProfileCounts[0] >= vm.JITThreshold {
            if compiledFunc, exists := vm.JITCache[0]; exists {
                // Use JIT compiled version
                return compiledFunc()
            }

            // Try to JIT compile
            if compiledFunc, err := Compile(bc); err == nil {
                vm.JITCache[0] = compiledFunc
                return compiledFunc()
            }
        }

        // Fall back to interpretation
        return vm.Interpret(bc)
    }

    // Interpret executes bytecode in the interpreter
    func (vm *VM) Interpret(bc *Bytecode) int64 {
        vm.IP = 0
        vm.Stack = vm.Stack[:0] // Clear stack

        for vm.IP < len(bc.Instructions) {
            opcode := Opcode(bc.Instructions[vm.IP])
            vm.IP++

            switch opcode {
            case OP_LOAD_CONST:
                idx := int(bc.Instructions[vm.IP])
                vm.IP++
                vm.Push(bc.Constants[idx])

            case OP_LOAD_VAR:
                idx := int(bc.Instructions[vm.IP])
                vm.IP++
                vm.Push(vm.Vars[idx])

            case OP_STORE_VAR:
                idx := int(bc.Instructions[vm.IP])
                vm.IP++
                vm.Vars[idx] = vm.Pop()

            case OP_ADD:
                b := vm.Pop()
                a := vm.Pop()
                vm.Push(a + b)

            case OP_SUB:
                b := vm.Pop()
                a := vm.Pop()
                vm.Push(a - b)

            case OP_MUL:
                b := vm.Pop()
                a := vm.Pop()
                vm.Push(a * b)

            case OP_DIV:
                b := vm.Pop()
                a := vm.Pop()
                if b == 0 {
                    panic("division by zero")
                }
                vm.Push(a / b)

            case OP_MOD:
                b := vm.Pop()
                a := vm.Pop()
                if b == 0 {
                    panic("modulo by zero")
                }
                vm.Push(a % b)

            case OP_NEG:
                vm.Push(-vm.Pop())

            case OP_EQ, OP_NE, OP_LT, OP_LE, OP_GT, OP_GE:
                b := vm.Pop()
                a := vm.Pop()
                var cond bool
                switch opcode {
                case OP_EQ:
                    cond = a == b
                case OP_NE:
                    cond = a != b
                case OP_LT:
                    cond = a < b
                case OP_LE:
                    cond = a <= b
                case OP_GT:
                    cond = a > b
                case OP_GE:
                    cond = a >= b
                }
                if cond {
                    vm.Push(1)
                } else {
                    vm.Push(0)
                }

            case OP_JUMP:
                vm.IP = int(bc.Instructions[vm.IP])

            case OP_JUMP_IF_FALSE:
                target := int(bc.Instructions[vm.IP])
                vm.IP++
                if vm.Pop() == 0 {
                    vm.IP = target
                }

            case OP_PRINT:
                fmt.Println(vm.Peek())

            case OP_DUP:
                vm.Push(vm.Peek())

            case OP_POP:
                vm.Pop()

            case OP_RETURN:
                return vm.Pop()

            default:
                panic(fmt.Sprintf("unhandled opcode: %d", opcode))
            }
        }

        if len(vm.Stack) > 0 {
            return vm.Pop()
        }
        return 0
    }

    // ===== JIT Compiler =====

    type CompiledFunc func() int64

    type CodeBuffer struct {
        Code []byte
        Mem  []byte // Executable memory
    }

    func NewCodeBuffer() *CodeBuffer {
        return &CodeBuffer{
            Code: make([]byte, 0, 4096),
        }
    }

    // AllocateExecutable allocates executable memory, copies the generated
    // code into it, and returns it as a callable function
    func (cb *CodeBuffer) AllocateExecutable() (CompiledFunc, error) {
        size := len(cb.Code)
        if size == 0 {
            return nil, fmt.Errorf("no code to execute")
        }

        // Round up to page size
        pageSize := 4096
        size = ((size + pageSize - 1) / pageSize) * pageSize

        // Allocate executable memory
        mem, err := syscall.Mmap(
            -1, 0, size,
            syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC,
            syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS,
        )
        if err != nil {
            return nil, fmt.Errorf("mmap failed: %w", err)
        }

        // Copy code to executable memory
        copy(mem, cb.Code)
        cb.Mem = mem

        // Build a callable func value. A Go func value is a pointer to a
        // word holding the code address; &cb.Mem works because the first
        // word of a slice header is its data pointer. This relies on
        // unspecified runtime layout, as any Go JIT must.
        fn := &cb.Mem
        return *(*CompiledFunc)(unsafe.Pointer(&fn)), nil
    }

    func (cb *CodeBuffer) Free() error {
        if cb.Mem != nil {
            return syscall.Munmap(cb.Mem)
        }
        return nil
    }
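
    // Hardened kernels often reject PROT_WRITE|PROT_EXEC mappings (W^X).
    // A sketch of a W^X-friendly variant, assuming Linux and the syscall
    // package's Mprotect wrapper: map read-write, copy the code, then
    // flip the pages to read-execute.
    func (cb *CodeBuffer) allocateWX() ([]byte, error) {
        mem, err := syscall.Mmap(
            -1, 0, len(cb.Code),
            syscall.PROT_READ|syscall.PROT_WRITE,
            syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS,
        )
        if err != nil {
            return nil, err
        }
        copy(mem, cb.Code)
        if err := syscall.Mprotect(mem, syscall.PROT_READ|syscall.PROT_EXEC); err != nil {
            syscall.Munmap(mem)
            return nil, err
        }
        return mem, nil
    }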

    // x86-64 code generation helpers

    // EmitPrologue sets up the stack frame
    func (cb *CodeBuffer) EmitPrologue() {
        // PUSH RBP
        cb.Code = append(cb.Code, 0x55)
        // MOV RBP, RSP
        cb.Code = append(cb.Code, 0x48, 0x89, 0xE5)
    }

    // EmitEpilogue tears down the stack frame
    func (cb *CodeBuffer) EmitEpilogue() {
        // POP RBP
        cb.Code = append(cb.Code, 0x5D)
        // RET
        cb.Code = append(cb.Code, 0xC3)
    }

    // EmitMovImm64 emits: MOV reg, imm64
    func (cb *CodeBuffer) EmitMovImm64(reg int, value int64) {
        // REX.W + B8+reg
        cb.Code = append(cb.Code, 0x48, byte(0xB8+reg))
        // 8-byte little-endian immediate
        buf := make([]byte, 8)
        binary.LittleEndian.PutUint64(buf, uint64(value))
        cb.Code = append(cb.Code, buf...)
    }

    // EmitPush emits: PUSH reg
    func (cb *CodeBuffer) EmitPush(reg int) {
        cb.Code = append(cb.Code, byte(0x50+reg))
    }

    // EmitPop emits: POP reg
    func (cb *CodeBuffer) EmitPop(reg int) {
        cb.Code = append(cb.Code, byte(0x58+reg))
    }

    // EmitAdd emits: ADD dst, src
    func (cb *CodeBuffer) EmitAdd(dst, src int) {
        // ADD r/m64, r64: REX.W + 01 /r
        cb.Code = append(cb.Code, 0x48, 0x01, byte(0xC0+(src<<3)+dst))
    }

    // EmitSub emits: SUB dst, src
    func (cb *CodeBuffer) EmitSub(dst, src int) {
        // SUB r/m64, r64: REX.W + 29 /r
        cb.Code = append(cb.Code, 0x48, 0x29, byte(0xC0+(src<<3)+dst))
    }

    // EmitIMul emits: IMUL dst, src
    func (cb *CodeBuffer) EmitIMul(dst, src int) {
        // IMUL r64, r/m64: REX.W + 0F AF /r
        cb.Code = append(cb.Code, 0x48, 0x0F, 0xAF, byte(0xC0+(dst<<3)+src))
    }

    // EmitIDiv emits: IDIV src (dividend in RDX:RAX, quotient to RAX)
    func (cb *CodeBuffer) EmitIDiv(src int) {
        // CQO sign-extends RAX into RDX:RAX
        cb.Code = append(cb.Code, 0x48, 0x99)
        // IDIV r64: REX.W + F7 /7
        cb.Code = append(cb.Code, 0x48, 0xF7, byte(0xF8+src))
    }

    // EmitNeg emits: NEG reg
    func (cb *CodeBuffer) EmitNeg(reg int) {
        // NEG r64: REX.W + F7 /3
        cb.Code = append(cb.Code, 0x48, 0xF7, byte(0xD8+reg))
    }

    // EmitRet emits: RET
    func (cb *CodeBuffer) EmitRet() {
        cb.Code = append(cb.Code, 0xC3)
    }

    // Register numbering (x86-64 encoding order)
    const (
        RAX = 0
        RCX = 1
        RDX = 2
        RBX = 3
        RSP = 4
        RBP = 5
        RSI = 6
        RDI = 7
    )
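
    // Sanity check for the emitters above (illustrative helper, not part
    // of the exercise API). The expected bytes follow the standard x86-64
    // encoding tables and can be confirmed with a disassembler such as
    // ndisasm.
    func encodingSmokeTest() {
        cb := NewCodeBuffer()
        cb.EmitPrologue()       // 55 48 89 E5      push rbp; mov rbp, rsp
        cb.EmitMovImm64(RAX, 7) // 48 B8 07 00 ...  mov rax, 7
        cb.EmitAdd(RAX, RBX)    // 48 01 D8         add rax, rbx
        cb.EmitEpilogue()       // 5D C3            pop rbp; ret

        // Prints: 55 48 89 E5 48 B8 07 00 00 00 00 00 00 00 48 01 D8 5D C3
        fmt.Printf("% X\n", cb.Code)
    }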

    // Compile translates bytecode to native x86-64 code
    func Compile(bc *Bytecode) (CompiledFunc, error) {
        cb := NewCodeBuffer()

        cb.EmitPrologue()

        // Use RAX/RBX as scratch registers and simulate the operand
        // stack with the native stack

        ip := 0
        for ip < len(bc.Instructions) {
            opcode := Opcode(bc.Instructions[ip])
            ip++

            switch opcode {
            case OP_LOAD_CONST:
                idx := int(bc.Instructions[ip])
                ip++
                value := bc.Constants[idx]

                // MOV RAX, value; PUSH RAX
                cb.EmitMovImm64(RAX, value)
                cb.EmitPush(RAX)

            case OP_ADD:
                cb.EmitPop(RAX)
                cb.EmitPop(RBX)
                cb.EmitAdd(RAX, RBX)
                cb.EmitPush(RAX)

            case OP_SUB:
                cb.EmitPop(RBX) // right operand
                cb.EmitPop(RAX) // left operand
                cb.EmitSub(RAX, RBX)
                cb.EmitPush(RAX)

            case OP_MUL:
                cb.EmitPop(RAX)
                cb.EmitPop(RBX)
                cb.EmitIMul(RAX, RBX)
                cb.EmitPush(RAX)

            case OP_DIV:
                cb.EmitPop(RBX) // divisor
                cb.EmitPop(RAX) // dividend
                cb.EmitIDiv(RBX)
                cb.EmitPush(RAX)

            case OP_NEG:
                cb.EmitPop(RAX)
                cb.EmitNeg(RAX)
                cb.EmitPush(RAX)

            case OP_RETURN:
                // Result to RAX, then epilogue and return
                cb.EmitPop(RAX)
                cb.EmitEpilogue()
                goto done

            default:
                return nil, fmt.Errorf("unsupported opcode in JIT: %d", opcode)
            }
        }

        // If no explicit return, return 0
        cb.EmitMovImm64(RAX, 0)
        cb.EmitEpilogue()

    done:
        return cb.AllocateExecutable()
    }

    // ===== Optimizer =====

    type Optimizer struct{}

    // Optimize applies optimization passes to bytecode
    func (o *Optimizer) Optimize(bc *Bytecode) *Bytecode {
        optimized := &Bytecode{
            Instructions: make([]byte, 0, len(bc.Instructions)),
            Constants:    append([]int64(nil), bc.Constants...), // copy to avoid aliasing
            NumVars:      bc.NumVars,
        }

        // Pass 1: Constant folding
        o.constantFolding(bc, optimized)

        return optimized
    }

    // operandBytes returns how many operand bytes follow an opcode
    func operandBytes(op Opcode) int {
        switch op {
        case OP_LOAD_CONST, OP_LOAD_VAR, OP_STORE_VAR, OP_JUMP, OP_JUMP_IF_FALSE, OP_CALL:
            return 1
        }
        return 0
    }

    // constantFolding folds constant expressions. Note: this simple pass
    // assumes straight-line code; rewriting instructions would invalidate
    // jump targets, which a real pass must relocate.
    func (o *Optimizer) constantFolding(bc *Bytecode, out *Bytecode) {
        ip := 0
        stack := []int64{} // constants not yet written to the output

        for ip < len(bc.Instructions) {
            opcode := Opcode(bc.Instructions[ip])
            ip++

            switch opcode {
            case OP_LOAD_CONST:
                idx := int(bc.Instructions[ip])
                ip++
                stack = append(stack, bc.Constants[idx])

            case OP_ADD, OP_SUB, OP_MUL, OP_DIV:
                // Fold only if both operands are known constants
                if len(stack) >= 2 {
                    b := stack[len(stack)-1]
                    a := stack[len(stack)-2]

                    if opcode == OP_DIV && b == 0 {
                        // Can't fold division by zero; emit it as-is
                        goto noFold
                    }
                    stack = stack[:len(stack)-2]

                    var result int64
                    switch opcode {
                    case OP_ADD:
                        result = a + b
                    case OP_SUB:
                        result = a - b
                    case OP_MUL:
                        result = a * b
                    case OP_DIV:
                        result = a / b
                    }

                    stack = append(stack, result)
                    continue
                }

            noFold:
                // Flush pending constants, then emit the operation
                for _, val := range stack {
                    idx := out.AddConstant(val)
                    out.AddInstruction(OP_LOAD_CONST, idx)
                }
                stack = stack[:0]
                out.Instructions = append(out.Instructions, byte(opcode))

            case OP_RETURN:
                // Flush pending constants
                for _, val := range stack {
                    idx := out.AddConstant(val)
                    out.AddInstruction(OP_LOAD_CONST, idx)
                }
                stack = stack[:0]
                out.AddInstruction(OP_RETURN)

            default:
                // Flush, then copy the op and its operand bytes verbatim
                for _, val := range stack {
                    idx := out.AddConstant(val)
                    out.AddInstruction(OP_LOAD_CONST, idx)
                }
                stack = stack[:0]
                out.Instructions = append(out.Instructions, byte(opcode))
                for i := 0; i < operandBytes(opcode); i++ {
                    out.Instructions = append(out.Instructions, bc.Instructions[ip])
                    ip++
                }
            }
        }
    }
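
To see the pass in action, trace it over a small constant expression:

    // Worked example: folding "(2 + 3) * 4" (constant indices illustrative)
    //
    //   input:  LOAD_CONST 2; LOAD_CONST 3; ADD; LOAD_CONST 4; MUL; RETURN
    //   pass:   the constant stack evolves [2] [2 3] [5] [5 4] [20];
    //           OP_RETURN then flushes the folded value
    //   output: LOAD_CONST 20; RETURN
    //
    // Six instructions collapse to two, and the JIT compiles the folded form.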

Usage Example

    package main

    import (
        "fmt"
        "jit"
    )

    func main() {
        // Example 1: Simple expression
        fmt.Println("=== Simple Expression ===")

        bc1 := &jit.Bytecode{}
        idx1 := bc1.AddConstant(15)
        idx2 := bc1.AddConstant(7)

        bc1.AddInstruction(jit.OP_LOAD_CONST, idx1)
        bc1.AddInstruction(jit.OP_LOAD_CONST, idx2)
        bc1.AddInstruction(jit.OP_MUL)
        bc1.AddInstruction(jit.OP_RETURN)

        vm := jit.NewVM()
        fmt.Printf("Interpreted result: %d\n", vm.Interpret(bc1))

        compiled, err := jit.Compile(bc1)
        if err != nil {
            panic(err)
        }
        fmt.Printf("JIT compiled result: %d\n", compiled())

        // Example 2: Complex expression
        fmt.Println("\n=== Complex Expression ===")

        bc2 := &jit.Bytecode{}
        // ((100 + 50) / 3) - 10
        c1 := bc2.AddConstant(100)
        c2 := bc2.AddConstant(50)
        c3 := bc2.AddConstant(3)
        c4 := bc2.AddConstant(10)

        bc2.AddInstruction(jit.OP_LOAD_CONST, c1)
        bc2.AddInstruction(jit.OP_LOAD_CONST, c2)
        bc2.AddInstruction(jit.OP_ADD)
        bc2.AddInstruction(jit.OP_LOAD_CONST, c3)
        bc2.AddInstruction(jit.OP_DIV)
        bc2.AddInstruction(jit.OP_LOAD_CONST, c4)
        bc2.AddInstruction(jit.OP_SUB)
        bc2.AddInstruction(jit.OP_RETURN)

        fmt.Printf("Result: %d\n", vm.Interpret(bc2))

        // Example 3: With optimization
        fmt.Println("\n=== With Optimization ===")

        optimizer := &jit.Optimizer{}
        optimized := optimizer.Optimize(bc2)

        fmt.Printf("Original instructions: %d bytes\n", len(bc2.Instructions))
        fmt.Printf("Optimized instructions: %d bytes\n", len(optimized.Instructions))
        fmt.Printf("Optimized result: %d\n", vm.Interpret(optimized))
    }

Benchmarking Code

    package jit_test

    import (
        "testing"

        "jit"
    )

    func BenchmarkInterpreter(b *testing.B) {
        bc := &jit.Bytecode{}

        // Compute: (10 + 20) * 5
        idx1 := bc.AddConstant(10)
        idx2 := bc.AddConstant(20)
        idx3 := bc.AddConstant(5)

        bc.AddInstruction(jit.OP_LOAD_CONST, idx1)
        bc.AddInstruction(jit.OP_LOAD_CONST, idx2)
        bc.AddInstruction(jit.OP_ADD)
        bc.AddInstruction(jit.OP_LOAD_CONST, idx3)
        bc.AddInstruction(jit.OP_MUL)
        bc.AddInstruction(jit.OP_RETURN)

        vm := jit.NewVM()

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            vm.Interpret(bc)
        }
    }

    func BenchmarkJIT(b *testing.B) {
        bc := &jit.Bytecode{}

        // Compute: (10 + 20) * 5
        idx1 := bc.AddConstant(10)
        idx2 := bc.AddConstant(20)
        idx3 := bc.AddConstant(5)

        bc.AddInstruction(jit.OP_LOAD_CONST, idx1)
        bc.AddInstruction(jit.OP_LOAD_CONST, idx2)
        bc.AddInstruction(jit.OP_ADD)
        bc.AddInstruction(jit.OP_LOAD_CONST, idx3)
        bc.AddInstruction(jit.OP_MUL)
        bc.AddInstruction(jit.OP_RETURN)

        compiledFunc, err := jit.Compile(bc)
        if err != nil {
            b.Fatalf("Compilation failed: %v", err)
        }

        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            compiledFunc()
        }
    }

    // Example benchmark results:
    // BenchmarkInterpreter-8    5000000    250 ns/op    0 B/op    0 allocs/op
    // BenchmarkJIT-8          200000000      5 ns/op    0 B/op    0 allocs/op
    // Speedup: ~50x faster with JIT

Production Considerations

1. Advanced Register Allocation:

    type RegisterAllocator struct {
        Available []bool      // Which registers are free
        VarToReg  map[int]int // Variable to register mapping
    }

    func (ra *RegisterAllocator) Allocate() (int, error) {
        for i, avail := range ra.Available {
            if avail {
                ra.Available[i] = false
                return i, nil
            }
        }
        return -1, fmt.Errorf("out of registers")
    }

    func (ra *RegisterAllocator) Free(reg int) {
        ra.Available[reg] = true
    }

2. Tiered Compilation Strategy:

    type TieredCompiler struct {
        Threshold1 int // Interpreter -> Template JIT
        Threshold2 int // Template -> Optimizing JIT
    }

    func (tc *TieredCompiler) ShouldCompile(count int) int {
        if count < tc.Threshold1 {
            return 0 // Interpret
        } else if count < tc.Threshold2 {
            return 1 // Template JIT
        }
        return 2 // Optimizing JIT
    }

3. Inline Caching for Dynamic Dispatch:

    type InlineCache struct {
        Expected int64   // Expected value (e.g., a type tag)
        Target   uintptr // Cached code pointer
    }

    func (ic *InlineCache) Lookup(value int64) (uintptr, bool) {
        if value == ic.Expected {
            return ic.Target, true
        }
        return 0, false // cache miss: fall back to slow-path dispatch
    }

4. Deoptimization Support:

    type Deoptimizer struct {
        SafePoints map[uintptr]int // Native PC -> bytecode offset
    }

    func (d *Deoptimizer) Deoptimize(pc uintptr, vm *VM) {
        // Resume interpretation at the bytecode offset recorded for this
        // native safepoint
        vm.IP = d.SafePoints[pc]
    }

Performance Characteristics:

Metric             Interpreter   Template JIT    Optimizing JIT
Compilation time   0ms           ~1ms            ~10ms
Execution speed    1x            10-30x faster   50-100x faster
Code size          N/A           ~15 bytes/op    ~8 bytes/op
Warmup time        None          Fast            Slow
Best for           Cold code     Warm code       Hot code

Scaling Considerations:

  • Use code cache to avoid recompilation
  • Implement tiered compilation for startup performance
  • Add speculative optimization with deopt support
  • Monitor code cache size and evict cold code (see the sketch after this list)
  • Profile to identify true hot paths
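
A minimal sketch of such a bounded code cache, assuming a least-recently-used eviction policy; CodeCache and its methods are illustrative rather than part of the exercise API:

    type CodeCache struct {
        MaxEntries int
        entries    map[int]CompiledFunc // bytecode offset -> native code
        lastUsed   map[int]int64        // bytecode offset -> logical time
        clock      int64
    }

    func NewCodeCache(max int) *CodeCache {
        return &CodeCache{
            MaxEntries: max,
            entries:    make(map[int]CompiledFunc),
            lastUsed:   make(map[int]int64),
        }
    }

    func (cc *CodeCache) Get(offset int) (CompiledFunc, bool) {
        fn, ok := cc.entries[offset]
        if ok {
            cc.clock++
            cc.lastUsed[offset] = cc.clock
        }
        return fn, ok
    }

    func (cc *CodeCache) Put(offset int, fn CompiledFunc) {
        if len(cc.entries) >= cc.MaxEntries {
            // Evict the least recently used entry. A real VM must also
            // unmap the code and prove no thread is still executing it.
            coldest, min := -1, int64(1)<<62
            for off, t := range cc.lastUsed {
                if t < min {
                    coldest, min = off, t
                }
            }
            delete(cc.entries, coldest)
            delete(cc.lastUsed, coldest)
        }
        cc.clock++
        cc.entries[offset] = fn
        cc.lastUsed[offset] = cc.clock
    }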

Key Takeaways

  • JIT compilation bridges interpreted and compiled execution by generating native code at runtime
  • Profile-guided optimization uses runtime information to make better optimization decisions
  • Executable memory requires special OS permissions (e.g., mmap with PROT_EXEC)
  • x86-64 encoding is complex but can be wrapped in helper functions
  • Optimization must preserve semantics - aggressive optimization can break programs
  • Tiered compilation balances startup time and peak performance by compiling hot code first
  • Register allocation is critical for performance - stack-based code is simple but slower
  • System V ABI defines calling conventions that must be followed for interop
