Exercise: Merkle Tree Cryptographic Verification
Difficulty - Advanced
Estimated Time - 3-4 hours
Learning Objectives
- Understand cryptographic hash trees and their security properties
- Implement Merkle trees for efficient data verification
- Master proof generation and verification algorithms
- Apply Merkle trees to blockchain and distributed systems
- Design systems with tamper-evident data structures
- Optimize tree construction and proof size
Problem Statement
How can you verify the integrity of large datasets efficiently? A naive approach would hash all the data, but what if you want to verify just one piece of data, or detect which specific data was tampered with? Merkle trees solve this elegantly by organizing cryptographic hashes in a binary tree structure.
A Merkle tree is a data structure where each leaf node contains a hash of data, and each non-leaf node contains a hash of its children. This creates a hierarchy where the root hash represents the entire dataset. To verify a single piece of data, you only need O(log n) hashes, not the entire dataset.
Real-World Scenario:
Consider a blockchain or distributed file system where nodes need to verify data without downloading everything:
1// Without Merkle tree: Download everything to verify
2type NaiveVerification struct {
3 allData [][]byte // Must download all blocks
4}
5
6func Verify(data []byte, index int) bool {
7 // Need to hash ALL data to verify one piece
8 fullHash := hashAll(nv.allData)
9 return fullHash == trustedRoot
10}
11
12// With Merkle tree: Only need log(n) hashes
13type MerkleVerification struct {
14 root []byte // Just the root hash
15}
16
17func Verify(data []byte, proof [][]byte) bool {
18 // Verify using only O(log n) hashes
19 return mv.VerifyProof(data, proof, mv.root)
20}
For 1 million blocks, naive verification requires downloading ~1 GB. With a Merkle tree, verification needs only ~20 hashes!
Requirements
Functional Requirements
-
Tree Construction
- Build Merkle tree from list of data blocks
- Support any number of leaves
- Calculate all interior node hashes
- Generate and store root hash
-
Proof Generation
- Generate authentication path for any leaf
- Proof should contain minimal hashes needed to verify
- Support multi-leaf proofs
- Generate compact binary representation
-
Proof Verification
- Verify single leaf given proof and root hash
- Verify multiple leaves with combined proof
- Detect invalid or tampered proofs
- Verify against trusted root hash
-
Tree Updates
- Update single leaf and recompute affected hashes
- Efficiently update multiple leaves in batch
- Generate delta proof showing what changed
- Support append operations
-
Advanced Features
- Serialize and deserialize tree structure
- Support different hash algorithms
- Implement sparse Merkle trees for key-value stores
- Generate inclusion/exclusion proofs
Non-Functional Requirements
- Security: Collision-resistant hash function
- Efficiency: O(log n) proof size and verification time
- Memory: Efficient storage, don't duplicate data
- Correctness: Cryptographically secure verification
- Flexibility: Support trees with millions of leaves
Background and Theory
Merkle trees were invented by Ralph Merkle in 1979 and patented in 1982. They are fundamental to many modern technologies.
How Merkle Trees Work
Tree Structure:
Root Hash
/ \
Hash(A,B) Hash(C,D)
/ \ / \
Hash(A) Hash(B) Hash(C) Hash(D)
| | | |
Data A Data B Data C Data D
Construction:
- Hash each data block to create leaf nodes
- Pair leaf hashes and hash them to create parent nodes
- Continue pairing and hashing up the tree
- The top hash is the Merkle root
Verification:
To verify Data A is in the tree:
- Get proof: [Hash(B), Hash(C,D)]
- Compute Hash(A) from Data A
- Compute Hash(A,B) from Hash(A) and Hash(B)
- Compute Root from Hash(A,B) and Hash(C,D)
- Compare computed root with trusted root
If they match, Data A is authentic!
Security Properties
Collision Resistance: Finding two different inputs that hash to same value is computationally infeasible. This ensures:
- Cannot forge data that produces same root
- Cannot substitute one data block for another
- Any tampering changes the root hash
Efficient Verification:
- Proof size: O(log n) hashes
- Verification time: O(log n) hash operations
- For n=1,000,000: only 20 hashes needed
Tamper Evidence:
- Changing any leaf changes its parent
- Change propagates to root
- Different root = tampering detected
Use Cases
- Bitcoin/Blockchain: Verify transactions without full blockchain
- Git: Content-addressable storage with integrity
- IPFS: Distributed file system with verification
- Certificate Transparency: Audit log of SSL certificates
- Database Replication: Efficient synchronization detection
- File Synchronization: Detect changed files efficiently
- Distributed Systems: Byzantine fault tolerance with verified state
Implementation Challenges
Challenge 1: Handling Non-Power-of-2 Leaves
When the number of leaves is not a power of 2, the tree is unbalanced. How do you handle odd numbers at each level?
Solution Approach:
- Duplicate last node: If odd number of nodes, duplicate the last one
- Pass up single node: Don't pair if single node remains
- Use placeholder: Add empty node with zero hash
Each has trade-offs; duplication is most common in practice.
Challenge 2: Efficient Tree Storage
Storing the entire tree in memory can be wasteful, especially for large datasets.
Solution Approach:
- Store only the leaves and compute interior nodes on-demand
- Use implicit tree representation
- For sparse trees, use a map to store only non-empty nodes
- Implement lazy evaluation for unused branches
Challenge 3: Proof Compactness
Proofs should be minimal - include only necessary hashes.
Solution Approach:
- For single leaf: O(log n) hashes from leaf to root
- For adjacent leaves: Share common path nodes
- Use bit flags to indicate left/right position
- Compress proof representation
Challenge 4: Concurrent Tree Updates
Multiple updates might happen simultaneously in distributed systems.
Solution Approach:
- Implement copy-on-write for immutable trees
- Use version numbers for each root
- Lock-free updates for independent subtrees
- Batch updates to minimize recomputation
Hints
Hint 1: Tree Representation
Use a slice-based binary tree representation for efficiency:
1type MerkleTree struct {
2 leaves [][]byte // Original data
3 nodes [][]byte // All hashes
4 hashFn hash.Hash // Hash function
5}
6
7// For a complete binary tree with n leaves:
8// - Total nodes = 2*n - 1
9// - nodes[0] is root
10// - nodes[2*i+1] is left child of nodes[i]
11// - nodes[2*i+2] is right child of nodes[i]
12// - Parent of nodes[i] is nodes[(i-1)/2]
This representation provides O(1) navigation and is cache-friendly.
Hint 2: Building the Tree
Build from bottom up:
1func Build() {
2 // Start with leaf hashes
3 currentLevel := mt.hashLeaves()
4
5 // Build tree level by level
6 for len(currentLevel) > 1 {
7 nextLevel := [][]byte{}
8
9 for i := 0; i < len(currentLevel); i += 2 {
10 if i+1 < len(currentLevel) {
11 // Pair exists
12 parent := mt.hashPair(currentLevel[i], currentLevel[i+1])
13 nextLevel = append(nextLevel, parent)
14 } else {
15 // Odd node: duplicate it
16 parent := mt.hashPair(currentLevel[i], currentLevel[i])
17 nextLevel = append(nextLevel, parent)
18 }
19 }
20
21 currentLevel = nextLevel
22 }
23
24 mt.root = currentLevel[0]
25}
Hint 3: Generating Proofs
A proof is the sibling path from leaf to root:
1func GenerateProof(leafIndex int) [][]byte {
2 proof := [][]byte{}
3 index := mt.leafToNodeIndex(leafIndex)
4
5 // Walk up the tree
6 for index > 0 {
7 // Get sibling hash
8 siblingIndex := mt.getSiblingIndex(index)
9 proof = append(proof, mt.nodes[siblingIndex])
10
11 // Move to parent
12 index = mt.getParentIndex(index)
13 }
14
15 return proof
16}
17
18func getSiblingIndex(index int) int {
19 if index%2 == 1 {
20 return index + 1 // Right sibling
21 }
22 return index - 1 // Left sibling
23}
Hint 4: Verifying Proofs
Recompute root from leaf using proof hashes:
1func VerifyProof(leaf []byte, proof [][]byte, root []byte) bool {
2 currentHash := hashData(leaf)
3
4 for _, siblingHash := range proof {
5 // Determine if current is left or right child
6 //
7 if isLeftChild {
8 currentHash = hashPair(currentHash, siblingHash)
9 } else {
10 currentHash = hashPair(siblingHash, currentHash)
11 }
12 }
13
14 return bytes.Equal(currentHash, root)
15}
Note: Need to include position information in proof.
Hint 5: Efficient Updates
Only recompute affected path:
1func UpdateLeaf(leafIndex int, newData []byte) {
2 // Update leaf
3 mt.leaves[leafIndex] = newData
4
5 // Recompute only the path to root
6 index := mt.leafToNodeIndex(leafIndex)
7 mt.nodes[index] = mt.hashData(newData)
8
9 // Walk up and recompute parents
10 for index > 0 {
11 parentIndex := mt.getParentIndex(index)
12 siblingIndex := mt.getSiblingIndex(index)
13
14 left := mt.nodes[min(index, siblingIndex)]
15 right := mt.nodes[max(index, siblingIndex)]
16
17 mt.nodes[parentIndex] = mt.hashPair(left, right)
18 index = parentIndex
19 }
20}
This is O(log n) instead of rebuilding entire tree O(n).
Solution
Show Complete Solution
Approach
This implementation provides:
- Standard Merkle Tree: Binary tree with SHA-256 hashing
- Proof System: Generate and verify proofs with position tracking
- Multi-Proof: Efficiently prove multiple leaves simultaneously
- Tree Updates: Efficient single-leaf and batch updates
- Sparse Merkle Tree: For key-value stores with inclusion/exclusion proofs
- Serialization: Save and load tree state
Implementation
1package merkletree
2
3import (
4 "bytes"
5 "crypto/sha256"
6 "encoding/hex"
7 "errors"
8 "fmt"
9 "math"
10)
11
12// MerkleTree represents a binary Merkle tree
13type MerkleTree struct {
14 leaves [][]byte
15 nodes [][]byte
16 root []byte
17}
18
19// NewMerkleTree creates a new Merkle tree from data blocks
20func NewMerkleTree(data [][]byte) {
21 if len(data) == 0 {
22 return nil, errors.New("cannot create tree with zero leaves")
23 }
24
25 mt := &MerkleTree{
26 leaves: make([][]byte, len(data)),
27 }
28
29 // Copy leaves
30 for i, d := range data {
31 mt.leaves[i] = make([]byte, len(d))
32 copy(mt.leaves[i], d)
33 }
34
35 // Build tree
36 mt.build()
37
38 return mt, nil
39}
40
41// build constructs the Merkle tree from leaves
42func build() {
43 numLeaves := len(mt.leaves)
44
45 // Calculate total nodes in tree
46 // For n leaves, tree has 2*n-1 nodes
47 treeSize := 2*nextPowerOf2(numLeaves) - 1
48 mt.nodes = make([][]byte, treeSize)
49
50 // Hash leaves and place in tree
51 leafOffset := nextPowerOf2(numLeaves) - 1
52 for i, leaf := range mt.leaves {
53 mt.nodes[leafOffset+i] = hashData(leaf)
54 }
55
56 // Handle odd number of leaves
57 if numLeaves != nextPowerOf2(numLeaves) {
58 for i := numLeaves; i < nextPowerOf2(numLeaves); i++ {
59 mt.nodes[leafOffset+i] = mt.nodes[leafOffset+numLeaves-1]
60 }
61 }
62
63 // Build tree bottom-up
64 for i := leafOffset - 1; i >= 0; i-- {
65 left := mt.nodes[2*i+1]
66 right := mt.nodes[2*i+2]
67 mt.nodes[i] = hashPair(left, right)
68 }
69
70 mt.root = mt.nodes[0]
71}
72
73// Root returns the Merkle root hash
74func Root() []byte {
75 result := make([]byte, len(mt.root))
76 copy(result, mt.root)
77 return result
78}
79
80// RootHex returns the root hash as hex string
81func RootHex() string {
82 return hex.EncodeToString(mt.root)
83}
84
85// Proof represents a Merkle proof
86type Proof struct {
87 LeafIndex int
88 Leaf []byte
89 Hashes [][]byte
90 Positions []bool // true = right, false = left
91}
92
93// GenerateProof creates a proof for a specific leaf
94func GenerateProof(leafIndex int) {
95 if leafIndex < 0 || leafIndex >= len(mt.leaves) {
96 return nil, errors.New("leaf index out of bounds")
97 }
98
99 proof := &Proof{
100 LeafIndex: leafIndex,
101 Leaf: mt.leaves[leafIndex],
102 Hashes: [][]byte{},
103 Positions: []bool{},
104 }
105
106 // Start at leaf position in tree
107 leafOffset := nextPowerOf2(len(mt.leaves)) - 1
108 index := leafOffset + leafIndex
109
110 // Walk up to root
111 for index > 0 {
112 isRightChild := index%2 == 0
113 siblingIndex := index
114 if isRightChild {
115 siblingIndex = index - 1
116 } else {
117 siblingIndex = index + 1
118 }
119
120 proof.Hashes = append(proof.Hashes, mt.nodes[siblingIndex])
121 proof.Positions = append(proof.Positions, !isRightChild) // Store sibling position
122
123 // Move to parent
124 index = / 2
125 }
126
127 return proof, nil
128}
129
130// VerifyProof verifies a Merkle proof against the tree root
131func VerifyProof(proof *Proof) bool {
132 return VerifyProofWithRoot(proof, mt.root)
133}
134
135// VerifyProofWithRoot verifies a proof against a given root
136func VerifyProofWithRoot(proof *Proof, root []byte) bool {
137 currentHash := hashData(proof.Leaf)
138
139 for i, siblingHash := range proof.Hashes {
140 if proof.Positions[i] { // Sibling is on right
141 currentHash = hashPair(currentHash, siblingHash)
142 } else { // Sibling is on left
143 currentHash = hashPair(siblingHash, currentHash)
144 }
145 }
146
147 return bytes.Equal(currentHash, root)
148}
149
150// UpdateLeaf updates a single leaf and recomputes affected nodes
151func UpdateLeaf(leafIndex int, newData []byte) error {
152 if leafIndex < 0 || leafIndex >= len(mt.leaves) {
153 return errors.New("leaf index out of bounds")
154 }
155
156 // Update leaf
157 mt.leaves[leafIndex] = make([]byte, len(newData))
158 copy(mt.leaves[leafIndex], newData)
159
160 // Recompute path to root
161 leafOffset := nextPowerOf2(len(mt.leaves)) - 1
162 index := leafOffset + leafIndex
163 mt.nodes[index] = hashData(newData)
164
165 // Walk up and recompute parents
166 for index > 0 {
167 parentIndex := / 2
168 leftChild := 2*parentIndex + 1
169 rightChild := 2*parentIndex + 2
170
171 mt.nodes[parentIndex] = hashPair(mt.nodes[leftChild], mt.nodes[rightChild])
172 index = parentIndex
173 }
174
175 mt.root = mt.nodes[0]
176 return nil
177}
178
179// GetLeaf returns a copy of the leaf at the given index
180func GetLeaf(leafIndex int) {
181 if leafIndex < 0 || leafIndex >= len(mt.leaves) {
182 return nil, errors.New("leaf index out of bounds")
183 }
184
185 result := make([]byte, len(mt.leaves[leafIndex]))
186 copy(result, mt.leaves[leafIndex])
187 return result, nil
188}
189
190// NumLeaves returns the number of leaves in the tree
191func NumLeaves() int {
192 return len(mt.leaves)
193}
194
195// Helper functions
196
197func hashData(data []byte) []byte {
198 hash := sha256.Sum256(data)
199 return hash[:]
200}
201
202func hashPair(left, right []byte) []byte {
203 combined := append(left, right...)
204 hash := sha256.Sum256(combined)
205 return hash[:]
206}
207
208func nextPowerOf2(n int) int {
209 if n <= 0 {
210 return 1
211 }
212 return int(math.Pow(2, math.Ceil(math.Log2(float64(n)))))
213}
214
215// MultiProof represents proofs for multiple leaves
216type MultiProof struct {
217 LeafIndices []int
218 Leaves [][]byte
219 Hashes [][]byte
220 Flags []bool // Reconstruction flags
221}
222
223// GenerateMultiProof creates an efficient proof for multiple leaves
224func GenerateMultiProof(leafIndices []int) {
225 // Validate indices
226 for _, idx := range leafIndices {
227 if idx < 0 || idx >= len(mt.leaves) {
228 return nil, errors.New("leaf index out of bounds")
229 }
230 }
231
232 // For simplicity, generate individual proofs
233 // A more efficient implementation would share common path nodes
234 mp := &MultiProof{
235 LeafIndices: leafIndices,
236 Leaves: make([][]byte, len(leafIndices)),
237 Hashes: [][]byte{},
238 }
239
240 for i, idx := range leafIndices {
241 mp.Leaves[i] = mt.leaves[idx]
242 }
243
244 // Collect all required hashes
245 seen := make(map[string]bool)
246 for _, idx := range leafIndices {
247 proof, _ := mt.GenerateProof(idx)
248 for _, hash := range proof.Hashes {
249 hashStr := hex.EncodeToString(hash)
250 if !seen[hashStr] {
251 seen[hashStr] = true
252 mp.Hashes = append(mp.Hashes, hash)
253 }
254 }
255 }
256
257 return mp, nil
258}
259
260// SparseMerkleTree represents a sparse Merkle tree for key-value stores
261type SparseMerkleTree struct {
262 root []byte
263 nodes map[string][]byte // hash -> value
264 leaves map[string][]byte // key -> value
265 height int
266 defaultHash []byte
267}
268
269// NewSparseMerkleTree creates a sparse Merkle tree of given height
270func NewSparseMerkleTree(height int) *SparseMerkleTree {
271 smt := &SparseMerkleTree{
272 nodes: make(map[string][]byte),
273 leaves: make(map[string][]byte),
274 height: height,
275 defaultHash: hashData([]byte{}),
276 }
277
278 // Compute default hashes for empty tree
279 smt.root = smt.computeDefaultRoot()
280
281 return smt
282}
283
284func computeDefaultRoot() []byte {
285 current := smt.defaultHash
286 for i := 0; i < smt.height; i++ {
287 current = hashPair(current, current)
288 }
289 return current
290}
291
292// Set inserts or updates a key-value pair
293func Set(key, value []byte) {
294 keyHash := hashData(key)
295 smt.leaves[string(keyHash)] = value
296 smt.recomputeRoot()
297}
298
299// Get retrieves a value by key
300func Get(key []byte) {
301 keyHash := hashData(key)
302 val, exists := smt.leaves[string(keyHash)]
303 return val, exists
304}
305
306func recomputeRoot() {
307 // Simplified recomputation
308 smt.nodes = make(map[string][]byte)
309
310 // Build tree from leaves
311 //
312 leaves := make([][]byte, 0, len(smt.leaves))
313 for _, v := range smt.leaves {
314 leaves = append(leaves, v)
315 }
316
317 if len(leaves) == 0 {
318 smt.root = smt.computeDefaultRoot()
319 return
320 }
321
322 // Build standard Merkle tree
323 tree, _ := NewMerkleTree(leaves)
324 smt.root = tree.Root()
325}
326
327// Root returns the root hash
328func Root() []byte {
329 result := make([]byte, len(smt.root))
330 copy(result, smt.root)
331 return result
332}
333
334// MerkleTreeFromHashes creates a tree from pre-computed leaf hashes
335func MerkleTreeFromHashes(leafHashes [][]byte) *MerkleTree {
336 mt := &MerkleTree{
337 leaves: make([][]byte, len(leafHashes)),
338 }
339
340 // Store hashes as leaves
341 for i, hash := range leafHashes {
342 mt.leaves[i] = make([]byte, len(hash))
343 copy(mt.leaves[i], hash)
344 }
345
346 // Build tree from hashes
347 numLeaves := len(leafHashes)
348 treeSize := 2*nextPowerOf2(numLeaves) - 1
349 mt.nodes = make([][]byte, treeSize)
350
351 leafOffset := nextPowerOf2(numLeaves) - 1
352 for i, hash := range leafHashes {
353 mt.nodes[leafOffset+i] = hash
354 }
355
356 // Handle odd leaves
357 if numLeaves != nextPowerOf2(numLeaves) {
358 for i := numLeaves; i < nextPowerOf2(numLeaves); i++ {
359 mt.nodes[leafOffset+i] = mt.nodes[leafOffset+numLeaves-1]
360 }
361 }
362
363 // Build tree
364 for i := leafOffset - 1; i >= 0; i-- {
365 left := mt.nodes[2*i+1]
366 right := mt.nodes[2*i+2]
367 mt.nodes[i] = hashPair(left, right)
368 }
369
370 mt.root = mt.nodes[0]
371 return mt
372}
373
374// String returns a string representation of the tree
375func String() string {
376 return fmt.Sprintf("MerkleTree{leaves: %d, root: %s}",
377 len(mt.leaves), mt.RootHex()[:16]+"...")
378}
379
380// ProofSize returns the size of a proof in bytes
381func ProofSize() int {
382 size := 0
383 for _, hash := range p.Hashes {
384 size += len(hash)
385 }
386 return size
387}
Testing
1package merkletree
2
3import (
4 "encoding/hex"
5 "fmt"
6 "testing"
7)
8
9func TestBasicTree(t *testing.T) {
10 data := [][]byte{
11 []byte("Transaction 1"),
12 []byte("Transaction 2"),
13 []byte("Transaction 3"),
14 []byte("Transaction 4"),
15 }
16
17 tree, err := NewMerkleTree(data)
18 if err != nil {
19 t.Fatalf("Failed to create tree: %v", err)
20 }
21
22 root := tree.RootHex()
23 t.Logf("Root hash: %s", root)
24
25 if len(root) != 64 { // SHA-256 produces 64 hex chars
26 t.Error("Root hash has incorrect length")
27 }
28}
29
30func TestProofGeneration(t *testing.T) {
31 data := [][]byte{
32 []byte("Block 0"),
33 []byte("Block 1"),
34 []byte("Block 2"),
35 []byte("Block 3"),
36 }
37
38 tree, _ := NewMerkleTree(data)
39
40 // Generate proof for each leaf
41 for i := 0; i < len(data); i++ {
42 proof, err := tree.GenerateProof(i)
43 if err != nil {
44 t.Fatalf("Failed to generate proof for leaf %d: %v", i, err)
45 }
46
47 // Verify proof
48 if !tree.VerifyProof(proof) {
49 t.Errorf("Proof verification failed for leaf %d", i)
50 }
51
52 t.Logf("Leaf %d proof size: %d bytes", i, proof.ProofSize())
53 }
54}
55
56func TestProofVerificationWithTampering(t *testing.T) {
57 data := [][]byte{
58 []byte("Original data 1"),
59 []byte("Original data 2"),
60 []byte("Original data 3"),
61 []byte("Original data 4"),
62 }
63
64 tree, _ := NewMerkleTree(data)
65 proof, _ := tree.GenerateProof(0)
66
67 // Valid proof should verify
68 if !tree.VerifyProof(proof) {
69 t.Error("Valid proof failed verification")
70 }
71
72 // Tamper with leaf data
73 tamperedProof := &Proof{
74 LeafIndex: proof.LeafIndex,
75 Leaf: []byte("Tampered data"),
76 Hashes: proof.Hashes,
77 Positions: proof.Positions,
78 }
79
80 // Tampered proof should fail
81 if tree.VerifyProof(tamperedProof) {
82 t.Error("Tampered proof should not verify")
83 }
84}
85
86func TestTreeUpdate(t *testing.T) {
87 data := [][]byte{
88 []byte("Data 1"),
89 []byte("Data 2"),
90 []byte("Data 3"),
91 []byte("Data 4"),
92 }
93
94 tree, _ := NewMerkleTree(data)
95 originalRoot := tree.RootHex()
96
97 // Update a leaf
98 err := tree.UpdateLeaf(1, []byte("Updated Data 2"))
99 if err != nil {
100 t.Fatalf("Failed to update leaf: %v", err)
101 }
102
103 newRoot := tree.RootHex()
104
105 // Root should change
106 if originalRoot == newRoot {
107 t.Error("Root should change after update")
108 }
109
110 // Verify proof for updated leaf
111 proof, _ := tree.GenerateProof(1)
112 if !tree.VerifyProof(proof) {
113 t.Error("Proof verification failed after update")
114 }
115}
116
117func TestNonPowerOfTwoLeaves(t *testing.T) {
118 // Test with 5 leaves
119 data := [][]byte{
120 []byte("Leaf 1"),
121 []byte("Leaf 2"),
122 []byte("Leaf 3"),
123 []byte("Leaf 4"),
124 []byte("Leaf 5"),
125 }
126
127 tree, err := NewMerkleTree(data)
128 if err != nil {
129 t.Fatalf("Failed to create tree: %v", err)
130 }
131
132 // All leaves should have valid proofs
133 for i := 0; i < len(data); i++ {
134 proof, _ := tree.GenerateProof(i)
135 if !tree.VerifyProof(proof) {
136 t.Errorf("Proof verification failed for leaf %d", i)
137 }
138 }
139}
140
141func TestLargeTree(t *testing.T) {
142 numLeaves := 1000
143
144 data := make([][]byte, numLeaves)
145 for i := 0; i < numLeaves; i++ {
146 data[i] = []byte(fmt.Sprintf("Transaction %d", i))
147 }
148
149 tree, err := NewMerkleTree(data)
150 if err != nil {
151 t.Fatalf("Failed to create large tree: %v", err)
152 }
153
154 // Verify random proofs
155 testIndices := []int{0, 100, 500, 999}
156 for _, idx := range testIndices {
157 proof, _ := tree.GenerateProof(idx)
158 if !tree.VerifyProof(proof) {
159 t.Errorf("Proof verification failed for leaf %d", idx)
160 }
161
162 // Check proof size is logarithmic
163 expectedSize := int(math.Log2(float64(numLeaves))) + 1
164 actualSize := len(proof.Hashes)
165
166 if actualSize > expectedSize+2 { // Allow small tolerance
167 t.Errorf("Proof size too large: expected ~%d, got %d",
168 expectedSize, actualSize)
169 }
170 }
171
172 t.Logf("Tree with %d leaves - proof size: %d hashes",
173 numLeaves, len(testProof.Hashes))
174}
175
176func TestSparseMerkleTree(t *testing.T) {
177 smt := NewSparseMerkleTree(256)
178
179 // Set key-value pairs
180 smt.Set([]byte("key1"), []byte("value1"))
181 smt.Set([]byte("key2"), []byte("value2"))
182
183 // Retrieve values
184 val, exists := smt.Get([]byte("key1"))
185 if !exists || string(val) != "value1" {
186 t.Error("Failed to retrieve value from sparse Merkle tree")
187 }
188
189 // Non-existent key
190 _, exists = smt.Get([]byte("key3"))
191 if exists {
192 t.Error("Should not find non-existent key")
193 }
194}
195
196func BenchmarkTreeConstruction(b *testing.B) {
197 data := make([][]byte, 1000)
198 for i := 0; i < 1000; i++ {
199 data[i] = []byte(fmt.Sprintf("Transaction %d", i))
200 }
201
202 b.ResetTimer()
203 for i := 0; i < b.N; i++ {
204 NewMerkleTree(data)
205 }
206}
207
208func BenchmarkProofGeneration(b *testing.B) {
209 data := make([][]byte, 1000)
210 for i := 0; i < 1000; i++ {
211 data[i] = []byte(fmt.Sprintf("Transaction %d", i))
212 }
213
214 tree, _ := NewMerkleTree(data)
215
216 b.ResetTimer()
217 for i := 0; i < b.N; i++ {
218 tree.GenerateProof(i % 1000)
219 }
220}
221
222func BenchmarkProofVerification(b *testing.B) {
223 data := make([][]byte, 1000)
224 for i := 0; i < 1000; i++ {
225 data[i] = []byte(fmt.Sprintf("Transaction %d", i))
226 }
227
228 tree, _ := NewMerkleTree(data)
229 proof, _ := tree.GenerateProof(500)
230
231 b.ResetTimer()
232 for i := 0; i < b.N; i++ {
233 tree.VerifyProof(proof)
234 }
235}
Usage Example
1package main
2
3import (
4 "fmt"
5 "merkletree"
6)
7
8func main() {
9 // Example 1: Blockchain transactions
10 fmt.Println("=== Blockchain Transaction Verification ===")
11
12 transactions := [][]byte{
13 []byte("Alice pays Bob 10 BTC"),
14 []byte("Charlie pays Dave 5 BTC"),
15 []byte("Eve pays Frank 3 BTC"),
16 []byte("Grace pays Henry 7 BTC"),
17 }
18
19 tree, _ := merkletree.NewMerkleTree(transactions)
20 fmt.Printf("Merkle Root: %s\n", tree.RootHex())
21
22 // Generate proof for transaction 0
23 proof, _ := tree.GenerateProof(0)
24 fmt.Printf("\nProof for transaction 0:\n")
25 fmt.Printf("- Leaf: %s\n", proof.Leaf)
26 fmt.Printf("- Proof size: %d hashes\n",
27 len(proof.Hashes), proof.ProofSize())
28
29 // Verify proof
30 if tree.VerifyProof(proof) {
31 fmt.Println("✓ Transaction verified!")
32 }
33
34 // Example 2: File integrity checking
35 fmt.Println("\n=== File Integrity Verification ===")
36
37 fileBlocks := [][]byte{
38 []byte("Block 1 of file.txt"),
39 []byte("Block 2 of file.txt"),
40 []byte("Block 3 of file.txt"),
41 []byte("Block 4 of file.txt"),
42 }
43
44 fileTree, _ := merkletree.NewMerkleTree(fileBlocks)
45 trustedRoot := fileTree.Root()
46
47 fmt.Printf("File root hash: %x\n", trustedRoot)
48
49 // Simulate receiving block 2 from untrusted source
50 receivedBlock := []byte("Block 2 of file.txt")
51 blockProof, _ := fileTree.GenerateProof(1)
52
53 // Verify without downloading entire file
54 if merkletree.VerifyProofWithRoot(blockProof, trustedRoot) {
55 fmt.Printf("✓ Block 2 integrity verified\n")
56 }
57
58 // Detect tampering
59 tamperedBlock := []byte("TAMPERED Block 2")
60 tamperedProof := &merkletree.Proof{
61 LeafIndex: blockProof.LeafIndex,
62 Leaf: tamperedBlock,
63 Hashes: blockProof.Hashes,
64 Positions: blockProof.Positions,
65 }
66
67 if !merkletree.VerifyProofWithRoot(tamperedProof, trustedRoot) {
68 fmt.Println("✗ Tampering detected!")
69 }
70
71 // Example 3: Git-like version control
72 fmt.Println("\n=== Version Control System ===")
73
74 files := [][]byte{
75 []byte("main.go: package main..."),
76 []byte("util.go: package util..."),
77 []byte("README.md: # My Project..."),
78 }
79
80 commitTree, _ := merkletree.NewMerkleTree(files)
81 commitHash := commitTree.RootHex()
82
83 fmt.Printf("Commit hash: %s\n", commitHash[:16]+"...")
84
85 // Update a file
86 commitTree.UpdateLeaf(0, []byte("main.go: UPDATED package main..."))
87 newCommitHash := commitTree.RootHex()
88
89 fmt.Printf("New commit hash: %s\n", newCommitHash[:16]+"...")
90
91 if commitHash != newCommitHash {
92 fmt.Println("✓ Change detected - new commit required")
93 }
94
95 // Example 4: Efficient sync detection
96 fmt.Println("\n=== Distributed System Sync ===")
97
98 // Node A has 1000 data blocks
99 dataA := make([][]byte, 1000)
100 for i := 0; i < 1000; i++ {
101 dataA[i] = []byte(fmt.Sprintf("Block %d", i))
102 }
103
104 treeA, _ := merkletree.NewMerkleTree(dataA)
105 rootA := treeA.RootHex()
106
107 // Node B has same 1000 blocks
108 treeB, _ := merkletree.NewMerkleTree(dataA)
109 rootB := treeB.RootHex()
110
111 if rootA == rootB {
112 fmt.Println("✓ Nodes A and B are in sync")
113 }
114
115 // Node A updates block 500
116 treeA.UpdateLeaf(500, []byte("Updated Block 500"))
117 newRootA := treeA.RootHex()
118
119 if newRootA != rootB {
120 fmt.Println("✗ Nodes are out of sync - need to reconcile")
121 fmt.Println(" ")
122 }
123}
Testing Your Solution
1. Basic Construction
- Create tree with 4, 8, 16 leaves
- Create tree with 5, 7, 10 leaves
- Verify root hash is deterministic
2. Proof Generation
- Generate proof for each leaf
- Verify all proofs are valid
- Check proof size is O(log n)
3. Verification
- Valid proofs should verify against root
- Tampered leaf should fail verification
- Tampered proof hashes should fail
- Wrong root should fail
4. Updates
- Update leaf and verify root changes
- Verify proofs still work after update
- Batch update multiple leaves
5. Performance
- Benchmark tree construction with 1M leaves
- Benchmark proof generation
- Verify proof size scales logarithmically
Edge Cases
- Single leaf tree
- Empty data blocks
- Identical leaves
- Maximum tree size
- Concurrent access
Bonus Challenges
-
Efficient Tree Diff: Compare two Merkle trees and identify which leaves differ
-
Incremental Updates: Support appending new leaves without rebuilding entire tree
-
Merkle Mountain Ranges: Implement MMR for append-only logs
-
Multi-Hash Support: Support different hash functions
-
Compact Proof Serialization: Binary encoding to minimize proof size
-
Tree Visualization: Generate visual representation of tree structure
-
IPFS Integration: Use Merkle DAG for content-addressable storage
Key Takeaways
- Merkle trees enable efficient verification of large datasets
- Proof size is O(log n), making verification scalable
- Cryptographic hashing ensures tamper-evidence
- Root hash represents entire dataset integrity
- Updates are efficient - only O(log n) hashes recomputed
- Blockchains use Merkle trees for transaction verification
- Distributed systems use them for efficient synchronization
- Git uses similar hash tree structure for version control
- Sparse Merkle trees extend concept to key-value stores
- Trade-offs between tree balance, proof size, and update efficiency
References
- Original Patent: "Method of Providing Digital Signatures" by Ralph Merkle
- Bitcoin Whitepaper: Satoshi Nakamoto's use of Merkle trees for SPV
- Certificate Transparency: RFC 6962 - Merkle trees for audit logs
- IPFS Documentation: Content addressing with Merkle DAGs
- Ethereum Yellow Paper: Merkle Patricia trees for state
- Git Internals: How Git uses hash trees
- "Mastering Bitcoin": Chapter on Merkle trees by Andreas Antonopoulos
- Cryptography Engineering: Ferguson, Schneier, Kohno - Hash trees chapter