Merkle Tree Cryptographic Verification

Exercise: Merkle Tree Cryptographic Verification

Difficulty - Advanced

Estimated Time - 3-4 hours

Learning Objectives

  • Understand cryptographic hash trees and their security properties
  • Implement Merkle trees for efficient data verification
  • Master proof generation and verification algorithms
  • Apply Merkle trees to blockchain and distributed systems
  • Design systems with tamper-evident data structures
  • Optimize tree construction and proof size

Problem Statement

How can you verify the integrity of large datasets efficiently? A naive approach would hash all the data, but what if you want to verify just one piece of data, or detect which specific data was tampered with? Merkle trees solve this elegantly by organizing cryptographic hashes in a binary tree structure.

A Merkle tree is a data structure where each leaf node contains a hash of data, and each non-leaf node contains a hash of its children. This creates a hierarchy where the root hash represents the entire dataset. To verify a single piece of data, you only need O(log n) hashes, not the entire dataset.

Real-World Scenario:

Consider a blockchain or distributed file system where nodes need to verify data without downloading everything:

 1// Without Merkle tree: Download everything to verify
 2type NaiveVerification struct {
 3    allData [][]byte  // Must download all blocks
 4}
 5
 6func Verify(data []byte, index int) bool {
 7    // Need to hash ALL data to verify one piece
 8    fullHash := hashAll(nv.allData)
 9    return fullHash == trustedRoot
10}
11
12// With Merkle tree: Only need log(n) hashes
13type MerkleVerification struct {
14    root []byte  // Just the root hash
15}
16
17func Verify(data []byte, proof [][]byte) bool {
18    // Verify using only O(log n) hashes
19    return mv.VerifyProof(data, proof, mv.root)
20}

For 1 million blocks, naive verification requires downloading ~1 GB. With a Merkle tree, verification needs only ~20 hashes!

Requirements

Functional Requirements

  1. Tree Construction

    • Build Merkle tree from list of data blocks
    • Support any number of leaves
    • Calculate all interior node hashes
    • Generate and store root hash
  2. Proof Generation

    • Generate authentication path for any leaf
    • Proof should contain minimal hashes needed to verify
    • Support multi-leaf proofs
    • Generate compact binary representation
  3. Proof Verification

    • Verify single leaf given proof and root hash
    • Verify multiple leaves with combined proof
    • Detect invalid or tampered proofs
    • Verify against trusted root hash
  4. Tree Updates

    • Update single leaf and recompute affected hashes
    • Efficiently update multiple leaves in batch
    • Generate delta proof showing what changed
    • Support append operations
  5. Advanced Features

    • Serialize and deserialize tree structure
    • Support different hash algorithms
    • Implement sparse Merkle trees for key-value stores
    • Generate inclusion/exclusion proofs

Non-Functional Requirements

  • Security: Collision-resistant hash function
  • Efficiency: O(log n) proof size and verification time
  • Memory: Efficient storage, don't duplicate data
  • Correctness: Cryptographically secure verification
  • Flexibility: Support trees with millions of leaves

Background and Theory

Merkle trees were invented by Ralph Merkle in 1979 and patented in 1982. They are fundamental to many modern technologies.

How Merkle Trees Work

Tree Structure:

                    Root Hash
                   /          \
            Hash(A,B)        Hash(C,D)
           /        \       /        \
       Hash(A)   Hash(B) Hash(C)  Hash(D)
          |         |       |        |
       Data A    Data B  Data C   Data D

Construction:

  1. Hash each data block to create leaf nodes
  2. Pair leaf hashes and hash them to create parent nodes
  3. Continue pairing and hashing up the tree
  4. The top hash is the Merkle root

Verification:
To verify Data A is in the tree:

  1. Get proof: [Hash(B), Hash(C,D)]
  2. Compute Hash(A) from Data A
  3. Compute Hash(A,B) from Hash(A) and Hash(B)
  4. Compute Root from Hash(A,B) and Hash(C,D)
  5. Compare computed root with trusted root

If they match, Data A is authentic!

Security Properties

Collision Resistance: Finding two different inputs that hash to same value is computationally infeasible. This ensures:

  • Cannot forge data that produces same root
  • Cannot substitute one data block for another
  • Any tampering changes the root hash

Efficient Verification:

  • Proof size: O(log n) hashes
  • Verification time: O(log n) hash operations
  • For n=1,000,000: only 20 hashes needed

Tamper Evidence:

  • Changing any leaf changes its parent
  • Change propagates to root
  • Different root = tampering detected

Use Cases

  1. Bitcoin/Blockchain: Verify transactions without full blockchain
  2. Git: Content-addressable storage with integrity
  3. IPFS: Distributed file system with verification
  4. Certificate Transparency: Audit log of SSL certificates
  5. Database Replication: Efficient synchronization detection
  6. File Synchronization: Detect changed files efficiently
  7. Distributed Systems: Byzantine fault tolerance with verified state

Implementation Challenges

Challenge 1: Handling Non-Power-of-2 Leaves

When the number of leaves is not a power of 2, the tree is unbalanced. How do you handle odd numbers at each level?

Solution Approach:

  • Duplicate last node: If odd number of nodes, duplicate the last one
  • Pass up single node: Don't pair if single node remains
  • Use placeholder: Add empty node with zero hash
    Each has trade-offs; duplication is most common in practice.

Challenge 2: Efficient Tree Storage

Storing the entire tree in memory can be wasteful, especially for large datasets.

Solution Approach:

  • Store only the leaves and compute interior nodes on-demand
  • Use implicit tree representation
  • For sparse trees, use a map to store only non-empty nodes
  • Implement lazy evaluation for unused branches

Challenge 3: Proof Compactness

Proofs should be minimal - include only necessary hashes.

Solution Approach:

  • For single leaf: O(log n) hashes from leaf to root
  • For adjacent leaves: Share common path nodes
  • Use bit flags to indicate left/right position
  • Compress proof representation

Challenge 4: Concurrent Tree Updates

Multiple updates might happen simultaneously in distributed systems.

Solution Approach:

  • Implement copy-on-write for immutable trees
  • Use version numbers for each root
  • Lock-free updates for independent subtrees
  • Batch updates to minimize recomputation

Hints

Hint 1: Tree Representation

Use a slice-based binary tree representation for efficiency:

 1type MerkleTree struct {
 2    leaves  [][]byte  // Original data
 3    nodes   [][]byte  // All hashes
 4    hashFn  hash.Hash // Hash function
 5}
 6
 7// For a complete binary tree with n leaves:
 8// - Total nodes = 2*n - 1
 9// - nodes[0] is root
10// - nodes[2*i+1] is left child of nodes[i]
11// - nodes[2*i+2] is right child of nodes[i]
12// - Parent of nodes[i] is nodes[(i-1)/2]

This representation provides O(1) navigation and is cache-friendly.

Hint 2: Building the Tree

Build from bottom up:

 1func Build() {
 2    // Start with leaf hashes
 3    currentLevel := mt.hashLeaves()
 4
 5    // Build tree level by level
 6    for len(currentLevel) > 1 {
 7        nextLevel := [][]byte{}
 8
 9        for i := 0; i < len(currentLevel); i += 2 {
10            if i+1 < len(currentLevel) {
11                // Pair exists
12                parent := mt.hashPair(currentLevel[i], currentLevel[i+1])
13                nextLevel = append(nextLevel, parent)
14            } else {
15                // Odd node: duplicate it
16                parent := mt.hashPair(currentLevel[i], currentLevel[i])
17                nextLevel = append(nextLevel, parent)
18            }
19        }
20
21        currentLevel = nextLevel
22    }
23
24    mt.root = currentLevel[0]
25}
Hint 3: Generating Proofs

A proof is the sibling path from leaf to root:

 1func GenerateProof(leafIndex int) [][]byte {
 2    proof := [][]byte{}
 3    index := mt.leafToNodeIndex(leafIndex)
 4
 5    // Walk up the tree
 6    for index > 0 {
 7        // Get sibling hash
 8        siblingIndex := mt.getSiblingIndex(index)
 9        proof = append(proof, mt.nodes[siblingIndex])
10
11        // Move to parent
12        index = mt.getParentIndex(index)
13    }
14
15    return proof
16}
17
18func getSiblingIndex(index int) int {
19    if index%2 == 1 {
20        return index + 1  // Right sibling
21    }
22    return index - 1  // Left sibling
23}
Hint 4: Verifying Proofs

Recompute root from leaf using proof hashes:

 1func VerifyProof(leaf []byte, proof [][]byte, root []byte) bool {
 2    currentHash := hashData(leaf)
 3
 4    for _, siblingHash := range proof {
 5        // Determine if current is left or right child
 6        //
 7        if isLeftChild {
 8            currentHash = hashPair(currentHash, siblingHash)
 9        } else {
10            currentHash = hashPair(siblingHash, currentHash)
11        }
12    }
13
14    return bytes.Equal(currentHash, root)
15}

Note: Need to include position information in proof.

Hint 5: Efficient Updates

Only recompute affected path:

 1func UpdateLeaf(leafIndex int, newData []byte) {
 2    // Update leaf
 3    mt.leaves[leafIndex] = newData
 4
 5    // Recompute only the path to root
 6    index := mt.leafToNodeIndex(leafIndex)
 7    mt.nodes[index] = mt.hashData(newData)
 8
 9    // Walk up and recompute parents
10    for index > 0 {
11        parentIndex := mt.getParentIndex(index)
12        siblingIndex := mt.getSiblingIndex(index)
13
14        left := mt.nodes[min(index, siblingIndex)]
15        right := mt.nodes[max(index, siblingIndex)]
16
17        mt.nodes[parentIndex] = mt.hashPair(left, right)
18        index = parentIndex
19    }
20}

This is O(log n) instead of rebuilding entire tree O(n).

Solution

Show Complete Solution

Approach

This implementation provides:

  1. Standard Merkle Tree: Binary tree with SHA-256 hashing
  2. Proof System: Generate and verify proofs with position tracking
  3. Multi-Proof: Efficiently prove multiple leaves simultaneously
  4. Tree Updates: Efficient single-leaf and batch updates
  5. Sparse Merkle Tree: For key-value stores with inclusion/exclusion proofs
  6. Serialization: Save and load tree state

Implementation

  1package merkletree
  2
  3import (
  4	"bytes"
  5	"crypto/sha256"
  6	"encoding/hex"
  7	"errors"
  8	"fmt"
  9	"math"
 10)
 11
 12// MerkleTree represents a binary Merkle tree
 13type MerkleTree struct {
 14	leaves [][]byte
 15	nodes  [][]byte
 16	root   []byte
 17}
 18
 19// NewMerkleTree creates a new Merkle tree from data blocks
 20func NewMerkleTree(data [][]byte) {
 21	if len(data) == 0 {
 22		return nil, errors.New("cannot create tree with zero leaves")
 23	}
 24
 25	mt := &MerkleTree{
 26		leaves: make([][]byte, len(data)),
 27	}
 28
 29	// Copy leaves
 30	for i, d := range data {
 31		mt.leaves[i] = make([]byte, len(d))
 32		copy(mt.leaves[i], d)
 33	}
 34
 35	// Build tree
 36	mt.build()
 37
 38	return mt, nil
 39}
 40
 41// build constructs the Merkle tree from leaves
 42func build() {
 43	numLeaves := len(mt.leaves)
 44
 45	// Calculate total nodes in tree
 46	// For n leaves, tree has 2*n-1 nodes
 47	treeSize := 2*nextPowerOf2(numLeaves) - 1
 48	mt.nodes = make([][]byte, treeSize)
 49
 50	// Hash leaves and place in tree
 51	leafOffset := nextPowerOf2(numLeaves) - 1
 52	for i, leaf := range mt.leaves {
 53		mt.nodes[leafOffset+i] = hashData(leaf)
 54	}
 55
 56	// Handle odd number of leaves
 57	if numLeaves != nextPowerOf2(numLeaves) {
 58		for i := numLeaves; i < nextPowerOf2(numLeaves); i++ {
 59			mt.nodes[leafOffset+i] = mt.nodes[leafOffset+numLeaves-1]
 60		}
 61	}
 62
 63	// Build tree bottom-up
 64	for i := leafOffset - 1; i >= 0; i-- {
 65		left := mt.nodes[2*i+1]
 66		right := mt.nodes[2*i+2]
 67		mt.nodes[i] = hashPair(left, right)
 68	}
 69
 70	mt.root = mt.nodes[0]
 71}
 72
 73// Root returns the Merkle root hash
 74func Root() []byte {
 75	result := make([]byte, len(mt.root))
 76	copy(result, mt.root)
 77	return result
 78}
 79
 80// RootHex returns the root hash as hex string
 81func RootHex() string {
 82	return hex.EncodeToString(mt.root)
 83}
 84
 85// Proof represents a Merkle proof
 86type Proof struct {
 87	LeafIndex int
 88	Leaf      []byte
 89	Hashes    [][]byte
 90	Positions []bool // true = right, false = left
 91}
 92
 93// GenerateProof creates a proof for a specific leaf
 94func GenerateProof(leafIndex int) {
 95	if leafIndex < 0 || leafIndex >= len(mt.leaves) {
 96		return nil, errors.New("leaf index out of bounds")
 97	}
 98
 99	proof := &Proof{
100		LeafIndex: leafIndex,
101		Leaf:      mt.leaves[leafIndex],
102		Hashes:    [][]byte{},
103		Positions: []bool{},
104	}
105
106	// Start at leaf position in tree
107	leafOffset := nextPowerOf2(len(mt.leaves)) - 1
108	index := leafOffset + leafIndex
109
110	// Walk up to root
111	for index > 0 {
112		isRightChild := index%2 == 0
113		siblingIndex := index
114		if isRightChild {
115			siblingIndex = index - 1
116		} else {
117			siblingIndex = index + 1
118		}
119
120		proof.Hashes = append(proof.Hashes, mt.nodes[siblingIndex])
121		proof.Positions = append(proof.Positions, !isRightChild) // Store sibling position
122
123		// Move to parent
124		index = / 2
125	}
126
127	return proof, nil
128}
129
130// VerifyProof verifies a Merkle proof against the tree root
131func VerifyProof(proof *Proof) bool {
132	return VerifyProofWithRoot(proof, mt.root)
133}
134
135// VerifyProofWithRoot verifies a proof against a given root
136func VerifyProofWithRoot(proof *Proof, root []byte) bool {
137	currentHash := hashData(proof.Leaf)
138
139	for i, siblingHash := range proof.Hashes {
140		if proof.Positions[i] { // Sibling is on right
141			currentHash = hashPair(currentHash, siblingHash)
142		} else { // Sibling is on left
143			currentHash = hashPair(siblingHash, currentHash)
144		}
145	}
146
147	return bytes.Equal(currentHash, root)
148}
149
150// UpdateLeaf updates a single leaf and recomputes affected nodes
151func UpdateLeaf(leafIndex int, newData []byte) error {
152	if leafIndex < 0 || leafIndex >= len(mt.leaves) {
153		return errors.New("leaf index out of bounds")
154	}
155
156	// Update leaf
157	mt.leaves[leafIndex] = make([]byte, len(newData))
158	copy(mt.leaves[leafIndex], newData)
159
160	// Recompute path to root
161	leafOffset := nextPowerOf2(len(mt.leaves)) - 1
162	index := leafOffset + leafIndex
163	mt.nodes[index] = hashData(newData)
164
165	// Walk up and recompute parents
166	for index > 0 {
167		parentIndex := / 2
168		leftChild := 2*parentIndex + 1
169		rightChild := 2*parentIndex + 2
170
171		mt.nodes[parentIndex] = hashPair(mt.nodes[leftChild], mt.nodes[rightChild])
172		index = parentIndex
173	}
174
175	mt.root = mt.nodes[0]
176	return nil
177}
178
179// GetLeaf returns a copy of the leaf at the given index
180func GetLeaf(leafIndex int) {
181	if leafIndex < 0 || leafIndex >= len(mt.leaves) {
182		return nil, errors.New("leaf index out of bounds")
183	}
184
185	result := make([]byte, len(mt.leaves[leafIndex]))
186	copy(result, mt.leaves[leafIndex])
187	return result, nil
188}
189
190// NumLeaves returns the number of leaves in the tree
191func NumLeaves() int {
192	return len(mt.leaves)
193}
194
195// Helper functions
196
197func hashData(data []byte) []byte {
198	hash := sha256.Sum256(data)
199	return hash[:]
200}
201
202func hashPair(left, right []byte) []byte {
203	combined := append(left, right...)
204	hash := sha256.Sum256(combined)
205	return hash[:]
206}
207
208func nextPowerOf2(n int) int {
209	if n <= 0 {
210		return 1
211	}
212	return int(math.Pow(2, math.Ceil(math.Log2(float64(n)))))
213}
214
215// MultiProof represents proofs for multiple leaves
216type MultiProof struct {
217	LeafIndices []int
218	Leaves      [][]byte
219	Hashes      [][]byte
220	Flags       []bool // Reconstruction flags
221}
222
223// GenerateMultiProof creates an efficient proof for multiple leaves
224func GenerateMultiProof(leafIndices []int) {
225	// Validate indices
226	for _, idx := range leafIndices {
227		if idx < 0 || idx >= len(mt.leaves) {
228			return nil, errors.New("leaf index out of bounds")
229		}
230	}
231
232	// For simplicity, generate individual proofs
233	// A more efficient implementation would share common path nodes
234	mp := &MultiProof{
235		LeafIndices: leafIndices,
236		Leaves:      make([][]byte, len(leafIndices)),
237		Hashes:      [][]byte{},
238	}
239
240	for i, idx := range leafIndices {
241		mp.Leaves[i] = mt.leaves[idx]
242	}
243
244	// Collect all required hashes
245	seen := make(map[string]bool)
246	for _, idx := range leafIndices {
247		proof, _ := mt.GenerateProof(idx)
248		for _, hash := range proof.Hashes {
249			hashStr := hex.EncodeToString(hash)
250			if !seen[hashStr] {
251				seen[hashStr] = true
252				mp.Hashes = append(mp.Hashes, hash)
253			}
254		}
255	}
256
257	return mp, nil
258}
259
260// SparseMerkleTree represents a sparse Merkle tree for key-value stores
261type SparseMerkleTree struct {
262	root       []byte
263	nodes      map[string][]byte // hash -> value
264	leaves     map[string][]byte // key -> value
265	height     int
266	defaultHash []byte
267}
268
269// NewSparseMerkleTree creates a sparse Merkle tree of given height
270func NewSparseMerkleTree(height int) *SparseMerkleTree {
271	smt := &SparseMerkleTree{
272		nodes:      make(map[string][]byte),
273		leaves:     make(map[string][]byte),
274		height:     height,
275		defaultHash: hashData([]byte{}),
276	}
277
278	// Compute default hashes for empty tree
279	smt.root = smt.computeDefaultRoot()
280
281	return smt
282}
283
284func computeDefaultRoot() []byte {
285	current := smt.defaultHash
286	for i := 0; i < smt.height; i++ {
287		current = hashPair(current, current)
288	}
289	return current
290}
291
292// Set inserts or updates a key-value pair
293func Set(key, value []byte) {
294	keyHash := hashData(key)
295	smt.leaves[string(keyHash)] = value
296	smt.recomputeRoot()
297}
298
299// Get retrieves a value by key
300func Get(key []byte) {
301	keyHash := hashData(key)
302	val, exists := smt.leaves[string(keyHash)]
303	return val, exists
304}
305
306func recomputeRoot() {
307	// Simplified recomputation
308	smt.nodes = make(map[string][]byte)
309
310	// Build tree from leaves
311	//
312	leaves := make([][]byte, 0, len(smt.leaves))
313	for _, v := range smt.leaves {
314		leaves = append(leaves, v)
315	}
316
317	if len(leaves) == 0 {
318		smt.root = smt.computeDefaultRoot()
319		return
320	}
321
322	// Build standard Merkle tree
323	tree, _ := NewMerkleTree(leaves)
324	smt.root = tree.Root()
325}
326
327// Root returns the root hash
328func Root() []byte {
329	result := make([]byte, len(smt.root))
330	copy(result, smt.root)
331	return result
332}
333
334// MerkleTreeFromHashes creates a tree from pre-computed leaf hashes
335func MerkleTreeFromHashes(leafHashes [][]byte) *MerkleTree {
336	mt := &MerkleTree{
337		leaves: make([][]byte, len(leafHashes)),
338	}
339
340	// Store hashes as leaves
341	for i, hash := range leafHashes {
342		mt.leaves[i] = make([]byte, len(hash))
343		copy(mt.leaves[i], hash)
344	}
345
346	// Build tree from hashes
347	numLeaves := len(leafHashes)
348	treeSize := 2*nextPowerOf2(numLeaves) - 1
349	mt.nodes = make([][]byte, treeSize)
350
351	leafOffset := nextPowerOf2(numLeaves) - 1
352	for i, hash := range leafHashes {
353		mt.nodes[leafOffset+i] = hash
354	}
355
356	// Handle odd leaves
357	if numLeaves != nextPowerOf2(numLeaves) {
358		for i := numLeaves; i < nextPowerOf2(numLeaves); i++ {
359			mt.nodes[leafOffset+i] = mt.nodes[leafOffset+numLeaves-1]
360		}
361	}
362
363	// Build tree
364	for i := leafOffset - 1; i >= 0; i-- {
365		left := mt.nodes[2*i+1]
366		right := mt.nodes[2*i+2]
367		mt.nodes[i] = hashPair(left, right)
368	}
369
370	mt.root = mt.nodes[0]
371	return mt
372}
373
374// String returns a string representation of the tree
375func String() string {
376	return fmt.Sprintf("MerkleTree{leaves: %d, root: %s}",
377		len(mt.leaves), mt.RootHex()[:16]+"...")
378}
379
380// ProofSize returns the size of a proof in bytes
381func ProofSize() int {
382	size := 0
383	for _, hash := range p.Hashes {
384		size += len(hash)
385	}
386	return size
387}

Testing

  1package merkletree
  2
  3import (
  4	"encoding/hex"
  5	"fmt"
  6	"testing"
  7)
  8
  9func TestBasicTree(t *testing.T) {
 10	data := [][]byte{
 11		[]byte("Transaction 1"),
 12		[]byte("Transaction 2"),
 13		[]byte("Transaction 3"),
 14		[]byte("Transaction 4"),
 15	}
 16
 17	tree, err := NewMerkleTree(data)
 18	if err != nil {
 19		t.Fatalf("Failed to create tree: %v", err)
 20	}
 21
 22	root := tree.RootHex()
 23	t.Logf("Root hash: %s", root)
 24
 25	if len(root) != 64 { // SHA-256 produces 64 hex chars
 26		t.Error("Root hash has incorrect length")
 27	}
 28}
 29
 30func TestProofGeneration(t *testing.T) {
 31	data := [][]byte{
 32		[]byte("Block 0"),
 33		[]byte("Block 1"),
 34		[]byte("Block 2"),
 35		[]byte("Block 3"),
 36	}
 37
 38	tree, _ := NewMerkleTree(data)
 39
 40	// Generate proof for each leaf
 41	for i := 0; i < len(data); i++ {
 42		proof, err := tree.GenerateProof(i)
 43		if err != nil {
 44			t.Fatalf("Failed to generate proof for leaf %d: %v", i, err)
 45		}
 46
 47		// Verify proof
 48		if !tree.VerifyProof(proof) {
 49			t.Errorf("Proof verification failed for leaf %d", i)
 50		}
 51
 52		t.Logf("Leaf %d proof size: %d bytes", i, proof.ProofSize())
 53	}
 54}
 55
 56func TestProofVerificationWithTampering(t *testing.T) {
 57	data := [][]byte{
 58		[]byte("Original data 1"),
 59		[]byte("Original data 2"),
 60		[]byte("Original data 3"),
 61		[]byte("Original data 4"),
 62	}
 63
 64	tree, _ := NewMerkleTree(data)
 65	proof, _ := tree.GenerateProof(0)
 66
 67	// Valid proof should verify
 68	if !tree.VerifyProof(proof) {
 69		t.Error("Valid proof failed verification")
 70	}
 71
 72	// Tamper with leaf data
 73	tamperedProof := &Proof{
 74		LeafIndex: proof.LeafIndex,
 75		Leaf:      []byte("Tampered data"),
 76		Hashes:    proof.Hashes,
 77		Positions: proof.Positions,
 78	}
 79
 80	// Tampered proof should fail
 81	if tree.VerifyProof(tamperedProof) {
 82		t.Error("Tampered proof should not verify")
 83	}
 84}
 85
 86func TestTreeUpdate(t *testing.T) {
 87	data := [][]byte{
 88		[]byte("Data 1"),
 89		[]byte("Data 2"),
 90		[]byte("Data 3"),
 91		[]byte("Data 4"),
 92	}
 93
 94	tree, _ := NewMerkleTree(data)
 95	originalRoot := tree.RootHex()
 96
 97	// Update a leaf
 98	err := tree.UpdateLeaf(1, []byte("Updated Data 2"))
 99	if err != nil {
100		t.Fatalf("Failed to update leaf: %v", err)
101	}
102
103	newRoot := tree.RootHex()
104
105	// Root should change
106	if originalRoot == newRoot {
107		t.Error("Root should change after update")
108	}
109
110	// Verify proof for updated leaf
111	proof, _ := tree.GenerateProof(1)
112	if !tree.VerifyProof(proof) {
113		t.Error("Proof verification failed after update")
114	}
115}
116
117func TestNonPowerOfTwoLeaves(t *testing.T) {
118	// Test with 5 leaves
119	data := [][]byte{
120		[]byte("Leaf 1"),
121		[]byte("Leaf 2"),
122		[]byte("Leaf 3"),
123		[]byte("Leaf 4"),
124		[]byte("Leaf 5"),
125	}
126
127	tree, err := NewMerkleTree(data)
128	if err != nil {
129		t.Fatalf("Failed to create tree: %v", err)
130	}
131
132	// All leaves should have valid proofs
133	for i := 0; i < len(data); i++ {
134		proof, _ := tree.GenerateProof(i)
135		if !tree.VerifyProof(proof) {
136			t.Errorf("Proof verification failed for leaf %d", i)
137		}
138	}
139}
140
141func TestLargeTree(t *testing.T) {
142	numLeaves := 1000
143
144	data := make([][]byte, numLeaves)
145	for i := 0; i < numLeaves; i++ {
146		data[i] = []byte(fmt.Sprintf("Transaction %d", i))
147	}
148
149	tree, err := NewMerkleTree(data)
150	if err != nil {
151		t.Fatalf("Failed to create large tree: %v", err)
152	}
153
154	// Verify random proofs
155	testIndices := []int{0, 100, 500, 999}
156	for _, idx := range testIndices {
157		proof, _ := tree.GenerateProof(idx)
158		if !tree.VerifyProof(proof) {
159			t.Errorf("Proof verification failed for leaf %d", idx)
160		}
161
162		// Check proof size is logarithmic
163		expectedSize := int(math.Log2(float64(numLeaves))) + 1
164		actualSize := len(proof.Hashes)
165
166		if actualSize > expectedSize+2 { // Allow small tolerance
167			t.Errorf("Proof size too large: expected ~%d, got %d",
168				expectedSize, actualSize)
169		}
170	}
171
172	t.Logf("Tree with %d leaves - proof size: %d hashes",
173		numLeaves, len(testProof.Hashes))
174}
175
176func TestSparseMerkleTree(t *testing.T) {
177	smt := NewSparseMerkleTree(256)
178
179	// Set key-value pairs
180	smt.Set([]byte("key1"), []byte("value1"))
181	smt.Set([]byte("key2"), []byte("value2"))
182
183	// Retrieve values
184	val, exists := smt.Get([]byte("key1"))
185	if !exists || string(val) != "value1" {
186		t.Error("Failed to retrieve value from sparse Merkle tree")
187	}
188
189	// Non-existent key
190	_, exists = smt.Get([]byte("key3"))
191	if exists {
192		t.Error("Should not find non-existent key")
193	}
194}
195
196func BenchmarkTreeConstruction(b *testing.B) {
197	data := make([][]byte, 1000)
198	for i := 0; i < 1000; i++ {
199		data[i] = []byte(fmt.Sprintf("Transaction %d", i))
200	}
201
202	b.ResetTimer()
203	for i := 0; i < b.N; i++ {
204		NewMerkleTree(data)
205	}
206}
207
208func BenchmarkProofGeneration(b *testing.B) {
209	data := make([][]byte, 1000)
210	for i := 0; i < 1000; i++ {
211		data[i] = []byte(fmt.Sprintf("Transaction %d", i))
212	}
213
214	tree, _ := NewMerkleTree(data)
215
216	b.ResetTimer()
217	for i := 0; i < b.N; i++ {
218		tree.GenerateProof(i % 1000)
219	}
220}
221
222func BenchmarkProofVerification(b *testing.B) {
223	data := make([][]byte, 1000)
224	for i := 0; i < 1000; i++ {
225		data[i] = []byte(fmt.Sprintf("Transaction %d", i))
226	}
227
228	tree, _ := NewMerkleTree(data)
229	proof, _ := tree.GenerateProof(500)
230
231	b.ResetTimer()
232	for i := 0; i < b.N; i++ {
233		tree.VerifyProof(proof)
234	}
235}

Usage Example

  1package main
  2
  3import (
  4	"fmt"
  5	"merkletree"
  6)
  7
  8func main() {
  9	// Example 1: Blockchain transactions
 10	fmt.Println("=== Blockchain Transaction Verification ===")
 11
 12	transactions := [][]byte{
 13		[]byte("Alice pays Bob 10 BTC"),
 14		[]byte("Charlie pays Dave 5 BTC"),
 15		[]byte("Eve pays Frank 3 BTC"),
 16		[]byte("Grace pays Henry 7 BTC"),
 17	}
 18
 19	tree, _ := merkletree.NewMerkleTree(transactions)
 20	fmt.Printf("Merkle Root: %s\n", tree.RootHex())
 21
 22	// Generate proof for transaction 0
 23	proof, _ := tree.GenerateProof(0)
 24	fmt.Printf("\nProof for transaction 0:\n")
 25	fmt.Printf("- Leaf: %s\n", proof.Leaf)
 26	fmt.Printf("- Proof size: %d hashes\n",
 27		len(proof.Hashes), proof.ProofSize())
 28
 29	// Verify proof
 30	if tree.VerifyProof(proof) {
 31		fmt.Println("✓ Transaction verified!")
 32	}
 33
 34	// Example 2: File integrity checking
 35	fmt.Println("\n=== File Integrity Verification ===")
 36
 37	fileBlocks := [][]byte{
 38		[]byte("Block 1 of file.txt"),
 39		[]byte("Block 2 of file.txt"),
 40		[]byte("Block 3 of file.txt"),
 41		[]byte("Block 4 of file.txt"),
 42	}
 43
 44	fileTree, _ := merkletree.NewMerkleTree(fileBlocks)
 45	trustedRoot := fileTree.Root()
 46
 47	fmt.Printf("File root hash: %x\n", trustedRoot)
 48
 49	// Simulate receiving block 2 from untrusted source
 50	receivedBlock := []byte("Block 2 of file.txt")
 51	blockProof, _ := fileTree.GenerateProof(1)
 52
 53	// Verify without downloading entire file
 54	if merkletree.VerifyProofWithRoot(blockProof, trustedRoot) {
 55		fmt.Printf("✓ Block 2 integrity verified\n")
 56	}
 57
 58	// Detect tampering
 59	tamperedBlock := []byte("TAMPERED Block 2")
 60	tamperedProof := &merkletree.Proof{
 61		LeafIndex: blockProof.LeafIndex,
 62		Leaf:      tamperedBlock,
 63		Hashes:    blockProof.Hashes,
 64		Positions: blockProof.Positions,
 65	}
 66
 67	if !merkletree.VerifyProofWithRoot(tamperedProof, trustedRoot) {
 68		fmt.Println("✗ Tampering detected!")
 69	}
 70
 71	// Example 3: Git-like version control
 72	fmt.Println("\n=== Version Control System ===")
 73
 74	files := [][]byte{
 75		[]byte("main.go: package main..."),
 76		[]byte("util.go: package util..."),
 77		[]byte("README.md: # My Project..."),
 78	}
 79
 80	commitTree, _ := merkletree.NewMerkleTree(files)
 81	commitHash := commitTree.RootHex()
 82
 83	fmt.Printf("Commit hash: %s\n", commitHash[:16]+"...")
 84
 85	// Update a file
 86	commitTree.UpdateLeaf(0, []byte("main.go: UPDATED package main..."))
 87	newCommitHash := commitTree.RootHex()
 88
 89	fmt.Printf("New commit hash: %s\n", newCommitHash[:16]+"...")
 90
 91	if commitHash != newCommitHash {
 92		fmt.Println("✓ Change detected - new commit required")
 93	}
 94
 95	// Example 4: Efficient sync detection
 96	fmt.Println("\n=== Distributed System Sync ===")
 97
 98	// Node A has 1000 data blocks
 99	dataA := make([][]byte, 1000)
100	for i := 0; i < 1000; i++ {
101		dataA[i] = []byte(fmt.Sprintf("Block %d", i))
102	}
103
104	treeA, _ := merkletree.NewMerkleTree(dataA)
105	rootA := treeA.RootHex()
106
107	// Node B has same 1000 blocks
108	treeB, _ := merkletree.NewMerkleTree(dataA)
109	rootB := treeB.RootHex()
110
111	if rootA == rootB {
112		fmt.Println("✓ Nodes A and B are in sync")
113	}
114
115	// Node A updates block 500
116	treeA.UpdateLeaf(500, []byte("Updated Block 500"))
117	newRootA := treeA.RootHex()
118
119	if newRootA != rootB {
120		fmt.Println("✗ Nodes are out of sync - need to reconcile")
121		fmt.Println(" ")
122	}
123}

Testing Your Solution

1. Basic Construction

  • Create tree with 4, 8, 16 leaves
  • Create tree with 5, 7, 10 leaves
  • Verify root hash is deterministic

2. Proof Generation

  • Generate proof for each leaf
  • Verify all proofs are valid
  • Check proof size is O(log n)

3. Verification

  • Valid proofs should verify against root
  • Tampered leaf should fail verification
  • Tampered proof hashes should fail
  • Wrong root should fail

4. Updates

  • Update leaf and verify root changes
  • Verify proofs still work after update
  • Batch update multiple leaves

5. Performance

  • Benchmark tree construction with 1M leaves
  • Benchmark proof generation
  • Verify proof size scales logarithmically

Edge Cases

  • Single leaf tree
  • Empty data blocks
  • Identical leaves
  • Maximum tree size
  • Concurrent access

Bonus Challenges

  1. Efficient Tree Diff: Compare two Merkle trees and identify which leaves differ

  2. Incremental Updates: Support appending new leaves without rebuilding entire tree

  3. Merkle Mountain Ranges: Implement MMR for append-only logs

  4. Multi-Hash Support: Support different hash functions

  5. Compact Proof Serialization: Binary encoding to minimize proof size

  6. Tree Visualization: Generate visual representation of tree structure

  7. IPFS Integration: Use Merkle DAG for content-addressable storage

Key Takeaways

  • Merkle trees enable efficient verification of large datasets
  • Proof size is O(log n), making verification scalable
  • Cryptographic hashing ensures tamper-evidence
  • Root hash represents entire dataset integrity
  • Updates are efficient - only O(log n) hashes recomputed
  • Blockchains use Merkle trees for transaction verification
  • Distributed systems use them for efficient synchronization
  • Git uses similar hash tree structure for version control
  • Sparse Merkle trees extend concept to key-value stores
  • Trade-offs between tree balance, proof size, and update efficiency

References

  • Original Patent: "Method of Providing Digital Signatures" by Ralph Merkle
  • Bitcoin Whitepaper: Satoshi Nakamoto's use of Merkle trees for SPV
  • Certificate Transparency: RFC 6962 - Merkle trees for audit logs
  • IPFS Documentation: Content addressing with Merkle DAGs
  • Ethereum Yellow Paper: Merkle Patricia trees for state
  • Git Internals: How Git uses hash trees
  • "Mastering Bitcoin": Chapter on Merkle trees by Andreas Antonopoulos
  • Cryptography Engineering: Ferguson, Schneier, Kohno - Hash trees chapter