Patterns from Git

Git's data model is built on copy-on-write immutable objects and efficient diffing.

| Pattern | Where | What It Does |
| --- | --- | --- |
| Copy-on-Write | object-file.c | Content-addressed immutable objects; branches share data, and new objects are written only for content that changed |
| Diff / Patch | diff.c, xdiff/ | Myers' diff algorithm, which finds a minimal edit script between two file versions |
| Bitmask | read-cache-ll.h | CE_* cache entry flags: merge stage, assume-valid, intent-to-add |
| Bloom Filter | bloom.c | Changed-path Bloom filters for faster git log -- <path> |
| Trie | name-hash.c | Name hash table for fast path and directory-level lookup in the index |
| LRU Cache | packfile.c | Delta base cache that keeps recently unpacked delta bases when reading objects from packfiles |
| Merkle Tree | tree.c | Content-addressed Merkle DAG: every commit, tree, and blob is hashed; changing one byte changes all hashes up to the root |
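The Bitmask row is the easiest pattern to show in isolation. The sketch below packs several boolean properties of a cache entry into one integer. The flag names echo Git's CE_* naming, but the bit positions and helper functions are assumptions made for illustration, not values copied from Git's headers.

```python
# Hypothetical bit positions; Git's real CE_* values are defined in its headers.
CE_VALID = 1 << 0          # entry is assumed unchanged
CE_INTENT_TO_ADD = 1 << 1  # path was recorded with "intent to add"
CE_SKIP_WORKTREE = 1 << 2  # entry is excluded from the working tree

FLAG_NAMES = {
    CE_VALID: "valid",
    CE_INTENT_TO_ADD: "intent-to-add",
    CE_SKIP_WORKTREE: "skip-worktree",
}


def set_flag(flags, flag):
    """Return flags with the given bit turned on."""
    return flags | flag


def clear_flag(flags, flag):
    """Return flags with the given bit turned off."""
    return flags & ~flag


def has_flag(flags, flag):
    """Report whether the given bit is on."""
    return (flags & flag) != 0


def describe(flags):
    """List the names of all flags that are set, in bit order."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if has_flag(flags, bit)]
```

One integer per entry keeps the index compact, and testing a property is a single AND operation.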

How They Compose: git commit

When you run git commit, multiple patterns work together to create an immutable, verifiable snapshot:

git commit -m "fix bug"
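Step 1: Diff / Patch

Git compares the index (staging area) against HEAD to determine what the commit will change. The diff between the index and the working tree shows unstaged changes, which a plain git commit does not include.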
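At the line level, a diff is a minimal edit script between two versions of a file. The sketch below computes one with a longest-common-subsequence table. This is a simpler dynamic-programming technique than the Myers algorithm Git uses, so it is slower on large inputs, though it also yields a minimal script. The function name `edit_script` is made up for this example.

```python
def edit_script(old, new):
    """Return a minimal list of (op, line) pairs turning old into new.

    op is " " (keep), "-" (delete from old) or "+" (insert from new).
    """
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the top-left, preferring matches over edits.
    script = []
    i = j = 0
    while i < n and j < m:
        if old[i] == new[j]:
            script.append((" ", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            script.append(("-", old[i]))
            i += 1
        else:
            script.append(("+", new[j]))
            j += 1
    # Whatever remains on either side is a pure deletion or insertion.
    script.extend(("-", line) for line in old[i:])
    script.extend(("+", line) for line in new[j:])
    return script
```

Calling `edit_script(["a", "b", "c"], ["a", "x", "c"])` keeps "a" and "c", deletes "b", and inserts "x".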

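Step 2: Copy-on-Write

Changed files are stored as new blob objects, typically written when the file is staged with git add. Unchanged files are shared by reference: identical content yields the same object ID (SHA-1 by default), so it maps to the same object. No data is copied unless it actually changed.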
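Content addressing is what makes this sharing automatic. The sketch below hashes a blob the way Git does for SHA-1 repositories: a `blob <size>` header, a NUL byte, then the content. The in-memory `ObjectStore` class is a hypothetical stand-in for Git's object database, not its actual interface.

```python
import hashlib


def blob_id(content):
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical in-memory object store keyed by object ID."""

    def __init__(self):
        self.objects = {}

    def put(self, content):
        """Store content once; identical content reuses the existing object."""
        oid = blob_id(content)
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()
first = store.put(b"hello\n")
second = store.put(b"hello\n")   # unchanged file: same ID, nothing new stored
third = store.put(b"hello!\n")   # changed file: a new object is written
```

After these three calls the store holds two objects, because the first two writes resolve to the same ID.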

Step 3: Merkle Tree

Tree objects hash their children. A changed blob changes the hash of its parent tree and of every ancestor tree up to the root, which in turn changes the commit hash. Tampering with any object is therefore detectable from the commit hash alone.
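The propagation of hashes toward the root can be sketched with nested dictionaries standing in for directories. The object framing follows Git's SHA-1 format, but the sketch is simplified: it assumes every file has mode 100644 and sorts entries by plain name, which differs from Git's ordering in some edge cases. The `hash_tree` helper and the sample repositories are invented for this example.

```python
import hashlib


def hash_object(kind, body):
    """Hash an object as '<kind> <size>\\0<body>' and return the raw digest."""
    return hashlib.sha1(kind + b" %d\0" % len(body) + body).digest()


def hash_tree(node):
    """Hash a file (bytes) or a directory (dict of name -> node) recursively."""
    if isinstance(node, bytes):
        return hash_object(b"blob", node)
    body = b""
    for name in sorted(node):  # simplification of Git's entry ordering
        child = node[name]
        mode = b"40000" if isinstance(child, dict) else b"100644"
        # Each entry embeds the child's hash, so a child change alters the parent.
        body += mode + b" " + name.encode() + b"\0" + hash_tree(child)
    return hash_object(b"tree", body)


before = {
    "README": b"hi\n",
    "src": {"main.c": b"int main(void) { return 0; }\n"},
    "docs": {"guide.txt": b"read me\n"},
}
after = {
    "README": b"hi\n",
    "src": {"main.c": b"int main(void) { return 1; }\n"},
    "docs": {"guide.txt": b"read me\n"},
}
```

Here the root and `src` hashes differ between `before` and `after`, while the untouched `docs` subtree hashes identically and would be shared.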

Step 4: Bloom Filter

The commit-graph file can store changed-path Bloom filters (written with git commit-graph write --changed-paths). Future git log -- <path> queries can then skip commits that definitely did not touch the path, without diffing their trees.
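A Bloom filter answers "definitely not" or "maybe" for set membership, and that is enough to skip commits. The sketch below is a generic Bloom filter, not a port of bloom.c: it derives bit positions from SHA-256 with double hashing, and the sizes, class name, and `commits_to_inspect` helper are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter; parameters are illustrative, not Git's."""

    def __init__(self, num_bits=256, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Double hashing: derive num_hashes positions from two base hashes.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def commits_to_inspect(filters, path):
    """Return the commit IDs whose filter cannot rule out the path."""
    return [cid for cid, bloom in filters.items() if bloom.might_contain(path)]


touched = BloomFilter()
touched.add("src/main.c")
filters = {"c1": touched, "c2": BloomFilter()}  # c2 changed no paths
candidates = commits_to_inspect(filters, "src/main.c")
```

A "maybe" still requires a real tree comparison to confirm, so false positives cost time but never produce wrong results.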

The core insight is that copy-on-write + Merkle hashing gives Git both space efficiency (shared objects) and integrity verification (tamper-evident hashes) with no trade-off between the two.

Released under the MIT License.