Patterns from Git
Git's data model is built on copy-on-write immutable objects and efficient diffing.
| Pattern | Where | What It Does |
|---|---|---|
| Copy-on-Write | object-file.c | Content-addressed immutable objects; branches share data, copy only on change |
| Diff / Patch | diff.c, xdiff/ | Myers diff algorithm (the default) for computing a short edit script between file versions |
| Bitmask | read-cache-ll.h | CE_* cache-entry flags packed into one integer: merge stage, assume-valid, intent-to-add |
| Bloom Filter | bloom.c | Changed-path Bloom filters that speed up git log -- <path> |
| Hash Table | name-hash.c | Name hash table for fast path and directory-level lookup in the index |
| LRU Cache | packfile.c | Delta base cache that keeps recently unpacked base objects so delta chains are not re-resolved on every read |
| Merkle Tree | tree.c | Content-addressed Merkle DAG: every commit, tree, and blob is hashed, so changing one byte changes every hash up to the root |
How They Compose: git commit
When you run git commit, multiple patterns work together to create an immutable, verifiable snapshot:
git commit -m "fix bug"
The commit is built from the index (staging area), not from the working tree. Git compares the index against the tree of HEAD to determine what changed.
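Changed files become new blob objects (written when the file is staged with git add). Unchanged files are shared by reference: identical content hashes to the same object ID (SHA-1 by default), so the new commit points at the object that already exists. No data is copied unless it actually changed.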
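
The sharing described above falls out of how blob IDs are computed: Git hashes a short header (`blob <size>` followed by a NUL byte) plus the file content. The following is a minimal sketch of that idea. `blob_id` follows Git's blob hashing for SHA-1 repositories, while `ObjectStore` is a hypothetical in-memory stand-in for the object database (Git itself writes zlib-compressed files and packs).

```python
import hashlib


def blob_id(content: bytes) -> str:
    # SHA-1 over the header "blob <size>" + NUL byte, followed by the content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = blob_id(content)
        # Storing identical content again is a no-op: the key already exists.
        self.objects.setdefault(oid, content)
        return oid


store = ObjectStore()

# First snapshot: two files.
v1_readme = store.put(b"docs\n")
v1_main = store.put(b"print('bug')\n")

# Second snapshot: README unchanged, main.py edited.
v2_readme = store.put(b"docs\n")
v2_main = store.put(b"print('fixed')\n")

print(v1_readme == v2_readme)  # unchanged file maps to the same object
print(len(store.objects))      # number of distinct objects stored
```

Two snapshots of two files each end up as three stored objects rather than four, because the unchanged file is stored once and referenced twice.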
Tree objects hash their children. A changed blob changes the hash of its parent tree, and so on up to the root tree, which in turn changes the commit hash. Tampering with any object is therefore detectable from the commit hash alone.
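
The propagation of a change up the tree can be sketched in a few lines. This example is modeled on Git's tree object layout (mode, name, NUL byte, raw child hash) but is a simplification: the nested-dict input, the fixed file mode, and the plain name sort are assumptions made for illustration, and Git's real entry ordering and mode handling are more involved.

```python
import hashlib


def object_id(kind: str, body: bytes) -> bytes:
    # Same header scheme as Git objects: "<kind> <size>" + NUL, then the body.
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(tree: dict) -> bytes:
    """Hash a nested dict: bytes values are files, dict values are subdirectories."""
    body = b""
    for name in sorted(tree):  # simplified ordering, see lead-in
        child = tree[name]
        if isinstance(child, dict):
            mode, child_id = b"40000", tree_id(child)
        else:
            mode, child_id = b"100644", object_id("blob", child)
        # Each entry embeds the child's hash, so the parent hash depends on it.
        body += mode + b" " + name.encode() + b"\0" + child_id
    return object_id("tree", body)


before = {
    "README": b"docs\n",
    "src": {"main.py": b"print('bug')\n"},
    "tests": {"test_main.py": b"assert True\n"},
}
after = {
    "README": b"docs\n",
    "src": {"main.py": b"print('fixed')\n"},
    "tests": {"test_main.py": b"assert True\n"},
}

print(tree_id(before) != tree_id(after))                  # root hash changes
print(tree_id(before["src"]) != tree_id(after["src"]))    # so does the edited subtree
print(tree_id(before["tests"]) == tree_id(after["tests"]))  # untouched subtree is shared
```

Only the hashes on the path from the edited file to the root change; the untouched `tests` subtree keeps its ID and would be reused as-is.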
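The commit-graph file can store changed-path Bloom filters (written by git commit-graph write --changed-paths, not by git commit itself). Future git log -- <path> queries can then skip commits whose filter rules out the path, without reading their trees.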
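
To illustrate why a per-commit Bloom filter lets a path query skip commits, here is a toy sketch. It is not Git's implementation: the `PathBloomFilter` class, the filter size, the SHA-256-based double hashing, and the sample history are all assumptions chosen for brevity, and Git's own hashing and sizing differ.

```python
import hashlib


class PathBloomFilter:
    """Toy Bloom filter over path strings (hypothetical, not Git's format)."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored as a Python int

    def _positions(self, path: str):
        # Double hashing: derive several bit positions from two base hashes.
        digest = hashlib.sha256(path.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, path: str) -> None:
        for pos in self._positions(path):
            self.bits |= 1 << pos

    def might_contain(self, path: str) -> bool:
        # False means "definitely not changed here"; True means "maybe".
        return all((self.bits >> pos) & 1 for pos in self._positions(path))


# Hypothetical history: (commit name, paths changed in that commit).
history = [
    ("c3", ["src/main.py"]),
    ("c2", ["README"]),
    ("c1", ["src/main.py", "tests/test_main.py"]),
]

filters = {}
for commit, paths in history:
    bloom = PathBloomFilter()
    for path in paths:
        bloom.add(path)
    filters[commit] = bloom


def commits_to_inspect(path: str) -> list:
    # Only commits whose filter says "maybe" need their trees diffed.
    return [commit for commit, _ in history if filters[commit].might_contain(path)]


print(commits_to_inspect("src/main.py"))
```

A Bloom filter can report false positives but never false negatives, so a commit that did touch the path is always inspected, and a "no" answer is safe to trust without reading the tree.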
The core insight is that copy-on-write plus Merkle hashing gives Git both space efficiency (shared objects) and integrity verification (tamper-evident hashes). The two do not trade off against each other, because both follow from the same content hash.