Introduction
In modern computing, multi-core processors are everywhere, from smartphones to cloud servers. But with multiple cores working simultaneously, how do we ensure that data stays consistent across their caches? This is where cache coherence and synchronization come into play.
Imagine two people trying to update the same bank account from different ATMs at the same time. Without proper synchronization, chaos ensues—transactions get lost, balances become incorrect, and the system behaves unpredictably.
In this post, we'll explore:
- What cache coherence is and why it matters
- How synchronization prevents data corruption
- Different locking mechanisms and their trade-offs
By the end, you'll understand how hardware and software work together to keep shared data consistent.
The Problem: Inconsistent Memory Views
Let's start with a simple example:
```c
// Shared bank account with $1000
int balance = 1000;

// Withdrawal function
void withdraw(int amount) {
    balance = balance - amount;
}
```
Now suppose two threads (say, you and your dad at different ATMs) each withdraw $100 at the same time. If the operations interleave poorly:
- Thread A reads balance = 1000.
- Thread B reads balance = 1000.
- Both subtract $100 and write back $900.
Instead of ending with $800, the account now has $900—a classic race condition.
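To see the lost update in action, here's a minimal sketch using POSIX threads (the teller helper and thread setup are my additions for illustration; only balance and withdraw() come from the example above). Compile with cc -pthread:

```c
// Two threads race on an unsynchronized withdraw().
#include <pthread.h>
#include <stdio.h>

int balance = 1000;

void withdraw(int amount) {
    balance = balance - amount;  // non-atomic read-modify-write
}

void *teller(void *arg) {
    (void)arg;
    withdraw(100);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, teller, NULL);
    pthread_create(&b, NULL, teller, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("final balance = %d\n", balance);  // expect 800; a race can leave 900
    return 0;
}
```

The window between the read and the write is only a few instructions wide, so the bad interleaving is rare in practice; running the program in a loop (or moving the withdrawals into loops) makes the $900 outcome easier to catch.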
Why Does This Happen?
- Each CPU core has its own cache (fast local memory).
- If two cores read and modify the same data independently, their caches become inconsistent.
- Without synchronization, the final result depends on unpredictable timing.
Solution #1: Cache Coherence
To prevent inconsistencies, hardware implements cache coherence protocols, ensuring all caches see a single, up-to-date version of data.
How It Works
Snooping (Bus-Based Systems)
- Each CPU "snoops" on the memory bus.
- If one CPU writes to a memory location, others invalidate their cached copies.
- Example (simulated in the sketch after this list):
- CPU 1 writes X = 32.
- CPU 2 and CPU 3 see this and mark their cached X as invalid.
- Next read forces them to fetch the latest value.
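To make the invalidation flow concrete, here's a toy, software-only simulation of write-invalidate snooping (entirely illustrative: real hardware does this in the cache controllers, and the states loosely follow the MSI protocol):

```c
// Toy write-invalidate simulation; tracks only each CPU's line state.
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } LineState;
enum { NCPUS = 3 };

LineState line_x[NCPUS];  // each CPU's cached state for one line, X

void snoop_write(int writer) {
    for (int cpu = 0; cpu < NCPUS; cpu++)  // write is broadcast on the bus
        if (cpu != writer)
            line_x[cpu] = INVALID;         // other caches snoop and invalidate
    line_x[writer] = MODIFIED;
}

int main(void) {
    for (int cpu = 0; cpu < NCPUS; cpu++)
        line_x[cpu] = SHARED;              // everyone has read X
    snoop_write(0);                        // "CPU 1" in the prose = index 0 here
    for (int cpu = 0; cpu < NCPUS; cpu++)
        printf("CPU %d: %s\n", cpu,
               line_x[cpu] == MODIFIED ? "MODIFIED" :
               line_x[cpu] == SHARED   ? "SHARED"   : "INVALID");
    return 0;
}
```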
Directory-Based (Scalable Systems)
- A central directory tracks which cores have cached copies.
- On a write, the directory sends invalidation requests only to the relevant caches (sketched below).
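A hedged sketch of the same idea in software: the directory keeps a bitmask of sharers per cache line and contacts only those caches (the bitmask layout and names are assumptions for illustration):

```c
// Toy directory: one sharer bitmask per line; invalidations are point-to-point.
#include <stdio.h>

enum { NCPUS = 8 };
unsigned sharers = (1u << 1) | (1u << 2);  // CPUs 1 and 2 currently cache the line

void directory_write(int writer) {
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (((sharers >> cpu) & 1u) && cpu != writer)
            printf("invalidate -> CPU %d\n", cpu);  // only the caches that matter
    sharers = 1u << writer;                         // writer becomes sole owner
}

int main(void) {
    directory_write(0);  // prints invalidations for CPUs 1 and 2 only
    return 0;
}
```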
Trade-offs
✔ Ensures consistency – All cores see the same data.
❌ Performance overhead – Extra bus traffic and invalidation delays.
Solution #2: Synchronization with Locks
Hardware coherence isn't enough—we also need software synchronization for multi-step operations (like withdrawing money).
Using Locks (Mutexes)
```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void withdraw(int amount) {
    pthread_mutex_lock(&lock);    // Enter critical section
    balance -= amount;
    pthread_mutex_unlock(&lock);  // Exit critical section
}
```
- Only one thread can hold the lock at a time.
- Others wait (spin or sleep) until the lock is free.
Spinlocks vs. Sleep-Based Locks
| Spinlock | Sleep-Based Lock |
| --- | --- |
| Waits in a loop (busy-waiting) | Yields the CPU to another task |
| Low latency (good for short waits) | Avoids wasting CPU cycles |
| Can cause cache thrashing | Adds scheduling overhead |
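To ground the left column, here's a bare-bones test-and-set spinlock sketch using C11 atomics (a teaching version; production spinlocks add backoff and CPU relaxation hints):

```c
// Minimal test-and-set spinlock built on C11 atomic_flag.
#include <stdatomic.h>

static atomic_flag slock = ATOMIC_FLAG_INIT;

void spin_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&slock, memory_order_acquire))
        ;  // busy-wait: keep retrying until the previous holder clears the flag
}

void spin_release(void) {
    atomic_flag_clear_explicit(&slock, memory_order_release);
}
```

Note that every failed test-and-set is a write to the lock's cache line, so N spinning cores generate exactly the coherence traffic ("cache thrashing") the table warns about; the queue-based locks below exist to avoid it.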
Advanced Locks (MCS, Bakery, Anderson's Lock)
- MCS Lock – Reduces cache contention by queuing waiters.
- Bakery Algorithm – Ensures fairness (like a bakery ticket system).
- Anderson's Lock – Uses an array of per-waiter flags to minimize cache coherence traffic (see the sketch below).
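As a flavor of the array-based idea, here's a sketch in the spirit of Anderson's lock (the slot layout, NSLOTS bound, and memory orders are my assumptions; it assumes fewer than NSLOTS threads ever contend at once):

```c
// Array-based queueing lock: each waiter spins on its own cache line.
#include <stdatomic.h>
#include <stdbool.h>

enum { NSLOTS = 64 };

struct slot { _Alignas(64) atomic_bool go; };  // one cache line per waiter

static struct slot slots[NSLOTS] = { [0] = { true } };  // slot 0 starts free
static atomic_uint next_slot;

unsigned anderson_acquire(void) {
    unsigned me = atomic_fetch_add(&next_slot, 1) % NSLOTS;
    while (!atomic_load_explicit(&slots[me].go, memory_order_acquire))
        ;  // spin on a private slot, not a shared flag
    return me;
}

void anderson_release(unsigned me) {
    atomic_store_explicit(&slots[me].go, false, memory_order_relaxed);  // recycle slot
    atomic_store_explicit(&slots[(me + 1) % NSLOTS].go, true,
                          memory_order_release);  // hand off to the next waiter
}
```

A caller keeps the index returned by anderson_acquire() and passes it back to anderson_release(); a release then touches only the next waiter's line instead of invalidating every spinner's cached copy, which is precisely the coherence traffic this design cuts.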
When Synchronization Fails: Real-World Bugs
Even experts get it wrong. The scull example driver from the Linux Device Drivers book had a race condition in its write path:
```c
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(...);  /* Two threads might allocate twice! */
    if (!dptr->data[s_pos])
        goto out;
}
```
Result: if two threads both see a NULL pointer, both call kmalloc(), and the second assignment silently overwrites the first, leaking that allocation (or worse, crashing if the first pointer was already in use).
Fix: hold a lock across the entire check-and-allocate sequence, as sketched below.
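Here's the shape of the repaired path (dptr->lock is assumed here for illustration; the book's actual fix takes a per-device semaphore at the top of the whole write call):

```c
mutex_lock(&dptr->lock);               /* serialize the check-and-allocate */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(...);  /* only one thread can get here now */
    if (!dptr->data[s_pos]) {
        mutex_unlock(&dptr->lock);
        goto out;                      /* allocation failed */
    }
}
mutex_unlock(&dptr->lock);
```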
Conclusion
- Cache coherence ensures hardware-level consistency.
- Locks (mutexes, spinlocks) prevent software race conditions.
- Advanced algorithms (MCS, Bakery, Anderson's) optimize fairness and performance.
In multi-core systems, getting synchronization right is hard—but essential. Whether you're writing kernel code or high-performance apps, understanding these principles helps avoid subtle, costly bugs.