Introduction
In modern computing, multi-core processors are everywhere, from smartphones to cloud servers. But with multiple cores working simultaneously, how do we ensure that data stays consistent across their caches? This is where cache coherence and synchronization come into play.
Imagine two people trying to update the same bank account from different ATMs at the same time. Without proper synchronization, chaos ensues—transactions get lost, balances become incorrect, and the system behaves unpredictably.
In this post, we'll explore:
- What cache coherence is and why it matters
- How synchronization prevents data corruption
- Different locking mechanisms and their trade-offs
By the end, you'll understand how hardware and software work together to keep shared data consistent.
The Problem: Inconsistent Memory Views
Let's start with a simple example:
```c
// Shared bank account with $1000
int balance = 1000;

// Withdrawal function
void withdraw(int amount) {
    balance = balance - amount;
}
```
Now suppose two threads (say, you and your dad at different ATMs) each withdraw $100 at the same time. If the operations interleave poorly:
- Thread A reads balance = 1000.
- Thread B reads balance = 1000.
- Both subtract $100 and write back $900.
Instead of ending with $800, the account now has $900—a classic race condition.
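To see the lost update in action, here's a minimal sketch using POSIX threads (the teller helper and thread setup are my additions for illustration; only balance and withdraw() come from the example above). Compile with cc -pthread:

```c
// Two threads race on an unsynchronized withdraw().
#include <pthread.h>
#include <stdio.h>

int balance = 1000;

void withdraw(int amount) {
    balance = balance - amount;  // non-atomic read-modify-write
}

void *teller(void *arg) {
    (void)arg;
    withdraw(100);
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, teller, NULL);
    pthread_create(&b, NULL, teller, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("final balance = %d\n", balance);  // expect 800; a race can leave 900
    return 0;
}
```

The window between the read and the write is only a few instructions wide, so the bad interleaving is rare in practice; running the program in a loop (or moving the withdrawals into loops) makes the $900 outcome easier to catch.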
Why Does This Happen?
- Each CPU core has its own cache (fast local memory).
- If two cores read and modify the same data independently, their caches become inconsistent.
- Without synchronization, the final result depends on unpredictable timing.
Solution #1: Cache Coherence
To prevent inconsistencies, hardware implements cache coherence protocols, ensuring all caches see a single, up-to-date version of data.
How It Works
Snooping (Bus-Based Systems)
- Each CPU "snoops" on the memory bus.
- If one CPU writes to a memory location, others invalidate their cached copies.
- Example (simulated in the sketch after this list):
- CPU 1 writes X = 32.
- CPU 2 and CPU 3 see this and mark their cached X as invalid.
- Next read forces them to fetch the latest value.
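To make the invalidation flow concrete, here's a toy, software-only simulation of write-invalidate snooping (entirely illustrative: real hardware does this in the cache controllers, and the states loosely follow the MSI protocol):

```c
// Toy write-invalidate simulation; tracks only each CPU's line state.
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } LineState;
enum { NCPUS = 3 };

LineState line_x[NCPUS];  // each CPU's cached state for one line, X

void snoop_write(int writer) {
    for (int cpu = 0; cpu < NCPUS; cpu++)  // write is broadcast on the bus
        if (cpu != writer)
            line_x[cpu] = INVALID;         // other caches snoop and invalidate
    line_x[writer] = MODIFIED;
}

int main(void) {
    for (int cpu = 0; cpu < NCPUS; cpu++)
        line_x[cpu] = SHARED;              // everyone has read X
    snoop_write(0);                        // "CPU 1" in the prose = index 0 here
    for (int cpu = 0; cpu < NCPUS; cpu++)
        printf("CPU %d: %s\n", cpu,
               line_x[cpu] == MODIFIED ? "MODIFIED" :
               line_x[cpu] == SHARED   ? "SHARED"   : "INVALID");
    return 0;
}
```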
Directory-Based (Scalable Systems)
- A central directory tracks which cores have cached copies.
- On a write, the directory sends invalidation requests only to the relevant caches (sketched below).
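A hedged sketch of the same idea in software: the directory keeps a bitmask of sharers per cache line and contacts only those caches (the bitmask layout and names are assumptions for illustration):

```c
// Toy directory: one sharer bitmask per line; invalidations are point-to-point.
#include <stdio.h>

enum { NCPUS = 8 };
unsigned sharers = (1u << 1) | (1u << 2);  // CPUs 1 and 2 currently cache the line

void directory_write(int writer) {
    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (((sharers >> cpu) & 1u) && cpu != writer)
            printf("invalidate -> CPU %d\n", cpu);  // only the caches that matter
    sharers = 1u << writer;                         // writer becomes sole owner
}

int main(void) {
    directory_write(0);  // prints invalidations for CPUs 1 and 2 only
    return 0;
}
```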
Trade-offs
✔ Ensures consistency – All cores see the same data.
❌ Performance overhead – Extra bus traffic and invalidation delays.
Solution #2: Synchronization with Locks
Hardware coherence isn't enough—we also need software synchronization for multi-step operations (like withdrawing money).
Using Locks (Mutexes)
```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void withdraw(int amount) {
    pthread_mutex_lock(&lock);    // Enter critical section
    balance -= amount;
    pthread_mutex_unlock(&lock);  // Exit critical section
}
```
- Only one thread can hold the lock at a time.
- Others wait (spin or sleep) until the lock is free.
Spinlocks vs. Sleep-Based Locks
| Spinlock | Sleep-Based Lock |
| --- | --- |
| Waits in a loop (busy-waiting) | Yields the CPU to another task |
| Low latency (good for short waits) | Avoids wasting CPU cycles |
| Can cause cache thrashing | Adds scheduling overhead |
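To ground the left column, here's a bare-bones test-and-set spinlock sketch using C11 atomics (a teaching version; production spinlocks add backoff and CPU relaxation hints):

```c
// Minimal test-and-set spinlock built on C11 atomic_flag.
#include <stdatomic.h>

static atomic_flag slock = ATOMIC_FLAG_INIT;

void spin_acquire(void) {
    while (atomic_flag_test_and_set_explicit(&slock, memory_order_acquire))
        ;  // busy-wait: keep retrying until the previous holder clears the flag
}

void spin_release(void) {
    atomic_flag_clear_explicit(&slock, memory_order_release);
}
```

Note that every failed test-and-set is a write to the lock's cache line, so N spinning cores generate exactly the coherence traffic ("cache thrashing") the table warns about; the queue-based locks below exist to avoid it.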
Advanced Locks (MCS, Bakery, Anderson's Lock)
- MCS Lock – Reduces cache contention by queuing waiters.
- Bakery Algorithm – Ensures fairness (like a bakery ticket system).
- Anderson's Lock – Uses an array of per-waiter flags to minimize cache coherence traffic (see the sketch below).
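As a flavor of the array-based idea, here's a sketch in the spirit of Anderson's lock (the slot layout, NSLOTS bound, and memory orders are my assumptions; it assumes fewer than NSLOTS threads ever contend at once):

```c
// Array-based queueing lock: each waiter spins on its own cache line.
#include <stdatomic.h>
#include <stdbool.h>

enum { NSLOTS = 64 };

struct slot { _Alignas(64) atomic_bool go; };  // one cache line per waiter

static struct slot slots[NSLOTS] = { [0] = { true } };  // slot 0 starts free
static atomic_uint next_slot;

unsigned anderson_acquire(void) {
    unsigned me = atomic_fetch_add(&next_slot, 1) % NSLOTS;
    while (!atomic_load_explicit(&slots[me].go, memory_order_acquire))
        ;  // spin on a private slot, not a shared flag
    return me;
}

void anderson_release(unsigned me) {
    atomic_store_explicit(&slots[me].go, false, memory_order_relaxed);  // recycle slot
    atomic_store_explicit(&slots[(me + 1) % NSLOTS].go, true,
                          memory_order_release);  // hand off to the next waiter
}
```

A caller keeps the index returned by anderson_acquire() and passes it back to anderson_release(); a release then touches only the next waiter's line instead of invalidating every spinner's cached copy, which is precisely the coherence traffic this design cuts.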
When Synchronization Fails: Real-World Bugs
Even experts get it wrong. The scull example driver from the Linux Device Drivers book had a race condition in its write path:
```c
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(...);  /* Two threads might allocate twice! */
    if (!dptr->data[s_pos])
        goto out;
}
```
Result: if two threads both see a NULL pointer, both call kmalloc(), and the second assignment silently overwrites the first, leaking that allocation (or worse, crashing if the first pointer was already in use).
Fix: hold a lock across the entire check-and-allocate sequence, as sketched below.
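Here's the shape of the repaired path (dptr->lock is assumed here for illustration; the book's actual fix takes a per-device semaphore at the top of the whole write call):

```c
mutex_lock(&dptr->lock);               /* serialize the check-and-allocate */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] = kmalloc(...);  /* only one thread can get here now */
    if (!dptr->data[s_pos]) {
        mutex_unlock(&dptr->lock);
        goto out;                      /* allocation failed */
    }
}
mutex_unlock(&dptr->lock);
```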
Conclusion
- Cache coherence ensures hardware-level consistency.
- Locks (mutexes, spinlocks) prevent software race conditions.
- Advanced algorithms (MCS, Bakery, Anderson's) optimize fairness and performance.
In multi-core systems, getting synchronization right is hard—but essential. Whether you're writing kernel code or high-performance apps, understanding these principles helps avoid subtle, costly bugs.