CAP Theorem in Distributed Systems (NoSQL)
CAP Theorem in Distributed Systems (NoSQL)
The CAP Theorem is an important concept in distributed databases, especially in NoSQL systems. It explains the trade-offs between three important properties when data is replicated across multiple nodes in a distributed system.
The theorem was introduced by Eric Brewer and later formally proven by Seth Gilbert and Nancy Lynch.
1. Why CAP Theorem Is Needed
In distributed databases, data is often replicated across multiple nodes (servers) to improve:
-
Performance
-
Availability
-
Fault tolerance
Example:
A data item X may exist in multiple nodes:
Now suppose two transactions update the data:
If updates are not synchronized immediately:
Now inconsistent copies exist.
Two users reading X from different nodes may get different values.
Managing these situations leads to the CAP theorem trade-offs.
2. The CAP Theorem
The CAP theorem states:
In a distributed system with replicated data, it is impossible to simultaneously guarantee all three properties: Consistency, Availability, and Partition Tolerance.
A system can guarantee only two of the three properties at the same time.
3. The Three Properties of CAP
The three properties represented by C, A, and P are:
3.1 Consistency (C)
Consistency means:
All nodes in the distributed system see the same data at the same time.
After a successful write operation:
-
Every read operation must return the latest updated value.
Example:
All nodes must immediately reflect:
No user should read an outdated value.
This is similar to strong consistency.
3.2 Availability (A)
Availability means:
Every request (read or write) receives a response, even if some nodes fail.
The system must always be operational.
Example:
User sends request:
System must return:
-
the value, or
-
a valid response
But it should not deny service.
3.3 Partition Tolerance (P)
Partition tolerance means:
The system continues functioning even if network failures divide the system into partitions.
A network partition occurs when communication between nodes fails.
Example:
Nodes in different partitions cannot communicate.
Yet the system must continue operating.
4. Why All Three Cannot Be Achieved
Imagine a network partition occurs.
Now the system must choose between:
Option 1 — Maintain Consistency
-
Reject requests from one partition
-
Prevent inconsistent updates
Result:
Option 2 — Maintain Availability
-
Allow both partitions to accept requests
-
But updates may diverge
Result:
Thus:
Only two properties can be guaranteed simultaneously.
5. CAP System Types
Distributed systems typically fall into three categories.
5.1 CP Systems (Consistency + Partition Tolerance)
These systems guarantee:
-
Strong consistency
-
Partition tolerance
But sacrifice:
-
Availability during partitions
Example behavior:
If network partition occurs:
-
System rejects requests to maintain consistency.
Examples:
-
Apache HBase
-
Google Spanner
5.2 AP Systems (Availability + Partition Tolerance)
These systems guarantee:
-
Availability
-
Partition tolerance
But sacrifice:
-
Immediate consistency
Nodes may temporarily contain different data values.
Eventually they synchronize.
This is called eventual consistency.
Examples:
-
Amazon DynamoDB
-
Apache Cassandra
5.3 CA Systems (Consistency + Availability)
These systems guarantee:
-
Consistency
-
Availability
But assume:
-
No network partition occurs.
This is typical for traditional relational databases.
Examples:
-
MySQL
-
PostgreSQL
However, CA systems are difficult to maintain in large distributed environments.
6. Eventual Consistency in NoSQL
Many NoSQL systems choose AP (Availability + Partition tolerance).
They allow temporary inconsistency.
Eventually, all replicas become consistent.
Example timeline:
This model is called:
7. Difference Between CAP Consistency and ACID Consistency
The word consistency in CAP and ACID does not mean exactly the same thing.
CAP consistency means:
All replicas of the same data item have identical values at the same time.
It focuses on replicated data across distributed nodes.
8. Comparison: ACID vs CAP
| Feature | ACID | CAP |
|---|---|---|
| Focus | Transaction reliability | Distributed system trade-offs |
| Consistency Meaning | Integrity constraints maintained | Replicated data identical |
| System Type | Relational databases | Distributed databases |
| Transaction Support | Strong transactions | Often relaxed transactions |
| Performance | Lower scalability | Higher scalability |
9. Why NoSQL Systems Relax Consistency
Large internet systems prioritize:
-
High availability
-
Fault tolerance
-
Massive scalability
Examples include:
-
Social networks
-
Online shopping platforms
-
Messaging systems
In these cases:
For example:
A social media post appearing 2 seconds later on another user's feed is acceptable.
Thus many NoSQL systems choose:
10. Example Scenario
Imagine a social network storing user posts.
If a server fails:
AP system (NoSQL)
-
Users can still post
-
Some users may see old data temporarily
CP system
-
Some operations may be blocked
-
But data consistency is guaranteed
11. Summary
The CAP theorem explains the fundamental limitations of distributed databases.
A distributed system with replication must choose between:
But it can guarantee only two at the same time.
Most NoSQL systems prioritize:
and use eventual consistency instead of strict consistency.
The CAP theorem states that a distributed database system cannot simultaneously guarantee consistency, availability, and partition tolerance. At most, two of these properties can be achieved at the same time.

Comments
Post a Comment