CAP Theorem in Distributed Systems (NoSQL)

The CAP Theorem is an important concept in distributed databases, especially in NoSQL systems. It explains the trade-offs between three important properties when data is replicated across multiple nodes in a distributed system.

The theorem was introduced by Eric Brewer and later formally proven by Seth Gilbert and Nancy Lynch.

1. Why CAP Theorem Is Needed

In distributed databases, data is often replicated across multiple nodes (servers) to improve:

Performance
Availability
Fault tolerance

Example:

A data item X may exist in multiple nodes:


Node1 → X = 50
Node2 → X = 50
Node3 → X = 50

Now suppose two transactions update the data:


T1 updates X → Node1
T2 updates X → Node2

If updates are not synchronized immediately:


Node1 → X = 60
Node2 → X = 70

Now inconsistent copies exist.

Two users reading X from different nodes may get different values.

Managing these situations leads to the CAP theorem trade-offs.

2. The CAP Theorem

The CAP theorem states:

In a distributed system with replicated data, it is impossible to simultaneously guarantee all three properties: Consistency, Availability, and Partition Tolerance.

A system can guarantee only two of the three properties at the same time.

3. The Three Properties of CAP

The three properties represented by C, A, and P are:


C → Consistency
A → Availability
P → Partition Tolerance

3.1 Consistency (C)

Consistency means:

All nodes in the distributed system see the same data at the same time.

After a successful write operation:

Every read operation must return the latest updated value.

Example:


Write X = 100

All nodes must immediately reflect:


Node1 → X = 100
Node2 → X = 100
Node3 → X = 100

No user should read an outdated value.

This is similar to strong consistency.

3.2 Availability (A)

Availability means:

Every request (read or write) receives a response, even if some nodes fail.

The system must always be operational.

Example:

User sends request:


Read X

System must return:

the value, or
a valid response

But it should not deny service.

3.3 Partition Tolerance (P)

Partition tolerance means:

The system continues functioning even if network failures divide the system into partitions.

A network partition occurs when communication between nodes fails.

Example:


Cluster split into two parts

Partition1 → Node1 Node2
Partition2 → Node3 Node4

Nodes in different partitions cannot communicate.

Yet the system must continue operating.

4. Why All Three Cannot Be Achieved

Imagine a network partition occurs.

Now the system must choose between:

Option 1 — Maintain Consistency

Reject requests from one partition
Prevent inconsistent updates

Result:


Consistency maintained
Availability lost

Option 2 — Maintain Availability

Allow both partitions to accept requests
But updates may diverge

Result:


Availability maintained
Consistency lost

Thus:


C + A + P together cannot be guaranteed.

Only two properties can be guaranteed simultaneously.

5. CAP System Types

Distributed systems typically fall into three categories.

5.1 CP Systems (Consistency + Partition Tolerance)

These systems guarantee:

Strong consistency
Partition tolerance

But sacrifice:

Availability during partitions

Example behavior:

If network partition occurs:

System rejects requests to maintain consistency.

Examples:

Apache HBase
Google Spanner

5.2 AP Systems (Availability + Partition Tolerance)

These systems guarantee:

Availability
Partition tolerance

But sacrifice:

Immediate consistency

Nodes may temporarily contain different data values.

Eventually they synchronize.

This is called eventual consistency.

Examples:

Amazon DynamoDB
Apache Cassandra

5.3 CA Systems (Consistency + Availability)

These systems guarantee:

Consistency
Availability

But assume:

No network partition occurs.

This is typical for traditional relational databases.

Examples:

MySQL
PostgreSQL

However, CA systems are difficult to maintain in large distributed environments.

6. Eventual Consistency in NoSQL

Many NoSQL systems choose AP (Availability + Partition tolerance).

They allow temporary inconsistency.

Eventually, all replicas become consistent.

Example timeline:


t1 → Node1 updated
t2 → Node2 still old value
t3 → replication occurs
t4 → all nodes consistent

This model is called:


Eventual Consistency

7. Difference Between CAP Consistency and ACID Consistency

The word consistency in CAP and ACID does not mean exactly the same thing.

CAP consistency means:

All replicas of the same data item have identical values at the same time.

It focuses on replicated data across distributed nodes.

8. Comparison: ACID vs CAP

Feature	ACID	CAP
Focus	Transaction reliability	Distributed system trade-offs
Consistency Meaning	Integrity constraints maintained	Replicated data identical
System Type	Relational databases	Distributed databases
Transaction Support	Strong transactions	Often relaxed transactions
Performance	Lower scalability	Higher scalability

9. Why NoSQL Systems Relax Consistency

Large internet systems prioritize:

High availability
Fault tolerance
Massive scalability

Examples include:

Social networks
Online shopping platforms
Messaging systems

In these cases:


Temporary inconsistency is acceptable

For example:

A social media post appearing 2 seconds later on another user's feed is acceptable.

Thus many NoSQL systems choose:


Availability + Partition tolerance

10. Example Scenario

Imagine a social network storing user posts.

If a server fails:

AP system (NoSQL)

Users can still post
Some users may see old data temporarily

CP system

Some operations may be blocked
But data consistency is guaranteed

11. Summary

The CAP theorem explains the fundamental limitations of distributed databases.

A distributed system with replication must choose between:


Consistency
Availability
Partition Tolerance

But it can guarantee only two at the same time.

Most NoSQL systems prioritize:


Availability + Partition Tolerance

and use eventual consistency instead of strict consistency.

The CAP theorem states that a distributed database system cannot simultaneously guarantee consistency, availability, and partition tolerance. At most, two of these properties can be achieved at the same time.

Search This Blog

Database Management Systems DBMS PCCST402 Semester 4 KTU CS 2024 Scheme - Dr Binu V P