CAP Theorem in Distributed Systems (NoSQL)

 

CAP Theorem in Distributed Systems (NoSQL)

The CAP Theorem is an important concept in distributed databases, especially in NoSQL systems. It explains the trade-offs between three important properties when data is replicated across multiple nodes in a distributed system.

The theorem was introduced by Eric Brewer and later formally proven by Seth Gilbert and Nancy Lynch.


1. Why CAP Theorem Is Needed

In distributed databases, data is often replicated across multiple nodes (servers) to improve:

  • Performance

  • Availability

  • Fault tolerance

Example:

A data item X may exist in multiple nodes:

Node1 → X = 50 Node2 → X = 50 Node3 → X = 50

Now suppose two transactions update the data:

T1 updates X → Node1 T2 updates X → Node2

If updates are not synchronized immediately:

Node1 → X = 60 Node2 → X = 70

Now inconsistent copies exist.

Two users reading X from different nodes may get different values.

Managing these situations leads to the CAP theorem trade-offs.


2. The CAP Theorem

The CAP theorem states:

In a distributed system with replicated data, it is impossible to simultaneously guarantee all three properties: Consistency, Availability, and Partition Tolerance.

A system can guarantee only two of the three properties at the same time.


3. The Three Properties of CAP

The three properties represented by C, A, and P are:

CConsistency AAvailability PPartition Tolerance

3.1 Consistency (C)

Consistency means:

All nodes in the distributed system see the same data at the same time.

After a successful write operation:

  • Every read operation must return the latest updated value.

Example:

Write X = 100

All nodes must immediately reflect:

Node1 → X = 100 Node2 → X = 100 Node3 → X = 100

No user should read an outdated value.

This is similar to strong consistency.


3.2 Availability (A)

Availability means:

Every request (read or write) receives a response, even if some nodes fail.

The system must always be operational.

Example:

User sends request:

Read X

System must return:

  • the value, or

  • a valid response

But it should not deny service.


3.3 Partition Tolerance (P)

Partition tolerance means:

The system continues functioning even if network failures divide the system into partitions.

A network partition occurs when communication between nodes fails.

Example:

Cluster split into two parts Partition1 → Node1 Node2 Partition2 → Node3 Node4

Nodes in different partitions cannot communicate.

Yet the system must continue operating.


4. Why All Three Cannot Be Achieved

Imagine a network partition occurs.

Now the system must choose between:

Option 1 — Maintain Consistency

  • Reject requests from one partition

  • Prevent inconsistent updates

Result:

Consistency maintained Availability lost

Option 2 — Maintain Availability

  • Allow both partitions to accept requests

  • But updates may diverge

Result:

Availability maintained Consistency lost

Thus:

C + A + P together cannot be guaranteed.

Only two properties can be guaranteed simultaneously.


5. CAP System Types

Distributed systems typically fall into three categories.




5.1 CP Systems (Consistency + Partition Tolerance)

These systems guarantee:

  • Strong consistency

  • Partition tolerance

But sacrifice:

  • Availability during partitions

Example behavior:

If network partition occurs:

  • System rejects requests to maintain consistency.

Examples:

  • Apache HBase

  • Google Spanner


5.2 AP Systems (Availability + Partition Tolerance)

These systems guarantee:

  • Availability

  • Partition tolerance

But sacrifice:

  • Immediate consistency

Nodes may temporarily contain different data values.

Eventually they synchronize.

This is called eventual consistency.

Examples:

  • Amazon DynamoDB

  • Apache Cassandra


5.3 CA Systems (Consistency + Availability)

These systems guarantee:

  • Consistency

  • Availability

But assume:

  • No network partition occurs.

This is typical for traditional relational databases.

Examples:

  • MySQL

  • PostgreSQL

However, CA systems are difficult to maintain in large distributed environments.


6. Eventual Consistency in NoSQL

Many NoSQL systems choose AP (Availability + Partition tolerance).

They allow temporary inconsistency.

Eventually, all replicas become consistent.

Example timeline:

t1 → Node1 updated t2 → Node2 still old value t3 → replication occurs t4 → all nodes consistent

This model is called:

Eventual Consistency

7. Difference Between CAP Consistency and ACID Consistency

The word consistency in CAP and ACID does not mean exactly the same thing.

CAP consistency means:

All replicas of the same data item have identical values at the same time.

It focuses on replicated data across distributed nodes.

8. Comparison: ACID vs CAP

FeatureACIDCAP
FocusTransaction reliabilityDistributed system trade-offs
Consistency MeaningIntegrity constraints maintainedReplicated data identical
System TypeRelational databasesDistributed databases
Transaction SupportStrong transactionsOften relaxed transactions
PerformanceLower scalabilityHigher scalability

9. Why NoSQL Systems Relax Consistency

Large internet systems prioritize:

  • High availability

  • Fault tolerance

  • Massive scalability

Examples include:

  • Social networks

  • Online shopping platforms

  • Messaging systems

In these cases:

Temporary inconsistency is acceptable

For example:

A social media post appearing 2 seconds later on another user's feed is acceptable.

Thus many NoSQL systems choose:

Availability + Partition tolerance

10. Example Scenario

Imagine a social network storing user posts.

If a server fails:

AP system (NoSQL)

  • Users can still post

  • Some users may see old data temporarily

CP system

  • Some operations may be blocked

  • But data consistency is guaranteed


11. Summary

The CAP theorem explains the fundamental limitations of distributed databases.

A distributed system with replication must choose between:

Consistency Availability Partition Tolerance

But it can guarantee only two at the same time.

Most NoSQL systems prioritize:

Availability + Partition Tolerance

and use eventual consistency instead of strict consistency.



The CAP theorem states that a distributed database system cannot simultaneously guarantee consistency, availability, and partition tolerance. At most, two of these properties can be achieved at the same time.

Comments

Popular posts from this blog

Database Management Systems DBMS PCCST402 Semester 4 KTU CS 2024 Scheme

Introduction to Database Management System -DBMS

Data Models, Schemas and Instances