Added contents for scalability

donnemartin · jatulya · Apr 12, 2025
commit bf648502edb5c6ec33d6e7e2d4a07f4edcdecd1a
diff --git a/README.md b/README.md
@@ -379,13 +379,112 @@ First, you'll need a basic understanding of common principles, learning about wh
 
 [Scalability Lecture at Harvard](https://www.youtube.com/watch?v=-W9F__D3oY4)
 
-* Topics covered:
-    * Vertical scaling
-    * Horizontal scaling
-    * Caching
-    * Load balancing
-    * Database replication
-    * Database partitioning
+
+## What is Scalability?
+
+**Scalability** is a system's ability to handle a growing amount of work. A scalable system maintains performance as load increases, provided its resources are increased in proportion.
+
+
+## Vertical Scaling
+
+**Vertical scaling** (scale-up) increases the capacity of a single server by adding more CPU, RAM, or storage.
+
+**Example**: Upgrading your DB server from 8GB RAM to 64GB RAM.
+
+**✅ Pros**
+- Easy to implement
+- No application code changes needed
+
+**❌ Cons**
+- Physical hardware limits
+- Downtime may be required
+- Becomes expensive quickly
+
+---
+
+## Horizontal Scaling
+
+**Horizontal scaling** (scale-out) adds more servers to the system and distributes the workload across them.
+
+**Example**: Adding more web servers behind a load balancer.
+
+**✅ Pros**
+- Scales better for large systems
+- Enables high availability and redundancy
+
+**❌ Cons**
+- More complex to manage
+- Works best with stateless application servers; shared state (e.g., sessions) must live in a store that every server can reach
+
+---
+
+## Caching
+
+**Caching** stores frequently accessed data in a faster or closer storage layer (often memory) for faster retrieval, reducing load on backend systems.
+
+### Common caching types:
+- **Browser cache**: Your browser saves website files like images, CSS, or JavaScript so it doesn’t have to download them again the next time you visit. This makes websites load faster.
+- **CDN cache**: A Content Delivery Network (CDN) stores copies of your content in many places around the world. When a user visits your site, the CDN gives them the closest copy, which loads faster.
+- **Server-side cache**: This is caching done on the backend using tools like Redis or Memcached. For example, if a database query is expensive (slow or heavy), the result can be saved in memory so it doesn’t need to be repeated.
+
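The server-side case above can be sketched as a cache-aside lookup with a time-to-live (TTL). This is a minimal in-process illustration, not how Redis or Memcached are actually called: `TTLCache`, `get_or_load`, and the injectable `clock` are hypothetical names introduced for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache with per-entry expiry (stand-in for Redis/Memcached)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return the cached value, or run `loader` on a miss and cache its result."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the expensive query is skipped
        value = loader()  # miss or expired: run the expensive query once
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, e.g. right after the underlying row is updated."""
        self._store.pop(key, None)
```

A caller would wrap the slow query in `loader` and call `invalidate` after writes. A shorter TTL reduces staleness at the cost of more misses.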
+**✅ Pros**
+- Significantly improves response times
+- Reduces backend load
+
+**❌ Cons**
+- **Stale data**: The cache can hold values that no longer match the latest data in the database.
+- **Cache invalidation**: It's tricky to know when to delete or update cached data. If you do it too soon, you lose the benefit of caching; if too late, users may see outdated info.
+
+## Load Balancing
+
+**Load balancing** distributes incoming traffic across multiple servers to ensure no one server is overwhelmed.
+
+### Common strategies:
+- **Round-robin**: Requests go to each server in turn, then wrap around to the first server again.
+- **Least connections**: Requests go to the server that is currently handling the fewest active connections. This helps when requests vary in how long they take.
+- **IP hashing**: The load balancer hashes the client's IP address to decide which server handles the request. The same client is routed to the same server for as long as the server pool stays unchanged.
+
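The three strategies above can be sketched in a few lines each. This is an illustrative sketch, not a production balancer: the class names and the `pick`/`release` methods are hypothetical, and health checks, weights, and concurrency are left out.

```python
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers: List[str]):
        self.active: Dict[str, int] = {s: 0 for s in servers}

    def pick(self) -> str:
        # Ties go to the server listed first (dicts keep insertion order).
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a connection to `server` closes."""
        self.active[server] -= 1


class IPHash:
    """Map a client IP to a server via a stable hash."""

    def __init__(self, servers: List[str]):
        self.servers = servers

    def pick(self, client_ip: str) -> str:
        digest = hashlib.sha256(client_ip.encode()).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]
```

Note that the modulo step in `IPHash` remaps many clients when a server is added or removed, which is why the mapping only holds while the pool is unchanged.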
+**✅ Pros**
+- High availability
+- Fault tolerance
+- Enables horizontal scaling
+
+**❌ Cons**
+- Can become a single point of failure (use redundant balancers)
+
+## Database Replication
+
+**Database replication** copies data from a primary (master) DB to one or more replicas (slaves).
+
+### Types:
+- **Master-slave**: Writes to master, reads from replicas
+- **Master-master**: Multiple writable nodes (more complex)
+
+**✅ Pros**
+- Improved read scalability
+- Redundancy and failover support
+
+**❌ Cons**
+- **Replication lag**: With asynchronous replication, changes reach the replicas after a short delay, so a replica might not have the very latest updates right away.
+- **Consistency issues in write-heavy apps**: If your app writes a lot of data (e.g., saving user actions), the replicas may fall behind, and different servers might show different versions of the data for a short time.
+
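The master-slave split and the replication lag described above can be illustrated with a toy in-memory model. Everything here is hypothetical (`Node`, `ReplicatedStore`, and the explicit `replicate` step); a real database ships changes to its replicas through its own replication mechanism.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class Node:
    """One database server, reduced to a key-value dict."""

    def __init__(self, name: str):
        self.name = name
        self.data: Dict[str, str] = {}


class ReplicatedStore:
    """Writes go to the primary; reads rotate across replicas."""

    def __init__(self, primary: Node, replicas: List[Node]):
        self.primary = primary
        self.replicas = replicas
        self._pending: List[Tuple[str, str]] = []  # writes not yet on replicas
        self._next_replica = itertools.cycle(replicas)

    def write(self, key: str, value: str) -> None:
        self.primary.data[key] = value
        self._pending.append((key, value))  # shipped to replicas later

    def read(self, key: str) -> Optional[str]:
        # May return stale data or None until replicate() has run.
        return next(self._next_replica).data.get(key)

    def replicate(self) -> None:
        """Apply all pending writes to every replica, in write order."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica.data[key] = value
        self._pending.clear()
```

In this model, a `read` issued between `write` and `replicate` misses the new value, which is the window users experience as replication lag.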
+## Database Partitioning ([Sharding](https://learn.microsoft.com/en-us/azure/architecture/patterns/sharding))
+
+**Partitioning** splits a large database into smaller parts. **Sharding** is the common case where the data is split horizontally into shards, each stored on a separate machine.
+
+### Types:
+- **Horizontal partitioning**: Splits by rows (e.g., user_id ranges)
+- **Vertical partitioning**: Splits by columns (e.g., profile vs activity data)
+
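Routing a row to its shard by `user_id` can be sketched in two common ways: by range, as in the example above, or by hash. The function names and the boundary list are assumptions made for this illustration.

```python
import bisect
import hashlib
from typing import List


def range_shard(user_id: int, upper_bounds: List[int]) -> int:
    """Return the index of the shard whose range contains user_id.

    `upper_bounds` is sorted and exclusive: [1000, 2000] means
    shard 0 holds ids below 1000 and shard 1 holds ids 1000 to 1999.
    """
    index = bisect.bisect_right(upper_bounds, user_id)
    if index == len(upper_bounds):
        raise ValueError(f"user_id {user_id} is outside every shard range")
    return index


def hash_shard(user_id: int, num_shards: int) -> int:
    """Spread ids evenly across shards using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Range sharding keeps neighboring ids together, which helps range scans but can create hot shards. Hash sharding spreads load more evenly, but changing `num_shards` moves most keys, which is one reason rebalancing is tricky.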
+**✅ Pros**
+- Improves performance and scaling
+- Avoids overloading a single node
+
+**❌ Cons**
+- Querying across shards is difficult
+- Requires smart shard key design
+- Rebalancing shards can be tricky
+
 
 ### Step 2: Review the scalability article