mirror of
https://gitlab.com/djdietrick/docs
synced 2026-05-03 00:20:54 -04:00
Updated system design section
docs/interview/sd/efficiency.md
Normal file
@@ -0,0 +1,25 @@
# Measuring Efficiency
When describing the efficiency of a system, we look at a few different properties.
## Latency
Latency describes the time it takes for a machine to perform a certain operation. Every method of retrieving information has different time costs. A general rule of thumb:
- Reading 1MB from RAM - 0.25ms
- Reading 1MB from SSD - 1ms
- Transferring 1MB over the network - 10ms
- Reading 1MB from HDD - 20ms
- Intercontinental round trip - 150ms
## Throughput
While latency measures the time operations take, throughput describes the number of operations that can be processed in a given amount of time, such as requests per second.
## Availability
Availability describes the percentage of time that a system is up and running. It is usually expressed as a percentage or, more commonly, in **nines**: a server that is available 99% of the time has two nines of availability. A system is considered **highly available** when it has five or more nines of availability.
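To make the nines concrete, each additional nine shrinks the downtime budget by a factor of ten. A quick sketch (the helper name is ours):

```python
def downtime_per_year(nines: int) -> float:
    """Maximum minutes of downtime per year allowed by `nines` nines of availability."""
    unavailability = 10 ** -nines      # e.g. 2 nines -> 1% downtime allowed
    minutes_per_year = 365 * 24 * 60   # 525,600 minutes
    return minutes_per_year * unavailability

# Two nines (99%) allow roughly 3.65 days of downtime per year;
# five nines (99.999%) allow only about 5.26 minutes.
```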
### Redundancy
Redundancy is a means of increasing availability by keeping copies of the system ready to take over in case the main system fails.
docs/interview/sd/impl.md
Normal file
@@ -0,0 +1,95 @@
# Implementing Efficiency Techniques
## Leader Election
Leader election is the process by which the nodes in a cluster elect one node (the leader) to perform the primary functions of the service. Every node then knows who the leader is, and the remaining nodes can elect a new leader if the current one dies. This is done using a **consensus algorithm** such as Paxos or Raft, typically via a third-party strongly consistent key-value store such as etcd or ZooKeeper.
### Python Implementation using Etcd
```python
# Uses the python-etcd3 client (pip install etcd3); the original
# snippet imported `etcd` but called the etcd3 API.
import sys
import time
from threading import Event

import etcd3

LEADER_KEY = 'LEADER_KEY'


def main(server_name):
    client = etcd3.client(host="localhost", port=2379)

    while True:
        is_leader, lease = leader_election(client, server_name)
        if is_leader:
            print("I am the leader")
            on_leadership_gained(lease)
        else:
            print("I am a follower")
            wait_for_next_election(client)


def leader_election(client, server_name):
    print("New leader election happening")
    # The lease must be refreshed every 5 seconds, or the key expires
    # and a new leader is elected.
    lease = client.lease(5)
    is_leader = try_insert(client, LEADER_KEY, server_name, lease)
    return is_leader, lease


def try_insert(client, key, server_name, lease):
    # Atomically insert the key only if it does not exist yet
    # (version == 0); the node that wins this transaction is the leader.
    insert_succeeded, _responses = client.transaction(
        compare=[client.transactions.version(key) == 0],
        success=[client.transactions.put(key, server_name, lease)],
        failure=[],
    )
    return insert_succeeded


def on_leadership_gained(lease):
    while True:
        try:
            print("Refreshing lease, still the leader")
            lease.refresh()
            do_work()
        except KeyboardInterrupt:
            lease.revoke()
            sys.exit(1)
        except Exception:
            # On any failure, give up leadership so another node can take over.
            lease.revoke()
            return


def wait_for_next_election(client):
    election_event = Event()

    def watch_callback(resp):
        for event in resp.events:
            # The leader key being deleted means its lease expired.
            if isinstance(event, etcd3.events.DeleteEvent):
                print("Leader election required")
                election_event.set()

    watch_id = client.add_watch_callback(LEADER_KEY, watch_callback)

    try:
        while not election_event.is_set():
            time.sleep(1)
    except KeyboardInterrupt:
        client.cancel_watch(watch_id)
        sys.exit(1)

    client.cancel_watch(watch_id)


def do_work():
    time.sleep(1)
```
## Polling and Streaming
**Polling** is the act of requesting data updates at a regular interval, typically against a server's REST API. **Streaming** is the act of receiving continuous data updates from the server through an open connection. This is commonly achieved with WebSockets, which keep a connection open between the server and client so that either party can send data at any time. Streaming is preferred when the information is time sensitive and clients want each update as soon as it happens.
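A minimal polling loop can be sketched as follows; `fetch` is a stand-in for whatever REST call the client would actually make:

```python
import time

def poll(fetch, interval_seconds, max_polls):
    """Repeatedly call `fetch` at a fixed interval, collecting each result."""
    results = []
    for _ in range(max_polls):
        results.append(fetch())
        time.sleep(interval_seconds)
    return results

# Example with a stand-in data source:
counter = iter(range(100))
updates = poll(lambda: next(counter), interval_seconds=0.01, max_polls=3)
# updates == [0, 1, 2]
```

The interval is the key trade-off: shorter intervals approach the freshness of streaming but multiply request load on the server.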
### Pub Sub
Pub Sub, or publishing and subscribing, is a method of dividing streamed data into topics that clients can subscribe to. When a new event is published to a topic, every subscribed client receives the update. These systems often come with guarantees such as at-least-once delivery, persistent storage/queues, message ordering, and replayability of messages. Because of at-least-once delivery, message handling typically has to be **idempotent**: the outcome must be the same regardless of how many times the event is processed, so a message delivered multiple times has the same effect as one delivered once. Popular Pub Sub frameworks include Apache Kafka and Cloud Pub/Sub.
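Idempotent handling under at-least-once delivery can be sketched by tracking which message IDs have already been applied (class and field names are ours):

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the broker redelivers it."""

    def __init__(self):
        self.processed_ids = set()
        self.total = 0

    def handle(self, message_id, amount):
        if message_id in self.processed_ids:
            return  # duplicate delivery: ignore
        self.processed_ids.add(message_id)
        self.total += amount

consumer = IdempotentConsumer()
consumer.handle("msg-1", 10)
consumer.handle("msg-1", 10)  # redelivered by the broker
consumer.handle("msg-2", 5)
# consumer.total == 15, not 25
```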
## Configuration
Configuration is a set of variables or parameters that determine certain behaviors within the application. **Static** configuration is hard-coded and shipped with the application, while **dynamic** configuration is kept outside of the system and can be edited more easily, without a redeploy. Static configuration is typically written in formats such as JSON or YAML, while dynamic configuration usually lives in a third-party key-value store.
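As an illustration, a static JSON configuration might look like the following (the field names are invented for the example):

```python
import json

# A static configuration shipped with the application, here as JSON.
CONFIG_JSON = """
{
    "max_connections": 100,
    "request_timeout_seconds": 30,
    "feature_flags": {"dark_mode": false}
}
"""

config = json.loads(CONFIG_JSON)
# config["max_connections"] == 100
```

Changing any of these values requires shipping a new build, which is exactly the limitation dynamic configuration avoids.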
## Rate Limiting
Rate limiting is the process of limiting the number of requests that can be made to the system, typically per IP address, to prevent clients from abusing the server and consuming all of its resources. An attack that floods a service with requests is known as a **DoS attack**, or denial of service; when the attack is performed from many machines at once it is a **DDoS attack**, or distributed denial of service, which is much harder to defend against. Rate limiting is typically implemented with a key-value store such as Redis that tracks how many times each IP accesses the service.
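A minimal fixed-window rate limiter keyed by IP can be sketched in memory; in production the counter would live in a shared store such as Redis (an `INCR` plus an `EXPIRE` on the per-window key), but a plain dict keeps the sketch self-contained:

```python
import time
from collections import defaultdict

class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window_seconds` per client IP."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (ip, window index) -> request count

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)  # current time bucket
        key = (ip, window)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

limiter = FixedWindowRateLimiter(limit=3, window_seconds=60)
results = [limiter.allow("10.0.0.1", now=0) for _ in range(4)]
# results == [True, True, True, False]
```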
docs/interview/sd/improvements.md
Normal file
@@ -0,0 +1,29 @@
# Improving Efficiency
## Caching
Caching is the process of saving data so that it is faster to retrieve than going all the way to the data source or re-performing expensive computations. This can reduce overall processing time at the cost of using extra memory. A large-scale example is the **CDN**, or Content Delivery Network: a third-party cache for website content, such as Cloudflare or Google Cloud CDN. Some other popular caching and key-value software includes:
- Redis: a very fast in-memory key-value store with optional persistence, often used for caching and rate limiting.
- etcd: a strongly consistent, highly available key-value store, often used in leader election.
- ZooKeeper: another strongly consistent, highly available key-value store, often used for configuration or leader election.
## Proxies
Proxies are processes that sit between the client and the server. **Forward** proxies act on behalf of the client, such as a VPN. **Reverse** proxies act on behalf of the server, such as loggers, load balancers, and caches.
### Load balancers
Load balancers receive all traffic for a service and distribute it across multiple processes. They use a **server-selection strategy** to decide how to divide the traffic; common strategies include round robin, random selection, performance-based selection, and location/IP-based selection. If the workload becomes uneven and one server receives a disproportionate share, that server is a **hot spot** and the selection strategy may need tuning.
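Round robin, the simplest of the selection strategies above, can be sketched as follows (the class name is ours):

```python
import itertools

class RoundRobinBalancer:
    """Cycle through servers in order, giving each an equal share of requests."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick_server(self):
        return next(self._cycle)

balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick_server() for _ in range(4)]
# picks == ["app-1", "app-2", "app-3", "app-1"]
```

Round robin assumes all servers and requests are roughly equal; performance-based strategies drop that assumption by weighting servers by load or response time.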
## Replication
Replication is a method of increasing redundancy by actively duplicating data from one database server to another. It can also be used to decrease latency by copying data to servers closer to the user.
## Sharding
Sharding, or data partitioning, is the process of splitting a database into pieces to increase the throughput of the system. Because each shard holds only part of the data, queries against any single database server run faster. Data can be sharded by client region, by the type of data being stored, or by a hash function applied to some column. A routing layer, similar to a load balancer, is then required to send each request to the correct database server.
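Hash-based sharding on a column value can be sketched as follows (the function name is ours):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index by hashing a column value.

    Uses a stable hash (md5) so every application server computes
    the same shard for the same key across restarts."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# All queries for a given key land on the same database server:
shard = shard_for("user:42", 4)
```

Note that a plain modulo scheme reshuffles almost every key when `num_shards` changes; consistent hashing is the usual refinement when shards are added or removed often.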
## Peer to Peer Networks
P2P networks use a collection of machines, or peers, to divide a workload amongst themselves and reduce the overall processing time. This is especially useful for file distribution: instead of downloading a file from a single server, the file is spread in chunks to peers across the network, and a new peer can request those chunks from the other peers, spreading the strain from one server to many. When peers coordinate which chunks they hold by talking directly to each other, without a centralized source of data, they are using a **gossip protocol**.
@@ -1,13 +0,0 @@
-# Latency
-
-Latency describes the time it takes for a machine to perform a certain operation. Every method of retrieving information has different time costs. A general rule of thumb:
-
-- Reading 1MB from RAM - 0.25ms
-- Reading 1MB from SSD - 1ms
-- Transfer 1MB over network - 10ms
-- Reading 1MB from HDD - 20ms
-- Intercontinental round-trip - 150ms
-
-## Throughput
-
-While latency measures the time operations take, throughput describes the number of operations that can be processed in a given amount of time, such as requests per second.
@@ -1,6 +1,6 @@
 # Scaling

-Scaling is the problem of supporting more and more users as your applications grow. In order to handle more requests, we need more hardware.
+Scaling is the problem of supporting more and more users as your applications grow. These are ways to increase your throughput. In order to handle more requests, we need more hardware.

 ## Horizontal Scaling
@@ -12,8 +12,7 @@ Some of the disadvantages of this approach are requiring load balancing, slower i
 ## Vertical Scaling

-Vertical scaling is a method of increasing scalability by adding capability to a single machine. To
-process more requests, you make your server faster by increasing the hardware.
+Vertical scaling is a method of increasing scalability by adding capability to a single machine. To process more requests, you make your server faster by increasing the hardware.

 Some of the advantages of this approach compared to horizontal scaling are that there is no load balancing required, communication between processes is faster, and data will be consistent among processes on the server.
@@ -24,8 +24,10 @@
     "text": "System Design",
     "items": [
         {"text": "Basics of the Internet", "link": "/interview/sd/basics"},
-        {"text": "Latency", "link": "/interview/sd/latency"},
-        {"text": "Scaling", "link": "/interview/sd/scaling"}
+        {"text": "Efficiency", "link": "/interview/sd/efficiency"},
+        {"text": "Scaling", "link": "/interview/sd/scaling"},
+        {"text": "Improvements", "link": "/interview/sd/improvements"},
+        {"text": "Implementation", "link": "/interview/sd/impl"}
     ]
 }
]