{ "_hYN0gEi9BL24nptEtXWU": { "title": "Introduction", "description": "System design is the process of defining the elements of a system, as well as their interactions and relationships, in order to satisfy a set of specified requirements.\n\nIt involves taking a problem statement, breaking it down into smaller components, and designing each component to work together effectively to achieve the overall goal of the system. This process typically includes analyzing the current system (if any) and determining any deficiencies, creating a detailed plan for the new system, and testing the design to ensure that it meets the requirements. It is an iterative process that may involve multiple rounds of design, testing, and refinement.\n\nIn software engineering, system design is a phase in the software development process that focuses on the high-level design of a software system, including the architecture and components.\n\nIt is also an important part of the interview process for software engineers. Most companies have a dedicated system design interview round in which candidates are asked to design a system for a given problem statement. Candidates are expected to come up with a detailed design of the system, including the architecture, components, and their interactions. They are also expected to discuss the trade-offs involved in their design and the alternatives that they considered.", "links": [] }, "idLHBxhvcIqZTqmh_E8Az": { "title": "What is System Design?", "description": "System design is the process of defining the elements of a system, as well as their interactions and relationships, in order to satisfy a set of specified requirements.\n\nIt involves taking a problem statement, breaking it down into smaller components, and designing each component to work together effectively to achieve the overall goal of the system. 
This process typically includes analyzing the current system (if any) and determining any deficiencies, creating a detailed plan for the new system, and testing the design to ensure that it meets the requirements. It is an iterative process that may involve multiple rounds of design, testing, and refinement.\n\nIn software engineering, system design is a phase in the software development process that focuses on the high-level design of a software system, including the architecture and components.\n\nIt is also an important part of the interview process for software engineers. Most companies have a dedicated system design interview round in which candidates are asked to design a system for a given problem statement. Candidates are expected to come up with a detailed design of the system, including the architecture, components, and their interactions. They are also expected to discuss the trade-offs involved in their design and the alternatives that they considered.", "links": [] }, "os3Pa6W9SSNEzgmlBbglQ": { "title": "How to approach System Design?", "description": "There are several steps you can take when approaching system design:\n\n* **Understand the problem:** Gather information about the problem you are trying to solve and the requirements of the system. Identify the users and their needs, as well as any constraints or limitations of the system.\n* **Identify the scope of the system:** Define the boundaries of the system, including what the system will do and what it will not do.\n* **Research and analyze existing systems:** Look at similar systems that have been built in the past and identify what worked well and what didn't. Use this information to inform your design decisions.\n* **Create a high-level design:** Outline the main components of the system and how they will interact with each other. 
This can include a rough diagram of the system's architecture or a flowchart outlining the process the system will follow.\n* **Refine the design:** As you work on the details of the design, iterate and refine it until you have a complete and detailed design that meets all the requirements.\n* **Document the design:** Create detailed documentation of your design for future reference and maintenance.\n* **Continuously monitor and improve the system:** System design is not a one-time process; the system needs to be continuously monitored and improved to meet changing requirements.\n\nNote that this is a general approach to system design. For interview-specific answers, see the following resources:", "links": [ { "title": "How to approach System Design?", "url": "https://github.com/donnemartin/system-design-primer#how-to-approach-a-system-design-interview-question", "type": "opensource" }, { "title": "What are system design questions?", "url": "https://www.hiredintech.com/system-design", "type": "article" }, { "title": "My System Design Template", "url": "https://leetcode.com/discuss/career/229177/My-System-Design-Template", "type": "article" }, { "title": "Intro to Architecture and Systems Design Interviews", "url": "https://www.youtube.com/watch?v=ZgdS0EUmn70", "type": "video" } ] }, "e_15lymUjFc6VWqzPnKxG": { "title": "Performance vs Scalability", "description": "A service is **scalable** if adding resources results in a proportional increase in **performance**. 
Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow.\n\nAnother way to look at performance vs scalability:\n\n* If you have a **performance** problem, your system is slow for a single user.\n* If you have a **scalability** problem, your system is fast for a single user but slow under heavy load.\n\nTo learn more, visit the following links:", "links": [ { "title": "Scalability, Availability & Stability Patterns", "url": "https://www.slideshare.net/jboner/scalability-availability-stability-patterns/", "type": "article" }, { "title": "A Word on Scalability", "url": "https://www.allthingsdistributed.com/2006/03/a_word_on_scalability.html", "type": "article" }, { "title": "Performance vs Scalability", "url": "https://blog.professorbeekums.com/performance-vs-scalability/", "type": "article" }, { "title": "Explore top posts about Performance", "url": "https://app.daily.dev/tags/performance?ref=roadmapsh", "type": "article" } ] }, "O3wAHLnzrkvLWr4afHDdr": { "title": "Latency vs Throughput", "description": "Latency and throughput are two important measures of a system's performance. **Latency** refers to the amount of time it takes for a system to respond to a request. 
**Throughput** refers to the number of requests that a system can process per unit of time, such as requests per second.\n\nGenerally, you should aim for maximal throughput with acceptable latency.\n\nLearn more from the following links:", "links": [ { "title": "System Design: Latency vs Throughput", "url": "https://cs.fyi/guide/latency-vs-throughput/", "type": "article" }, { "title": "Understanding Latency versus Throughput", "url": "https://community.cadence.com/cadence_blogs_8/b/fv/posts/understanding-latency-vs-throughput", "type": "article" }, { "title": "Latency and Throughput - MIT", "url": "https://www.youtube.com/watch?v=3HIV4MnLGCw", "type": "video" } ] }, "uJc27BNAuP321HQNbjftn": { "title": "Availability vs Consistency", "description": "Availability refers to the ability of a system to provide its services to clients even in the presence of failures. This is often measured as the percentage of time that the system is up and running, also known as its uptime.\n\nConsistency, on the other hand, refers to the property that all clients see the same data at the same time. This is important for maintaining the integrity of the data stored in the system.\n\nIn distributed systems, there is often a trade-off between availability and consistency. Systems that prioritize high availability may sacrifice consistency, while systems that prioritize consistency may sacrifice availability. 
Different distributed systems use different approaches to balance the trade-off between availability and consistency, such as replication or consensus algorithms.\n\nHave a look at the following resources to learn more:", "links": [ { "title": "CAP FAQ", "url": "https://github.com/henryr/cap-faq", "type": "opensource" }, { "title": "CAP Theorem Revisited", "url": "https://robertgreiner.com/cap-theorem-revisited/", "type": "article" }, { "title": "A plain english introduction to CAP Theorem", "url": "http://ksat.me/a-plain-english-introduction-to-cap-theorem", "type": "article" }, { "title": "CAP Theorem", "url": "https://www.youtube.com/watch?v=_RbsFXWRZ10&t=1s", "type": "video" } ] }, "tcGdVQsCEobdV9hgOq3eG": { "title": "CAP Theorem", "description": "According to the CAP theorem, a distributed system can only support two of the following three guarantees:\n\n* **Consistency** - Every read receives the most recent write or an error\n* **Availability** - Every request receives a response, without guarantee that it contains the most recent version of the information\n* **Partition Tolerance** - The system continues to operate despite arbitrary partitioning due to network failures\n\nNetworks aren't reliable, so you'll need to support partition tolerance. That leaves a software trade-off between consistency and availability.\n\nCP - consistency and partition tolerance\n----------------------------------------\n\nWaiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.\n\nAP - availability and partition tolerance\n-----------------------------------------\n\nResponses return the most readily available version of the data on any node, which might not be the latest. 
Writes might take some time to propagate when the partition is resolved.\n\nAP is a good choice if the business needs to allow for [eventual consistency](https://github.com/donnemartin/system-design-primer#eventual-consistency) or when the system needs to continue working despite external errors.\n\nFor more information, have a look at the following resources:", "links": [ { "title": "CAP FAQ", "url": "https://github.com/henryr/cap-faq", "type": "opensource" }, { "title": "CAP theorem revisited", "url": "http://robertgreiner.com/2014/08/cap-theorem-revisited/", "type": "article" }, { "title": "A plain english introduction to CAP theorem", "url": "http://ksat.me/a-plain-english-introduction-to-cap-theorem", "type": "article" }, { "title": "The CAP theorem", "url": "https://www.youtube.com/watch?v=k-Yaq8AHlFA", "type": "video" } ] }, "GHe8V-REu1loRpDnHbyUn": { "title": "Consistency Patterns", "description": "Consistency patterns describe how data is replicated in a distributed system and when updates to that data become visible to users and applications. There are three main types of consistency patterns:\n\n* Strong consistency\n* Weak consistency\n* Eventual consistency\n\nEach of these patterns has its own advantages and disadvantages, and the choice of which pattern to use will depend on the specific requirements of the application or system.\n\nHave a look at the following resources to learn more:", "links": [ { "title": "Consistency Patterns in Distributed Systems", "url": "https://cs.fyi/guide/consistency-patterns-week-strong-eventual/", "type": "article" } ] }, "EKD5AikZtwjtsEYRPJhQ2": { "title": "Weak Consistency", "description": "After an update is made to the data, it is not guaranteed that any subsequent read operation will immediately reflect the changes made. 
The read may or may not see the recent write.\n\nTo learn more, visit the following links:", "links": [ { "title": "Consistency Patterns in Distributed Systems", "url": "https://cs.fyi/guide/consistency-patterns-week-strong-eventual/", "type": "article" } ] }, "rRDGVynX43inSeQ9lR_FS": { "title": "Eventual Consistency", "description": "Eventual consistency is a form of weak consistency. After an update is made to the data, it will eventually become visible to subsequent read operations. The data is replicated asynchronously, ensuring that all copies of the data are eventually updated.\n\nTo learn more, visit the following links:", "links": [ { "title": "Consistency Patterns in Distributed Systems", "url": "https://cs.fyi/guide/consistency-patterns-week-strong-eventual/", "type": "article" } ] }, "JjB7eB8gdRCAYf5M0RcT7": { "title": "Strong Consistency", "description": "After an update is made to the data, it will be immediately visible to any subsequent read operations. The data is replicated synchronously, ensuring that all copies of the data are updated before the write is considered complete.\n\nTo learn more, visit the following links:", "links": [ { "title": "Consistency Patterns in Distributed Systems", "url": "https://cs.fyi/guide/consistency-patterns-week-strong-eventual/", "type": "article" } ] }, "ezptoTqeaepByegxS5kHL": { "title": "Availability Patterns", "description": "Availability is measured as a percentage of uptime and defines the proportion of time that a system is functional and working. Availability is affected by system errors, infrastructure problems, malicious attacks, and system load. 
Cloud applications typically provide users with a service level agreement (SLA), which means that applications must be designed and implemented to maximize availability.", "links": [ { "title": "Availability Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#availability", "type": "article" } ] }, "L_jRfjvMGjFbHEbozeVQl": { "title": "Fail-Over", "description": "Failover is an availability pattern that is used to ensure that a system can continue to function in the event of a failure. It involves having a backup component or system that can take over in the event of a failure.\n\nIn a failover system, there is a primary component that is responsible for handling requests, and a secondary (or backup) component that is on standby. The primary component is monitored for failures, and if it fails, the secondary component is activated to take over its duties. This allows the system to continue functioning with minimal disruption.\n\nFailover can be implemented in various ways, such as active-passive, active-active, and hot-standby.\n\nActive-passive\n--------------\n\nWith active-passive fail-over, heartbeats are sent between the active and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active's IP address and resumes service.\n\nThe length of downtime is determined by whether the passive server is already running in 'hot' standby or whether it needs to start up from 'cold' standby. Only the active server handles traffic.\n\nActive-passive failover can also be referred to as master-slave failover.\n\nActive-active\n-------------\n\nIn active-active, both servers are managing traffic, spreading the load between them.\n\nIf the servers are public-facing, the DNS would need to know about the public IPs of both servers. 
If the servers are internal-facing, application logic would need to know about both servers.\n\nActive-active failover can also be referred to as master-master failover.\n\nDisadvantages of Failover\n-------------------------\n\n* Failover adds more hardware and additional complexity.\n* There is a potential for loss of data if the active system fails before any newly written data can be replicated to the passive server.\n\nTo learn more, visit the following links:", "links": [ { "title": "Fail Over Pattern - High Availability", "url": "https://www.filecloud.com/blog/2015/12/architectural-patterns-for-high-availability/", "type": "article" } ] }, "0RQ5jzZKdadYY0h_QZ0Bb": { "title": "Replication", "description": "Replication is an availability pattern that involves having multiple copies of the same data stored in different locations. In the event of a failure, the data can be retrieved from a different location. There are two main types of replication: Master-Master replication and Master-Slave replication.\n\n* **Master-Master replication:** In this type of replication, multiple servers are configured as \"masters,\" and each one can accept read and write operations. This provides high availability, because any of the servers can take over if one of them fails. However, this type of replication can lead to conflicts if multiple servers update the same data at the same time, so a conflict resolution mechanism is needed to handle this.\n \n* **Master-Slave replication:** In this type of replication, one server is designated as the \"master\" and handles all write operations, while multiple \"slave\" servers handle read operations. If the master fails, one of the slaves can be promoted to take its place. 
This type of replication is simpler to set up and maintain than Master-Master replication.\n \n\nVisit the following links for more resources:", "links": [ { "title": "Replication: Availability Pattern", "url": "https://github.com/donnemartin/system-design-primer#replication", "type": "opensource" } ] }, "uHdrZllrZFAnVkwIB3y5-": { "title": "Availability in Numbers", "description": "Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in the number of 9s: a service with 99.99% availability is described as having four 9s.\n\n99.9% Availability - Three 9s\n------------------------------\n\n Duration | Acceptable downtime\n ------------- | -------------\n Downtime per year | 8h 41min 38s\n Downtime per month | 43m 28s\n Downtime per week | 10m 4.8s\n Downtime per day | 1m 26s\n \n\n99.99% Availability - Four 9s\n-----------------------------\n\n Duration | Acceptable downtime\n ------------- | -------------\n Downtime per year | 52min 9.8s\n Downtime per month | 4m 21s\n Downtime per week | 1m 0.5s\n Downtime per day | 8.6s\n \n\nAvailability in parallel vs in sequence\n---------------------------------------\n\nIf a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.\n\n### In sequence\n\nOverall availability decreases when two components with availability < 100% are in sequence:\n\n Availability (Total) = Availability (Foo) * Availability (Bar)\n \n\nIf `Foo` and `Bar` each had 99.9% availability, their total availability in sequence would be approximately 99.8% (0.999 * 0.999 = 0.998001).\n\n### In parallel\n\nOverall availability increases when two components with availability < 100% are in parallel:\n\n Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))\n \n\nIf `Foo` and `Bar` each had 99.9% availability, their total availability in parallel would be 99.9999% (1 - 0.001 * 0.001 = 0.999999).\n\nTo 
learn more, visit the following links:", "links": [ { "title": "Availability in System Design", "url": "https://www.enjoyalgorithms.com/blog/availability-system-design-concept/", "type": "article" }, { "title": "Uptime calculator: How much downtime corresponds to 99.9 % uptime", "url": "https://uptime.is/", "type": "article" } ] }, "DOESIlBThd_wp2uOSd_CS": { "title": "Background Jobs", "description": "Background jobs in system design refer to tasks that are executed in the background, independently of the main execution flow of the system. These tasks typically run without direct user interaction and are started either by an event or on a schedule.\n\nBackground jobs can be used for a variety of purposes, such as:\n\n* Performing maintenance tasks, such as cleaning up old data, generating reports, or backing up the database.\n* Processing large volumes of data, such as data imports, exports, or transformations.\n* Sending notifications or messages, such as email or push notifications to users.\n* Performing long-running computations, such as machine learning or data analysis.\n\nLearn more from the following links:", "links": [ { "title": "Background Jobs - Best Practices", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/background-jobs", "type": "article" } ] }, "NEsPjQifNDlZJE-2YLVl1": { "title": "Event-Driven", "description": "Event-driven invocation uses a trigger to start the background task. Examples of using event-driven triggers include:\n\n* The UI or another job places a message in a queue. The message contains data about an action that has taken place, such as the user placing an order. The background task listens on this queue and detects the arrival of a new message. It reads the message and uses the data in it as the input to the background job. This pattern is known as asynchronous message-based communication.\n* The UI or another job saves or updates a value in storage. 
The background task monitors the storage and detects changes. It reads the data and uses it as the input to the background job.\n* The UI or another job makes a request to an endpoint, such as an HTTPS URI, or an API that is exposed as a web service. It passes the data that is required to complete the background task as part of the request. The endpoint or web service invokes the background task, which uses the data as its input.\n\nLearn more from the following links:", "links": [ { "title": "Background Jobs - Event Driven Triggers", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/background-jobs#event-driven-triggers", "type": "article" } ] }, "zoViI4kzpKIxpU20T89K_": { "title": "Schedule Driven", "description": "Schedule-driven invocation uses a timer to start the background task. Examples of using schedule-driven triggers include:\n\n* A timer that is running locally within the application or as part of the application's operating system invokes a background task on a regular basis.\n* A timer that is running in a different application, such as Azure Logic Apps, sends a request to an API or web service on a regular basis. 
The API or web service invokes the background task.\n* A separate process or application starts a timer that causes the background task to be invoked once after a specified time delay, or at a specific time.\n\nTypical examples of tasks that are suited to schedule-driven invocation include batch-processing routines (such as updating related-products lists for users based on their recent behavior), routine data processing tasks (such as updating indexes or generating accumulated results), data analysis for daily reports, data retention cleanup, and data consistency checks.\n\nLearn more from the following links:", "links": [ { "title": "Schedule Driven - Background Jobs", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/background-jobs#schedule-driven-triggers", "type": "article" } ] }, "2gRIstNT-fTkv5GZ692gx": { "title": "Returning Results", "description": "Background jobs execute asynchronously in a separate process, or even in a separate location, from the UI or the process that invoked the background task. Ideally, background tasks are \"fire and forget\" operations, and their execution progress has no impact on the UI or the calling process. This means that the calling process does not wait for completion of the tasks. Therefore, it cannot automatically detect when the task ends.\n\nLearn more from the following links:", "links": [ { "title": "Returning Results - Background Jobs", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/background-jobs#returning-results", "type": "article" } ] }, "Uk6J8JRcKVEFz4_8rLfnQ": { "title": "Domain Name System", "description": "A Domain Name System (DNS) translates a domain name such as [www.example.com](http://www.example.com) to an IP address.\n\nDNS is hierarchical, with a few authoritative servers at the top level. Your router or ISP provides information about which DNS server(s) to contact when doing a lookup. 
Lower-level DNS servers cache mappings, which can become stale due to DNS propagation delays. DNS results can also be cached by your browser or OS for a period of time determined by the time to live (TTL).\n\n* NS record (name server) - Specifies the DNS servers for your domain/subdomain.\n* MX record (mail exchange) - Specifies the mail servers for accepting messages.\n* A record (address) - Points a name to an IP address.\n* CNAME (canonical) - Points a name to another name or CNAME ([example.com](http://example.com) to [www.example.com](http://www.example.com)) or to an A record.\n\nServices such as [Cloudflare](https://www.cloudflare.com/dns/) and [Route 53](https://aws.amazon.com/route53/) provide managed DNS services. Some DNS services can route traffic through various methods:\n\n* [Weighted Round Robin](https://www.jscape.com/blog/load-balancing-algorithms)\n * Prevent traffic from going to servers under maintenance\n * Balance between varying cluster sizes\n * A/B testing\n* [Latency Based](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-latency)\n* [Geolocation Based](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy.html#routing-policy-geo)\n\nTo learn more, visit the following links:", "links": [ { "title": "Getting started with Domain Name System", "url": "https://github.com/donnemartin/system-design-primer#domain-name-system", "type": "opensource" }, { "title": "What is DNS?", "url": "https://www.cloudflare.com/learning/dns/what-is-dns/", "type": "article" } ] }, "O730v5Ww3ByAiBSs6fwyM": { "title": "Content Delivery Networks", "description": "A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from a CDN, although some CDNs such as Amazon's CloudFront support dynamic content. 
The site's DNS resolution will tell clients which server to contact.\n\nServing content from CDNs can significantly improve performance in two ways:\n\n* Users receive content from data centers close to them\n* Your servers do not have to serve requests that the CDN fulfills\n\nLearn more about CDNs from the following links:", "links": [ { "title": "Introduction to CDNs", "url": "https://github.com/donnemartin/system-design-primer#content-delivery-network", "type": "opensource" }, { "title": "The Differences Between Push And Pull CDNs", "url": "http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/", "type": "article" }, { "title": "Brief about Content delivery network", "url": "https://en.wikipedia.org/wiki/Content_delivery_network", "type": "article" } ] }, "uIerrf_oziiLg-KEyz8WM": { "title": "Push CDNs", "description": "Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.\n\nSites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.\n\nTo learn more, visit the following links:", "links": [ { "title": "Introduction to CDNs", "url": "https://github.com/donnemartin/system-design-primer#content-delivery-network", "type": "opensource" } ] }, "HkXiEMLqxJoQyAHav3ccL": { "title": "Pull CDNs", "description": "Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.\n\nA time-to-live (TTL) determines how long content is cached. 
Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed. Sites with heavy traffic work well with pull CDNs, as traffic is spread out more evenly with only recently-requested content remaining on the CDN.\n\nTo learn more, visit the following links:", "links": [ { "title": "Introduction to CDNs", "url": "https://github.com/donnemartin/system-design-primer#content-delivery-network", "type": "opensource" }, { "title": "The Differences Between Push And Pull CDNs", "url": "http://www.travelblogadvice.com/technical/the-differences-between-push-and-pull-cdns/", "type": "article" } ] }, "14KqLKgh090Rb3MDwelWY": { "title": "Load Balancers", "description": "Load balancers distribute incoming client requests to computing resources such as application servers and databases. In each case, the load balancer returns the response from the computing resource to the appropriate client. Load balancers are effective at:\n\n* Preventing requests from going to unhealthy servers\n* Preventing overloading resources\n* Helping to eliminate a single point of failure\n\nLoad balancers can be implemented with hardware (expensive) or with software such as HAProxy. 
Additional benefits include:\n\n* **SSL termination** - Decrypt incoming requests and encrypt server responses so backend servers do not have to perform these potentially expensive operations\n * Removes the need to install X.509 certificates on each server\n* **Session persistence** - Issue cookies and route a specific client's requests to the same instance if the web apps do not keep track of sessions\n\nDisadvantages of load balancers\n-------------------------------\n\n* The load balancer can become a performance bottleneck if it does not have enough resources or if it is not configured properly.\n* Introducing a load balancer to help eliminate a single point of failure results in increased complexity.\n* A single load balancer is a single point of failure; configuring multiple load balancers further increases complexity.\n\nTo learn more, visit the following links:", "links": [ { "title": "Scalability", "url": "https://cs.fyi/guide/scalability-for-dummies", "type": "article" }, { "title": "NGINX Architecture", "url": "https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/", "type": "article" }, { "title": "HAProxy Architecture Guide", "url": "http://www.haproxy.org/download/1.2/doc/architecture.txt", "type": "article" } ] }, "ocdcbhHrwjJX0KWgmsOL6": { "title": "LB vs Reverse Proxy", "description": "* Deploying a load balancer is useful when you have multiple servers. 
Often, load balancers route traffic to a set of servers serving the same function.\n* Reverse proxies can be useful even with just one web server or application server, opening up the benefits described in the previous section.\n* Solutions such as NGINX and HAProxy can support both layer 7 reverse proxying and load balancing.\n\nDisadvantages of reverse proxies\n--------------------------------\n\n* Introducing a reverse proxy results in increased complexity.\n* A single reverse proxy is a single point of failure; configuring multiple reverse proxies (i.e., a failover) further increases complexity.\n\nTo learn more, visit the following links:", "links": [ { "title": "Reverse Proxy vs Load Balancer", "url": "https://www.nginx.com/resources/glossary/reverse-proxy-vs-load-balancer/", "type": "article" }, { "title": "NGINX Architecture", "url": "https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/", "type": "article" }, { "title": "HAProxy Architecture Guide", "url": "http://www.haproxy.org/download/1.2/doc/architecture.txt", "type": "article" }, { "title": "Reverse Proxy", "url": "https://en.wikipedia.org/wiki/Reverse_proxy", "type": "article" } ] }, "urSjLyLTE5IIz0TFxMBWL": { "title": "Load Balancing Algorithms", "description": "A load balancer is a software or hardware device that keeps any one server from becoming overloaded. A load balancing algorithm is the logic that a load balancer uses to distribute network traffic between servers (an algorithm is a set of predefined rules).\n\nThere are two primary approaches to load balancing. Dynamic load balancing uses algorithms that take into account the current state of each server and distribute traffic accordingly. Static load balancing distributes traffic without making these adjustments. 
Some static algorithms send an equal amount of traffic to each server in a group, either in a specified order or at random.\n\nTo learn more, visit the following links:", "links": [ { "title": "Types of Load Balancing Algorithms", "url": "https://www.cloudflare.com/learning/performance/types-of-load-balancing-algorithms/", "type": "article" }, { "title": "Explore top posts about Algorithms", "url": "https://app.daily.dev/tags/algorithms?ref=roadmapsh", "type": "article" } ] }, "e69-JVbDj7dqV_p1j1kML": { "title": "Layer 7 Load Balancing", "description": "Layer 7 load balancers look at the application layer to decide how to distribute requests. This can involve the contents of the header, message, and cookies. Layer 7 load balancers terminate network traffic, read the message, make a load-balancing decision, then open a connection to the selected server. For example, a layer 7 load balancer can direct video traffic to servers that host videos while directing more sensitive user billing traffic to security-hardened servers.\n\nAt the cost of flexibility, layer 4 load balancing requires less time and computing resources than layer 7, although the performance impact can be minimal on modern commodity hardware.", "links": [] }, "MpM9rT1-_LGD7YbnBjqOk": { "title": "Layer 4 Load Balancing", "description": "Layer 4 load balancers look at information at the transport layer to decide how to distribute requests. Generally, this involves the source and destination IP addresses and ports in the header, but not the contents of the packet. Layer 4 load balancers forward network packets to and from the upstream server, performing Network Address Translation (NAT).", "links": [] }, "IkUCfSWNY-02wg2WCo1c6": { "title": "Horizontal Scaling", "description": "Load balancers can also help with horizontal scaling, improving performance and availability. 
Scaling out using commodity machines is more cost-efficient and results in higher availability than scaling up a single server on more expensive hardware, which is called vertical scaling. It is also easier to hire talent for commodity hardware than for specialized enterprise systems.\n\nDisadvantages of horizontal scaling\n-----------------------------------\n\n* Scaling horizontally introduces complexity and involves cloning servers\n * Servers should be stateless: they should not contain any user-related data like sessions or profile pictures\n * Sessions can be stored in a centralized data store such as a database (SQL, NoSQL) or a shared cache (Redis, Memcached)\n* Downstream servers such as caches and databases need to handle more simultaneous connections as upstream servers scale out.", "links": [] }, "XXuzTrP5UNVwSpAk-tAGr": { "title": "Application Layer", "description": "Separating out the web layer from the application layer (also known as the platform layer) allows you to scale and configure both layers independently. Adding a new API results in adding application servers without necessarily adding additional web servers. The single responsibility principle advocates for small and autonomous services that work together. 
Small teams with small services can plan more aggressively for rapid growth.\n\n![](https://i.imgur.com/F0cjurv.png)\n\nDisadvantages\n-------------\n\n* Adding an application layer with loosely coupled services requires a different approach from an architectural, operations, and process viewpoint (compared with a monolithic system).\n* Microservices can add complexity in terms of deployments and operations.\n\nFor more resources, visit the following links:", "links": [ { "title": "Intro to architecting systems for scale", "url": "http://lethain.com/introduction-to-architecting-systems-for-scale/#platform_layer", "type": "article" } ] }, "UKTiaHCzYXnrNw31lHriv": { "title": "Microservices", "description": "Related to the \"Application Layer\" discussion are microservices, which can be described as a suite of independently deployable, small, modular services. Each service runs a unique process and communicates through a well-defined, lightweight mechanism to serve a business goal.\n\nPinterest, for example, could have the following microservices: user profile, follower, feed, search, photo upload, etc.\n\nTo learn more, visit the following links:", "links": [ { "title": "Introduction to Microservices", "url": "https://aws.amazon.com/microservices/", "type": "article" }, { "title": "Microservices - Wikipedia", "url": "https://en.wikipedia.org/wiki/Microservices", "type": "article" }, { "title": "Microservices", "url": "https://martinfowler.com/articles/microservices.html", "type": "article" }, { "title": "Explore top posts about Microservices", "url": "https://app.daily.dev/tags/microservices?ref=roadmapsh", "type": "article" } ] }, "Nt0HUWLOl4O77elF8Is1S": { "title": "Service Discovery", "description": "Systems such as [Consul](https://www.consul.io/docs/index.html), [Etcd](https://coreos.com/etcd/docs/latest), and [Zookeeper](http://www.slideshare.net/sauravhaloi/introduction-to-apache-zookeeper) can help services find each other by keeping track of registered names, 
addresses, and ports. [Health checks](https://www.consul.io/intro/getting-started/checks.html) help verify service integrity and are often done using an HTTP endpoint. Both Consul and Etcd have a built-in key-value store that can be useful for storing config values and other shared data.\n\nVisit the following links to learn more:", "links": [ { "title": "Intro to Service Discovery", "url": "https://github.com/donnemartin/system-design-primer#Service-Discovery", "type": "opensource" }, { "title": "What is Service-oriented architecture?", "url": "https://en.wikipedia.org/wiki/Service-oriented_architecture", "type": "article" }, { "title": "Explore top posts about Architecture", "url": "https://app.daily.dev/tags/architecture?ref=roadmapsh", "type": "article" } ] }, "5FXwwRMNBhG7LT5ub6t2L": { "title": "Databases", "description": "Picking the right database for a system is an important decision, as it can have a significant impact on the performance, scalability, and overall success of the system. Some of the key reasons why it's important to pick the right database include:\n\n* Performance: Different databases have different performance characteristics, and choosing the wrong one can lead to poor performance and slow response times.\n* Scalability: As the system grows and the volume of data increases, the database needs to be able to scale accordingly. 
Some databases are better suited for handling large amounts of data than others.\n* Data Modeling: Different databases have different data modeling capabilities and choosing the right one can help to keep the data consistent and organized.\n* Data Integrity: Different databases have different capabilities for maintaining data integrity, such as enforcing constraints, and can have different levels of data security.\n* Support and maintenance: Some databases have more active communities and better documentation, making it easier to find help and resources.\n\nOverall, by choosing the right database, you can ensure that your system will perform well, scale as needed, and be maintainable in the long run.", "links": [ { "title": "Explore top posts about Backend Development", "url": "https://app.daily.dev/tags/backend?ref=roadmapsh", "type": "article" }, { "title": "Scaling up to your first 10 million users", "url": "https://www.youtube.com/watch?v=kKjm4ehYiMs", "type": "video" } ] }, "KLnpMR2FxlQkCHZP6-tZm": { "title": "SQL vs NoSQL", "description": "SQL databases, such as MySQL and PostgreSQL, are best suited for structured, relational data and use a fixed schema. They provide robust ACID (Atomicity, Consistency, Isolation, Durability) transactions and support complex queries and joins.\n\nNoSQL databases, such as MongoDB and Cassandra, are best suited for unstructured, non-relational data and use a flexible schema. They provide high scalability and performance for large amounts of data and are often used in big data and real-time web applications.\n\nThe choice between SQL and NoSQL depends on the specific use case and requirements of the project. If you need to store and query structured data with complex relationships, an SQL database is likely a better choice. 
If you need to store and query large amounts of unstructured data with high scalability and performance, a NoSQL database may be a better choice.\n\nLearn more from the following links:", "links": [ { "title": "SQL vs NoSQL: The Differences", "url": "https://www.sitepoint.com/sql-vs-nosql-differences/", "type": "article" }, { "title": "SQL vs. NoSQL Databases: What’s the Difference?", "url": "https://www.ibm.com/blog/sql-vs-nosql/", "type": "article" }, { "title": "NoSQL vs. SQL Databases", "url": "https://www.mongodb.com/nosql-explained/nosql-vs-sql", "type": "article" }, { "title": "Explore top posts about NoSQL", "url": "https://app.daily.dev/tags/nosql?ref=roadmapsh", "type": "article" } ] }, "dc-aIbBwUdlwgwQKGrq49": { "title": "Replication", "description": "Replication is the process of copying data from one database to another. Replication is used to increase the availability and scalability of databases. Two common types of replication are master-slave and master-master.\n\nMaster-slave Replication:\n-------------------------\n\nThe master serves reads and writes, replicating writes to one or more slaves, which serve only reads. Slaves can also replicate to additional slaves in a tree-like fashion. If the master goes offline, the system can continue to operate in read-only mode until a slave is promoted to a master or a new master is provisioned.\n\nMaster-master Replication:\n--------------------------\n\nBoth masters serve reads and writes and coordinate with each other on writes. If either master goes down, the system can continue to operate with both reads and writes.", "links": [] }, "FX6dcV_93zOfbZMdM_-li": { "title": "Sharding", "description": "Sharding distributes data across different databases such that each database can only manage a subset of the data. 
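One common way to decide which shard owns a record is to hash its key. The following is a minimal Python sketch that assumes users are sharded by user ID across four hypothetical shards. The shard names are placeholders, not real configuration.

```python
import hashlib

# Hypothetical shard names; in practice these would map to database connections.
SHARDS = ['users_db_0', 'users_db_1', 'users_db_2', 'users_db_3']


def shard_for(user_id, shards=SHARDS):
    # Hash the key so users spread evenly, then map the hash onto a shard.
    # md5 is used as a stable hash: Python's built-in hash() of a string
    # is randomized per process, so it could route a user differently after a restart.
    digest = hashlib.md5(str(user_id).encode('utf-8')).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Every read and write for a given user is routed to the same shard.
assert shard_for(42) == shard_for(42)
print(shard_for(42), shard_for(1001))
```

With plain modulo hashing, changing the number of shards remaps most keys, so adding shards means moving a lot of data. Consistent hashing is a common alternative that moves only a fraction of the keys when the cluster grows.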
In a users database, for example, more shards are added to the cluster as the number of users increases.\n\nAs with federation, sharding results in less read and write traffic per database, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you'll want to add some form of replication to avoid data loss. Also like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.\n\nLearn more from the following links:", "links": [ { "title": "The coming of the Shard", "url": "http://highscalability.com/blog/2009/8/6/an-unorthodox-approach-to-database-design-the-coming-of-the.html", "type": "article" }, { "title": "Shard (database architecture)", "url": "https://en.wikipedia.org/wiki/Shard_(database_architecture)", "type": "article" }, { "title": "Explore top posts about Backend Development", "url": "https://app.daily.dev/tags/backend?ref=roadmapsh", "type": "article" } ] }, "DGmVRI7oWdSOeIUn_g0rI": { "title": "Federation", "description": "Federation (or functional partitioning) splits up databases by function. For example, instead of a single, monolithic database, you could have three databases: forums, users, and products, resulting in less read and write traffic to each database and therefore less replication lag. Smaller databases allow a larger fraction of the data to fit in memory, which in turn results in more cache hits due to improved cache locality. With no single central master serializing writes, you can write in parallel, increasing throughput.", "links": [] }, "Zp9D4--DgtlAjE2nIfaO_": { "title": "Denormalization", "description": "Denormalization attempts to improve read performance at the expense of some write performance. Redundant copies of the data are written in multiple tables to avoid expensive joins. 
Some RDBMSs, such as PostgreSQL and Oracle, support materialized views, which store the redundant information and take on the work of keeping it consistent with the source tables (PostgreSQL, for example, updates a materialized view when it is explicitly refreshed).\n\nOnce data becomes distributed with techniques such as federation and sharding, managing joins across data centers further increases complexity. Denormalization might circumvent the need for such complex joins.\n\nTo learn more, visit the following links:", "links": [ { "title": "Denormalization", "url": "https://en.wikipedia.org/wiki/Denormalization", "type": "article" } ] }, "fY8zgbB13wxZ1CFtMSdZZ": { "title": "SQL Tuning", "description": "SQL tuning is a broad topic and many books have been written as reference. It's important to benchmark and profile to simulate and uncover bottlenecks.\n\n* Benchmark - Simulate high-load situations with tools such as ab (ApacheBench).\n* Profile - Enable tools such as the slow query log to help track performance issues.\n\nBenchmarking and profiling might point you to optimizations such as tightening up the schema, using appropriate indexes, avoiding expensive joins, and partitioning tables.\n\nTo learn more, visit the following links:", "links": [ { "title": "Optimizing MySQL Queries", "url": "https://aiddroid.com/10-tips-optimizing-mysql-queries-dont-suck/", "type": "article" }, { "title": "How we optimized PostgreSQL queries 100x", "url": "https://towardsdatascience.com/how-we-optimized-postgresql-queries-100x-ff52555eabe?gi=13caf5bcf32e", "type": "article" }, { "title": "Explore top posts about SQL", "url": "https://app.daily.dev/tags/sql?ref=roadmapsh", "type": "article" } ] }, "KFtdmmce4bRkDyvFXZzLN": { "title": "Key-Value Store", "description": "A key-value store generally allows for `O(1)` reads and writes and is often backed by memory or SSD. Some key-value stores maintain keys in lexicographic order, allowing efficient retrieval of key ranges. Key-value stores can also allow metadata to be stored with a value.\n\nKey-value stores provide high performance and are often used for simple data models or for rapidly-changing data, such as an in-memory cache layer. 
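To illustrate the two properties mentioned above (fast point lookups and ordered key ranges), here is a toy in-memory key-value store in Python. It is a hypothetical sketch, not how any particular store is built. A dict provides average O(1) get and put, and a separately maintained sorted key list supports range scans. Real ordered stores typically rely on structures such as B-trees or LSM trees instead.

```python
import bisect


class SortedKeyValueStore:
    # Toy store: dict for point lookups plus a sorted key list for range scans.
    def __init__(self):
        self._data = {}
        self._keys = []  # kept in lexicographic order

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # O(n) insert keeps the keys sorted
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None if the key is absent

    def scan(self, start, end):
        # Return (key, value) pairs with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]


store = SortedKeyValueStore()
for key in ['user:003', 'user:001', 'order:17', 'user:002']:
    store.put(key, key.upper())

# '~' sorts after digits and letters in ASCII, so it bounds the 'user:' prefix here.
print(store.scan('user:', 'user:~'))
# [('user:001', 'USER:001'), ('user:002', 'USER:002'), ('user:003', 'USER:003')]
```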
Since key-value stores offer only a limited set of operations, complexity is shifted to the application layer if additional operations are needed.\n\nTo learn more, visit the following links:", "links": [ { "title": "Key–value database", "url": "https://en.wikipedia.org/wiki/Key%E2%80%93value_database", "type": "article" }, { "title": "What are the disadvantages of using a key/value table?", "url": "https://stackoverflow.com/questions/4056093/what-are-the-disadvantages-of-using-a-key-value-table-over-nullable-columns-or", "type": "article" } ] }, "didEznSlVHqqlijtyOSr3": { "title": "Document Store", "description": "A document store is centered around documents (XML, JSON, binary, etc.), where a document stores all information for a given object. Document stores provide APIs or a query language to query based on the internal structure of the document itself. Note that many key-value stores include features for working with a value's metadata, blurring the lines between these two storage types.\n\nBased on the underlying implementation, documents are organized by collections, tags, metadata, or directories. Although documents can be organized or grouped together, documents may have fields that are completely different from each other.\n\nTo learn more, visit the following links:", "links": [ { "title": "Document-oriented database", "url": "https://en.wikipedia.org/wiki/Document-oriented_database", "type": "article" } ] }, "WHq1AdISkcgthaugE9uY7": { "title": "Wide Column Store", "description": "A wide column store's basic unit of data is a column (name/value pair). A column can be grouped in column families (analogous to a SQL table). Super column families further group column families. You can access each column independently with a row key, and columns with the same row key form a row. 
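The nesting described above can be sketched with plain Python dicts: row key, then column family, then column, with each value kept alongside a timestamp. This is a hypothetical model for intuition only, not the storage format of Bigtable, HBase, or Cassandra. Returning the value with the newest timestamp is just one simple conflict-resolution rule.

```python
import time

# row key -> column family -> column name -> list of (timestamp, value) versions
table = {}


def put(row_key, family, column, value, ts=None):
    ts = time.time() if ts is None else ts
    columns = table.setdefault(row_key, {}).setdefault(family, {})
    columns.setdefault(column, []).append((ts, value))  # keep every version


def get(row_key, family, column):
    # Return the value with the newest timestamp (last write wins), or None.
    versions = table.get(row_key, {}).get(family, {}).get(column, [])
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put('user#42', 'profile', 'name', 'Ada', ts=1)
put('user#42', 'profile', 'name', 'Ada L.', ts=2)
put('user#42', 'activity', 'last_login', '2024-01-01', ts=1)
print(get('user#42', 'profile', 'name'))  # Ada L.
print(sorted(table['user#42']))           # ['activity', 'profile']
```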
Each value contains a timestamp for versioning and for conflict resolution.\n\nGoogle introduced Bigtable as the first wide column store, which influenced the open-source HBase, often used in the Hadoop ecosystem, and Cassandra from Facebook. Stores such as Bigtable and HBase maintain row keys in lexicographic order, allowing efficient retrieval of selective key ranges; Cassandra keeps rows sorted by their clustering columns within each partition.\n\nLearn more from the following links:", "links": [ { "title": "Bigtable architecture", "url": "https://www.read.seas.harvard.edu/~kohler/class/cs239-w08/chang06bigtable.pdf", "type": "article" } ] }, "6RLgnL8qLBzYkllHeaI-Z": { "title": "Graph Databases", "description": "In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.\n\nGraph databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely used; it might be more difficult to find development tools and resources. Many graph databases can only be accessed with REST APIs.\n\nLearn more from the following links:", "links": [ { "title": "Graph database", "url": "https://en.wikipedia.org/wiki/Graph_database", "type": "article" }, { "title": "Explore top posts about Backend Development", "url": "https://app.daily.dev/tags/backend?ref=roadmapsh", "type": "article" }, { "title": "Introduction to NoSQL", "url": "https://www.youtube.com/watch?v=qI_g07C_Q5I", "type": "video" } ] }, "-X4g8kljgVBOBcf1DDzgi": { "title": "Caching", "description": "Caching is the process of storing frequently accessed data in a temporary storage location, called a cache, in order to quickly retrieve it without the need to query the original data source. 
This can improve the performance of an application by reducing the number of times a data source must be accessed.\n\nThere are several caching strategies:\n\n* Refresh Ahead\n* Write-behind\n* Write-through\n* Cache Aside\n\nCaches can also live in several places, for example:\n\n* Client Caching\n* CDN Caching\n* Web Server Caching\n* Database Caching\n* Application Caching\n\nTo learn more, visit the following links:", "links": [ { "title": "Caching Strategies", "url": "https://medium.com/@mmoshikoo/cache-strategies-996e91c80303", "type": "article" } ] }, "Bgqgl67FK56ioLNFivIsc": { "title": "Refresh Ahead", "description": "You can configure the cache to automatically refresh any recently accessed cache entry prior to its expiration.\n\nRefresh-ahead can result in reduced latency compared with read-through if the cache can accurately predict which items are likely to be needed in the future.\n\nDisadvantage of refresh-ahead:\n------------------------------\n\n* Not accurately predicting which items are likely to be needed in the future can result in worse performance than not using refresh-ahead at all.\n\n![](https://i.imgur.com/sBXb7lb.png)\n\nTo learn more, visit the following links:", "links": [ { "title": "From cache to in-memory data grid", "url": "http://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast", "type": "article" } ] }, "vNndJ-MWetcbaF2d-3-JP": { "title": "Write-behind", "description": "In write-behind, the application does the following:\n\n* Add/update entry in cache\n* Asynchronously write entry to the data store, improving write performance\n\nDisadvantages of write-behind:\n------------------------------\n\n* There could be data loss if the cache goes down before its contents reach the data store.\n* It is more complex to implement write-behind than it is to implement cache-aside or write-through.\n\n![Scalability, availability, stability, patterns](https://i.imgur.com/XDsb7RS.png)\n\nTo learn more, 
visit the following links:", "links": [ { "title": "Scalability, availability, stability, patterns", "url": "http://www.slideshare.net/jboner/scalability-availability-stability-patterns/", "type": "article" } ] }, "RNITLR1FUQWkRbSBXTD_z": { "title": "Write-through", "description": "The application uses the cache as the main data store, reading and writing data to it, while the cache is responsible for reading and writing to the database:\n\n* Application adds/updates entry in cache\n* Cache synchronously writes entry to data store\n* Return\n\nApplication code:\n\n    set_user(12345, {\"foo\": \"bar\"})\n\nCache code:\n\n    def set_user(user_id, values):\n        # Synchronously persist the write to the data store (db and cache are assumed clients)\n        db.query(\"UPDATE users SET data = {0} WHERE id = {1}\", values, user_id)\n        # Then store the same values in the cache so reads of just-written data are fast and fresh\n        cache.set(user_id, values)\n\nWrite-through is slow overall because every write must synchronously update both the cache and the data store, but subsequent reads of just-written data are fast. Users are generally more tolerant of latency when updating data than when reading it. Data in the cache is not stale.\n\nDisadvantages\n-------------\n\n* When a new node is created due to failure or scaling, the new node will not cache entries until the entry is updated in the database. Cache-aside in conjunction with write-through can mitigate this issue.\n* Most data written might never be read, which can be minimized with a TTL.\n\n![Write through](https://i.imgur.com/Ujf0awN.png)\n\nHave a look at the following resources to learn more:", "links": [ { "title": "Scalability, availability, stability, patterns", "url": "http://www.slideshare.net/jboner/scalability-availability-stability-patterns/", "type": "article" } ] }, "bffJlvoLHFldS0CluWifP": { "title": "Cache Aside", "description": "The application is responsible for reading and writing from storage. The cache does not interact with storage directly. 
The application does the following:\n\n* Look for entry in cache, resulting in a cache miss\n* Load entry from the database\n* Add entry to cache\n* Return entry\n\n    import json\n\n    def get_user(self, user_id):\n        key = \"user.{0}\".format(user_id)\n        user = cache.get(key)  # None on a cache miss (cache and db are assumed clients)\n        if user is None:\n            # Cache miss: load the entry from the database, then add it to the cache\n            user = db.query(\"SELECT * FROM users WHERE user_id = {0}\", user_id)\n            if user is not None:\n                cache.set(key, json.dumps(user))\n        else:\n            user = json.loads(user)  # cached entries are stored as JSON strings\n        return user\n\n[Memcached](https://memcached.org/) is generally used in this manner. Subsequent reads of data added to the cache are fast. Cache-aside is also referred to as lazy loading. Only requested data is cached, which avoids filling up the cache with data that isn't requested.\n\n![Cache Aside](https://i.imgur.com/Ujf0awN.png)\n\nTo learn more, have a look at the following resources:", "links": [ { "title": "From cache to in-memory data grid", "url": "https://www.slideshare.net/tmatyashovsky/from-cache-to-in-memory-data-grid-introduction-to-hazelcast", "type": "article" } ] }, "RHNRb6QWiGvCK3KQOPK3u": { "title": "Client Caching", "description": "Client-side caching refers to the practice of storing frequently accessed data on the client's device rather than the server. This type of caching can help improve the performance of an application by reducing the number of times the client needs to request data from the server.\n\nOne common example of client-side caching is web browsers caching frequently accessed web pages and resources. When a user visits a web page, the browser stores a copy of the page and its resources (such as images, stylesheets, and scripts) in the browser's cache. If the user visits the same page again, the browser can retrieve the cached version of the page and its resources instead of requesting them from the server, which can reduce the load time of the page.\n\nAnother example of client-side caching is application-level caching. 
Some applications, such as mobile apps, can cache data on the client's device to improve performance and reduce the amount of data that needs to be transferred over the network.\n\nClient-side caching has advantages such as reduced server load, faster page loads, and less network traffic. However, it also has drawbacks, such as the potential for stale data if the client-side cache is not properly managed, and the memory or disk space it consumes on the client's device.\n\nLearn more from the following links:", "links": [ { "title": "HTTP Caching", "url": "https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching", "type": "article" } ] }, "Kisvxlrjb7XnKFCOdxRtb": { "title": "CDN Caching", "description": "A Content Delivery Network (CDN) is a distributed network of servers that are strategically placed in various locations around the world. The main purpose of a CDN is to serve content to end-users with high availability and high performance by caching frequently accessed content on servers that are closer to the end-users.\n\nWhen a user requests content from a website that is using a CDN, the CDN will first check if the requested content is available in the cache of a nearby server. If the content is found in the cache, it is served to the user from the nearby server. 
If the content is not found in the cache, it is requested from the origin server (the original source of the content) and then cached on the nearby server for future requests.\n\nCDN caching can significantly improve the performance and availability of a website by reducing the distance that data needs to travel, reducing the load on the origin server, and allowing for faster delivery of content to end-users.", "links": [] }, "o532nPnL-d2vXJn9k6vMl": { "title": "Web Server Caching", "description": "[Reverse proxies](https://github.com/donnemartin/system-design-primer#reverse-proxy-web-server) and caches such as [Varnish](https://www.varnish-cache.org/) can serve static and dynamic content directly. Web servers can also cache responses to requests, returning them without having to contact application servers.", "links": [] }, "BeIg4jzbij2cwc_a_VpYG": { "title": "Database Caching", "description": "Your database usually includes some level of caching in a default configuration, optimized for a generic use case. Tweaking these settings for specific usage patterns can further boost performance. Database caching is like having a quick-access memory for frequently used data in applications. Here's a simplified explanation:\n\n1. **Quick Access**: Imagine you're looking up information in a big library (the database). Instead of going to the library every time you need the same book (data), you keep a copy of it on your desk (cache).\n \n2. **Faster Retrieval**: When you need that book again, you first check your desk (cache). If it's there, great! You get it right away without going to the library (database) again.\n \n3. **Saving Time**: If the book isn't on your desk (cache miss), you go to the library (database) to get it. But you make sure to put a copy on your desk for next time, so you won't have to go to the library again if you need it soon.\n \n4. **Different Types**: There are different ways to do this caching. 
You can cache the results of searches (like bookmarking), whole pieces of information (like keeping a paper copy), or even entire web pages (like saving a snapshot).\n \n5. **Benefits**: By keeping frequently used data close by, you save time and reduce the strain on the library (database). It's like having your most-used books right at your fingertips, making your work faster and more efficient.\n \n\nHowever, it's important to keep the cached data up to date. Otherwise, you might end up with outdated information, like using an old edition of a book instead of the latest one. So, managing this cache properly is key to keeping things running smoothly.", "links": [] }, "5Ux_JBDOkflCaIm4tVBgO": { "title": "Application Caching", "description": "In-memory caches such as Memcached and Redis are key-value stores between your application and your data storage. Because the data is held in RAM, access is much faster than in typical databases, where data is stored on disk. RAM is more limited than disk, so [cache replacement](https://en.wikipedia.org/wiki/Cache_algorithms) algorithms such as [least recently used (LRU)](https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_\\(LRU\\)) can help evict 'cold' entries and keep 'hot' data in RAM.\n\nRedis has the following additional features:\n\n* Persistence option\n* Built-in data structures such as sorted sets and lists\n\nGenerally, you should try to avoid file-based caching, as it makes cloning and auto-scaling more difficult.\n\nVisit the following links to learn more:", "links": [ { "title": "Intro to Application Caching", "url": "https://github.com/donnemartin/system-design-primer#application-caching", "type": "opensource" } ] }, "84N4XY31PwXRntXX1sdCU": { "title": "Asynchronism", "description": "Asynchronous workflows help reduce request times for expensive operations that would otherwise be performed in-line. 
They can also help by doing time-consuming work in advance, such as periodic aggregation of data.\n\nTo learn more, visit the following links:", "links": [ { "title": "Asynchronous Thinking for Microservice System Design", "url": "https://www.datamachines.io/blog/asynchronous-thinking-for-microservice-system-design", "type": "article" }, { "title": "Patterns for microservices - Sync vs Async", "url": "https://medium.com/inspiredbrilliance/patterns-for-microservices-e57a2d71ff9e", "type": "article" }, { "title": "Applying back pressure when overloaded", "url": "http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html", "type": "article" }, { "title": "Little's law", "url": "https://en.wikipedia.org/wiki/Little%27s_law", "type": "article" }, { "title": "What is the difference between a message queue and a task queue?", "url": "https://www.quora.com/What-is-the-difference-between-a-message-queue-and-a-task-queue-Why-would-a-task-queue-require-a-message-broker-like-RabbitMQ-Redis-Celery-or-IronMQ-to-function", "type": "article" }, { "title": "It's all a numbers game", "url": "https://www.youtube.com/watch?v=1KRYH75wgy4", "type": "video" } ] }, "YiYRZFE_zwPMiCZxz9FnP": { "title": "Back Pressure", "description": "If queues start to grow significantly, the queue size can become larger than memory, resulting in cache misses, disk reads, and even slower performance. [Back pressure](http://mechanical-sympathy.blogspot.com/2012/05/apply-back-pressure-when-overloaded.html) can help by limiting the queue size, thereby maintaining a high throughput rate and good response times for jobs already in the queue. Once the queue fills up, clients receive a server-busy response or an HTTP 503 (Service Unavailable) status code telling them to try again later. 
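Here is a minimal single-process Python sketch of the idea. It assumes a hypothetical submit() entry point in front of a background worker. The standard-library queue.Queue is given a maxsize, and once it is full, new jobs are rejected with a 503-style status instead of being buffered. The backoff helper shows one way a client could space out its retries; the base delay and cap are arbitrary example values.

```python
import queue
import random

# Bounded queue: at most 100 jobs may wait; beyond that, new work is rejected
# rather than letting the backlog grow past available memory.
jobs = queue.Queue(maxsize=100)


def submit(job):
    # Returns an HTTP-style status code describing what happened to the job.
    try:
        jobs.put_nowait(job)
        return 202  # Accepted: queued for background processing
    except queue.Full:
        return 503  # Service Unavailable: the client should retry later


def backoff_delays(attempts=5, base=0.1, cap=10.0):
    # Exponential backoff with jitter: before retry n, wait a random time
    # between 0 and min(cap, base * 2**n) seconds.
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


statuses = [submit(job_id) for job_id in range(101)]
print(statuses.count(202), statuses[-1])  # 100 503
```

Rejecting early keeps latency predictable for the jobs already queued. The randomized delays help prevent many rejected clients from retrying in lockstep.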
Clients can retry the request at a later time, perhaps with [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff).", "links": [] }, "a9wGW_H1HpvvdYCXoS-Rf": { "title": "Task Queues", "description": "Task queues receive tasks and their related data, run them, then deliver their results. They can support scheduling and can be used to run computationally intensive jobs in the background.\n\n[Celery](https://docs.celeryproject.org/en/stable/) has support for scheduling and primarily supports Python.\n\nTo learn more, visit the following links:", "links": [ { "title": "Celery - Distributed Task Queue", "url": "https://docs.celeryq.dev/en/stable/", "type": "article" } ] }, "37X1_9eCmkZkz5RDudE5N": { "title": "Message Queues", "description": "Message queues receive, hold, and deliver messages. If an operation is too slow to perform inline, you can use a message queue with the following workflow:\n\n* An application publishes a job to the queue, then notifies the user of job status\n* A worker picks up the job from the queue, processes it, then signals the job is complete\n\nThe user is not blocked and the job is processed in the background. During this time, the client might optionally do a small amount of processing to make it seem like the task has completed. 
For example, when you post a tweet, it could appear on your timeline instantly, but it could take some time before it is actually delivered to all of your followers.\n\n* [Redis](https://redis.io/) is useful as a simple message broker but messages can be lost.\n* [RabbitMQ](https://www.rabbitmq.com/) is popular but requires you to adapt to the AMQP protocol and manage your own nodes.\n* [AWS SQS](https://aws.amazon.com/sqs/) is hosted but can have high latency and has the possibility of messages being delivered twice.\n* [Apache Kafka](https://kafka.apache.org/) is a distributed event store and stream-processing platform.\n\nTo learn more, visit the following links:", "links": [ { "title": "What is Redis?", "url": "https://redis.io/", "type": "article" }, { "title": "RabbitMQ in Message Queues", "url": "https://www.rabbitmq.com/", "type": "article" }, { "title": "Overview of Amazon SQS", "url": "https://aws.amazon.com/sqs/", "type": "article" }, { "title": "Apache Kafka", "url": "https://kafka.apache.org/", "type": "article" }, { "title": "RabbitMQ for beginners", "url": "https://www.cloudamqp.com/blog/part1-rabbitmq-for-beginners-what-is-rabbitmq.html", "type": "article" } ] }, "3pRi8M4xQXsehkdfUNtYL": { "title": "Idempotent Operations", "description": "Idempotent operations are operations that can be applied multiple times without changing the result beyond the initial application. In other words, if an operation is idempotent, it will have the same effect whether it is executed once or multiple times.\n\nIt is also important to understand the benefits of [idempotent](https://en.wikipedia.org/wiki/Idempotence#Computer_science_meaning) operations, especially when using message or task queues that do not guarantee _exactly once_ processing. Many queueing systems guarantee _at least once_ message delivery or processing. 
These systems are not completely synchronized (for instance, across geographic regions), which simplifies some aspects of their implementation or design. Designing the operations that a task queue executes to be idempotent allows you to use a queueing system that has accepted this design trade-off.\n\nTo learn more, visit the following links:", "links": [ { "title": "What is an idempotent operation?", "url": "https://stackoverflow.com/questions/1077412/what-is-an-idempotent-operation", "type": "article" }, { "title": "Overview of Idempotent Operation", "url": "https://www.baeldung.com/cs/idempotent-operations", "type": "article" } ] }, "uQFzD_ryd-8Dr1ppjorYJ": { "title": "Communication", "description": "Network protocols are a key part of systems today, because no system exists in isolation: systems need to communicate with each other. You should learn about networking protocols such as HTTP, TCP, and UDP, as well as API styles and frameworks such as RPC, REST, GraphQL, and gRPC.", "links": [] }, "I_nR6EwjNXSG7_hw-_VhX": { "title": "HTTP", "description": "HTTP is a method for encoding and transporting data between a client and a server. It is a request/response protocol: clients issue requests and servers issue responses with relevant content and completion status info about the request. HTTP is self-contained, allowing requests and responses to flow through many intermediate routers and servers that perform load balancing, caching, encryption, and compression.\n\nA basic HTTP request consists of a verb (method) and a resource (endpoint). 
Below are common HTTP verbs:\n\n| Verb | Description | Idempotent | Safe | Cacheable |\n| --- | --- | --- | --- | --- |\n| GET | Reads a resource | Yes | Yes | Yes |\n| POST | Creates a resource or triggers an action | No | No | Yes, if the response contains freshness info |\n| PUT | Creates or replaces a resource | Yes | No | No |\n| PATCH | Partially updates a resource | No | No | Yes, if the response contains freshness info |\n| DELETE | Deletes a resource | Yes | No | No |\n\nHTTP is an application-layer protocol relying on lower-level protocols such as TCP and UDP.", "links": [ { "title": "Everything you need to know about HTTP", "url": "https://cs.fyi/guide/http-in-depth", "type": "article" }, { "title": "What Is HTTP?", "url": "https://www.nginx.com/resources/glossary/http/", "type": "article" }, { "title": "What is the difference between HTTP protocol and TCP protocol?", "url": "https://www.quora.com/What-is-the-difference-between-HTTP-protocol-and-TCP-protocol", "type": "article" } ] }, "2nF5uC6fYKbf0RFgGNHiP": { "title": "TCP", "description": "TCP is a connection-oriented protocol over an [IP network](https://en.wikipedia.org/wiki/Internet_Protocol). A connection is established and terminated using a [handshake](https://en.wikipedia.org/wiki/Handshaking). All packets sent are guaranteed to reach the destination in the original order and without corruption through:\n\n* Sequence numbers and [checksum fields](https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Checksum_computation) for each packet\n* [Acknowledgement](https://en.wikipedia.org/wiki/Acknowledgement_\\(data_networks\\)) packets and automatic retransmission\n\nIf the sender does not receive a correct response, it will resend the packets. If there are multiple timeouts, the connection is dropped. TCP also implements [flow control](https://en.wikipedia.org/wiki/Flow_control_\\(data\\)) and congestion control. 
These guarantees cause delays and generally result in less efficient transmission than UDP.\n\nTo ensure high throughput, web servers can keep a large number of TCP connections open, resulting in high memory usage. It can be expensive to have a large number of open connections between web server threads and, say, a [memcached server](https://memcached.org/). [Connection pooling](https://en.wikipedia.org/wiki/Connection_pool) can help, in addition to switching to UDP where applicable.\n\nTCP is useful for applications that require high reliability but are less time-critical. Some examples include web servers, database info, SMTP, FTP, and SSH.\n\nUse TCP over UDP when:\n\n* You need all of the data to arrive intact\n* You want the protocol to automatically make the best use of available network throughput\n\nTo learn more, visit the following links:", "links": [ { "title": "What Is TCP?", "url": "https://github.com/donnemartin/system-design-primer#transmission-control-protocol-tcp", "type": "opensource" }, { "title": "What is the difference between HTTP protocol and TCP protocol?", "url": "https://www.quora.com/What-is-the-difference-between-HTTP-protocol-and-TCP-protocol", "type": "article" }, { "title": "Networking for game programming", "url": "http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/", "type": "article" }, { "title": "Key differences between TCP and UDP protocols", "url": "http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/", "type": "article" }, { "title": "Difference between TCP and UDP", "url": "http://stackoverflow.com/questions/5970383/difference-between-tcp-and-udp", "type": "article" }, { "title": "Transmission control protocol", "url": "https://en.wikipedia.org/wiki/Transmission_Control_Protocol", "type": "article" }, { "title": "User datagram protocol", "url": "https://en.wikipedia.org/wiki/User_Datagram_Protocol", "type": "article" }, { "title": "Scaling memcache at Facebook", "url": 
"http://www.cs.bu.edu/~jappavoo/jappavoo.github.com/451/papers/memcache-fb.pdf", "type": "article" } ] }, "LC5aTmUKNiw9RuSUt3fSE": { "title": "UDP", "description": "UDP is connectionless. Datagrams (analogous to packets) are guaranteed only at the datagram level. Datagrams might reach their destination out of order or not at all. UDP does not support congestion control. Without the guarantees that TCP supports, UDP is generally more efficient.\n\nUDP can broadcast, sending datagrams to all devices on the subnet. This is useful with DHCP because the client has not yet received an IP address, so it cannot yet establish a TCP connection.\n\nUDP is less reliable but works well in real-time use cases such as VoIP, video chat, streaming, and real-time multiplayer games.\n\nUse UDP over TCP when:\n\n* You need the lowest latency\n* Late data is worse than loss of data\n* You want to implement your own error correction\n\nTo learn more, visit the following links:", "links": [ { "title": "Networking for game programming", "url": "http://gafferongames.com/networking-for-game-programmers/udp-vs-tcp/", "type": "article" }, { "title": "Key differences between TCP and UDP protocols", "url": "http://www.cyberciti.biz/faq/key-differences-between-tcp-and-udp-protocols/", "type": "article" }, { "title": "Difference between TCP and UDP", "url": "http://stackoverflow.com/questions/5970383/difference-between-tcp-and-udp", "type": "article" }, { "title": "Transmission control protocol", "url": "https://en.wikipedia.org/wiki/Transmission_Control_Protocol", "type": "article" }, { "title": "User datagram protocol", "url": "https://en.wikipedia.org/wiki/User_Datagram_Protocol", "type": "article" }, { "title": "Scaling memcache at Facebook", "url": "http://www.cs.bu.edu/~jappavoo/jappavoo.github.com/451/papers/memcache-fb.pdf", "type": "article" } ] }, "ixqucoAkgnphWYAFnsMe-": { "title": "RPC", "description": "In an RPC, a client causes a procedure to execute on a different 
address space, usually a remote server. The procedure is coded as if it were a local procedure call, abstracting away the details of how to communicate with the server from the client program. Remote calls are usually slower and less reliable than local calls, so it is helpful to distinguish RPC calls from local calls. Popular RPC frameworks include [Protobuf](https://developers.google.com/protocol-buffers/), [Thrift](https://thrift.apache.org/), and [Avro](https://avro.apache.org/docs/current/).\n\nRPC is a request-response protocol:\n\n* Client program - Calls the client stub procedure. The parameters are pushed onto the stack like a local procedure call.\n* Client stub procedure - Marshals (packs) the procedure id and arguments into a request message.\n* Client communication module - The OS sends the message from the client to the server.\n* Server communication module - The OS passes the incoming packets to the server stub procedure.\n* Server stub procedure - Unmarshals the arguments, calls the server procedure matching the procedure id, and passes it the given arguments.\n* The server response repeats the steps above in reverse order.\n\nSample RPC calls:\n\n```\nGET /someoperation?data=anId\n\nPOST /anotheroperation\n{\n  \"data\": \"anId\",\n  \"anotherdata\": \"another value\"\n}\n```\n\nRPC is focused on exposing behaviors. RPCs are often used for performance reasons with internal communications, as you can hand-craft native calls to better fit your use cases.\n\nDisadvantages of RPC\n--------------------\n\n* RPC clients become tightly coupled to the service implementation.\n* A new API must be defined for every new operation or use case.\n* It can be difficult to debug RPC.\n* You might not be able to leverage existing technologies out of the box. 
For example, it might require additional effort to ensure [RPC calls are properly cached](http://etherealbits.com/2012/12/debunking-the-myths-of-rpc-rest/) on caching servers such as [Squid](http://www.squid-cache.org/).\n\nTo learn more, visit the following links:", "links": [ { "title": "What Is RPC?", "url": "https://github.com/donnemartin/system-design-primer#remote-procedure-call-rpc", "type": "opensource" }, { "title": "Explore top posts about Backend Development", "url": "https://app.daily.dev/tags/backend?ref=roadmapsh", "type": "article" } ] }, "6-bgmfDTAQ9zABhpmVoHV": { "title": "REST", "description": "REST is an architectural style enforcing a client/server model where the client acts on a set of resources managed by the server. The server provides a representation of resources and actions that can either manipulate or get a new representation of resources. All communication must be stateless and cacheable.\n\nThere are four qualities of a RESTful interface:\n\n* Identify resources (URI in HTTP) - use the same URI regardless of any operation.\n* Change with representations (Verbs in HTTP) - use verbs, headers, and body.\n* Self-descriptive error message (status response in HTTP) - use status codes; don't reinvent the wheel.\n* HATEOAS (HTML interface for HTTP) - your web service should be fully accessible in a browser.\n\nREST is focused on exposing data. It minimizes the coupling between client and server and is often used for public HTTP APIs. REST uses a more generic and uniform method of exposing resources through URIs, representation through headers, and actions through verbs such as GET, POST, PUT, DELETE, and PATCH. 
Because it is stateless, REST is well suited to horizontal scaling and partitioning.\n\nTo learn more, visit the following links:", "links": [ { "title": "What Is REST?", "url": "https://github.com/donnemartin/system-design-primer#representational-state-transfer-rest", "type": "opensource" }, { "title": "What are the drawbacks of using RESTful APIs?", "url": "https://www.quora.com/What-are-the-drawbacks-of-using-RESTful-APIs", "type": "article" }, { "title": "Explore top posts about REST API", "url": "https://app.daily.dev/tags/rest-api?ref=roadmapsh", "type": "article" } ] }, "Hw2v1rCYn24qxBhhmdc28": { "title": "gRPC", "description": "gRPC is a high-performance, open-source framework for building remote procedure call (RPC) APIs. It is based on the Protocol Buffers data serialization format and supports a variety of programming languages, including C#, Java, and Python.\n\nLearn more from the following links:", "links": [ { "title": "What Is gRPC?", "url": "https://www.wallarm.com/what/the-concept-of-grpc", "type": "article" }, { "title": "Explore top posts about gRPC", "url": "https://app.daily.dev/tags/grpc?ref=roadmapsh", "type": "article" } ] }, "jwv2g2Yeq-6Xv5zSd746R": { "title": "GraphQL", "description": "GraphQL is a query language and runtime for building APIs. It allows clients to define the structure of the data they need, and the server returns exactly that. 
This is in contrast to traditional REST APIs, where the server exposes a fixed set of endpoints and the client must work with the data as it is returned.\n\nTo learn more, visit the following links:", "links": [ { "title": "GraphQL Server", "url": "https://www.howtographql.com/basics/3-big-picture/", "type": "article" }, { "title": "What is GraphQL?", "url": "https://www.redhat.com/en/topics/api/what-is-graphql", "type": "article" }, { "title": "Explore top posts about GraphQL", "url": "https://app.daily.dev/tags/graphql?ref=roadmapsh", "type": "article" } ] }, "p--uEm6klLx_hKxKJiXE5": { "title": "Performance Antipatterns", "description": "Performance antipatterns in system design are common mistakes or suboptimal practices that can lead to poor performance in a system. These patterns can occur at different levels of the system and can be caused by a variety of factors, such as poor design, lack of optimization, or a poor understanding of the workload.\n\nSome examples of performance antipatterns include:\n\n* **N+1 queries:** This occurs when a system runs one query to fetch a list of records and then one additional query per record to fetch related data, instead of using a single query (for example, a join) to retrieve all the necessary data.\n* **Chatty interfaces:** This occurs when a system makes too many small and frequent requests to an external service or API, instead of making fewer, larger requests.\n* **Unbounded data:** This occurs when a system retrieves or processes more data than is necessary for the task at hand, leading to increased resource usage and reduced performance.\n* **Inefficient algorithms:** This occurs when a system uses an algorithm that is not well suited to the task at hand, leading to increased resource usage and reduced performance.\n\nLearn more from the following links:", "links": [ { "title": "Performance antipatterns for cloud applications", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/", "type": "article" }, { "title": "Explore top posts about 
Performance", "url": "https://app.daily.dev/tags/performance?ref=roadmapsh", "type": "article" } ] }, "hxiV2uF7tvhZKe4K-4fTn": { "title": "Busy Database", "description": "A busy database in system design is a database that is handling a high volume of requests or transactions. This can occur when a system is experiencing high traffic or when a database is not properly optimized for the workload it is handling, and it can lead to performance degradation, increased resource utilization, deadlocks and contention, and data inconsistencies. To address a busy database, a number of approaches can be taken, such as scaling out, optimizing the schema, caching, and indexing.\n\nTo learn more, visit the following links:", "links": [ { "title": "Busy Database antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/busy-database/", "type": "article" }, { "title": "Explore top posts about Database", "url": "https://app.daily.dev/tags/database?ref=roadmapsh", "type": "article" } ] }, "i_2M3VloG-xTgWDWp4ngt": { "title": "Busy Frontend", "description": "Performing asynchronous work on a large number of background threads can starve other concurrent foreground tasks of resources, increasing response times to unacceptable levels.\n\nResource-intensive tasks can increase the response times for user requests and cause high latency. One way to improve response times is to offload a resource-intensive task to a separate thread. This approach lets the application stay responsive while processing happens in the background. However, tasks that run on a background thread still consume resources. 
If there are too many of them, they can starve the threads that are handling requests.\n\nThis problem typically occurs when an application is developed as a monolithic piece of code, with all of the business logic combined into a single tier shared with the presentation layer.\n\nTo learn more about this antipattern and how to fix it, visit the following links:", "links": [ { "title": "Busy Front End antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/busy-front-end/", "type": "article" }, { "title": "Explore top posts about Frontend Development", "url": "https://app.daily.dev/tags/frontend?ref=roadmapsh", "type": "article" } ] }, "0IzQwuYi_E00bJwxDuw2B": { "title": "Chatty I/O", "description": "The cumulative effect of a large number of I/O requests can have a significant impact on performance and responsiveness.\n\nNetwork calls and other I/O operations are inherently slow compared to compute tasks. Each I/O request typically has significant overhead, and the cumulative effect of numerous I/O operations can slow down the system. Here are some common causes of chatty I/O:\n\n* Reading and writing individual records to a database as distinct requests\n* Implementing a single logical operation as a series of HTTP requests\n* Reading and writing to a file on disk in many small operations\n\nTo learn more, visit the following link:", "links": [ { "title": "Chatty I/O antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/chatty-io/", "type": "article" } ] }, "6u3XmtJFWyJnyZUnJcGYb": { "title": "Extraneous Fetching", "description": "Extraneous fetching in system design refers to the practice of retrieving more data than is needed for a specific task or operation. 
This can occur when a system is not optimized for the specific workload or when the system is not properly designed to handle the data requirements.\n\nExtraneous fetching can lead to a number of issues, such as:\n\n* Performance degradation\n* Increased resource utilization\n* Increased network traffic\n* Poor user experience\n\nVisit the following link to learn more:", "links": [ { "title": "Extraneous Fetching antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/extraneous-fetching/", "type": "article" } ] }, "lwMs4yiUHF3nQwcvauers": { "title": "Improper Instantiation", "description": "Improper instantiation in system design refers to the practice of creating unnecessary instances of an object, class, or service, which can lead to performance and scalability issues. For example, an application might create a new HTTP client or database connection for every request instead of reusing a shared instance. This can happen when the system is not properly designed, when the code is not written in an efficient way, or when the code is not optimized for the specific use case.\n\nLearn more from the following link:", "links": [ { "title": "Improper Instantiation antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/improper-instantiation/", "type": "article" } ] }, "p1QhCptnwzTGUXVMnz_Oz": { "title": "Monolithic Persistence", "description": "Monolithic Persistence refers to the use of a single, monolithic database to store all of the data for an application or system. This approach can work for simple, small-scale systems, but as the system grows and evolves, it can become a bottleneck, resulting in poor scalability, limited flexibility, and increased complexity. 
To address these limitations, a number of approaches can be taken, such as microservices, sharding, and NoSQL databases.\n\nTo learn more, visit the following link:", "links": [ { "title": "Monolithic Persistence antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/monolithic-persistence/", "type": "article" } ] }, "klvHk1_e03Jarn5T46QNi": { "title": "No Caching", "description": "The No Caching antipattern occurs when a cloud application that handles many concurrent requests repeatedly fetches the same data. This can reduce performance and scalability.\n\nWhen data is not cached, it can cause a number of undesirable behaviors, including:\n\n* Repeatedly fetching the same information from a resource that is expensive to access, in terms of I/O overhead or latency.\n* Repeatedly constructing the same objects or data structures for multiple requests.\n* Making excessive calls to a remote service that has a service quota and throttles clients past a certain limit.\n\nIn turn, these problems can lead to poor response times, increased contention in the data store, and poor scalability.", "links": [ { "title": "No Caching antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/no-caching/", "type": "article" } ] }, "r7uQxmurvfsYtTCieHqly": { "title": "Noisy Neighbor", "description": "Noisy neighbor refers to a situation in which one or more components of a system are utilizing a disproportionate amount of shared resources, leading to resource contention and reduced performance for other components. 
This can occur when a system is not properly designed or configured to handle the workload, or when a component is behaving unexpectedly.\n\nExamples of noisy neighbor scenarios include:\n\n* One user on a shared server utilizing a large amount of CPU or memory, leading to reduced performance for other users on the same server.\n* One process on a shared server utilizing a large amount of I/O, causing other processes to experience slow I/O and increased latency.\n* One application consuming a large amount of network bandwidth, causing other applications to experience reduced throughput.\n\nLearn more from the following link:", "links": [ { "title": "Noisy Neighbor antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/noisy-neighbor/noisy-neighbor", "type": "article" } ] }, "LNmAJmh2ndFtOQIpvX_ga": { "title": "Retry Storm", "description": "A retry storm is a situation in which a large number of retries are triggered in a short period of time, leading to a significant increase in traffic and resource usage. This can occur when a system is not properly designed to handle failures or when a component is behaving unexpectedly. The result can be performance degradation, increased resource utilization, increased network traffic, and a poor user experience. 
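One widely used safeguard against retry storms is exponential backoff with jitter: after each failure, the client waits a random delay drawn from a window that doubles with every attempt, so clients that failed together do not all retry together. The Python sketch below is illustrative only. The function `retry_with_backoff`, its parameters and default values, and the `flaky_dependency` example are hypothetical, and `sleep` is injectable so the delays can be inspected without waiting.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    # Invoke call() until it succeeds or max_attempts attempts have failed.
    # Errors outside `retryable` are not caught and propagate immediately.
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # The backoff window doubles after every failure, capped at max_delay.
            window = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: a random delay in [0, window] de-synchronizes clients
            # that failed at the same moment, so they do not retry in lockstep.
            sleep(random.uniform(0, window))


# Hypothetical dependency that fails twice with a transient error, then recovers.
attempts = {'count': 0}


def flaky_dependency():
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise ConnectionError('transient failure')
    return 'ok'


print(retry_with_backoff(flaky_dependency))
```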
To address retry storms, a number of approaches can be taken, such as exponential backoff, circuit breaking, and monitoring and alerting.\n\nTo learn more, visit the following links:", "links": [ { "title": "Retry Storm antipattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/antipatterns/retry-storm/", "type": "article" }, { "title": "How To Avoid Retry Storms In Distributed Systems", "url": "https://faun.pub/how-to-avoid-retry-storms-in-distributed-systems-91bf34f43c7f", "type": "article" } ] }, "Ihnmxo_bVgZABDwg1QGGk": { "title": "Synchronous I/O", "description": "Blocking the calling thread while I/O completes can reduce performance and affect vertical scalability.\n\nA synchronous I/O operation blocks the calling thread while the I/O completes. The calling thread enters a wait state and is unable to perform useful work during this interval, wasting processing resources.\n\nCommon examples of I/O include:\n\n* Retrieving or persisting data to a database or any type of persistent storage.\n* Sending a request to a web service.\n* Posting a message or retrieving a message from a queue.\n* Writing to or reading from a local file.\n\nThis antipattern typically occurs because:", "links": [] }, "hDFYlGFYwcwWXLmrxodFX": { "title": "Monitoring", "description": "Distributed applications and services running in the cloud are, by their nature, complex pieces of software that comprise many moving parts. In a production environment, it's important to be able to track the way in which users use your system, trace resource utilization, and generally monitor the health and performance of your system. 
You can use this information as a diagnostic aid to detect and correct issues, and also to help spot potential problems and prevent them from occurring.\n\nVisit the following to learn more:", "links": [ { "title": "Monitoring and Diagnostics Guidance", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "hkjYvLoVt9xKDzubm0Jy3": { "title": "Health Monitoring", "description": "A system is healthy if it is running and capable of processing requests. The purpose of health monitoring is to generate a snapshot of the current health of the system so that you can verify that all components of the system are functioning as expected.\n\nLearn more from the following:", "links": [ { "title": "Health Monitoring of a System", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#health-monitoring", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "rVrwaioGURvrqNBufF2dj": { "title": "Availability Monitoring", "description": "A truly healthy system requires that the components and subsystems that compose the system are available. Availability monitoring is closely related to health monitoring. 
But whereas health monitoring provides an immediate view of the current health of the system, availability monitoring is concerned with tracking the availability of the system and its components to generate statistics about the uptime of the system.\n\nLearn more from the following:", "links": [ { "title": "Availability Monitoring", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#availability-monitoring", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "x1i3qWFtNNjd00-kAvFHw": { "title": "Performance Monitoring", "description": "As the system is placed under more and more stress (by increasing the volume of users), the size of the datasets that these users access grows, and failure of one or more components becomes more likely. Frequently, component failure is preceded by a decrease in performance. If you're able to detect such a decrease, you can take proactive steps to remedy the situation.\n\nLearn more from the following links:", "links": [ { "title": "Performance Monitoring", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#performance-monitoring", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "I_NfmDcBph8-oyFVFTknL": { "title": "Security Monitoring", "description": "All commercial systems that include sensitive data must implement a security structure. The complexity of the security mechanism is usually a function of the sensitivity of the data. 
In a system that requires users to be authenticated, you should record:\n\n* All sign-in attempts, whether they fail or succeed.\n* All operations performed by an authenticated user, and the details of all resources that the user accesses.\n* When a user ends a session and signs out.\n\nMonitoring might be able to help detect attacks on the system. For example, a large number of failed sign-in attempts might indicate a brute-force attack. An unexpected surge in requests might be the result of a distributed denial-of-service (DDoS) attack. You must be prepared to monitor all requests to all resources regardless of the source of these requests. A system that has a sign-in vulnerability might accidentally expose resources to the outside world without requiring a user to actually sign in.\n\nVisit the following to learn more:", "links": [ { "title": "Security Monitoring", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#security-monitoring", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "eSZq74lROh5lllLyTBK5a": { "title": "Usage Monitoring", "description": "Usage monitoring tracks how the features and components of an application are used. An operator can use the gathered data to:\n\n* Determine which features are heavily used and identify any potential hotspots in the system. High-traffic elements might benefit from functional partitioning or even replication to spread the load more evenly. An operator can also use this information to ascertain which features are infrequently used and are possible candidates for retirement or replacement in a future version of the system.\n* Obtain information about the operational events of the system under normal use. For example, in an e-commerce site, you can record the statistical information about the number of transactions and the volume of customers that are responsible for them. 
This information can be used for capacity planning as the number of customers grows.\n* Detect (possibly indirectly) user satisfaction with the performance or functionality of the system. For example, if a large number of customers in an e-commerce system regularly abandon their shopping carts, this might be due to a problem with the checkout functionality.\n* Generate billing information. A commercial application or multitenant service might charge customers for the resources that they use.\n* Enforce quotas. If a user in a multitenant system exceeds their paid quota of processing time or resource usage during a specified period, their access can be limited or processing can be throttled.\n\nLearn more from the following links:", "links": [ { "title": "Usage Monitoring", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#usage-monitoring", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "Q0fKphqmPwjTD0dhqiP6K": { "title": "Instrumentation", "description": "Instrumentation is a critical part of the monitoring process. You can make meaningful decisions about the performance and health of a system only if you first capture the data that enables you to make these decisions. The information that you gather by using instrumentation should be sufficient to enable you to assess performance, diagnose problems, and make decisions without requiring you to sign in to a remote production server to perform tracing (and debugging) manually. 
Instrumentation data typically comprises metrics and information that's written to trace logs.\n\nLearn more from the following links:", "links": [ { "title": "Instrumenting an application", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#instrumenting-an-application", "type": "article" }, { "title": "Instrumenting using OpenTelemetry", "url": "https://opentelemetry.io/docs/concepts/what-is-opentelemetry", "type": "article" } ] }, "IwMOTpsYHApdvHZOhXtIw": { "title": "Visualization & Alerts", "description": "An important aspect of any monitoring system is the ability to present the data in such a way that an operator can quickly spot any trends or problems. Also important is the ability to quickly inform an operator if a significant event has occurred that might require attention.\n\nLearn more from the following link:", "links": [ { "title": "Visualize Data and Raise Alerts", "url": "https://learn.microsoft.com/en-us/azure/architecture/best-practices/monitoring#visualizing-data-and-raising-alerts", "type": "article" } ] }, "THlzcZTNnPGLRiHPWT-Jv": { "title": "Cloud Design Patterns", "description": "Cloud design patterns are solutions to common problems that arise when building systems that run on a cloud platform. These patterns provide a way to design and implement systems that can take advantage of the unique characteristics of the cloud, such as scalability, elasticity, and pay-per-use pricing. 
Some common cloud design patterns address scalability, elasticity, fault tolerance, microservices, serverless computing, data management, front-end and back-end separation, and hybrid deployments.\n\nTo learn more, visit the following links:", "links": [ { "title": "Cloud Design Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/", "type": "article" }, { "title": "Explore top posts about Cloud", "url": "https://app.daily.dev/tags/cloud?ref=roadmapsh", "type": "article" } ] }, "dsWpta3WIBvv2K9pNVPo0": { "title": "Messaging", "description": "Messaging is a pattern that allows for the communication and coordination between different components or systems, using messaging technologies such as message queues, message brokers, and event buses. This pattern decouples the sender from the receiver and can be used to build scalable and flexible systems.\n\nLearn more from the following link:", "links": [ { "title": "Messaging Cloud Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/category/messaging", "type": "article" } ] }, "VgvUWAC6JYFyPZKBRoEqf": { "title": "Sequential Convoy", "description": "Sequential Convoy is a pattern that allows for the execution of a series of tasks, or convoy, in a specific order. This pattern can be used to ensure that a set of dependent tasks are executed in the correct order and to handle errors or failures during the execution of the tasks. It can be used in scenarios such as workflows and transactions. 
It can be implemented using a variety of technologies such as state machines, workflows, and transactions.\n\nLearn more from the following links:", "links": [ { "title": "What is Sequential Convoy?", "url": "https://learn.microsoft.com/en-us/biztalk/core/sequential-convoys", "type": "article" }, { "title": "Overview - Sequential Convoy pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/sequential-convoy", "type": "article" } ] }, "uR1fU6pm7zTtdBcNgSRi4": { "title": "Scheduler Agent Supervisor", "description": "Coordinate a set of distributed actions as a single operation. If any of the actions fail, try to handle the failures transparently, or else undo the work that was performed, so the entire operation succeeds or fails as a whole. This can add resiliency to a distributed system by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.\n\nLearn more from the following links:", "links": [ { "title": "Scheduler Agent Supervisor pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/scheduler-agent-supervisor", "type": "article" } ] }, "LncTxPg-wx8loy55r5NmV": { "title": "Queue-Based Load Leveling", "description": "Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. 
This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.\n\nLearn more from the following links:", "links": [ { "title": "Queue-Based Load Leveling pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling", "type": "article" } ] }, "2ryzJhRDTo98gGgn9mAxR": { "title": "Publisher/Subscriber", "description": "Enable an application to announce events to multiple interested consumers asynchronously, without coupling the senders to the receivers.\n\nLearn more from the following links:", "links": [ { "title": "Publisher-Subscriber pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/publisher-subscriber", "type": "article" } ] }, "DZcZEOi7h3u0744YhASet": { "title": "Priority Queue", "description": "Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those with a lower priority. This pattern is useful in applications that offer different service level guarantees to individual clients.\n\nLearn more from the following links:", "links": [ { "title": "Priority Queue pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/priority-queue", "type": "article" } ] }, "siXdR3TB9-4wx_qWieJ5w": { "title": "Pipes and Filters", "description": "Decompose a task that performs complex processing into a series of separate elements that can be reused. 
This can improve performance, scalability, and reusability by allowing task elements that perform the processing to be deployed and scaled independently.\n\nLearn more from the following links:", "links": [ { "title": "Pipes and Filters pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/pipes-and-filters", "type": "article" } ] }, "9Ld07KLOqP0ICtXEjngYM": { "title": "Competing Consumers", "description": "Enable multiple concurrent consumers to process messages received on the same messaging channel. With multiple concurrent consumers, a system can process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload.\n\nLearn more from the following links:", "links": [ { "title": "Competing Consumers pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/competing-consumers", "type": "article" } ] }, "aCzRgUkVBvtHUeLU6p5ZH": { "title": "Choreography", "description": "Have each component of the system participate in the decision-making process about the workflow of a business transaction, instead of relying on a central point of control.\n\nLearn more from the following links:", "links": [ { "title": "Choreography pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/choreography", "type": "article" } ] }, "kl4upCnnZvJSf2uII1Pa0": { "title": "Claim Check", "description": "Split a large message into a claim check and a payload. Send the claim check to the messaging platform and store the payload in an external service. This pattern allows large messages to be processed while protecting the message bus and the client from being overwhelmed or slowed down. 
This pattern also helps to reduce costs, as storage is usually cheaper than resource units used by the messaging platform.\n\nLearn more from the following links:", "links": [ { "title": "Claim Check - Cloud Design patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/claim-check", "type": "article" } ] }, "eNFNXPsFiryVxFe4unVxk": { "title": "Async Request Reply", "description": "Decouple backend processing from a frontend host, where backend processing needs to be asynchronous, but the frontend still needs a clear response.\n\nLearn more from the following links:", "links": [ { "title": "Asynchronous Request-Reply pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/async-request-reply", "type": "article" } ] }, "W0cUCrhiwH_Nrzxw50x3L": { "title": "Data Management", "description": "Data management is the key element of cloud applications, and influences most of the quality attributes. Data is typically hosted in different locations and across multiple servers for reasons such as performance, scalability or availability, and this can present a range of challenges. For example, data consistency must be maintained, and data will typically need to be synchronized across different locations.\n\nLearn more from the following links:", "links": [ { "title": "Data management patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/category/data-management", "type": "article" }, { "title": "Explore top posts about Data Management", "url": "https://app.daily.dev/tags/data-management?ref=roadmapsh", "type": "article" } ] }, "stZOcr8EUBOK_ZB48uToj": { "title": "Valet Key", "description": "Use a token that provides clients with restricted direct access to a specific resource, in order to offload data transfer from the application. 
This is particularly useful in applications that use cloud-hosted storage systems or queues, and can minimize cost and maximize scalability and performance.\n\nLearn more from the following links:", "links": [ { "title": "Valet Key pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/valet-key", "type": "article" } ] }, "-lKq-LT7EPK7r3xbXLgwS": { "title": "Static Content Hosting", "description": "Deploy static content to a cloud-based storage service that can deliver it directly to the client. This can reduce the need for potentially expensive compute instances.\n\nLearn more from the following links:", "links": [ { "title": "Static Content Hosting pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/static-content-hosting", "type": "article" } ] }, "R6YehzA3X6DDo6oGBoBAx": { "title": "Sharding", "description": "Sharding is a technique used to horizontally partition a large data set across multiple servers to improve the performance, scalability, and availability of a system. This is done by breaking the data set into smaller chunks, called shards, and distributing the shards across multiple servers. Each shard is self-contained and can be managed and scaled independently of the other shards. Sharding is commonly used to improve scalability and availability and to support geo-distribution. 
Sharding can be implemented using several different algorithms such as range-based sharding, hash-based sharding, and directory-based sharding.\n\nLearn more from the following links:", "links": [ { "title": "Sharding pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/sharding", "type": "article" }, { "title": "Explore top posts about Backend Development", "url": "https://app.daily.dev/tags/backend?ref=roadmapsh", "type": "article" } ] }, "WB7vQ4IJ0TPh2MbZvxP6V": { "title": "Materialized View", "description": "Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations. This can help support efficient querying and data extraction, and improve application performance.\n\nLearn more from the following links:", "links": [ { "title": "Materialized View pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/materialized-view", "type": "article" } ] }, "AH0nVeVsfYOjcI3vZvcdz": { "title": "Index Table", "description": "Create indexes over the fields in data stores that are frequently referenced by queries. This pattern can improve query performance by allowing applications to more quickly locate the data to retrieve from a data store.\n\nLearn more from the following links:", "links": [ { "title": "Index Table pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/index-table", "type": "article" } ] }, "7OgRKlwFqrk3XO2z49EI1": { "title": "Event Sourcing", "description": "Instead of storing just the current state of the data in a domain, use an append-only store to record the full series of actions taken on that data. The store acts as the system of record and can be used to materialize the domain objects. This can simplify tasks in complex domains, by avoiding the need to synchronize the data model and the business domain, while improving performance, scalability, and responsiveness. 
It can also provide consistency for transactional data, and maintain full audit trails and history that can enable compensating actions.\n\nLearn more from the following links:", "links": [ { "title": "Event Sourcing pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing", "type": "article" }, { "title": "Explore top posts about Architecture", "url": "https://app.daily.dev/tags/architecture?ref=roadmapsh", "type": "article" } ] }, "LTD3dn05c0ruUJW0IQO7z": { "title": "CQRS", "description": "CQRS stands for Command and Query Responsibility Segregation, a pattern that separates read and update operations for a data store. Implementing CQRS in your application can maximize its performance, scalability, and security. The flexibility created by migrating to CQRS allows a system to better evolve over time and prevents update commands from causing merge conflicts at the domain level.\n\nLearn more from the following links:", "links": [ { "title": "CQRS pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs", "type": "article" } ] }, "PK4V9OWNVi8StdA2N13X2": { "title": "Cache-Aside", "description": "Load data on demand into a cache from a data store. This can improve performance and also helps to maintain consistency between data held in the cache and data in the underlying data store.\n\nLearn more from the following links:", "links": [ { "title": "Cache-Aside pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/cache-aside", "type": "article" } ] }, "PtJ7-v1VCLsyaWWYHYujV": { "title": "Design & Implementation", "description": "Good design encompasses factors such as consistency and coherence in component design and deployment, maintainability to simplify administration and development, and reusability to allow components and subsystems to be used in other applications and in other scenarios. 
Decisions made during the design and implementation phase have a huge impact on the quality and the total cost of ownership of cloud-hosted applications and services.\n\nTo learn more, visit the following links:", "links": [ { "title": "Design and implementation patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/category/design-implementation", "type": "article" } ] }, "VIbXf7Jh9PbQ9L-g6pHUG": { "title": "Strangler Fig", "description": "Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services. As features from the legacy system are replaced, the new system eventually replaces all of the old system's features, strangling the old system and allowing you to decommission it.\n\nTo learn more, visit the following links:", "links": [ { "title": "What is Strangler Fig?", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/strangler-fig", "type": "article" } ] }, "izPT8NfJy1JC6h3i7GJYl": { "title": "Static Content Hosting", "description": "Deploy static content to a cloud-based storage service that can deliver it directly to the client. This can reduce the need for potentially expensive compute instances.\n\nLearn more from the following links:", "links": [ { "title": "Static Content Hosting pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/static-content-hosting", "type": "article" } ] }, "AAgOGrra5Yz3_eG6tD2Fx": { "title": "Sidecar", "description": "Deploy components of an application into a separate process or container to provide isolation and encapsulation. This pattern can also enable applications to be composed of heterogeneous components and technologies.\n\nThis pattern is named Sidecar because it resembles a sidecar attached to a motorcycle. In the pattern, the sidecar is attached to a parent application and provides supporting features for the application. 
The sidecar also shares the same lifecycle as the parent application, being created and retired alongside the parent. The sidecar pattern is sometimes referred to as the sidekick pattern and is a decomposition pattern.\n\nTo learn more, visit the following links:", "links": [ { "title": "Sidecar pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/sidecar", "type": "article" }, { "title": "Explore top posts about Infrastructure", "url": "https://app.daily.dev/tags/infrastructure?ref=roadmapsh", "type": "article" } ] }, "WkoFezOXLf1H2XI9AtBtv": { "title": "Pipes & Filters", "description": "Decompose a task that performs complex processing into a series of separate elements that can be reused. This can improve performance, scalability, and reusability by allowing task elements that perform the processing to be deployed and scaled independently.\n\nTo learn more, visit the following links:", "links": [ { "title": "Pipe and Filter Architectural Style", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/pipes-and-filters", "type": "article" } ] }, "beWKUIB6Za27yhxQwEYe3": { "title": "Leader Election", "description": "Coordinate the actions performed by a collection of collaborating instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the others. This can help to ensure that instances don't conflict with each other, cause contention for shared resources, or inadvertently interfere with the work that other instances are performing.\n\nTo learn more, visit the following links:", "links": [ { "title": "Leader Election Pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/leader-election", "type": "article" } ] }, "LXH_mDlILqcyIKtMYTWqy": { "title": "Gateway Routing", "description": "Route requests to multiple services or multiple service instances using a single endpoint. 
The pattern is useful when you want to:\n\n* Expose multiple services on a single endpoint and route to the appropriate service based on the request\n* Expose multiple instances of the same service on a single endpoint for load balancing or availability purposes\n* Expose differing versions of the same service on a single endpoint and route traffic across the different versions\n\nTo learn more, visit the following links:", "links": [ { "title": "Gateway Routing pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/gateway-routing", "type": "article" } ] }, "0SOWAA8hrLM-WsG5k66fd": { "title": "Gateway Offloading", "description": "Offload shared or specialized service functionality to a gateway proxy. This pattern can simplify application development by moving shared service functionality, such as the use of SSL certificates, from other parts of the application into the gateway.\n\nTo learn more, visit the following links:", "links": [ { "title": "Gateway Offloading pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/gateway-offloading", "type": "article" } ] }, "bANGLm_5zR9mqMd6Oox8s": { "title": "Gateway Aggregation", "description": "Use a gateway to aggregate multiple individual requests into a single request. This pattern is useful when a client must make multiple calls to different backend systems to perform an operation.\n\nTo learn more, visit the following links:", "links": [ { "title": "Gateway Aggregation pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/gateway-aggregation", "type": "article" } ] }, "BrgXwf7g2F-6Rqfjryvpj": { "title": "External Config Store", "description": "Move configuration information out of the application deployment package to a centralized location. 
This can provide opportunities for easier management and control of configuration data, and for sharing configuration data across applications and application instances.\n\nTo learn more, visit the following links:", "links": [ { "title": "External Configuration Store pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/external-configuration-store", "type": "article" } ] }, "ODjVoXnvJasPvCS2A5iMO": { "title": "Compute Resource Consolidation", "description": "Consolidate multiple tasks or operations into a single computational unit. This can increase compute resource utilization and reduce the costs and management overhead associated with performing compute processing in cloud-hosted applications.\n\nTo learn more, visit the following links:", "links": [ { "title": "Compute Resource Consolidation pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/compute-resource-consolidation", "type": "article" } ] }, "ivr3mh0OES5n86FI1PN4N": { "title": "CQRS", "description": "CQRS stands for Command and Query Responsibility Segregation, a pattern that separates read and update operations for a data store. Implementing CQRS in your application can maximize its performance, scalability, and security. The flexibility created by migrating to CQRS allows a system to better evolve over time and prevents update commands from causing merge conflicts at the domain level.\n\nLearn more from the following links:", "links": [ { "title": "CQRS pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs", "type": "article" } ] }, "n4It-lr7FFtSY83DcGydX": { "title": "Backends for Frontends", "description": "Create separate backend services to be consumed by specific frontend applications or interfaces. This pattern is useful when you want to avoid customizing a single backend for multiple interfaces. 
This pattern was first described by Sam Newman.\n\nTo learn more, visit the following links:", "links": [ { "title": "Backends for Frontends pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/backends-for-frontends", "type": "article" }, { "title": "Explore top posts about Frontend Development", "url": "https://app.daily.dev/tags/frontend?ref=roadmapsh", "type": "article" } ] }, "4hi7LvjLcv8eR6m-uk8XQ": { "title": "Anti-Corruption Layer", "description": "Implement a facade or adapter layer between different subsystems that don't share the same semantics. This layer translates requests that one subsystem makes to the other subsystem. Use this pattern to ensure that an application's design is not limited by dependencies on outside subsystems. This pattern was first described by Eric Evans in Domain-Driven Design.\n\nTo learn more, visit the following links:", "links": [ { "title": "Anti-corruption Layer pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/anti-corruption-layer", "type": "article" } ] }, "Hja4YF3JcgM6CPwB1mxmo": { "title": "Ambassador", "description": "Create helper services that send network requests on behalf of a consumer service or application. An ambassador service can be thought of as an out-of-process proxy that is co-located with the client.\n\nThis pattern can be useful for offloading common client connectivity tasks such as monitoring, logging, routing, security (such as TLS), and resiliency patterns in a language agnostic way. It is often used with legacy applications, or other applications that are difficult to modify, in order to extend their networking capabilities. 
It can also enable a specialized team to implement those features.\n\nTo learn more, visit the following links:", "links": [ { "title": "Ambassador pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/ambassador", "type": "article" } ] }, "DYkdM_L7T2GcTPAoZNnUR": { "title": "Reliability Patterns", "description": "These patterns provide a way to design and implement systems that can withstand failures, maintain high levels of performance, and recover quickly from disruptions. Some common reliability patterns include Failover, Circuit Breaker, Retry, Bulkhead, Backpressure, Cache-Aside, Idempotent Operations and Health Endpoint Monitoring.\n\nLearn more from the following links:", "links": [ { "title": "Reliability Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns", "type": "article" } ] }, "Xzkvf4naveszLGV9b-8ih": { "title": "Availability", "description": "Availability is measured as a percentage of uptime, and defines the proportion of time that a system is functional and working. Availability is affected by system errors, infrastructure problems, malicious attacks, and system load. Cloud applications typically provide users with a service level agreement (SLA), which means that applications must be designed and implemented to maximize availability.\n\nTo learn more visit the following links:", "links": [ { "title": "Availability Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#availability", "type": "article" } ] }, "FPPJw-I1cw8OxKwmDh0dT": { "title": "Deployment Stamps", "description": "The deployment stamp pattern involves provisioning, managing, and monitoring a heterogeneous group of resources to host and operate multiple workloads or tenants. Each individual copy is called a stamp, or sometimes a service unit, scale unit, or cell. 
In a multi-tenant environment, every stamp or scale unit can serve a predefined number of tenants. Multiple stamps can be deployed to scale the solution almost linearly and serve an increasing number of tenants. This approach can improve the scalability of your solution, allow you to deploy instances across multiple regions, and separate your customer data.\n\nTo learn more visit the following links:", "links": [ { "title": "Deployment Stamps pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/deployment-stamp", "type": "article" }, { "title": "Deployment Stamps 101", "url": "https://blog.devgenius.io/deployment-stamps-101-7c04a6f704a2", "type": "article" }, { "title": "Explore top posts about CI/CD", "url": "https://app.daily.dev/tags/cicd?ref=roadmapsh", "type": "article" } ] }, "Ml9lPDGjRAJTHkBnX51Un": { "title": "Geodes", "description": "The Geode pattern involves deploying a collection of backend services into a set of geographical nodes, each of which can service any request for any client in any region. This pattern allows serving requests in an active-active style, improving latency and increasing availability by distributing request processing around the globe.\n\nTo learn more visit the following links:", "links": [ { "title": "Geode pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/geodes", "type": "article" }, { "title": "Geode Formation, Types & Appearance | What is a Geode?", "url": "https://study.com/academy/lesson/geode-formation-types-appearance.html", "type": "article" } ] }, "cNJQoMNZmxNygWAJIA8HI": { "title": "Health Endpoint Monitoring", "description": "Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. 
This can help to verify that applications and services are performing correctly.\n\nTo learn more visit the following links:", "links": [ { "title": "Health Endpoint Monitoring pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring", "type": "article" }, { "title": "Explaining the health endpoint monitoring pattern", "url": "https://www.oreilly.com/library/view/java-ee-8/9781788830621/5012c01e-90ca-4809-a210-d3736574f5b3.xhtml", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "-M3Zd8w79sKBAY6_aJRE8": { "title": "Queue-Based Load Leveling", "description": "Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.\n\nTo learn more visit the following links:", "links": [ { "title": "Queue-Based Load Leveling pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling", "type": "article" } ] }, "6YVkguDOtwveyeP4Z1NL3": { "title": "Throttling", "description": "Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. 
This can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.\n\nTo learn more visit the following links:", "links": [ { "title": "Throttling pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/throttling", "type": "article" } ] }, "wPe7Xlwqws7tEpTAVvYjr": { "title": "High Availability", "description": "Azure infrastructure is composed of geographies, regions, and Availability Zones, which limit the blast radius of a failure and therefore limit potential impact to customer applications and data. The Azure Availability Zones construct was developed to provide a software and networking solution to protect against datacenter failures and to provide increased high availability (HA) to Azure customers. Designing an HA architecture involves balancing high resilience, low latency, and cost.\n\nLearn more from the following links:", "links": [ { "title": "High availability Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#high-availability", "type": "article" } ] }, "Ze471tPbAwlwZyU4oIzH9": { "title": "Deployment Stamps", "description": "The deployment stamp pattern involves provisioning, managing, and monitoring a heterogeneous group of resources to host and operate multiple workloads or tenants. Each individual copy is called a stamp, or sometimes a service unit, scale unit, or cell. In a multi-tenant environment, every stamp or scale unit can serve a predefined number of tenants. Multiple stamps can be deployed to scale the solution almost linearly and serve an increasing number of tenants. 
This approach can improve the scalability of your solution, allow you to deploy instances across multiple regions, and separate your customer data.\n\nTo learn more visit the following links:", "links": [ { "title": "Deployment Stamps pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/deployment-stamp", "type": "article" }, { "title": "Deployment Stamps 101", "url": "https://blog.devgenius.io/deployment-stamps-101-7c04a6f704a2", "type": "article" }, { "title": "Explore top posts about CI/CD", "url": "https://app.daily.dev/tags/cicd?ref=roadmapsh", "type": "article" } ] }, "6hOSEZJZ7yezVN67h5gmS": { "title": "Geodes", "description": "The Geode pattern involves deploying a collection of backend services into a set of geographical nodes, each of which can service any request for any client in any region. This pattern allows serving requests in an active-active style, improving latency and increasing availability by distributing request processing around the globe.\n\nTo learn more visit the following links:", "links": [ { "title": "Geode pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/geodes", "type": "article" }, { "title": "Geode Formation, Types & Appearance | What is a Geode?", "url": "https://study.com/academy/lesson/geode-formation-types-appearance.html", "type": "article" } ] }, "uK5o7NgDvr2pV0ulF0Fh9": { "title": "Health Endpoint Monitoring", "description": "Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. 
This can help to verify that applications and services are performing correctly.\n\nTo learn more visit the following links:", "links": [ { "title": "Health Endpoint Monitoring pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring", "type": "article" }, { "title": "Explaining the health endpoint monitoring pattern", "url": "https://www.oreilly.com/library/view/java-ee-8/9781788830621/5012c01e-90ca-4809-a210-d3736574f5b3.xhtml", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "IR2_kgs2U9rnAJiDBmpqK": { "title": "Bulkhead", "description": "The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function. It's named after the sectioned partitions (bulkheads) of a ship's hull. If the hull of a ship is compromised, only the damaged section fills with water, which prevents the ship from sinking.\n\nLearn more from the following links:", "links": [ { "title": "Bulkhead pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead", "type": "article" }, { "title": "Get started with Bulkhead", "url": "https://dzone.com/articles/resilient-microservices-pattern-bulkhead-pattern", "type": "article" } ] }, "D1OmCoqvd3-_af3u0ciHr": { "title": "Circuit Breaker", "description": "Handle faults that might take a variable amount of time to recover from, when connecting to a remote service or resource. 
This can improve the stability and resiliency of an application.\n\nLearn more from the following links:", "links": [ { "title": "Circuit breaker design pattern", "url": "https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern", "type": "article" }, { "title": "Overview of Circuit Breaker", "url": "https://medium.com/geekculture/design-patterns-for-microservices-circuit-breaker-pattern-276249ffab33", "type": "article" } ] }, "wlAWMjxZF6yav3ZXOScxH": { "title": "Resiliency", "description": "Resiliency is the ability of a system to gracefully handle and recover from failures, both inadvertent and malicious.\n\nThe nature of cloud hosting, where applications are often multi-tenant, use shared platform services, compete for resources and bandwidth, communicate over the Internet, and run on commodity hardware, means there is an increased likelihood that both transient and more permanent faults will arise. The connected nature of the Internet and the rise in sophistication and volume of attacks increase the likelihood of a security disruption.\n\nDetecting failures and recovering from them quickly and efficiently is necessary to maintain resiliency.\n\nLearn more from the following links:", "links": [ { "title": "Resiliency Patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/framework/resiliency/reliability-patterns#resiliency", "type": "article" } ] }, "PLn9TF9GYnPcbpTdDMQbG": { "title": "Bulkhead", "description": "The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function. It's named after the sectioned partitions (bulkheads) of a ship's hull. 
If the hull of a ship is compromised, only the damaged section fills with water, which prevents the ship from sinking.\n\nLearn more from the following links:", "links": [ { "title": "Bulkhead pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead", "type": "article" }, { "title": "Get started with Bulkhead", "url": "https://dzone.com/articles/resilient-microservices-pattern-bulkhead-pattern", "type": "article" } ] }, "O4zYDqvVWD7sMI27k_0Nl": { "title": "Circuit Breaker", "description": "Handle faults that might take a variable amount of time to recover from when connecting to a remote service or resource. Instead of repeatedly retrying an operation that is likely to fail, a circuit breaker temporarily stops calling it and fails fast, which can improve the stability and resiliency of an application.\n\nLearn more from the following links:", "links": [ { "title": "Circuit breaker design pattern", "url": "https://en.wikipedia.org/wiki/Circuit_breaker_design_pattern", "type": "article" }, { "title": "Overview of Circuit Breaker", "url": "https://medium.com/geekculture/design-patterns-for-microservices-circuit-breaker-pattern-276249ffab33", "type": "article" } ] }, "MNlWNjrG8eh5OzPVlbb9t": { "title": "Compensating Transaction", "description": "Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the steps fail. 
Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.\n\nLearn more from the following resources:", "links": [ { "title": "Compensating Transaction pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/compensating-transaction", "type": "article" }, { "title": "Intro to Compensating Transaction", "url": "https://en.wikipedia.org/wiki/Compensating_transaction", "type": "article" } ] }, "CKCNk3obx4u43rBqUj2Yf": { "title": "Health Endpoint Monitoring", "description": "Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. This can help to verify that applications and services are performing correctly.\n\nTo learn more, visit the following links:", "links": [ { "title": "Health Endpoint Monitoring pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring", "type": "article" }, { "title": "Explaining the health endpoint monitoring pattern", "url": "https://www.oreilly.com/library/view/java-ee-8/9781788830621/5012c01e-90ca-4809-a210-d3736574f5b3.xhtml", "type": "article" }, { "title": "Explore top posts about Monitoring", "url": "https://app.daily.dev/tags/monitoring?ref=roadmapsh", "type": "article" } ] }, "AJLBFyAsEdQYF6ygO0MmQ": { "title": "Leader Election", "description": "Coordinate the actions performed by a collection of collaborating instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the others. 
This can help to ensure that instances don't conflict with each other, cause contention for shared resources, or inadvertently interfere with the work that other instances are performing.\n\nTo learn more, visit the following links:", "links": [ { "title": "Leader Election Pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/leader-election", "type": "article" } ] }, "NybkOwl1lgaglZPRJQJ_Z": { "title": "Queue-Based Load Leveling", "description": "Use a queue that acts as a buffer between a task and a service it invokes in order to smooth intermittent heavy loads that can cause the service to fail or the task to time out. This can help to minimize the impact of peaks in demand on availability and responsiveness for both the task and the service.\n\nTo learn more, visit the following links:", "links": [ { "title": "Queue-Based Load Leveling pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling", "type": "article" } ] }, "xX_9VGUaOkBYFH3jPjnww": { "title": "Retry", "description": "Enable an application to handle transient failures when it tries to connect to a service or network resource by transparently retrying a failed operation. This can improve the stability of the application.\n\nLearn more from the following resources:", "links": [ { "title": "Retry pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/retry", "type": "article" } ] }, "RTEJHZ26znfBLrpQPtNvn": { "title": "Scheduler Agent Supervisor", "description": "Coordinate a set of distributed actions as a single operation. If any of the actions fail, try to handle the failures transparently, or else undo the work that was performed, so the entire operation succeeds or fails as a whole. 
This can add resiliency to a distributed system, by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.\n\nLearn more from the following links:", "links": [ { "title": "Scheduler Agent Supervisor pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/scheduler-agent-supervisor", "type": "article" } ] }, "ZvYpE6-N5dAtRDIwqcAu6": { "title": "Security", "description": "Security provides confidentiality, integrity, and availability assurances against malicious attacks on information systems (and safety assurances for attacks on operational technology systems). Losing these assurances can negatively impact your business operations and revenue, as well as your organization's reputation in the marketplace. Maintaining security requires following well-established practices (security hygiene) and being vigilant to detect and rapidly remediate vulnerabilities and active attacks.\n\nLearn more from the following links:", "links": [ { "title": "Security patterns", "url": "https://learn.microsoft.com/en-us/azure/architecture/framework/security/security-patterns", "type": "article" }, { "title": "Explore top posts about Security", "url": "https://app.daily.dev/tags/security?ref=roadmapsh", "type": "article" } ] }, "lHPl-kr1ArblR7bJeQEB9": { "title": "Federated Identity", "description": "Delegate authentication to an external identity provider. 
This can simplify development, minimize the requirement for user administration, and improve the user experience of the application.\n\nTo learn more, visit the following links:", "links": [ { "title": "Federated Identity pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/federated-identity", "type": "article" } ] }, "DTQJu0AvgWOhMFcOYqzTD": { "title": "Gatekeeper", "description": "Protect applications and services using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. This can provide an additional layer of security and limit the system's attack surface.\n\nLearn more from the following resources:", "links": [ { "title": "Gatekeeper pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/gatekeeper", "type": "article" } ] }, "VltZgIrApHOwZ8YHvdmHB": { "title": "Valet Key", "description": "Use a token that provides clients with restricted direct access to a specific resource, in order to offload data transfer from the application. This is particularly useful in applications that use cloud-hosted storage systems or queues, and can minimize cost and maximize scalability and performance.\n\nLearn more from the following links:", "links": [ { "title": "Valet Key pattern", "url": "https://learn.microsoft.com/en-us/azure/architecture/patterns/valet-key", "type": "article" } ] } }