developer-roadmap/public/roadmap-content/data-engineer.json

{
  "WSYIFni7G2C9Jr0pwuami": {
    "title": "Introduction",
    "description": "Data engineers are responsible for laying the foundations for the acquisition, storage, transformation, and management of data in an organization. They manage the design, creation, and maintenance of database architecture and data processing systems, ensuring that the subsequent work of analysis, BI, and machine learning model development can be carried out seamlessly, continuously, securely, and effectively.\n\nData engineers are one of the most technical profiles in the field of data science, bridging the gap between software and application developers and traditional data science positions.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "How to Become a Data Engineer in 2025: 5 Steps for Career Success",
        "url": "https://www.datacamp.com/blog/how-to-become-a-data-engineer",
        "type": "article"
      },
      {
        "title": "What Does a Data Engineer ACTUALLY Do?",
        "url": "https://www.youtube.com/watch?v=hTjo-QVWcK0",
        "type": "video"
      }
    ]
  },
  "WB2PRVI9C6RIbJ6l9zdbd": {
    "title": "What is Data Engineering?",
    "description": "Data engineering is the practice of designing and building systems for the aggregation, storage and analysis of data at scale. Data engineers excel at creating and deploying algorithms, data pipelines and workflows that sort raw data into ready-to-use datasets. Data engineering is an integral component of the modern data platform and makes it possible for businesses to analyze and apply the data they receive, regardless of the data source or format.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is data engineering?",
        "url": "https://www.ibm.com/think/topics/data-engineering",
        "type": "article"
      },
      {
        "title": "How to Become a Data Engineer in 2025: 5 Steps for Career Success",
        "url": "https://www.datacamp.com/blog/how-to-become-a-data-engineer",
        "type": "article"
      },
      {
        "title": "WHow Data Engineering Works?",
        "url": "https://www.youtube.com/watch?v=qWru-b6m030",
        "type": "video"
      }
    ]
  },
  "jJukG4XxfFcID_VlQKqe-": {
    "title": "Data Engineering vs Data Science",
    "description": "Data engineering and data science are distinct but complementary roles within the field of data. Data engineering focuses on building and maintaining the infrastructure for data collection, storage, and processing, essentially creating the systems that make data available for downstream users. On the other hand, data science professionals, like data analysts and data scientists, uses that data to extract insights, build predictive models, and ultimately inform decision-making.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Data Scientist vs Data Engineer",
        "url": "https://www.datacamp.com/blog/data-scientist-vs-data-engineer",
        "type": "article"
      },
      {
        "title": "Should You Be a Data Scientist, Analyst or Engineer?",
        "url": "https://www.youtube.com/watch?v=dUnKYhripIE",
        "type": "video"
      }
    ]
  },
  "3BxbkrBp8veZj38zdwN8s": {
    "title": "Skills and Responsibilities",
    "description": "Here’s a list of essential data engineering skills:\n\n1.  SQL & Database Management: Ability to query, manipulate, and design relational databases efficiently using SQL. This is the bread-and-butter for extracting, transforming, and analyzing data.\n    \n2.  Data Modeling: Designing schemas and structures (star, snowflake, normalized forms) to optimize storage, performance, and usability of data.\n    \n3.  ETL/ELT Development: Building Extract-Transform-Load (or Load-Transform) pipelines to move and reshape data between systems while ensuring quality and consistency.\n    \n4.  Big Data Frameworks: Proficiency with tools like Apache Spark, Hadoop, or Flink to process and analyze massive datasets in distributed environments.\n    \n5.  Cloud Platforms: Working knowledge of AWS, Azure, or GCP for storage, compute, and orchestration (e.g., S3, BigQuery, Dataflow, Redshift).\n    \n6.  Data Warehousing: Understanding concepts and tools (Snowflake, BigQuery, Redshift) for centralizing, optimizing, and querying large volumes of business data.\n    \n7.  Workflow Orchestration: Using tools like Apache Airflow, Prefect, or Dagster to automate and schedule complex data pipelines reliably.\n    \n8.  Scripting & Programming: Strong skills in Python or Scala for building data processing scripts, automation tasks, and integration with APIs.\n    \n9.  Data Governance & Security: Applying practices for data quality, lineage tracking, access control, compliance (GDPR, HIPAA), and encryption.\n    \n10.  Monitoring & Performance Optimization: Setting up alerts, logging, and tuning pipelines to ensure they run efficiently, catch errors early, and scale smoothly.\n    \n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Top Data Engineer Skills and Responsibilities",
        "url": "https://www.simplilearn.com/data-engineer-role-article",
        "type": "article"
      },
      {
        "title": "5 Essential Data Engineering Skills For 2025",
        "url": "https://www.datacamp.com/blog/essential-data-engineering-skills",
        "type": "article"
      },
      {
        "title": "What skills do you need as a Data Engineer?",
        "url": "https://www.youtube.com/watch?v=sF04UxNAvmg",
        "type": "video"
      }
    ]
  },
  "Ouph2bHeLQsrHl45ar4Cs": {
    "title": "Data Engineering Lifecycle",
    "description": "The data engineering lifecycle encompasses the entire process of transforming raw data into a useful end product. It involves several stages, each with specific roles and responsibilities. This lifecycle ensures that data is handled efficiently and effectively, from its initial generation to its final consumption.\n\nIt involves 4 steps:\n\n1.  Data Generation: Collecting data from various source systems.\n2.  Data Storage: Safely storing data for future processing and analysis.\n3.  Data Ingestion: Transforming and bringing data into a centralized system.\n4.  Data Data Serving: Providing data to end-users for decision-making and operational purposes.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Data Engineering Lifecycle",
        "url": "hhttps://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e",
        "type": "article"
      },
      {
        "title": "Fundamentals of Data Engineering",
        "url": "https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/",
        "type": "article"
      },
      {
        "title": "Getting Into Data Engineering",
        "url": "https://www.youtube.com/watch?v=hZu_87l62J4",
        "type": "video"
      }
    ]
  },
  "_MpdVlvvkrsgzigYMZ_P8": {
    "title": "Choosing the Right Technologies",
    "description": "The data engineering ecosystem is rapidly expanding, and selecting the right technologies for your use case can be challenging. Below you can find some considerations for choosing data technologies across the data engineering lifecycle:\n\n*   **Team size and capabilities.** Your team's size will determine the amount of bandwidth your team can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.\n*   **Interoperability**. When choosing a technology or system, you’ll need to ensure that it interacts and operates smoothly with other technologies.\n*   **Cost optimization and business value,** Consider direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.\n*   **Location** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.\n*   **Build versus buy**. Depending on your needs and capabilities, you can either invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.\n*   **Server versus serverless**. Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which translates the server management to cloud providers, allowing developers to focus solely on writing code.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Build hybrid and multicloud architectures using Google Cloud",
        "url": "https://cloud.google.com/architecture/hybrid-multicloud-patterns",
        "type": "article"
      },
      {
        "title": "The Unfulfilled Promise of Serverless",
        "url": "https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless/",
        "type": "article"
      },
      {
        "title": "Fundamentals of Data Engineering",
        "url": "https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/",
        "type": "article"
      }
    ]
  },
  "_2Ofq3Df-VRXDgKyveZ0U": {
    "title": "Programming Skills",
    "description": "To be successful as a data engineer, you need to be proficient in coding. This involves knowning basic concepts and principles that form the foundation of any computer programming language. These include understanding variables, which store data for processing, control structures such as loops and conditional statements that direct the flow of a program, data structures which organize and store data efficiently, and algorithms which step by step instructions to solve specific problems or perform specific tasks.",
    "links": []
  },
  "ILs5azr4L_uLK0CDFKVaz": {
    "title": "Python",
    "description": "Python’s inherent characteristics and the wealth of resources that have grown around it have made it the data engineer’s language of choice. Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically-typed and garbage-collected.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Python Website",
        "url": "https://www.python.org/",
        "type": "article"
      },
      {
        "title": "Python - Wiki",
        "url": "https://en.wikipedia.org/wiki/Python_(programming_language)",
        "type": "article"
      },
      {
        "title": "Tutorial Series: How to Code in Python",
        "url": "https://www.digitalocean.com/community/tutorials/how-to-write-your-first-python-3-program",
        "type": "article"
      },
      {
        "title": "Google's Python Class",
        "url": "https://developers.google.com/edu/python",
        "type": "article"
      },
      {
        "title": "Explore top posts about Python",
        "url": "https://app.daily.dev/tags/python?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Learn Python - Full Course",
        "url": "https://www.youtube.com/watch?v=4M87qBgpafk",
        "type": "video"
      }
    ]
  },
  "LZ4t8CoCjGWMzE0hScTGZ": {
    "title": "Java",
    "description": "Java has had a big influence on data engineering because many core big data tools and frameworks, like Hadoop, Spark (originally in Scala, which runs on the JVM), and Kafka, are built using Java or run on the Java Virtual Machine (JVM). This means Java’s performance, scalability, and cross-platform capabilities have shaped how large-scale data processing systems are designed.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "@courseIntroduction to Java by Hyperskill (JetBrains Academy)",
        "url": "https://hyperskill.org/courses/8",
        "type": "article"
      },
      {
        "title": "Thinking in Java",
        "url": "https://www.amazon.co.uk/Thinking-Java-Eckel-Bruce-February/dp/B00IBON6C6",
        "type": "article"
      },
      {
        "title": "Effective Java",
        "url": "https://www.amazon.com/Effective-Java-Joshua-Bloch/dp/0134685997",
        "type": "article"
      },
      {
        "title": "Java: The Complete Reference",
        "url": "https://www.amazon.co.uk/gp/product/B09JL8BMK7/ref=dbs_a_def_rwt_bibl_vppi_i2",
        "type": "article"
      },
      {
        "title": "Explore top posts about Java",
        "url": "https://app.daily.dev/tags/java?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Java Tutorial for Beginners",
        "url": "https://www.youtube.com/watch?v=eIrMbAQSU34&feature=youtu.be",
        "type": "video"
      },
      {
        "title": "Java + DSA + Interview Preparation Course (For beginners)",
        "url": "https://www.youtube.com/playlist?list=PL9gnSGHSqcnr_DxHsP7AW9ftq0AtAyYqJ",
        "type": "video"
      }
    ]
  },
  "WHJXJ5ukJd-tK_3LFLJBg": {
    "title": "Scala",
    "description": "Scala is a programming language that combines the strengths of object-oriented and functional programming, and it runs on the Java Virtual Machine (JVM). In data engineering, Scala is especially important because Apache Spark, one of the most popular big data processing frameworks, was written in Scala. This means Scala can use Spark’s features directly and efficiently, often with cleaner and more concise code than Java. Its ability to handle complex data transformations with less code makes it a powerful tool for building fast, scalable data pipelines.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "The Scala Programming Language",
        "url": "https://www.scala-lang.org/",
        "type": "article"
      },
      {
        "title": "Scala for Beginners: An Introduction",
        "url": "https://daily.dev/blog/scala-for-beginners-an-introduction",
        "type": "article"
      },
      {
        "title": "Scala Tutorial",
        "url": "https://www.youtube.com/playlist?list=PLS1QulWo1RIagob5D6kMIAvu7DQC5VTh3",
        "type": "video"
      }
    ]
  },
  "4z2i5NXTo9h3YY0kJvRrz": {
    "title": "Go",
    "description": "Go, also known as Golang, is a statically typed, compiled programming language designed by Google. It combines the efficiency of compiled languages with the ease of use of dynamically typed interpreted languages. Go features built-in concurrency support through goroutines and channels, making it well-suited for networked and multicore systems. It has a simple and clean syntax, fast compilation times, and efficient garbage collection. Go's standard library is comprehensive, reducing the need for external dependencies. The language emphasizes simplicity and readability, with features like implicit interfaces and a lack of inheritance. Go is particularly popular for building microservices, web servers, and distributed systems. Its performance, simplicity, and robust tooling make it a favored choice for cloud-native development, DevOps tools, and large-scale backend systems.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated Go Roadmap",
        "url": "https://roadmap.sh/golang",
        "type": "article"
      },
      {
        "title": "Go Reference Documentation",
        "url": "https://go.dev/doc/",
        "type": "article"
      },
      {
        "title": "Go by Example - annotated example programs",
        "url": "https://gobyexample.com/",
        "type": "article"
      },
      {
        "title": "Go, the Programming Language of the Cloud",
        "url": "https://thenewstack.io/go-the-programming-language-of-the-cloud/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Golang",
        "url": "https://app.daily.dev/tags/golang?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Go Programming â€“ Golang Course with Bonus Projects",
        "url": "https://www.youtube.com/watch?v=un6ZyFkqFKo",
        "type": "video"
      }
    ]
  },
  "fqmn6DPOA5MH7UWYv6ayn": {
    "title": "Data Structures and Algorithms",
    "description": "**Data Structures** are primarily used to collect, organize and perform operations on the stored data more effectively. They are essential for designing advanced-level Android applications. Examples include Array, Linked List, Stack, Queue, Hash Map, and Tree.\n\n**Algorithms** are a sequence of instructions or rules for performing a particular task. Algorithms can be used for data searching, sorting, or performing complex business logic. Some commonly used algorithms are Binary Search, Bubble Sort, Selection Sort, etc. A deep understanding of data structures and algorithms is crucial in optimizing the performance and the memory consumption of data pipelines\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Interview Questions about Data Structures",
        "url": "https://www.csharpstar.com/csharp-algorithms/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Algorithms",
        "url": "https://app.daily.dev/tags/algorithms?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Data Structures Illustrated",
        "url": "https://www.youtube.com/watch?v=9rhT3P1MDHk&list=PLkZYeFmDuaN2-KUIv-mvbjfKszIGJ4FaY",
        "type": "video"
      },
      {
        "title": "Intro to Algorithms",
        "url": "https://www.youtube.com/watch?v=rL8X2mlNHPM",
        "type": "video"
      }
    ]
  },
  "02TADW_PPVtTU_rWV3jf1": {
    "title": "Git and GitHub",
    "description": "**Git** is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.\n\n**GitHub** is a web-based platform that provides hosting for software development and version control using Git. It is widely used by developers and organizations around the world to manage and collaborate on software projects.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated Git & GitHub Roadmap",
        "url": "https://roadmap.sh/git-github",
        "type": "article"
      },
      {
        "title": "Git Documentation",
        "url": "https://git-scm.com/",
        "type": "article"
      },
      {
        "title": "GitHub Documentation",
        "url": "https://docs.github.com/en/get-started/quickstart",
        "type": "article"
      },
      {
        "title": "Learn Git with Tutorials, News and Tips - Atlassian",
        "url": "https://www.atlassian.com/git",
        "type": "article"
      },
      {
        "title": "Git Cheat Sheet",
        "url": "https://cs.fyi/guide/git-cheatsheet",
        "type": "article"
      },
      {
        "title": "What is GitHub?",
        "url": "https://www.youtube.com/watch?v=w3jLJU7DT5E",
        "type": "video"
      },
      {
        "title": "Git & GitHub Crash Course For Beginners",
        "url": "https://www.youtube.com/watch?v=SWYqp7iY_Tc",
        "type": "video"
      }
    ]
  },
  "FXQ_QsljK59zDULLgTqCB": {
    "title": "Linux Basics",
    "description": "Knowledge of UNIX is a must for almost all kind of development as most of the code that you write is most likely going to be finally deployed on a UNIX/Linux machine. Linux has been the backbone of the free and open source software movement, providing a simple and elegant operating system for almost all your needs.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Coursera - Unix Courses",
        "url": "https://www.coursera.org/courses?query=unix",
        "type": "course"
      },
      {
        "title": "Visit Dedicated Linux Roadmap",
        "url": "https://roadmap.sh/linux",
        "type": "article"
      },
      {
        "title": "Linux Basics",
        "url": "https://dev.to/rudrakshi99/linux-basics-2onj",
        "type": "article"
      },
      {
        "title": "Unix / Linux Tutorial",
        "url": "https://www.tutorialspoint.com/unix/index.htm",
        "type": "article"
      },
      {
        "title": "Explore top posts about Linux",
        "url": "https://app.daily.dev/tags/linux?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Linux Operating System - Crash Course",
        "url": "https://www.youtube.com/watch?v=ROjZy1WbCIA",
        "type": "video"
      }
    ]
  },
  "cgkzFMmQils2sYj4NW8VW": {
    "title": "Networking Fundamentals",
    "description": "Networking is the process of connecting two or more computing devices together for the purpose of sharing data. In a data network, shared data may be as simple as a printer or as complex as a global financial transaction.\n\nIf you have networking experience or want to be a reliability engineer or operations engineer, expect questions from these topics. Otherwise, this is just good to know.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Khan Academy - Networking",
        "url": "https://www.khanacademy.org/computing/code-org/computers-and-the-internet",
        "type": "article"
      },
      {
        "title": "Explore top posts about Networking",
        "url": "https://app.daily.dev/tags/networking?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Computer Networking Course - Network Engineering",
        "url": "https://www.youtube.com/watch?v=qiQR5rTSshw",
        "type": "video"
      },
      {
        "title": "Networking Video Series (21 videos)",
        "url": "https://www.youtube.com/playlist?list=PLEbnTDJUr_IegfoqO4iPnPYQui46QqT0j",
        "type": "video"
      }
    ]
  },
  "c1dadtQgbqXwcsQhI6de0": {
    "title": "Distributed Systems Basics",
    "description": "A distributed system is a collection of independent computers that communicate and coordinate to appear as a single unified system. They are widely used for scalability, fault tolerance, and high availability in modern applications. However, they bring challenges such as synchronization, consistency trade-offs (CAP theorem), concurrency, and network latency.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Introduction to Distributed Systems",
        "url": "https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/",
        "type": "article"
      },
      {
        "title": "Distributed Systems Guide",
        "url": "https://www.baeldung.com/cs/distributed-systems-guide",
        "type": "article"
      },
      {
        "title": "Quick overview",
        "url": "https://www.youtube.com/watch?v=IJWwfMyPu1c",
        "type": "video"
      }
    ]
  },
  "AWf1y87pd1JFW71cZ_iE1": {
    "title": "Data Generation",
    "description": "Data generation refers to the different ways data is produced and generated. Thanks to progress in computing power and storage, as well as technology breakthrough in sensor technology (for example, IoT devices), the number of these so-called source systems is rapidly growing. Data is created in many ways, both analog and digital.\n\n**Analog data** refers to continuous, real-world information that is represented by a range of values. It can take on any value within a given range and is often used to describe physical quantities like temperature or sounds.\n\nBy contrast, **digital data** is either created by converting analog data to digital form (eg. images or videos) or is the native product of a digital system, such as logs from a mobile app or syntetic data.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "The Concept of Data Generation",
        "url": "https://www.marktechpost.com/2023/02/27/the-concept-of-data-generation/",
        "type": "article"
      },
      {
        "title": "Analog vs. Digital",
        "url": "https://www.youtube.com/watch?v=zzvglgC5ut0",
        "type": "video"
      }
    ]
  },
  "wydtifF3ZhMWCbVt8Hd2t": {
    "title": "Data Storage",
    "description": "Data storage is the process of saving and preserving digital information on various physical or cloud-based media for future retrieval and use. It encompasses the use of technologies and devices like hard drives and cloud platforms to store data.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is data storage?",
        "url": "https://www.ibm.com/think/topics/data-storage",
        "type": "article"
      }
    ]
  },
  "CvCOkyWcgzaUJec_v5F4L": {
    "title": "Data Ingestion",
    "description": "Data ingestion is the third step in the data engineering lifecycle. It entails the process of collecting and importing data files from various sources into a database for storage, processing and analysis. The goal of data ingestion is to clean and store data in an accessible and consistent central repository to prepare it for use within the organization.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Data Ingestion?",
        "url": "https://www.ibm.com/think/topics/data-ingestion",
        "type": "article"
      },
      {
        "title": "WData Ingestion",
        "url": "https://www.qlik.com/us/data-ingestion",
        "type": "article"
      }
    ]
  },
  "RspQLpkICyHUmthLlxQ84": {
    "title": "Data Serving",
    "description": "Data serving is the last step in the data engineering process. Once the data is stored in your data architectures and transformed into coherent and useful format, it's time for get value from it. Data serving refers to the different ways data is used by downstream applications and users to create value. There are many ways companies can extract value from data, including training machine learning models, BI Analytics, and reverse ETL.",
    "links": []
  },
  "w3cfuNC-IdUKA7CEXs0fT": {
    "title": "Data Engineering Lifecycle",
    "description": "The data engineering lifecycle encompasses the entire process of transforming raw data into a useful end product. It involves several stages, each with specific roles and responsibilities. This lifecycle ensures that data is handled efficiently and effectively, from its initial generation to its final consumption.\n\nIt involves 4 steps:\n\n1.  Data Generation: Collecting data from various source systems.\n2.  Data Storage: Safely storing data for future processing and analysis.\n3.  Data Ingestion: Transforming and bringing data into a centralized system.\n4.  Data Data Serving: Providing data to end-users for decision-making and operational purposes.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Data Engineering Lifecycle",
        "url": "hhttps://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e",
        "type": "article"
      },
      {
        "title": "Fundamentals of Data Engineering",
        "url": "https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/",
        "type": "article"
      },
      {
        "title": "Getting Into Data Engineering",
        "url": "https://www.youtube.com/watch?v=hZu_87l62J4",
        "type": "video"
      }
    ]
  },
  "zGKTlMUzhrbVbqpLZBsMZ": {
    "title": "Sources of Data",
    "description": "Sources of data are origins or locations from which data is collected, categorized as primary (direct, firsthand information) or secondary (collected by others). Common primary sources include surveys, interviews, experiments, and sensor data. Secondary sources encompass databases, published reports, government data, books, articles, and web data like social media posts. Data sources can also be classified as internal (within an organization) or external (from outside sources).",
    "links": []
  },
  "qRHeaD2udDaItAxmiIiUg": {
    "title": "Database",
    "description": "A database is an organized, structured collection of electronic data that is stored, managed, and accessed via a computer system, usually controlled by a Database Management System (DBMS). Databases organize various types of data, such as words, numbers, images, and videos, allowing users to easily retrieve, update, and modify it for various purposes, from managing customer information to analyzing business processes.",
    "links": []
  },
  "cxTriSZvrmXP4axKynIZW": {
    "title": "APIs",
    "description": "Application Programming Interfaces, better known as APIs, play a fundamental role in the work of data engineers, particularly in the process of data collection. APIs are sets of protocols, routines, and tools that enable different software applications to communicate with each other. An API allows developers to interact with a service or platform through a defined set of rules and endpoints, enabling data exchange and functionality use without needing to understand the underlying code. In data engineering, APIs are used extensively to collect, exchange, and manipulate data from different sources in a secure and efficient manner.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is an API?",
        "url": "https://aws.amazon.com/what-is/api/",
        "type": "article"
      },
      {
        "title": "A Beginner's Guide to APIs",
        "url": "https://www.postman.com/what-is-an-api/",
        "type": "article"
      }
    ]
  },
  "s-wUPMaagyRupT2RdfHks": {
    "title": "Logs",
    "description": "Logs are files that record events, activities, and system operations over time. They provide a detailed historical record of what has happened within a system, including timestamps, event details, performance data, errors, and user actions. Logs are crucial for troubleshooting problems, monitoring system health and performance, investigating security incidents, and understanding how users interact with a system.",
    "links": []
  },
  "dJZqe47kzRqYIG-4AZTlz": {
    "title": "Mobile Apps",
    "description": "Mobile apps are programs for phones and tablets, usually from app stores. They can be native (for one OS like iOS or Android), hybrid (web tech in a native shell), or cross-platform (like React Native). Apps use phone features like GPS and cameras. They do many things from games to shopping. Good mobile apps focus on easy use, speed, offline working, and security.",
    "links": []
  },
  "KeGCHoJRHp-mBX-P5to4Y": {
    "title": "IoT",
    "description": "IoT, or Internet of Things, defines a network of connected devices interacting with their environment. IoT devices extend beyond standard devices such as PC's, Laptops or Smartphones, including smart locks, connected thermostats and temperature sensors. In industrial settings, this also includes connected machines, robots, and package tracking devices, and many more. IoT Devices measure and collect data about their environment and some also interact by performing certain predefined actions, for example turning the heat up or down.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is the Internet of Things (IoT)?",
        "url": "https://www.ibm.com/think/topics/internet-of-things",
        "type": "article"
      },
      {
        "title": "Internet of Things",
        "url": "https://en.wikipedia.org/wiki/Internet_of_things",
        "type": "article"
      },
      {
        "title": "What is IoT (Internet of Things)? An Introduction",
        "url": "https://www.youtube.com/watch?v=4FxU-xpuCww",
        "type": "video"
      }
    ]
  },
  "wDDWQgMVBYK4WcmHq_d6l": {
    "title": "Data Collection Considerations",
    "description": "Before designing the technology architecture to collect and store data, you should consider the following factors:\n\n*   **Bounded versus unbounded.** Bounded data has defined start and end points, forming a finite, complete dataset, like the daily sales report. Unbounded data has no predefined limits in time or scope, flowing continuously and potentially indefinitely, such as user interaction events or real-time sensor data. The distinction is critical in data processing: bounded data is suitable for batch processing, while unbounded data is handled by stream processing or real-time systems.\n*   **Frequency.** Collection processes can be batch, micro-batch, or real-time, depending on how often you need to store the data.\n*   **Synchronous versus asynchronous.** Synchronous ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, asynchronous ingestion is a process where data is ingested without waiting for a response from the data source. Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.\n*   **Throughput and scalability.** As data demands grow, you will need scalable ingestion solutions to keep pace. Scalable data ingestion pipelines ensure that systems can handle increasing data volumes without compromising performance. Without scalable ingestion, data pipelines face challenges like bottlenecks and data loss. Bottlenecks occur when components can't process data fast enough, leading to delays and reduced throughput. Data loss happens when systems are overwhelmed, causing valuable information to be discarded or corrupted.\n*   **Reliability and durability.** Data reliability in the ingestion phase means ensuring that the acquired data from various sources is accurate, consistent, and trustworthy as it enters the data pipeline. Durability entails making sure that data isn’t lost or corrupted during the data collection process.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Fundamentals of Data Engineering",
        "url": "https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/",
        "type": "article"
      }
    ]
  },
  "g4UC0go7OPCJYJlac9w-i": {
    "title": "Database Fundamentals",
    "description": "A database is an organized collection of related data, structured so that the data becomes an asset to the organization that owns it. A database management system (DBMS) is software designed to help maintain and extract data from large collections in a timely fashion.\n\nA **relational database** is a type of database that stores and provides access to data points that are related to one another. Relational databases store data in a series of tables.\n\n**NoSQL databases** offer data storage and retrieval that is modelled differently from \"traditional\" relational databases. NoSQL databases typically focus more on horizontal scaling, eventual consistency, speed, and flexibility, and are commonly used for big data and real-time streaming applications.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Oracle: What is a Database?",
        "url": "https://www.oracle.com/database/what-is-database/",
        "type": "article"
      },
      {
        "title": "Prisma.io: What are Databases?",
        "url": "https://www.prisma.io/dataguide/intro/what-are-databases",
        "type": "article"
      },
      {
        "title": "Intro To Relational Databases",
        "url": "https://www.udacity.com/course/intro-to-relational-databases--ud197",
        "type": "article"
      },
      {
        "title": "NoSQL Explained",
        "url": "https://www.mongodb.com/nosql-explained",
        "type": "article"
      },
      {
        "title": "Explore top posts about Database",
        "url": "https://app.daily.dev/tags/database?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "What is Relational Database",
        "url": "https://youtu.be/OqjJjpjDRLc",
        "type": "video"
      },
      {
        "title": "How do NoSQL Databases work",
        "url": "https://www.youtube.com/watch?v=0buKQHokLK8",
        "type": "video"
      }
    ]
  },
  "kVPEoUX-ZAGwstieD20Qa": {
    "title": "Data Normalization",
    "description": "Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as part of his relational model. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design).\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example",
        "url": "https://www.guru99.com/database-normalization.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about Database",
        "url": "https://app.daily.dev/tags/database?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Complete guide to Database Normalization in SQL",
        "url": "https://www.youtube.com/watch?v=rBPQ5fg_kiY",
        "type": "video"
      }
    ]
  },
  "SlQHO8n97F7-_fc6EUXlj": {
    "title": "Data Modelling Techniques",
    "description": "A data model is a specification of data structures and business rules. It creates a visual representation of data and illustrates how different data elements are related to each other. Different techniques are employed depending on the complexity of the data and the goals. The most common data modelling techniques are:\n\n*   **Entity-relationship modeling.** It's one of the most common techniques used to represent data. It's based on three elements: entities (objects or things within the system), relationships (how these entities interact with each other), and attributes (properties of the entities).\n*   **Dimensional modeling.** Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions. This technique simplifies complex data by organizing it into a star or snowflake schema.\n*   **Object-oriented modeling.** Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects. This technique is preferred for modeling applications with complex, interrelated data and behaviors.\n*   **NoSQL modeling.** NoSQL modeling techniques are designed for flexible, schema-less databases. These approaches are often used when data structures are less rigid or evolve over time.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "7 data modeling techniques and concepts for business",
        "url": "https://www.techtarget.com/searchdatamanagement/tip/7-data-modeling-techniques-and-concepts-for-business",
        "type": "article"
      },
      {
        "title": "Data Modeling Explained: Techniques, Examples, and Best Practices",
        "url": "https://www.datacamp.com/blog/data-modeling",
        "type": "article"
      }
    ]
  },
  "AslPFjoakcC44CmPB5nuw": {
    "title": "CAP Theorem",
    "description": "The CAP Theorem, also known as Brewer's Theorem, is a fundamental principle in distributed database systems. It states that in a distributed system, it's impossible to simultaneously guarantee all three of the following properties: Consistency (all nodes see the same data at the same time), Availability (every request receives a response, without guarantee that it contains the most recent version of the data), and Partition tolerance (the system continues to operate despite network failures between nodes). According to the theorem, a distributed system can only strongly provide two of these three guarantees at any given time. This principle guides the design and architecture of distributed systems, influencing decisions on data consistency models, replication strategies, and failure handling. Understanding the CAP Theorem is crucial for designing robust, scalable distributed systems and for choosing appropriate database solutions for specific use cases in distributed computing environments.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is CAP Theorem?",
        "url": "https://www.bmc.com/blogs/cap-theorem/",
        "type": "article"
      },
      {
        "title": "An Illustrated Proof of the CAP Theorem",
        "url": "https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/",
        "type": "article"
      },
      {
        "title": "CAP Theorem and its applications in NoSQL Databases",
        "url": "https://www.ibm.com/uk-en/cloud/learn/cap-theorem",
        "type": "article"
      },
      {
        "title": "What is CAP Theorem?",
        "url": "https://www.youtube.com/watch?v=_RbsFXWRZ10",
        "type": "video"
      }
    ]
  },
  "-VQQmIUGesnrT1N6kH5et": {
    "title": "OLTP vs OLAP",
    "description": "Online Transaction Processing (OLTP) refers to a class of systems designed to manage transaction-oriented applications, typically for data entry and retrieval transactions in database systems. OLTP systems are characterized by a large number of short online transactions (INSERT, UPDATE, DELETE), where the emphasis is on speed, efficiency, and maintaining data integrity in multi-access environments. PostgreSQL supports OLTP workloads through features like ACID compliance (Atomicity, Consistency, Isolation, Durability), MVCC (Multi-Version Concurrency Control) for high concurrency, efficient indexing, and robust transaction management. These features ensure reliable, fast, and consistent processing of high-volume, high-frequency transactions critical to OLTP applications.\n\nOnline Analytical Processing (OLAP) refers to a class of systems designed for query-intensive tasks, typically used for data analysis and business intelligence. OLAP systems handle complex queries that aggregate large volumes of data, often from multiple sources, to support decision-making processes.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is OLTP?",
        "url": "https://www.oracle.com/uk/database/what-is-oltp/",
        "type": "article"
      },
      {
        "title": "What is OLAP? - Online Analytical Processing Explained",
        "url": "https://aws.amazon.com/what-is/olap/",
        "type": "article"
      },
      {
        "title": "OLTP vs OLAP",
        "url": "https://www.youtube.com/watch?v=iw-5kFzIdgY",
        "type": "video"
      }
    ]
  },
  "5KgPfywItqLFQRnIZldZH": {
    "title": "Slowly Changing Dimension - SCD",
    "description": "Slowly Changing Dimensions (SCDs) are a data warehousing technique used to track changes in dimension data over time. Instead of simply overwriting old data with new data, SCDs allow you to maintain historical records of how dimension attributes have changed. This is crucial for accurate analysis of historical trends and business performance.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Mastering Slowly Changing Dimensions (SCD)",
        "url": "https://www.datacamp.com/tutorial/mastering-slowly-changing-dimensions-scd",
        "type": "article"
      },
      {
        "title": "Implementing Slowly Changing Dimensions (SCDs) in Data Warehouses",
        "url": "https://www.sqlshack.com/implementing-slowly-changing-dimensions-scds-in-data-warehouses/",
        "type": "article"
      }
    ]
  },
  "k_XSLLwb0Jk0Dd1sw-MpR": {
    "title": "Horizontal vs Vertical Scaling",
    "description": "Horizontal scaling is the process of adding more machines or nodes to an existing pool in a system to distribute the workload and handle increased load.\n\nBy contrast, vertical scaling involves increasing the computing power of individual machines in a system. This is achieved by adjusting or upgrading hardware components, such as CPU, RAM, and network speed.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Horizontal Vs. Vertical Scaling: Which Should You Choose?",
        "url": "https://www.cloudzero.com/blog/horizontal-vs-vertical-scaling/",
        "type": "article"
      },
      {
        "title": "Vertical Vs Horizontal Scaling: Key Differences You Should Know",
        "url": "https://www.youtube.com/watch?v=dvRFHG2-uYs",
        "type": "video"
      }
    ]
  },
  "OfH_UXnxvGQgwlNQwOEfS": {
    "title": "Star vs Snowflake Schema",
    "description": "A star schema is a way to organize data in a database, mainly in data warehouses, to make it easier and faster to analyze. At the center, there's a main table called the **fact table**, which holds measurable data like sales or revenue. Around it are **dimension tables**, which add details like product names, customer info, or dates. This layout forms a star-like shape.\n\nA snowflake schema is another way of organizing data. In this schema, dimension tables are split into smaller sub-dimension tables to keep data more organized and detailed, so the diagram branches out like a snowflake.\n\nThe star schema is simple and fast, which makes it ideal when you need to extract data for analysis quickly. The snowflake schema, on the other hand, is more detailed. It prioritizes storage efficiency and managing complex data relationships.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Star Schema vs Snowflake Schema: Differences & Use Cases",
        "url": "https://www.datacamp.com/blog/star-schema-vs-snowflake-schema",
        "type": "article"
      }
    ]
  },
  "cslVSSKBMO7I6CpO7vG1H": {
    "title": "Relational Databases",
    "description": "Relational databases are a type of database management system (DBMS) that organizes data into structured tables with rows and columns, using a schema to define data relationships and constraints. They employ Structured Query Language (SQL) for querying and managing data, supporting operations such as data retrieval, insertion, updating, and deletion. Relational databases enforce data integrity through keys (primary and foreign) and constraints (such as unique and not-null), and they are designed to handle complex queries, transactions, and data relationships efficiently. Examples of relational databases include MySQL, PostgreSQL, and Oracle Database. They are commonly used for applications requiring structured data storage, strong consistency, and complex querying capabilities.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Databases and SQL",
        "url": "https://www.edx.org/course/databases-5-sql",
        "type": "course"
      },
      {
        "title": "Relational Databases",
        "url": "https://www.ibm.com/cloud/learn/relational-databases",
        "type": "article"
      },
      {
        "title": "51 Years of Relational Databases",
        "url": "https://learnsql.com/blog/codd-article-databases/",
        "type": "article"
      },
      {
        "title": "Intro To Relational Databases",
        "url": "https://www.udacity.com/course/intro-to-relational-databases--ud197",
        "type": "article"
      },
      {
        "title": "Explore top posts about Backend Development",
        "url": "https://app.daily.dev/tags/backend?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "What is Relational Database",
        "url": "https://youtu.be/OqjJjpjDRLc",
        "type": "video"
      }
    ]
  },
  "2rRVWPON-o3MvpgZmrU_A": {
    "title": "Learn SQL",
    "description": "SQL stands for Structured Query Language. It is a standardized programming language designed to manage and interact with relational database management systems (RDBMS). SQL allows you to create, read, edit, and delete data stored in database tables by writing specific queries.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated SQL Roadmap",
        "url": "https://roadmap.sh/sql",
        "type": "article"
      },
      {
        "title": "SQL Tutorial - Essential SQL For The Beginners",
        "url": "https://www.sqltutorial.org/",
        "type": "article"
      }
    ]
  },
  "ilbFKqhfYyykjJ7cOngwx": {
    "title": "Indexing",
    "description": "Indexing is a data structure technique to efficiently retrieve data from a database. It essentially creates a lookup that can be used to quickly find the location of data records on a disk. Indexes are created using a few database columns and are capable of rapidly locating data without scanning every row in a database table each time the database table is accessed. Indexes can be created using any combination of columns in a database table, reducing the amount of time it takes to find data.\n\nIndexes can be structured in several ways: Binary Tree, B-Tree, Hash Map, etc., each having its own particular strengths and weaknesses. When creating an index, it's crucial to understand which type of index to apply in order to achieve maximum efficiency. Indexes, like any other database feature, must be used wisely because they require disk space and need to be maintained, which can slow down insert and update operations.",
    "links": []
  },
  "1BJGXWax6CONuFkaYR4Jm": {
    "title": "Transactions",
    "description": "Transactions in SQL are units of work that group one or more database operations into a single, atomic unit. They ensure data integrity by following the ACID properties: Atomicity (all or nothing), Consistency (database remains in a valid state), Isolation (transactions don't interfere with each other), and Durability (committed changes are permanent). Transactions are essential for maintaining data consistency in complex operations and handling concurrent access to the database.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "Transactions",
        "url": "https://www.tutorialspoint.com/sql/sql-transactions.htm",
        "type": "article"
      },
      {
        "title": "A Guide to ACID Properties in Database Management Systems",
        "url": "https://www.mongodb.com/resources/basics/databases/acid-transactions",
        "type": "article"
      }
    ]
  },
  "_bFj6rbLuqeQB5MjJZpd6": {
    "title": "MySQL",
    "description": "MySQL is an open-source relational database management system (RDBMS) known for its speed, reliability, and ease of use. It uses SQL (Structured Query Language) for database interactions and supports a range of features for data management, including transactions, indexing, and stored procedures. MySQL is widely used for web applications, data warehousing, and various other applications due to its scalability and flexibility. It integrates well with many programming languages and platforms, and is often employed in conjunction with web servers and frameworks in popular software stacks like LAMP (Linux, Apache, MySQL, PHP/Python/Perl). MySQL is maintained by Oracle Corporation and has a large community and ecosystem supporting its development and use.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "MySQL",
        "url": "https://www.mysql.com/",
        "type": "article"
      },
      {
        "title": "MySQL for Developers",
        "url": "https://planetscale.com/courses/mysql-for-developers/introduction/course-introduction",
        "type": "article"
      },
      {
        "title": "MySQL Tutorial",
        "url": "https://www.mysqltutorial.org/",
        "type": "article"
      },
      {
        "title": "Explore top posts about MySQL",
        "url": "https://app.daily.dev/tags/mysql?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "MySQL Complete Course",
        "url": "https://www.youtube.com/watch?v=5OdVJbNCSso",
        "type": "video"
      }
    ]
  },
  "__JFgwxeDLvz8p7DAJnsc": {
    "title": "PostgreSQL",
    "description": "PostgreSQL is an advanced, open-source relational database management system (RDBMS) known for its robustness, extensibility, and standards compliance. It supports a wide range of data types and advanced features, including complex queries, foreign keys, and full-text search. PostgreSQL is highly extensible, allowing users to define custom data types, operators, and functions. It supports ACID (Atomicity, Consistency, Isolation, Durability) properties for reliable transaction processing and offers strong support for concurrency and data integrity. Its capabilities make it suitable for various applications, from simple web apps to large-scale data warehousing and analytics solutions.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated PostgreSQL DBA Roadmap",
        "url": "https://roadmap.sh/postgresql-dba",
        "type": "article"
      },
      {
        "title": "Official Website",
        "url": "https://www.postgresql.org/",
        "type": "article"
      },
      {
        "title": "Learn PostgreSQL - Full Tutorial for Beginners",
        "url": "https://www.postgresqltutorial.com/",
        "type": "article"
      },
      {
        "title": "Explore top posts about PostgreSQL",
        "url": "https://app.daily.dev/tags/postgresql?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "PostgreSQL in 100 Seconds",
        "url": "https://www.youtube.com/watch?v=n2Fluyr3lbc",
        "type": "video"
      },
      {
        "title": "Postgres tutorial for Beginners",
        "url": "https://www.youtube.com/watch?v=SpfIwlAYaKk",
        "type": "video"
      }
    ]
  },
  "p7S_6O9Qq722r-F4bl6G3": {
    "title": "MariaDB",
    "description": "MariaDB Server is a community-developed fork of MySQL Server. Started by core members of the original MySQL team, MariaDB actively works with outside developers to deliver the most feature-rich, stable, and sanely licensed open SQL server in the industry. MariaDB was created with the intention of being a more versatile, drop-in replacement for MySQL.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "MariaDB",
        "url": "https://mariadb.org/",
        "type": "article"
      },
      {
        "title": "MariaDB vs MySQL",
        "url": "https://www.guru99.com/mariadb-vs-mysql.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about Infrastructure",
        "url": "https://app.daily.dev/tags/infrastructure?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "MariaDB Tutorial For Beginners in One Hour",
        "url": "https://www.youtube.com/watch?v=_AMj02sANpI",
        "type": "video"
      }
    ]
  },
  "YZ4G1-6VJ7VdsphdcBTf9": {
    "title": "Aurora DB",
    "description": "Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. Aurora includes a high-performance storage subsystem. Its MySQL- and PostgreSQL-compatible database engines are customized to take advantage of that fast distributed storage. The underlying storage grows automatically as needed. Aurora also automates and standardizes database clustering and replication, which are typically among the most challenging aspects of database configuration and administration.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon Aurora",
        "url": "https://aws.amazon.com/rds/aurora/",
        "type": "article"
      },
      {
        "title": "Amazon Aurora: What It Is, How It Works, and How to Get Started",
        "url": "https://www.datacamp.com/tutorial/amazon-aurora",
        "type": "article"
      }
    ]
  },
  "PJcxM60h85Po0AAkSj7nr": {
    "title": "Oracle",
    "description": "Oracle Database is a highly robust, enterprise-grade relational database management system (RDBMS) developed by Oracle Corporation. Known for its scalability, reliability, and comprehensive features, Oracle Database supports complex data management tasks and mission-critical applications. It provides advanced functionalities like SQL querying, transaction management, high availability through clustering, and data warehousing. Oracle's database solutions include support for various data models, such as relational, spatial, and graph, and offer tools for security, performance optimization, and data integration. It is widely used in industries requiring large-scale, secure, and high-performance data processing.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Oracle Website",
        "url": "https://www.oracle.com/database/",
        "type": "article"
      },
      {
        "title": "Oracle Docs",
        "url": "https://docs.oracle.com/en/database/index.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about Oracle",
        "url": "https://app.daily.dev/tags/oracle?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Oracle SQL Tutorial for Beginners",
        "url": "https://www.youtube.com/watch?v=ObbNGhcxXJA",
        "type": "video"
      }
    ]
  },
  "YxnIQh6Y5ic795-YsajB8": {
    "title": "MS SQL",
    "description": "Microsoft SQL Server (MS SQL) is a relational database management system developed by Microsoft for managing and storing structured data. It supports a wide range of data operations, including querying, transaction management, and data warehousing. SQL Server provides tools and features for database design, performance optimization, and security, including support for complex queries through T-SQL (Transact-SQL), data integration with SQL Server Integration Services (SSIS), and business intelligence with SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS). It is commonly used in enterprise environments for applications requiring reliable data storage, transaction processing, and reporting.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated SQL Roadmap",
        "url": "https://roadmap.sh/sql",
        "type": "article"
      },
      {
        "title": "MS SQL",
        "url": "https://www.microsoft.com/en-ca/sql-server/",
        "type": "article"
      },
      {
        "title": "Tutorials for SQL Server",
        "url": "https://docs.microsoft.com/en-us/sql/sql-server/tutorials-for-sql-server-2016?view=sql-server-ver15",
        "type": "article"
      },
      {
        "title": "SQL Server tutorial for beginners",
        "url": "https://www.youtube.com/watch?v=-EPMOaV7h_Q",
        "type": "video"
      }
    ]
  },
  "uZYQ8tqTriXt_JIOjcM9_": {
    "title": "NoSQL Databases",
    "description": "NoSQL databases are a category of database management systems designed for handling unstructured, semi-structured, or rapidly changing data. Unlike traditional relational databases, which use fixed schemas and SQL for querying, NoSQL databases offer flexible data models and can be classified into several types:\n\n1.  **Document Stores**: Store data in JSON, BSON, or XML formats, allowing for flexible and hierarchical data structures (e.g., MongoDB, CouchDB).\n2.  **Key-Value Stores**: Store data as key-value pairs, suitable for high-speed read and write operations (e.g., Redis, Riak).\n3.  **Column-Family Stores**: Store data in columns rather than rows, which is useful for handling large volumes of data and wide columnar tables (e.g., Apache Cassandra, HBase).\n4.  **Graph Databases**: Optimize the storage and querying of data with complex relationships using graph structures (e.g., Neo4j, Amazon Neptune).\n\nNoSQL databases are often used for applications requiring high scalability, flexibility, and performance, such as real-time analytics, content management systems, and distributed data storage.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "NoSQL Explained",
        "url": "https://www.mongodb.com/nosql-explained",
        "type": "article"
      },
      {
        "title": "Explore top posts about NoSQL",
        "url": "https://app.daily.dev/tags/nosql?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "How do NoSQL Databases work",
        "url": "https://www.youtube.com/watch?v=0buKQHokLK8",
        "type": "video"
      },
      {
        "title": "SQL vs NoSQL Explained",
        "url": "https://www.youtube.com/watch?v=ruz-vK8IesE",
        "type": "video"
      }
    ]
  },
  "sGkAOVl3C-xIIAdtDH9jq": {
    "title": "Document",
    "description": "Document databases are a type of NoSQL database that stores data in JSON, BSON, or XML formats, allowing for flexible, semi-structured, and hierarchical data structures. These databases are characterized by their dynamic schema, scalability through distribution, and ability to intuitively map data models to application code. Popular examples include MongoDB, which allows for easy storage and retrieval of varied data types without requiring a rigid, predefined schema.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is a Document Database?",
        "url": "https://www.mongodb.com/resources/basics/databases/document-databases",
        "type": "article"
      },
      {
        "title": "Document-oriented database",
        "url": "https://en.wikipedia.org/wiki/Document-oriented_database",
        "type": "article"
      }
    ]
  },
  "04V0Bcgjusfqdw0b-Aw4W": {
    "title": "MongoDB",
    "description": "MongoDB is a NoSQL, open-source database designed for storing and managing large volumes of unstructured or semi-structured data. It uses a document-oriented data model where data is stored in BSON (Binary JSON) format, which allows for flexible and hierarchical data representation. Unlike traditional relational databases, MongoDB doesn't require a fixed schema, making it suitable for applications with evolving data requirements or varying data structures. It supports horizontal scaling through sharding and offers high availability with replica sets. MongoDB is commonly used for applications requiring rapid development, real-time analytics, and large-scale data handling, such as content management systems, IoT applications, and big data platforms.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated MongoDB Roadmap",
        "url": "https://roadmap.sh/mongodb",
        "type": "article"
      },
      {
        "title": "MongoDB Website",
        "url": "https://www.mongodb.com/",
        "type": "article"
      },
      {
        "title": "Learning Path for MongoDB Developers",
        "url": "https://learn.mongodb.com/catalog",
        "type": "article"
      },
      {
        "title": "MongoDB Online Sandbox",
        "url": "https://mongoplayground.net/",
        "type": "article"
      },
      {
        "title": "daily.dev MongoDB Feed",
        "url": "https://app.daily.dev/tags/mongodb",
        "type": "article"
      }
    ]
  },
  "_F53cV3ln2yu0ics5BFfx": {
    "title": "ElasticSearch",
    "description": "Elasticsearch is, at its core, a document-oriented search engine. It is a document-based database that lets you insert, delete, retrieve, and even perform analytics on the saved records. However, Elasticsearch differs from the general-purpose databases you may have worked with in the past: it is essentially a search engine, and it offers an arsenal of features for retrieving the stored data according to your search criteria, at very high speed.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Elasticsearch Website",
        "url": "https://www.elastic.co/elasticsearch/",
        "type": "article"
      },
      {
        "title": "Elasticsearch Documentation",
        "url": "https://www.elastic.co/guide/index.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about ELK",
        "url": "https://app.daily.dev/tags/elk?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "What is Elasticsearch",
        "url": "https://www.youtube.com/watch?v=ZP0NmfyfsoM",
        "type": "video"
      }
    ]
  },
  "goL_GqVVTVxXQMGBw992b": {
    "title": "CosmosDB",
    "description": "Azure Cosmos DB is a native NoSQL database service and vector database for working with the document data model. It can store arbitrary native JSON documents with a flexible schema. Data is indexed automatically and is available for query using a flavor of the SQL query language designed for JSON data. It also supports vector search. You can access the API using SDKs for popular frameworks such as .NET, Python, Java, and Node.js.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What are Containers?",
        "url": "https://azure.microsoft.com/en-us/products/cosmos-db#FAQ",
        "type": "article"
      },
      {
        "title": "Azure Cosmos DB - Database for the AI Era",
        "url": "https://learn.microsoft.com/en-us/azure/cosmos-db/introduction",
        "type": "article"
      },
      {
        "title": "Azure Cosmos DB: A Global-Scale NoSQL Cloud Database",
        "url": "https://www.datacamp.com/tutorial/azure-cosmos-db",
        "type": "article"
      },
      {
        "title": "What is Azure Cosmos DB?",
        "url": "https://www.youtube.com/watch?v=hBY2YcaIOQM&",
        "type": "video"
      }
    ]
  },
  "-IesOBWPSIlbgvTjBqHcb": {
    "title": "CouchDB",
    "description": "Apache CouchDB is an open source NoSQL document database that collects and stores data in JSON-based document formats. Unlike relational databases, CouchDB uses a schema-free data model, which simplifies record management across various computing devices, mobile phones and web browsers. In CouchDB, each document is uniquely named in the database, and CouchDB provides a RESTful HTTP API for reading and updating (add, edit, delete) database documents. Documents are the primary unit of data in CouchDB and consist of any number of fields and attachments.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "CouchDB",
        "url": "hhttps://couchdb.apache.org/",
        "type": "article"
      },
      {
        "title": "CouchDB Documentation",
        "url": "https://docs.couchdb.org/en/stable/intro/overview.html",
        "type": "article"
      },
      {
        "title": "What is CouchDB?",
        "url": "https://www.ibm.com/think/topics/couchdb",
        "type": "article"
      }
    ]
  },
  "fBD6ZQoMac8w4kMJw_Jrd": {
    "title": "Column",
    "description": "A columnar database is a type of No-SQL database that stores data by columns instead of by rows. In a traditional SQL database, all the information for one record is stored together, but in a columnar database, all the values for a single column are stored together. This makes it much faster to read and analyze large amounts of data, especially when you only need a few columns instead of the whole record. For example, if you want to quickly find the average sales price from millions of rows, a columnar database can scan just the \"price\" column instead of every piece of data. This design is often used in data warehouses and analytics systems because it speeds up queries and saves storage space through better compression.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What are columnar databases? Here are 35 examples.",
        "url": "https://www.tinybird.co/blog-posts/what-is-a-columnar-database",
        "type": "article"
      },
      {
        "title": "Columnar Databases",
        "url": "https://www.techtarget.com/searchdatamanagement/definition/columnar-database",
        "type": "article"
      },
      {
        "title": "WWhat is a Columnar Database? (vs. Row-oriented Database)",
        "url": "https://www.youtube.com/watch?v=1MnvuNg33pA",
        "type": "video"
      }
    ]
  },
  "QYR8ESN7xhi4ZxcoiZbgn": {
    "title": "Cassandra",
    "description": "Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of structured data across multiple commodity servers. It provides high availability with no single point of failure, offering linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure. Cassandra uses a masterless ring architecture, where all nodes are equal, allowing for easy data distribution and replication. It supports flexible data models and can handle both unstructured and structured data. Cassandra excels in write-heavy environments and is particularly suitable for applications requiring high throughput and low latency. Its data model is based on wide column stores, offering a more complex structure than key-value stores. Widely used in big data applications, Cassandra is known for its ability to handle massive datasets while maintaining performance and reliability.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Apache Cassandra",
        "url": "https://cassandra.apache.org/_/index.html",
        "type": "article"
      },
      {
        "title": "article@Cassandra - Quick Guide",
        "url": "https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm",
        "type": "article"
      },
      {
        "title": "Explore top posts about Backend Development",
        "url": "https://app.daily.dev/tags/backend?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Apache Cassandra - Course for Beginners",
        "url": "https://www.youtube.com/watch?v=J-cSy5MeMOA",
        "type": "video"
      }
    ]
  },
  "ltZftFsiOo12AkQ-04N3B": {
    "title": "BigTable",
    "description": "Bigtable is a high-performance, scalable database that excels at capturing, processing, and analyzing data in real-time. It aggregates data as it's written, providing immediate insights into user behavior, A/B testing results, and engagement metrics. This real-time capability also fuels AI/ML models for interactive applications. Bigtable integrates seamlessly with both Dataflow, enriching streaming pipelines with low-latency lookups, and BigQuery, enabling real-time serving of analytics in user facing application and ad-hoc querying on the same data.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Bigtable: Fast, Flexible NoSQL",
        "url": "https://cloud.google.com/bigtable?hl=en#scale-your-latency-sensitive-applications-with-the-nosql-pioneer",
        "type": "article"
      },
      {
        "title": "Google Bigtable",
        "url": "https://www.techtarget.com/searchdatamanagement/definition/Google-BigTable",
        "type": "article"
      }
    ]
  },
  "Uho9OOWSG0bUpyH4P6hKk": {
    "title": "HBase",
    "description": "HBase is a column-oriented No-SQL database management system that runs on top of Hadoop Distributed File System (HDFS), a main component of Apache Hadoop. HBase provides a fault-tolerant way of storing sparse data sets, which are common in many big data use cases. It is well suited for real-time data processing or random read/write access to large volumes of data. HBase applications are written in Java™ much like a typical Apache MapReduce application.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Apacha HBase?",
        "url": "https://hbase.apache.org/",
        "type": "article"
      },
      {
        "title": "What is HBase?",
        "url": "https://www.ibm.com/think/topics/hbase",
        "type": "article"
      },
      {
        "title": "Apache HBase",
        "url": "https://en.wikipedia.org/wiki/Apache_HBase",
        "type": "article"
      }
    ]
  },
  "W6RnhoD7fW2xzVwnyJEDr": {
    "title": "Graph",
    "description": "In a graph database, each node is a record and each arc is a relationship between two nodes. Graph databases are optimized to represent complex relationships with many foreign keys or many-to-many relationships.\n\nGraphs databases offer high performance for data models with complex relationships, such as a social network. They are relatively new and are not yet widely-used; it might be more difficult to find development tools and resources. Many graphs can only be accessed with REST APIs.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is a Graph database?",
        "url": "https://aws.amazon.com/nosql/graph/",
        "type": "article"
      },
      {
        "title": "What is A Graph Database? A Beginner's Guide",
        "url": "https://www.datacamp.com/blog/what-is-a-graph-database",
        "type": "article"
      },
      {
        "title": "Graph database",
        "url": "https://en.wikipedia.org/wiki/Graph_database",
        "type": "article"
      },
      {
        "title": "Introduction to NoSQL",
        "url": "https://www.youtube.com/watch?v=qI_g07C_Q5I",
        "type": "video"
      }
    ]
  },
  "TG63YRbSKL1F9vlUVF1VY": {
    "title": "Neo4j",
    "description": "Neo4j is a highly popular open-source graph database designed to store, manage, and query data as interconnected nodes and relationships. Unlike traditional relational databases that use tables and rows, Neo4j uses a graph model where data is represented as nodes (entities) and edges (relationships), allowing for highly efficient querying of complex, interconnected data. It supports Cypher, a declarative query language specifically designed for graph querying, which simplifies operations like traversing relationships and pattern matching. Neo4j is well-suited for applications involving complex relationships, such as social networks, recommendation engines, and fraud detection, where understanding and leveraging connections between data points is crucial.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Neo4j Website",
        "url": "https://neo4j.com",
        "type": "article"
      },
      {
        "title": "Explore top posts about Backend Development",
        "url": "https://app.daily.dev/tags/backend?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Neo4j in 100 Seconds",
        "url": "https://www.youtube.com/watch?v=T6L9EoBy8Zk",
        "type": "video"
      },
      {
        "title": "Neo4j Course for Beginners",
        "url": "https://www.youtube.com/watch?v=_IgbB24scLI",
        "type": "video"
      }
    ]
  },
  "atAK4zGXIbxZvfBTzFEIe": {
    "title": "Neptune",
    "description": "Amazon Neptune is a fully managed graph database service provided by Amazon Web Services (AWS). It's designed to store and navigate highly connected data, supporting both property graph and RDF (Resource Description Framework) models. Neptune uses graph query languages like Gremlin and SPARQL, making it suitable for applications involving complex relationships, such as social networks, recommendation engines, fraud detection systems, and knowledge graphs. It offers high availability, with replication across multiple Availability Zones, and supports up to 15 read replicas for improved performance. Neptune integrates with other AWS services, provides encryption at rest and in transit, and offers fast recovery from failures. Its scalability and performance make it valuable for handling large-scale, complex data relationships in enterprise-level applications.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "AWS Neptune",
        "url": "https://aws.amazon.com/neptune/",
        "type": "article"
      },
      {
        "title": "Setting Up Amazon Neptune Graph Database",
        "url": "https://cliffordedsouza.medium.com/setting-up-amazon-neptune-graph-database-2b73512a7388",
        "type": "article"
      },
      {
        "title": "Getting Started with Neptune Serverless",
        "url": "https://www.youtube.com/watch?v=b04-jjM9t4g",
        "type": "video"
      }
    ]
  },
  "fSlBjoNVKstJjWO7rS69V": {
    "title": "Key-Value",
    "description": "Key value databases, also known as key value stores, are NoSQL database types where data is stored as key value pairs and optimized for reading and writing that data. The data is fetched by a unique key or a number of unique keys to retrieve the associated value with each key. Both keys and values can be anything, ranging from simple objects to complex compound objects. Key-value databases are highly partitionable and allow horizontal scaling at a level that other types of databases cannot achieve.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is a Key Value Database? - AWS",
        "url": "https://aws.amazon.com/nosql/key-value/",
        "type": "article"
      },
      {
        "title": "What Is A Key-Value Database? - MongoDB",
        "url": "https://www.mongodb.com/resources/basics/databases/key-value-database",
        "type": "article"
      }
    ]
  },
  "dW_eC4vR8BrvKG9wxmEBc": {
    "title": "Redis",
    "description": "Redis is an open-source, in-memory data structure store known for its speed and versatility. It supports various data types, including strings, lists, sets, hashes, and sorted sets, and provides functionalities such as caching, session management, real-time analytics, and message brokering. Redis operates as a key-value store, allowing for rapid read and write operations, and is often used to enhance performance and scalability in applications. It supports persistence options to save data to disk, replication for high availability, and clustering for horizontal scaling. Redis is widely used for scenarios requiring low-latency access to data and high-throughput performance.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Redis Crash Course",
        "url": "https://www.youtube.com/watch?v=XCsS_NVAa1g",
        "type": "course"
      },
      {
        "title": "Visit Dedicated Redis Roadmap",
        "url": "https://roadmap.sh/redis",
        "type": "article"
      },
      {
        "title": "Redis",
        "url": "https://redis.io/",
        "type": "article"
      },
      {
        "title": "Redis Documentation",
        "url": "https://redis.io/docs/latest/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Redis",
        "url": "https://app.daily.dev/tags/redis?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Redis in 100 Seconds",
        "url": "https://www.youtube.com/watch?v=G1rOthIU-uo",
        "type": "video"
      }
    ]
  },
  "KYUh29Ok1aeOviboGDS_i": {
    "title": "Memcached",
    "description": "Memcached (pronounced variously mem-cash-dee or mem-cashed) is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached is free and open-source software, licensed under the Revised BSD license. Memcached runs on Unix-like operating systems (Linux and macOS) and on Microsoft Windows. It depends on the `libevent` library. Memcached's APIs provide a very large hash table distributed across multiple machines. When the table is full, subsequent inserts cause older data to be purged in the least recently used (LRU) order. Applications using Memcached typically layer requests and additions into RAM before falling back on a slower backing store, such as a database.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "memcached/memcached",
        "url": "https://github.com/memcached/memcached#readme",
        "type": "opensource"
      },
      {
        "title": "Memcached Tutorial",
        "url": "https://www.tutorialspoint.com/memcached/index.htm",
        "type": "article"
      },
      {
        "title": "Redis vs Memcached",
        "url": "https://www.youtube.com/watch?v=Gyy1SiE8avE",
        "type": "video"
      }
    ]
  },
  "BDfpCDOxXZ-Tp0Abj_CVW": {
    "title": "DynamoDB",
    "description": "Amazon DynamoDB is a fully managed NoSQL database solution that provides fast and predictable performance with seamless scalability. It is a key-value and document database that delivers single-digit millisecond performance at any scale. DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second. It maintains high durability of data via automatic replication across three different zones in an Amazon defined region.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon DynamoDB",
        "url": "https://aws.amazon.com/dynamodb/",
        "type": "article"
      }
    ]
  },
  "dc3lJI27hJ3zZ45UCVqM1": {
    "title": "What is Data Warehouse?",
    "description": "**Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet wide-range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "What Is a Data Warehouse?",
        "url": "https://www.oracle.com/database/what-is-a-data-warehouse/",
        "type": "article"
      },
      {
        "title": "What is a Data Warehouse?",
        "url": "https://www.youtube.com/watch?v=k4tK2ttdSDg",
        "type": "video"
      }
    ]
  },
  "J854xPM1X0BWlhtJw7Hs_": {
    "title": "Data Warehousing Architectures",
    "description": "Data Warehousing Architectures refers to the different systems and solutions for storing data. Options include traditional data warehouse, data marts, data lakes and data mesh architectures.",
    "links": []
  },
  "ArOoKuf9scAURs8NRjAru": {
    "title": "Data Warehouse",
    "description": "**Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet wide-range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "What Is a Data Warehouse?",
        "url": "https://www.oracle.com/database/what-is-a-data-warehouse/",
        "type": "article"
      },
      {
        "title": "@hat is a Data Warehouse?",
        "url": "https://www.youtube.com/watch?v=k4tK2ttdSDg",
        "type": "video"
      }
    ]
  },
  "Je2in1n8bMaknyeH79Zbv": {
    "title": "Google BigQuery",
    "description": "BigQuery is a managed, serverless data warehouse product by Google, offering scalable analysis over large quantities of data. It is a Platform as a Service (PaaS) that supports querying using a dialect of SQL. BigQuery is NoOps, meaning there is no infrastructure to manage and you don't need a database administrator. BigQuery lets you focus on analyzing data to find meaningful insights while using familiar SQL and built-in machine learning at unmatched price-performance.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "BigQuery overview",
        "url": "https://cloud.google.com/bigquery/docs/introduction",
        "type": "article"
      },
      {
        "title": "From data warehouse to autonomous data and AI platform",
        "url": "https://cloud.google.com/bigquery",
        "type": "article"
      },
      {
        "title": "What is BigQuery?",
        "url": "https://www.youtube.com/watch?v=d3MDxC_iuaw",
        "type": "video"
      }
    ]
  },
  "W3l1_66fsIqR3MqgBJUmU": {
    "title": "Snowflake",
    "description": "Snowflake is a cloud-based data platform that provides a data warehouse as a service. It allows organizations to store, analyze, and share data, offering features like data engineering, data governance, and collaboration capabilities. Snowflake is known for its scalability, ease of use, and ability to handle diverse workloads, including data warehousing, data lakes, and machine learning.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Snowflake Docs",
        "url": "https://docs.snowflake.com/",
        "type": "article"
      },
      {
        "title": "Snowflake in 20 minutes",
        "url": "https://docs.snowflake.com/en/user-guide/tutorials/snowflake-in-20minutes",
        "type": "article"
      },
      {
        "title": "Snowflake Tutorial For Beginners: From Architecture to Running Databases",
        "url": "https://www.datacamp.com/tutorial/introduction-to-snowflake-for-beginners",
        "type": "article"
      },
      {
        "title": "Learn Snowflake in 2 Hours",
        "url": "https://www.youtube.com/watch?v=mP3QbYURT9k",
        "type": "video"
      }
    ]
  },
  "omrg8QcYmTdQLBKV47b7o": {
    "title": "Amazon Redshift",
    "description": "Amazon Redshift is a cloud-based data warehouse service from Amazon that lets you store and analyze large amounts of data quickly. It’s designed for running complex queries on huge datasets, so businesses can use it to turn raw data into useful reports and insights. You can load data into Redshift from many sources, and then use SQL to explore it, just like you would with a regular database — but it’s optimized to handle much bigger data and run faster.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon Redshift",
        "url": "https://aws.amazon.com/redshift/",
        "type": "article"
      },
      {
        "title": "Getting Started with Amazon Redshift - AWS Online Tech Talks",
        "url": "https://www.youtube.com/watch?v=dfo4J5ZhlKI",
        "type": "video"
      }
    ]
  },
  "c6Pf3kFcC4iV4a7mPc-WH": {
    "title": "Data Mart",
    "description": "A data mart is a subset of a data warehouse, focused on a specific business function or department. A data mart is streamlined for quicker querying and a more straightforward setup, catering to the specialized needs of a particular team, or function. Data marts only hold data relevant to a specific department or business unit, enabling quicker access to specific datasets, and simpler management\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is a Data Mart?",
        "url": "https://www.ibm.com/think/topics/data-mart",
        "type": "article"
      },
      {
        "title": "WData Mart vs Data Warehouse: a Detailed Comparison",
        "url": "https://www.datacamp.com/blog/data-mart-vs-data-warehouse",
        "type": "article"
      },
      {
        "title": "Data Lake VS Data Warehouse VS Data Marts",
        "url": "https://www.youtube.com/watch?v=w9-WoReNKHk",
        "type": "video"
      }
    ]
  },
  "y0Lxz_wVyQ6lr1hvCsufa": {
    "title": "Data Lake",
    "description": "**Data Lakes** are large-scale data repository systems that store raw, untransformed data, in various formats, from multiple sources. They're often used for big data and real-time analytics requirements. Data lakes preserve the original data format and schema which can be modified as necessary.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "Data Lake Definition",
        "url": "https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-a-data-lake",
        "type": "article"
      },
      {
        "title": "What is a Data Lake?",
        "url": "https://www.youtube.com/watch?v=LxcH6z8TFpI",
        "type": "video"
      }
    ]
  },
  "fhfyoWekmYvEs-jdP2mJo": {
    "title": "Databricks Delta Lake",
    "description": "Delta Lake is the optimized storage layer that provides the foundation for tables in a lakehouse on Databricks. Delta Lake is open source software that extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling. Delta Lake is fully compatible with Apache Spark APIs, and was developed for tight integration with Structured Streaming, allowing you to easily use a single copy of data for both batch and streaming operations and providing incremental processing at scale.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Delta Lake in Databricks?",
        "url": "https://docs.databricks.com/aws/en/delta",
        "type": "article"
      },
      {
        "title": "Delta Table in Databricks: A Complete Guide",
        "url": "https://www.datacamp.com/tutorial/delta-table-in-databricks",
        "type": "article"
      },
      {
        "title": "The Delta Lake Series — Fundamentals and Performance",
        "url": "https://www.databricks.com/resources/ebook/the-delta-lake-series-fundamentals-performance",
        "type": "article"
      },
      {
        "title": "Delta Lake",
        "url": "https://www.databricks.com/resources/demos/videos/lakehouse-platform/delta-lake",
        "type": "video"
      }
    ]
  },
  "Pf0_CBGkmSEfWDQ2_iFXr": {
    "title": "Snowflake",
    "description": "Snowflake is a cloud-based data platform that provides a data warehouse as a service. It allows organizations to store, analyze, and share data, offering features like data engineering, data governance, and collaboration capabilities. Snowflake is known for its scalability, ease of use, and ability to handle diverse workloads, including data warehousing, data lakes, and machine learning.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Snowflake Docs",
        "url": "https://docs.snowflake.com/",
        "type": "article"
      },
      {
        "title": "Snowflake in 20 minutes",
        "url": "https://docs.snowflake.com/en/user-guide/tutorials/snowflake-in-20minutes",
        "type": "article"
      },
      {
        "title": "Snowflake Tutorial For Beginners: From Architecture to Running Databases",
        "url": "https://www.datacamp.com/tutorial/introduction-to-snowflake-for-beginners",
        "type": "article"
      },
      {
        "title": "Learn Snowflake in 2 Hours",
        "url": "https://www.youtube.com/watch?v=mP3QbYURT9k",
        "type": "video"
      }
    ]
  },
  "senZEYC9k-C_C4EAYDNeU": {
    "title": "Onehouse",
    "description": "Onehouse Managed Lakehouse is a cloud-native SaaS product built on top of Apache Hudi. It replaces painful, inefficient do-iy-yourseld data lake management around file sizing, masking, deletion, clustering, access control, caching, etc. with foundational data infrastructure as a service, to ingest, store, optimize and transform your data on industry-leading open data formats.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Onehouse",
        "url": "https://www.onehouse.ai/",
        "type": "article"
      }
    ]
  },
  "D7qtosIbsQuIY3OWl_Hwc": {
    "title": "Data Mesh",
    "description": "A data mesh is a modern approach to data architecture that shifts data management from a centralized model to a decentralized one. It emphasizes domain-oriented ownership, where data management aligns with specific business areas. This alignment makes data operations more scalable and flexible, leveraging the knowledge and expertise of those closest to the data. Data mesh is defined by four principles: data domains, data products, self-serve data platform, and federated computational governance.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What Is a Data Mesh? - AWS",
        "url": "https://aws.amazon.com/what-is/data-mesh",
        "type": "article"
      },
      {
        "title": "What Is a Data Mesh? - Datacamp",
        "url": "https://www.datacamp.com/blog/data-mesh",
        "type": "article"
      },
      {
        "title": "Data Mesh Architecture",
        "url": "https://www.datamesh-architecture.com/",
        "type": "video"
      }
    ]
  },
  "-x3QLMYhC67VJQ6EW6BrJ": {
    "title": "Data Fabric",
    "description": "A data fabric is a single environment consisting of a unified architecture with services and technologies running on it that architecture that helps a company manage their data. It enables accessing, ingesting, integrating, and sharing data in a environment where the data can be batched or streamed and be in the cloud or on-prem. The ultimate goal of data fabric is to use all your data to gain better insights into your company and make better business decisions. A data fabric includes building blocks such as data pipeline, data access, data lake, data store, data policy, ingestion framework, and data visualization. These building blocks would be used to build platforms or “products” such as a client data integration platform, data hub, governance framework, and a global semantic layer, giving you centralized governance and standardization\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is a data fabric?",
        "url": "http://ibm.com/think/topics/data-fabric",
        "type": "article"
      },
      {
        "title": "Data Fabric defined",
        "url": "https://www.jamesserra.com/archive/2021/06/data-fabric-defined/",
        "type": "article"
      },
      {
        "title": "How Data Fabric Can Optimize Data Delivery",
        "url": "https://www.gartner.com/en/data-analytics/topics/data-fabric",
        "type": "article"
      }
    ]
  },
  "OiWleAdMbPtisrJpk2eSJ": {
    "title": "Data Hub",
    "description": "A **data hub** is an architecture that provides a central point for the flow of data between multiple sources and applications, enabling organizations to collect, integrate, and manage data efficiently. Unlike traditional data storage solutions, a data hub’s purpose focuses on data integration and accessibility. The design supports real-time data exchange, which makes accessing, analyzing, and acting on the data faster and easier.\n\nA data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data. A data hub differs from a data lake by homogenizing data and possibly serving data in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data such as de-duplication, quality, security, and a standardized set of query services.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Data hub",
        "url": "https://en.wikipedia.org/wiki/Data_hub",
        "type": "article"
      },
      {
        "title": "What is a Data Hub? Definition, 7 Key Benefits & Why You Might Need One",
        "url": "https://www.cdata.com/blog/what-is-a-data-hub",
        "type": "article"
      }
    ]
  },
  "14CycunRC1p2qTRn-ncoy": {
    "title": "Metadata-first Architecture",
    "description": "",
    "links": []
  },
  "ZnGX8pg4GagdSalg_P0oq": {
    "title": "Serverless Options",
    "description": "Serverless data storage involves using cloud provider services for databases and object storage that automatically scale infrastructure and implement a consumption-based, pay-as-you-go model, eliminating the need for developers to manage, provision, or maintain any physical or virtual servers. This approach simplifies development, reduces operational overhead, and offers cost-effectiveness by charging only for the resources used, allowing teams to focus on applications rather than infrastructure management.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What Is Serverless Computing?",
        "url": "https://www.ibm.com/think/topics/serverless",
        "type": "article"
      }
    ]
  },
  "lDeSL9qvgQgyAMcWXF7Fr": {
    "title": "Cloud Computing",
    "description": "**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing such as public clouds, private clouds, and hybrids clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over their data and infrastructures.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "Cloud Computing - IBM",
        "url": "https://www.ibm.com/think/topics/cloud-computing",
        "type": "article"
      },
      {
        "title": "What is Cloud Computing? - Azure",
        "url": "https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-cloud-computing",
        "type": "article"
      },
      {
        "title": "What is Cloud Computing? - Amazon Web Services",
        "url": "https://www.youtube.com/watch?v=mxT233EdY5c",
        "type": "video"
      }
    ]
  },
  "YLfyb_ycgz1hu0yW8SPNE": {
    "title": "Cloud Architectures",
    "description": "Cloud architecture refers to how various cloud technology components, such as hardware, virtual resources, software capabilities, and virtual network systems interact and connect to create cloud computing environments. Cloud architecture dictates how components are integrated so that you can pool, share, and scale resources over a network. It acts as a blueprint that defines the best way to strategically combine resources to build a cloud environment for a specific business need.\n\nCloud architecture components can included, among others:\n\n*   A frontend platform\n*   A backend platform\n*   A cloud-based delivery model\n*   A network (internet, intranet, or intercloud)\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is cloud architecture? - Google",
        "url": "https://cloud.google.com/learn/what-is-cloud-architecture",
        "type": "article"
      },
      {
        "title": "WWhat is Cloud Architecture and Common Models?",
        "url": "https://www.youtube.com/watch?v=zTP-bx495hU",
        "type": "video"
      }
    ]
  },
  "AHLsBfPfBJOhLlJ-64GcK": {
    "title": "Amazon EC2 ( Compute)",
    "description": "Amazon Elastic Compute Cloud (EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction. EC2 enables you to scale your compute capacity, develop and deploy applications faster, and run applications on AWS's reliable computing environment. You have the control of your computing resources and can access various configurations of CPU, Memory, Storage, and Networking capacity for your instances.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "EC2 - User Guide",
        "url": "https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html",
        "type": "article"
      },
      {
        "title": "Introduction to Amazon EC2",
        "url": "https://www.youtube.com/watch?v=eaicwmnSdCs",
        "type": "video"
      }
    ]
  },
  "tbut25IZI2aU7TkI9fFYV": {
    "title": "S3 (Storage)",
    "description": "Amazon S3 (Simple Storage Service) is an object storage service offered by Amazon Web Services (AWS). It provides scalable, secure and durable storage on the internet. Designed for storing and retrieving any amount of data from anywhere on the web, it is a key tool for many companies in the field of data storage, including mobile applications, websites, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "S3",
        "url": "https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html",
        "type": "article"
      }
    ]
  },
  "GtFk7phYGfXUhxanicYNQ": {
    "title": "Amazon RDS (Database)",
    "description": "Amazon RDS (Relational Database Service) is a web service from Amazon Web Services. It's designed to simplify the setup, operation, and scaling of relational databases in the cloud. This service provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks. RDS supports six database engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. These engines give you the ability to run instances ranging from 5GB to 6TB of memory, accommodating your specific use case. It also ensures the database is up-to-date with the latest patches, automatically backs up your data and offers encryption at rest and in transit.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon RDS",
        "url": "https://aws.amazon.com/rds/",
        "type": "article"
      }
    ]
  },
  "glue-etl@nD36-PXHzOXePM7j9u_O_.md": {
    "title": "Glue (ETL)",
    "description": "",
    "links": []
  },
  "-yi-xk-kv0njW9GdytiAQ": {
    "title": "Azure Virtual Machines",
    "description": "Azure Virtual Machines (VMs) enable virtualization without requiring hardware investments. They provide customizable environments for development, testing, and cloud applications so you can run different operating systems like Ubuntu on a Windows host based on your needs. One of the key advantages of Azure VMs is the pay-as-you-go pricing model. It allows you to scale resources up or down as needed, ensuring cost efficiency without wasting resources.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Azure Virtual Machines",
        "url": "https://azure.microsoft.com/en-us/products/virtual-machines",
        "type": "article"
      },
      {
        "title": "Virtual Machines in Azure",
        "url": "https://learn.microsoft.com/en-us/azure/virtual-machines/overview",
        "type": "article"
      },
      {
        "title": "AVirtual Machines in Azure | Beginner's Guide",
        "url": "https://www.youtube.com/watch?v=_abaWXoQFZU",
        "type": "video"
      }
    ]
  },
  "gzbEGCUwMsD1gL4nW668g": {
    "title": "Azure Blob Storage",
    "description": "Azure Blob Storage is Microsoft's object storage solution for the cloud. “Blob” stands for Binary Large Object, a term used to describe storage for unstructured data like text, images, and video. Azure Blob Storage is Microsoft Azure’s solution for storing these blobs in the cloud. It offers flexible storage—you only pay based on your usage. Depending on the access speed you need for your data, you can choose from various storage tiers (hot, cool, and archive). Being cloud-based, it is scalable, secure, and easy to manage.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Azure Blob Storage",
        "url": "https://azure.microsoft.com/en-us/products/storage/blobs",
        "type": "article"
      },
      {
        "title": "Introduction to Azure Blob Storage",
        "url": "https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction",
        "type": "article"
      },
      {
        "title": "A Beginners Guide to Azure Blob Storage",
        "url": "https://www.youtube.com/watch?v=ah1XqItWkuc&t=300s",
        "type": "video"
      }
    ]
  },
  "iIZ3g70KRwEJCBNaONd2d": {
    "title": "Azure SQL Database",
    "description": "Azure SQL Database is a fully managed Platform as a Service (PaaS) offering. It abstracts the underlying infrastructure, enabling developers to focus on building and deploying applications without worrying about database maintenance tasks.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Azure SQL Database",
        "url": "https://azure.microsoft.com/en-us/products/azure-sql/database",
        "type": "article"
      },
      {
        "title": "What is Azure SQL Database?",
        "url": "https://learn.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview?view=azuresql",
        "type": "article"
      },
      {
        "title": "Azure SQL Database: Step-by-Step Setup and Management",
        "url": "https://www.datacamp.com/tutorial/azure-sql-database",
        "type": "article"
      },
      {
        "title": "Azure SQL for Beginners",
        "url": "https://www.youtube.com/playlist?list=PLlrxD0HtieHi5c9-i_Dnxw9vxBY-TqaeN",
        "type": "video"
      }
    ]
  },
  "BNGdJSmrNE90rwPa4JoWj": {
    "title": "Data Factory (ETL)",
    "description": "Data Factory, most commonly referring to Microsoft's Azure Data Factory, is a cloud-based data integration service that allows you to create, schedule, and orchestrate workflows to move and transform data from various sources into a centralized location for analysis. It provides tools for building Extract, Transform, and Load (ETL) pipelines, enabling businesses to prepare data for analytics, business intelligence, and other data-driven initiatives without extensive coding, thanks to its visual, code-free interface and native connectors.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "Microsoft Azure - Data Factory",
        "url": "https://www.coursera.org/learn/microsoft-azure---data-factory",
        "type": "course"
      },
      {
        "title": "What is Azure Data Factory?",
        "url": "https://learn.microsoft.com/en-us/azure/data-factory/introduction",
        "type": "article"
      },
      {
        "title": "Azure Data Factory Documentation",
        "url": "https://learn.microsoft.com/en-gb/azure/data-factory/",
        "type": "article"
      }
    ]
  },
  "-cU86vJWJmlmPHXDCo31o": {
    "title": "Compute Engine (Compute)",
    "description": "Compute Engine is a computing and hosting service that lets you create and run virtual machines on Google infrastructure. Compute Engine offers scale, performance, and value that lets you easily launch large compute clusters on Google's infrastructure. There are no upfront investments, and you can run thousands of virtual CPUs on a system that offers quick, consistent performance. You can configure and control Compute Engine resources using the Google Cloud console, the Google Cloud CLI, or using a REST-based API. You can also use a variety of programming languages to run Compute Engine, including Python, Go, and Java.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "The Basics of Google Cloud Compute",
        "url": "https://www.cloudskillsboost.google/course_templates/754",
        "type": "course"
      },
      {
        "title": "Compute Engine overview",
        "url": "https://cloud.google.com/compute/docs/overview",
        "type": "article"
      },
      {
        "title": "WCompute Engine in a minute",
        "url": "https://www.youtube.com/watch?v=IuK4gQeHRcI",
        "type": "video"
      }
    ]
  },
  "2lqvArZdwRX0t3P3yovEH": {
    "title": "Google Cloud Storage",
    "description": "Google Cloud Storage (GCS) is a scalable, secure, and durable object storage service within Google Cloud Platform (GCP) designed for storing and retrieving unstructured data of any type or size. It allows users to store data in \"buckets\" and access it through APIs, web interfaces, or command-line tools for applications, backups, media hosting, and big data analytics. GCS offers different storage classes to optimize costs based on data access frequency, strong security with encryption, and high availability through redundant data storage across multiple locations.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Cloud Storage",
        "url": "https://cloud.google.com/storage",
        "type": "article"
      },
      {
        "title": "Google Cloud Storage",
        "url": "https://en.wikipedia.org/wiki/Google_Cloud_Storage",
        "type": "article"
      },
      {
        "title": "Cloud Storage in a minute",
        "url": "https://www.youtube.com/watch?v=wNOs3LlsH6k",
        "type": "article"
      }
    ]
  },
  "9-wQWQIdAxQmMaJC9ojPg": {
    "title": "Cloud SQL (Database)",
    "description": "Google Cloud SQL is a fully-managed, cost-effective and scalable database service that makes it easy to set-up, maintain, manage and administer MySQL, PostgreSQL, and SQL Server databases in the cloud. Hosted on Google Cloud Platform, Cloud SQL provides a database infrastructure for applications running anywhere.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Cloud SQL",
        "url": "https://www.cloudskillsboost.google/course_templates/701",
        "type": "course"
      },
      {
        "title": "Cloud SQL",
        "url": "https://cloud.google.com/sql",
        "type": "article"
      },
      {
        "title": "Cloud SQL overview",
        "url": "https://cloud.google.com/sql/docs/introduction",
        "type": "article"
      }
    ]
  },
  "YWgVUyIvBRW8eTVR5y73P": {
    "title": "Dataflow",
    "description": "Dataflow is a Google Cloud service that provides unified stream and batch data processing at scale. Typical use cases for Dataflow include Data movement,ETL processes, BI dashboarding, and applying ML in real time to streaming data.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Dataflow",
        "url": "https://cloud.google.com/products/dataflow",
        "type": "article"
      },
      {
        "title": "Dataflow",
        "url": "https://en.wikipedia.org/wiki/Google_Cloud_Dataflow",
        "type": "article"
      },
      {
        "title": "What is Google Dataflow",
        "url": "https://www.youtube.com/watch?v=KalJ0VuEM7s",
        "type": "video"
      }
    ]
  },
  "GN1Xh3kA25ge-wTbdiSio": {
    "title": "Types of Data Ingestion",
    "description": "The primary types of data ingestion are Batch, Streaming, and Hybrid. Batch ingestion processes data in large, scheduled chunks, suitable for non-time-sensitive tasks like monthly reports. Streaming (or Real-time) ingestion handles data as it arrives, ideal for time-sensitive applications such as fraud detection or IoT monitoring. Hybrid ingestion combines both methods, offering flexibility for diverse business needs.",
    "links": []
  },
  "f-a3Hy1ldnvSv8W2mFiJK": {
    "title": "Batch",
    "description": "Batch processing is a method in which large volumes of collected data are processed in chunks or batches. This approach is especially effective for resource-intensive jobs, repetitive tasks, and managing extensive datasets where real-time processing isn’t required. It is ideal for applications like data warehousing, ETL (Extract, Transform, Load), and large-scale reporting. Data batch processing is mainly automated, requiring minimal human interaction once the process is set up. Tasks are predefined, and the system executes them according to a scheduled timeline, typically during off-peak hours when computing resources are readily available.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Batch Processing?",
        "url": "https://aws.amazon.com/what-is/batch-processing/",
        "type": "article"
      },
      {
        "title": "Batch And Streaming Demystified For Unification",
        "url": "https://towardsdatascience.com/batch-and-streaming-demystified-for-unification-dee0b48f921d/",
        "type": "article"
      }
    ]
  },
  "4fugNG5sEDl0kgmN3Mezk": {
    "title": "Hybrid",
    "description": "Hybrid data ingestion combines aspects of both real-time and batch ingestion. This approach gives you the flexibility to adapt your data ingestion strategy as your needs evolve. For example, you could process data in real-time for critical applications and in batches for less time-sensitive tasks. Two common hybrid methods are Lambda architecture-based and micro-batching.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Data Ingestion: Types, Tools, and Real-Life Use Cases",
        "url": "https://estuary.dev/blog/data-ingestion/",
        "type": "article"
      },
      {
        "title": "Lambda Architecture",
        "url": "https://www.databricks.com/glossary/lambda-architecture",
        "type": "article"
      },
      {
        "title": "What is Micro Batching: A Comprehensive Guide 101",
        "url": "https://hevodata.com/learn/micro-batching/",
        "type": "article"
      }
    ]
  },
  "wwPO5Uc6qnwYgibrbPn7y": {
    "title": "Streaming",
    "description": "Streaming processing, also known as real-time processing, involves the immediate ingestion, as well as analysis, of data as it is generated, providing instantaneous insights and enabling timely decisions in time-sensitive applications like financial trading, medical monitoring, and autonomous vehicles. This differs from batch processing, which handles data in later batches, and typically involves continuous data streaming, low latency, and high availability to deliver immediate outcomes for critical tasks.",
    "links": []
  },
  "oqxNr0Lj34mgRi5Z5wJt_": {
    "title": "Realtime",
    "description": "Real-time processing, also known as streaming processing, involves the immediate ingestion, as well as analysis, of data as it is generated, providing instantaneous insights and enabling timely decisions in time-sensitive applications like financial trading, medical monitoring, and autonomous vehicles. This differs from batch processing, which handles data in later batches, and typically involves continuous data streaming, low latency, and high availability to deliver immediate outcomes for critical tasks.",
    "links": []
  },
  "fqTPu70MJyXBmqYUgCJ_r": {
    "title": "Data Pipelines",
    "description": "Data pipelines are a series of automated processes that transport and transform data from various sources to a destination for analysis or storage. They typically involve steps like data extraction, cleaning, transformation, and loading (ETL) into databases, data lakes, or warehouses. Pipelines can handle batch or real-time data, ensuring that large-scale datasets are processed efficiently and consistently. They play a crucial role in ensuring data integrity and enabling businesses to derive insights from raw data for reporting, analytics, or machine learning.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "What is a Data Pipeline? - IBM",
        "url": "https://www.ibm.com/topics/data-pipeline",
        "type": "article"
      },
      {
        "title": "What are Data Pipelines?",
        "url": "https://www.youtube.com/watch?v=oKixNpz6jNo",
        "type": "video"
      }
    ]
  },
  "nShDMih1HmubBczxu4cfU": {
    "title": "Extract Data",
    "description": "The first step in ETL processes involves extract data from data sources to a staging area. Data can come in various types and formats, from SQL or NoSQL databases and plan text to image and video files.",
    "links": []
  },
  "TjsxMNyWO3YGwg6zEIid4": {
    "title": "Transform Data",
    "description": "In the second step, ETL tools transform and consolidate the raw data in the staging area to prepare it for the target data warehouse. The data transformation phase is normally the most complex and prone to errors, as it can involved multiple transformations, including basic data cleaning operations, deduplication, cata casting, filtering, grouping, encrypting, and many more.",
    "links": []
  },
  "y5Aaxe-P68HC5kNsIi88q": {
    "title": "Load Data",
    "description": "In the third step, the transformed data is moved from the staging area into the targe data storage solution, such as a data warehouse or data lake. For most organizations, the data loading process is automated, well-defined, continuous and batch-driven.",
    "links": []
  },
  "vfO5Dz6ppsNtbGiQwpUs7": {
    "title": "Apache Airflow",
    "description": "Apache Airflow is an open-source tool that helps you schedule, organize, and monitor workflows. Think of it like a to-do list for your data tasks, but smarter — you can set tasks to run in a specific order, track their progress, and see what happens if something fails. It’s often used for automating data pipelines so that data moves, gets processed, and is ready for use without manual work.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Apache Airflow",
        "url": "https://airflow.apache.org/",
        "type": "article"
      }
    ]
  },
  "SgYLIkMtLVPlw8Qo5j0Fb": {
    "title": "dbt",
    "description": "dbt, also known as the data build tool, is designed to simplify the management of data warehouses and transform the data within. This is primarily the T, or transformation, within ELT (or sometimes ETL) processes. It allows for easy transition between data warehouse types, such as Snowflake, BigQuery, Postgres, or DuckDB. dbt also provides the ability to use SQL across teams of multiple users, simplifying interaction. In addition, dbt translates between SQL dialects as appropriate to connect to different data sources and warehouses.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "dbt Official Courses",
        "url": "https://learn.getdbt.com/catalog",
        "type": "course"
      },
      {
        "title": "dbt",
        "url": "https://www.getdbt.com/product/what-is-dbt",
        "type": "article"
      },
      {
        "title": "dbt Documentation",
        "url": "https://docs.getdbt.com/docs/build/documentation",
        "type": "article"
      }
    ]
  },
  "_IiKTZDF_b57l79X6lsq6": {
    "title": "Luigi",
    "description": "Luigi is a powerful, easy-to-use open-source framework for building data pipelines with Python. It handles dependency resolution, workflow management, visualization etc. Luigi helps to build the data pipeline, typically associated with long-running batch processes.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Luigi Docs",
        "url": "https://luigi.readthedocs.io/",
        "type": "article"
      },
      {
        "title": "Getting Started with Luigi—What, Why & How",
        "url": "https://medium.com/big-data-processing/getting-started-with-luigi-what-why-how-f8e639a1f2a5",
        "type": "article"
      }
    ]
  },
  "TAh4__7U58J7fduU9a1Ol": {
    "title": "Perfect",
    "description": "Prefect is an open-source orchestration engine that turns your Python functions into production-grade data pipelines with minimal friction. You can build and schedule workflows in pure Python—no DSLs or complex config files—and run them anywhere you can run Python. Prefect handles the heavy lifting for you out of the box: automatic state tracking, failure handling, real-time monitoring, and more.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Perfect Docs",
        "url": "https://docs.prefect.io/v3/get-started",
        "type": "article"
      },
      {
        "title": "Getting Started with Prefect",
        "url": "https://www.youtube.com/watch?v=D5DhwVNHWeU",
        "type": "video"
      }
    ]
  },
  "hB0y8A2U3owpAbTUb7LN5": {
    "title": "Cluster Computing Basics",
    "description": "Cluster computing is the process of using multiple computing nodes, called clusters, to increase processing power for solving complex problems, such as Big Data analytics and AI model training. These tasks require parallel processing of millions of data points for complex classification and prediction tasks. Cluster computing technology coordinates multiple computing nodes, each with its own CPUs, GPUs, and internal memory, to work together on the same data processing task. Applications on cluster computing infrastructure run as if on a single machine and are unaware of the underlying system complexities.",
    "links": []
  },
  "Ad10evrGQuYRl5GaMhQwu": {
    "title": "What is Cluster Computing",
    "description": "Cluster computing is a type of distributing computing where multiple computers are connected so they work together as a single system. By working together, a cluster of machines can address complex tasks with higher computational power and efficiency.\n\nThe term “cluster” refers to the network of linked computer systems programmed to perform the same task. Computing clusters typically consist of servers, workstations and personal computers (PCs) that communicate over a local area network (LAN) or a wide area network (WAN). Each computer or “node,” in a computer network has an operating system (OS) and a central processing unit (CPU) core that handles the tasks required for the software to run properly.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is cluster computing? - IBM",
        "url": "https://www.ibm.com/think/topics/cluster-computing",
        "type": "article"
      },
      {
        "title": "What is cluster computing? - AWS",
        "url": "https://aws.amazon.com/what-is/cluster-computing/",
        "type": "article"
      },
      {
        "title": "Computer cluster - Wikipedia",
        "url": "http://en.wikipedia.org/wiki/Computer_cluster",
        "type": "article"
      },
      {
        "title": "WUnderstand the Basic Cluster Concepts",
        "url": "https://www.youtube.com/watch?v=8BBDxzJL6fY",
        "type": "video"
      }
    ]
  },
  "1LLF4466grFDlT9p_WLsi": {
    "title": "Distributed File Systems",
    "description": "A Distributed File System (DFS) allows multiple computers to access and share files across a network as if they were stored on a single local machine. It distributes data across multiple servers, enhancing accessibility and data redundancy. This enables users to access files from various locations and devices, promoting collaboration and data availability.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is a Distributed File System (DFS)? A Complete Guide",
        "url": "http://starwindsoftware.com/blog/what-is-a-distributed-file-system-dfs-a-complete-guide/",
        "type": "article"
      }
    ]
  },
  "ccc6_SzDwXpCL1WbFuPNA": {
    "title": "Job Scheduling",
    "description": "A scheduling system manages and distributes computational jobs across multiple interconnected computers (a cluster) to optimize resource utilization and job completion. The goal is to efficiently allocate cluster resources (like processors and memory) to incoming jobs based on factors such as user priority, job requirements, and deadlines.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Job scheduler",
        "url": "https://en.wikipedia.org/wiki/Job_scheduler",
        "type": "article"
      },
      {
        "title": "Cluster Resources — Job Scheduling",
        "url": "https://supun-kamburugamuve.medium.com/cluster-resources-job-scheduling-bb63644476bc",
        "type": "article"
      }
    ]
  },
  "wpZfbIFtfiUSLMASk4t7f": {
    "title": "Cluster Management Tools",
    "description": "Cluster management software maximizes the work that a cluster of computers can perform. A cluster manager balances workload to reduce bottlenecks, monitors the health of the elements of the cluster, and manages failover when an element fails. A cluster manager can also help a system administrator to perform administration tasks on elements in the cluster.\n\nSome of the most popular Cluster Management Tools are Kubernetes and Apache Hadoop YARN.",
    "links": []
  },
  "I_IueX1DFp-LmBwr1-suX": {
    "title": "Kubernetes",
    "description": "Kubernetes is an [open source](https://github.com/kubernetes/kubernetes) container management platform, and the dominant product in this space. Using Kubernetes, teams can deploy images across multiple underlying hosts, defining their desired availability, deployment logic, and scaling logic in YAML. Kubernetes evolved from Borg, an internal Google platform used to provision and allocate compute resources (similar to the Autopilot and Aquaman systems of Microsoft Azure).\n\nThe popularity of Kubernetes has made it an increasingly important skill for the DevOps Engineer and has triggered the creation of Platform teams across the industry. These Platform engineering teams often exist with the sole purpose of making Kubernetes approachable and usable for their product development colleagues.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Kubernetes Website",
        "url": "https://kubernetes.io/",
        "type": "article"
      },
      {
        "title": "Kubernetes Documentation",
        "url": "https://kubernetes.io/docs/home/",
        "type": "article"
      },
      {
        "title": "Primer: How Kubernetes Came to Be, What It Is, and Why You Should Care",
        "url": "https://thenewstack.io/primer-how-kubernetes-came-to-be-what-it-is-and-why-you-should-care/",
        "type": "article"
      },
      {
        "title": "Kubernetes: An Overview",
        "url": "https://thenewstack.io/kubernetes-an-overview/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Kubernetes",
        "url": "https://app.daily.dev/tags/kubernetes?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Kubernetes Crash Course for Absolute Beginners",
        "url": "https://www.youtube.com/watch?v=s_o8dwzRlu4",
        "type": "video"
      }
    ]
  },
  "pjm_qShAiFk3JsX4Z2d8G": {
    "title": "Apache Hadoop YARN",
    "description": "Apache Hadoop YARN (Yet Another Resource Negotiator) is the part of Hadoop that manages resources and runs jobs on a cluster. It has a ResourceManager that controls all cluster resources and an ApplicationMaster for each job that schedules and runs tasks. YARN lets different tools like MapReduce and Spark share the same cluster, making it more efficient, flexible, and reliable.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Hadoop Yarn Tutorial",
        "url": "https://www.youtube.com/watch?v=6bIF9VwRwE0",
        "type": "video"
      }
    ]
  },
  "9lSjQBM2hWrkujxZjhQHE": {
    "title": "HDFS",
    "description": "HDFS (Hadoop Distributed File System) is Hadoop’s primary storage system. It is designed to reliably store data across a cluster of machines. Its architecture is set up for this type of access to large datasets and is optimized for fault tolerance, scalability, and data locality.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "HDFS Architecture Guide",
        "url": "https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html",
        "type": "article"
      },
      {
        "title": "Hadoop Distributed File System (HDFS)",
        "url": "https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs",
        "type": "article"
      },
      {
        "title": "What is Hadoop Distributed File System (HDFS)?",
        "url": "https://www.ibm.com/think/topics/hdfs",
        "type": "article"
      }
    ]
  },
  "03BHmPhYkZrJwRvQdmxxr": {
    "title": "Big Data Tools",
    "description": "Big data tools are specialized software and platforms designed to handle the massive volume, velocity, and variety of data that traditional data processing tools cannot effectively manage. These tools provide the infrastructure, frameworks, and capabilities to process, analyze, and extract meaningful knowledge from vast datasets. They are essential for modern data-driven organizations seeking to gain insights, make informed decisions, and achieve a competitive advantage.\n\nHadoop and Spark are two of the most prominent frameworks in big data they handle the processing of large-scale data in very different ways. While Hadoop can be credited with democratizing the distributed computing paradigm through a robust storage system called HDFS and a computational model called MapReduce, Spark is changing the game with its in-memory architecture and flexible programming model.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Big Data?",
        "url": "https://cloud.google.com/learn/what-is-big-data?hl=en",
        "type": "article"
      },
      {
        "title": "Hadoop vs Spark: Which Big Data Framework Is Right For You?",
        "url": "https://www.datacamp.com/blog/hadoop-vs-spark",
        "type": "article"
      },
      {
        "title": "Introduction to Big Data with Spark and Hadoop",
        "url": "http://youtube.com/watch?v=vHlwg4ciCsI&t=80s&ab_channel=freeCodeAcademy",
        "type": "video"
      }
    ]
  },
  "0pH2U4GOj8zK3lgkh_r5M": {
    "title": "HDFS",
    "description": "HDFS (Hadoop Distributed File System) is Hadoop’s primary storage system. It is designed to reliably store data across a cluster of machines. Its architecture is built for high-throughput, streaming access to large datasets and is optimized for fault tolerance, scalability, and data locality.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "HDFS Architecture Guide",
        "url": "https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html",
        "type": "article"
      },
      {
        "title": "Hadoop Distributed File System (HDFS)",
        "url": "https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs",
        "type": "article"
      },
      {
        "title": "What is Hadoop Distributed File System (HDFS)?",
        "url": "https://www.ibm.com/think/topics/hdfs",
        "type": "article"
      }
    ]
  },
  "__tWu5uZYnmnuR-qO9SOR": {
    "title": "MapReduce",
    "description": "MapReduce is a programming model for processing large data sets in parallel across a cluster. It lets engineers and analysts handle large volumes of complex, unstructured data efficiently. MapReduce breaks down a big data problem into smaller sub-tasks that run in parallel (Map) and then combines those intermediate results into an output in a more usable format (Reduce). This technique is particularly useful in conducting exploratory analysis, as well as in handling big data operations such as text processing, graph processing, or more complicated machine learning algorithms.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "MapReduce",
        "url": "https://www.databricks.com/glossary/mapreduce",
        "type": "article"
      },
      {
        "title": "What is Apache MapReduce?",
        "url": "https://www.ibm.com/topics/mapreduce",
        "type": "article"
      }
    ]
  },
  "KcW4z48pk2x6IjQhZs_Ub": {
    "title": "YARN",
    "description": "Apache Hadoop YARN (Yet Another Resource Negotiator) is the part of Hadoop that manages resources and runs jobs on a cluster. It has a ResourceManager that controls all cluster resources and an ApplicationMaster for each job that schedules and runs tasks. YARN lets different tools like MapReduce and Spark share the same cluster, making it more efficient, flexible, and reliable.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Hadoop Yarn Tutorial",
        "url": "https://www.youtube.com/watch?v=6bIF9VwRwE0",
        "type": "video"
      }
    ]
  },
  "qHMtJFYcGmESiz_VwRwiI": {
    "title": "Apache Spark",
    "description": "Apache Spark is an open-source distributed computing system designed for big data processing and analytics. It offers a unified interface for programming entire clusters, enabling efficient handling of large-scale data with built-in support for data parallelism and fault tolerance. Spark excels in processing tasks like batch processing, real-time data streaming, machine learning, and graph processing. It’s known for its speed, ease of use, and ability to process data in-memory, significantly outperforming traditional MapReduce systems. Spark is widely used in big data ecosystems for its scalability and versatility across various data processing tasks.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Apache Spark Documentation",
        "url": "https://spark.apache.org/documentation.html",
        "type": "article"
      },
      {
        "title": "Spark By Examples",
        "url": "https://sparkbyexamples.com",
        "type": "article"
      },
      {
        "title": "Explore top posts about Apache Spark",
        "url": "https://app.daily.dev/tags/spark?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "eTHitN2erd6z8-MZiXE9s": {
    "title": "Containers & Orchestration",
    "description": "**Containers** are lightweight, portable, and isolated environments that package applications and their dependencies, enabling consistent deployment across different computing environments. They encapsulate software code, runtime, system tools, libraries, and settings, ensuring that the application runs the same regardless of where it's deployed. Containers share the host operating system's kernel, making them more efficient than traditional virtual machines.\n\n**Orchestration** refers to the automated coordination and management of complex IT systems. It involves combining multiple automated tasks and processes into a single workflow to achieve a specific goal. Orchestration is a key component of modern software development and should not be skipped in favor of manual configuration. As an automation practice, orchestration helps to reduce the chance of human error in the different steps of the data engineering lifecycle, which supports efficient resource utilization and consistency.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What are Containers?",
        "url": "https://cloud.google.com/learn/what-are-containers",
        "type": "article"
      },
      {
        "title": "Containers - The New Stack",
        "url": "https://thenewstack.io/category/containers/",
        "type": "article"
      },
      {
        "title": "An Introduction to Data Orchestration: Process and Benefits",
        "url": "https://www.datacamp.com/blog/introduction-to-data-orchestration-process-and-benefits",
        "type": "article"
      },
      {
        "title": "What is Container Orchestration?",
        "url": "https://www.redhat.com/en/topics/containers/what-is-container-orchestration",
        "type": "article"
      },
      {
        "title": "What are Containers?",
        "url": "https://www.youtube.com/playlist?list=PLawsLZMfND4nz-WDBZIj8-nbzGFD4S9oz",
        "type": "video"
      },
      {
        "title": "Why You Need Data Orchestration",
        "url": "https://www.youtube.com/watch?v=ZtlS5-G-gng",
        "type": "video"
      }
    ]
  },
  "OQ3RqVgWEMxpAtrrjOG5U": {
    "title": "Docker",
    "description": "Docker is an open-source platform that automates the deployment, scaling, and management of applications using containerization technology. It enables developers to package applications with all their dependencies into standardized units called containers, ensuring consistent behavior across different environments. Docker provides a lightweight alternative to full machine virtualization, using OS-level virtualization to run multiple isolated systems on a single host. Its ecosystem includes tools for building, sharing, and running containers, such as Docker Engine, Docker Hub, and Docker Compose. Docker has become integral to modern DevOps practices, facilitating microservices architectures, continuous integration/deployment pipelines, and efficient resource utilization in both development and production environments.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Visit Dedicated Docker Roadmap",
        "url": "https://roadmap.sh/docker",
        "type": "article"
      },
      {
        "title": "Docker Documentation",
        "url": "https://docs.docker.com/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Docker",
        "url": "https://app.daily.dev/tags/docker?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Docker Tutorial",
        "url": "https://www.youtube.com/watch?v=RqTEHSBrYFw",
        "type": "video"
      },
      {
        "title": "Docker simplified in 55 seconds",
        "url": "https://youtu.be/vP_4DlOH1G4",
        "type": "video"
      }
    ]
  },
  "kcgDW6AFW7WXzXMTPE6J-": {
    "title": "Kubernetes",
    "description": "Kubernetes is an [open source](https://github.com/kubernetes/kubernetes) container management platform, and the dominant product in this space. Using Kubernetes, teams can deploy images across multiple underlying hosts, defining their desired availability, deployment logic, and scaling logic in YAML. Kubernetes evolved from Borg, an internal Google platform used to provision and allocate compute resources (similar to the Autopilot and Aquaman systems of Microsoft Azure).\n\nThe popularity of Kubernetes has made it an increasingly important skill for the DevOps Engineer and has triggered the creation of Platform teams across the industry. These Platform engineering teams often exist with the sole purpose of making Kubernetes approachable and usable for their product development colleagues.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Kubernetes Website",
        "url": "https://kubernetes.io/",
        "type": "article"
      },
      {
        "title": "Kubernetes Documentation",
        "url": "https://kubernetes.io/docs/home/",
        "type": "article"
      },
      {
        "title": "Primer: How Kubernetes Came to Be, What It Is, and Why You Should Care",
        "url": "https://thenewstack.io/primer-how-kubernetes-came-to-be-what-it-is-and-why-you-should-care/",
        "type": "article"
      },
      {
        "title": "Kubernetes: An Overview",
        "url": "https://thenewstack.io/kubernetes-an-overview/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Kubernetes",
        "url": "https://app.daily.dev/tags/kubernetes?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Kubernetes Crash Course for Absolute Beginners",
        "url": "https://www.youtube.com/watch?v=s_o8dwzRlu4",
        "type": "video"
      }
    ]
  },
  "8qEgXYZEbDWC73SQSflDY": {
    "title": "Google Cloud GKE",
    "description": "Google Kubernetes Engine (GKE) is a managed Kubernetes service provided by Google Cloud Platform. It allows organizations to deploy, manage, and scale containerized applications using Kubernetes orchestration. GKE automates cluster management tasks, including upgrades, scaling, and security patches, while providing integration with Google Cloud services. It offers features like auto-scaling, load balancing, and private clusters, enabling developers to focus on application development rather than infrastructure management.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "GKE",
        "url": "https://cloud.google.com/kubernetes-engine",
        "type": "article"
      },
      {
        "title": "What is Google Kubernetes Engine (GKE)?",
        "url": "https://www.youtube.com/watch?v=Rl5M1CzgEH4",
        "type": "video"
      }
    ]
  },
  "eVqcYI2Sy2Dldl3SfxB2C": {
    "title": "AWS EKS",
    "description": "Amazon Elastic Kubernetes Service (EKS) is a managed service that simplifies the deployment, management, and scaling of containerized applications using Kubernetes, an open-source container orchestration platform. EKS manages the Kubernetes control plane for the user, making it easy to run Kubernetes applications without the operational overhead of maintaining that control plane yourself. With EKS, you can leverage AWS services such as Auto Scaling Groups, Elastic Load Balancing, and Route 53 for resilient and scalable application infrastructure. Additionally, EKS supports both Spot and On-Demand instances, and includes integrations with the AWS App Mesh service and with AWS Fargate for serverless compute.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon Elastic Kubernetes Service (EKS)",
        "url": "https://aws.amazon.com/eks/",
        "type": "article"
      },
      {
        "title": "Concepts of Amazon EKS",
        "url": "https://docs.aws.amazon.com/eks/",
        "type": "article"
      }
    ]
  },
  "k2SJ4ELGa4B2ZERDAk1uj": {
    "title": "CI/CD",
    "description": "**Continuous Integration** is a software development method where team members integrate their work at least once daily. Each integration is verified by an automated build so that errors are detected early. In Continuous Integration, the software is built and tested immediately after a code commit. In a large project with many developers, commits are made many times during the day, and the code is built and tested with each one.\n\n**Continuous Delivery** is a software engineering method in which a team develops software products in short cycles. It ensures that software can be easily released at any time. The main aim of continuous delivery is to build, test, and release software with good speed and frequency. It helps reduce the cost, time, and risk of delivering changes by allowing for frequent updates in production.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is CI/CD? Continuous Integration and Continuous Delivery",
        "url": "https://www.guru99.com/continuous-integration.html",
        "type": "article"
      },
      {
        "title": "Continuous Integration vs Delivery vs Deployment",
        "url": "https://www.guru99.com/continuous-integration-vs-delivery-vs-deployment.html",
        "type": "article"
      },
      {
        "title": "CI/CD Pipeline: Learn with Example",
        "url": "https://www.guru99.com/ci-cd-pipeline.html",
        "type": "article"
      }
    ]
  },
  "IYIO4S3DO5xkLD__XT5Dp": {
    "title": "GitLab CI",
    "description": "GitLab offers a CI/CD service that can be used as a SaaS offering or self-managed using your own resources. You can use GitLab CI with any GitLab-hosted repository, or with any Bitbucket Cloud or GitHub repository in the GitLab Premium self-managed, GitLab Premium SaaS, and higher tiers.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "GitLab",
        "url": "https://gitlab.com/",
        "type": "article"
      },
      {
        "title": "GitLab Documentation",
        "url": "https://docs.gitlab.com/",
        "type": "article"
      },
      {
        "title": "Get Started with GitLab CI",
        "url": "https://docs.gitlab.com/ee/ci/quick_start/",
        "type": "article"
      },
      {
        "title": "Learn GitLab Tutorials",
        "url": "https://docs.gitlab.com/ee/tutorials/",
        "type": "article"
      },
      {
        "title": "GitLab CI/CD Examples",
        "url": "https://docs.gitlab.com/ee/ci/examples/",
        "type": "article"
      },
      {
        "title": "Explore top posts about GitLab",
        "url": "https://app.daily.dev/tags/gitlab?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "CewITBPtfVs32LD5Acb2E": {
    "title": "Circle CI",
    "description": "CircleCI is a CI/CD service that can be integrated with GitHub, Bitbucket, and GitLab repositories. The service can be used as a SaaS offering or self-managed using your own resources.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "CircleCI",
        "url": "https://circleci.com/",
        "type": "article"
      },
      {
        "title": "CircleCI Documentation",
        "url": "https://circleci.com/docs",
        "type": "article"
      },
      {
        "title": "Configuration Tutorial",
        "url": "https://circleci.com/docs/config-intro",
        "type": "article"
      },
      {
        "title": "Explore top posts about CI/CD",
        "url": "https://app.daily.dev/tags/cicd?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "N8vpCfSdZCADwO_qceWBK": {
    "title": "GitHub Actions",
    "description": "GitHub Actions is a CI/CD tool integrated directly into GitHub, allowing developers to automate workflows, such as building, testing, and deploying code directly from their repositories. It uses YAML files to define workflows, which can be triggered by various events like pushes, pull requests, or on a schedule. GitHub Actions supports a wide range of actions and integrations, making it highly customizable for different project needs. It provides a marketplace with reusable workflows and actions contributed by the community. With its seamless integration with GitHub, developers can take advantage of features like matrix builds, secrets management, and environment-specific configurations to streamline and enhance their development and deployment processes.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "GitHub Actions Documentation",
        "url": "https://docs.github.com/en/actions",
        "type": "article"
      }
    ]
  },
  "PUzHbjwntTSj1REL_dAov": {
    "title": "ArgoCD",
    "description": "Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology. It automates the deployment and management of cloud-native applications by continuously synchronizing the desired application state, as declared in Git, with the actual application state in the target cluster. In an Argo CD workflow, changes to the application are made by committing configuration changes to a Git repository. Argo CD monitors the repository and automatically applies the changes to the cluster. Building and testing the application are typically handled by a separate CI pipeline rather than by Argo CD itself. Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes, and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Argo CD - Argo Project",
        "url": "https://argo-cd.readthedocs.io/en/stable/",
        "type": "article"
      },
      {
        "title": "Explore top posts about ArgoCD",
        "url": "https://app.daily.dev/tags/argocd?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "ArgoCD Tutorial for Beginners",
        "url": "https://www.youtube.com/watch?v=MeU5_k9ssrs",
        "type": "video"
      },
      {
        "title": "What is ArgoCD",
        "url": "https://www.youtube.com/watch?v=p-kAqxuJNik",
        "type": "video"
      }
    ]
  },
  "dk5FQl7Pk3-O5eF7dKwmp": {
    "title": "Monitoring",
    "description": "Monitoring involves continuously observing and tracking the performance, availability, and health of systems, applications, and infrastructure. It typically includes collecting and analyzing metrics, logs, and events to ensure systems are operating within desired parameters. Monitoring helps detect anomalies, identify potential issues before they escalate, and provide insights into system behavior. It often involves tools and platforms that offer dashboards, alerts, and reporting features to facilitate real-time visibility and proactive management. Effective monitoring is crucial for maintaining system reliability and performance, and for supporting incident response and troubleshooting.\n\nA few popular tools are Prometheus, Sentry, Datadog, and New Relic.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Top Monitoring Tools",
        "url": "https://thectoclub.com/tools/best-application-monitoring-software/",
        "type": "article"
      },
      {
        "title": "daily.dev Monitoring Feed",
        "url": "https://app.daily.dev/tags/monitoring",
        "type": "article"
      }
    ]
  },
  "3QsgoKKxAoyj2LWJ8ad-7": {
    "title": "Prometheus",
    "description": "Prometheus is a free software application used for event monitoring and alerting. It collects real-time metrics over an HTTP pull model, stores them in a time series database, and offers flexible queries and real-time alerting.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Prometheus Website",
        "url": "https://prometheus.io/",
        "type": "article"
      },
      {
        "title": "Prometheus Documentation",
        "url": "https://prometheus.io/docs/introduction/overview/",
        "type": "article"
      },
      {
        "title": "Getting Started with Prometheus",
        "url": "https://prometheus.io/docs/tutorials/getting_started/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Prometheus",
        "url": "https://app.daily.dev/tags/prometheus?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "Zoa4JEGrSKjVwUNer4Go1": {
    "title": "Datadog",
    "description": "Datadog is a monitoring and analytics platform for large-scale applications. It encompasses infrastructure monitoring, application performance monitoring, log management, and user-experience monitoring. Datadog aggregates data across your entire stack with 400+ integrations for troubleshooting, alerting, and graphing.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Datadog",
        "url": "https://www.datadoghq.com/",
        "type": "article"
      },
      {
        "title": "Datadog Documentation",
        "url": "https://docs.datadoghq.com/",
        "type": "article"
      }
    ]
  },
  "i54fx-NV6nWzQVCdi0aKL": {
    "title": "Sentry",
    "description": "Sentry tracks your software performance, measuring metrics like throughput and latency, and displaying the impact of errors across multiple systems. Sentry captures distributed traces consisting of transactions and spans, which measure individual services and individual operations within those services.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Sentry",
        "url": "https://sentry.io",
        "type": "article"
      },
      {
        "title": "Sentry Documentation",
        "url": "https://docs.sentry.io/",
        "type": "article"
      }
    ]
  },
  "r1KmASWAa_MOqQOC9gvvF": {
    "title": "New Relic",
    "description": "New Relic is an observability platform that helps you build better software. You can bring in data from any digital source so that you can fully understand your system and how to improve it.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "New Relic",
        "url": "https://newrelic.com/",
        "type": "article"
      },
      {
        "title": "Learn New Relic",
        "url": "https://learn.newrelic.com/",
        "type": "article"
      },
      {
        "title": "Explore top posts about DevOps",
        "url": "https://app.daily.dev/tags/devops?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "DZoxLu-j1vq5leoXLRZqt": {
    "title": "Testing",
    "description": "Testing is a systematic process used to evaluate the functionality, performance, and quality of software or systems to ensure they meet specified requirements and standards. It involves various methodologies and levels, including unit testing (testing individual components), integration testing (verifying interactions between components), system testing (assessing the entire system's behavior), and acceptance testing (confirming it meets user needs). Testing can be manual or automated and aims to identify defects, validate that features work as intended, and ensure the system performs reliably under different conditions. Effective testing is critical for delivering high-quality software and mitigating risks before deployment.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Software Testing?",
        "url": "https://www.guru99.com/software-testing-introduction-importance.html",
        "type": "article"
      },
      {
        "title": "Testing Pyramid",
        "url": "https://www.browserstack.com/guide/testing-pyramid-for-test-automation",
        "type": "article"
      },
      {
        "title": "Explore top posts about Testing",
        "url": "https://app.daily.dev/tags/testing?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "NIG53tyoEiLtwf6LvBZId": {
    "title": "Integration Testing",
    "description": "Integration Testing is a type of testing where software modules are integrated logically and tested as a group. A typical software project consists of multiple software modules coded by different programmers. This testing level aims to expose defects in the interaction between these software modules when they are integrated. Integration Testing focuses on checking data communication amongst these modules.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Integration Testing Tutorial",
        "url": "https://www.guru99.com/integration-testing.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about Testing",
        "url": "https://app.daily.dev/tags/testing?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "8dXD4ddR_USEbAJhUMcB6": {
    "title": "Unit Testing",
    "description": "Unit testing is where individual **units** (modules, functions/methods, routines, etc.) of software are tested to ensure their correctness. This low-level testing ensures smaller components are functionally sound while taking the burden off of higher-level tests. Generally, a developer writes these tests during the development process and they are run as automated tests.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Unit Testing Tutorial",
        "url": "https://www.guru99.com/unit-testing-guide.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about Testing",
        "url": "https://app.daily.dev/tags/testing?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "What is Unit Testing?",
        "url": "https://youtu.be/3kzHmaeozDI",
        "type": "video"
      }
    ]
  },
  "mC9sWeC_wYHeJJHJAvxpI": {
    "title": "End-to-End Testing",
    "description": "End-to-end (E2E) testing is a form of testing used to assert that your entire application works as expected from start to finish, or \"end-to-end\". E2E testing differs from unit testing in that it is completely decoupled from the underlying implementation details of your code. It is typically used to validate an application in a way that mimics how a user would interact with it.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "End to End Testing",
        "url": "https://microsoft.github.io/code-with-engineering-playbook/automated-testing/e2e-testing/",
        "type": "article"
      },
      {
        "title": "End to End Testing: Importance, Process, Best Practices & Frameworks",
        "url": "https://testgrid.io/blog/end-to-end-testing-a-detailed-guide/",
        "type": "article"
      }
    ]
  },
  "E4ND5XaMDGDLtlV7wTzi6": {
    "title": "Functional Testing",
    "description": "Functional testing is a type of software testing that validates the software system against the functional requirements/specifications. The purpose of functional tests is to test each function of the software application by providing appropriate input and verifying the output against the functional requirements.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Functional Testing? Types & Examples",
        "url": "https://www.guru99.com/functional-testing.html",
        "type": "article"
      },
      {
        "title": "Functional Testing: A Detailed Guide",
        "url": "https://www.browserstack.com/guide/functional-testing",
        "type": "article"
      },
      {
        "title": "Explore top posts about Testing",
        "url": "https://app.daily.dev/tags/testing?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "5qe0q_llTzzNVudbONMYo": {
    "title": "A/B Testing",
    "description": "A/B testing is a way to compare two versions of something to see which one works better. You split your audience into two groups: one sees version A and the other sees version B. You then measure which version gets better results, such as more clicks, sales, or sign-ups. This helps you make decisions based on real data instead of guesses.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "A software engineer's guide to A/B testing",
        "url": "https://posthog.com/product-engineers/ab-testing-guide-for-engineers",
        "type": "article"
      },
      {
        "title": "A/B Testing for Beginners",
        "url": "https://www.youtube.com/watch?v=VpTlNRUcIDo",
        "type": "video"
      }
    ]
  },
  "qoMRpAITA7R_KOrwGDPAb": {
    "title": "Load Testing",
    "description": "Load Testing is a type of Performance Testing that determines the performance of a system, software product, or software application under realistic load conditions. It shows how the application behaves when multiple users use it at the same time by measuring the system's response under varying load conditions.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Load testing and Best Practices",
        "url": "https://loadninja.com/load-testing/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Load Testing",
        "url": "https://app.daily.dev/tags/load-testing?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "woa5K4Dt9L6aBzlJMNS31": {
    "title": "Smoke Testing",
    "description": "Smoke Testing is a software testing process that determines whether the deployed software build is stable or not. A passing smoke test is the confirmation the QA team needs to proceed with further software testing. It consists of a minimal set of tests run on each build to check core software functionality.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Smoke Testing | Software Testing",
        "url": "https://www.guru99.com/smoke-testing.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about Testing",
        "url": "https://app.daily.dev/tags/testing?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "dAvizeYvv92KMeAvSDmey": {
    "title": "Messaging Systems",
    "description": "Messaging systems, commonly known as message queues, make it possible for applications to communicate asynchronously by sending messages to each other via a queue. A message queue provides temporary storage between the sender and the receiver so that the sender can keep operating without interruption when the destination program is busy or not connected.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Messaging Queues",
        "url": "https://aws.amazon.com/message-queue/",
        "type": "article"
      },
      {
        "title": "Messaging Queues Tutorial",
        "url": "https://www.tutorialspoint.com/inter_process_communication/inter_process_communication_message_queues.htm",
        "type": "article"
      }
    ]
  },
  "1qju7UlcMo2Ebp4a3BGxH": {
    "title": "What and why use them?",
    "description": "In data engineering, messaging systems act as central brokers for data communication, allowing different applications and services to send and receive data in a decoupled, scalable, and fault-tolerant way. They are crucial for handling high-volume, real-time data streams, building resilient data pipelines, and enabling event-driven architectures by acting as buffers and communication channels between data producers and consumers. Key benefits include decoupling systems for agility, ensuring data reliability through queuing and retries, and horizontal scalability to manage growing data loads, while common examples include Apache Kafka and message queues like RabbitMQ and AWS SQS.",
    "links": []
  },
  "VefHaP7rIOcZVFzglyn66": {
    "title": "Async vs Sync Communication",
    "description": "Synchronous and asynchronous data refer to different approaches in data transmission and processing. **Synchronous** ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, **asynchronous** ingestion is a process where data is ingested without waiting for a response from the data source. Normally, data is queued in a buffer and sent in batches for efficiency.\n\nEach approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Synchronous And Asynchronous Data Transmission: The Differences And How to Use Them",
        "url": "https://www.computer.org/publications/tech-news/trends/synchronous-asynchronous-data-transmission",
        "type": "article"
      },
      {
        "title": "Synchronous vs Asynchronous Communication: What’s the Difference?",
        "url": "https://www.getguru.com/reference/synchronous-vs-asynchronous-communication",
        "type": "article"
      }
    ]
  },
  "IZvL-1Xi0R9IuwJ30FDm4": {
    "title": "Messages vs Streams",
    "description": "Messages and Streams are often used interchange‐ably but a subtle but essential differences exists between the two. A message is raw data communicated across two or more systems. Messages are discrete and singular signals in an event-driven system.\n\nBy contrast, a stream is an append-only log of event records. As events occur, streams are accumulated in an ordered sequence, using a timestamp or an ID to record events order. Streams are used when you need to analyze what happened over many events. Because of the append-only nature of streams, records in a stream are persisted over a long retention window—often weeks or months—allowing for complex operations on records such as aggregations on multiple records or the ability to rewind to a point in time within the stream.",
    "links": []
  },
  "yyJJGinOv3M21MFuqJs0j": {
    "title": "Best Practices",
    "description": "1.  **Ensure Reliability.** A robust messaging system must guarantee that messages aren’t lost, even during node failures or network issues. This means using acknowledgments, replication across multiple brokers, and durable storage on disk. These measures ensure that producers and consumers can recover seamlessly without data loss when something goes wrong.\n    \n2.  **Design for Scalability.** Scalability should be baked in from the start. Partition topics strategically to distribute load across brokers and consumer groups, enabling horizontal scaling.\n    \n3.  **Maintain Message Ordering.** For systems that depend on message sequence, ensure ordering within partitions and design producers to consistently route related messages to the same partition.\n    \n4.  **Secure Communication.** Messaging queues often carry sensitive data, so encrypt messages both in transit and at rest. Implement authentication techniques to ensure only trusted clients can publish or consume, and enforce authorization rules to limit access to specific topics or operations.\n    \n5.  **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems.\n    \n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Best Practices for Message Queue Architecture",
        "url": "https://abhishek-patel.medium.com/best-practices-for-message-queue-architecture-f69d47e3565",
        "type": "article"
      }
    ]
  },
  "fTpx6m8U0506ZLCdDU5OG": {
    "title": "Apache Kafka",
    "description": "Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java and operates based on a message queue, designed to handle real-time data feeds. Kafka functions as a kind of message broker service in between the data producers and the consumers, facilitating efficient transmission of data. It can be viewed as a durable message broker where applications can process and reprocess streamed data. Kafka is a highly scalable and fault-tolerant system which ensures data delivery without loss.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Apache Kafka",
        "url": "https://kafka.apache.org/quickstart",
        "type": "article"
      },
      {
        "title": "Apache Kafka Streams",
        "url": "https://docs.confluent.io/platform/current/streams/concepts.html",
        "type": "article"
      },
      {
        "title": "Kafka Streams Confluent",
        "url": "https://kafka.apache.org/documentation/streams/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Kafka",
        "url": "https://app.daily.dev/tags/kafka?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "Apache Kafka Fundamentals",
        "url": "https://www.youtube.com/watch?v=B5j3uNBH8X4",
        "type": "video"
      },
      {
        "title": "Kafka in 100 Seconds",
        "url": "https://www.youtube.com/watch?v=uvb00oaa3k8",
        "type": "video"
      }
    ]
  },
  "ERcgPTACqYo9BXoRdLjbd": {
    "title": "RabbitMQ",
    "description": "RabbitMQ is an open-source message broker that facilitates the exchange of messages between distributed systems using the Advanced Message Queuing Protocol (AMQP). It enables asynchronous communication by queuing and routing messages between producers and consumers, which helps decouple application components and improve scalability and reliability. RabbitMQ supports features such as message durability, acknowledgments, and flexible routing through exchanges and queues. It is highly configurable, allowing for various messaging patterns, including publish/subscribe, request/reply, and point-to-point communication. RabbitMQ is widely used in enterprise environments for handling high-throughput messaging and integrating heterogeneous systems.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "RabbitMQ Tutorials",
        "url": "https://www.rabbitmq.com/getstarted.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about RabbitMQ",
        "url": "https://app.daily.dev/tags/rabbitmq?ref=roadmapsh",
        "type": "article"
      },
      {
        "title": "RabbitMQ Tutorial - Message Queues and Distributed Systems",
        "url": "https://www.youtube.com/watch?v=nFxjaVmFj5E",
        "type": "video"
      },
      {
        "title": "RabbitMQ in 100 Seconds",
        "url": "https://m.youtube.com/watch?v=NQ3fZtyXji0",
        "type": "video"
      }
    ]
  },
  "uIU5Yncp6hGDcNO1fpjUS": {
    "title": "AWS SQS",
    "description": "Amazon Simple Queue Service (Amazon SQS) offers a secure, durable, and available hosted queue that lets you integrate and decouple distributed software systems and components. Amazon SQS offers common constructs such as dead-letter queues and cost allocation tags. It provides a generic web services API that you can access using any programming language that the AWS SDK supports.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon Simple Queue Service",
        "url": "https://aws.amazon.com/sqs/",
        "type": "article"
      },
      {
        "title": "What is Amazon Simple Queue Service?",
        "url": "https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html",
        "type": "article"
      },
      {
        "title": "Amazon Simple Queue Service (SQS): A Comprehensive Tutorial",
        "url": "https://www.datacamp.com/tutorial/amazon-sqs",
        "type": "article"
      }
    ]
  },
  "uFeiTRobSymkvCinhwmZV": {
    "title": "AWS SNS",
    "description": "Amazon Simple Notification Service (Amazon SNS) is a web service that makes it easy to set up, operate, and send notifications from the cloud. It provides developers with a highly scalable, flexible, and cost-effective capability to publish messages from an application and immediately deliver them to subscribers or other applications. It is designed to make web-scale computing easier for developers. Amazon SNS follows the “publish-subscribe” (pub-sub) messaging paradigm, with notifications being delivered to clients using a “push” mechanism that eliminates the need to periodically check or “poll” for new information and updates. With simple APIs requiring minimal up-front development effort, no maintenance or management overhead and pay-as-you-go pricing, Amazon SNS gives developers an easy mechanism to incorporate a powerful notification system with their applications.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Amazon Simple Notification Service (SNS) ",
        "url": "http://aws.amazon.com/sns/",
        "type": "article"
      },
      {
        "title": "Send Fanout Event Notifications",
        "url": "https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/",
        "type": "article"
      },
      {
        "title": "What is Pub/Sub Messaging?",
        "url": "https://aws.amazon.com/what-is/pub-sub-messaging/",
        "type": "article"
      }
    ]
  },
  "jgz7L8OSuqRNcf9buuMTj": {
    "title": "Infrastructure as Code - IaC",
    "description": "Infrastructure as code (IaC) is the ability to provision and support your computing infrastructure using code instead of manual processes and settings. Manual infrastructure management is time-consuming and prone to error—especially when you manage applications at scale. Infrastructure as code lets you define your infrastructure's desired state without including all the steps to get to that state. It automates infrastructure management so developers can focus on building and improving applications instead of managing environments. Organizations use infrastructure as code to control costs, reduce risks, and respond with speed to new business opportunities.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Infrastructure as Code?",
        "url": "https://aws.amazon.com/what-is/iac/",
        "type": "article"
      },
      {
        "title": "Infrastructure as Code",
        "url": "https://en.wikipedia.org/wiki/Infrastructure_as_code",
        "type": "article"
      },
      {
        "title": "What is Infrastructure as Code?",
        "url": "https://www.youtube.com/watch?v=zWw2wuiKd5o",
        "type": "video"
      }
    ]
  },
  "GyC2JctG-Gi0R_qx1lTeg": {
    "title": "Declarative vs Imperative",
    "description": "When it comes to Infrastructure as Code (IaC), there are two fundamental styles: imperative and declarative.\n\nIn **imperative IaC**, you specify a list of steps the IaC tool should follow to provision a new resource. You tell your IaC tool how to create each environment using a sequence of command imperatives. Imperative IaC can offer more flexibility as it allows you to dictate each step. However, this can result in increased complexity. Popular imperative IaC tools are Chef and Puppet\n\nIn **declarative IaC**, you specify the name and properties of the infrastructure resources you wish to provision, and then the IaC tool figures out how to achieve that end result on its own. You declare to your IaC tool what you want, but not how to get there. Declarative IaC, while less flexible, tends to be simpler and more manageable. Terraform is the most popular declarative IaC tool\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Infrastructure as Code: From Imperative to Declarative and Back Again",
        "url": "https://thenewstack.io/infrastructure-as-code-from-imperative-to-declarative-and-back-again/",
        "type": "article"
      },
      {
        "title": "Declarative vs Imperative Programming for Infrastructure as Code (IaC)",
        "url": "https://www.copado.com/resources/blog/declarative-vs-imperative-programming-for-infrastructure-as-code-iac",
        "type": "article"
      }
    ]
  },
  "9xoBZgKT9uAGsjc1soelY": {
    "title": "Idempotency",
    "description": "Idempotency is a crucial concept in IaC. An idempotent operation produces the same result regardless of how many times it’s executed. In the context of IaC, this means that applying the same configuration multiple times should not change the end state of the system. The role of idempotency in IaC scripts is to ensure consistency and prevent unintended side effects. For example, if a script to create a virtual machine (VM) is run twice, it should not create two VMs. Instead, it should recognize that the VM already exists and take no action.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Why idempotence was important to DevOps",
        "url": "https://dev.to/startpher/why-idempotence-was-important-to-devops-2jn3",
        "type": "article"
      },
      {
        "title": "Idempotency: The Secret to Seamless DevOps and Infrastructure",
        "url": "https://medium.com/@tiwari.sushil/idempotency-the-secret-to-seamless-devops-and-infrastructure-bf22e63e1be5",
        "type": "article"
      }
    ]
  },
  "Rzk6HlMosx3FN_JD5kELZ": {
    "title": "Reusability",
    "description": "One of the goals of Infrastructure as Code (IaC) is to creat modular, standardized units of code—like modules or templates that can be used across multiple projects, environments, and teams, embodying the \"Don't Repeat Yourself\" (DRY) principle. This approach significantly boosts efficiency, consistency, and maintainability, as it allows for rapid deployment of identical infrastructure patterns, enforces organizational standards, simplifies complex setups, and improves collaboration by providing shared, tested building blocks for infrastructure management.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Infrastructure as Code (IaC)?",
        "url": "https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac",
        "type": "article"
      }
    ]
  },
  "WUfJCLTajyLNK19gAAvoW": {
    "title": "Environmental Management",
    "description": "Environmental management, or Environment as Code (EaC) takes the concept of Infrastructure as Code (IaC) one step further. EaC applies DevOps principles to manage and automate entire software environments—including infrastructure, applications, and configurations—using code, making them reproducible, versionable, and reliable. It extends IaC by focusing not just on the underlying servers and networks but on the complete, connected system of services and applications that run on top of it. This approach helps increase efficiency, speeds up deployments, and provides a consistent, auditable process for creating and managing development, testing, and production environments.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "EWhat Is Environment as Code (EaaC)?",
        "url": "https://www.bunnyshell.com/blog/what-is-environment-as-code-eaac/",
        "type": "article"
      }
    ]
  },
  "N-xRhdOTHijAymcTWPXPJ": {
    "title": "Terraform",
    "description": "Terraform is an open-source infrastructure as code (IaC) tool developed by HashiCorp, used to define, provision, and manage cloud and on-premises infrastructure using declarative configuration files. It supports multiple cloud providers like AWS, Azure, and Google Cloud, as well as various services and platforms, enabling infrastructure automation across diverse environments. Terraform's state management and modular structure allow for efficient scaling, reusability, and version control of infrastructure. It is widely used for automating infrastructure provisioning, reducing manual errors, and improving infrastructure consistency and repeatability.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Complete Terraform Course",
        "url": "https://www.youtube.com/watch?v=7xngnjfIlK4",
        "type": "course"
      },
      {
        "title": "Visit Dedicated Terraform Roadmap",
        "url": "https://roadmap.sh/terraform",
        "type": "article"
      },
      {
        "title": "Terraform Documentation",
        "url": "https://www.terraform.io/docs",
        "type": "article"
      },
      {
        "title": "Terraform Tutorials",
        "url": "https://learn.hashicorp.com/terraform",
        "type": "article"
      },
      {
        "title": "How to Scale Your Terraform Infrastructure",
        "url": "https://thenewstack.io/how-to-scale-your-terraform-infrastructure/",
        "type": "article"
      },
      {
        "title": "Explore top posts about Terraform",
        "url": "https://app.daily.dev/tags/terraform?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "WdlC0HhJ5YESfjXmdMnLU": {
    "title": "OpenTofu",
    "description": "OpenTofu is an infrastructure as code tool that lets you define both cloud and on-prem resources in human-readable configuration files that you can version, reuse, and share. You can then use a consistent workflow to provision and manage all of your infrastructure throughout its lifecycle. OpenTofu can manage low-level components like compute, storage, and networking resources, as well as high-level components like DNS entries and SaaS features.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "OpenTofu Docs",
        "url": "https://opentofu.org/docs/",
        "type": "article"
      },
      {
        "title": "OpenWhat is OpenTofu ?Explained with Demo",
        "url": "https://www.youtube.com/watch?v=6eHV63BVqmA",
        "type": "video"
      }
    ]
  },
  "OKJ3HTfreitk2JdrfeLIK": {
    "title": "AWS CDK",
    "description": "The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework used to provision cloud infrastructure resources in a safe, repeatable manner through AWS CloudFormation. AWS CDK offers the flexibility to write infrastructure as code in popular languages like Python, Java, Go, and C#.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "AWS CDK Crash Course for Beginners",
        "url": "https://www.youtube.com/watch?v=D4Asp5g4fp8",
        "type": "course"
      },
      {
        "title": "AWS CDK Examples",
        "url": "https://github.com/aws-samples/aws-cdk-examples",
        "type": "opensource"
      },
      {
        "title": "AWS CDK",
        "url": "https://aws.amazon.com/cdk/",
        "type": "article"
      },
      {
        "title": "AWS CDK Documentation",
        "url": "https://docs.aws.amazon.com/cdk/index.html",
        "type": "article"
      },
      {
        "title": "Explore top posts about AWS",
        "url": "https://app.daily.dev/tags/aws?ref=roadmapsh",
        "type": "article"
      }
    ]
  },
  "1A98uTo8l_GQSrFxu5N2X": {
    "title": "Google Deployment  Mgr.",
    "description": "Google Cloud Deployment Manager is an infrastructure deployment service that automates the creation and management of Google Cloud resources. It provides users with flexible template and configuration files to create deployments that have a variety of Google Cloud services, such as Cloud Storage, Compute Engine, and Cloud SQL, configured to work together.\n\nImportant, Google Deployment Manager will reach end of support on 31 December 2025. An alternative to this tool is **Google Infrastructure Manager**. Infrastructure Manager (Infra Manager) automates the deployment and management of Google Cloud infrastructure resources using Terraform. Infra Manager allows users to deploy programmatically to Google Cloud, allowing to use this service rather than maintaining a different toolchain to work with Terraform on Google Cloud.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Infrastructure Manager Overview",
        "url": "https://cloud.google.com/infrastructure-manager/docs/overview",
        "type": "article"
      },
      {
        "title": "Google Cloud Deployment Manager documentation",
        "url": "https://cloud.google.com/deployment-manager/docs",
        "type": "article"
      }
    ]
  },
  "V30v5RLQrWSMBUIsZQG1o": {
    "title": "Data Analytics",
    "description": "Data Analytics involves extracting meaningful insights from raw data to drive decision-making processes. It includes a wide range of techniques and disciplines ranging from the simple data compilation to advanced algorithms and statistical analysis. Data analysts, as ambassadors of this domain, employ these techniques to answer various questions:\n\n*   Descriptive Analytics _(what happened in the past?)_\n*   Diagnostic Analytics _(why did it happened in the past?)_\n*   Predictive Analytics _(what will happen in the future?)_\n*   Prescriptive Analytics _(how can we make it happen?)_\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Introduction to Data Analytics",
        "url": "https://www.coursera.org/learn/introduction-to-data-analytics",
        "type": "course"
      },
      {
        "title": "The 4 Types of Data Analysis: Ultimate Guide",
        "url": "https://careerfoundry.com/en/blog/data-analytics/different-types-of-data-analysis/",
        "type": "article"
      },
      {
        "title": "What is Data Analysis? An Expert Guide With Examples",
        "url": "https://www.datacamp.com/blog/what-is-data-analysis-expert-guide",
        "type": "article"
      },
      {
        "title": "Descriptive vs Diagnostic vs Predictive vs Prescriptive Analytics: What's the Difference?",
        "url": "https://www.youtube.com/watch?v=QoEpC7jUb9k",
        "type": "video"
      },
      {
        "title": "Types of Data Analytics",
        "url": "https://www.youtube.com/watch?v=lsZnSgxMwBA",
        "type": "video"
      }
    ]
  },
  "zA5QqqBMsqymdiPGFdUnt": {
    "title": "Business Intelligence",
    "description": "Business intelligence encompasses a set of techniques and technologies to transform raw data into meaningful insights that drive strategic decision-making within an organization. BI tools enable business users to access different types of data, historical and current, third-party and in-house, as well as semistructured data and unstructured data such as social media. Users can analyze this information to gain insights into how the business is performing and what it should do next.\n\nBI platforms traditionally rely on data warehouses for their baseline information. The strength of a data warehouse is that it aggregates data from multiple data sources into one central system to support business data analytics and reporting. BI presents the results to the user in the form of reports, charts and maps, which might be displayed through a dashboard.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is business intelligence (BI)?",
        "url": "https://www.ibm.com/think/topics/business-intelligence",
        "type": "article"
      },
      {
        "title": "Business intelligence: A complete overview",
        "url": "https://www.tableau.com/business-intelligence/what-is-business-intelligence",
        "type": "article"
      },
      {
        "title": "What is business intelligence?",
        "url": "https://www.youtube.com/watch?v=l98-BcB3UIE",
        "type": "video"
      }
    ]
  },
  "6Nr5FAGT_oOPZwZWdv7hl": {
    "title": "Microsoft Power BI",
    "description": "PowerBI, an interactive data visualization and business analytics tool developed by Microsoft, plays a crucial role in the field of a data analyst's work. It helps data analysts to convert raw data into meaningful insights through it's easy-to-use dashboards and reports function. This tool provides a unified view of business data, allowing analysts to track and visualize key performance metrics and make better-informed business decisions. With PowerBI, data analysts also have the ability to manipulate and produce visualizations of large data sets that can be shared across an organization, making complex statistical information more digestible.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "Power BI",
        "url": "https://www.microsoft.com/en-us/power-platform/products/power-bi",
        "type": "article"
      },
      {
        "title": "Power BI for beginners",
        "url": "https://www.youtube.com/watch?v=NNSHu0rkew8",
        "type": "video"
      }
    ]
  },
  "FfU6Vwf0PXva91FoqxFgp": {
    "title": "Streamlit",
    "description": "Streamlit is a free and open-source framework to rapidly build and share machine learning and data science web apps. It is a Python-based library specifically designed for data and machine learning engineers. Data scientists or machine learning engineers are not web developers and they're not interested in spending weeks learning to use these frameworks to build web apps. Instead, they want a tool that is easier to learn and to use, as long as it can display data and collect needed parameters for modeling.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Streamlit Docs",
        "url": "https://docs.streamlit.io/",
        "type": "article"
      },
      {
        "title": "Streamlit Python: Tutorial",
        "url": "https://www.datacamp.com/tutorial/streamlit",
        "type": "article"
      },
      {
        "title": "EStreamlit Explained: Python Tutorial for Data Scientists",
        "url": "https://www.youtube.com/watch?v=c8QXUrvSSyg",
        "type": "video"
      }
    ]
  },
  "gqEAOwHFrQiYSejNUdV7-": {
    "title": "Tableu",
    "description": "Tableau is a powerful data visualization tool utilized extensively by data analysts worldwide. Its primary role is to transform raw, unprocessed data into an understandable format without any technical skills or coding. Data analysts use Tableau to create data visualizations, reports, and dashboards that help businesses make more informed, data-driven decisions. They also use it to perform tasks like trend analysis, pattern identification, and forecasts, all within a user-friendly interface. Moreover, Tableau's data visualization capabilities make it easier for stakeholders to understand complex data and act on insights quickly.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "Tableau",
        "url": "https://www.tableau.com/en-gb",
        "type": "article"
      },
      {
        "title": "What is Tableau?",
        "url": "https://www.youtube.com/watch?v=NLCzpPRCc7U",
        "type": "video"
      }
    ]
  },
  "fY0eZzz0aTXm2lelk8l3g": {
    "title": "Looker",
    "description": "Looker is a Google cloud-based business intelligence and data analytics platform. It allows users to explore, analyze, and visualize data to gain insights and make data-driven decisions. Looker is known for its ability to connect to various data sources, create custom dashboards, and generate reports. It also facilitates the integration of analytics, visualizations, and relevant information into business processes.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Looker business intelligence platform embedded analytics",
        "url": "https://cloud.google.com/looker",
        "type": "article"
      },
      {
        "title": "What is Looker?",
        "url": "https://www.youtube.com/watch?v=EmkNPAzla0Y&pp=0gcJCfwAo7VqN5tD",
        "type": "video"
      }
    ]
  },
  "JpuiYsipNWBcrjmn2ji6b": {
    "title": "Reverse ETL",
    "description": "Reverse ETL is the process of extracting data from a data warehouse, transforming it to fit the requirements of operational systems, and then loading it into those other systems. This approach contrasts with traditional ETL, where data is extracted from operational systems, transformed, and loaded into a data warehouse.\n\nWhile ETL and ELT focus on centralizing data, Reverse ETL aims to operationalize this data by making it actionable within third-party systems such as CRMs, marketing platforms, and other operational tools. Visit the following resources to learn more:",
    "links": [
      {
        "title": "What is Reverse ETL? A Helpful Guide",
        "url": "https://www.datacamp.com/blog/reverse-etl",
        "type": "article"
      },
      {
        "title": "What is Reverse ETL?",
        "url": "https://www.youtube.com/watch?v=DRAGfc5or2Y",
        "type": "video"
      }
    ]
  },
  "LMFREK9dH_7qzx_s2xCjI": {
    "title": "ETL vs Reverse ETL",
    "description": "ETL (Extract, Transform, Load) is a key process in data warehousing, enabling the integration of data from multiple sources into a centralized database.\n\nReverse ETL emerged as organizations recognized that their carefully curated data warehouses, while excellent for analysis, created a new form of data silo that prevented operational teams from accessing valuable insights. This methodology addresses the critical gap between analytical insights and operational execution by systematically moving processed data from centralized repositories back to the operational systems where business teams interact with customers and manage daily operations.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is ETL?",
        "url": "https://www.snowflake.com/guides/what-etl",
        "type": "article"
      },
      {
        "title": "ETL vs Reverse ETL vs Data Activation",
        "url": "https://airbyte.com/data-engineering-resources/etl-vs-reverse-etl-vs-data-activation",
        "type": "article"
      },
      {
        "title": "ETL vs Reverse ETL: An Overview, Key Differences, & Use Cases",
        "url": "https://portable.io/learn/etl-vs-reverse-etl",
        "type": "article"
      }
    ]
  },
  "mBOGrJIUaatBe2PnJM2NK": {
    "title": "Reverse ETL Usecases",
    "description": "",
    "links": []
  },
  "vZGDtlyt_yj4szcPTw3cv": {
    "title": "Census",
    "description": "Census is a reverse ETL platform that synchronizes data from a data warehouse to various business applications and SaaS apps like Salesforce and Hubspot. It's a crucial part of the modern data stack, enabling businesses to operationalize their data by making it available in the tools where teams work, like CRMs, marketing platforms, and more.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Census",
        "url": "https://www.getcensus.com/reverse-etl",
        "type": "article"
      },
      {
        "title": "Census Documentation",
        "url": "https://developers.getcensus.com/getting-started/introduction",
        "type": "article"
      },
      {
        "title": "A starter guide to reverse ETL with Census",
        "url": "https://www.getcensus.com/blog/starter-guide-for-first-time-census-users",
        "type": "article"
      },
      {
        "title": "How to \"Reverse ETL\" with Census",
        "url": "https://www.youtube.com/watch?v=XkS7DQFHzbA",
        "type": "video"
      }
    ]
  },
  "8vqjI-uFwJIr_TBEVyM_3": {
    "title": "Segment",
    "description": "Segment is an analytics platform that provides a single API for collecting, storing, and routing customer data from various sources. With Segment, data engineers can easily add analytics tracking to their app, without having to integrate with multiple analytics tools individually. Segment acts as a single point of integration, allowing developers to send data to multiple analytics tools with a single API.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "flutter_segment",
        "url": "https://pub.dev/packages/flutter_segment",
        "type": "article"
      }
    ]
  },
  "8NTe5-XQ5tKAWUyg1rnzb": {
    "title": "Hightouch",
    "description": "Hightouch is a reverse ETL and AI platform crafted for marketing and personalization, allowing companies to uncover insights, execute campaigns, and develop AI agents using their data. It features an AI Decisioning Platform for lifecycle marketing and a Composable Customer Data Platform (CDP) that is adaptable, secure, and quick to deploy, built on top of a data warehouse.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Hightouch Docs",
        "url": "https://hightouch.com/docs",
        "type": "article"
      },
      {
        "title": "What is Hightouch? - The Data Activation Platform",
        "url": "https://www.youtube.com/watch?v=vMm87-MC7og",
        "type": "video"
      }
    ]
  },
  "HDVhttLNMLmIAVEOBCOQ3": {
    "title": "Authentication vs Authorization",
    "description": "Authentication and authorization are popular terms in modern computer systems that often confuse people. **Authentication** is the process of confirming the identity of a user or a device (i.e., an entity). During the authentication process, an entity usually relies on some proof to authenticate itself, i.e. an authentication factor. In contrast to authentication, **authorization** refers to the process of verifying what resources entities (users or devices) can access, or what actions they can perform, i.e., their access rights.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Basic Authentication",
        "url": "https://roadmap.sh/guides/basic-authentication",
        "type": "article"
      },
      {
        "title": "What is Authentication vs Authorization?",
        "url": "https://auth0.com/intro-to-iam/authentication-vs-authorization",
        "type": "article"
      }
    ]
  },
  "2PqRgrYuJi_pPhOS0AkoP": {
    "title": "Encryption",
    "description": "Encryption is used to protect data from being stolen, changed, or compromised and works by scrambling data into a secret code that can only be unlocked with a unique digital key. Encrypted data can be protected while at rest on computers or in transit between them, or while being processed, regardless of whether those computers are located on-premises or are remote cloud servers.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Whay is Encryption?",
        "url": "https://cloud.google.com/learn/what-is-encryption",
        "type": "article"
      },
      {
        "title": "Whay is Encryption?",
        "url": "https://www.youtube.com/watch?v=9chKCUQ8_VQ",
        "type": "video"
      }
    ]
  },
  "ZAKo9Svb8TQ6KkmOnfB5x": {
    "title": "Tokenization",
    "description": "Tokenization is the step where raw text is broken into small pieces called tokens, and each token is given a unique number. A token can be a whole word, part of a word, a punctuation mark, or even a space. The list of all possible tokens is the model’s vocabulary. Once text is turned into these numbered tokens, the model can look up an embedding for each number and start its math. By working with tokens instead of full sentences, the model keeps the input size steady and can handle new or rare words by slicing them into familiar sub-pieces. After the model finishes its work, the numbered tokens are turned back into text through the same vocabulary map, letting the user read the result.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Explaining Tokens — the Language and Currency of AI",
        "url": "https://blogs.nvidia.com/blog/ai-tokens-explained/",
        "type": "article"
      },
      {
        "title": "What is Tokenization? Types, Use Cases, Implementation",
        "url": "https://www.datacamp.com/blog/what-is-tokenization",
        "type": "article"
      }
    ]
  },
  "2Wu1Ufm2l1nrytz1mAxmJ": {
    "title": "Data Masking",
    "description": "Data masking is a process that creates a copy of real data but replaces sensitive information with false but realistic-looking data, preserving the format and structure of the original data for non-production uses like software testing, training, and development. The goal is to protect confidential information and ensure compliance with data protection regulations by preventing unauthorized access to real sensitive data without compromising the usability of the data for other business functions.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Data masking",
        "url": "https://en.wikipedia.org/wiki/Data_masking",
        "type": "article"
      },
      {
        "title": "What is data masking?",
        "url": "https://aws.amazon.com/what-is/data-masking/",
        "type": "article"
      }
    ]
  },
  "rUiYUV4ps6NYYYRwUnjuM": {
    "title": "Data Obfuscation",
    "description": "Statistical data obfuscation involves altering the values of sensitive data in a way that preserves the statistical properties and relationships within the data. It ensures that the masked data maintains the overall distribution, patterns, and correlations of the original data for accurate statistical analysis. Statistical data obfuscation techniques include applying mathematical functions or perturbation algorithms to the data.",
    "links": []
  },
  "cStrYgFZA2NuYq8TdWWP_": {
    "title": "Data Quality",
    "description": "Ensuring quality involves validating the accuracy, completeness, consistency, and reliability of the data collected from each source. The fact that you do it from one source or multiple is almost irrelevant since the only extra task would be to homogenize the final schema of the data, ensuring deduplication and normalization.\n\nThis last part typically includes verifying the credibility of each data source, standardizing formats (like date/time or currency), performing schema alignment, and running profiling to detect anomalies, duplicates, or mismatches before integrating the data for analysis.",
    "links": []
  },
  "pKewO7Ef3GBXL4MDK62QG": {
    "title": "Data Lineage",
    "description": "**Data Lineage** refers to the life-cycle of data, including its origins, movements, characteristics and quality. It's a critical component in Data Engineering for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and facilitating data debugging or tracing data related bugs. It provides a clear representation of data sources, transformations, and dependencies thereby aiding in audits, governance, or reproduction of machine learning models.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "What is Data Lineage? - IBM",
        "url": "https://www.ibm.com/topics/data-lineage",
        "type": "article"
      },
      {
        "title": "What is Data Lineage? - Datacamp",
        "url": "https://www.datacamp.com/blog/data-lineage",
        "type": "article"
      }
    ]
  },
  "a5gzM8msXibxD58eVDkM-": {
    "title": "Metadata Management",
    "description": "",
    "links": []
  },
  "ghAbtfB5KtbboNjijL1Zf": {
    "title": "Data Interoperability",
    "description": "Data interoperability is the ability of diverse systems and applications to access, exchange, and cooperatively use data in a coordinated and meaningful way, even across organizational boundaries. It ensures that data can flow freely, maintaining its integrity and context, allowing for improved efficiency, collaboration, and decision-making by breaking down data silos. Achieving data interoperability often relies on data standards, metadata, and common data elements to define how data is collected, formatted, and interpreted.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "Data Interoperability",
        "url": "https://www.sciencedirect.com/topics/computer-science/data-interoperability",
        "type": "article"
      },
      {
        "title": "What is Data Interoperability? – Exploring the Process and Benefits",
        "url": "https://www.codelessplatforms.com/blog/what-is-data-interoperability/",
        "type": "article"
      }
    ]
  },
  "iuNP6W0A2GLTE2PK5y68u": {
    "title": "Data Quality",
    "description": "Data quality refers to the degree to which a dataset is accurate, complete, consistent, relevant, and timely, making it fit for its intended use. High-quality data is reliable and trustworthy, enabling better decision-making, accurate analysis, and effective strategies, while poor data quality can lead to flawed insights, wasted resources, and negative consequences for an organization.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "What is Data Quality?",
        "url": "https://www.ibm.com/think/topics/data-quality",
        "type": "article"
      }
    ]
  },
  "MuPHohc7mJzcH5QdJ-K46": {
    "title": "GDPR",
    "description": "The General Data Protection Regulation (GDPR) is an essential standard in API Design that addresses the storage, transfer, and processing of personal data of individuals within the European Union. With regards to API Design, considerations must be given on how APIs handle, process, and secure the data to conform with GDPR's demands on data privacy and security. This includes requirements for explicit consent, right to erasure, data portability, and privacy by design. Non-compliance with these standards not only leads to hefty fines but may also erode trust from users and clients. As such, understanding the impact and integration of GDPR within API design is pivotal for organizations handling EU residents' data.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "GDPR",
        "url": "https://gdpr-info.eu/",
        "type": "article"
      },
      {
        "title": "What is GDPR Compliance in Web Application and API Security?",
        "url": "https://probely.com/blog/what-is-gdpr-compliance-in-web-application-and-api-security/",
        "type": "article"
      }
    ]
  },
  "g1VwuSupohuDAT2O4hTXx": {
    "title": "ECPA",
    "description": "The California Consumer Privacy Act (CCPA) is a California state law enacted in 2020 that protects and enforces the rights of Californians regarding the privacy of consumers’ personal information (PI).\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "California Consumer Privacy Act (CCPA)",
        "url": "https://oag.ca.gov/privacy/ccpa",
        "type": "article"
      },
      {
        "title": "What is the California Consumer Privacy Act (CCPA)?",
        "url": "https://www.ibm.com/think/topics/ccpa-compliance",
        "type": "article"
      },
      {
        "title": "What is the California Consumer Privacy Act? | CCPA Explained?",
        "url": "https://www.youtube.com/watch?v=dpzsAgrDAO4",
        "type": "video"
      }
    ]
  },
  "tdqhFFvQ2dQVeQh1qTHjV": {
    "title": "EU AI Act",
    "description": "he Artificial Intelligence Act of the European Union, also known as the EU AI Act, is a comprehensive regulatory framework that is established to ensure safety and that fundamental human rights are upheld in the use of AI technologies. It governs the development and/or use of AI in the European Union. The act takes a risk-based approach to regulation, applying different rules to AI systems according to the risk they pose.\n\nConsidered the world's first comprehensive regulatory framework for AI, the EU AI Act prohibits some AI uses outright and implements strict governance, risk management and transparency requirements for others.\n\nVisit the following resources to learn more:",
    "links": [
      {
        "title": "The EU AI Act Explorer",
        "url": "https://artificialintelligenceact.eu/ai-act-explorer/",
        "type": "article"
      },
      {
        "title": "AI Act - European Commission",
        "url": "https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai",
        "type": "article"
      },
      {
        "title": "Artificial Intelligence Act",
        "url": "https://en.wikipedia.org/wiki/Artificial_Intelligence_Act",
        "type": "article"
      },
      {
        "title": "The EU AI Act Explained",
        "url": "https://www.youtube.com/watch?v=s_rxOnCt3HQ",
        "type": "video"
      }
    ]
  },
  "S8XMtFKWlnUqADElFp0Zw": {
    "title": "Machine Learning",
    "description": "Machine learning, a subset of artificial intelligence, is an indispensable tool in the hands of a data analyst. It provides the ability to automatically learn, improve from experience and make decisions without being explicitly programmed. In the context of a data analyst, machine learning contributes significantly in uncovering hidden insights, recognising patterns or making predictions based on large amounts of data. Through the use of varying algorithms and models, data analysts are able to leverage machine learning to convert raw data into meaningful information, making it a critical concept in data analysis.\n\nLearn more from the following resources:",
    "links": [
      {
        "title": "What is Machine Learning (ML)?",
        "url": "https://www.ibm.com/topics/machine-learning",
        "type": "article"
      },
      {
        "title": "What is Machine Learning?",
        "url": "https://www.youtube.com/watch?v=9gGnTQTYNaE",
        "type": "video"
      }
    ]
  },
  "VQv-c7buU2l-IDzRZBMRo": {
    "title": "MLOps",
    "description": "MLOps is a practice for collaboration and communication between data scientists and operations professionals to help manage production ML lifecycle. It is a set of best practices that aims to automate the ML lifecycle, including training, deployment, and monitoring. MLOps helps organizations to scale ML models and deliver business value faster.",
    "links": []
  }
}