mirror of https://github.com/kamranahmedse/developer-roadmap.git (synced 2025-09-01 21:32:35 +02:00)

add content to DE roadmap and fix some typos in content appearing in several roadmaps
@@ -1 +1,12 @@

# Apache Kafka

Apache Kafka is an open-source stream-processing platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java, operates as a distributed message queue, and is designed to handle real-time data feeds. Kafka acts as a message broker between data producers and consumers, facilitating efficient transmission of data; it can be viewed as a durable message broker in which applications can process and reprocess streamed data. Kafka is a highly scalable, fault-tolerant system that ensures data delivery without loss.
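
As a sketch of the producer/consumer flow described above, the snippet below uses the third-party `kafka-python` package; the broker address and topic name are illustrative assumptions.

```python
# Minimal producer/consumer sketch using the third-party kafka-python
# package. The broker address and topic name are illustrative.
from kafka import KafkaConsumer, KafkaProducer

# Producer: publish a message to the "events" topic.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", b"page_view: /home")
producer.flush()  # block until the broker has acknowledged the message

# Consumer: read the topic from the beginning, then stop when idle.
consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # allows reprocessing the stream
    consumer_timeout_ms=5000,      # exit the loop if no message arrives
)
for message in consumer:
    print(message.topic, message.offset, message.value)
```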

Visit the following resources to learn more:

- [@official@Apache Kafka Quickstart](https://kafka.apache.org/quickstart)
- [@official@Kafka Streams Confluent](https://docs.confluent.io/platform/current/streams/concepts.html)
- [@official@Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4)
- [@video@Kafka in 100 Seconds](https://www.youtube.com/watch?v=uvb00oaa3k8)
- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh)

@@ -1 +1,9 @@

# Apache Spark

Apache Spark is an open-source distributed computing system designed for big data processing and analytics. It offers a unified interface for programming entire clusters, enabling efficient handling of large-scale data with built-in support for data parallelism and fault tolerance. Spark excels in processing tasks like batch processing, real-time data streaming, machine learning, and graph processing. It’s known for its speed, ease of use, and ability to process data in-memory, significantly outperforming traditional MapReduce systems. Spark is widely used in big data ecosystems for its scalability and versatility across various data processing tasks.
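
A minimal PySpark sketch of the kind of in-memory batch processing described above; the input path and column names are illustrative assumptions.

```python
# Minimal batch job sketch with PySpark (the Spark Python API).
# The input path and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()

# Read a CSV dataset; Spark partitions the work across the cluster.
events = spark.read.csv("s3://example-bucket/events/", header=True, inferSchema=True)

# Aggregate in parallel across partitions; intermediate results stay
# in memory between stages, which is where Spark outpaces classic MapReduce.
daily = (
    events.groupBy("event_date")
          .agg(F.count("*").alias("event_count"))
          .orderBy("event_date")
)
daily.show()
spark.stop()
```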

Visit the following resources to learn more:

- [@official@Apache Spark](https://spark.apache.org/documentation.html)
- [@article@Spark By Examples](https://sparkbyexamples.com)
- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)

@@ -1 +1,8 @@

# APIs and Data Collection

Application Programming Interfaces, better known as APIs, play a fundamental role in the work of data engineers, particularly in the process of data collection. APIs are sets of protocols, routines, and tools that enable different software applications to communicate with each other. An API allows developers to interact with a service or platform through a defined set of rules and endpoints, enabling data exchange and functionality use without needing to understand the underlying code. In data engineering, APIs are used extensively to collect, exchange, and manipulate data from different sources in a secure and efficient manner.
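
As an illustration of API-based data collection, this sketch pulls JSON from a paginated REST endpoint with the `requests` package; the endpoint, auth header, and pagination scheme are hypothetical.

```python
# Sketch of collecting data from a paginated REST API with requests.
# The endpoint, auth header, and page parameters are hypothetical.
import requests

BASE_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}   # most APIs require auth

def fetch_all_orders():
    """Collect every page of results into a single list."""
    records, page = [], 1
    while True:
        resp = requests.get(BASE_URL, headers=HEADERS, params={"page": page}, timeout=10)
        resp.raise_for_status()  # fail loudly on HTTP errors
        batch = resp.json()
        if not batch:            # an empty page signals the end
            break
        records.extend(batch)
        page += 1
    return records
```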

Visit the following resources to learn more:

- [@article@What is an API?](https://aws.amazon.com/what-is/api/)
- [@article@A Beginner's Guide to APIs](https://www.postman.com/what-is-an-api/)

@@ -1 +1,10 @@

# ArgoCD

Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology. It is used to automate the deployment and management of cloud-native applications by continuously synchronizing the desired application state with the actual application state in the production environment. In an Argo CD workflow, changes to the application are made by committing code or configuration changes to a Git repository. Argo CD monitors the repository and automatically deploys the changes to the production environment using a continuous delivery pipeline. The pipeline is triggered by changes to the Git repository and is responsible for building, testing, and deploying the changes to the production environment. Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes, and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications.
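
The core GitOps loop Argo CD automates can be summarized as: read the desired state from Git, observe the live state in the cluster, and sync the difference. The toy sketch below illustrates only that idea; the dictionaries and function are invented stand-ins, not a real Argo CD or Kubernetes API.

```python
# Toy illustration of the GitOps reconciliation idea behind Argo CD.
# These dictionaries and the function are invented stand-ins.

# Desired state, as declared by manifests committed to Git.
git_state = {"web-deployment": "v2", "web-service": "v1"}
# Live state, as currently running in the cluster.
cluster_state = {"web-deployment": "v1", "web-service": "v1"}

def reconcile(desired: dict, live: dict) -> None:
    """Sync the cluster toward the Git state, reporting each change."""
    for resource, version in desired.items():
        if live.get(resource) != version:
            print(f"syncing {resource}: {live.get(resource)} -> {version}")
            live[resource] = version  # in reality: apply manifests to the cluster

reconcile(git_state, cluster_state)  # prints: syncing web-deployment: v1 -> v2
```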

Visit the following resources to learn more:

- [@official@Argo CD - Argo Project](https://argo-cd.readthedocs.io/en/stable/)
- [@video@ArgoCD Tutorial for Beginners](https://www.youtube.com/watch?v=MeU5_k9ssrs)
- [@video@What is ArgoCD](https://www.youtube.com/watch?v=p-kAqxuJNik)
- [@feed@Explore top posts about ArgoCD](https://app.daily.dev/tags/argocd?ref=roadmapsh)

@@ -1 +1,10 @@

# Async vs Sync Communication

Synchronous and asynchronous data refer to different approaches in data transmission and processing. **Synchronous** ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, **asynchronous** ingestion is a process where data is ingested without waiting for a response from the data source. Normally, data is queued in a buffer and sent in batches for efficiency.

Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.
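
A compact sketch of the two styles using only Python's standard library; the downstream `process_record` target and the batch size are illustrative.

```python
# Contrast of synchronous vs. asynchronous (buffered) ingestion.
# process_record stands in for the downstream system; batch size is illustrative.
import queue

def process_record(record):
    """Stand-in for sending one record downstream and awaiting a response."""
    return f"stored:{record}"

def ingest_sync(records):
    """Synchronous: wait for each record's response before continuing."""
    return [process_record(r) for r in records]

def ingest_async(records, batch_size=3):
    """Asynchronous: buffer records, then flush them downstream in batches."""
    buffer = queue.Queue()
    for r in records:
        buffer.put(r)  # enqueue without waiting on the downstream system
    results, batch = [], []
    while not buffer.empty():
        batch.append(buffer.get())
        if len(batch) == batch_size or buffer.empty():
            results.extend(process_record(r) for r in batch)  # flush the batch
            batch = []
    return results

print(ingest_sync(["a", "b"]))             # ['stored:a', 'stored:b']
print(ingest_async(["a", "b", "c", "d"]))  # flushed as one batch of 3, one of 1
```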

Visit the following resources to learn more:

- [@article@Synchronous And Asynchronous Data Transmission: The Differences And How to Use Them](https://www.computer.org/publications/tech-news/trends/synchronous-asynchronous-data-transmission)
- [@article@Synchronous vs Asynchronous Communication: What’s the Difference?](https://www.getguru.com/reference/synchronous-vs-asynchronous-communication)

@@ -1 +1,9 @@

# Aurora DB

Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. Aurora includes a high-performance storage subsystem. Its MySQL- and PostgreSQL-compatible database engines are customized to take advantage of that fast distributed storage. The underlying storage grows automatically as needed. Aurora also automates and standardizes database clustering and replication, which are typically among the most challenging aspects of database configuration and administration.
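
Because Aurora is wire-compatible with PostgreSQL, a standard driver such as `psycopg2` connects to it unchanged; the cluster endpoint, database, and credentials below are hypothetical.

```python
# Because Aurora speaks the PostgreSQL wire protocol, a standard driver
# works unchanged. Endpoint, database, and credentials are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com",  # writer endpoint
    port=5432,
    dbname="appdb",
    user="admin",
    password="<password>",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")  # same SQL as any PostgreSQL server
    print(cur.fetchone())
conn.close()
```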

Visit the following resources to learn more:

- [@official@Amazon Aurora](https://aws.amazon.com/rds/aurora/)
- [@article@Amazon Aurora: What It Is, How It Works, and How to Get Started](https://www.datacamp.com/tutorial/amazon-aurora)

@@ -1 +1,8 @@

# Authentication vs Authorization

Authentication and authorization are popular terms in modern computer systems that often confuse people. **Authentication** is the process of confirming the identity of a user or a device (i.e., an entity). During the authentication process, an entity usually relies on some proof to authenticate itself, i.e. an authentication factor. In contrast to authentication, **authorization** refers to the process of verifying what resources entities (users or devices) can access, or what actions they can perform, i.e., their access rights.
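
A toy sketch separating the two checks; the user store and permission table are invented for illustration.

```python
# Toy illustration of authentication (who are you?) versus
# authorization (what may you do?). The data is invented.
USERS = {"alice": "s3cret"}               # credential store
PERMISSIONS = {"alice": {"read_report"}}  # access rights

def authenticate(username: str, password: str) -> bool:
    """Verify identity using a proof (here, a password factor)."""
    return USERS.get(username) == password

def authorize(username: str, action: str) -> bool:
    """Verify the authenticated entity may perform the action."""
    return action in PERMISSIONS.get(username, set())

assert authenticate("alice", "s3cret")          # identity confirmed
assert authorize("alice", "read_report")        # access granted
assert not authorize("alice", "delete_report")  # access denied
```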

Visit the following resources to learn more:

- [@roadmap.sh@Basic Authentication](https://roadmap.sh/guides/basic-authentication)
- [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization)

@@ -1 +1,11 @@

# AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework used to provision cloud infrastructure resources in a safe, repeatable manner through AWS CloudFormation. AWS CDK offers the flexibility to write infrastructure as code in popular languages like TypeScript, Python, Java, Go, and C#.
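
A minimal CDK v2 app in Python that defines a single S3 bucket; the stack and construct names are illustrative.

```python
# Minimal AWS CDK (v2) app in Python: one stack with one S3 bucket.
# Stack and construct IDs are illustrative.
import aws_cdk as cdk
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Declarative resource: CDK synthesizes this into CloudFormation.
        s3.Bucket(self, "RawDataBucket", versioned=True)

app = cdk.App()
DataLakeStack(app, "DataLakeStack")
app.synth()  # emits a CloudFormation template into cdk.out/
```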

Visit the following resources to learn more:

- [@official@AWS CDK](https://aws.amazon.com/cdk/)
- [@official@AWS CDK Documentation](https://docs.aws.amazon.com/cdk/index.html)
- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8)
- [@opensource@AWS CDK Examples](https://github.com/aws-samples/aws-cdk-examples)
- [@feed@Explore top posts about AWS](https://app.daily.dev/tags/aws?ref=roadmapsh)

@@ -1 +1,8 @@

# EKS

Amazon Elastic Kubernetes Service (EKS) is a managed service that simplifies the deployment, management, and scaling of containerized applications using Kubernetes, an open-source container orchestration platform. EKS manages the Kubernetes control plane for the user, making it easy to run Kubernetes applications without the operational overhead of maintaining the control plane. With EKS, you can leverage AWS services such as Auto Scaling Groups, Elastic Load Balancer, and Route 53 for resilient and scalable application infrastructure. Additionally, EKS supports both Spot and On-Demand instances, and integrates with AWS App Mesh and AWS Fargate for serverless compute.
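
Since EKS exposes a standard AWS API, cluster metadata can be inspected with `boto3`; the region is an illustrative assumption.

```python
# Inspecting EKS clusters through the standard AWS SDK (boto3).
# The region is an illustrative assumption.
import boto3

eks = boto3.client("eks", region_name="us-east-1")

# List the clusters visible to these credentials.
for name in eks.list_clusters()["clusters"]:
    # Fetch the control-plane details that AWS manages on your behalf.
    cluster = eks.describe_cluster(name=name)["cluster"]
    print(name, cluster["version"], cluster["status"])
```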

Visit the following resources to learn more:

- [@official@Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)
- [@official@Concepts of Amazon EKS](https://docs.aws.amazon.com/eks/)

@@ -1 +1,10 @@

# AWS SNS

Amazon Simple Notification Service (Amazon SNS) is a web service that makes it easy to set up, operate, and send notifications from the cloud. It provides developers with a highly scalable, flexible, and cost-effective capability to publish messages from an application and immediately deliver them to subscribers or other applications. It is designed to make web-scale computing easier for developers. Amazon SNS follows the “publish-subscribe” (pub-sub) messaging paradigm, with notifications being delivered to clients using a “push” mechanism that eliminates the need to periodically check or “poll” for new information and updates. With simple APIs requiring minimal up-front development effort, no maintenance or management overhead, and pay-as-you-go pricing, Amazon SNS gives developers an easy mechanism to incorporate a powerful notification system with their applications.
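
A short `boto3` sketch of the pub-sub flow described above; the region, topic name, and subscriber endpoint are hypothetical.

```python
# Pub-sub with SNS via boto3: create a topic, subscribe, publish.
# Region, topic name, and subscriber endpoint are hypothetical.
import boto3

sns = boto3.client("sns", region_name="us-east-1")

# The topic is the named channel that publishers write to.
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

# Subscribers receive pushed messages; no polling required.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="ops@example.com")

# Publishing fans the message out to every confirmed subscriber.
sns.publish(TopicArn=topic_arn, Subject="Order placed", Message="order 1234 created")
```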

Visit the following resources to learn more:

- [@official@Amazon Simple Notification Service (SNS)](http://aws.amazon.com/sns/)
- [@official@Send Fanout Event Notifications](https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/)
- [@article@What is Pub/Sub Messaging?](https://aws.amazon.com/what-is/pub-sub-messaging/)

@@ -1,6 +1,6 @@

# ArgoCD

Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology. It is used to automate the deployment and management of cloud-native applications by continuously synchronizing the desired application state with the actual application state in the production environment. In an Argo CD workflow, changes to the application are made by committing code or configuration changes to a Git repository. Argo CD monitors the repository and automatically deploys the changes to the production environment using a continuous delivery pipeline. The pipeline is triggered by changes to the Git repository and is responsible for building, testing, and deploying the changes to the production environment. Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes, and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications.

Visit the following resources to learn more:

@@ -1,3 +1,9 @@

# Apache Kafka

Apache Kafka is an open-source stream-processing platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java, operates as a distributed message queue, and is designed to handle real-time data feeds. Kafka acts as a message broker between data producers and consumers, facilitating efficient transmission of data; it can be viewed as a durable message broker in which applications can process and reprocess streamed data. Kafka is a highly scalable, fault-tolerant system that ensures data delivery without loss.

Visit the following resources to learn more:

- [@official@Apache Kafka Quickstart](https://kafka.apache.org/quickstart)
- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4)
- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh)

@@ -1,3 +1,9 @@

# Apache Spark

Apache Spark is an open-source, distributed computing system used for big data processing and analytics. It offers an interface for programming entire clusters with implicit data parallelism and fault tolerance. With its high-level APIs in Java, Scala, Python, and R, it provides a framework for distributed task dispatching, scheduling, and basic I/O functionalities. Notable modules include SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing. Apache Spark can run standalone, on Hadoop, or in the cloud, and is capable of accessing diverse data sources such as HDFS, Apache Cassandra, Apache HBase, and Amazon S3.

Visit the following resources to learn more:

- [@official@Apache Spark](https://spark.apache.org/documentation.html)
- [@article@Spark By Examples](https://sparkbyexamples.com)
- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)