From 4c7daa6a5bbb611c846d70449a96ccc74491c8e4 Mon Sep 17 00:00:00 2001 From: Javi Canales Date: Thu, 14 Aug 2025 16:55:13 +0200 Subject: [PATCH] add content to DE roadmap and fix some typos in content appearing in several roadmaps --- .../content/apache-kafka@fTpx6m8U0506ZLCdDU5OG.md | 13 ++++++++++++- .../content/apache-spark@qHMtJFYcGmESiz_VwRwiI.md | 10 +++++++++- .../content/apis@cxTriSZvrmXP4axKynIZW.md | 9 ++++++++- .../content/argocd@PUzHbjwntTSj1REL_dAov.md | 11 ++++++++++- ...c-vs-sync-communication@VefHaP7rIOcZVFzglyn66.md | 11 ++++++++++- .../content/aurora-db@YZ4G1-6VJ7VdsphdcBTf9.md | 10 +++++++++- ...cation-vs-authorization@HDVhttLNMLmIAVEOBCOQ3.md | 9 ++++++++- .../content/aws-cdk@OKJ3HTfreitk2JdrfeLIK.md | 12 +++++++++++- .../content/aws-eks@eVqcYI2Sy2Dldl3SfxB2C.md | 9 ++++++++- .../content/aws-sns@uFeiTRobSymkvCinhwmZV.md | 11 ++++++++++- .../devops/content/argocd@i-DLwNXdCUUug6lfjkPSy.md | 2 +- .../content/apache-kafka@gL7hubTh3qiMyUWeAZNwI.md | 8 +++++++- .../content/apache-spark@yrWiWJMSyTWxDakJbqacu.md | 8 +++++++- 13 files changed, 110 insertions(+), 13 deletions(-) diff --git a/src/data/roadmaps/data-engineer/content/apache-kafka@fTpx6m8U0506ZLCdDU5OG.md b/src/data/roadmaps/data-engineer/content/apache-kafka@fTpx6m8U0506ZLCdDU5OG.md index a7aa4172a..9e9c5bf6a 100644 --- a/src/data/roadmaps/data-engineer/content/apache-kafka@fTpx6m8U0506ZLCdDU5OG.md +++ b/src/data/roadmaps/data-engineer/content/apache-kafka@fTpx6m8U0506ZLCdDU5OG.md @@ -1 +1,12 @@ -# Apache Kafka \ No newline at end of file +# Apache Kafka + +Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java and operates based on a message queue, designed to handle real-time data feeds. Kafka functions as a kind of message broker service in between the data producers and the consumers, facilitating efficient transmission of data. 
It can be viewed as a durable message broker where applications can process and reprocess streamed data. Kafka is a highly scalable and fault-tolerant system which ensures data delivery without loss. + +Visit the following resources to learn more: + +- [@official@Apache Kafka Quickstart](https://kafka.apache.org/quickstart) +- [@official@Kafka Streams Confluent](https://docs.confluent.io/platform/current/streams/concepts.html) +- [@official@Apache Kafka Streams](https://kafka.apache.org/documentation/streams/) +- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4) +- [@video@Kafka in 100 Seconds](https://www.youtube.com/watch?v=uvb00oaa3k8) +- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/apache-spark@qHMtJFYcGmESiz_VwRwiI.md b/src/data/roadmaps/data-engineer/content/apache-spark@qHMtJFYcGmESiz_VwRwiI.md index 49d598a03..7b1111d54 100644 --- a/src/data/roadmaps/data-engineer/content/apache-spark@qHMtJFYcGmESiz_VwRwiI.md +++ b/src/data/roadmaps/data-engineer/content/apache-spark@qHMtJFYcGmESiz_VwRwiI.md @@ -1 +1,9 @@ -# Apache Spark \ No newline at end of file +# Apache Spark + +Apache Spark is an open-source distributed computing system designed for big data processing and analytics. It offers a unified interface for programming entire clusters, enabling efficient handling of large-scale data with built-in support for data parallelism and fault tolerance. Spark excels in processing tasks like batch processing, real-time data streaming, machine learning, and graph processing. It’s known for its speed, ease of use, and ability to process data in-memory, significantly outperforming traditional MapReduce systems. Spark is widely used in big data ecosystems for its scalability and versatility across various data processing tasks.
+ +Visit the following resources to learn more: + +- [@official@Apache Spark](https://spark.apache.org/documentation.html) +- [@article@Spark By Examples](https://sparkbyexamples.com) +- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/apis@cxTriSZvrmXP4axKynIZW.md b/src/data/roadmaps/data-engineer/content/apis@cxTriSZvrmXP4axKynIZW.md index 51117e98c..b46207bb7 100644 --- a/src/data/roadmaps/data-engineer/content/apis@cxTriSZvrmXP4axKynIZW.md +++ b/src/data/roadmaps/data-engineer/content/apis@cxTriSZvrmXP4axKynIZW.md @@ -1 +1,8 @@ -# APIs \ No newline at end of file +# APIs and Data Collection + +Application Programming Interfaces, better known as APIs, play a fundamental role in the work of data engineers, particularly in the process of data collection. APIs are sets of protocols, routines, and tools that enable different software applications to communicate with each other. An API allows developers to interact with a service or platform through a defined set of rules and endpoints, enabling data exchange and functionality use without needing to understand the underlying code. In data engineering, APIs are used extensively to collect, exchange, and manipulate data from different sources in a secure and efficient manner.
+ +Visit the following resources to learn more: + +- [@article@What is an API?](https://aws.amazon.com/what-is/api/) +- [@article@A Beginner's Guide to APIs](https://www.postman.com/what-is-an-api/) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/argocd@PUzHbjwntTSj1REL_dAov.md b/src/data/roadmaps/data-engineer/content/argocd@PUzHbjwntTSj1REL_dAov.md index 4f5515083..1bef089ac 100644 --- a/src/data/roadmaps/data-engineer/content/argocd@PUzHbjwntTSj1REL_dAov.md +++ b/src/data/roadmaps/data-engineer/content/argocd@PUzHbjwntTSj1REL_dAov.md @@ -1 +1,10 @@ -# ArgoCD \ No newline at end of file +# ArgoCD + +Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology. It is used to automate the deployment and management of cloud-native applications by continuously synchronizing the desired application state with the actual application state in the production environment. In an Argo CD workflow, changes to the application are made by committing code or configuration changes to a Git repository. Argo CD monitors the repository and automatically deploys the changes to the production environment using a continuous delivery pipeline. The pipeline is triggered by changes to the Git repository and is responsible for building, testing, and deploying the changes to the production environment. Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications. 
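The Argo CD paragraph above describes a reconciliation loop: desired state in Git, actual state in the cluster, and a controller that closes the gap. A minimal Python sketch of that idea — all names here are hypothetical, and this is not Argo CD's real API:

```python
# Minimal sketch of the GitOps reconciliation idea behind Argo CD:
# the desired state lives in Git; a controller repeatedly compares it
# to the live state and derives the actions needed to converge.
# All names are illustrative -- this is not Argo CD's implementation.

def reconcile(desired: dict, live: dict) -> list:
    """Return the actions needed to make `live` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(f"create {name}")
        elif live[name] != spec:
            actions.append(f"update {name}")
    for name in live:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions

# Desired state as it would be committed to Git, vs. what the cluster runs.
desired_state = {"web": {"image": "web:v2", "replicas": 3}}
live_state = {"web": {"image": "web:v1", "replicas": 3}, "old-job": {}}

print(reconcile(desired_state, live_state))  # ['update web', 'delete old-job']
```

Argo CD runs this compare-and-sync cycle continuously, which is why every change flows through a Git commit and leaves an auditable trail.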
+ +Visit the following resources to learn more: + +- [@official@Argo CD - Argo Project](https://argo-cd.readthedocs.io/en/stable/) +- [@video@ArgoCD Tutorial for Beginners](https://www.youtube.com/watch?v=MeU5_k9ssrs) +- [@video@What is ArgoCD](https://www.youtube.com/watch?v=p-kAqxuJNik) +- [@feed@Explore top posts about ArgoCD](https://app.daily.dev/tags/argocd?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/async-vs-sync-communication@VefHaP7rIOcZVFzglyn66.md b/src/data/roadmaps/data-engineer/content/async-vs-sync-communication@VefHaP7rIOcZVFzglyn66.md index e56d853f6..69d7ed421 100644 --- a/src/data/roadmaps/data-engineer/content/async-vs-sync-communication@VefHaP7rIOcZVFzglyn66.md +++ b/src/data/roadmaps/data-engineer/content/async-vs-sync-communication@VefHaP7rIOcZVFzglyn66.md @@ -1 +1,10 @@ -# Async vs Sync Communication \ No newline at end of file +# Async vs Sync Communication + +Synchronous and asynchronous ingestion are two different approaches to data transmission and processing. **Synchronous** ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, **asynchronous** ingestion is a process where data is ingested without waiting for a response from the data source. Normally, data is queued in a buffer and sent in batches for efficiency. + +Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.
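The contrast described above — waiting per record versus buffering and flushing in batches — can be sketched in a few lines of Python. The `send` callable stands in for a network round trip; everything here is illustrative:

```python
# Sketch contrasting the two ingestion styles. Synchronous: each record
# waits for an acknowledgement before the next is sent. Asynchronous:
# records accumulate in a buffer and are flushed in batches. The "data
# source" is a plain list and `send` is a stand-in for a round trip.

def ingest_sync(records, send):
    acks = []
    for r in records:
        acks.append(send([r]))  # one round trip per record
    return acks

def ingest_async(records, send, batch_size=3):
    acks, buffer = [], []
    for r in records:
        buffer.append(r)
        if len(buffer) >= batch_size:  # flush only when the batch is full
            acks.append(send(buffer))
            buffer = []
    if buffer:                         # flush the remainder at the end
        acks.append(send(buffer))
    return acks

send = lambda batch: f"ack:{len(batch)}"
print(ingest_sync([1, 2, 3, 4], send))   # ['ack:1', 'ack:1', 'ack:1', 'ack:1']
print(ingest_async([1, 2, 3, 4], send))  # ['ack:3', 'ack:1']
```

Four round trips versus two: batching trades a little latency per record for much better throughput, which is the usual reason asynchronous ingestion is preferred for high-volume feeds.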
+ +Visit the following resources to learn more: + +- [@article@Synchronous And Asynchronous Data Transmission: The Differences And How to Use Them](https://www.computer.org/publications/tech-news/trends/synchronous-asynchronous-data-transmission) +- [@article@Synchronous vs Asynchronous Communication: What’s the Difference?](https://www.getguru.com/reference/synchronous-vs-asynchronous-communication) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/aurora-db@YZ4G1-6VJ7VdsphdcBTf9.md b/src/data/roadmaps/data-engineer/content/aurora-db@YZ4G1-6VJ7VdsphdcBTf9.md index 2ac9593c6..a3489a30e 100644 --- a/src/data/roadmaps/data-engineer/content/aurora-db@YZ4G1-6VJ7VdsphdcBTf9.md +++ b/src/data/roadmaps/data-engineer/content/aurora-db@YZ4G1-6VJ7VdsphdcBTf9.md @@ -1 +1,9 @@ -# Aurora DB \ No newline at end of file +# Aurora DB + +Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL. Aurora includes a high-performance storage subsystem. Its MySQL- and PostgreSQL-compatible database engines are customized to take advantage of that fast distributed storage. The underlying storage grows automatically as needed. Aurora also automates and standardizes database clustering and replication, which are typically among the most challenging aspects of database configuration and administration. 
+ +Visit the following resources to learn more: + +- [@official@Amazon Aurora](https://aws.amazon.com/rds/aurora/) +- [@article@Amazon Aurora: What It Is, How It Works, and How to Get Started](https://www.datacamp.com/tutorial/amazon-aurora) + diff --git a/src/data/roadmaps/data-engineer/content/authentication-vs-authorization@HDVhttLNMLmIAVEOBCOQ3.md b/src/data/roadmaps/data-engineer/content/authentication-vs-authorization@HDVhttLNMLmIAVEOBCOQ3.md index c07456ef3..aa82f7ee2 100644 --- a/src/data/roadmaps/data-engineer/content/authentication-vs-authorization@HDVhttLNMLmIAVEOBCOQ3.md +++ b/src/data/roadmaps/data-engineer/content/authentication-vs-authorization@HDVhttLNMLmIAVEOBCOQ3.md @@ -1 +1,8 @@ -# Authentication vs Authorization \ No newline at end of file +# Authentication vs Authorization + +Authentication and authorization are commonly confused terms in modern computer systems. **Authentication** is the process of confirming the identity of a user or a device (i.e., an entity). During the authentication process, an entity usually relies on some proof to authenticate itself, i.e., an authentication factor. In contrast to authentication, **authorization** refers to the process of verifying what resources entities (users or devices) can access, or what actions they can perform, i.e., their access rights.
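The distinction can be made concrete with a toy sketch: authentication checks who you are (here, a password lookup), authorization checks what you may do (here, a role-to-permission lookup). All of the data below is made up for illustration:

```python
# Toy sketch: authentication proves identity, authorization checks
# access rights. A real system would hash passwords and use a proper
# identity provider; the stores here are illustrative dictionaries.

USERS = {"alice": "s3cret"}                        # credential store
ROLES = {"alice": "analyst"}                       # identity -> role
PERMISSIONS = {"analyst": {"read"}, "admin": {"read", "write"}}

def authenticate(user, password):
    """Authentication: is this entity who it claims to be?"""
    return USERS.get(user) == password

def authorize(user, action):
    """Authorization: is this entity allowed to perform the action?"""
    role = ROLES.get(user)
    return action in PERMISSIONS.get(role, set())

print(authenticate("alice", "s3cret"))  # True  -> identity confirmed
print(authorize("alice", "read"))       # True  -> permitted action
print(authorize("alice", "write"))      # False -> authenticated, but not authorized
```

The last line is the key point: an entity can pass authentication and still be denied by authorization, because the two checks answer different questions.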
+ +Visit the following resources to learn more: + +- [@roadmap.sh@Basic Authentication](https://roadmap.sh/guides/basic-authentication) +- [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/aws-cdk@OKJ3HTfreitk2JdrfeLIK.md b/src/data/roadmaps/data-engineer/content/aws-cdk@OKJ3HTfreitk2JdrfeLIK.md index ef7addebb..1a5980d01 100644 --- a/src/data/roadmaps/data-engineer/content/aws-cdk@OKJ3HTfreitk2JdrfeLIK.md +++ b/src/data/roadmaps/data-engineer/content/aws-cdk@OKJ3HTfreitk2JdrfeLIK.md @@ -1 +1,11 @@ -# AWS CDK \ No newline at end of file +# AWS CDK + +The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework used to provision cloud infrastructure resources in a safe, repeatable manner through AWS CloudFormation. AWS CDK offers the flexibility to write infrastructure as code in popular languages like Python, Java, Go, and C#. 
+ +Visit the following resources to learn more: + +- [@official@AWS CDK](https://aws.amazon.com/cdk/) +- [@official@AWS CDK Documentation](https://docs.aws.amazon.com/cdk/index.html) +- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8) +- [@opensource@AWS CDK Examples](https://github.com/aws-samples/aws-cdk-examples) +- [@feed@Explore top posts about AWS](https://app.daily.dev/tags/aws?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/aws-eks@eVqcYI2Sy2Dldl3SfxB2C.md b/src/data/roadmaps/data-engineer/content/aws-eks@eVqcYI2Sy2Dldl3SfxB2C.md index 67d7c31b0..b778e0208 100644 --- a/src/data/roadmaps/data-engineer/content/aws-eks@eVqcYI2Sy2Dldl3SfxB2C.md +++ b/src/data/roadmaps/data-engineer/content/aws-eks@eVqcYI2Sy2Dldl3SfxB2C.md @@ -1 +1,8 @@ -# AWS EKS \ No newline at end of file +# AWS EKS + +Amazon Elastic Kubernetes Service (EKS) is a managed service that simplifies the deployment, management, and scaling of containerized applications using Kubernetes, an open-source container orchestration platform. EKS manages the Kubernetes control plane for the user, making it easy to run Kubernetes applications without the operational overhead of maintaining the Kubernetes control plane. With EKS, you can leverage AWS services such as Auto Scaling Groups, Elastic Load Balancer, and Route 53 for resilient and scalable application infrastructure. Additionally, EKS supports the use of Spot and On-Demand Instances and integrates with AWS App Mesh and AWS Fargate for serverless compute.
+ +Visit the following resources to learn more: + +- [@official@Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/) +- [@official@Concepts of Amazon EKS](https://docs.aws.amazon.com/eks/) diff --git a/src/data/roadmaps/data-engineer/content/aws-sns@uFeiTRobSymkvCinhwmZV.md b/src/data/roadmaps/data-engineer/content/aws-sns@uFeiTRobSymkvCinhwmZV.md index e46eef018..fa470325c 100644 --- a/src/data/roadmaps/data-engineer/content/aws-sns@uFeiTRobSymkvCinhwmZV.md +++ b/src/data/roadmaps/data-engineer/content/aws-sns@uFeiTRobSymkvCinhwmZV.md @@ -1 +1,10 @@ -# AWS SNS \ No newline at end of file +# AWS SNS + +Amazon Simple Notification Service (Amazon SNS) is a web service that makes it easy to set up, operate, and send notifications from the cloud. It provides developers with a highly scalable, flexible, and cost-effective capability to publish messages from an application and immediately deliver them to subscribers or other applications. It is designed to make web-scale computing easier for developers. Amazon SNS follows the “publish-subscribe” (pub-sub) messaging paradigm, with notifications being delivered to clients using a “push” mechanism that eliminates the need to periodically check or “poll” for new information and updates. With simple APIs requiring minimal up-front development effort, no maintenance or management overhead, and pay-as-you-go pricing, Amazon SNS gives developers an easy mechanism to incorporate a powerful notification system into their applications.
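The pub-sub "push" model described above can be illustrated with a tiny in-memory topic: a publisher sends one message, and the topic pushes it to every subscriber, so none of them polls. This sketches the paradigm only — real SNS is used through the AWS SDK, not code like this:

```python
# Minimal in-memory sketch of the publish-subscribe pattern that SNS
# implements: one publish fans out to every subscriber via a push
# (callback), with no polling. Illustrative only -- not the SNS API.

class Topic:
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        """Register an endpoint to be pushed to on every publish."""
        self.subscribers.append(callback)

    def publish(self, message):
        for deliver in self.subscribers:  # push to all subscribers (fanout)
            deliver(message)

inbox_a, inbox_b = [], []
orders = Topic()
orders.subscribe(inbox_a.append)  # e.g. an SQS queue subscription
orders.subscribe(inbox_b.append)  # e.g. an email or Lambda endpoint
orders.publish("order-123 created")

print(inbox_a, inbox_b)  # ['order-123 created'] ['order-123 created']
```

This fanout — one publish, many deliveries — is exactly the pattern the "Send Fanout Event Notifications" tutorial below builds with SNS and SQS.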
+ +Visit the following resources to learn more: + +- [@official@Amazon Simple Notification Service (SNS)](https://aws.amazon.com/sns/) +- [@official@Send Fanout Event Notifications](https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/) +- [@article@What is Pub/Sub Messaging?](https://aws.amazon.com/what-is/pub-sub-messaging/) + diff --git a/src/data/roadmaps/devops/content/argocd@i-DLwNXdCUUug6lfjkPSy.md b/src/data/roadmaps/devops/content/argocd@i-DLwNXdCUUug6lfjkPSy.md index 5c5c3ff0a..1bef089ac 100644 --- a/src/data/roadmaps/devops/content/argocd@i-DLwNXdCUUug6lfjkPSy.md +++ b/src/data/roadmaps/devops/content/argocd@i-DLwNXdCUUug6lfjkPSy.md @@ -1,6 +1,6 @@ # ArgoCD -Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology. It is used to automate the deployment and management of cloud-native applications by continuously synchronizing the desired application state with the actual application state in the production environment. In an Argo CD workflow, changes to the application are made by committing code or configuration changes to a Git repository. Argo CD monitors the repository and automatically deploys the changes to the production environment using a continuous delivery pipeline. The pipeline is triggered by changes to the Git repository and is responsible for building, testing, and deploying the changes to the production environment.Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications. +Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology.
It is used to automate the deployment and management of cloud-native applications by continuously synchronizing the desired application state with the actual application state in the production environment. In an Argo CD workflow, changes to the application are made by committing code or configuration changes to a Git repository. Argo CD monitors the repository and automatically deploys the changes to the production environment using a continuous delivery pipeline. The pipeline is triggered by changes to the Git repository and is responsible for building, testing, and deploying the changes to the production environment. Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications. Visit the following resources to learn more: diff --git a/src/data/roadmaps/server-side-game-developer/content/apache-kafka@gL7hubTh3qiMyUWeAZNwI.md b/src/data/roadmaps/server-side-game-developer/content/apache-kafka@gL7hubTh3qiMyUWeAZNwI.md index 2a32c39a6..da917835c 100644 --- a/src/data/roadmaps/server-side-game-developer/content/apache-kafka@gL7hubTh3qiMyUWeAZNwI.md +++ b/src/data/roadmaps/server-side-game-developer/content/apache-kafka@gL7hubTh3qiMyUWeAZNwI.md @@ -1,3 +1,9 @@ # Apache Kafka -Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java and operates based on a message queue, designed to handle real-time data feeds. Kafka functions as a kind of message broker service in between the data producers and the consumers, facilitating efficient transmission of data. 
It can be viewed as a durable message broker where applications can process and reprocess streamed data. Kafka is a highly scalable and fault-tolerant system which ensures data delivery without loss. \ No newline at end of file +Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to the Apache Software Foundation. It is written in Scala and Java and operates based on a message queue, designed to handle real-time data feeds. Kafka functions as a kind of message broker service in between the data producers and the consumers, facilitating efficient transmission of data. It can be viewed as a durable message broker where applications can process and reprocess streamed data. Kafka is a highly scalable and fault-tolerant system which ensures data delivery without loss. + +Visit the following resources to learn more: + +- [@official@Apache Kafka Quickstart](https://kafka.apache.org/quickstart) +- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4) +- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh) diff --git a/src/data/roadmaps/server-side-game-developer/content/apache-spark@yrWiWJMSyTWxDakJbqacu.md b/src/data/roadmaps/server-side-game-developer/content/apache-spark@yrWiWJMSyTWxDakJbqacu.md index bc5df21b8..6d84feee3 100644 --- a/src/data/roadmaps/server-side-game-developer/content/apache-spark@yrWiWJMSyTWxDakJbqacu.md +++ b/src/data/roadmaps/server-side-game-developer/content/apache-spark@yrWiWJMSyTWxDakJbqacu.md @@ -1,3 +1,9 @@ # Apache Spark -Apache Spark is an open-source, distributed computing system used for big data processing and analytics. It offers an interface for programming entire clusters with impeccable data parallelism and fault tolerance. With its high-level APIs in Java, Scala, Python and R, it provides a framework for distributed task dispatching, scheduling and basic I/O functionalities. 
Notable modules include SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing. Apache Spark can run standalone, on Hadoop, or in the cloud, and is capable of accessing diverse data sources such as HDFS, Apache Cassandra, Apache HBase, and Amazon S3. \ No newline at end of file +Apache Spark is an open-source, distributed computing system used for big data processing and analytics. It offers an interface for programming entire clusters with implicit data parallelism and fault tolerance. With its high-level APIs in Java, Scala, Python, and R, it provides a framework for distributed task dispatching, scheduling, and basic I/O functionalities. Notable modules include SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for incremental computation and stream processing. Apache Spark can run standalone, on Hadoop, or in the cloud, and is capable of accessing diverse data sources such as HDFS, Apache Cassandra, Apache HBase, and Amazon S3. + +Visit the following resources to learn more: + +- [@official@Apache Spark](https://spark.apache.org/documentation.html) +- [@article@Spark By Examples](https://sparkbyexamples.com) +- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)