
chore: sync content to repo

kamranahmedse
2025-09-03 11:42:00 +00:00
committed by Kamran Ahmed
parent dd12cf1c99
commit ba1e5a58b5
166 changed files with 311 additions and 363 deletions


@@ -5,4 +5,4 @@ Amazon Elastic Compute Cloud (EC2) is a web service that provides secure, resiza
Visit the following resources to learn more:
- [@official@EC2 - User Guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/concepts.html)
- [@video@Introduction to Amazon EC2](https://www.youtube.com/watch?v=eaicwmnSdCs)

@@ -4,4 +4,4 @@ Amazon RDS (Relational Database Service) is a web service from Amazon Web Servic
Visit the following resources to learn more:
- [@official@Amazon RDS](https://aws.amazon.com/rds/)

@@ -5,8 +5,8 @@ Apache Kafka is an open-source stream-processing software platform developed by
Visit the following resources to learn more:
- [@official@Apache Kafka](https://kafka.apache.org/quickstart)
- [@article@Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [@article@Kafka Streams Confluent](https://docs.confluent.io/platform/current/streams/concepts.html)
- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4)
- [@video@Kafka in 100 Seconds](https://www.youtube.com/watch?v=uvb00oaa3k8)
- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh)

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@Apache Spark](https://spark.apache.org/documentation.html)
- [@article@Spark By Examples](https://sparkbyexamples.com)
- [@feed@Explore top posts about Apache Spark](https://app.daily.dev/tags/spark?ref=roadmapsh)

@@ -1,4 +1,4 @@
# APIs and Data Collection
Application Programming Interfaces, better known as APIs, play a fundamental role in the work of data engineers, particularly in the process of data collection. APIs are sets of protocols, routines, and tools that enable different software applications to communicate with each other. An API allows developers to interact with a service or platform through a defined set of rules and endpoints, enabling data exchange and functionality use without needing to understand the underlying code. In data engineering, APIs are used extensively to collect, exchange, and manipulate data from different sources in a secure and efficient manner.
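In practice, much API-based data collection comes down to authenticated HTTP requests plus pagination handling. Below is a minimal sketch using Python's `requests` library; the endpoint URL and the `page`/`per_page` parameters are hypothetical, so adapt them to whichever API you are actually collecting from.

```python
import requests

def fetch_all_records(base_url: str, page_size: int = 100) -> list[dict]:
    """Collect every record from a paginated JSON API (illustrative endpoint)."""
    records, page = [], 1
    while True:
        resp = requests.get(
            base_url,
            params={"page": page, "per_page": page_size},
            timeout=10,
        )
        resp.raise_for_status()  # surface HTTP errors early
        batch = resp.json()
        if not batch:            # an empty page signals the end
            break
        records.extend(batch)
        page += 1
    return records

# Hypothetical endpoint; replace with a real API you have access to.
# data = fetch_all_records("https://api.example.com/v1/events")
```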


@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@official@Argo CD - Argo Project](https://argo-cd.readthedocs.io/en/stable/)
- [@video@ArgoCD Tutorial for Beginners](https://www.youtube.com/watch?v=MeU5_k9ssrs)
- [@video@What is ArgoCD](https://www.youtube.com/watch?v=p-kAqxuJNik)
- [@feed@Explore top posts about ArgoCD](https://app.daily.dev/tags/argocd?ref=roadmapsh)

@@ -5,5 +5,4 @@ Amazon Aurora (Aurora) is a fully managed relational database engine that's comp
Visit the following resources to learn more:
- [@official@Amazon Aurora](https://aws.amazon.com/rds/aurora/)
- [@article@Amazon Aurora: What It Is, How It Works, and How to Get Started](https://www.datacamp.com/tutorial/amazon-aurora)


@@ -4,5 +4,5 @@ Authentication and authorization are popular terms in modern computer systems th
Visit the following resources to learn more:
- [@article@Basic Authentication](https://roadmap.sh/guides/basic-authentication)
- [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization)


@@ -4,8 +4,8 @@ The AWS Cloud Development Kit (AWS CDK) is an open-source software development f
Visit the following resources to learn more:
- [@official@AWS CDK](https://aws.amazon.com/cdk/)
- [@official@AWS CDK Documentation](https://docs.aws.amazon.com/cdk/index.html)
- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8)
- [@opensource@AWS CDK Examples](https://github.com/aws-samples/aws-cdk-examples)
- [@feed@Explore top posts about AWS](https://app.daily.dev/tags/aws?ref=roadmapsh)

@@ -5,4 +5,4 @@ Amazon Elastic Kubernetes Service (EKS) is a managed service that simplifies the
Visit the following resources to learn more:
- [@official@Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/)
- [@official@Concepts of Amazon EKS](https://docs.aws.amazon.com/eks/)

@@ -6,5 +6,4 @@ Visit the following resources to learn more:
- [@official@Amazon Simple Notification Service (SNS)](http://aws.amazon.com/sns/)
- [@official@Send Fanout Event Notifications](https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/)
- [@article@What is Pub/Sub Messaging?](https://aws.amazon.com/what-is/pub-sub-messaging/)

@@ -6,5 +6,4 @@ Visit the following resources to learn more:
- [@official@Amazon Simple Queue Service](https://aws.amazon.com/sqs/)
- [@official@What is Amazon Simple Queue Service?](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html)
- [@article@Amazon Simple Queue Service (SQS): A Comprehensive Tutorial](https://www.datacamp.com/tutorial/amazon-sqs)

@@ -1,9 +1,9 @@
# Azure Blob Storage
Azure Blob Storage is Microsoft's object storage solution for the cloud. “Blob” stands for Binary Large Object, a term used to describe storage for unstructured data like text, images, and video. Azure Blob Storage is Microsoft Azure's solution for storing these blobs in the cloud. It offers flexible storage—you only pay based on your usage. Depending on the access speed you need for your data, you can choose from various storage tiers (hot, cool, and archive). Being cloud-based, it is scalable, secure, and easy to manage.
Visit the following resources to learn more:
- [@official@Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs)
- [@official@Introduction to Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction)
- [@video@A Beginners Guide to Azure Blob Storage](https://www.youtube.com/watch?v=ah1XqItWkuc&t=300s)

@@ -1,10 +1,10 @@
# Azure SQL Database
Azure SQL Database is a fully managed Platform as a Service (PaaS) offering. It abstracts the underlying infrastructure, enabling developers to focus on building and deploying applications without worrying about database maintenance tasks.
Visit the following resources to learn more:
- [@official@Azure SQL Database](https://azure.microsoft.com/en-us/products/azure-sql/database)
- [@official@What is Azure SQL Database?](https://learn.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview?view=azuresql)
- [@article@Azure SQL Database: Step-by-Step Setup and Management](https://www.datacamp.com/tutorial/azure-sql-database)
- [@video@Azure SQL for Beginners](https://www.youtube.com/playlist?list=PLlrxD0HtieHi5c9-i_Dnxw9vxBY-TqaeN)

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@Azure Virtual Machines](https://azure.microsoft.com/en-us/products/virtual-machines)
- [@official@Virtual Machines in Azure](https://learn.microsoft.com/en-us/azure/virtual-machines/overview)
- [@video@Virtual Machines in Azure | Beginner's Guide](https://www.youtube.com/watch?v=_abaWXoQFZU)


@@ -5,5 +5,4 @@ Batch processing is a method in which large volumes of collected data are proces
Visit the following resources to learn more:
- [@article@What is Batch Processing?](https://aws.amazon.com/what-is/batch-processing/)
- [@article@Batch And Streaming Demystified For Unification](https://towardsdatascience.com/batch-and-streaming-demystified-for-unification-dee0b48f921d/)

@@ -1,15 +1,15 @@
# Best Practices
1. **Ensure Reliability.** A robust messaging system must guarantee that messages aren't lost, even during node failures or network issues. This means using acknowledgments, replication across multiple brokers, and durable storage on disk. These measures ensure that producers and consumers can recover seamlessly without data loss when something goes wrong (see the producer sketch after this list).
2. **Design for Scalability.** Scalability should be baked in from the start. Partition topics strategically to distribute load across brokers and consumer groups, enabling horizontal scaling.
3. **Maintain Message Ordering.** For systems that depend on message sequence, ensure ordering within partitions and design producers to consistently route related messages to the same partition.
4. **Secure Communication.** Messaging queues often carry sensitive data, so encrypt messages both in transit and at rest. Implement authentication techniques to ensure only trusted clients can publish or consume, and enforce authorization rules to limit access to specific topics or operations.
5. **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems.
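To make items 1 and 3 concrete, here is a minimal producer-side sketch using the `kafka-python` client. The broker address and topic are placeholders, and the settings shown (`acks="all"`, retries, keyed messages) are one common way to apply these practices, not the only one.

```python
from kafka import KafkaProducer  # pip install kafka-python

# Illustrative settings only; the broker address and topic are placeholders.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    acks="all",   # reliability: wait for all in-sync replicas to acknowledge
    retries=5,    # retry transient failures instead of dropping messages
    key_serializer=lambda k: k.encode(),
    value_serializer=lambda v: v.encode(),
)

# Ordering: messages with the same key land in the same partition,
# so events for one order are consumed in the sequence they were sent.
for event in ("created", "paid", "shipped"):
    producer.send("orders", key="order-42", value=event)
producer.flush()
```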
Visit the following resources to learn more:
- [@article@Best Practices for Message Queue Architecture](https://abhishek-patel.medium.com/best-practices-for-message-queue-architecture-f69d47e3565)

@@ -1,6 +1,6 @@
# Big Data Tools
Big data tools are specialized software and platforms designed to handle the massive volume, velocity, and variety of data that traditional data processing tools cannot effectively manage. These tools provide the infrastructure, frameworks, and capabilities to process, analyze, and extract meaningful knowledge from vast datasets. They are essential for modern data-driven organizations seeking to gain insights, make informed decisions, and achieve a competitive advantage.
Hadoop and Spark are two of the most prominent frameworks in big data they handle the processing of large-scale data in very different ways. While Hadoop can be credited with democratizing the distributed computing paradigm through a robust storage system called HDFS and a computational model called MapReduce, Spark is changing the game with its in-memory architecture and flexible programming model.
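To see the MapReduce idea in miniature, here is a single-process word count in plain Python that mirrors the map and reduce phases; Hadoop and Spark apply the same pattern but distribute each phase across many machines.

```python
from collections import Counter
from itertools import chain

lines = ["big data tools", "big data frameworks", "spark and hadoop"]

# Map phase: emit (word, 1) pairs for every word in every line.
mapped = chain.from_iterable(((w, 1) for w in line.split()) for line in lines)

# Shuffle + reduce phase: group by key and sum the counts.
counts = Counter()
for word, n in mapped:
    counts[word] += n

print(counts.most_common(3))  # [('big', 2), ('data', 2), ('tools', 1)]
```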
@@ -8,5 +8,4 @@ Visit the following resources to learn more:
- [@article@What is Big Data?](https://cloud.google.com/learn/what-is-big-data?hl=en)
- [@article@Hadoop vs Spark: Which Big Data Framework Is Right For You?](https://www.datacamp.com/blog/hadoop-vs-spark)
- [@video@Introduction to Big Data with Spark and Hadoop](http://youtube.com/watch?v=vHlwg4ciCsI&t=80s&ab_channel=freeCodeAcademy)

@@ -5,4 +5,4 @@ Bigtable is a high-performance, scalable database that excels at capturing, proc
Visit the following resources to learn more:
- [@official@Bigtable: Fast, Flexible NoSQL](https://cloud.google.com/bigtable?hl=en#scale-your-latency-sensitive-applications-with-the-nosql-pioneer)
- [@article@Google Bigtable](https://www.techtarget.com/searchdatamanagement/definition/Google-BigTable)

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@article@What is business intelligence (BI)?](https://www.ibm.com/think/topics/business-intelligence)
- [@article@Business intelligence: A complete overview](https://www.tableau.com/business-intelligence/what-is-business-intelligence)
- [@video@What is business intelligence?](https://www.youtube.com/watch?v=l98-BcB3UIE)

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@article@What is CAP Theorem?](https://www.bmc.com/blogs/cap-theorem/)
- [@article@An Illustrated Proof of the CAP Theorem](https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/)
- [@article@CAP Theorem and its applications in NoSQL Databases](https://www.ibm.com/uk-en/cloud/learn/cap-theorem)
- [@video@What is CAP Theorem?](https://www.youtube.com/watch?v=_RbsFXWRZ10)

@@ -5,6 +5,6 @@ Apache Cassandra is a highly scalable, distributed NoSQL database designed to ha
Visit the following resources to learn more:
- [@official@Apache Cassandra](https://cassandra.apache.org/_/index.html)
- [@article@Cassandra - Quick Guide](https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm)
- [@video@Apache Cassandra - Course for Beginners](https://www.youtube.com/watch?v=J-cSy5MeMOA)
- [@feed@Explore top posts about Backend Development](https://app.daily.dev/tags/backend?ref=roadmapsh)

@@ -1,10 +1,10 @@
# Census
Census is a reverse ETL platform that synchronizes data from a data warehouse to various business applications and SaaS apps like Salesforce and Hubspot. It's a crucial part of the modern data stack, enabling businesses to operationalize their data by making it available in the tools where teams work, like CRMs, marketing platforms, and more.
Visit the following resources to learn more:
- [@official@Census](https://www.getcensus.com/reverse-etl)
- [@official@Census Documentation](https://developers.getcensus.com/getting-started/introduction)
- [@article@A starter guide to reverse ETL with Census](https://www.getcensus.com/blog/starter-guide-for-first-time-census-users)
- [@video@How to "Reverse ETL" with Census](https://www.youtube.com/watch?v=XkS7DQFHzbA)
- [@video@How to "Reverse ETL" with Census](https://www.youtube.com/watch?v=XkS7DQFHzbA)

View File

@@ -1,16 +1,16 @@
# Choosing the Right Technologies
The data engineering ecosystem is rapidly expanding, and selecting the right technologies for your use case can be challenging. Below you can find some considerations for choosing data technologies across the data engineering lifecycle:
- **Team size and capabilities.** Your team's size will determine the amount of bandwidth your team can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.
- **Interoperability.** When choosing a technology or system, you'll need to ensure that it interacts and operates smoothly with other technologies.
- **Cost optimization and business value.** Consider direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.
- **Location.** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.
- **Build versus buy.** Depending on your needs and capabilities, you can either invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.
- **Server versus serverless.** Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which shift server management to cloud providers, allowing developers to focus solely on writing code.
Visit the following resources to learn more:
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
- [@article@Build hybrid and multicloud architectures using Google Cloud](https://cloud.google.com/architecture/hybrid-multicloud-patterns)
- [@article@The Unfulfilled Promise of Serverless](https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless/)

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@article@What is CI/CD? Continuous Integration and Continuous Delivery](https://www.guru99.com/continuous-integration.html)
- [@article@Continuous Integration vs Delivery vs Deployment](https://www.guru99.com/continuous-integration-vs-delivery-vs-deployment.html)
- [@article@CI/CD Pipeline: Learn with Example](https://www.guru99.com/ci-cd-pipeline.html)

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@official@CircleCI](https://circleci.com/)
- [@official@CircleCI Documentation](https://circleci.com/docs)
- [@official@Configuration Tutorial](https://circleci.com/docs/config-intro)
- [@feed@Explore top posts about CI/CD](https://app.daily.dev/tags/cicd?ref=roadmapsh)

@@ -1,15 +1,15 @@
# Cloud Architectures
Cloud architecture refers to how various cloud technology components, such as hardware, virtual resources, software capabilities, and virtual network systems interact and connect to create cloud computing environments. Cloud architecture dictates how components are integrated so that you can pool, share, and scale resources over a network. It acts as a blueprint that defines the best way to strategically combine resources to build a cloud environment for a specific business need.
Cloud architecture components can include, among others:
- A frontend platform
- A backend platform
- A cloud-based delivery model
- A network (internet, intranet, or intercloud)
Visit the following resources to learn more:
- [@article@What is cloud architecture? - Google](https://cloud.google.com/learn/what-is-cloud-architecture)
- [@video@What is Cloud Architecture and Common Models?](https://www.youtube.com/watch?v=zTP-bx495hU)


@@ -2,8 +2,8 @@
**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing such as public clouds, private clouds, and hybrid clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over its data and infrastructure.
Visit the following resources to learn more:
- [@article@Cloud Computing - IBM](https://www.ibm.com/think/topics/cloud-computing)
- [@article@What is Cloud Computing? - Azure](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-cloud-computing)
- [@video@What is Cloud Computing? - Amazon Web Services](https://www.youtube.com/watch?v=mxT233EdY5c)

@@ -4,6 +4,6 @@ Google Cloud SQL is a fully-managed, cost-effective and scalable database servic
Visit the following resources to learn more:
- [@course@Cloud SQL](https://www.cloudskillsboost.google/course_templates/701)
- [@official@Cloud SQL](https://cloud.google.com/sql)
- [@official@Cloud SQL overview](https://cloud.google.com/sql/docs/introduction)


@@ -1,6 +1,3 @@
# Cluster Computing Basics
Cluster computing is the process of using multiple computing nodes, called clusters, to increase processing power for solving complex problems, such as Big Data analytics and AI model training. These tasks require parallel processing of millions of data points for complex classification and prediction tasks. Cluster computing technology coordinates multiple computing nodes, each with its own CPUs, GPUs, and internal memory, to work together on the same data processing task. Applications on cluster computing infrastructure run as if on a single machine and are unaware of the underlying system complexities.
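The divide-the-work, combine-the-results pattern behind cluster computing can be previewed on a single machine. The sketch below uses Python's `multiprocessing` as a loose, single-node analogy: each worker process stands in for a cluster node that processes its own slice of the data.

```python
from multiprocessing import Pool

def partial_sum(chunk: range) -> int:
    # Each worker handles its slice independently, like a cluster node.
    return sum(chunk)

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [range(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:
        total = sum(pool.map(partial_sum, chunks))  # combine partial results
    print(total == sum(range(n)))  # True
```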


@@ -1,5 +1,5 @@
# Cluster Management Tools
Cluster management software maximizes the work that a cluster of computers can perform. A cluster manager balances workload to reduce bottlenecks, monitors the health of the elements of the cluster, and manages failover when an element fails. A cluster manager can also help a system administrator to perform administration tasks on elements in the cluster.
Some of the most popular Cluster Management Tools are Kubernetes and Apache Hadoop YARN.


@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What are columnar databases? Here are 35 examples.](https://www.tinybird.co/blog-posts/what-is-a-columnar-database)
- [@article@Columnar Databases](https://www.techtarget.com/searchdatamanagement/definition/columnar-database)
- [@video@What is a Columnar Database? (vs. Row-oriented Database)](https://www.youtube.com/watch?v=1MnvuNg33pA)


@@ -1,11 +1,9 @@
# Compute Engine (Compute)
Compute Engine is a computing and hosting service that lets you create and run virtual machines on Google infrastructure. Compute Engine offers scale, performance, and value that lets you easily launch large compute clusters on Google's infrastructure. There are no upfront investments, and you can run thousands of virtual CPUs on a system that offers quick, consistent performance. You can configure and control Compute Engine resources using the Google Cloud console, the Google Cloud CLI, or using a REST-based API. You can also use a variety of programming languages to run Compute Engine, including Python, Go, and Java.
Visit the following resources to learn more:
- [@course@The Basics of Google Cloud Compute](https://www.cloudskillsboost.google/course_templates/754)
- [@official@Compute Engine overview](https://cloud.google.com/compute/docs/overview)
- [@video@Compute Engine in a minute](https://www.youtube.com/watch?v=IuK4gQeHRcI)


@@ -1,6 +1,6 @@
# Containers & Orchestration
**Containers** are lightweight, portable, and isolated environments that package applications and their dependencies, enabling consistent deployment across different computing environments. They encapsulate software code, runtime, system tools, libraries, and settings, ensuring that the application runs the same regardless of where it's deployed. Containers share the host operating system's kernel, making them more efficient than traditional virtual machines.
**Orchestration** refers to the automated coordination and management of complex IT systems. It involves combining multiple automated tasks and processes into a single workflow to achieve a specific goal. Orchestration is a key component of any software development process and should be preferred over error-prone manual configuration. As an automation practice, orchestration helps to remove the chance of human error from the different steps of the data engineering lifecycle. This is all to ensure efficient resource utilization and consistency.
@@ -8,7 +8,7 @@ Visit the following resources to learn more:
- [@article@What are Containers?](https://cloud.google.com/learn/what-are-containers)
- [@article@Containers - The New Stack](https://thenewstack.io/category/containers/)
- [@article@An Introduction to Data Orchestration: Process and Benefits](https://www.datacamp.com/blog/introduction-to-data-orchestration-process-and-benefits)
- [@article@What is Container Orchestration?](https://www.redhat.com/en/topics/containers/what-is-container-orchestration)
- [@video@What are Containers?](https://www.youtube.com/playlist?list=PLawsLZMfND4nz-WDBZIj8-nbzGFD4S9oz)
- [@video@Why You Need Data Orchestration](https://www.youtube.com/watch?v=ZtlS5-G-gng)

@@ -1,11 +1,10 @@
# CosmosDB
Azure Cosmos DB is a native No-SQL database service and vector database for working with the document data model. It can arbitrarily store native JSON documents with flexible schema. Data is indexed automatically and is available for query using a flavor of the SQL query language designed for JSON data. It also supports vector search. You can access the API using SDKs for popular frameworks such as .NET, Python, Java, and Node.js.
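As a rough illustration, the sketch below uses the Python `azure-cosmos` SDK (v4-style API); the endpoint, key, and database/container names are placeholders, so treat it as the shape of the workflow rather than a drop-in script.

```python
from azure.cosmos import CosmosClient  # pip install azure-cosmos

# Placeholders; use your own account endpoint and key.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("shop").get_container_client("orders")

# Store a schemaless JSON document; Cosmos DB indexes it automatically.
container.upsert_item({"id": "42", "customer": "ada", "total": 99.5})

# Query with the SQL-for-JSON dialect.
for item in container.query_items(
    query="SELECT c.id, c.total FROM c WHERE c.customer = @name",
    parameters=[{"name": "@name", "value": "ada"}],
    enable_cross_partition_query=True,
):
    print(item)
```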
Visit the following resources to learn more:
- [@official@Azure Cosmos DB](https://azure.microsoft.com/en-us/products/cosmos-db#FAQ)
- [@official@Azure Cosmos DB - Database for the AI Era](https://learn.microsoft.com/en-us/azure/cosmos-db/introduction)
- [@article@Azure Cosmos DB: A Global-Scale NoSQL Cloud Database](https://www.datacamp.com/tutorial/azure-cosmos-db)
- [@video@What is Azure Cosmos DB?](https://www.youtube.com/watch?v=hBY2YcaIOQM&)

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@CouchDB](https://couchdb.apache.org/)
- [@official@CouchDB Documentation](https://docs.couchdb.org/en/stable/intro/overview.html)
- [@article@What is CouchDB?](https://www.ibm.com/think/topics/couchdb)

@@ -2,16 +2,15 @@
Data Analytics involves extracting meaningful insights from raw data to drive decision-making processes. It includes a wide range of techniques and disciplines ranging from the simple data compilation to advanced algorithms and statistical analysis. Data analysts, as ambassadors of this domain, employ these techniques to answer various questions:
- Descriptive Analytics *(what happened in the past?)*
- Diagnostic Analytics *(why did it happen in the past?)*
- Predictive Analytics *(what will happen in the future?)*
- Prescriptive Analytics *(how can we make it happen?)*
Visit the following resources to learn more:
- [@article@The 4 Types of Data Analysis: Ultimate Guide](https://careerfoundry.com/en/blog/data-analytics/different-types-of-data-analysis/)
- [@article@What is Data Analysis? An Expert Guide With Examples](https://www.datacamp.com/blog/what-is-data-analysis-expert-guide)
- [@course@Introduction to Data Analytics](https://www.coursera.org/learn/introduction-to-data-analytics)
- [@video@Descriptive vs Diagnostic vs Predictive vs Prescriptive Analytics: What's the Difference?](https://www.youtube.com/watch?v=QoEpC7jUb9k)
- [@video@Types of Data Analytics](https://www.youtube.com/watch?v=lsZnSgxMwBA)

@@ -2,13 +2,12 @@
Before designing the technology architecture to collect and store data, you should consider the following factors:
- **Bounded versus unbounded.** Bounded data has defined start and end points, forming a finite, complete dataset, like the daily sales report. Unbounded data has no predefined limits in time or scope, flowing continuously and potentially indefinitely, such as user interaction events or real-time sensor data. The distinction is critical in data processing, where bounded data is suitable for batch processing, and unbounded data is processed in stream processing or real-time systems (a minimal illustration follows this list).
- **Frequency.** Collection processes can be batch, micro-batch, or real-time, depending on the frequency you need to store the data.
- **Synchronous versus asynchronous.** Synchronous ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, asynchronous ingestion is a process where data is ingested without waiting for a response from the data source. Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.
- **Throughput and scalability.** As data demands grow, you will need scalable ingestion solutions to keep pace. Scalable data ingestion pipelines ensure that systems can handle increasing data volumes without compromising performance. Without scalable ingestion, data pipelines face challenges like bottlenecks and data loss. Bottlenecks occur when components can't process data fast enough, leading to delays and reduced throughput. Data loss happens when systems are overwhelmed, causing valuable information to be discarded or corrupted.
- **Reliability and durability.** Data reliability in the ingestion phase means ensuring that the acquired data from various sources is accurate, consistent, and trustworthy as it enters the data pipeline. Durability entails making sure that data isn't lost or corrupted during the data collection process.
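Here is the bounded-versus-unbounded distinction in miniature: a batch job reads a finite file to completion, while a stream consumer loops over a source that never ends. The file path and the sensor generator are purely illustrative.

```python
from typing import Iterator
import itertools
import random
import time

def bounded_batch(path: str) -> list[str]:
    # Bounded: the file has a definite end, so the job can load it and finish.
    with open(path) as f:
        return [line.strip() for line in f]

def unbounded_stream() -> Iterator[float]:
    # Unbounded: a stand-in for a sensor feed that never terminates.
    while True:
        yield random.random()
        time.sleep(0.01)

# A stream processor never "finishes"; it handles records as they arrive.
# (islice is used here only so the demo stops; bounded_batch is not called.)
for reading in itertools.islice(unbounded_stream(), 5):
    print(f"processed reading: {reading:.3f}")
```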
Visit the following resources to learn more:
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)

@@ -4,13 +4,13 @@ The data engineering lifecycle encompasses the entire process of transforming ra
It involves 4 steps:
1. Data Generation: Collecting data from various source systems.
2. Data Storage: Safely storing data for future processing and analysis.
3. Data Ingestion: Transforming and bringing data into a centralized system.
4. Data Serving: Providing data to end-users for decision-making and operational purposes.
Visit the following resources to learn more:
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
- [@article@Data Engineering Lifecycle](https://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)



@@ -1,6 +1,6 @@
# Data Engineering vs Data Science
Data engineering and data science are distinct but complementary roles within the field of data. Data engineering focuses on building and maintaining the infrastructure for data collection, storage, and processing, essentially creating the systems that make data available for downstream users. On the other hand, data science professionals, like data analysts and data scientists, use that data to extract insights, build predictive models, and ultimately inform decision-making.
Visit the following resources to learn more:


@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What is a data fabric?](http://ibm.com/think/topics/data-fabric)
- [@article@Data Fabric defined](https://www.jamesserra.com/archive/2021/06/data-fabric-defined/)
- [@article@How Data Fabric Can Optimize Data Delivery](https://www.gartner.com/en/data-analytics/topics/data-fabric)

@@ -2,9 +2,9 @@
Data Factory, most commonly referring to Microsoft's Azure Data Factory, is a cloud-based data integration service that allows you to create, schedule, and orchestrate workflows to move and transform data from various sources into a centralized location for analysis. It provides tools for building Extract, Transform, and Load (ETL) pipelines, enabling businesses to prepare data for analytics, business intelligence, and other data-driven initiatives without extensive coding, thanks to its visual, code-free interface and native connectors.
Visit the following resources to learn more:
- [@official@What is Azure Data Factory?](https://learn.microsoft.com/en-us/azure/data-factory/introduction)
- [@official@Azure Data Factory Documentation](https://learn.microsoft.com/en-gb/azure/data-factory/)
- [@course@Microsoft Azure - Data Factory](https://www.coursera.org/learn/microsoft-azure---data-factory)


@@ -1,6 +1,6 @@
# Data Generation
Data generation refers to the different ways data is produced and generated. Thanks to progress in computing power and storage, as well as technology breakthroughs in sensor technology (for example, IoT devices), the number of these so-called source systems is rapidly growing. Data is created in many ways, both analog and digital.
**Analog data** refers to continuous, real-world information that is represented by a range of values. It can take on any value within a given range and is often used to describe physical quantities like temperature or sounds.
@@ -9,4 +9,4 @@ By contrast, **digital data** is either created by converting analog data to dig
Visit the following resources to learn more:
- [@article@The Concept of Data Generation](https://www.marktechpost.com/2023/02/27/the-concept-of-data-generation/)
- [@video@Analog vs. Digital](https://www.youtube.com/watch?v=zzvglgC5ut0)

@@ -1,10 +1,10 @@
# Data Hub
A **data hub** is an architecture that provides a central point for the flow of data between multiple sources and applications, enabling organizations to collect, integrate, and manage data efficiently. Unlike traditional data storage solutions, a data hub's purpose focuses on data integration and accessibility. The design supports real-time data exchange, which makes accessing, analyzing, and acting on the data faster and easier.
A data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data. A data hub differs from a data lake by homogenizing data and possibly serving data in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data such as de-duplication, quality, security, and a standardized set of query services.
Visit the following resources to learn more:
- [@article@Data hub](https://en.wikipedia.org/wiki/Data_hub)
- [@article@What is a Data Hub? Definition, 7 Key Benefits & Why You Might Need One](https://www.cdata.com/blog/what-is-a-data-hub)

@@ -5,4 +5,4 @@ Data ingestion is the third step in the data engineering lifecycle. It entails t
Visit the following resources to learn more:
- [@article@What is Data Ingestion?](https://www.ibm.com/think/topics/data-ingestion)
- [@article@Data Ingestion](https://www.qlik.com/us/data-ingestion)


@@ -1,8 +1,8 @@
# Data Interoperability
Data interoperability is the ability of diverse systems and applications to access, exchange, and cooperatively use data in a coordinated and meaningful way, even across organizational boundaries. It ensures that data can flow freely, maintaining its integrity and context, allowing for improved efficiency, collaboration, and decision-making by breaking down data silos. Achieving data interoperability often relies on data standards, metadata, and common data elements to define how data is collected, formatted, and interpreted.
Visit the following resources to learn more:
- [@article@Data Interoperability](https://www.sciencedirect.com/topics/computer-science/data-interoperability)
- [@article@What is Data Interoperability? Exploring the Process and Benefits](https://www.codelessplatforms.com/blog/what-is-data-interoperability/)

@@ -1,8 +1,8 @@
# Data lakes
**Data Lakes** are large-scale data repository systems that store raw, untransformed data, in various formats, from multiple sources. They're often used for big data and real-time analytics requirements. Data lakes preserve the original data format and schema which can be modified as necessary.
Visit the following resources to learn more:
- [@article@Data Lake Definition](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-a-data-lake)
- [@video@What is a Data Lake?](https://www.youtube.com/watch?v=LxcH6z8TFpI)

@@ -2,7 +2,7 @@
**Data Lineage** refers to the life-cycle of data, including its origins, movements, characteristics and quality. It's a critical component in Data Engineering for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and facilitating data debugging or tracing data related bugs. It provides a clear representation of data sources, transformations, and dependencies thereby aiding in audits, governance, or reproduction of machine learning models.
Visit the following resources to learn more:
- [@article@What is Data Lineage? - IBM](https://www.ibm.com/topics/data-lineage)
- [@article@What is Data Lineage? - Datacamp](https://www.datacamp.com/blog/data-lineage)

@@ -2,11 +2,8 @@
A data mart is a subset of a data warehouse, focused on a specific business function or department. A data mart is streamlined for quicker querying and a more straightforward setup, catering to the specialized needs of a particular team, or function. Data marts only hold data relevant to a specific department or business unit, enabling quicker access to specific datasets, and simpler management.
Visit the following resources to learn more:
- [@article@What is a Data Mart?](https://www.ibm.com/think/topics/data-mart)
- [@article@Data Mart vs Data Warehouse: a Detailed Comparison](https://www.datacamp.com/blog/data-mart-vs-data-warehouse)
- [@video@Data Lake VS Data Warehouse VS Data Marts](https://www.youtube.com/watch?v=w9-WoReNKHk)

@@ -1,9 +1,8 @@
# Data Masking
Data masking is a process that creates a copy of real data but replaces sensitive information with false but realistic-looking data, preserving the format and structure of the original data for non-production uses like software testing, training, and development. The goal is to protect confidential information and ensure compliance with data protection regulations by preventing unauthorized access to real sensitive data without compromising the usability of the data for other business functions.
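A tiny sketch of format-preserving masking, using only the Python standard library: the local part of an email is replaced with a deterministic token of the same length, so test systems keep realistic-looking data without the real identity. The hashing scheme here is illustrative, not a compliance-grade masking algorithm.

```python
import hashlib

def mask_email(email: str) -> str:
    """Replace a real address with a fake one that keeps the original shape."""
    local, _, _domain = email.partition("@")
    # Deterministic token the same length as the original local part.
    token = hashlib.sha256(local.encode()).hexdigest()[: len(local)]
    return f"{token}@example.com"  # same format, no real identity

print(mask_email("ada.lovelace@company.com"))  # e.g. '7a1f...@example.com'
```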
Visit the following resources to learn more:
- [@article@Data masking](https://en.wikipedia.org/wiki/Data_masking)
- [@article@What is data masking?](https://aws.amazon.com/what-is/data-masking/)

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What Is a Data Mesh? - AWS](https://aws.amazon.com/what-is/data-mesh)
- [@article@What Is a Data Mesh? - Datacamp](https://www.datacamp.com/blog/data-mesh)
- [@video@Data Mesh Architecture](https://www.datamesh-architecture.com/)

@@ -2,12 +2,12 @@
A data model is a specification of data structures and business rules. It creates a visual representation of data and illustrates how different data elements are related to each other. Different techniques are employed depending on the complexity of the data and the goals. Below you can find a list with the most common data modelling techniques:
- **Entity-relationship modeling.** It's one of the most common techniques used to represent data. It's based on three elements: entities (objects or things within the system), relationships (how these entities interact with each other), and attributes (properties of the entities).
- **Dimensional modeling.** Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions. This technique simplifies complex data by organizing it into a star or snowflake schema (see the sketch after the resources below).
- **Object-oriented modeling.** Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects. This technique is preferred for modeling applications with complex, interrelated data and behaviors.
- **NoSQL modeling.** NoSQL modeling techniques are designed for flexible, schema-less databases. These approaches are often used when data structures are less rigid or evolve over time.
Visit the following resources to learn more:
- [@article@7 data modeling techniques and concepts for business](https://www.techtarget.com/searchdatamanagement/tip/7-data-modeling-techniques-and-concepts-for-business)
- [@article@Data Modeling Explained: Techniques, Examples, and Best Practices](https://www.datacamp.com/blog/data-modeling)
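As a small illustration of dimensional modeling, the sketch below expresses a star schema with plain Python dataclasses; all table and field names are illustrative.

```python
# A star-schema sketch: a fact table holds measures plus foreign keys
# to dimension tables (names are illustrative).
from dataclasses import dataclass

@dataclass
class DimCustomer:          # dimension: descriptive attributes
    customer_key: int
    name: str
    segment: str

@dataclass
class DimDate:
    date_key: int           # e.g. 20250101
    year: int
    month: int

@dataclass
class FactSales:            # fact: measures plus keys into the dimensions
    customer_key: int
    date_key: int
    amount: float

customers = {1: DimCustomer(1, "Acme Ltd", "enterprise")}
dates = {20250101: DimDate(20250101, 2025, 1)}
sales = [FactSales(1, 20250101, 99.50), FactSales(1, 20250101, 20.00)]

# A query joins the fact to its dimensions through the keys
total = sum(s.amount for s in sales
            if customers[s.customer_key].segment == "enterprise")
print(total)  # 119.5
```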

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What is Normalization in DBMS (SQL)? 1NF, 2NF, 3NF, BCNF Database with Example](https://www.guru99.com/database-normalization.html)
- [@video@Complete guide to Database Normalization in SQL](https://www.youtube.com/watch?v=rBPQ5fg_kiY)
- [@feed@Explore top posts about Database](https://app.daily.dev/tags/database?ref=roadmapsh)

View File

@@ -1,4 +1,3 @@
# Data Obfuscation
Statistical data obfuscation involves altering the values of sensitive data in a way that preserves the statistical properties and relationships within the data. It ensures that the masked data maintains the overall distribution, patterns, and correlations of the original data for accurate statistical analysis. Statistical data obfuscation techniques include applying mathematical functions or perturbation algorithms to the data.
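A minimal Python sketch of the perturbation idea, using only the standard library; the dataset and noise scale are illustrative.

```python
# Perturbation-based obfuscation: zero-mean noise hides individual values
# while aggregate statistics stay close to the original (illustrative).
import random
import statistics

salaries = [48_000, 52_000, 61_000, 75_000, 90_000]

def perturb(values, scale=1_000):
    # Gaussian noise with mean 0: each value changes, the distribution barely does
    return [v + random.gauss(0, scale) for v in values]

masked = perturb(salaries)
print(statistics.mean(salaries), round(statistics.mean(masked)))    # close means
print(statistics.stdev(salaries), round(statistics.stdev(masked)))  # similar spread
```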

View File

@@ -2,7 +2,7 @@
Data pipelines are a series of automated processes that transport and transform data from various sources to a destination for analysis or storage. They typically involve steps like data extraction, cleaning, transformation, and loading (ETL) into databases, data lakes, or warehouses. Pipelines can handle batch or real-time data, ensuring that large-scale datasets are processed efficiently and consistently. They play a crucial role in ensuring data integrity and enabling businesses to derive insights from raw data for reporting, analytics, or machine learning.
Visit the following resources to learn more:
- [@article@What is a Data Pipeline? - IBM](https://www.ibm.com/topics/data-pipeline)
- [@video@What are Data Pipelines?](https://www.youtube.com/watch?v=oKixNpz6jNo)
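To make the extract, transform, and load steps concrete, here is a minimal batch pipeline sketch using only the Python standard library; the file name, column names, and table are illustrative.

```python
# A minimal batch ETL sketch: extract from CSV, transform in memory,
# load into SQLite (all names are illustrative).
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Clean: drop rows without an id, normalize the amount to float
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in rows
        if r.get("id")
    ]

def load(rows, db=":memory:"):
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT, amount REAL)")
    con.executemany("INSERT INTO orders VALUES (:id, :amount)", rows)
    con.commit()
    return con

# The whole pipeline is a composition of the three steps:
# con = load(transform(extract("orders.csv")))
```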

View File

@@ -1,5 +1,5 @@
# Data Quality
Ensuring quality involves validating the accuracy, completeness, consistency, and reliability of the data collected from each source. Whether you collect from one source or many is almost irrelevant, since the only extra task is to homogenize the final schema of the data, ensuring deduplication and normalization.
This last part typically includes verifying the credibility of each data source, standardizing formats (like date/time or currency), performing schema alignment, and running profiling to detect anomalies, duplicates, or mismatches before integrating the data for analysis.
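A minimal sketch of such checks in Python; the rules and field names are illustrative.

```python
# Toy quality validation: completeness, deduplication, and format checks.
from datetime import datetime

def validate(rows):
    problems = []
    seen = set()
    for i, r in enumerate(rows):
        if not r.get("email"):
            problems.append((i, "missing email"))            # completeness
        key = (r.get("email"), r.get("order_id"))
        if key in seen:
            problems.append((i, "duplicate record"))          # deduplication
        seen.add(key)
        try:
            datetime.strptime(r.get("date", ""), "%Y-%m-%d")  # standard format
        except ValueError:
            problems.append((i, "bad date format"))
    return problems

rows = [
    {"email": "a@x.com", "order_id": 1, "date": "2025-01-01"},
    {"email": "", "order_id": 2, "date": "01/02/2025"},
    {"email": "a@x.com", "order_id": 1, "date": "2025-01-01"},
]
print(validate(rows))
# [(1, 'missing email'), (1, 'bad date format'), (2, 'duplicate record')]
```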

View File

@@ -1,4 +1,3 @@
# Data Serving
Data serving is the last step in the data engineering process. Once the data is stored in your data architectures and transformed into a coherent and useful format, it's time to get value from it. Data serving refers to the different ways data is used by downstream applications and users to create value. There are many ways companies can extract value from data, including training machine learning models, BI analytics, and reverse ETL.

View File

@@ -1,6 +1,6 @@
# Data Storage
Data storage is the process of saving and preserving digital information on various physical or cloud-based media for future retrieval and use. It encompasses the use of technologies and devices like hard drives and cloud platforms to store data.
Visit the following resources to learn more:

View File

@@ -6,8 +6,7 @@
Visit the following resources to learn more:
- [@video@Data Structures Illustrated](https://www.youtube.com/watch?v=9rhT3P1MDHk&list=PLkZYeFmDuaN2-KUIv-mvbjfKszIGJ4FaY)
- [@article@Interview Questions about Data Structures](https://www.csharpstar.com/csharp-algorithms/)
- [@video@Intro to Algorithms](https://www.youtube.com/watch?v=rL8X2mlNHPM)
- [@feed@Explore top posts about Algorithms](https://app.daily.dev/tags/algorithms?ref=roadmapsh)

View File

@@ -2,7 +2,7 @@
**Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet wide-range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.
Visit the following resources to learn more:
- [@article@What Is a Data Warehouse?](https://www.oracle.com/database/what-is-a-data-warehouse/)
- [@video@What is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg)

View File

@@ -11,7 +11,7 @@ Visit the following resources to learn more:
- [@article@Oracle: What is a Database?](https://www.oracle.com/database/what-is-database/)
- [@article@Prisma.io: What are Databases?](https://www.prisma.io/dataguide/intro/what-are-databases)
- [@article@Intro To Relational Databases](https://www.udacity.com/course/intro-to-relational-databases--ud197)
- [@article@NoSQL Explained](https://www.mongodb.com/nosql-explained)
- [@video@What is Relational Database](https://youtu.be/OqjJjpjDRLc)
- [@video@How do NoSQL Databases work](https://www.youtube.com/watch?v=0buKQHokLK8)
- [@feed@Explore top posts about Database](https://app.daily.dev/tags/database?ref=roadmapsh)

View File

@@ -1,3 +1,3 @@
# Database
A database is an organized, structured collection of electronic data that is stored, managed, and accessed via a computer system, usually controlled by a Database Management System (DBMS). Databases organize various types of data, such as words, numbers, images, and videos, allowing users to easily retrieve, update, and modify it for various purposes, from managing customer information to analyzing business processes.

View File

@@ -4,8 +4,7 @@ Delta Lake is the optimized storage layer that provides the foundation for table
Visit the following resources to learn more:
- [@official@What is Delta Lake in Databricks?](https://docs.databricks.com/aws/en/delta)
- [@article@Delta Table in Databricks: A Complete Guide](https://www.datacamp.com/tutorial/delta-table-in-databricks)
- [@book@The Delta Lake Series — Fundamentals and Performance](https://www.databricks.com/resources/ebook/the-delta-lake-series-fundamentals-performance)
- [@video@Delta Lake](https://www.databricks.com/resources/demos/videos/lakehouse-platform/delta-lake)

View File

@@ -5,4 +5,4 @@ Datadog is a monitoring and analytics platform for large-scale applications. It
Visit the following resources to learn more:
- [@official@Datadog](https://www.datadoghq.com/)
- [@official@Datadog Documentation](https://docs.datadoghq.com/)

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@official@Dataflow](https://cloud.google.com/products/dataflow)
- [@article@Dataflow](https://en.wikipedia.org/wiki/Google_Cloud_Dataflow)
- [@video@What is Google Dataflow](https://www.youtube.com/watch?v=KalJ0VuEM7s)

View File

@@ -4,6 +4,6 @@ dbt, also known as the data build tool, is designed to simplify the management o
Visit the following resources to learn more:
- [@course@dbt Official Courses](https://learn.getdbt.com/catalog)
- [@official@dbt](https://www.getdbt.com/product/what-is-dbt)
- [@official@dbt Documentation](https://docs.getdbt.com/docs/build/documentation)

View File

@@ -1,15 +1,12 @@
# Declarative vs Imperative
When it comes to Infrastructure as Code (IaC), there are two fundamental styles: imperative and declarative.
In **imperative IaC**, you specify a list of steps the IaC tool should follow to provision a new resource. You tell your IaC tool how to create each environment using a sequence of imperative commands. Imperative IaC can offer more flexibility as it allows you to dictate each step, but this can result in increased complexity. Popular imperative IaC tools are Chef and Puppet.
In **declarative IaC**, you specify the name and properties of the infrastructure resources you wish to provision, and then the IaC tool figures out how to achieve that end result on its own. You declare to your IaC tool what you want, but not how to get there. Declarative IaC, while less flexible, tends to be simpler and more manageable. Terraform is the most popular declarative IaC tool.
Visit the following resources to learn more:
- [@article@Infrastructure as Code: From Imperative to Declarative and Back Again](https://thenewstack.io/infrastructure-as-code-from-imperative-to-declarative-and-back-again/)
- [@article@Declarative vs Imperative Programming for Infrastructure as Code (IaC)](https://www.copado.com/resources/blog/declarative-vs-imperative-programming-for-infrastructure-as-code-iac)
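The toy Python sketch below contrasts the two styles; it is purely illustrative and not how any real IaC tool is implemented.

```python
# Imperative: dictate each step. Declarative: state the end result and let
# an engine reconcile toward it (toy state, not real infrastructure).
actual = {"servers": 1}

def scale_up_imperative(state, count):
    # Imperative: "create a server", repeated explicitly by the author
    for _ in range(count):
        state["servers"] += 1

def reconcile(state, desired):
    # Declarative: the engine computes the steps; converges from any start
    diff = desired["servers"] - state["servers"]
    state["servers"] += diff

scale_up_imperative(actual, 2)
print(actual)                    # {'servers': 3}
reconcile(actual, {"servers": 5})
print(actual)                    # {'servers': 5}, however many we started with
```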

View File

@@ -1,7 +1,7 @@
# Distributed File Systems
A Distributed File System (DFS) allows multiple computers to access and share files across a network as if they were stored on a single local machine. It distributes data across multiple servers, enhancing accessibility and data redundancy. This enables users to access files from various locations and devices, promoting collaboration and data availability.
Visit the following resources to learn more:
- [@article@What is a Distributed File System (DFS)? A Complete Guide](http://starwindsoftware.com/blog/what-is-a-distributed-file-system-dfs-a-complete-guide/)

View File

@@ -4,6 +4,6 @@ A distributed system is a collection of independent computers that communicate a
Visit the following resources to learn more:
- [@article@Introduction to Distributed Systems](https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/)
- [@article@Distributed Systems Guide](https://www.baeldung.com/cs/distributed-systems-guide)
- [@video@Quick overview](https://www.youtube.com/watch?v=IJWwfMyPu1c)

View File

@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@official@Docker Documentation](https://docs.docker.com/)
- [@video@Docker Tutorial](https://www.youtube.com/watch?v=RqTEHSBrYFw)
- [@video@Docker simplified in 55 seconds](https://youtu.be/vP_4DlOH1G4)
- [@feed@Explore top posts about Docker](https://app.daily.dev/tags/docker?ref=roadmapsh)

View File

@@ -1,6 +1,6 @@
# Document
**Document Databases** are a type of NoSQL database that stores data in JSON, BSON, or XML formats, allowing for flexible, semi-structured, and hierarchical data structures. These databases are characterized by their dynamic schema, scalability through distribution, and ability to intuitively map data models to application code. Popular examples include MongoDB, which allows for easy storage and retrieval of varied data types without requiring a rigid, predefined schema.
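As a brief illustration, the sketch below assumes a local MongoDB instance and the `pymongo` driver; the database, collection, and documents are illustrative.

```python
# Assumes MongoDB running locally and `pip install pymongo`.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]  # illustrative database/collection

# Two documents with different shapes can live in the same collection:
# no predefined schema is required.
products.insert_one({"name": "laptop", "price": 999, "specs": {"ram_gb": 16}})
products.insert_one({"name": "ebook", "price": 9, "formats": ["epub", "pdf"]})

print(products.find_one({"name": "laptop"})["specs"]["ram_gb"])  # 16
```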
Visit the following resources to learn more:

View File

@@ -4,4 +4,4 @@ Amazon DynamoDB is a fully managed NoSQL database solution that provides fast an
Visit the following resources to learn more:
- [@official@Amazon DynamoDB](https://aws.amazon.com/dynamodb/)

View File

@@ -2,9 +2,8 @@
The California Consumer Privacy Act (CCPA) is a California state law enacted in 2020 that protects and enforces the rights of Californians regarding the privacy of consumers' personal information (PI).
Visit the following resources to learn more:
- [@official@California Consumer Privacy Act (CCPA)](https://oag.ca.gov/privacy/ccpa)
- [@article@What is the California Consumer Privacy Act (CCPA)?](https://www.ibm.com/think/topics/ccpa-compliance)
- [@video@What is the California Consumer Privacy Act? | CCPA Explained?](https://www.youtube.com/watch?v=dpzsAgrDAO4)

View File

@@ -7,4 +7,4 @@ Visit the following resources to learn more:
- [@official@Elasticsearch Website](https://www.elastic.co/elasticsearch/)
- [@official@Elasticsearch Documentation](https://www.elastic.co/guide/index.html)
- [@video@What is Elasticsearch](https://www.youtube.com/watch?v=ZP0NmfyfsoM)
- [@feed@Explore top posts about ELK](https://app.daily.dev/tags/elk?ref=roadmapsh)

View File

@@ -1,6 +1,6 @@
# Environmental Management
Environmental management, or Environment as Code (EaC), takes the concept of Infrastructure as Code (IaC) one step further. EaC applies DevOps principles to manage and automate entire software environments—including infrastructure, applications, and configurations—using code, making them reproducible, versionable, and reliable. It extends IaC by focusing not just on the underlying servers and networks but on the complete, connected system of services and applications that run on top of it. This approach helps increase efficiency, speeds up deployments, and provides a consistent, auditable process for creating and managing development, testing, and production environments.
Visit the following resources to learn more:

View File

@@ -1,6 +1,6 @@
# ETL vs Reverse ETL
ETL (Extract, Transform, Load) is a key process in data warehousing, enabling the integration of data from multiple sources into a centralized database.
Reverse ETL emerged as organizations recognized that their carefully curated data warehouses, while excellent for analysis, created a new form of data silo that prevented operational teams from accessing valuable insights. This methodology addresses the critical gap between analytical insights and operational execution by systematically moving processed data from centralized repositories back to the operational systems where business teams interact with customers and manage daily operations.
@@ -8,4 +8,4 @@ Visit the following resources to learn more:
- [@article@What is ETL?](https://www.snowflake.com/guides/what-etl)
- [@article@ETL vs Reverse ETL vs Data Activation](https://airbyte.com/data-engineering-resources/etl-vs-reverse-etl-vs-data-activation)
- [@article@ETL vs Reverse ETL: An Overview, Key Differences, & Use Cases](https://portable.io/learn/etl-vs-reverse-etl)
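To make the direction of flow concrete, here is a minimal reverse ETL sketch in Python: it reads a curated segment from the warehouse (SQLite as a stand-in) and pushes rows back to an operational tool. The CRM endpoint, table, and query are hypothetical.

```python
# Reverse ETL sketch: warehouse -> operational system.
# The warehouse table and CRM API below are hypothetical stand-ins.
import sqlite3
import urllib.request
import json

def fetch_segment(con):
    # Pull an already-curated, analytics-grade segment from the warehouse
    return con.execute(
        "SELECT email, lifetime_value FROM customer_metrics "
        "WHERE lifetime_value > 1000"
    ).fetchall()

def push_to_crm(rows, url="https://crm.example.com/api/contacts"):
    for email, ltv in rows:
        body = json.dumps({"email": email, "ltv": ltv}).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        urllib.request.urlopen(req)  # sync each warehouse row into the CRM

# con = sqlite3.connect("warehouse.db")
# push_to_crm(fetch_segment(con))
```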

View File

@@ -1,12 +1,12 @@
# EU AI Act
The Artificial Intelligence Act of the European Union, also known as the EU AI Act, is a comprehensive regulatory framework established to ensure safety and to uphold fundamental human rights in the use of AI technologies. It governs the development and/or use of AI in the European Union. The act takes a risk-based approach to regulation, applying different rules to AI systems according to the risk they pose.
Considered the world's first comprehensive regulatory framework for AI, the EU AI Act prohibits some AI uses outright and implements strict governance, risk management and transparency requirements for others.
Visit the following resources to learn more:
- [@official@The EU AI Act Explorer](https://artificialintelligenceact.eu/ai-act-explorer/)
- [@article@AI Act - European Commission](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)
- [@article@Artificial Intelligence Act](https://en.wikipedia.org/wiki/Artificial_Intelligence_Act)
- [@video@The EU AI Act Explained](https://www.youtube.com/watch?v=s_rxOnCt3HQ)

View File

@@ -1,3 +1,3 @@
# Extract Data
The first step in ETL processes involves extracting data from data sources to a staging area. Data can come in various types and formats, from SQL or NoSQL databases and plain text to image and video files.

View File

@@ -6,4 +6,4 @@ Visit the following resources to learn more:
- [@article@What is Functional Testing? Types & Examples](https://www.guru99.com/functional-testing.html)
- [@article@Functional Testing : A Detailed Guide](https://www.browserstack.com/guide/functional-testing)
- [@feed@Explore top posts about Testing](https://app.daily.dev/tags/testing?ref=roadmapsh)

View File

@@ -2,7 +2,7 @@
The General Data Protection Regulation (GDPR) is an essential standard in API Design that addresses the storage, transfer, and processing of personal data of individuals within the European Union. With regards to API Design, considerations must be given on how APIs handle, process, and secure the data to conform with GDPR's demands on data privacy and security. This includes requirements for explicit consent, right to erasure, data portability, and privacy by design. Non-compliance with these standards not only leads to hefty fines but may also erode trust from users and clients. As such, understanding the impact and integration of GDPR within API design is pivotal for organizations handling EU residents' data.
Visit the following resources to learn more:
- [@official@GDPR](https://gdpr-info.eu/)
- [@article@What is GDPR Compliance in Web Application and API Security?](https://probely.com/blog/what-is-gdpr-compliance-in-web-application-and-api-security/)

View File

@@ -12,4 +12,4 @@ Visit the following resources to learn more:
- [@article@Learn Git with Tutorials, News and Tips - Atlassian](https://www.atlassian.com/git)
- [@article@Git Cheat Sheet](https://cs.fyi/guide/git-cheatsheet)
- [@video@What is GitHub?](https://www.youtube.com/watch?v=w3jLJU7DT5E)
- [@video@Git & GitHub Crash Course For Beginners](https://www.youtube.com/watch?v=SWYqp7iY_Tc)
- [@video@Git & GitHub Crash Course For Beginners](https://www.youtube.com/watch?v=SWYqp7iY_Tc)

View File

@@ -2,6 +2,6 @@
GitHub Actions is a CI/CD tool integrated directly into GitHub, allowing developers to automate workflows, such as building, testing, and deploying code directly from their repositories. It uses YAML files to define workflows, which can be triggered by various events like pushes, pull requests, or on a schedule. GitHub Actions supports a wide range of actions and integrations, making it highly customizable for different project needs. It provides a marketplace with reusable workflows and actions contributed by the community. With its seamless integration with GitHub, developers can take advantage of features like matrix builds, secrets management, and environment-specific configurations to streamline and enhance their development and deployment processes.
Visit the following resources to learn more:
- [@official@GitHub Actions Documentation](https://docs.github.com/en/actions)

View File

@@ -9,4 +9,4 @@ Visit the following resources to learn more:
- [@official@Get Started with GitLab CI](https://docs.gitlab.com/ee/ci/quick_start/)
- [@official@Learn GitLab Tutorials](https://docs.gitlab.com/ee/tutorials/)
- [@official@GitLab CI/CD Examples](https://docs.gitlab.com/ee/ci/examples/)
- [@feed@Explore top posts about GitLab](https://app.daily.dev/tags/gitlab?ref=roadmapsh)

View File

@@ -0,0 +1,7 @@
# Amazon RDS (Database)
Amazon RDS (Relational Database Service) is a web service from Amazon Web Services. It's designed to simplify the setup, operation, and scaling of relational databases in the cloud. This service provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks. RDS supports six database engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. These engines give you the ability to run instances ranging from 5GB to 6TB of memory, accommodating your specific use case. It also ensures the database is up-to-date with the latest patches, automatically backs up your data and offers encryption at rest and in transit.
Visit the following resources to learn more:
- [@official@Amazon RDS](https://aws.amazon.com/rds/)

View File

@@ -6,6 +6,4 @@ Visit the following resources to learn more:
- [@official@BigQuery overview](https://cloud.google.com/bigquery/docs/introduction)
- [@official@From data warehouse to autonomous data and AI platform](https://cloud.google.com/bigquery)
- [@video@What is BigQuery?](https://www.youtube.com/watch?v=d3MDxC_iuaw)

View File

@@ -1,9 +1,11 @@
# GKE - Google Kubernetes Engine
Google Kubernetes Engine (GKE) is a managed Kubernetes service provided by Google Cloud Platform. It allows organizations to deploy, manage, and scale containerized applications using Kubernetes orchestration. GKE automates cluster management tasks, including upgrades, scaling, and security patches, while providing integration with Google Cloud services. It offers features like auto-scaling, load balancing, and private clusters, enabling developers to focus on application development rather than infrastructure management.
Visit the following resources to learn more:
- [@official@GKE](https://cloud.google.com/kubernetes-engine)
- [@video@What is Google Kubernetes Engine (GKE)?](https://www.youtube.com/watch?v=Rl5M1CzgEH4)

View File

@@ -1,10 +1,9 @@
# Google Cloud Storage
Google Cloud Storage (GCS) is a scalable, secure, and durable object storage service within Google Cloud Platform (GCP) designed for storing and retrieving unstructured data of any type or size. It allows users to store data in "buckets" and access it through APIs, web interfaces, or command-line tools for applications, backups, media hosting, and big data analytics. GCS offers different storage classes to optimize costs based on data access frequency, strong security with encryption, and high availability through redundant data storage across multiple locations.
Visit the following resources to learn more:
- [@official@Cloud Storage](https://cloud.google.com/storage)
- [@article@Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage)
- [@video@Cloud Storage in a minute](https://www.youtube.com/watch?v=wNOs3LlsH6k)

View File

@@ -1,13 +1,10 @@
# Google Deployment Mgr.
Google Cloud Deployment Manager is an infrastructure deployment service that automates the creation and management of Google Cloud resources. It provides users with flexible template and configuration files to create deployments that have a variety of Google Cloud services, such as Cloud Storage, Compute Engine, and Cloud SQL, configured to work together.
Important: Google Deployment Manager will reach end of support on 31 December 2025. An alternative to this tool is **Google Infrastructure Manager**. Infrastructure Manager (Infra Manager) automates the deployment and management of Google Cloud infrastructure resources using Terraform. Infra Manager lets users deploy programmatically to Google Cloud, so they can use this service rather than maintaining a separate toolchain to work with Terraform on Google Cloud.
Visit the following resources to learn more:
- [@official@Infrastructure Manager Overview](https://cloud.google.com/infrastructure-manager/docs/overview)
- [@official@Google Cloud Deployment Manager documentation](https://cloud.google.com/deployment-manager/docs)

View File

@@ -9,4 +9,4 @@ Visit the following resources to learn more:
- [@article@What is a Graph database?](https://aws.amazon.com/nosql/graph/)
- [@article@What is A Graph Database? A Beginner's Guide](https://www.datacamp.com/blog/what-is-a-graph-database)
- [@article@Graph database](https://en.wikipedia.org/wiki/Graph_database)
- [@video@Introduction to NoSQL](https://www.youtube.com/watch?v=qI_g07C_Q5I)

View File

@@ -1,10 +1,9 @@
# HBase
HBase is a column-oriented No-SQL database management system that runs on top of Hadoop Distributed File System (HDFS), a main component of Apache Hadoop. HBase provides a fault-tolerant way of storing sparse data sets, which are common in many big data use cases. It is well suited for real-time data processing or random read/write access to large volumes of data. HBase applications are written in Java™ much like a typical Apache MapReduce application.
Visit the following resources to learn more:
- [@official@Apache HBase](https://hbase.apache.org/)
- [@article@What is HBase?](https://www.ibm.com/think/topics/hbase)
- [@article@Apache HBase](https://en.wikipedia.org/wiki/Apache_HBase)

View File

@@ -1,10 +1,9 @@
# HDFS
HDFS (Hadoop Distributed File System) is Hadoop's primary storage system. It is designed to reliably store data across a cluster of machines. Its architecture is built for access to very large datasets and is optimized for fault tolerance, scalability, and data locality.
Visit the following resources to learn more:
- [@official@HDFS Architecture Guide](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html)
- [@article@Hadoop Distributed File System (HDFS)](https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs)
- [@article@What is Hadoop Distributed File System (HDFS)?](https://www.ibm.com/think/topics/hdfs)

View File


@@ -5,7 +5,4 @@ Hightouch is a reverse ETL and AI platform crafted for marketing and personaliza
Visit the following resources to learn more:
- [@official@Hightouch Docs](https://hightouch.com/docs)
- [@video@What is Hightouch? - The Data Activation Platform](https://www.youtube.com/watch?v=vMm87-MC7og)

View File

@@ -7,8 +7,4 @@ By contrast, vertical scaling involves increasing the computing power of individ
Visit the following resources to learn more:
- [@article@Horizontal Vs. Vertical Scaling: Which Should You Choose?](https://www.cloudzero.com/blog/horizontal-vs-vertical-scaling/)
- [@video@Vertical Vs Horizontal Scaling: Key Differences You Should Know](https://www.youtube.com/watch?v=dvRFHG2-uYs)
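As a toy illustration of scaling out, the sketch below routes keys across a pool of nodes with a stable hash; the node names are illustrative.

```python
# Horizontal scaling sketch: spread keys across more machines instead of
# enlarging one machine (node names are illustrative).
import hashlib

nodes = ["node-a", "node-b", "node-c"]

def route(key: str) -> str:
    # Stable hash so the same key always lands on the same node
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

for user in ["alice", "bob", "carol", "dave"]:
    print(user, "->", route(user))

# Scaling out = appending to `nodes`; real systems typically use consistent
# hashing to limit how many keys move when membership changes.
```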

View File

@@ -1,11 +1,9 @@
# Hybrid
Hybrid data ingestion combines aspects of both real-time and batch ingestion. This approach gives you the flexibility to adapt your data ingestion strategy as your needs evolve. For example, you could process data in real-time for critical applications and in batches for less time-sensitive tasks. Two common hybrid methods are Lambda architecture-based and micro-batching.
Visit the following resources to learn more:
- [@article@What is Data Ingestion: Types, Tools, and Real-Life Use Cases](https://estuary.dev/blog/data-ingestion/)
- [@article@Lambda Architecture](https://www.databricks.com/glossary/lambda-architecture)
- [@article@What is Micro Batching: A Comprehensive Guide 101](https://hevodata.com/learn/micro-batching/)
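A minimal Python sketch of the micro-batching pattern, where events accumulate in a buffer and are flushed when either a size or an age threshold is hit; the thresholds and events are illustrative.

```python
# Micro-batching sketch: near-real-time ingestion built from small batches.
import time

class MicroBatcher:
    def __init__(self, max_size=100, max_age_s=5.0, sink=print):
        self.buf, self.started = [], time.monotonic()
        self.max_size, self.max_age_s, self.sink = max_size, max_age_s, sink

    def add(self, event):
        self.buf.append(event)
        age = time.monotonic() - self.started
        if len(self.buf) >= self.max_size or age >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buf:
            self.sink(self.buf)  # hand the batch to the batch-style path
        self.buf, self.started = [], time.monotonic()

b = MicroBatcher(max_size=3)
for e in ["click", "view", "click", "purchase"]:
    b.add(e)   # flushes automatically after the third event
b.flush()      # drain whatever is left
```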

View File

@@ -5,5 +5,4 @@ Idempotency is a crucial concept in IaC. An idempotent operation produces the sa
Visit the following resources to learn more:
- [@article@Why idempotence was important to DevOps](https://dev.to/startpher/why-idempotence-was-important-to-devops-2jn3)
- [@article@Idempotency: The Secret to Seamless DevOps and Infrastructure](https://medium.com/@tiwari.sushil/idempotency-the-secret-to-seamless-devops-and-infrastructure-bf22e63e1be5)
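A toy Python sketch of an idempotent operation: running it once or many times leaves the system in the same state. The "infrastructure" here is just a dict, and the resource names are illustrative.

```python
# Idempotent provisioning sketch: re-running changes nothing once converged.
state = {}  # stands in for real infrastructure

def ensure_bucket(name):
    if name in state:               # already converged: do nothing
        return state[name]
    state[name] = {"name": name, "versioning": True}
    return state[name]

first = ensure_bucket("logs")
second = ensure_bucket("logs")      # re-running creates no new resource
print(first is second, len(state))  # True 1
```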

View File

@@ -2,4 +2,4 @@
Indexing is a data structure technique to efficiently retrieve data from a database. It essentially creates a lookup that can be used to quickly find the location of data records on a disk. Indexes are created using a few database columns and are capable of rapidly locating data without scanning every row in a database table each time the database table is accessed. Indexes can be created using any combination of columns in a database table, reducing the amount of time it takes to find data.
Indexes can be structured in several ways: Binary Tree, B-Tree, Hash Map, etc., each having its own particular strengths and weaknesses. When creating an index, it's crucial to understand which type of index to apply in order to achieve maximum efficiency. Indexes, like any other database feature, must be used wisely because they require disk space and need to be maintained, which can slow down insert and update operations.
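A minimal Python sketch of the core idea: a hash-map index answers a lookup directly instead of scanning every row. The data is illustrative, and note that the index costs memory to build and must be maintained on writes, mirroring the trade-off described above.

```python
# Index sketch: O(n) full scan vs O(1) average hash-map lookup.
rows = [{"id": i, "email": f"user{i}@example.com"} for i in range(100_000)]

def scan(email):                       # no index: check every row
    return next(r for r in rows if r["email"] == email)

index = {r["email"]: r for r in rows}  # build once; costs memory and upkeep

def lookup(email):                     # with index: jump straight to the row
    return index[email]

assert scan("user99999@example.com") == lookup("user99999@example.com")
```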

View File

@@ -6,5 +6,4 @@ Visit the following resources to learn more:
- [@article@What is Infrastructure as Code?](https://aws.amazon.com/what-is/iac/)
- [@article@Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_code)
- [@video@What is Infrastructure as Code?](https://www.youtube.com/watch?v=zWw2wuiKd5o)

View File

@@ -1,8 +1,8 @@
# Introduction
Data engineers are responsible for laying the foundations for the acquisition, storage, transformation, and management of data in an organization. They manage the design, creation, and maintenance of database architecture and data processing systems, ensuring that the subsequent work of analysis, BI, and machine learning model development can be carried out seamlessly, continuously, securely, and effectively.
Data engineers are one of the most technical profiles in the field of data science, bridging the gap between software and application developers and traditional data science positions.
Visit the following resources to learn more:

Some files were not shown because too many files have changed in this diff.