Mirror of https://github.com/kamranahmedse/developer-roadmap.git (synced 2025-09-25 16:39:02 +02:00)

chore: sync content to repo

Committed by: Kamran Ahmed
Parent: dd12cf1c99
Commit: ba1e5a58b5
@@ -5,8 +5,8 @@ Apache Kafka is an open-source stream-processing software platform developed by

Visit the following resources to learn more:

- [@official@Apache Kafka](https://kafka.apache.org/quickstart)
- [@offical@Apache Kafka Streams](https://docs.confluent.io/platform/current/streams/concepts.html)
- [@offical@Kafka Streams Confluent](https://kafka.apache.org/documentation/streams/)
- [@article@Apache Kafka Streams](https://docs.confluent.io/platform/current/streams/concepts.html)
- [@article@Kafka Streams Confluent](https://kafka.apache.org/documentation/streams/)
- [@video@Apache Kafka Fundamentals](https://www.youtube.com/watch?v=B5j3uNBH8X4)
- [@video@Kafka in 100 Seconds](https://www.youtube.com/watch?v=uvb00oaa3k8)
- [@feed@Explore top posts about Kafka](https://app.daily.dev/tags/kafka?ref=roadmapsh)
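As a quick, non-authoritative illustration of the produce/consume flow behind the Kafka links above, here is a minimal sketch using the third-party kafka-python client; the broker address and the `clickstream` topic are assumptions, not part of the original content.

```python
# Minimal produce/consume sketch with kafka-python (pip install kafka-python).
# Assumes a broker at localhost:9092 and a topic named "clickstream".
import json

from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("clickstream", {"user": "u1", "action": "page_view"})
producer.flush()  # block until the record is acknowledged

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for record in consumer:
    print(record.offset, record.value)
    break  # read a single event for the demo
```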
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@official@SAmazon Aurora](https://aws.amazon.com/rds/aurora/)
- [@article@SAmazon Aurora: What It Is, How It Works, and How to Get Started](https://www.datacamp.com/tutorial/amazon-aurora)
@@ -4,5 +4,5 @@ Authentication and authorization are popular terms in modern computer systems th

Visit the following resources to learn more:

- [@roadmap.sh@Basic Authentication](https://roadmap.sh/guides/basic-authentication)
- [@article@Basic Authentication](https://roadmap.sh/guides/basic-authentication)
- [@article@What is Authentication vs Authorization?](https://auth0.com/intro-to-iam/authentication-vs-authorization)
@@ -4,8 +4,8 @@ The AWS Cloud Development Kit (AWS CDK) is an open-source software development f

Visit the following resources to learn more:

- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8)
- [@official@AWS CDK](https://aws.amazon.com/cdk/)
- [@official@AWS CDK Documentation](https://docs.aws.amazon.com/cdk/index.html)
- [@course@AWS CDK Crash Course for Beginners](https://www.youtube.com/watch?v=D4Asp5g4fp8)
- [@opensource@AWS CDK Examples](https://github.com/aws-samples/aws-cdk-examples)
- [@feed@Explore top posts about AWS](https://app.daily.dev/tags/aws?ref=roadmapsh)
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@official@Amazon Simple Notification Service (SNS) ](http://aws.amazon.com/sns/)
- [@official@Send Fanout Event Notifications](https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/)
- [@article@What is Pub/Sub Messaging?](https://aws.amazon.com/what-is/pub-sub-messaging/)
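To make the pub/sub fan-out idea concrete, here is a minimal boto3 sketch; it assumes configured AWS credentials and an existing topic, and the ARN shown is a placeholder rather than anything from the original content.

```python
# Minimal SNS pub/sub sketch with boto3 (pip install boto3).
# Assumes AWS credentials are configured; the topic ARN below is hypothetical.
import json

import boto3

sns = boto3.client("sns", region_name="us-east-1")
topic_arn = "arn:aws:sns:us-east-1:123456789012:order-events"  # placeholder topic

# Publish one event; every SQS queue, Lambda, or email endpoint subscribed
# to the topic receives its own copy (fan-out).
response = sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({"order_id": 42, "status": "created"}),
    Subject="order-created",
)
print(response["MessageId"])
```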
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@official@Amazon Simple Queue Service](https://aws.amazon.com/sqs/)
- [@official@What is Amazon Simple Queue Service?](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html)
- [@article@Amazon Simple Queue Service (SQS): A Comprehensive Tutorial](https://www.datacamp.com/tutorial/amazon-sqs)
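A minimal producer/consumer sketch for SQS with boto3 follows; the queue name is made up for illustration and AWS credentials are assumed to be configured.

```python
# Minimal SQS producer/consumer sketch with boto3 (pip install boto3).
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.create_queue(QueueName="demo-jobs")["QueueUrl"]  # hypothetical queue

# Producer: enqueue a message.
sqs.send_message(QueueUrl=queue_url, MessageBody="resize image 123")

# Consumer: poll, process, then delete so the message is not redelivered.
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=5)
for msg in resp.get("Messages", []):
    print("processing:", msg["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```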
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@article@What is Batch Processing?](https://aws.amazon.com/what-is/batch-processing/)
- [@article@Batch And Streaming Demystified For Unification](https://towardsdatascience.com/batch-and-streaming-demystified-for-unification-dee0b48f921d/)
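For readers new to the batch idea behind these links, here is a plain-Python sketch of processing a bounded dataset in fixed-size chunks; the data source is invented for illustration.

```python
# Plain-Python sketch of batch processing: a bounded dataset is read,
# processed in fixed-size chunks, and summarized once at the end.
from itertools import islice

def read_daily_orders():
    """Stand-in for reading a finite file or table export."""
    for i in range(10_000):
        yield {"order_id": i, "amount": 19.99}

def batches(iterable, size):
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

total = 0.0
for chunk in batches(read_daily_orders(), size=1_000):
    total += sum(order["amount"] for order in chunk)  # per-batch work

print(f"daily revenue: {total:.2f}")
```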
@@ -8,7 +8,7 @@

4. **Secure Communication.** Messaging queues often carry sensitive data, so encrypt messages both in transit and at rest. Implement authentication techniques to ensure only trusted clients can publish or consume, and enforce authorization rules to limit access to specific topics or operations.

6. **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems.
5. **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems.

Visit the following resources to learn more:
@@ -9,4 +9,3 @@ Visit the following resources to learn more:

- [@article@What is Big Data?](https://cloud.google.com/learn/what-is-big-data?hl=en)
- [@article@Hadoop vs Spark: Which Big Data Framework Is Right For You?](https://www.datacamp.com/blog/hadoop-vs-spark)
- [@video@introduction to Big Data with Spark and Hadoop](http://youtube.com/watch?v=vHlwg4ciCsI&t=80s&ab_channel=freeCodeAcademy)
@@ -5,6 +5,6 @@ Apache Cassandra is a highly scalable, distributed NoSQL database designed to ha

Visit the following resources to learn more:

- [@official@Apache Cassandra](https://cassandra.apache.org/_/index.html)
- [article@Cassandra - Quick Guide](https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm)
- [@article@article@Cassandra - Quick Guide](https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm)
- [@video@Apache Cassandra - Course for Beginners](https://www.youtube.com/watch?v=J-cSy5MeMOA)
- [@feed@Explore top posts about Backend Development](https://app.daily.dev/tags/backend?ref=roadmapsh)
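As a rough sketch of how a client talks to Cassandra, here is a minimal example with the DataStax Python driver; the node address, keyspace, and table are assumptions made for illustration.

```python
# Minimal sketch with the DataStax Python driver (pip install cassandra-driver).
# Assumes a Cassandra node is reachable on 127.0.0.1:9042.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.set_keyspace("demo")
session.execute(
    "CREATE TABLE IF NOT EXISTS events (user_id text, ts timestamp, action text, "
    "PRIMARY KEY (user_id, ts))"
)

# Rows are partitioned by user_id and ordered by ts within each partition.
session.execute(
    "INSERT INTO events (user_id, ts, action) VALUES (%s, toTimestamp(now()), %s)",
    ("u1", "login"),
)
for row in session.execute("SELECT ts, action FROM events WHERE user_id = %s", ("u1",)):
    print(row.ts, row.action)

cluster.shutdown()
```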
@@ -2,15 +2,15 @@

The data engineering ecosystem is rapidly expanding, and selecting the right technologies for your use case can be challenging. Below you can find some considerations for choosing data technologies across the data engineering lifecycle:

- **Team size and capabilities.** Your team's size will determine the amount of bandwidth your team can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.
- **Interoperability**. When choosing a technology or system, you’ll need to ensure that it interacts and operates smoothly with other technologies.
- **Cost optimization and business value,** Consider direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.
- **Location** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.
- **Build versus buy**. Depending on your needs and capabilities, you can either invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.
- **Server versus serverless**. Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which translates the server management to cloud providers, allowing developers to focus solely on writing code.

* **Team size and capabilities.** Your team's size will determine the amount of bandwidth your team can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.
* **Interoperability**. When choosing a technology or system, you’ll need to ensure that it interacts and operates smoothly with other technologies.
* **Cost optimization and business value,** Consider direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.
* **Location** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.
* **Build versus buy**. Depending on your needs and capabilities, you can either invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.
* **Server versus serverless**. Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which translates the server management to cloud providers, allowing developers to focus solely on writing code.

Visit the following resources to learn more:

- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
- [@article@Build hybrid and multicloud architectures using Google Cloud](https://cloud.google.com/architecture/hybrid-multicloud-patterns)
- [@article@The Unfulfilled Promise of Serverless](https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless/)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
@@ -4,10 +4,10 @@ Cloud architecture refers to how various cloud technology components, such as ha

Cloud architecture components can include, among others:

- A frontend platform
- A backend platform
- A cloud-based delivery model
- A network (internet, intranet, or intercloud)

* A frontend platform
* A backend platform
* A cloud-based delivery model
* A network (internet, intranet, or intercloud)

Visit the following resources to learn more:
@@ -2,7 +2,7 @@

**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing such as public clouds, private clouds, and hybrid clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over their data and infrastructures.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@Cloud Computing - IBM](https://www.ibm.com/think/topics/cloud-computing)
- [@article@What is Cloud Computing? - Azure](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-cloud-computing)
@@ -4,6 +4,6 @@ Google Cloud SQL is a fully-managed, cost-effective and scalable database servic

Visit the following resources to learn more:

- [@course@Cloud SQL](https://www.cloudskillsboost.google/course_templates/701)
- [@official@Cloud SQL](https://cloud.google.com/sql)
- [@official@Cloud SQL overview](https://cloud.google.com/sql/docs/introduction)
- [@course@Cloud SQL](https://www.cloudskillsboost.google/course_templates/701)
@@ -1,6 +1,3 @@

# Cluster Computing Basics

Cluster computing is the process of using multiple computing nodes, called clusters, to increase processing power for solving complex problems, such as Big Data analytics and AI model training. These tasks require parallel processing of millions of data points for complex classification and prediction tasks. Cluster computing technology coordinates multiple computing nodes, each with its own CPUs, GPUs, and internal memory, to work together on the same data processing task. Applications on cluster computing infrastructure run as if on a single machine and are unaware of the underlying system complexities.
@@ -1,11 +1,9 @@

# Compute Engine (Compute)

Compute Engine is a computing and hosting service that lets you create and run virtual machines on Google infrastructure. Compute Engine offers scale, performance, and value that lets you easily launch large compute clusters on Google's infrastructure. There are no upfront investments, and you can run thousands of virtual CPUs on a system that offers quick, consistent performance. You can configure and control Compute Engine resources using the Google Cloud console, the Google Cloud CLI, or using a REST-based API. You can also use a variety of programming languages to run Compute Engine, including Python, Go, and Java.

Visit the following resources to learn more:

- [@official@Compute Engine overview](https://cloud.google.com/compute/docs/overview)
- [@course@The Basics of Google Cloud Compute](https://www.cloudskillsboost.google/course_templates/754)
- [@official@Compute Engine overview](https://cloud.google.com/compute/docs/overview)
- [@video@WCompute Engine in a minute](https://www.youtube.com/watch?v=IuK4gQeHRcI)
@@ -1,7 +1,6 @@

# CosmosDB

Azure Cosmos DB is a native No-SQL database service and vector database for working with the document data model. It can arbitrarily store native JSON documents with flexible schema. Data is indexed automatically and is available for query using a flavor of the SQL query language designed for JSON data. It also supports vector search. You can access the API using SDKs for popular frameworks such as.NET, Python, Java, and Node.js.
Azure Cosmos DB is a native No-SQL database service and vector database for working with the document data model. It can arbitrarily store native JSON documents with flexible schema. Data is indexed automatically and is available for query using a flavor of the SQL query language designed for JSON data. It also supports vector search. You can access the API using SDKs for popular frameworks such [as.NET](http://as.NET), Python, Java, and Node.js.

Visit the following resources to learn more:
@@ -2,16 +2,15 @@

Data Analytics involves extracting meaningful insights from raw data to drive decision-making processes. It includes a wide range of techniques and disciplines ranging from the simple data compilation to advanced algorithms and statistical analysis. Data analysts, as ambassadors of this domain, employ these techniques to answer various questions:

- Descriptive Analytics *(what happened in the past?)*
- Diagnostic Analytics *(why did it happened in the past?)*
- Predictive Analytics *(what will happen in the future?)*
- Prescriptive Analytics *(how can we make it happen?)*

* Descriptive Analytics _(what happened in the past?)_
* Diagnostic Analytics _(why did it happened in the past?)_
* Predictive Analytics _(what will happen in the future?)_
* Prescriptive Analytics _(how can we make it happen?)_

Visit the following resources to learn more:

- [@course@Introduction to Data Analytics](https://www.coursera.org/learn/introduction-to-data-analytics)
- [@article@The 4 Types of Data Analysis: Ultimate Guide](https://careerfoundry.com/en/blog/data-analytics/different-types-of-data-analysis/)
- [@article@What is Data Analysis? An Expert Guide With Examples](https://www.datacamp.com/blog/what-is-data-analysis-expert-guide)
- [@course@Introduction to Data Analytics](https://www.coursera.org/learn/introduction-to-data-analytics)
- [@video@Descriptive vs Diagnostic vs Predictive vs Prescriptive Analytics: What's the Difference?](https://www.youtube.com/watch?v=QoEpC7jUb9k)
- [@video@Types of Data Analytics](https://www.youtube.com/watch?v=lsZnSgxMwBA)
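For a feel of the descriptive and diagnostic questions listed above, here is a small pandas sketch over a made-up sales table; the column names and figures are invented for illustration.

```python
# Descriptive-analytics sketch with pandas (pip install pandas):
# summarizing what happened in the past from a small, made-up sales table.
import pandas as pd

sales = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "west"],
        "month": ["2024-01", "2024-02", "2024-01", "2024-02", "2024-01"],
        "revenue": [1200.0, 1350.0, 900.0, 950.0, 400.0],
    }
)

# Descriptive: what happened? Totals and simple summary statistics per region.
print(sales.groupby("region")["revenue"].agg(["sum", "mean"]))

# Month-over-month change hints at *why* it happened (a first diagnostic step).
monthly = sales.groupby("month")["revenue"].sum().sort_index()
print(monthly.pct_change())
```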
@@ -2,13 +2,12 @@

Before designing the technology architecture to collect and store data, you should consider the following factors:

- **Bounded versus unbounded**. Bounded data has defined start and end points, forming a finite, complete dataset, like the daily sales report. Unbounded data has no predefined limits in time or scope, flowing continuously and potentially indefinitely, such as user interaction events or real-time sensor data. The distinction is critical in data processing, where bounded data is suitable for batch processing, and unbounded data is processed in stream processing or real-time systems.
- **Frequency.** Collection processes can be batch, micro-batch, or real-time, depending on the frequency you need to store the data.
- **Synchronous versus asynchronous.** Synchronous ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, asynchronous ingestion is a process where data is ingested without waiting for a response from the data source. Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.
- **Throughput and scalability.** As data demands grow, you will need scalable ingestion solutions to keep pace. Scalable data ingestion pipelines ensure that systems can handle increasing data volumes without compromising performance. Without scalable ingestion, data pipelines face challenges like bottlenecks and data loss. Bottlenecks occur when components can't process data fast enough, leading to delays and reduced throughput. Data loss happens when systems are overwhelmed, causing valuable information to be discarded or corrupted.
- **Reliability and durability.** Data reliability in the ingestion phase means ensuring that the acquired data from various sources is accurate, consistent, and trustworthy as it enters the data pipeline. Durability entails making sure that data isn’t lost or corrupted during the data collection process.

* **Bounded versus unbounded**. Bounded data has defined start and end points, forming a finite, complete dataset, like the daily sales report. Unbounded data has no predefined limits in time or scope, flowing continuously and potentially indefinitely, such as user interaction events or real-time sensor data. The distinction is critical in data processing, where bounded data is suitable for batch processing, and unbounded data is processed in stream processing or real-time systems.
* **Frequency.** Collection processes can be batch, micro-batch, or real-time, depending on the frequency you need to store the data.
* **Synchronous versus asynchronous.** Synchronous ingestion is a process where the system waits for a response from the data source before proceeding. In contrast, asynchronous ingestion is a process where data is ingested without waiting for a response from the data source. Each approach has its benefits and drawbacks, and the choice depends on the specific requirements of the data ingestion process and the business needs.
* **Throughput and scalability.** As data demands grow, you will need scalable ingestion solutions to keep pace. Scalable data ingestion pipelines ensure that systems can handle increasing data volumes without compromising performance. Without scalable ingestion, data pipelines face challenges like bottlenecks and data loss. Bottlenecks occur when components can't process data fast enough, leading to delays and reduced throughput. Data loss happens when systems are overwhelmed, causing valuable information to be discarded or corrupted.
* **Reliability and durability.** Data reliability in the ingestion phase means ensuring that the acquired data from various sources is accurate, consistent, and trustworthy as it enters the data pipeline. Durability entails making sure that data isn’t lost or corrupted during the data collection process.

Visit the following resources to learn more:

- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
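The bounded-versus-unbounded distinction above can be shown with a tiny plain-Python sketch; both data sources are simulated and the names are made up.

```python
# Plain-Python sketch of bounded versus unbounded ingestion.
# A bounded source is read to completion; an unbounded source is consumed
# incrementally as events arrive (simulated here with a generator).
import itertools
import random
import time

def bounded_source():
    """Finite dataset, e.g. yesterday's exported orders."""
    return [{"order_id": i, "amount": random.uniform(5, 50)} for i in range(1_000)]

def unbounded_source():
    """Endless event stream, e.g. clicks arriving in real time."""
    while True:
        yield {"event": "click", "ts": time.time()}
        time.sleep(0.01)

# Batch ingestion: process everything, then finish.
daily_total = sum(o["amount"] for o in bounded_source())
print(f"batch total: {daily_total:.2f}")

# Stream ingestion: process events as they arrive; stop after 5 for the demo.
for event in itertools.islice(unbounded_source(), 5):
    print("ingested", event)
```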
@@ -11,6 +11,6 @@ It involves 4 steps:

Visit the following resources to learn more:

- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
- [@article@Data Engineering Lifecycle](hhttps://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
@@ -11,6 +11,6 @@ It involves 4 steps:

Visit the following resources to learn more:

- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
- [@article@Data Engineering Lifecycle](hhttps://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
@@ -2,9 +2,9 @@

Data Factory, most commonly referring to Microsoft's Azure Data Factory, is a cloud-based data integration service that allows you to create, schedule, and orchestrate workflows to move and transform data from various sources into a centralized location for analysis. It provides tools for building Extract, Transform, and Load (ETL) pipelines, enabling businesses to prepare data for analytics, business intelligence, and other data-driven initiatives without extensive coding, thanks to its visual, code-free interface and native connectors.

Learn more from the following resources:
Visit the following resources to learn more:

- [@course@Microsoft Azure - Data Factory](https://www.coursera.org/learn/microsoft-azure---data-factory)
- [@official@What is Azure Data Factory?](https://learn.microsoft.com/en-us/azure/data-factory/introduction)
- [@official@Azure Data Factory Documentation](https://learn.microsoft.com/en-gb/azure/data-factory/)
- [@course@Microsoft Azure - Data Factory](https://www.coursera.org/learn/microsoft-azure---data-factory)
- [@official@Azure Data Factory Documentation](https://learn.microsoft.com/en-gb/azure/data-factory/)
@@ -2,7 +2,7 @@

**Data Lakes** are large-scale data repository systems that store raw, untransformed data, in various formats, from multiple sources. They're often used for big data and real-time analytics requirements. Data lakes preserve the original data format and schema which can be modified as necessary.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@Data Lake Definition](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-a-data-lake)
- [@video@What is a Data Lake?](https://www.youtube.com/watch?v=LxcH6z8TFpI)
@@ -2,7 +2,7 @@

**Data Lineage** refers to the life-cycle of data, including its origins, movements, characteristics and quality. It's a critical component in Data Engineering for tracking the journey of data through every process in a pipeline, from raw input to model output. Data lineage helps in maintaining transparency, ensuring compliance, and facilitating data debugging or tracing data related bugs. It provides a clear representation of data sources, transformations, and dependencies thereby aiding in audits, governance, or reproduction of machine learning models.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@What is Data Lineage? - IBM](https://www.ibm.com/topics/data-lineage)
- [@article@What is Data Lineage? - Datacamp](https://www.datacamp.com/blog/data-lineage)
@@ -2,11 +2,8 @@

A data mart is a subset of a data warehouse, focused on a specific business function or department. A data mart is streamlined for quicker querying and a more straightforward setup, catering to the specialized needs of a particular team or function. Data marts only hold data relevant to a specific department or business unit, enabling quicker access to specific datasets and simpler management.

Visit the following resources to learn more:

- [@article@What is a Data Mart?](https://www.ibm.com/think/topics/data-mart)
- [@article@WData Mart vs Data Warehouse: a Detailed Comparison](https://www.datacamp.com/blog/data-mart-vs-data-warehouse)
- [@video@Data Lake VS Data Warehouse VS Data Marts](https://www.youtube.com/watch?v=w9-WoReNKHk)
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@article@Data masking](https://en.wikipedia.org/wiki/Data_masking)
- [@article@What is data masking?](https://aws.amazon.com/what-is/data-masking/)
@@ -2,12 +2,12 @@

A data model is a specification of data structures and business rules. It creates a visual representation of data and illustrates how different data elements are related to each other. Different techniques are employed depending on the complexity of the data and the goals. Below you can find a list with the most common data modelling techniques:

- **Entity-relationship modeling.** It's one of the most common techniques used to represent data. It's based on three elements: Entities (objects or things within the system), relationships (how these entities interact with each other), and attributes (properties of the entities).
- **Dimensional modeling.** Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions. This technique simplifies complex data by organizing it into a star or snowflake schema.
- **Object-oriented modeling.** Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects. This technique is preferred for modeling applications with complex, interrelated data and behaviors
- **NoSQL modeling.** NoSQL modeling techniques are designed for flexible, schema-less databases. These approaches are often used when data structures are less rigid or evolve over time

* **Entity-relationship modeling.** It's one of the most common techniques used to represent data. It's based on three elements: Entities (objects or things within the system), relationships (how these entities interact with each other), and attributes (properties of the entities).
* **Dimensional modeling.** Dimensional modeling is widely used in data warehousing and analytics, where data is often represented in terms of facts and dimensions. This technique simplifies complex data by organizing it into a star or snowflake schema.
* **Object-oriented modeling.** Object-oriented modeling is used to represent complex systems, where data and the functions that operate on it are encapsulated as objects. This technique is preferred for modeling applications with complex, interrelated data and behaviors
* **NoSQL modeling.** NoSQL modeling techniques are designed for flexible, schema-less databases. These approaches are often used when data structures are less rigid or evolve over time

Visit the following resources to learn more:

- [@article@7 data modeling techniques and concepts for business](https://www.techtarget.com/searchdatamanagement/tip/7-data-modeling-techniques-and-concepts-for-business)
- [@articleData Modeling Explained: Techniques, Examples, and Best Practices](https://www.datacamp.com/blog/data-modeling)
- [@article@@articleData Modeling Explained: Techniques, Examples, and Best Practices](https://www.datacamp.com/blog/data-modeling)
@@ -1,4 +1,3 @@

# Data Obfuscation

Statistical data obfuscation involves altering the values of sensitive data in a way that preserves the statistical properties and relationships within the data. It ensures that the masked data maintains the overall distribution, patterns, and correlations of the original data for accurate statistical analysis. Statistical data obfuscation techniques include applying mathematical functions or perturbation algorithms to the data.
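To make the perturbation idea tangible, here is a small sketch under simple assumptions (made-up records, Gaussian noise as the perturbation function, SHA-256 for identifier masking); real obfuscation schemes are considerably more careful.

```python
# Sketch of simple statistical obfuscation: hash direct identifiers and
# perturb a numeric field with zero-mean noise so aggregate statistics
# (mean, rough spread) stay close to the original values.
import hashlib
import random
import statistics

records = [
    {"email": f"user{i}@example.com", "salary": random.gauss(60_000, 8_000)}
    for i in range(1_000)
]

def obfuscate(record, noise_std=1_000):
    return {
        # One-way hash replaces the identifier (not reversible, but stable for joins).
        "email_hash": hashlib.sha256(record["email"].encode()).hexdigest()[:16],
        # Zero-mean Gaussian noise preserves the mean in expectation.
        "salary": record["salary"] + random.gauss(0, noise_std),
    }

masked = [obfuscate(r) for r in records]
print(round(statistics.mean(r["salary"] for r in records)),
      round(statistics.mean(m["salary"] for m in masked)))
```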
@@ -2,7 +2,7 @@

Data pipelines are a series of automated processes that transport and transform data from various sources to a destination for analysis or storage. They typically involve steps like data extraction, cleaning, transformation, and loading (ETL) into databases, data lakes, or warehouses. Pipelines can handle batch or real-time data, ensuring that large-scale datasets are processed efficiently and consistently. They play a crucial role in ensuring data integrity and enabling businesses to derive insights from raw data for reporting, analytics, or machine learning.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@What is a Data Pipeline? - IBM](https://www.ibm.com/topics/data-pipeline)
- [@video@What are Data Pipelines?](https://www.youtube.com/watch?v=oKixNpz6jNo)
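As a minimal sketch of the extract-clean-transform-load flow described above, here is a standard-library pipeline; the `orders.csv` file and its columns are hypothetical.

```python
# Minimal extract-transform-load sketch with the standard library:
# extract rows from a CSV file, clean and type-cast them, load into SQLite.
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    for row in rows:
        if not row.get("email"):
            continue  # drop incomplete records
        yield {"email": row["email"].strip().lower(), "amount": float(row["amount"])}

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS orders (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (:email, :amount)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract("orders.csv")), conn)  # orders.csv is a placeholder input
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone())
```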
@@ -1,4 +1,3 @@

# Data Serving

Data serving is the last step in the data engineering process. Once the data is stored in your data architectures and transformed into a coherent and useful format, it's time to get value from it. Data serving refers to the different ways data is used by downstream applications and users to create value. There are many ways companies can extract value from data, including training machine learning models, BI Analytics, and reverse ETL.
@@ -6,8 +6,7 @@

Visit the following resources to learn more:

- [@video@Data Structures Illustrated](https://www.youtube.com/watch?v=9rhT3P1MDHk\&list=PLkZYeFmDuaN2-KUIv-mvbjfKszIGJ4FaY)
- [@article@Interview Questions about Data Structures](https://www.csharpstar.com/csharp-algorithms/)
- [@video@Data Structures Illustrated](https://www.youtube.com/watch?v=9rhT3P1MDHk&list=PLkZYeFmDuaN2-KUIv-mvbjfKszIGJ4FaY)
- [@video@Intro to Algorithms](https://www.youtube.com/watch?v=rL8X2mlNHPM)
- [@feed@Explore top posts about Algorithms](https://app.daily.dev/tags/algorithms?ref=roadmapsh)
@@ -2,7 +2,7 @@

**Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet a wide range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@What Is a Data Warehouse?](https://www.oracle.com/database/what-is-a-data-warehouse/)
- [@video@@hat is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg)
@@ -11,7 +11,7 @@ Visit the following resources to learn more:

- [@article@Oracle: What is a Database?](https://www.oracle.com/database/what-is-database/)
- [@article@Prisma.io: What are Databases?](https://www.prisma.io/dataguide/intro/what-are-databases)
- [@article@Intro To Relational Databases](https://www.udacity.com/course/intro-to-relational-databases--ud197)
- [@video@What is Relational Database](https://youtu.be/OqjJjpjDRLc)
- [@article@NoSQL Explained](https://www.mongodb.com/nosql-explained)
- [@video@What is Relational Database](https://youtu.be/OqjJjpjDRLc)
- [@video@How do NoSQL Databases work](https://www.youtube.com/watch?v=0buKQHokLK8)
- [@feed@Explore top posts about Database](https://app.daily.dev/tags/database?ref=roadmapsh)
@@ -4,8 +4,7 @@ Delta Lake is the optimized storage layer that provides the foundation for table

Visit the following resources to learn more:

- [@book@The Delta Lake Series — Fundamentals and Performance](https://www.databricks.com/resources/ebook/the-delta-lake-series-fundamentals-performance)
- [@official@What is Delta Lake in Databricks?](https://docs.databricks.com/aws/en/delta)
- [@article@Delta Table in Databricks: A Complete Guide](https://www.datacamp.com/tutorial/delta-table-in-databricks)
- [@video@Delta Lake](https://www.databricks.com/resources/demos/videos/lakehouse-platform/delta-lake)
- [@book@The Delta Lake Series — Fundamentals and Performance](https://www.databricks.com/resources/ebook/the-delta-lake-series-fundamentals-performance)
@@ -4,6 +4,6 @@ dbt, also known as the data build tool, is designed to simplify the management o

Visit the following resources to learn more:

- [@course@dbt Official Courses](https://learn.getdbt.com/catalog)
- [@official@dbt](https://www.getdbt.com/product/what-is-dbt)
- [@official@dbt Documentation](https://docs.getdbt.com/docs/build/documentation)
- [@course@dbt Official Courses](https://learn.getdbt.com/catalog)
@@ -10,6 +10,3 @@ Visit the following resources to learn more:

- [@article@Infrastructure as Code: From Imperative to Declarative and Back Again](https://thenewstack.io/infrastructure-as-code-from-imperative-to-declarative-and-back-again/)
- [@article@Declarative vs Imperative Programming for Infrastructure as Code (IaC)](https://www.copado.com/resources/blog/declarative-vs-imperative-programming-for-infrastructure-as-code-iac)
@@ -4,6 +4,6 @@ A distributed system is a collection of independent computers that communicate a

Visit the following resources to learn more:

- [@video@Quick overview](https://www.youtube.com/watch?v=IJWwfMyPu1c)
- [@article@Introduction to Distributed Systems](https://www.freecodecamp.org/news/a-thorough-introduction-to-distributed-systems-3b91562c9b3c/)
- [@article@Distributed Systems Guide](https://www.baeldung.com/cs/distributed-systems-guide)
- [@video@Quick overview](https://www.youtube.com/watch?v=IJWwfMyPu1c)
@@ -1,6 +1,6 @@

# Document

**Document Databases are a type of No-SQL databases that store data in JSON, BSON, or XML formats, allowing for flexible, semi-structured and hierarchical data structures. These databases are characterized by their dynamic schema, scalability through distribution, and ability to intuitively map data models to application code. Popular examples include MongoDB, which allows for easy storage and retrieval of varied data types without requiring a rigid, predefined schema.
\*\*Document Databases are a type of No-SQL databases that store data in JSON, BSON, or XML formats, allowing for flexible, semi-structured and hierarchical data structures. These databases are characterized by their dynamic schema, scalability through distribution, and ability to intuitively map data models to application code. Popular examples include MongoDB, which allows for easy storage and retrieval of varied data types without requiring a rigid, predefined schema.

Visit the following resources to learn more:
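Since the text names MongoDB as the canonical example, here is a minimal PyMongo sketch of the dynamic-schema behavior it describes; the server address and the `shop`/`products` names are assumptions.

```python
# Minimal document-database sketch with PyMongo (pip install pymongo).
# Assumes a MongoDB server on localhost:27017.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]

# Documents in one collection can have different shapes (dynamic schema).
products.insert_one({"sku": "A-1", "name": "kettle", "price": 29.9, "tags": ["kitchen"]})
products.insert_one({"sku": "B-2", "name": "lamp", "price": 19.5, "specs": {"watts": 40}})

# Query by nested or optional fields without a predefined schema.
for doc in products.find({"price": {"$lt": 25}}):
    print(doc["sku"], doc["name"])
```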
@@ -2,7 +2,6 @@

The California Consumer Privacy Act (CCPA) is a California state law enacted in 2020 that protects and enforces the rights of Californians regarding the privacy of consumers’ personal information (PI).

Visit the following resources to learn more:

- [@official@California Consumer Privacy Act (CCPA)](https://oag.ca.gov/privacy/ccpa)
@@ -2,7 +2,7 @@

The General Data Protection Regulation (GDPR) is an essential standard in API Design that addresses the storage, transfer, and processing of personal data of individuals within the European Union. With regards to API Design, considerations must be given on how APIs handle, process, and secure the data to conform with GDPR's demands on data privacy and security. This includes requirements for explicit consent, right to erasure, data portability, and privacy by design. Non-compliance with these standards not only leads to hefty fines but may also erode trust from users and clients. As such, understanding the impact and integration of GDPR within API design is pivotal for organizations handling EU residents' data.

Learn more from the following resources:
Visit the following resources to learn more:

- [@official@GDPR](https://gdpr-info.eu/)
- [@article@What is GDPR Compliance in Web Application and API Security?](https://probely.com/blog/what-is-gdpr-compliance-in-web-application-and-api-security/)
@@ -2,6 +2,6 @@

GitHub Actions is a CI/CD tool integrated directly into GitHub, allowing developers to automate workflows, such as building, testing, and deploying code directly from their repositories. It uses YAML files to define workflows, which can be triggered by various events like pushes, pull requests, or on a schedule. GitHub Actions supports a wide range of actions and integrations, making it highly customizable for different project needs. It provides a marketplace with reusable workflows and actions contributed by the community. With its seamless integration with GitHub, developers can take advantage of features like matrix builds, secrets management, and environment-specific configurations to streamline and enhance their development and deployment processes.

Learn more from the following resources:
Visit the following resources to learn more:

- [@official@GitHub Actions Documentation](https://docs.github.com/en/actions)
@@ -0,0 +1,7 @@

# Amazon RDS (Database)

Amazon RDS (Relational Database Service) is a web service from Amazon Web Services. It's designed to simplify the setup, operation, and scaling of relational databases in the cloud. This service provides cost-efficient, resizable capacity for an industry-standard relational database and manages common database administration tasks. RDS supports six database engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. These engines give you the ability to run instances ranging from 5GB to 6TB of memory, accommodating your specific use case. It also ensures the database is up-to-date with the latest patches, automatically backs up your data and offers encryption at rest and in transit.

Visit the following resources to learn more:

- [@official@Amazon RDS](https://aws.amazon.com/rds/)
@@ -7,5 +7,3 @@ Visit the following resources to learn more:

- [@official@BigQuery overview](https://cloud.google.com/bigquery/docs/introduction)
- [@official@From data warehouse to autonomous data and AI platform](https://cloud.google.com/bigquery)
- [@video@What is BigQuery?](https://www.youtube.com/watch?v=d3MDxC_iuaw)
@@ -1,5 +1,7 @@

# undefined

## GKE - Google Kubernetes Engine
GKE - Google Kubernetes Engine
------------------------------

Google Kubernetes Engine (GKE) is a managed Kubernetes service provided by Google Cloud Platform. It allows organizations to deploy, manage, and scale containerized applications using Kubernetes orchestration. GKE automates cluster management tasks, including upgrades, scaling, and security patches, while providing integration with Google Cloud services. It offers features like auto-scaling, load balancing, and private clusters, enabling developers to focus on application development rather than infrastructure management.
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@article@Cloud Storage](https://cloud.google.com/storage)
- [@article@Google Cloud Storage](https://en.wikipedia.org/wiki/Google_Cloud_Storage)
- [@article@Cloud Storage in a minute](https://www.youtube.com/watch?v=wNOs3LlsH6k)
@@ -8,6 +8,3 @@ Visit the following resources to learn more:

- [@official@Infrastructure Manager Overview](https://cloud.google.com/infrastructure-manager/docs/overview)
- [@official@Google Cloud Deployment Manager documentation](https://cloud.google.com/deployment-manager/docs)
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@official@Apacha HBase?](https://hbase.apache.org/)
- [@article@What is HBase?](https://www.ibm.com/think/topics/hbase)
- [@article@Apache HBase](https://en.wikipedia.org/wiki/Apache_HBase)
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@official@HDFS Architecture Guide](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html)
- [@article@Hadoop Distributed File System (HDFS)](https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs)
- [@article@What is Hadoop Distributed File System (HDFS)?](https://www.ibm.com/think/topics/hdfs)
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@official@HDFS Architecture Guide](https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html)
- [@article@Hadoop Distributed File System (HDFS)](https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs)
- [@article@What is Hadoop Distributed File System (HDFS)?](https://www.ibm.com/think/topics/hdfs)
@@ -6,6 +6,3 @@ Visit the following resources to learn more:

- [@official@Hightouch Docs](https://hightouch.com/docs)
- [@video@What is Hightouch? - The Data Activation Platform](https://www.youtube.com/watch?v=vMm87-MC7og)
@@ -8,7 +8,3 @@ Visit the following resources to learn more:

- [@article@Horizontal Vs. Vertical Scaling: Which Should You Choose?](https://www.cloudzero.com/blog/horizontal-vs-vertical-scaling/)
- [@video@Vertical Vs Horizontal Scaling: Key Differences You Should Know](https://www.youtube.com/watch?v=dvRFHG2-uYs)
@@ -2,10 +2,8 @@

Hybrid data ingestion combines aspects of both real-time and batch ingestion. This approach gives you the flexibility to adapt your data ingestion strategy as your needs evolve. For example, you could process data in real-time for critical applications and in batches for less time-sensitive tasks. Two common hybrid methods are Lambda architecture-based and micro-batching.

Visit the following resources to learn more:

- [@article@What is Data Ingestion: Types, Tools, and Real-Life Use Cases](https://estuary.dev/blog/data-ingestion/)
- [@article@Lambda Architecture](https://www.databricks.com/glossary/lambda-architecture)
- [@article@What is Micro Batching: A Comprehensive Guide 101](https://hevodata.com/learn/micro-batching/)
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@article@Why idempotence was important to DevOps](https://dev.to/startpher/why-idempotence-was-important-to-devops-2jn3)
- [@article@Idempotency: The Secret to Seamless DevOps and Infrastructure](https://medium.com/@tiwari.sushil/idempotency-the-secret-to-seamless-devops-and-infrastructure-bf22e63e1be5)
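A short, self-contained sketch of the idempotency idea behind these articles: repeating the same request leaves the state unchanged. The de-duplication-by-request-ID pattern shown here is one common approach, not the only one.

```python
# Sketch of an idempotent operation: applying it once or many times leaves
# the system in the same state. Processing is keyed by a request ID so
# retries do not double-apply the effect.
processed_ids: set[str] = set()
balances: dict[str, float] = {"alice": 100.0}

def credit(account: str, amount: float, request_id: str) -> None:
    """Safe to retry: a repeated request_id is ignored."""
    if request_id in processed_ids:
        return  # already applied; do nothing
    balances[account] = balances.get(account, 0.0) + amount
    processed_ids.add(request_id)

credit("alice", 25.0, request_id="req-001")
credit("alice", 25.0, request_id="req-001")  # retry after a timeout, no double credit
print(balances["alice"])  # 125.0
```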
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@article@What is Infrastructure as Code?](https://aws.amazon.com/what-is/iac/)
- [@article@Infrastructure as Code](https://en.wikipedia.org/wiki/Infrastructure_as_code)
- [@video@What is Infrastructure as Code?](https://www.youtube.com/watch?v=zWw2wuiKd5o)
@@ -7,4 +7,3 @@ Visit the following resources to learn more:

- [@article@What is the Internet of Things (IoT)?](https://www.ibm.com/think/topics/internet-of-things)
- [@article@Internet of Things](https://en.wikipedia.org/wiki/Internet_of_things)
- [@video@What is IoT (Internet of Things)? An Introduction](https://www.youtube.com/watch?v=4FxU-xpuCww)
@@ -4,10 +4,10 @@ Java has had a big influence on data engineering because many core big data tool

Visit the following resources to learn more:

- [@courseIntroduction to Java by Hyperskill (JetBrains Academy)](https://hyperskill.org/courses/8)
- [@book@Thinking in Java](https://www.amazon.co.uk/Thinking-Java-Eckel-Bruce-February/dp/B00IBON6C6)
- [@article@Effective Java](https://www.amazon.com/Effective-Java-Joshua-Bloch/dp/0134685997)
- [@book@Java: The Complete Reference](https://www.amazon.co.uk/gp/product/B09JL8BMK7/ref=dbs_a_def_rwt_bibl_vppi_i2)
- [@article@@courseIntroduction to Java by Hyperskill (JetBrains Academy)](https://hyperskill.org/courses/8)
- [@article@Effective Java](https://www.amazon.com/Effective-Java-Joshua-Bloch/dp/0134685997)
- [@video@Java Tutorial for Beginners](https://www.youtube.com/watch?v=eIrMbAQSU34&feature=youtu.be)
- [@video@Java + DSA + Interview Preparation Course (For beginners)](https://www.youtube.com/playlist?list=PL9gnSGHSqcnr_DxHsP7AW9ftq0AtAyYqJ)
- [@feed@Explore top posts about Java](https://app.daily.dev/tags/java?ref=roadmapsh)
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@article@Job scheduler](https://en.wikipedia.org/wiki/Job_scheduler)
- [@article@Cluster Resources — Job Scheduling](https://supun-kamburugamuve.medium.com/cluster-resources-job-scheduling-bb63644476bc)
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@article@What is a Key Value Database? - AWS](https://aws.amazon.com/nosql/key-value/)
- [@article@What Is A Key-Value Database? - MongoDB](https://www.mongodb.com/resources/basics/databases/key-value-database)
@@ -8,7 +8,7 @@ Visit the following resources to learn more:

- [@official@Kubernetes Website](https://kubernetes.io/)
- [@official@Kubernetes Documentation](https://kubernetes.io/docs/home/)
- [@video@Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
- [@article@Primer: How Kubernetes Came to Be, What It Is, and Why You Should Care](https://thenewstack.io/primer-how-kubernetes-came-to-be-what-it-is-and-why-you-should-care/)
- [@article@Kubernetes: An Overview](https://thenewstack.io/kubernetes-an-overview/)
- [@video@Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
- [@feed@Explore top posts about Kubernetes](https://app.daily.dev/tags/kubernetes?ref=roadmapsh)
@@ -8,7 +8,7 @@ Visit the following resources to learn more:

- [@official@Kubernetes Website](https://kubernetes.io/)
- [@official@Kubernetes Documentation](https://kubernetes.io/docs/home/)
- [@video@Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
- [@article@Primer: How Kubernetes Came to Be, What It Is, and Why You Should Care](https://thenewstack.io/primer-how-kubernetes-came-to-be-what-it-is-and-why-you-should-care/)
- [@article@Kubernetes: An Overview](https://thenewstack.io/kubernetes-an-overview/)
- [@video@Kubernetes Crash Course for Absolute Beginners](https://www.youtube.com/watch?v=s_o8dwzRlu4)
- [@feed@Explore top posts about Kubernetes](https://app.daily.dev/tags/kubernetes?ref=roadmapsh)
@@ -5,8 +5,8 @@ Knowledge of UNIX is a must for almost all kind of development as most of the co

Visit the following resources to learn more:

- [@roadmap@Visit Dedicated Linux Roadmap](https://roadmap.sh/linux)
- [@video@Linux Operating System - Crash Course](https://www.youtube.com/watch?v=ROjZy1WbCIA)
- [@course@Coursera - Unix Courses](https://www.coursera.org/courses?query=unix)
- [@article@Linux Basics](https://dev.to/rudrakshi99/linux-basics-2onj)
- [@article@Unix / Linux Tutorial](https://www.tutorialspoint.com/unix/index.htm)
- [@video@Linux Operating System - Crash Course](https://www.youtube.com/watch?v=ROjZy1WbCIA)
- [@feed@Explore top posts about Linux](https://app.daily.dev/tags/linux?ref=roadmapsh)
@@ -1,4 +1,3 @@

# Logs

Logs are files that record events, activities, and system operations over time. They provide a detailed historical record of what has happened within a system, including timestamps, event details, performance data, errors, and user actions. Logs are crucial for troubleshooting problems, monitoring system health and performance, investigating security incidents, and understanding how users interact with a system.
@@ -2,7 +2,6 @@

Looker is a Google cloud-based business intelligence and data analytics platform. It allows users to explore, analyze, and visualize data to gain insights and make data-driven decisions. Looker is known for its ability to connect to various data sources, create custom dashboards, and generate reports. It also facilitates the integration of analytics, visualizations, and relevant information into business processes.

Visit the following resources to learn more:

- [@official@Looker business intelligence platform embedded analytics](https://cloud.google.com/looker)
@@ -2,7 +2,7 @@

Machine learning, a subset of artificial intelligence, is an indispensable tool in the hands of a data analyst. It provides the ability to automatically learn, improve from experience and make decisions without being explicitly programmed. In the context of a data analyst, machine learning contributes significantly in uncovering hidden insights, recognising patterns or making predictions based on large amounts of data. Through the use of varying algorithms and models, data analysts are able to leverage machine learning to convert raw data into meaningful information, making it a critical concept in data analysis.

Learn more from the following resources:
Visit the following resources to learn more:

- [@video@What is Machine Learning?](https://www.youtube.com/watch?v=9gGnTQTYNaE)
- [@article@What is Machine Learning (ML)?](https://www.ibm.com/topics/machine-learning)
- [@video@What is Machine Learning?](https://www.youtube.com/watch?v=9gGnTQTYNaE)
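To ground the "learn from data instead of explicit rules" point, here is a minimal scikit-learn sketch; the dataset is tiny and synthetic, invented purely for illustration.

```python
# Minimal supervised-learning sketch with scikit-learn (pip install scikit-learn):
# the model learns a pattern from labeled examples instead of explicit rules.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Tiny synthetic dataset: hours studied -> exam score (made up).
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [52, 55, 61, 64, 70, 73, 78, 84]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, y_train)

print("predicted score for 9 hours:", model.predict([[9]])[0])
print("R^2 on held-out data:", model.score(X_test, y_test))
```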
@@ -2,7 +2,7 @@

MapReduce is a prominent data processing technique used by Data Analysts around the world. It allows them to handle large data sets with complex, unstructured data efficiently. MapReduce breaks down a big data problem into smaller sub-tasks (Map) and then takes those results to create an output in a more usable format (Reduce). This technique is particularly useful in conducting exploratory analysis, as well as in handling big data operations such as text processing, graph processing, or more complicated machine learning algorithms.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@MapReduce](https://www.databricks.com/glossary/mapreduce)
- [@article@What is Apache MapReduce?](https://www.ibm.com/topics/mapreduce)
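The classic word-count example, written in plain Python, mirrors the Map and Reduce steps described above; it is a conceptual sketch, not Hadoop or Spark code.

```python
# Plain-Python word count that mirrors the MapReduce shape:
# map emits (word, 1) pairs, shuffle groups by key, reduce sums each group.
from collections import defaultdict
from functools import reduce

documents = ["big data needs big tools", "map then reduce the data"]

# Map: each document -> list of (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key (done by the framework in Hadoop/Spark).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: combine the values for each key.
word_counts = {word: reduce(lambda a, b: a + b, counts) for word, counts in groups.items()}
print(word_counts)  # {'big': 2, 'data': 2, ...}
```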
@@ -2,6 +2,4 @@

Messages and Streams are often used interchangeably, but a subtle yet essential difference exists between the two. A message is raw data communicated across two or more systems. Messages are discrete and singular signals in an event-driven system.

By contrast, a stream is an append-only log of event records. As events occur, streams are accumulated in an ordered
sequence, using a timestamp or an ID to record events order. Streams are used when you need to analyze what happened over many events. Because of the append-only nature of streams, records in a stream are persisted over a long
retention window—often weeks or months—allowing for complex operations on records such as aggregations on multiple records or the ability to rewind to a point in time within the stream.
By contrast, a stream is an append-only log of event records. As events occur, streams are accumulated in an ordered sequence, using a timestamp or an ID to record events order. Streams are used when you need to analyze what happened over many events. Because of the append-only nature of streams, records in a stream are persisted over a long retention window—often weeks or months—allowing for complex operations on records such as aggregations on multiple records or the ability to rewind to a point in time within the stream.
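A toy append-only log makes the stream-versus-message distinction concrete; this is a conceptual sketch only, since real systems persist the log durably and distribute it across brokers.

```python
# Sketch of the message-vs-stream distinction: a message is handled once and
# gone; a stream is an append-only, replayable log with ordered offsets.
import time

class Stream:
    """Toy append-only log; real systems (e.g. Kafka) persist this durably."""

    def __init__(self):
        self._log = []

    def append(self, record):
        self._log.append({"offset": len(self._log), "ts": time.time(), **record})

    def read_from(self, offset=0):
        return self._log[offset:]  # rewind to any point in the log

events = Stream()
for amount in (10, 25, 7):
    events.append({"type": "purchase", "amount": amount})

# Aggregate over many events, something a single discrete message cannot offer.
print(sum(e["amount"] for e in events.read_from(0)))   # 42
print([e["offset"] for e in events.read_from(1)])      # replay from offset 1
```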
@@ -2,7 +2,7 @@

PowerBI, an interactive data visualization and business analytics tool developed by Microsoft, plays a crucial role in the field of a data analyst's work. It helps data analysts to convert raw data into meaningful insights through its easy-to-use dashboards and reports function. This tool provides a unified view of business data, allowing analysts to track and visualize key performance metrics and make better-informed business decisions. With PowerBI, data analysts also have the ability to manipulate and produce visualizations of large data sets that can be shared across an organization, making complex statistical information more digestible.

Learn more from the following resources:
Visit the following resources to learn more:

- [@official@Power BI](https://www.microsoft.com/en-us/power-platform/products/power-bi)
- [@video@Power BI for beginners](https://www.youtube.com/watch?v=NNSHu0rkew8)
@@ -1,4 +1,3 @@

# Mobile apps

Mobile apps are programs for phones and tablets, usually from app stores. They can be native (for one OS like iOS or Android), hybrid (web tech in a native shell), or cross-platform (like React Native). Apps use phone features like GPS and cameras. They do many things from games to shopping. Good mobile apps focus on easy use, speed, offline working, and security.
@@ -6,7 +6,7 @@ If you have networking experience or want to be a reliability engineer or operat

Visit the following resources to learn more:

- [@video@Computer Networking Course - Network Engineering](https://www.youtube.com/watch?v=qiQR5rTSshw)
- [@article@Khan Academy - Networking](https://www.khanacademy.org/computing/code-org/computers-and-the-internet)
- [@video@Computer Networking Course - Network Engineering](https://www.youtube.com/watch?v=qiQR5rTSshw)
- [@video@Networking Video Series (21 videos)](https://www.youtube.com/playlist?list=PLEbnTDJUr_IegfoqO4iPnPYQui46QqT0j)
- [@feed@Explore top posts about Networking](https://app.daily.dev/tags/networking?ref=roadmapsh)
@@ -9,4 +9,3 @@ Visit the following resources to learn more:

- [@article@What is OLTP?](https://www.oracle.com/uk/database/what-is-oltp/)
- [@article@What is OLAP? - Online Analytical Processing Explained](https://aws.amazon.com/what-is/olap/)
- [@video@OLTP vs OLAP](https://www.youtube.com/watch?v=iw-5kFzIdgY)
@@ -5,8 +5,8 @@ Redis is an open-source, in-memory data structure store known for its speed and

Visit the following resources to learn more:

- [@roadmap@Visit Dedicated Redis Roadmap](https://roadmap.sh/redis)
- [@course@Redis Crash Course](https://www.youtube.com/watch?v=XCsS_NVAa1g)
- [@official@Redis](https://redis.io/)
- [@official@Redis Documentation](https://redis.io/docs/latest/)
- [@video@Redis in 100 Seconds](https://www.youtube.com/watch?v=G1rOthIU-uo)
- [@course@Redis Crash Course](https://www.youtube.com/watch?v=XCsS_NVAa1g)
- [@feed@Explore top posts about Redis](https://app.daily.dev/tags/redis?ref=roadmapsh)
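A minimal redis-py sketch of the in-memory data structures mentioned above follows; it assumes a Redis server on localhost:6379 and the key names are made up.

```python
# Minimal key-value sketch with redis-py (pip install redis).
# Assumes a Redis server on localhost:6379.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("session:42", "alice", ex=3600)          # string value with a 1-hour TTL
r.incr("page:home:views")                      # atomic counter
r.lpush("recent:searches", "redis", "kafka")   # list used as a small queue

print(r.get("session:42"))
print(r.lrange("recent:searches", 0, -1))
```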
@@ -2,7 +2,6 @@

Reverse ETL is the process of extracting data from a data warehouse, transforming it to fit the requirements of operational systems, and then loading it into those other systems. This approach contrasts with traditional ETL, where data is extracted from operational systems, transformed, and loaded into a data warehouse.

While ETL and ELT focus on centralizing data, Reverse ETL aims to operationalize this data by making it actionable within third-party systems such as CRMs, marketing platforms, and other operational tools.
Visit the following resources to learn more:

- [@article@What is Reverse ETL? A Helpful Guide](https://www.datacamp.com/blog/reverse-etl)
@@ -2,7 +2,6 @@

Serverless data storage involves using cloud provider services for databases and object storage that automatically scale infrastructure and implement a consumption-based, pay-as-you-go model, eliminating the need for developers to manage, provision, or maintain any physical or virtual servers. This approach simplifies development, reduces operational overhead, and offers cost-effectiveness by charging only for the resources used, allowing teams to focus on applications rather than infrastructure management.

Visit the following resources to learn more:

- [@article@What Is Serverless Computing?](https://www.ibm.com/think/topics/serverless)
@@ -6,4 +6,3 @@ Visit the following resources to learn more:

- [@article@WMastering Slowly Changing Dimensions (SCD)](https://www.datacamp.com/tutorial/mastering-slowly-changing-dimensions-scd)
- [@article@Implementing Slowly Changing Dimensions (SCDs) in Data Warehouses](https://www.sqlshack.com/implementing-slowly-changing-dimensions-scds-in-data-warehouses/)
@@ -2,11 +2,8 @@

Streamlit is a free and open-source framework to rapidly build and share machine learning and data science web apps. It is a Python-based library specifically designed for data and machine learning engineers. Data scientists or machine learning engineers are not web developers and they're not interested in spending weeks learning to use these frameworks to build web apps. Instead, they want a tool that is easier to learn and to use, as long as it can display data and collect needed parameters for modeling.

Visit the following resources to learn more:

- [@official@Streamlit Docs](https://docs.streamlit.io/)
- [@official@Streamlit Python: Tutorial](https://www.datacamp.com/tutorial/streamlit)
- [@video@EStreamlit Explained: Python Tutorial for Data Scientists](https://www.youtube.com/watch?v=c8QXUrvSSyg)
@@ -2,7 +2,7 @@

Tableau is a powerful data visualization tool utilized extensively by data analysts worldwide. Its primary role is to transform raw, unprocessed data into an understandable format without any technical skills or coding. Data analysts use Tableau to create data visualizations, reports, and dashboards that help businesses make more informed, data-driven decisions. They also use it to perform tasks like trend analysis, pattern identification, and forecasts, all within a user-friendly interface. Moreover, Tableau's data visualization capabilities make it easier for stakeholders to understand complex data and act on insights quickly.

Learn more from the following resources:
Visit the following resources to learn more:

- [@official@Tableau](https://www.tableau.com/en-gb)
- [@video@What is Tableau?](https://www.youtube.com/watch?v=NLCzpPRCc7U)
@@ -5,8 +5,8 @@ Terraform is an open-source infrastructure as code (IaC) tool developed by Hashi

Visit the following resources to learn more:

- [@roadmap@Visit Dedicated Terraform Roadmap](https://roadmap.sh/terraform)
- [@course@Complete Terraform Course](https://www.youtube.com/watch?v=7xngnjfIlK4)
- [@official@Terraform Documentation](https://www.terraform.io/docs)
- [@official@Terraform Tutorials](https://learn.hashicorp.com/terraform)
- [@article@How to Scale Your Terraform Infrastructure](https://thenewstack.io/how-to-scale-your-terraform-infrastructure/)
- [@course@Complete Terraform Course](https://www.youtube.com/watch?v=7xngnjfIlK4)
- [@feed@Explore top posts about Terraform](https://app.daily.dev/tags/terraform?ref=roadmapsh)
@@ -2,7 +2,7 @@

Transactions in SQL are units of work that group one or more database operations into a single, atomic unit. They ensure data integrity by following the ACID properties: Atomicity (all or nothing), Consistency (database remains in a valid state), Isolation (transactions don't interfere with each other), and Durability (committed changes are permanent). Transactions are essential for maintaining data consistency in complex operations and handling concurrent access to the database.

Learn more from the following resources:
Visit the following resources to learn more:

- [@articles@Transactions](https://www.tutorialspoint.com/sql/sql-transactions.htm)
- [@article@Transactions](https://www.tutorialspoint.com/sql/sql-transactions.htm)
- [@article@A Guide to ACID Properties in Database Management Systems](https://www.mongodb.com/resources/basics/databases/acid-transactions)
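The all-or-nothing behavior described above can be demonstrated with the standard-library sqlite3 module; the account table and transfer logic are invented for illustration.

```python
# Transaction sketch with sqlite3: both updates of a transfer commit together
# or, on error, roll back together (atomicity).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100.0), ("bob", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
    except ValueError:
        pass  # state is unchanged thanks to the rollback

transfer(conn, "alice", "bob", 500.0)  # fails and rolls back
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 100.0), ('bob', 50.0)]
```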
@@ -1,4 +1,3 @@

# Transform Data

In the second step, ETL tools transform and consolidate the raw data in the staging area to prepare it for the target data warehouse. The data transformation phase is normally the most complex and prone to errors, as it can involve multiple transformations, including basic data cleaning operations, deduplication, data casting, filtering, grouping, encrypting, and many more.
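A small pandas sketch of the cleaning, deduplication, and casting steps named above; the staged table and its columns are made up for illustration.

```python
# Transform-step sketch with pandas (pip install pandas): clean, deduplicate,
# and cast raw staged records before loading them into the warehouse.
import pandas as pd

staged = pd.DataFrame(
    {
        "order_id": ["1", "2", "2", "3"],
        "amount": ["19.99", "5.50", "5.50", None],
        "country": ["us", "DE", "DE", "fr"],
    }
)

transformed = (
    staged.dropna(subset=["amount"])                       # basic cleaning
    .drop_duplicates(subset=["order_id"])                  # deduplication
    .astype({"order_id": int, "amount": float})            # type casting
    .assign(country=lambda df: df["country"].str.upper())  # normalization
)
print(transformed)
```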
@@ -2,7 +2,7 @@

**Data Warehouses** are data storage systems which are designed for analyzing, reporting and integrating with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet a wide range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes.

Learn more from the following resources:
Visit the following resources to learn more:

- [@article@What Is a Data Warehouse?](https://www.oracle.com/database/what-is-a-data-warehouse/)
- [@video@What is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg)