diff --git a/src/data/roadmaps/data-engineer/content/realtime@oqxNr0Lj34mgRi5Z5wJt_.md b/src/data/roadmaps/data-engineer/content/realtime@oqxNr0Lj34mgRi5Z5wJt_.md index 044241b4b..a07cc77e3 100644 --- a/src/data/roadmaps/data-engineer/content/realtime@oqxNr0Lj34mgRi5Z5wJt_.md +++ b/src/data/roadmaps/data-engineer/content/realtime@oqxNr0Lj34mgRi5Z5wJt_.md @@ -1 +1,3 @@ -# Realtime \ No newline at end of file +# Realtime + +Real-time processing, also known as stream processing, involves ingesting and analyzing data the moment it is generated, providing instantaneous insights and enabling timely decisions in time-sensitive applications like financial trading, medical monitoring, and autonomous vehicles. This differs from batch processing, which collects data and processes it later in scheduled batches, and typically involves continuous data streams, low latency, and high availability to deliver immediate outcomes for critical tasks. diff --git a/src/data/roadmaps/data-engineer/content/relational-databases@cslVSSKBMO7I6CpO7vG1H.md b/src/data/roadmaps/data-engineer/content/relational-databases@cslVSSKBMO7I6CpO7vG1H.md index 3f8c4eae9..70bc3b9e9 100644 --- a/src/data/roadmaps/data-engineer/content/relational-databases@cslVSSKBMO7I6CpO7vG1H.md +++ b/src/data/roadmaps/data-engineer/content/relational-databases@cslVSSKBMO7I6CpO7vG1H.md @@ -4,9 +4,9 @@ Relational databases are a type of database management system (DBMS) that organi Visit the following resources to learn more: +- [@course@Databases and SQL](https://www.edx.org/course/databases-5-sql) - [@article@Relational Databases](https://www.ibm.com/cloud/learn/relational-databases) - [@article@51 Years of Relational Databases](https://learnsql.com/blog/codd-article-databases/) -- [@course@Databases and SQL](https://www.edx.org/course/databases-5-sql) - [@article@Intro To Relational Databases](https://www.udacity.com/course/intro-to-relational-databases--ud197) - [@video@What is Relational 
Database](https://youtu.be/OqjJjpjDRLc) - [@feed@Explore top posts about Backend Development](https://app.daily.dev/tags/backend?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/reusability@Rzk6HlMosx3FN_JD5kELZ.md b/src/data/roadmaps/data-engineer/content/reusability@Rzk6HlMosx3FN_JD5kELZ.md index db8ba2d04..633acee80 100644 --- a/src/data/roadmaps/data-engineer/content/reusability@Rzk6HlMosx3FN_JD5kELZ.md +++ b/src/data/roadmaps/data-engineer/content/reusability@Rzk6HlMosx3FN_JD5kELZ.md @@ -1 +1,7 @@ -# Reusability \ No newline at end of file +# Reusability + +One of the goals of Infrastructure as Code (IaC) is to create modular, standardized units of code, such as modules or templates, that can be used across multiple projects, environments, and teams, embodying the "Don't Repeat Yourself" (DRY) principle. This approach significantly boosts efficiency, consistency, and maintainability, as it allows for rapid deployment of identical infrastructure patterns, enforces organizational standards, simplifies complex setups, and improves collaboration by providing shared, tested building blocks for infrastructure management. + +Visit the following resources to learn more: + +- [@article@What is Infrastructure as Code (IaC)?](https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/reverse-etl@JpuiYsipNWBcrjmn2ji6b.md b/src/data/roadmaps/data-engineer/content/reverse-etl@JpuiYsipNWBcrjmn2ji6b.md index ebdd26c8a..95e801bd7 100644 --- a/src/data/roadmaps/data-engineer/content/reverse-etl@JpuiYsipNWBcrjmn2ji6b.md +++ b/src/data/roadmaps/data-engineer/content/reverse-etl@JpuiYsipNWBcrjmn2ji6b.md @@ -1 +1,9 @@ -# Reverse ETL \ No newline at end of file +# Reverse ETL + +Reverse ETL is the process of extracting data from a data warehouse, transforming it to fit the requirements of operational systems, and then loading it into those systems. 
This approach contrasts with traditional ETL, where data is extracted from operational systems, transformed, and loaded into a data warehouse. + +While ETL and ELT focus on centralizing data, Reverse ETL aims to operationalize this data by making it actionable within third-party systems such as CRMs, marketing platforms, and other operational tools. +Visit the following resources to learn more: + +- [@article@What is Reverse ETL? A Helpful Guide](https://www.datacamp.com/blog/reverse-etl) +- [@video@What is Reverse ETL?](https://www.youtube.com/watch?v=DRAGfc5or2Y) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/s3-storage@tbut25IZI2aU7TkI9fFYV.md b/src/data/roadmaps/data-engineer/content/s3-storage@tbut25IZI2aU7TkI9fFYV.md index 6f404f095..23ecd47dc 100644 --- a/src/data/roadmaps/data-engineer/content/s3-storage@tbut25IZI2aU7TkI9fFYV.md +++ b/src/data/roadmaps/data-engineer/content/s3-storage@tbut25IZI2aU7TkI9fFYV.md @@ -1 +1,7 @@ -# S3 (Storage) \ No newline at end of file +# S3 + +Amazon S3 (Simple Storage Service) is an object storage service offered by Amazon Web Services (AWS). It provides scalable, secure, and durable storage on the internet. Designed for storing and retrieving any amount of data from anywhere on the web, it is a key tool for many companies, supporting use cases such as mobile applications, websites, backup and restore, archiving, enterprise applications, IoT devices, and big data analytics. 
+ +Visit the following resources to learn more: + +- [@official@S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) diff --git a/src/data/roadmaps/data-engineer/content/segment@8vqjI-uFwJIr_TBEVyM_3.md b/src/data/roadmaps/data-engineer/content/segment@8vqjI-uFwJIr_TBEVyM_3.md index af21080cd..eed7a6649 100644 --- a/src/data/roadmaps/data-engineer/content/segment@8vqjI-uFwJIr_TBEVyM_3.md +++ b/src/data/roadmaps/data-engineer/content/segment@8vqjI-uFwJIr_TBEVyM_3.md @@ -1 +1,7 @@ -# Segment \ No newline at end of file +# Segment + +Segment is an analytics platform that provides a single API for collecting, storing, and routing customer data from various sources. With Segment, data engineers can easily add analytics tracking to their apps without having to integrate with multiple analytics tools individually. Segment acts as a single point of integration, allowing developers to send data to multiple analytics tools with a single API. + +Visit the following resources to learn more: + +- [@official@Segment Documentation](https://segment.com/docs/) diff --git a/src/data/roadmaps/data-engineer/content/sentry@i54fx-NV6nWzQVCdi0aKL.md b/src/data/roadmaps/data-engineer/content/sentry@i54fx-NV6nWzQVCdi0aKL.md index efc441663..20214d679 100644 --- a/src/data/roadmaps/data-engineer/content/sentry@i54fx-NV6nWzQVCdi0aKL.md +++ b/src/data/roadmaps/data-engineer/content/sentry@i54fx-NV6nWzQVCdi0aKL.md @@ -1 +1,8 @@ -# Sentry \ No newline at end of file +# Sentry + +Sentry tracks your software performance, measuring metrics like throughput and latency, and displaying the impact of errors across multiple systems. Sentry captures distributed traces consisting of transactions and spans, which measure individual services and individual operations within those services. 
+ +Visit the following resources to learn more: + +- [@official@Sentry](https://sentry.io) +- [@official@Sentry Documentation](https://docs.sentry.io/) diff --git a/src/data/roadmaps/data-engineer/content/serverless-options@ZnGX8pg4GagdSalg_P0oq.md b/src/data/roadmaps/data-engineer/content/serverless-options@ZnGX8pg4GagdSalg_P0oq.md index 9bce957b7..1fdd5f1f2 100644 --- a/src/data/roadmaps/data-engineer/content/serverless-options@ZnGX8pg4GagdSalg_P0oq.md +++ b/src/data/roadmaps/data-engineer/content/serverless-options@ZnGX8pg4GagdSalg_P0oq.md @@ -1 +1,8 @@ -# Serverless Options \ No newline at end of file +# Serverless Options + +Serverless data storage involves using cloud provider services for databases and object storage that automatically scale infrastructure and implement a consumption-based, pay-as-you-go model, eliminating the need for developers to manage, provision, or maintain any physical or virtual servers. This approach simplifies development, reduces operational overhead, and offers cost-effectiveness by charging only for the resources used, allowing teams to focus on applications rather than infrastructure management. + + +Visit the following resources to learn more: + +- [@official@What Is Serverless Computing?](https://www.ibm.com/think/topics/serverless) diff --git a/src/data/roadmaps/data-engineer/content/slowly-changing-dimension---scd@5KgPfywItqLFQRnIZldZH.md b/src/data/roadmaps/data-engineer/content/slowly-changing-dimension---scd@5KgPfywItqLFQRnIZldZH.md index 78b91badd..24a37828d 100644 --- a/src/data/roadmaps/data-engineer/content/slowly-changing-dimension---scd@5KgPfywItqLFQRnIZldZH.md +++ b/src/data/roadmaps/data-engineer/content/slowly-changing-dimension---scd@5KgPfywItqLFQRnIZldZH.md @@ -1 +1,9 @@ -# Slowly Changing Dimension - SCD \ No newline at end of file +# Slowly Changing Dimension - SCD + +Slowly Changing Dimensions (SCDs) are a data warehousing technique used to track changes in dimension data over time. 
Instead of simply overwriting old data with new data, SCDs allow you to maintain historical records of how dimension attributes have changed. This is crucial for accurate analysis of historical trends and business performance. + +Visit the following resources to learn more: + +- [@article@Mastering Slowly Changing Dimensions (SCD)](https://www.datacamp.com/tutorial/mastering-slowly-changing-dimensions-scd) +- [@article@Implementing Slowly Changing Dimensions (SCDs) in Data Warehouses](https://www.sqlshack.com/implementing-slowly-changing-dimensions-scds-in-data-warehouses/) + diff --git a/src/data/roadmaps/data-engineer/content/smoke-testing@woa5K4Dt9L6aBzlJMNS31.md b/src/data/roadmaps/data-engineer/content/smoke-testing@woa5K4Dt9L6aBzlJMNS31.md index 5237aee90..5caadea76 100644 --- a/src/data/roadmaps/data-engineer/content/smoke-testing@woa5K4Dt9L6aBzlJMNS31.md +++ b/src/data/roadmaps/data-engineer/content/smoke-testing@woa5K4Dt9L6aBzlJMNS31.md @@ -1 +1,8 @@ -# Smoke Testing \ No newline at end of file +# Smoke Testing + +Smoke testing is a software testing process that determines whether a deployed software build is stable, giving the QA team confirmation to proceed with further testing. It consists of a minimal set of tests run on each build to verify core functionality. 
+ +Visit the following resources to learn more: + +- [@article@Smoke Testing | Software Testing](https://www.guru99.com/smoke-testing.html) +- [@feed@Explore top posts about Testing](https://app.daily.dev/tags/testing?ref=roadmapsh) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/snowflake@Pf0_CBGkmSEfWDQ2_iFXr.md b/src/data/roadmaps/data-engineer/content/snowflake@Pf0_CBGkmSEfWDQ2_iFXr.md index 03c647e42..e2fbf6c7a 100644 --- a/src/data/roadmaps/data-engineer/content/snowflake@Pf0_CBGkmSEfWDQ2_iFXr.md +++ b/src/data/roadmaps/data-engineer/content/snowflake@Pf0_CBGkmSEfWDQ2_iFXr.md @@ -1 +1,10 @@ -# Snowflake \ No newline at end of file +# Snowflake + +Snowflake is a cloud-based data platform that provides a data warehouse as a service. It allows organizations to store, analyze, and share data, offering features like data engineering, data governance, and collaboration capabilities. Snowflake is known for its scalability, ease of use, and ability to handle diverse workloads, including data warehousing, data lakes, and machine learning. 
+ +Visit the following resources to learn more: + +- [@official@Snowflake Docs](https://docs.snowflake.com/) +- [@official@Snowflake in 20 minutes](https://docs.snowflake.com/en/user-guide/tutorials/snowflake-in-20minutes) +- [@article@Snowflake Tutorial For Beginners: From Architecture to Running Databases](https://www.datacamp.com/tutorial/introduction-to-snowflake-for-beginners) +- [@video@Learn Snowflake in 2 Hours](https://www.youtube.com/watch?v=mP3QbYURT9k) diff --git a/src/data/roadmaps/data-engineer/content/snowflake@W3l1_66fsIqR3MqgBJUmU.md b/src/data/roadmaps/data-engineer/content/snowflake@W3l1_66fsIqR3MqgBJUmU.md index 03c647e42..e2fbf6c7a 100644 --- a/src/data/roadmaps/data-engineer/content/snowflake@W3l1_66fsIqR3MqgBJUmU.md +++ b/src/data/roadmaps/data-engineer/content/snowflake@W3l1_66fsIqR3MqgBJUmU.md @@ -1 +1,10 @@ -# Snowflake \ No newline at end of file +# Snowflake + +Snowflake is a cloud-based data platform that provides a data warehouse as a service. It allows organizations to store, analyze, and share data, offering features like data engineering, data governance, and collaboration capabilities. Snowflake is known for its scalability, ease of use, and ability to handle diverse workloads, including data warehousing, data lakes, and machine learning. 
+ +Visit the following resources to learn more: + +- [@official@Snowflake Docs](https://docs.snowflake.com/) +- [@official@Snowflake in 20 minutes](https://docs.snowflake.com/en/user-guide/tutorials/snowflake-in-20minutes) +- [@article@Snowflake Tutorial For Beginners: From Architecture to Running Databases](https://www.datacamp.com/tutorial/introduction-to-snowflake-for-beginners) +- [@video@Learn Snowflake in 2 Hours](https://www.youtube.com/watch?v=mP3QbYURT9k) diff --git a/src/data/roadmaps/data-engineer/content/sources-of-data@zGKTlMUzhrbVbqpLZBsMZ.md b/src/data/roadmaps/data-engineer/content/sources-of-data@zGKTlMUzhrbVbqpLZBsMZ.md index be1e78cfb..58c636d29 100644 --- a/src/data/roadmaps/data-engineer/content/sources-of-data@zGKTlMUzhrbVbqpLZBsMZ.md +++ b/src/data/roadmaps/data-engineer/content/sources-of-data@zGKTlMUzhrbVbqpLZBsMZ.md @@ -1 +1,3 @@ -# Sources of Data \ No newline at end of file +# Sources of Data + +Sources of data are origins or locations from which data is collected, categorized as primary (direct, firsthand information) or secondary (collected by others). Common primary sources include surveys, interviews, experiments, and sensor data. Secondary sources encompass databases, published reports, government data, books, articles, and web data like social media posts. Data sources can also be classified as internal (within an organization) or external (from outside sources). 
diff --git a/src/data/roadmaps/data-engineer/content/star-vs-snowflake-schema@OfH_UXnxvGQgwlNQwOEfS.md b/src/data/roadmaps/data-engineer/content/star-vs-snowflake-schema@OfH_UXnxvGQgwlNQwOEfS.md index 023e5296f..39256fc0b 100644 --- a/src/data/roadmaps/data-engineer/content/star-vs-snowflake-schema@OfH_UXnxvGQgwlNQwOEfS.md +++ b/src/data/roadmaps/data-engineer/content/star-vs-snowflake-schema@OfH_UXnxvGQgwlNQwOEfS.md @@ -1 +1,11 @@ -# Star vs Snowflake Schema \ No newline at end of file +# Star vs Snowflake Schema + +A star schema is a way to organize data in a database, particularly a data warehouse, to make it easier and faster to analyze. At the center there's a main table called the **fact table**, which holds measurable data like sales or revenue. Around it are **dimension tables**, which add details like product names, customer info, or dates. This layout forms a star-like shape. + +A snowflake schema is another way of organizing data. In this schema, dimension tables are split into smaller sub-dimensions to keep data more organized and detailed, with the branching layout resembling a snowflake. + +The star schema is simple and fast, ideal when you need to extract data for analysis quickly. The snowflake schema, on the other hand, is more detailed: it prioritizes storage efficiency and managing complex data relationships. 
+ +Visit the following resources to learn more: + +- [@article@Star Schema vs Snowflake Schema: Differences & Use Cases](https://www.datacamp.com/blog/star-schema-vs-snowflake-schema) diff --git a/src/data/roadmaps/data-engineer/content/streaming@wwPO5Uc6qnwYgibrbPn7y.md b/src/data/roadmaps/data-engineer/content/streaming@wwPO5Uc6qnwYgibrbPn7y.md index 3ab695ba8..fe36c9707 100644 --- a/src/data/roadmaps/data-engineer/content/streaming@wwPO5Uc6qnwYgibrbPn7y.md +++ b/src/data/roadmaps/data-engineer/content/streaming@wwPO5Uc6qnwYgibrbPn7y.md @@ -1 +1,3 @@ -# Streaming \ No newline at end of file +# Streaming + +Stream processing, also known as real-time processing, involves ingesting and analyzing data the moment it is generated, providing instantaneous insights and enabling timely decisions in time-sensitive applications like financial trading, medical monitoring, and autonomous vehicles. This differs from batch processing, which collects data and processes it later in scheduled batches, and typically involves continuous data streams, low latency, and high availability to deliver immediate outcomes for critical tasks. diff --git a/src/data/roadmaps/data-engineer/content/streamlit@FfU6Vwf0PXva91FoqxFgp.md b/src/data/roadmaps/data-engineer/content/streamlit@FfU6Vwf0PXva91FoqxFgp.md index ec950278b..7a27b6a86 100644 --- a/src/data/roadmaps/data-engineer/content/streamlit@FfU6Vwf0PXva91FoqxFgp.md +++ b/src/data/roadmaps/data-engineer/content/streamlit@FfU6Vwf0PXva91FoqxFgp.md @@ -1 +1,12 @@ -# Streamlit \ No newline at end of file +# Streamlit + +Streamlit is a free and open-source framework to rapidly build and share machine learning and data science web apps. It is a Python-based library specifically designed for data and machine learning engineers. Data scientists and machine learning engineers are typically not web developers, and they are not interested in spending weeks learning traditional web frameworks to build web apps. 
Instead, they want a tool that is easier to learn and to use, as long as it can display data and collect needed parameters for modeling. + + + +Visit the following resources to learn more: + +- [@official@Streamlit Docs](https://docs.streamlit.io/) +- [@article@Streamlit Python: Tutorial](https://www.datacamp.com/tutorial/streamlit) +- [@video@Streamlit Explained: Python Tutorial for Data Scientists](https://www.youtube.com/watch?v=c8QXUrvSSyg) + diff --git a/src/data/roadmaps/data-engineer/content/tableu@gqEAOwHFrQiYSejNUdV7-.md b/src/data/roadmaps/data-engineer/content/tableu@gqEAOwHFrQiYSejNUdV7-.md index 190ffbf19..641eb0057 100644 --- a/src/data/roadmaps/data-engineer/content/tableu@gqEAOwHFrQiYSejNUdV7-.md +++ b/src/data/roadmaps/data-engineer/content/tableu@gqEAOwHFrQiYSejNUdV7-.md @@ -1 +1,8 @@ -# Tableu \ No newline at end of file +# Tableau + +Tableau is a powerful data visualization tool utilized extensively by data analysts worldwide. Its primary role is to transform raw, unprocessed data into an understandable format without requiring any technical skills or coding. Data analysts use Tableau to create data visualizations, reports, and dashboards that help businesses make more informed, data-driven decisions. They also use it to perform tasks like trend analysis, pattern identification, and forecasting, all within a user-friendly interface. Moreover, Tableau's data visualization capabilities make it easier for stakeholders to understand complex data and act on insights quickly. 
+ +Learn more from the following resources: + +- [@official@Tableau](https://www.tableau.com/en-gb) +- [@video@What is Tableau?](https://www.youtube.com/watch?v=NLCzpPRCc7U) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/terraform@N-xRhdOTHijAymcTWPXPJ.md b/src/data/roadmaps/data-engineer/content/terraform@N-xRhdOTHijAymcTWPXPJ.md index bffd293c1..8fddf75a5 100644 --- a/src/data/roadmaps/data-engineer/content/terraform@N-xRhdOTHijAymcTWPXPJ.md +++ b/src/data/roadmaps/data-engineer/content/terraform@N-xRhdOTHijAymcTWPXPJ.md @@ -1 +1,12 @@ -# Terraform \ No newline at end of file +# Terraform + +Terraform is an open-source infrastructure as code (IaC) tool developed by HashiCorp, used to define, provision, and manage cloud and on-premises infrastructure using declarative configuration files. It supports multiple cloud providers like AWS, Azure, and Google Cloud, as well as various services and platforms, enabling infrastructure automation across diverse environments. Terraform's state management and modular structure allow for efficient scaling, reusability, and version control of infrastructure. It is widely used for automating infrastructure provisioning, reducing manual errors, and improving infrastructure consistency and repeatability. 
+ +Visit the following resources to learn more: + +- [@roadmap@Visit Dedicated Terraform Roadmap](https://roadmap.sh/terraform) +- [@official@Terraform Documentation](https://www.terraform.io/docs) +- [@official@Terraform Tutorials](https://learn.hashicorp.com/terraform) +- [@article@How to Scale Your Terraform Infrastructure](https://thenewstack.io/how-to-scale-your-terraform-infrastructure/) +- [@course@Complete Terraform Course](https://www.youtube.com/watch?v=7xngnjfIlK4) +- [@feed@Explore top posts about Terraform](https://app.daily.dev/tags/terraform?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/testing@DZoxLu-j1vq5leoXLRZqt.md b/src/data/roadmaps/data-engineer/content/testing@DZoxLu-j1vq5leoXLRZqt.md index 94cfd7d77..e9b7a0130 100644 --- a/src/data/roadmaps/data-engineer/content/testing@DZoxLu-j1vq5leoXLRZqt.md +++ b/src/data/roadmaps/data-engineer/content/testing@DZoxLu-j1vq5leoXLRZqt.md @@ -1 +1,9 @@ -# Testing \ No newline at end of file +# Testing + +Testing is a systematic process used to evaluate the functionality, performance, and quality of software or systems to ensure they meet specified requirements and standards. It involves various methodologies and levels, including unit testing (testing individual components), integration testing (verifying interactions between components), system testing (assessing the entire system's behavior), and acceptance testing (confirming it meets user needs). Testing can be manual or automated and aims to identify defects, validate that features work as intended, and ensure the system performs reliably under different conditions. Effective testing is critical for delivering high-quality software and mitigating risks before deployment. 
+ +Visit the following resources to learn more: + +- [@article@What is Software Testing?](https://www.guru99.com/software-testing-introduction-importance.html) +- [@article@Testing Pyramid](https://www.browserstack.com/guide/testing-pyramid-for-test-automation) +- [@feed@Explore top posts about Testing](https://app.daily.dev/tags/testing?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/tokenization@ZAKo9Svb8TQ6KkmOnfB5x.md b/src/data/roadmaps/data-engineer/content/tokenization@ZAKo9Svb8TQ6KkmOnfB5x.md index 2a71b6b86..d42a680e8 100644 --- a/src/data/roadmaps/data-engineer/content/tokenization@ZAKo9Svb8TQ6KkmOnfB5x.md +++ b/src/data/roadmaps/data-engineer/content/tokenization@ZAKo9Svb8TQ6KkmOnfB5x.md @@ -1 +1,8 @@ -# Tokenization \ No newline at end of file +# Tokenization + +Tokenization is the step where raw text is broken into small pieces called tokens, and each token is given a unique number. A token can be a whole word, part of a word, a punctuation mark, or even a space. The list of all possible tokens is the model’s vocabulary. Once text is turned into these numbered tokens, the model can look up an embedding for each number and start its math. By working with tokens instead of full sentences, the model keeps the input size steady and can handle new or rare words by slicing them into familiar sub-pieces. After the model finishes its work, the numbered tokens are turned back into text through the same vocabulary map, letting the user read the result. + +Visit the following resources to learn more: + +- [@article@Explaining Tokens — the Language and Currency of AI](https://blogs.nvidia.com/blog/ai-tokens-explained/) +- [@article@What is Tokenization? 
Types, Use Cases, Implementation](https://www.datacamp.com/blog/what-is-tokenization) diff --git a/src/data/roadmaps/data-engineer/content/transactions@1BJGXWax6CONuFkaYR4Jm.md b/src/data/roadmaps/data-engineer/content/transactions@1BJGXWax6CONuFkaYR4Jm.md index 0125c964d..bcd13f237 100644 --- a/src/data/roadmaps/data-engineer/content/transactions@1BJGXWax6CONuFkaYR4Jm.md +++ b/src/data/roadmaps/data-engineer/content/transactions@1BJGXWax6CONuFkaYR4Jm.md @@ -1 +1,8 @@ -# Transactions \ No newline at end of file +# Transactions + +Transactions in SQL are units of work that group one or more database operations into a single, atomic unit. They ensure data integrity by following the ACID properties: Atomicity (all or nothing), Consistency (the database remains in a valid state), Isolation (transactions don't interfere with each other), and Durability (committed changes are permanent). Transactions are essential for maintaining data consistency in complex operations and handling concurrent access to the database. + +Learn more from the following resources: + +- [@article@Transactions](https://www.tutorialspoint.com/sql/sql-transactions.htm) +- [@article@A Guide to ACID Properties in Database Management Systems](https://www.mongodb.com/resources/basics/databases/acid-transactions) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/types-of-data-ingestion@GN1Xh3kA25ge-wTbdiSio.md b/src/data/roadmaps/data-engineer/content/types-of-data-ingestion@GN1Xh3kA25ge-wTbdiSio.md index d77e28472..491ee8333 100644 --- a/src/data/roadmaps/data-engineer/content/types-of-data-ingestion@GN1Xh3kA25ge-wTbdiSio.md +++ b/src/data/roadmaps/data-engineer/content/types-of-data-ingestion@GN1Xh3kA25ge-wTbdiSio.md @@ -1 +1,3 @@ -# Types of Data Ingestion \ No newline at end of file +# Types of Data Ingestion + +The primary types of data ingestion are Batch, Streaming, and Hybrid. 
Batch ingestion processes data in large, scheduled chunks, suitable for non-time-sensitive tasks like monthly reports. Streaming (or Real-time) ingestion handles data as it arrives, ideal for time-sensitive applications such as fraud detection or IoT monitoring. Hybrid ingestion combines both methods, offering flexibility for diverse business needs. diff --git a/src/data/roadmaps/data-engineer/content/unit-testing@8dXD4ddR_USEbAJhUMcB6.md b/src/data/roadmaps/data-engineer/content/unit-testing@8dXD4ddR_USEbAJhUMcB6.md index 3b3752cf9..5b395b83a 100644 --- a/src/data/roadmaps/data-engineer/content/unit-testing@8dXD4ddR_USEbAJhUMcB6.md +++ b/src/data/roadmaps/data-engineer/content/unit-testing@8dXD4ddR_USEbAJhUMcB6.md @@ -1 +1,9 @@ -# Unit Testing \ No newline at end of file +# Unit Testing + +Unit testing is where individual **units** (modules, functions/methods, routines, etc.) of software are tested to ensure their correctness. This low-level testing ensures smaller components are functionally sound while taking the burden off of higher-level tests. Generally, a developer writes these tests during the development process and they are run as automated tests. + +Visit the following resources to learn more: + +- [@article@Unit Testing Tutorial](https://www.guru99.com/unit-testing-guide.html) +- [@video@What is Unit Testing?](https://youtu.be/3kzHmaeozDI) +- [@feed@Explore top posts about Testing](https://app.daily.dev/tags/testing?ref=roadmapsh) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/what-and-why-use-them@1qju7UlcMo2Ebp4a3BGxH.md b/src/data/roadmaps/data-engineer/content/what-and-why-use-them@1qju7UlcMo2Ebp4a3BGxH.md index 798e94693..2ae1df97f 100644 --- a/src/data/roadmaps/data-engineer/content/what-and-why-use-them@1qju7UlcMo2Ebp4a3BGxH.md +++ b/src/data/roadmaps/data-engineer/content/what-and-why-use-them@1qju7UlcMo2Ebp4a3BGxH.md @@ -1 +1,3 @@ -# What and why use them? \ No newline at end of file +# What and why use them? 
+ +In data engineering, messaging systems act as central brokers for data communication, allowing different applications and services to send and receive data in a decoupled, scalable, and fault-tolerant way. They are crucial for handling high-volume, real-time data streams, building resilient data pipelines, and enabling event-driven architectures by acting as buffers and communication channels between data producers and consumers. Key benefits include decoupling systems for agility, ensuring data reliability through queuing and retries, and horizontal scalability to manage growing data loads. Common examples include Apache Kafka and message queues like RabbitMQ and AWS SQS. diff --git a/src/data/roadmaps/data-engineer/content/what-is-data-warehouse@dc3lJI27hJ3zZ45UCVqM1.md b/src/data/roadmaps/data-engineer/content/what-is-data-warehouse@dc3lJI27hJ3zZ45UCVqM1.md index 23df18b2e..2d506c0bd 100644 --- a/src/data/roadmaps/data-engineer/content/what-is-data-warehouse@dc3lJI27hJ3zZ45UCVqM1.md +++ b/src/data/roadmaps/data-engineer/content/what-is-data-warehouse@dc3lJI27hJ3zZ45UCVqM1.md @@ -1 +1,8 @@ -# What is Data Warehouse? \ No newline at end of file +# Data Warehouse + +**Data Warehouses** are data storage systems that are designed for analysis, reporting, and integration with transactional systems. The data in a warehouse is clean, consistent, and often transformed to meet a wide range of business requirements. Hence, data warehouses provide structured data but require more processing and management compared to data lakes. 
+ +Learn more from the following resources: + +- [@article@What Is a Data Warehouse?](https://www.oracle.com/database/what-is-a-data-warehouse/) +- [@video@What is a Data Warehouse?](https://www.youtube.com/watch?v=k4tK2ttdSDg) \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/yarn@KcW4z48pk2x6IjQhZs_Ub.md b/src/data/roadmaps/data-engineer/content/yarn@KcW4z48pk2x6IjQhZs_Ub.md index e80c66907..9ca9024e7 100644 --- a/src/data/roadmaps/data-engineer/content/yarn@KcW4z48pk2x6IjQhZs_Ub.md +++ b/src/data/roadmaps/data-engineer/content/yarn@KcW4z48pk2x6IjQhZs_Ub.md @@ -1 +1,7 @@ -# YARN \ No newline at end of file +# Apache Hadoop YARN + +Apache Hadoop YARN (Yet Another Resource Negotiator) is the part of Hadoop that manages resources and runs jobs on a cluster. It has a ResourceManager that controls all cluster resources and an ApplicationMaster for each job that schedules and runs tasks. YARN lets different tools like MapReduce and Spark share the same cluster, making it more efficient, flexible, and reliable. + +Visit the following resources to learn more: + +- [@video@Hadoop Yarn Tutorial](https://www.youtube.com/watch?v=6bIF9VwRwE0) \ No newline at end of file