diff --git a/src/data/roadmaps/data-engineer/content/aws-sqs@uIU5Yncp6hGDcNO1fpjUS.md b/src/data/roadmaps/data-engineer/content/aws-sqs@uIU5Yncp6hGDcNO1fpjUS.md index a6b089f41..caa4a3bf3 100644 --- a/src/data/roadmaps/data-engineer/content/aws-sqs@uIU5Yncp6hGDcNO1fpjUS.md +++ b/src/data/roadmaps/data-engineer/content/aws-sqs@uIU5Yncp6hGDcNO1fpjUS.md @@ -1 +1,10 @@ -# AWS SQS \ No newline at end of file +# AWS SQS + +Amazon Simple Queue Service (Amazon SQS) offers a secure, durable, and available hosted queue that lets you integrate and decouple distributed software systems and components. Amazon SQS offers common constructs such as dead-letter queues and cost allocation tags. It provides a generic web services API that you can access using any programming language that the AWS SDK supports. + +Visit the following resources to learn more: + +- [@official@Amazon Simple Queue Service](https://aws.amazon.com/sqs/) +- [@official@What is Amazon Simple Queue Service?](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/welcome.html) +- [@article@Amazon Simple Queue Service (SQS): A Comprehensive Tutorial](https://www.datacamp.com/tutorial/amazon-sqs) + diff --git a/src/data/roadmaps/data-engineer/content/azure-blob-storage@gzbEGCUwMsD1gL4nW668g.md b/src/data/roadmaps/data-engineer/content/azure-blob-storage@gzbEGCUwMsD1gL4nW668g.md index 4c75b7d01..41a160dfa 100644 --- a/src/data/roadmaps/data-engineer/content/azure-blob-storage@gzbEGCUwMsD1gL4nW668g.md +++ b/src/data/roadmaps/data-engineer/content/azure-blob-storage@gzbEGCUwMsD1gL4nW668g.md @@ -1 +1,9 @@ -# Azure Blob Storage \ No newline at end of file +# Azure Blob Storage + +Azure Blob Storage is Microsoft's object storage solution for the cloud. “Blob” stands for Binary Large Object, a term used to describe storage for unstructured data like text, images, and video. Azure Blob Storage is Microsoft Azure’s solution for storing these blobs in the cloud. 
It offers flexible storage—you only pay based on your usage. Depending on the access speed you need for your data, you can choose from various storage tiers (hot, cool, and archive). Being cloud-based, it is scalable, secure, and easy to manage. + +Visit the following resources to learn more: + +- [@official@Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) +- [@official@Introduction to Azure Blob Storage](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction) +- [@video@A Beginners Guide to Azure Blob Storage](https://www.youtube.com/watch?v=ah1XqItWkuc&t=300s) diff --git a/src/data/roadmaps/data-engineer/content/azure-sql-database@iIZ3g70KRwEJCBNaONd2d.md b/src/data/roadmaps/data-engineer/content/azure-sql-database@iIZ3g70KRwEJCBNaONd2d.md index 3b2019831..dacd99bb2 100644 --- a/src/data/roadmaps/data-engineer/content/azure-sql-database@iIZ3g70KRwEJCBNaONd2d.md +++ b/src/data/roadmaps/data-engineer/content/azure-sql-database@iIZ3g70KRwEJCBNaONd2d.md @@ -1 +1,10 @@ -# Azure SQL Database \ No newline at end of file +# Azure SQL Database + +Azure SQL Database is a fully managed Platform as a Service (PaaS) offering. It abstracts the underlying infrastructure, enabling developers to focus on building and deploying applications without worrying about database maintenance tasks. 
+ +Visit the following resources to learn more: + +- [@official@Azure SQL Database](https://azure.microsoft.com/en-us/products/azure-sql/database) +- [@official@What is Azure SQL Database?](https://learn.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview?view=azuresql) +- [@article@Azure SQL Database: Step-by-Step Setup and Management](https://www.datacamp.com/tutorial/azure-sql-database) +- [@video@Azure SQL for Beginners](https://www.youtube.com/playlist?list=PLlrxD0HtieHi5c9-i_Dnxw9vxBY-TqaeN) diff --git a/src/data/roadmaps/data-engineer/content/azure-virtual-machines@-yi-xk-kv0njW9GdytiAQ.md b/src/data/roadmaps/data-engineer/content/azure-virtual-machines@-yi-xk-kv0njW9GdytiAQ.md index 846b01301..287860136 100644 --- a/src/data/roadmaps/data-engineer/content/azure-virtual-machines@-yi-xk-kv0njW9GdytiAQ.md +++ b/src/data/roadmaps/data-engineer/content/azure-virtual-machines@-yi-xk-kv0njW9GdytiAQ.md @@ -1 +1,9 @@ -# Azure Virtual Machines \ No newline at end of file +# Azure Virtual Machines + +Azure Virtual Machines (VMs) enable virtualization without requiring hardware investments. They provide customizable environments for development, testing, and cloud applications so you can run different operating systems like Ubuntu on a Windows host based on your needs. One of the key advantages of Azure VMs is the pay-as-you-go pricing model. It allows you to scale resources up or down as needed, ensuring cost efficiency without wasting resources. 
+ +Visit the following resources to learn more: + +- [@official@Azure Virtual Machines](https://azure.microsoft.com/en-us/products/virtual-machines) +- [@official@Virtual Machines in Azure](https://learn.microsoft.com/en-us/azure/virtual-machines/overview) +- [@video@Virtual Machines in Azure | Beginner's Guide](https://www.youtube.com/watch?v=_abaWXoQFZU) diff --git a/src/data/roadmaps/data-engineer/content/batch@f-a3Hy1ldnvSv8W2mFiJK.md b/src/data/roadmaps/data-engineer/content/batch@f-a3Hy1ldnvSv8W2mFiJK.md index a595e4e7e..ac3d0dcc0 100644 --- a/src/data/roadmaps/data-engineer/content/batch@f-a3Hy1ldnvSv8W2mFiJK.md +++ b/src/data/roadmaps/data-engineer/content/batch@f-a3Hy1ldnvSv8W2mFiJK.md @@ -1 +1,9 @@ -# Batch \ No newline at end of file +# Batch + +Batch processing is a method in which large volumes of collected data are processed in chunks or batches. This approach is especially effective for resource-intensive jobs, repetitive tasks, and managing extensive datasets where real-time processing isn’t required. It is ideal for applications like data warehousing, ETL (Extract, Transform, Load), and large-scale reporting. Batch processing is largely automated, requiring minimal human interaction once the process is set up. Tasks are predefined, and the system executes them according to a scheduled timeline, typically during off-peak hours when computing resources are readily available. 
+ +Visit the following resources to learn more: + +- [@article@What is Batch Processing?](https://aws.amazon.com/what-is/batch-processing/) +- [@article@Batch And Streaming Demystified For Unification](https://towardsdatascience.com/batch-and-streaming-demystified-for-unification-dee0b48f921d/) + diff --git a/src/data/roadmaps/data-engineer/content/best-practices@yyJJGinOv3M21MFuqJs0j.md b/src/data/roadmaps/data-engineer/content/best-practices@yyJJGinOv3M21MFuqJs0j.md index ed28f90fa..b84f52580 100644 --- a/src/data/roadmaps/data-engineer/content/best-practices@yyJJGinOv3M21MFuqJs0j.md +++ b/src/data/roadmaps/data-engineer/content/best-practices@yyJJGinOv3M21MFuqJs0j.md @@ -1 +1,15 @@ -# Best Practices \ No newline at end of file +# Best Practices + +1. **Ensure Reliability.** A robust messaging system must guarantee that messages aren’t lost, even during node failures or network issues. This means using acknowledgments, replication across multiple brokers, and durable storage on disk. These measures ensure that producers and consumers can recover seamlessly without data loss when something goes wrong. + +2. **Design for Scalability.** Scalability should be baked in from the start. Partition topics strategically to distribute load across brokers and consumer groups, enabling horizontal scaling. + +3. **Maintain Message Ordering.** For systems that depend on message sequence, ensure ordering within partitions and design producers to consistently route related messages to the same partition. + +4. **Secure Communication.** Messaging queues often carry sensitive data, so encrypt messages both in transit and at rest. Implement authentication techniques to ensure only trusted clients can publish or consume, and enforce authorization rules to limit access to specific topics or operations. + +5. **Monitor & Alert.** Continuous visibility into your messaging system is essential. Track metrics such as message lag, throughput, consumer group health, and broker disk usage. 
Set alerts for abnormal patterns, like growing lag or dropped connections, so you can respond before they affect downstream systems. + +Visit the following resources to learn more: + +- [@article@Best Practices for Message Queue Architecture](https://abhishek-patel.medium.com/best-practices-for-message-queue-architecture-f69d47e3565) diff --git a/src/data/roadmaps/data-engineer/content/big-data-tools@03BHmPhYkZrJwRvQdmxxr.md b/src/data/roadmaps/data-engineer/content/big-data-tools@03BHmPhYkZrJwRvQdmxxr.md index 9130120b0..fe914d253 100644 --- a/src/data/roadmaps/data-engineer/content/big-data-tools@03BHmPhYkZrJwRvQdmxxr.md +++ b/src/data/roadmaps/data-engineer/content/big-data-tools@03BHmPhYkZrJwRvQdmxxr.md @@ -1 +1,12 @@ -# Big Data Tools \ No newline at end of file +# Big Data Tools + +Big data tools are specialized software and platforms designed to handle the massive volume, velocity, and variety of data that traditional data processing tools cannot effectively manage. These tools provide the infrastructure, frameworks, and capabilities to process, analyze, and extract meaningful knowledge from vast datasets. They are essential for modern data-driven organizations seeking to gain insights, make informed decisions, and achieve a competitive advantage. + +Hadoop and Spark are two of the most prominent frameworks in big data, but they handle the processing of large-scale data in very different ways. While Hadoop can be credited with democratizing the distributed computing paradigm through a robust storage system called HDFS and a computational model called MapReduce, Spark is changing the game with its in-memory architecture and flexible programming model. 
+ +Visit the following resources to learn more: + +- [@article@What is Big Data?](https://cloud.google.com/learn/what-is-big-data?hl=en) +- [@article@Hadoop vs Spark: Which Big Data Framework Is Right For You?](https://www.datacamp.com/blog/hadoop-vs-spark) +- [@video@Introduction to Big Data with Spark and Hadoop](http://youtube.com/watch?v=vHlwg4ciCsI&t=80s&ab_channel=freeCodeAcademy) + diff --git a/src/data/roadmaps/data-engineer/content/bigtable@ltZftFsiOo12AkQ-04N3B.md b/src/data/roadmaps/data-engineer/content/bigtable@ltZftFsiOo12AkQ-04N3B.md index bbcf6b17a..5a1cc9f29 100644 --- a/src/data/roadmaps/data-engineer/content/bigtable@ltZftFsiOo12AkQ-04N3B.md +++ b/src/data/roadmaps/data-engineer/content/bigtable@ltZftFsiOo12AkQ-04N3B.md @@ -1 +1,8 @@ -# BigTable \ No newline at end of file +# BigTable + +Bigtable is a high-performance, scalable database that excels at capturing, processing, and analyzing data in real-time. It aggregates data as it's written, providing immediate insights into user behavior, A/B testing results, and engagement metrics. This real-time capability also fuels AI/ML models for interactive applications. Bigtable integrates seamlessly with both Dataflow, enriching streaming pipelines with low-latency lookups, and BigQuery, enabling real-time serving of analytics in user-facing applications and ad-hoc querying on the same data. 
+ +Visit the following resources to learn more: + +- [@official@Bigtable: Fast, Flexible NoSQL](https://cloud.google.com/bigtable?hl=en#scale-your-latency-sensitive-applications-with-the-nosql-pioneer) +- [@article@Google Bigtable](https://www.techtarget.com/searchdatamanagement/definition/Google-BigTable) diff --git a/src/data/roadmaps/data-engineer/content/business-intelligence@zA5QqqBMsqymdiPGFdUnt.md b/src/data/roadmaps/data-engineer/content/business-intelligence@zA5QqqBMsqymdiPGFdUnt.md index 070f69524..cfa9b0971 100644 --- a/src/data/roadmaps/data-engineer/content/business-intelligence@zA5QqqBMsqymdiPGFdUnt.md +++ b/src/data/roadmaps/data-engineer/content/business-intelligence@zA5QqqBMsqymdiPGFdUnt.md @@ -1 +1,11 @@ -# Business Intelligence \ No newline at end of file +# Business Intelligence + +Business intelligence encompasses a set of techniques and technologies to transform raw data into meaningful insights that drive strategic decision-making within an organization. BI tools enable business users to access different types of data, historical and current, third-party and in-house, as well as semistructured data and unstructured data such as social media. Users can analyze this information to gain insights into how the business is performing and what it should do next. + +BI platforms traditionally rely on data warehouses for their baseline information. The strength of a data warehouse is that it aggregates data from multiple data sources into one central system to support business data analytics and reporting. BI presents the results to the user in the form of reports, charts and maps, which might be displayed through a dashboard. 
+ +Visit the following resources to learn more: + +- [@article@What is business intelligence (BI)?](https://www.ibm.com/think/topics/business-intelligence) +- [@article@Business intelligence: A complete overview](https://www.tableau.com/business-intelligence/what-is-business-intelligence) +- [@video@What is business intelligence?](https://www.youtube.com/watch?v=l98-BcB3UIE) diff --git a/src/data/roadmaps/data-engineer/content/cap-theorem@AslPFjoakcC44CmPB5nuw.md b/src/data/roadmaps/data-engineer/content/cap-theorem@AslPFjoakcC44CmPB5nuw.md index df8a07253..b80e2649a 100644 --- a/src/data/roadmaps/data-engineer/content/cap-theorem@AslPFjoakcC44CmPB5nuw.md +++ b/src/data/roadmaps/data-engineer/content/cap-theorem@AslPFjoakcC44CmPB5nuw.md @@ -1 +1,10 @@ -# CAP Theorem \ No newline at end of file +# CAP Theorem + +The CAP Theorem, also known as Brewer's Theorem, is a fundamental principle in distributed database systems. It states that in a distributed system, it's impossible to simultaneously guarantee all three of the following properties: Consistency (all nodes see the same data at the same time), Availability (every request receives a response, without guarantee that it contains the most recent version of the data), and Partition tolerance (the system continues to operate despite network failures between nodes). According to the theorem, a distributed system can only strongly provide two of these three guarantees at any given time. This principle guides the design and architecture of distributed systems, influencing decisions on data consistency models, replication strategies, and failure handling. Understanding the CAP Theorem is crucial for designing robust, scalable distributed systems and for choosing appropriate database solutions for specific use cases in distributed computing environments. 
+ +Visit the following resources to learn more: + +- [@article@What is CAP Theorem?](https://www.bmc.com/blogs/cap-theorem/) +- [@article@An Illustrated Proof of the CAP Theorem](https://mwhittaker.github.io/blog/an_illustrated_proof_of_the_cap_theorem/) +- [@article@CAP Theorem and its applications in NoSQL Databases](https://www.ibm.com/uk-en/cloud/learn/cap-theorem) +- [@video@What is CAP Theorem?](https://www.youtube.com/watch?v=_RbsFXWRZ10) diff --git a/src/data/roadmaps/data-engineer/content/cassandra@QYR8ESN7xhi4ZxcoiZbgn.md b/src/data/roadmaps/data-engineer/content/cassandra@QYR8ESN7xhi4ZxcoiZbgn.md index fb6bc61fc..75460123d 100644 --- a/src/data/roadmaps/data-engineer/content/cassandra@QYR8ESN7xhi4ZxcoiZbgn.md +++ b/src/data/roadmaps/data-engineer/content/cassandra@QYR8ESN7xhi4ZxcoiZbgn.md @@ -1 +1,10 @@ -# Cassandra \ No newline at end of file +# Cassandra + +Apache Cassandra is a highly scalable, distributed NoSQL database designed to handle large amounts of structured data across multiple commodity servers. It provides high availability with no single point of failure, offering linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure. Cassandra uses a masterless ring architecture, where all nodes are equal, allowing for easy data distribution and replication. It supports flexible data models and can handle both unstructured and structured data. Cassandra excels in write-heavy environments and is particularly suitable for applications requiring high throughput and low latency. Its data model is based on wide column stores, offering a more complex structure than key-value stores. Widely used in big data applications, Cassandra is known for its ability to handle massive datasets while maintaining performance and reliability. 
+ +Visit the following resources to learn more: + +- [@official@Apache Cassandra](https://cassandra.apache.org/_/index.html) +- [@article@Cassandra - Quick Guide](https://www.tutorialspoint.com/cassandra/cassandra_quick_guide.htm) +- [@video@Apache Cassandra - Course for Beginners](https://www.youtube.com/watch?v=J-cSy5MeMOA) +- [@feed@Explore top posts about Backend Development](https://app.daily.dev/tags/backend?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/census@vZGDtlyt_yj4szcPTw3cv.md b/src/data/roadmaps/data-engineer/content/census@vZGDtlyt_yj4szcPTw3cv.md index 978ad7a90..6712aee09 100644 --- a/src/data/roadmaps/data-engineer/content/census@vZGDtlyt_yj4szcPTw3cv.md +++ b/src/data/roadmaps/data-engineer/content/census@vZGDtlyt_yj4szcPTw3cv.md @@ -1 +1,10 @@ -# Census \ No newline at end of file +# Census + +Census is a reverse ETL platform that synchronizes data from a data warehouse to various business applications and SaaS apps like Salesforce and HubSpot. It's a crucial part of the modern data stack, enabling businesses to operationalize their data by making it available in the tools where teams work, like CRMs, marketing platforms, and more. 
+ +Visit the following resources to learn more: + +- [@official@Census](https://www.getcensus.com/reverse-etl) +- [@official@Census Documentation](https://developers.getcensus.com/getting-started/introduction) +- [@article@A starter guide to reverse ETL with Census](https://www.getcensus.com/blog/starter-guide-for-first-time-census-users) +- [@video@How to "Reverse ETL" with Census](https://www.youtube.com/watch?v=XkS7DQFHzbA) diff --git a/src/data/roadmaps/data-engineer/content/cicd@k2SJ4ELGa4B2ZERDAk1uj.md b/src/data/roadmaps/data-engineer/content/cicd@k2SJ4ELGa4B2ZERDAk1uj.md index d2bd67913..ee0e41469 100644 --- a/src/data/roadmaps/data-engineer/content/cicd@k2SJ4ELGa4B2ZERDAk1uj.md +++ b/src/data/roadmaps/data-engineer/content/cicd@k2SJ4ELGa4B2ZERDAk1uj.md @@ -1 +1,11 @@ -# CI/CD \ No newline at end of file +# CI / CD + +**Continuous Integration** is a software development method where team members integrate their work at least once daily. An automated build checks every integration to detect errors in this method. In Continuous Integration, the software is built and tested immediately after a code commit. In a large project with many developers, commits are made many times during the day. With each commit, code is built and tested. + +**Continuous Delivery** is a software engineering method in which a team develops software products in a short cycle. It ensures that software can be easily released at any time. The main aim of continuous delivery is to build, test, and release software with good speed and frequency. It helps reduce the cost, time, and risk of delivering changes by allowing for frequent updates in production. + +Visit the following resources to learn more: + +- [@article@What is CI/CD? 
Continuous Integration and Continuous Delivery](https://www.guru99.com/continuous-integration.html) +- [@article@Continuous Integration vs Delivery vs Deployment](https://www.guru99.com/continuous-integration-vs-delivery-vs-deployment.html) +- [@article@CI/CD Pipeline: Learn with Example](https://www.guru99.com/ci-cd-pipeline.html) diff --git a/src/data/roadmaps/data-engineer/content/circle-ci@CewITBPtfVs32LD5Acb2E.md b/src/data/roadmaps/data-engineer/content/circle-ci@CewITBPtfVs32LD5Acb2E.md index 18b4e1cac..db9d25692 100644 --- a/src/data/roadmaps/data-engineer/content/circle-ci@CewITBPtfVs32LD5Acb2E.md +++ b/src/data/roadmaps/data-engineer/content/circle-ci@CewITBPtfVs32LD5Acb2E.md @@ -1 +1,10 @@ -# Circle CI \ No newline at end of file +# CircleCI + +CircleCI is a CI/CD service that can be integrated with GitHub, Bitbucket, and GitLab repositories. The service can be used as a SaaS offering or self-managed using your own resources. + +Visit the following resources to learn more: + +- [@official@CircleCI](https://circleci.com/) +- [@official@CircleCI Documentation](https://circleci.com/docs) +- [@official@Configuration Tutorial](https://circleci.com/docs/config-intro) +- [@feed@Explore top posts about CI/CD](https://app.daily.dev/tags/cicd?ref=roadmapsh) diff --git a/src/data/roadmaps/data-engineer/content/cloud-architectures@YLfyb_ycgz1hu0yW8SPNE.md b/src/data/roadmaps/data-engineer/content/cloud-architectures@YLfyb_ycgz1hu0yW8SPNE.md index 6ff878155..6a1d8a6f0 100644 --- a/src/data/roadmaps/data-engineer/content/cloud-architectures@YLfyb_ycgz1hu0yW8SPNE.md +++ b/src/data/roadmaps/data-engineer/content/cloud-architectures@YLfyb_ycgz1hu0yW8SPNE.md @@ -1 +1,15 @@ -# Cloud Architectures \ No newline at end of file +# Cloud Architectures + +Cloud architecture refers to how various cloud technology components, such as hardware, virtual resources, software capabilities, and virtual network systems, interact and connect to create cloud computing environments. 
Cloud architecture dictates how components are integrated so that you can pool, share, and scale resources over a network. It acts as a blueprint that defines the best way to strategically combine resources to build a cloud environment for a specific business need. + +Cloud architecture components can include, among others: + +- A frontend platform +- A backend platform +- A cloud-based delivery model +- A network (internet, intranet, or intercloud) + +Visit the following resources to learn more: + +- [@article@What is cloud architecture? - Google](https://cloud.google.com/learn/what-is-cloud-architecture) +- [@video@What is Cloud Architecture and Common Models?](https://www.youtube.com/watch?v=zTP-bx495hU) diff --git a/src/data/roadmaps/data-engineer/content/cloud-computing@lDeSL9qvgQgyAMcWXF7Fr.md b/src/data/roadmaps/data-engineer/content/cloud-computing@lDeSL9qvgQgyAMcWXF7Fr.md index 838d0e7b4..8e9874cb2 100644 --- a/src/data/roadmaps/data-engineer/content/cloud-computing@lDeSL9qvgQgyAMcWXF7Fr.md +++ b/src/data/roadmaps/data-engineer/content/cloud-computing@lDeSL9qvgQgyAMcWXF7Fr.md @@ -1 +1,9 @@ -# Cloud Computing \ No newline at end of file +# Cloud Computing + +**Cloud Computing** refers to the delivery of computing services over the internet rather than using local servers or personal devices. These services include servers, storage, databases, networking, software, analytics, and intelligence. Cloud Computing enables faster innovation, flexible resources, and economies of scale. There are various types of cloud computing such as public clouds, private clouds, and hybrid clouds. Furthermore, it's divided into different services like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). These services differ mainly in the level of control an organization has over its data and infrastructure. 
+ +Visit the following resources to learn more: + +- [@article@Cloud Computing - IBM](https://www.ibm.com/think/topics/cloud-computing) +- [@article@What is Cloud Computing? - Azure](https://azure.microsoft.com/en-gb/resources/cloud-computing-dictionary/what-is-cloud-computing) +- [@video@What is Cloud Computing? - Amazon Web Services](https://www.youtube.com/watch?v=mxT233EdY5c) diff --git a/src/data/roadmaps/data-engineer/content/cloud-sql-database@9-wQWQIdAxQmMaJC9ojPg.md b/src/data/roadmaps/data-engineer/content/cloud-sql-database@9-wQWQIdAxQmMaJC9ojPg.md index aba15c7bc..d7a6710ab 100644 --- a/src/data/roadmaps/data-engineer/content/cloud-sql-database@9-wQWQIdAxQmMaJC9ojPg.md +++ b/src/data/roadmaps/data-engineer/content/cloud-sql-database@9-wQWQIdAxQmMaJC9ojPg.md @@ -1 +1,9 @@ -# Cloud SQL (Database) \ No newline at end of file +# Cloud SQL (Database) + +Google Cloud SQL is a fully managed, cost-effective, and scalable database service that makes it easy to set up, maintain, manage, and administer MySQL, PostgreSQL, and SQL Server databases in the cloud. Hosted on Google Cloud Platform, Cloud SQL provides a database infrastructure for applications running anywhere. 
+ +Visit the following resources to learn more: + +- [@official@Cloud SQL](https://cloud.google.com/sql) +- [@official@Cloud SQL overview](https://cloud.google.com/sql/docs/introduction) +- [@course@Cloud SQL](https://www.cloudskillsboost.google/course_templates/701) diff --git a/src/data/roadmaps/data-engineer/content/cluster-computing-basics@hB0y8A2U3owpAbTUb7LN5.md b/src/data/roadmaps/data-engineer/content/cluster-computing-basics@hB0y8A2U3owpAbTUb7LN5.md index e100952f3..2ce8fc42f 100644 --- a/src/data/roadmaps/data-engineer/content/cluster-computing-basics@hB0y8A2U3owpAbTUb7LN5.md +++ b/src/data/roadmaps/data-engineer/content/cluster-computing-basics@hB0y8A2U3owpAbTUb7LN5.md @@ -1 +1,6 @@ -# Cluster Computing Basics \ No newline at end of file +# Cluster Computing Basics + +Cluster computing is the process of using multiple computing nodes, grouped into a cluster, to increase processing power for solving complex problems, such as Big Data analytics and AI model training. These tasks require parallel processing of millions of data points for complex classification and prediction tasks. Cluster computing technology coordinates multiple computing nodes, each with its own CPUs, GPUs, and internal memory, to work together on the same data processing task. Applications on cluster computing infrastructure run as if on a single machine and are unaware of the underlying system complexities. 
+ + + diff --git a/src/data/roadmaps/data-engineer/content/cluster-management-tools@wpZfbIFtfiUSLMASk4t7f.md b/src/data/roadmaps/data-engineer/content/cluster-management-tools@wpZfbIFtfiUSLMASk4t7f.md index 9b3589f74..057ffc78b 100644 --- a/src/data/roadmaps/data-engineer/content/cluster-management-tools@wpZfbIFtfiUSLMASk4t7f.md +++ b/src/data/roadmaps/data-engineer/content/cluster-management-tools@wpZfbIFtfiUSLMASk4t7f.md @@ -1 +1,5 @@ -# Cluster Management Tools \ No newline at end of file +# Cluster Management Tools + +Cluster management software maximizes the work that a cluster of computers can perform. A cluster manager balances workload to reduce bottlenecks, monitors the health of the elements of the cluster, and manages failover when an element fails. A cluster manager can also help a system administrator to perform administration tasks on elements in the cluster. + +Some of the most popular Cluster Management Tools are Kubernetes and Apache Hadoop YARN. \ No newline at end of file diff --git a/src/data/roadmaps/data-engineer/content/column@fBD6ZQoMac8w4kMJw_Jrd.md b/src/data/roadmaps/data-engineer/content/column@fBD6ZQoMac8w4kMJw_Jrd.md index a68abef02..c4b9f75fe 100644 --- a/src/data/roadmaps/data-engineer/content/column@fBD6ZQoMac8w4kMJw_Jrd.md +++ b/src/data/roadmaps/data-engineer/content/column@fBD6ZQoMac8w4kMJw_Jrd.md @@ -1 +1,9 @@ -# Column \ No newline at end of file +# Column + +A columnar database is a type of NoSQL database that stores data by columns instead of by rows. In a traditional SQL database, all the information for one record is stored together, but in a columnar database, all the values for a single column are stored together. This makes it much faster to read and analyze large amounts of data, especially when you only need a few columns instead of the whole record. 
For example, if you want to quickly find the average sales price from millions of rows, a columnar database can scan just the "price" column instead of every piece of data. This design is often used in data warehouses and analytics systems because it speeds up queries and saves storage space through better compression. + +Visit the following resources to learn more: + +- [@article@What are columnar databases? Here are 35 examples.](https://www.tinybird.co/blog-posts/what-is-a-columnar-database) +- [@article@Columnar Databases](https://www.techtarget.com/searchdatamanagement/definition/columnar-database) +- [@video@What is a Columnar Database? (vs. Row-oriented Database)](https://www.youtube.com/watch?v=1MnvuNg33pA) diff --git a/src/data/roadmaps/data-engineer/content/hightouch@8NTe5-XQ5tKAWUyg1rnzb.md b/src/data/roadmaps/data-engineer/content/hightouch@8NTe5-XQ5tKAWUyg1rnzb.md index 2b53b0ae0..4e00ee11c 100644 --- a/src/data/roadmaps/data-engineer/content/hightouch@8NTe5-XQ5tKAWUyg1rnzb.md +++ b/src/data/roadmaps/data-engineer/content/hightouch@8NTe5-XQ5tKAWUyg1rnzb.md @@ -1 +1,5 @@ -# Hightouch \ No newline at end of file +# Hightouch + +Hightouch is a reverse ETL and AI platform crafted for marketing and personalization, allowing companies to uncover insights, execute campaigns, and develop AI agents using their data. It features an AI Decisioning Platform for lifecycle marketing and a Composable Customer Data Platform (CDP) that is adaptable, secure, and quick to deploy, built on top of a data warehouse. 
+ + diff --git a/src/data/roadmaps/data-engineer/content/what-is-cluster-computing@Ad10evrGQuYRl5GaMhQwu.md b/src/data/roadmaps/data-engineer/content/what-is-cluster-computing@Ad10evrGQuYRl5GaMhQwu.md index 305654589..f4efb8e98 100644 --- a/src/data/roadmaps/data-engineer/content/what-is-cluster-computing@Ad10evrGQuYRl5GaMhQwu.md +++ b/src/data/roadmaps/data-engineer/content/what-is-cluster-computing@Ad10evrGQuYRl5GaMhQwu.md @@ -1 +1,11 @@ -# What is Cluster Computing \ No newline at end of file +# What is Cluster Computing + +Cluster computing is a type of distributed computing where multiple computers are connected so they work together as a single system. By working together, a cluster of machines can address complex tasks with higher computational power and efficiency. +The term “cluster” refers to the network of linked computer systems programmed to perform the same task. Computing clusters typically consist of servers, workstations and personal computers (PCs) that communicate over a local area network (LAN) or a wide area network (WAN). Each computer, or “node,” in the network has an operating system (OS) and a central processing unit (CPU) core that handles the tasks required for the software to run properly. + +Visit the following resources to learn more: + +- [@article@What is cluster computing? - IBM](https://www.ibm.com/think/topics/cluster-computing) +- [@article@What is cluster computing? - AWS](https://aws.amazon.com/what-is/cluster-computing/) +- [@article@Computer cluster - Wikipedia](http://en.wikipedia.org/wiki/Computer_cluster) +- [@video@Understand the Basic Cluster Concepts](https://www.youtube.com/watch?v=8BBDxzJL6fY)