mirror of
https://github.com/kamranahmedse/developer-roadmap.git
synced 2025-08-19 23:53:24 +02:00
add content to data engineer roadmap
@@ -1 +1,7 @@
# Apache Hadoop YARN

Apache Hadoop YARN (Yet Another Resource Negotiator) is the part of Hadoop that manages cluster resources and schedules jobs. A single ResourceManager arbitrates resources across the whole cluster, while each application gets its own ApplicationMaster that negotiates resources from the ResourceManager and coordinates that application's tasks. Because resource management is separate from any one processing engine, tools such as MapReduce and Spark can share the same cluster, making it more efficient, flexible, and reliable.
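
The division of labor between the ResourceManager and the per-application ApplicationMaster can be illustrated with a toy in-memory model. This is only a sketch: the class names mirror YARN's terminology, but the methods (`request_container`, `submit`), the memory figures, and the job names are hypothetical and are not YARN's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy stand-in for the cluster-wide ResourceManager (not YARN's API)."""
    total_memory_mb: int
    allocated_mb: int = 0

    def request_container(self, memory_mb: int) -> bool:
        """Grant a container only if enough cluster memory is still free."""
        if self.allocated_mb + memory_mb > self.total_memory_mb:
            return False
        self.allocated_mb += memory_mb
        return True


@dataclass
class ApplicationMaster:
    """Toy per-application master: requests one container per task."""
    name: str
    rm: ResourceManager
    running: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def submit(self, tasks: list, memory_mb: int) -> None:
        for task in tasks:
            if self.rm.request_container(memory_mb):
                self.running.append(task)
            else:
                self.pending.append(task)


# Two different frameworks share one cluster through the same ResourceManager.
rm = ResourceManager(total_memory_mb=4096)
mapreduce = ApplicationMaster("mapreduce-job", rm)
spark = ApplicationMaster("spark-job", rm)

mapreduce.submit(["map-0", "map-1"], memory_mb=1024)
spark.submit(["stage-0", "stage-1", "stage-2"], memory_mb=1024)

print(mapreduce.running)             # both map tasks fit
print(spark.running, spark.pending)  # the third Spark task has to wait
```

In this sketch the cluster has room for four 1024 MB containers, so the last Spark task stays pending until capacity frees up. A real cluster also involves NodeManagers and pluggable schedulers, which are left out here.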

Visit the following resources to learn more:

- [@video@Hadoop Yarn Tutorial](https://www.youtube.com/watch?v=6bIF9VwRwE0)
@@ -1 +1,16 @@
# Choosing the Right Technologies

The data engineering ecosystem is expanding rapidly, and selecting the right technologies for your use case can be challenging. Below are some considerations for choosing data technologies across the data engineering lifecycle:

- **Team size and capabilities.** Your team's size determines how much bandwidth it can dedicate to complex solutions. For small teams, try to stick to simple solutions and technologies your team is familiar with.
- **Interoperability.** When choosing a technology or system, you’ll need to ensure that it interacts and operates smoothly with other technologies.
- **Cost optimization and business value.** Consider the direct and indirect costs of a technology and the opportunity cost of choosing some technologies over others.
- **Location.** Companies have many options when it comes to choosing where to run their technology stack, including cloud providers, on-premises systems, hybrid clouds, and multicloud.
- **Build versus buy.** Depending on your needs and capabilities, you can invest in building your own technologies, implement open-source solutions, or purchase proprietary solutions and services.
- **Server versus serverless.** Depending on your needs, you may prefer server-based setups, where developers manage servers, or serverless systems, which hand server management over to cloud providers so that developers can focus on writing code.

Visit the following resources to learn more:

- [@article@Build hybrid and multicloud architectures using Google Cloud](https://cloud.google.com/architecture/hybrid-multicloud-patterns)
- [@article@The Unfulfilled Promise of Serverless](https://www.lastweekinaws.com/blog/the-unfulfilled-promise-of-serverless/)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
@@ -1 +1,16 @@
# Data Engineering Lifecycle

The data engineering lifecycle encompasses the entire process of transforming raw data into a useful end product. It involves several stages, each with specific roles and responsibilities. This lifecycle ensures that data is handled efficiently and effectively, from its initial generation to its final consumption.

The lifecycle involves four stages:

1. Data Generation: Collecting data from various source systems.
2. Data Storage: Safely storing data for future processing and analysis.
3. Data Ingestion: Transforming and bringing data into a centralized system.
4. Data Serving: Providing data to end-users for decision-making and operational purposes.
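
The stages listed above can be sketched as a tiny in-memory pipeline. This is a minimal illustration under simplifying assumptions: the function names (`generate`, `store`, `ingest`, `serve`) and the sample events are made up for the example, and a plain Python list stands in for real storage.

```python
import json


def generate():
    """1. Data generation: a source system emits raw events (made-up sample data)."""
    return [
        '{"user": "ana", "amount": "12.50"}',
        '{"user": "ben", "amount": "7.25"}',
        '{"user": "ana", "amount": "3.00"}',
    ]


def store(raw_events, storage):
    """2. Data storage: keep the raw events untouched for later processing."""
    storage.extend(raw_events)
    return storage


def ingest(storage):
    """3. Data ingestion: parse and type the raw events into one central table."""
    rows = [json.loads(line) for line in storage]
    return [{"user": r["user"], "amount": float(r["amount"])} for r in rows]


def serve(table):
    """4. Data serving: expose an aggregate that an end-user could consume."""
    totals = {}
    for row in table:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


raw_storage = store(generate(), storage=[])
central_table = ingest(raw_storage)
report = serve(central_table)
print(report)  # {'ana': 15.5, 'ben': 7.25}
```

Keeping each stage as its own function mirrors the idea that every stage has a distinct responsibility and can be swapped out independently.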

Visit the following resources to learn more:

- [@article@Data Engineering Lifecycle](https://medium.com/towards-data-engineering/data-engineering-lifecycle-d1e7ee81632e)
- [@video@Getting Into Data Engineering](https://www.youtube.com/watch?v=hZu_87l62J4)
- [@book@Fundamentals of Data Engineering](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/)
@@ -1 +1,8 @@
# Data Engineering vs Data Science

Data engineering and data science are distinct but complementary roles within the field of data. Data engineering focuses on building and maintaining the infrastructure for data collection, storage, and processing, essentially creating the systems that make data available for downstream users. Data science professionals, such as data analysts and data scientists, use that data to extract insights, build predictive models, and ultimately inform decision-making.

Visit the following resources to learn more:

- [@article@Data Scientist vs Data Engineer](https://www.datacamp.com/blog/data-scientist-vs-data-engineer)
- [@video@Should You Be a Data Scientist, Analyst or Engineer?](https://www.youtube.com/watch?v=dUnKYhripIE)
@@ -1 +1,12 @@
# Go

Go, also known as Golang, is a statically typed, compiled programming language designed by Google. It combines the efficiency of compiled languages with the ease of use of dynamically typed interpreted languages. Go features built-in concurrency support through goroutines and channels, making it well-suited for networked and multicore systems. It has a simple and clean syntax, fast compilation times, and efficient garbage collection. Go's standard library is comprehensive, reducing the need for external dependencies. The language emphasizes simplicity and readability, with features like implicit interfaces and a lack of inheritance. Go is particularly popular for building microservices, web servers, and distributed systems. Its performance, simplicity, and robust tooling make it a favored choice for cloud-native development, DevOps tools, and large-scale backend systems.

Visit the following resources to learn more:

- [@roadmap@Visit Dedicated Go Roadmap](https://roadmap.sh/golang)
- [@official@Go Reference Documentation](https://go.dev/doc/)
- [@article@Go by Example - annotated example programs](https://gobyexample.com/)
- [@article@Go, the Programming Language of the Cloud](https://thenewstack.io/go-the-programming-language-of-the-cloud/)
- [@video@Go Programming – Golang Course with Bonus Projects](https://www.youtube.com/watch?v=un6ZyFkqFKo)
- [@feed@Explore top posts about Golang](https://app.daily.dev/tags/golang?ref=roadmapsh)
@@ -1 +1,10 @@
# Introduction

Data engineers are responsible for laying the foundations for the acquisition, storage, transformation, and management of data in an organization. They manage the design, creation, and maintenance of database architecture and data processing systems, ensuring that the subsequent work of analysis, BI, and machine learning model development can be carried out seamlessly, continuously, securely, and effectively.

Data engineers are one of the most technical profiles in the field of data science, bridging the gap between software and application developers and traditional data science positions.

Visit the following resources to learn more:

- [@article@How to Become a Data Engineer in 2025: 5 Steps for Career Success](https://www.datacamp.com/blog/how-to-become-a-data-engineer)
- [@video@What Does a Data Engineer ACTUALLY Do?](https://www.youtube.com/watch?v=hTjo-QVWcK0)
@@ -1 +1,13 @@
# Java

Java has had a big influence on data engineering because many core big data tools and frameworks, like Hadoop, Spark (originally in Scala, which runs on the JVM), and Kafka, are built using Java or run on the Java Virtual Machine (JVM). This means Java’s performance, scalability, and cross-platform capabilities have shaped how large-scale data processing systems are designed.

Visit the following resources to learn more:

- [@course@Introduction to Java by Hyperskill (JetBrains Academy)](https://hyperskill.org/courses/8)
- [@book@Thinking in Java](https://www.amazon.co.uk/Thinking-Java-Eckel-Bruce-February/dp/B00IBON6C6)
- [@book@Effective Java](https://www.amazon.com/Effective-Java-Joshua-Bloch/dp/0134685997)
- [@book@Java: The Complete Reference](https://www.amazon.co.uk/gp/product/B09JL8BMK7/ref=dbs_a_def_rwt_bibl_vppi_i2)
- [@video@Java Tutorial for Beginners](https://www.youtube.com/watch?v=eIrMbAQSU34&feature=youtu.be)
- [@video@Java + DSA + Interview Preparation Course (For beginners)](https://www.youtube.com/playlist?list=PL9gnSGHSqcnr_DxHsP7AW9ftq0AtAyYqJ)
- [@feed@Explore top posts about Java](https://app.daily.dev/tags/java?ref=roadmapsh)
@@ -1 +1,3 @@
# Programming Skills

To be successful as a data engineer, you need to be proficient in coding. This involves knowing the basic concepts and principles that form the foundation of any programming language. These include variables, which store data for processing; control structures, such as loops and conditional statements, which direct the flow of a program; data structures, which organize and store data efficiently; and algorithms, which are step-by-step instructions for solving specific problems or performing specific tasks.
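
The four concepts above can be seen side by side in a short Python sketch. The sample readings, the threshold, and the choice of binary search as the example algorithm are illustrative assumptions, not part of the roadmap itself.

```python
# Variables: names that hold data for processing.
readings = [21.5, 19.0, 23.5, 22.0]
threshold = 20.0

# Control structures: a loop plus a conditional direct the program flow.
above_threshold = []
for value in readings:
    if value > threshold:
        above_threshold.append(value)

# Data structures: a dict organizes values under meaningful keys.
summary = {
    "count": len(above_threshold),
    "max": max(above_threshold),
}


# Algorithms: binary search narrows down a sorted list step by step.
def binary_search(sorted_items, target):
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1  # target is not in the list


position = binary_search(sorted(readings), 22.0)
print(above_threshold, summary, position)
```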
@@ -1 +1,12 @@
# Python

Python’s inherent characteristics and the wealth of resources that have grown around it have made it the data engineer’s language of choice. Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability through the use of significant indentation. Python is dynamically typed and garbage-collected.
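
Two of the traits mentioned above, significant indentation and dynamic typing, show up even in a few lines. The function name `describe` and the sample values are made up for this sketch.

```python
def describe(value):
    # Indentation, not braces, marks the function body and each branch.
    if isinstance(value, str):
        return f"text of length {len(value)}"
    return f"{type(value).__name__} with value {value}"


# Dynamic typing: the same name can be rebound to objects of different types.
record = 42
first = describe(record)
record = "forty-two"
second = describe(record)

print(first)   # int with value 42
print(second)  # text of length 9
```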

Visit the following resources to learn more:

- [@official@Python Website](https://www.python.org/)
- [@article@Python - Wiki](https://en.wikipedia.org/wiki/Python_(programming_language))
- [@article@Tutorial Series: How to Code in Python](https://www.digitalocean.com/community/tutorials/how-to-write-your-first-python-3-program)
- [@article@Google's Python Class](https://developers.google.com/edu/python)
- [@video@Learn Python - Full Course](https://www.youtube.com/watch?v=4M87qBgpafk)
- [@feed@Explore top posts about Python](https://app.daily.dev/tags/python?ref=roadmapsh)
@@ -1 +1,9 @@
# Scala

Scala is a programming language that combines the strengths of object-oriented and functional programming, and it runs on the Java Virtual Machine (JVM). In data engineering, Scala is especially important because Apache Spark, one of the most popular big data processing frameworks, is written in Scala. This means Scala can use Spark’s features directly and efficiently, often with cleaner and more concise code than Java. Its ability to express complex data transformations in less code makes it a powerful tool for building fast, scalable data pipelines.

Visit the following resources to learn more:

- [@official@The Scala Programming Language](https://www.scala-lang.org/)
- [@article@Scala for Beginners: An Introduction](https://daily.dev/blog/scala-for-beginners-an-introduction)
- [@video@Scala Tutorial](https://www.youtube.com/playlist?list=PLS1QulWo1RIagob5D6kMIAvu7DQC5VTh3)
@@ -1 +1,29 @@
# Skills and Responsibilities

Here’s a list of essential data engineering skills:

1. SQL & Database Management: Ability to query, manipulate, and design relational databases efficiently using SQL. This is the bread and butter of extracting, transforming, and analyzing data.

2. Data Modeling: Designing schemas and structures (star, snowflake, normalized forms) to optimize storage, performance, and usability of data.

3. ETL/ELT Development: Building Extract-Transform-Load (or Extract-Load-Transform) pipelines to move and reshape data between systems while ensuring quality and consistency.

4. Big Data Frameworks: Proficiency with tools like Apache Spark, Hadoop, or Flink to process and analyze massive datasets in distributed environments.

5. Cloud Platforms: Working knowledge of AWS, Azure, or GCP for storage, compute, and orchestration (e.g., S3, BigQuery, Dataflow, Redshift).

6. Data Warehousing: Understanding concepts and tools (Snowflake, BigQuery, Redshift) for centralizing, optimizing, and querying large volumes of business data.

7. Workflow Orchestration: Using tools like Apache Airflow, Prefect, or Dagster to automate and schedule complex data pipelines reliably.

8. Scripting & Programming: Strong skills in Python or Scala for building data processing scripts, automation tasks, and integration with APIs.

9. Data Governance & Security: Applying practices for data quality, lineage tracking, access control, compliance (GDPR, HIPAA), and encryption.

10. Monitoring & Performance Optimization: Setting up alerts, logging, and tuning pipelines to ensure they run efficiently, catch errors early, and scale smoothly.
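
Several of the skills above (SQL, ETL development, and scripting) come together in even the smallest pipeline. The following is a minimal sketch using only Python's standard library; the inline CSV data, the `orders` table, and its columns are made up for the example, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Extract: read raw rows from a CSV source (inline sample data for the sketch).
raw_csv = io.StringIO(
    "order_id,customer,amount\n"
    "1,ana,12.50\n"
    "2,ben,\n"
    "3,ana,3.00\n"
)
rows = list(csv.DictReader(raw_csv))

# Transform: drop rows with a missing amount and cast fields to proper types.
clean_rows = [
    (int(r["order_id"]), r["customer"], float(r["amount"]))
    for r in rows
    if r["amount"]
]

# Load: write the cleaned rows into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_rows)
conn.commit()

# Query with SQL: total spend per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('ana', 15.5)]
conn.close()
```

In a production setting, an orchestrator would typically schedule each of these steps and monitoring would flag the dropped row, but the extract, transform, and load structure stays the same.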

Visit the following resources to learn more:

- [@article@Top Data Engineer Skills and Responsibilities](https://www.simplilearn.com/data-engineer-role-article)
- [@article@5 Essential Data Engineering Skills For 2025](https://www.datacamp.com/blog/essential-data-engineering-skills)
- [@video@What skills do you need as a Data Engineer?](https://www.youtube.com/watch?v=sF04UxNAvmg)
@@ -1 +1,9 @@
# What is Data Engineering?

Data engineering is the practice of designing and building systems for the aggregation, storage, and analysis of data at scale. Data engineers excel at creating and deploying algorithms, data pipelines, and workflows that sort raw data into ready-to-use datasets. Data engineering is an integral component of the modern data platform and makes it possible for businesses to analyze and apply the data they receive, regardless of the data source or format.

Visit the following resources to learn more:

- [@article@What is data engineering?](https://www.ibm.com/think/topics/data-engineering)
- [@article@How to Become a Data Engineer in 2025: 5 Steps for Career Success](https://www.datacamp.com/blog/how-to-become-a-data-engineer)
- [@video@How Data Engineering Works?](https://www.youtube.com/watch?v=qWru-b6m030)