Content Recommender

What, why, when to use Apache Kafka, with an example

https://www.startdataengineering.com/post/what-why-and-how-apache-kafka/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:53

Status: ✓ Success

Text length: 10000 characters

What, why, when to use Apache Kafka, with an example. I have seen, heard and been asked questions and comments like "What is Kafka and when should I use it?" and "I don’t understand why we have to use Kafka". The objective of this post is to get you up to speed with what Apache Kafka is, when to use it and ...

Ensuring Data Quality, With Great Expectations

https://www.startdataengineering.com/post/ensuring-data-quality-with-great-expectations/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:52

Status: ✓ Success

Text length: 10000 characters

Ensuring Data Quality, With Great Expectations. What is data quality? As the name suggests, it refers to the quality of our data. Quality should be defined based on your project requirements. It can be as simple as ensuring a certain column has only the allowed values present or falls within a given ra...

Designing a "low-effort" ELT system, using stitch and dbt

https://www.startdataengineering.com/post/build-a-simple-data-engineering-platform/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:52

Status: ✓ Success

Text length: 10000 characters

Designing a "low-effort" ELT system, using stitch and dbt. Intro: A very common use case in data engineering is to build an ETL system for a data warehouse, loading data in from multiple separate databases so that data analysts/scientists can run queries on this data, since the sourc...

How to Pull Data from an API, Using AWS Lambda

https://www.startdataengineering.com/post/pull-data-from-api-using-lambda-s3/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:51

Status: ✓ Success

Hacker News: 🟠 4 points, 0 comments

Text length: 10000 characters

How to Pull Data from an API, Using AWS Lambda. Introduction: If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in cloud storage, then serverless functions are a good choice. This post aims to answer questions like the ones shown below ...

How to do Change Data Capture (CDC), using Singer

https://www.startdataengineering.com/post/cdc-using-singer/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:50

Status: ✓ Success

Text length: 10000 characters

How to do Change Data Capture (CDC), using Singer. Introduction: Change data capture is a software design pattern used to track every change (update, insert, delete) to the data in a database. In most databases these types of changes are added to an append-only log (Binlog in MySQL, Write Ahead Log in ...

How to unit test sql transforms in dbt

https://www.startdataengineering.com/post/how-to-test-sql-using-dbt/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:49

Status: ✓ Success

Text length: 8052 characters

How to unit test sql transforms in dbt. Introduction: With the recent advancements in data warehouses and tools like dbt, most transformations (the T of ELT) are being done directly in the data warehouse. While this provides a lot of functionality out of the box, it gets tricky when you want to test your sq...

How to Join a fact and a type 2 dimension (SCD2) table

https://www.startdataengineering.com/post/how-to-join-fact-scd2-tables/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:48

Status: ✓ Success

Text length: 10000 characters

How to Join a fact and a type 2 dimension (SCD2) table - Introduction - What is an SCD2 table and why use it? - Setup - Joining fact and SCD2 tables - Conclusion - Further reading Introduction If you are using a data warehouse, you would have heard of fact and dimension tables. Simply put, fact tabl...

How to update millions of records in MySQL?

https://www.startdataengineering.com/post/update-mysql-in-batch/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:48

Status: ✓ Success

Text length: 9355 characters

How to update millions of records in MySQL? - Introduction - Setup - Problems with a single large update - Updating in batches - Conclusion - Further reading Introduction When updating a large number of records in an OLTP database, such as MySQL, you have to be mindful about locking the records. If ...

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

https://www.startdataengineering.com/post/cicd-dbt/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:47

Status: ✓ Success

Text length: 10000 characters

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake - Introduction - Pre-requisites - Setting up the data-ops pipeline - Conclusion and next steps - Further reading - References Introduction With companies realizing the importance of having correct data, there has been a lot of atte...

Apache Superset Tutorial

https://www.startdataengineering.com/post/apache-superset-tutorial/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:47

Status: ✓ Success

Text length: 6890 characters

Apache Superset Tutorial - Why data exploration - Apache Superset architecture - Setup - Using Apache Superset - Pros and Cons - Conclusion Why data exploration In most companies the end users of a data warehouse are analysts, data scientists and business people. Visualizing data is a powerful tool ...

How to trigger a spark job from AWS Lambda

https://www.startdataengineering.com/post/trigger-emr-spark-job-from-lambda/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:46

Status: ✓ Success

Text length: 8456 characters

How to trigger a spark job from AWS Lambda - Event driven pipelines - Lambda function to trigger spark jobs - Setup and run - Monitoring and logging - Teardown - Conclusion - Further reading - References Event driven pipelines Event driven systems represent a software design pattern where a logic is...

Writing memory efficient data pipelines in Python

https://www.startdataengineering.com/post/writing-memory-efficient-dps-in-python/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:45

Status: ✓ Success

Text length: 9568 characters

Writing memory efficient data pipelines in Python - Introduction - 1. Using generators - 2. Using distributed frameworks - Conclusion - Further reading - References Introduction If you are: wondering how to write memory efficient data pipelines in Python; working with a dataset that is too large to fi...

How to gather requirements to re-engineer a legacy data pipeline

https://www.startdataengineering.com/post/how-to-gather-requirements-reengineering-data-pipeline/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:45

Status: ✓ Success

Text length: 6817 characters

How to gather requirements to re-engineer a legacy data pipeline. Introduction: As data engineers, you will have to re-engineer legacy data pipelines. While re-engineering data pipelines, if you have struggled with: a lack of clarity of deliverables among the project’s stakeholders; constantly being qu...

Designing a Data Project to Impress Hiring Managers

https://www.startdataengineering.com/post/data-engineering-project-to-impress-hiring-managers/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:44

Status: ✓ Success

Text length: 10000 characters

Designing a Data Project to Impress Hiring Managers - Introduction - Objective - Setup - Project - Future Work - Tear down infra - Conclusion - Further Reading - References Introduction Building a data project for your portfolio is hard. Getting hiring managers to read through your GitHub code is ev...

How to make data pipelines idempotent

https://www.startdataengineering.com/post/why-how-idempotent-data-pipeline/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:44

Status: ✓ Success

Text length: 6161 characters

How to make data pipelines idempotent - What is an idempotent function - Pre-requisites - Why idempotency matters - Making your data pipeline idempotent - Conclusion - Further reading - References What is an idempotent function “Idempotence is the property of certain operations in mathematics and co...

4 Key Patterns to Load Data Into A Data Warehouse

https://www.startdataengineering.com/post/patterns-to-load-data-into-data-warehouse/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:43

Status: ✓ Success

Text length: 5186 characters

4 Key Patterns to Load Data Into A Data Warehouse. Introduction: Loading data into a data warehouse is a key component of most data pipelines. If you are wondering "How to handle SQL loads?" or "What are the patterns used to load data into a data warehouse?", then this post is for you. In this post, we go over...

How to Validate Datatypes in Python

https://www.startdataengineering.com/post/how-to-validate-datatypes-in-python/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:43

Status: ✓ Success

Text length: 7244 characters

How to Validate Datatypes in Python - Introduction - Using Native Python - Using Pydantic - Pydantic Caveats - Conclusion - Further reading - References Introduction Data type issues are one of the biggest concerns when processing data in Python. If you are wondering how to: make sure that a column i...

How to Scale Your Data Pipelines

https://www.startdataengineering.com/post/scale-data-pipelines/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:42

Status: ✓ Success

Text length: 5281 characters

How to Scale Your Data Pipelines - 1. Introduction - 2. What is scaling & why do we need it? - 3. Types of scaling - 4. Choose your scaling strategy - 5. Conclusion - 6. Further reading - 7. References 1. Introduction Choosing tools/frameworks to scale your data pipelines can be confusing. If you ha...

Understand & Deliver on Your Data Engineering Task

https://www.startdataengineering.com/post/how-to-deliver-on-your-de-tasks/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:42

Status: ✓ Success

Text length: 8587 characters

Understand & Deliver on Your Data Engineering Task - 1. Introduction - 2. Understanding your data engineering task - 3. Delivering your data engineering task - 4. Conclusion - 5. Further reading 1. Introduction Congratulations! You are given a quick overview of the business and data architecture and...

What is a staging area?

https://www.startdataengineering.com/post/what-and-why-staging/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:41

Status: ✓ Success

Text length: 3680 characters

What is a staging area? - 1. Introduction - 2. What is a staging area - 3. The advantages of having a staging area - 5. Conclusion - 6. Further reading 1. Introduction Working with data pipelines, you might have noticed a staging area in most data pipelines. If you work in the data space and have qu...
