Content Recommender

What are the types of data quality checks?

https://www.startdataengineering.com/post/types-of-dq-checks/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:17

Status: ✓ Success

Text length: 9836 characters

What are the types of data quality checks? - 1. Introduction - 2. Data Quality(DQ) checks are run as part of your pipeline - 3. Run a background data monitoring job - 4. Not all DQ failures require you to stop the pipeline - 5. Cost of DQ checks - 6. Data quality tools - 7. Conclusion - 8. Further r...

Data Engineering Interview Preparation Series #1: Data Structures and Algorithms

https://www.startdataengineering.com/post/de_interview_dsa/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:16

Status: ✓ Success

Text length: 10000 characters

Data Engineering Interview Preparation Series #1: Data Structures and Algorithms - 1. Introduction - 2. Data structures and algorithms to know - 3. Common DSA questions asked during DE interviews - 4. Company specific research - 5. Conclusion - 6. Further reading 1. Introduction Preparing for data e...

How to build a data project with step-by-step instructions

https://www.startdataengineering.com/post/de-proj-step-by-step/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:15

Status: ✓ Success

Text length: 10000 characters

How to build a data project with step-by-step instructions - 1. Introduction - 2. Setup - 3. Parts of data engineering - 3.1. Requirements - 3.2. Identify what tool to use to process data - 3.3. Data flow architecture - 3.4. Data quality implementation - 3.5. Code organization - 3.6. Code testing - ...

What are the Key Parts of Data Engineering?

https://www.startdataengineering.com/post/parts-of-dataengineering/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:15

Status: ✓ Success

Text length: 5468 characters

What are the Key Parts of Data Engineering? 1. Introduction If you are trying to break into (or land a new) data engineering job, you will inevitably encounter a slew of data engineering tools. The list of tools/frameworks to know can be overwhelming. If you are wondering What are the parts of data ...

How to use nested data types effectively in SQL

https://www.startdataengineering.com/post/use-structs-sql/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:14

Status: ✓ Success

Text length: 10000 characters

How to use nested data types effectively in SQL - 1. Introduction - 2. Code & Data - 3. Using nested data types effectively - 4. Conclusion - 5. Continue reading 1. Introduction If you have worked in the data space, you’d inevitably come across tables with so many columns that it gets difficult to r...

How to decide on a data project for your portfolio

https://www.startdataengineering.com/post/what-data-project-to-build/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:14

Status: ✓ Success

Text length: 7753 characters

How to decide on a data project for your portfolio 1. Introduction Whether you are looking to improve your data skills or building portfolio projects to land a job, you would have faced the issue of deciding what and how to build data projects. If you are Struggling to decide what tools/frameworks t...

25 SQL tips to level up your data engineering skills

https://www.startdataengineering.com/post/n-sql-tips-de/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:13

Status: ✓ Success

Text length: 10000 characters

25 SQL tips to level up your data engineering skills - Introduction - Setup - SQL tips - 1. Handy functions for common data processing scenarios - 1.1. Need to filter on WINDOW function without CTE/Subquery use QUALIFY - 1.2. Need the first/last row in a partition, use DISTINCT ON - 1.3. STRUCT data...

How to reference a seed from a different dbt project?

https://www.startdataengineering.com/post/ref-seed-from-diff-dbt-project/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:12

Status: ✓ Success

Text length: 5130 characters

How to reference a seed from a different dbt project? - 1. Introduction - 2. Ways to reuse seed data across multiple dbt projects - 3. dbt deps = download all dependency packages to your local dbt_packages folder - 4. Conclusion - 5. Further reading 1. Introduction If your company has multiple dbt p...

What do Snowflake, Databricks, Redshift, BigQuery actually do?

https://www.startdataengineering.com/post/sf-v-dbx/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:12

Status: ✓ Success

Text length: 10000 characters

What do Snowflake, Databricks, Redshift, BigQuery actually do? - 1. Introduction - 2. Analytical databases aggregate large amounts of data - 3. Most platforms enable you to do the same thing but have different strengths - 3.1. Understand how the platforms process data - 3.1.1. A compute engine is a ...

Data Engineering Interview Preparation Series #2: System Design

https://www.startdataengineering.com/post/de_interview_sd/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:11

Status: ✓ Success

Text length: 10000 characters

Data Engineering Interview Preparation Series #2: System Design - 1. Introduction - 2. Guide the interviewer through the process - 2.1. [Requirements gathering] Make sure you clearly understand the requirements & business use case - 2.2. [Understand source data] Know what you have to work with - 2.3...

How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline?

https://www.startdataengineering.com/post/quick-scalable-business-value-pipeline/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:10

Status: ✓ Success

Text length: 5800 characters

How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline? 1. Introduction If you’ve been in the data space long enough, you would have come across really long SQL scripts that someone had written years ago. However, no one dares to touch them, as they may be powering some i...

How to ensure consistent metrics in your warehouse

https://www.startdataengineering.com/post/metrics_sot/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:10

Status: ✓ Success

Text length: 3347 characters

How to ensure consistent metrics in your warehouse 1. Introduction If you’ve worked on a data team, you’ve likely encountered situations where multiple teams define metrics in slightly different ways, leaving you to untangle why discrepancies exist. The root cause of these metric deviations often st...

Should Data Pipelines in Python be Function based or Object-Oriented (OOP)?

https://www.startdataengineering.com/post/python-fp-v-oop/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:09

Status: ✓ Success

Text length: 10000 characters

Should Data Pipelines in Python be Function based or Object-Oriented (OOP)? - 1. Introduction - 2. Data transformations as functions lead to maintainable code - 3. Objects help track things (aka state) - 4. Class lets you define reusable code and pipeline patterns - 5. Functional code uses objects v...

How to quickly deliver data to business users? #1. Adv Data types & Schema evolution

https://www.startdataengineering.com/post/deliver-data-quickly-with-schema-evolution-and-adv-data-types/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:08

Status: ✓ Success

Text length: 10000 characters

How to quickly deliver data to business users? #1. Adv Data types & Schema evolution - 1. Introduction - 2. Use Schema evolution & advanced data types to quickly deliver new columns to the end-user - 3. Create systems to effectively leverage schema evolution - 3.1. Auto schema evolution is high-risk...

How to Manage Upstream Schema Changes in Data Driven Fast Moving Company

https://www.startdataengineering.com/post/how-to-manage-upstream-schema-changes-in-data-driven-fast-moving-company/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:08

Status: ✓ Success

Text length: 4834 characters

How to Manage Upstream Schema Changes in Data Driven Fast Moving Company - 1. Introduction - 2. Strategies for data teams to handle changing schemas - 3. Conclusion - 4. Recommended reading 1. Introduction If you have worked at a company that moves fast (or claims to), you’ve inevitably had to deal w...

Visual Studio Code (VSCode) extensions for data engineers

https://www.startdataengineering.com/post/vscode-extensions-for-data-engineers/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:08

Status: ✓ Success

Text length: 5515 characters

Visual Studio Code (VSCode) extensions for data engineers - 1. Introduction - 2. Python environment setup - 3. VSCode Primer - 4. Extensions overview - 5. Privacy, Performance, and Cognitive Overload - 6. Conclusion - 7. Recommended reading 1. Introduction Whether you are setting up visual studio co...

How to create an SCD2 Table using MERGE INTO with Spark & Iceberg

https://www.startdataengineering.com/post/create-scd2-table-with-merge-into-with-spark-iceberg/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:07

Status: ✓ Success

Text length: 9871 characters

How to create an SCD2 Table using MERGE INTO with Spark & Iceberg - 1. Introduction - 2. MERGE INTO is used to UPDATE/DELETE/INSERT rows into a target table based on data in the source table - 3. SCD2 table pipeline: INSERT new data, UPDATE existing data, and DELETE stale data - 4. Conclusion - 5. R...

Data Engineering Interview Preparation Series #3: SQL

https://www.startdataengineering.com/post/de_interview_sql/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:06

Status: ✓ Success

Text length: 8080 characters

Data Engineering Interview Preparation Series #3: SQL - 1. Introduction - 2. Step-by-step process to solve any SQL interview question - 3. Lead the conversation with a step-by-step approach and stating assumptions - 4. Conclusion - 5. Further reading 1. Introduction Every data engineering interview ...

How to Extract Data from APIs for Data Pipelines using Python

https://www.startdataengineering.com/post/how-to-extract-data-from-api-for-data-pipelines/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:06

Status: ✓ Success

Text length: 10000 characters

How to Extract Data from APIs for Data Pipelines using Python - 1. Introduction - 2. APIs are a way to communicate between systems on the Internet - 3. API Data extraction = GET-ting data from a server - 4. Conclusion - 5. Further reading 1. Introduction Extracting data is one of the critical skills...

CTEs(Common Table Expression) or Temporary Tables for Spark SQL

https://www.startdataengineering.com/post/cte-or-temp-table/

Domain: www.startdataengineering.com

Added: 2025-08-13 20:55:05

Status: ✓ Success

Text length: 8142 characters

CTEs(Common Table Expression) or Temporary Tables for Spark SQL - 1. Introduction - 2. CTE for short clean code & temp tables for re-usability - 3. Conclusion - 4. Recommended reading 1. Introduction As a data engineer, CTEs(Common Table Expression) are one of the best techniques you can use to impr...
