Content Recommender

All URLs in your content database.

Showing 20 of 636 URLs (page 17 of 32)

SQL or Python for Data Transformations?

www.startdataengineering.com
SQL or Python for Data Transformations? - 1. Introduction - 2. Code is an interface to the execution engine - 3. How to choose the execution engine and the coding interface - 4. Conclusion - 5. Further reading - 6. References 1. Introduction If you follow the data space, you would have noticed two c...
💡 Top Recommendations:

Why use Apache Airflow (or any orchestrator)?

www.startdataengineering.com
Why use Apache Airflow (or any orchestrator)? - 1. Introduction - 2. Features crucial to building and maintaining data pipelines - 3. Conclusion - 4. Further reading 1. Introduction Are you trying to understand why someone would use a system like Airflow (or Dagster) to run simple scripts? If you ar...

How to implement data quality checks with greatexpectations

www.startdataengineering.com
How to implement data quality checks with greatexpectations - 1. Introduction - 2. Project overview - 3. Check your data before making it available to end-users; Write-Audit-Publish(WAP) pattern - 4. TL;DR: How the greatexpectations library works - 5. From an implementation perspective, there are fo...

What are the types of data quality checks?

www.startdataengineering.com
What are the types of data quality checks? - 1. Introduction - 2. Data Quality(DQ) checks are run as part of your pipeline - 3. Run a background data monitoring job - 4. Not all DQ failures require you to stop the pipeline - 5. Cost of DQ checks - 6. Data quality tools - 7. Conclusion - 8. Further r...

Data Engineering Interview Preparation Series #1: Data Structures and Algorithms

www.startdataengineering.com
Data Engineering Interview Preparation Series #1: Data Structures and Algorithms - 1. Introduction - 2. Data structures and algorithms to know - 3. Common DSA questions asked during DE interviews - 4. Company specific research - 5. Conclusion - 6. Further reading 1. Introduction Preparing for data e...

How to build a data project with step-by-step instructions

www.startdataengineering.com
How to build a data project with step-by-step instructions - 1. Introduction - 2. Setup - 3. Parts of data engineering - 3.1. Requirements - 3.2. Identify what tool to use to process data - 3.3. Data flow architecture - 3.4. Data quality implementation - 3.5. Code organization - 3.6. Code testing - ...

What are the Key Parts of Data Engineering?

www.startdataengineering.com
What are the Key Parts of Data Engineering? 1. Introduction If you are trying to break into (or land a new) data engineering job, you will inevitably encounter a slew of data engineering tools. The list of tools/frameworks to know can be overwhelming. If you are wondering What are the parts of data ...

How to use nested data types effectively in SQL

www.startdataengineering.com
How to use nested data types effectively in SQL - 1. Introduction - 2. Code & Data - 3. Using nested data types effectively - 4. Conclusion - 5. Continue reading 1. Introduction If you have worked in the data space, you’d inevitably come across tables with so many columns that it gets difficult to r...

How to decide on a data project for your portfolio

www.startdataengineering.com
How to decide on a data project for your portfolio 1. Introduction Whether you are looking to improve your data skills or building portfolio projects to land a job, you would have faced the issue of deciding what and how to build data projects. If you are Struggling to decide what tools/frameworks t...

25 SQL tips to level up your data engineering skills

www.startdataengineering.com
25 SQL tips to level up your data engineering skills - Introduction - Setup - SQL tips - 1. Handy functions for common data processing scenarios - 1.1. Need to filter on WINDOW function without CTE/Subquery use QUALIFY - 1.2. Need the first/last row in a partition, use DISTINCT ON - 1.3. STRUCT data...

How to reference a seed from a different dbt project?

www.startdataengineering.com
How to reference a seed from a different dbt project? - 1. Introduction - 2. Ways to reuse seed data across multiple dbt projects - 3. dbt deps = download all dependency packages to your local dbt_packages folder - 4. Conclusion - 5. Further reading 1. Introduction If your company has multiple dbt p...

What do Snowflake, Databricks, Redshift, BigQuery actually do?

www.startdataengineering.com
What do Snowflake, Databricks, Redshift, BigQuery actually do? - 1. Introduction - 2. Analytical databases aggregate large amounts of data - 3. Most platforms enable you to do the same thing but have different strengths - 3.1. Understand how the platforms process data - 3.1.1. A compute engine is a ...

Data Engineering Interview Preparation Series #2: System Design

www.startdataengineering.com
Data Engineering Interview Preparation Series #2: System Design - 1. Introduction - 2. Guide the interviewer through the process - 2.1. [Requirements gathering] Make sure you clearly understand the requirements & business use case - 2.2. [Understand source data] Know what you have to work with - 2.3...

How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline?

www.startdataengineering.com
How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline? 1. Introduction If you’ve been in the data space long enough, you would have come across really long SQL scripts that someone had written years ago. However, no one dares to touch them, as they may be powering some i...

How to ensure consistent metrics in your warehouse

www.startdataengineering.com
How to ensure consistent metrics in your warehouse 1. Introduction If you’ve worked on a data team, you’ve likely encountered situations where multiple teams define metrics in slightly different ways, leaving you to untangle why discrepancies exist. The root cause of these metric deviations often st...

Should Data Pipelines in Python be Function based or Object-Oriented (OOP)?

www.startdataengineering.com
Should Data Pipelines in Python be Function based or Object-Oriented (OOP)? - 1. Introduction - 2. Data transformations as functions lead to maintainable code - 3. Objects help track things (aka state) - 4. Class lets you define reusable code and pipeline patterns - 5. Functional code uses objects v...

How to quickly deliver data to business users? #1. Adv Data types & Schema evolution

www.startdataengineering.com
How to quickly deliver data to business users? #1. Adv Data types & Schema evolution - 1. Introduction - 2. Use Schema evolution & advanced data types to quickly deliver new columns to the end-user - 3. Create systems to effectively leverage schema evolution - 3.1. Auto schema evolution is high-risk...

How to Manage Upstream Schema Changes in Data Driven Fast Moving Company

www.startdataengineering.com
How to Manage Upstream Schema Changes in Data Driven Fast Moving Company - 1. Introduction - 2. Strategies for data teams to handle changing schemas - 3. Conclusion - 4. Recommended reading 1. Introduction If you have worked at a company that moves fast (or claims to), you’ve inevitably had to deal w...

Visual Studio Code (VSCode) extensions for data engineers

www.startdataengineering.com
Visual Studio Code (VSCode) extensions for data engineers - 1. Introduction - 2. Python environment setup - 3. VSCode Primer - 4. Extensions overview - 5. Privacy, Performance, and Cognitive Overload - 6. Conclusion - 7. Recommended reading 1. Introduction Whether you are setting up visual studio co...

How to create an SCD2 Table using MERGE INTO with Spark & Iceberg

www.startdataengineering.com
How to create an SCD2 Table using MERGE INTO with Spark & Iceberg - 1. Introduction - 2. MERGE INTO is used to UPDATE/DELETE/INSERT rows into a target table based on data in the source table - 3. SCD2 table pipeline: INSERT new data, UPDATE existing data, and DELETE stale data - 4. Conclusion - 5. R...