What are the types of data quality checks?
- 1. Introduction
- 2. Data Quality (DQ) checks are run as part of your pipeline
- 3. Run a background data monitoring job
- 4. Not all DQ failures require you to stop the pipeline
- 5. Cost of DQ checks
- 6. Data quality tools
- 7. Conclusion
- 8. Further r...
Data Engineering Interview Preparation Series #1: Data Structures and Algorithms
- 1. Introduction
- 2. Data structures and algorithms to know
- 3. Common DSA questions asked during DE interviews
- 4. Company specific research
- 5. Conclusion
- 6. Further reading
1. Introduction
Preparing for data e...
How to build a data project with step-by-step instructions
- 1. Introduction
- 2. Setup
- 3. Parts of data engineering
- 3.1. Requirements
- 3.2. Identify what tool to use to process data
- 3.3. Data flow architecture
- 3.4. Data quality implementation
- 3.5. Code organization
- 3.6. Code testing
- ...
What are the Key Parts of Data Engineering?
1. Introduction
If you are trying to break into data engineering (or land a new data engineering job), you will inevitably encounter a slew of data engineering tools. The list of tools and frameworks to know can be overwhelming. If you are wondering
What are the parts of data ...
How to use nested data types effectively in SQL
- 1. Introduction
- 2. Code & Data
- 3. Using nested data types effectively
- 4. Conclusion
- 5. Continue reading
1. Introduction
If you have worked in the data space, you have inevitably come across tables with so many columns that it gets difficult to r...
How to decide on a data project for your portfolio
1. Introduction
Whether you are looking to improve your data skills or building portfolio projects to land a job, you have likely faced the problem of deciding which data projects to build and how to build them. If you are
Struggling to decide what tools/frameworks t...
25 SQL tips to level up your data engineering skills
- Introduction
- Setup
- SQL tips
- 1. Handy functions for common data processing scenarios
- 1.1. Need to filter on WINDOW function without CTE/Subquery use QUALIFY
- 1.2. Need the first/last row in a partition, use DISTINCT ON
- 1.3. STRUCT data...
How to reference a seed from a different dbt project?
- 1. Introduction
- 2. Ways to reuse seed data across multiple dbt projects
- 3. dbt deps = download all dependency packages to your local dbt_packages folder
- 4. Conclusion
- 5. Further reading
1. Introduction
If your company has multiple dbt p...
What do Snowflake, Databricks, Redshift, BigQuery actually do?
- 1. Introduction
- 2. Analytical databases aggregate large amounts of data
- 3. Most platforms enable you to do the same thing but have different strengths
- 3.1. Understand how the platforms process data
- 3.1.1. A compute engine is a ...
Data Engineering Interview Preparation Series #2: System Design
- 1. Introduction
- 2. Guide the interviewer through the process
- 2.1. [Requirements gathering] Make sure you clearly understand the requirements & business use case
- 2.2. [Understand source data] Know what you have to work with
- 2.3...
How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline?
1. Introduction
If you’ve been in the data space long enough, you have probably come across really long SQL scripts that someone wrote years ago. However, no one dares to touch them, as they may be powering some i...
How to ensure consistent metrics in your warehouse
1. Introduction
If you’ve worked on a data team, you’ve likely encountered situations where multiple teams define metrics in slightly different ways, leaving you to untangle why discrepancies exist.
The root cause of these metric deviations often st...
Should Data Pipelines in Python be Function based or Object-Oriented (OOP)?
- 1. Introduction
- 2. Data transformations as functions lead to maintainable code
- 3. Objects help track things (aka state)
- 4. Class lets you define reusable code and pipeline patterns
- 5. Functional code uses objects v...
How to quickly deliver data to business users? #1. Adv Data types & Schema evolution
- 1. Introduction
- 2. Use Schema evolution & advanced data types to quickly deliver new columns to the end-user
- 3. Create systems to effectively leverage schema evolution
- 3.1. Auto schema evolution is high-risk...
How to Manage Upstream Schema Changes in Data Driven Fast Moving Company
- 1. Introduction
- 2. Strategies for data teams to handle changing schemas
- 3. Conclusion
- 4. Recommended reading
1. Introduction
If you have worked at a company that moves fast (or claims to), you’ve inevitably had to deal w...
Visual Studio Code (VSCode) extensions for data engineers
- 1. Introduction
- 2. Python environment setup
- 3. VSCode Primer
- 4. Extensions overview
- 5. Privacy, Performance, and Cognitive Overload
- 6. Conclusion
- 7. Recommended reading
1. Introduction
Whether you are setting up visual studio co...
How to create an SCD2 Table using MERGE INTO with Spark & Iceberg
- 1. Introduction
- 2. MERGE INTO is used to UPDATE/DELETE/INSERT rows into a target table based on data in the source table
- 3. SCD2 table pipeline: INSERT new data, UPDATE existing data, and DELETE stale data
- 4. Conclusion
- 5. R...
Data Engineering Interview Preparation Series #3: SQL
- 1. Introduction
- 2. Step-by-step process to solve any SQL interview question
- 3. Lead the conversation with a step-by-step approach and stating assumptions
- 4. Conclusion
- 5. Further reading
1. Introduction
Every data engineering interview ...
How to Extract Data from APIs for Data Pipelines using Python
- 1. Introduction
- 2. APIs are a way to communicate between systems on the Internet
- 3. API Data extraction = GET-ting data from a server
- 4. Conclusion
- 5. Further reading
1. Introduction
Extracting data is one of the critical skills...
CTEs(Common Table Expression) or Temporary Tables for Spark SQL
- 1. Introduction
- 2. CTE for short clean code & temp tables for re-usability
- 3. Conclusion
- 4. Recommended reading
1. Introduction
As a data engineer, CTEs (Common Table Expressions) are one of the best techniques you can use to impr...