Content Recommender

All URLs in your content database.

Showing 20 of 636 URL(s) (Page 14 of 32)

3 Key Points to Help You Partition Late Arriving Events

www.startdataengineering.com
One of the most common issues when ingesting and processing user-generated events is how to deal with late-arriving events. Yet this topic is not extensively discussed. Some of the general issues that data engineers usually have are “What shou...

A proven approach to land a Data Engineering job

www.startdataengineering.com
I have seen and been asked the following questions by students, backend engineers and analysts who want to get into the data engineering industry: What approach should I take to land a Data Engineering job? I really want to get into DE. What can I do ...

What Does It Mean for a Column to Be Indexed

www.startdataengineering.com
When optimizing queries on a database table, most developers tend to just create an index on the field to be queried. They have questions like: I don’t really understand what it means for a column to be “indexed” in addition to simply boosting the efficien...
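One way to see what indexing a column changes is to compare the query plan before and after creating the index. The sketch below uses Python's built-in `sqlite3` module as a stand-in for whatever database the article covers; the `users` table and `idx_users_email` index are hypothetical. Without the index the engine is expected to scan every row; with it, it searches the index instead.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column in every SQLite version's output.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table used only for illustration.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users (email, country) VALUES (?, ?)",
    [(f"user{i}@example.com", "US" if i % 2 else "DE") for i in range(1000)],
)
conn.commit()

sql = "SELECT id FROM users WHERE email = ?"
before = query_plan(conn, sql, ("user42@example.com",))

# Hypothetical index on the column being filtered.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, sql, ("user42@example.com",))

print(before)  # expected: a full-table SCAN of users
print(after)   # expected: a SEARCH that uses idx_users_email
```

The exact wording of the plan text differs between SQLite versions, so read it for the SCAN versus SEARCH distinction rather than the precise string.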

What, why, when to use Apache Kafka, with an example

www.startdataengineering.com
I have seen, heard and been asked questions and comments like: What is Kafka and when should I use it? I don’t understand why we have to use Kafka. The objective of this post is to get you up to speed with what Apache Kafka is, when to use it and ...

Ensuring Data Quality, With Great Expectations

www.startdataengineering.com
What is data quality? As the name suggests, it refers to the quality of our data. Quality should be defined based on your project requirements. It can be as simple as ensuring that a certain column contains only allowed values or falls within a given ra...
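The two simplest checks named in the snippet, allowed values and a value range, can be sketched in plain Python. This is not the Great Expectations API; the function names, the `status` and `age` columns, and the thresholds are hypothetical.

```python
from typing import Any, Iterable, List, Set


def check_allowed_values(values: Iterable[Any], allowed: Set[Any]) -> List[Any]:
    """Return every value that is not in the allowed set."""
    return [v for v in values if v not in allowed]


def check_range(values: Iterable[float], low: float, high: float) -> List[float]:
    """Return every value outside the inclusive range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


# Hypothetical column data
status = ["active", "inactive", "active", "deleted", "pending"]
age = [25, 41, -3, 130, 67]

bad_status = check_allowed_values(status, {"active", "inactive", "pending"})
bad_age = check_range(age, 0, 120)

print(bad_status)  # ['deleted']
print(bad_age)     # [-3, 130]
```

Returning the offending values, rather than a bare pass/fail flag, makes it easier to report why a check failed.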

Designing a "low-effort" ELT system, using stitch and dbt

www.startdataengineering.com
A very common use case in data engineering is to build an ETL system for a data warehouse that loads data from multiple separate databases, so that data analysts and scientists can run queries on this data, since the sourc...

How to Pull Data from an API, Using AWS Lambda

www.startdataengineering.com
If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in cloud storage, then serverless functions are a good choice. This post aims to answer questions like the ones shown below ...

How to do Change Data Capture (CDC), using Singer

www.startdataengineering.com
Change data capture is a software design pattern used to track every change (update, insert, delete) to the data in a database. In most databases these changes are recorded in an append-only log (Binlog in MySQL, Write Ahead Log in ...
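Because every insert, update and delete lands in an append-only log, a downstream consumer can rebuild the current state of a table by replaying that log in order. The sketch below shows only that idea; it does not use Singer or read a real Binlog/WAL, and the `ChangeEvent` shape is a hypothetical simplification of a change record.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_changes(log: List[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Replay a change log, oldest event first, into a key -> row snapshot."""
    table: Dict[int, Dict[str, Any]] = {}
    for event in log:
        if event.op in ("insert", "update"):
            table[event.key] = dict(event.row)
        elif event.op == "delete":
            table.pop(event.key, None)
        else:
            raise ValueError(f"unknown op: {event.op}")
    return table


log = [
    ChangeEvent("insert", 1, {"name": "alice", "plan": "free"}),
    ChangeEvent("insert", 2, {"name": "bob", "plan": "free"}),
    ChangeEvent("update", 1, {"name": "alice", "plan": "pro"}),
    ChangeEvent("delete", 2),
]
snapshot = apply_changes(log)
print(snapshot)  # {1: {'name': 'alice', 'plan': 'pro'}}
```

Order matters here: replaying the same events out of order could resurrect a deleted row or keep a stale value.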

How to unit test sql transforms in dbt

www.startdataengineering.com
With recent advancements in data warehouses and tools like dbt, most transformations (the T of ELT) are being done directly in the data warehouse. While this provides a lot of functionality out of the box, it gets tricky when you want to test your sq...

How to Join a fact and a type 2 dimension (SCD2) table

www.startdataengineering.com
Contents: Introduction; What is an SCD2 table and why use it?; Setup; Joining fact and SCD2 tables; Conclusion; Further reading. If you are using a data warehouse, you will have heard of fact and dimension tables. Simply put, fact tabl...

How to update millions of records in MySQL?

www.startdataengineering.com
Contents: Introduction; Setup; Problems with a single large update; Updating in batches; Conclusion; Further reading. When updating a large number of records in an OLTP database such as MySQL, you have to be mindful of locking the records. If ...
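The "updating in batches" idea can be sketched as one short transaction per primary-key range, so each transaction only touches (and, in MySQL/InnoDB, locks) a bounded set of rows. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL, and the `orders` table, `created_year` column and batch size are hypothetical.

```python
import sqlite3


def update_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Archive pre-2020 orders one primary-key range at a time; return batch count."""
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM orders").fetchone()[0]
    batches = 0
    for start in range(1, max_id + 1, batch_size):
        end = start + batch_size - 1
        # `with conn` commits this batch on success, keeping each transaction short.
        with conn:
            conn.execute(
                "UPDATE orders SET status = 'archived' "
                "WHERE id BETWEEN ? AND ? AND created_year < 2020",
                (start, end),
            )
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_year INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders (created_year, status) VALUES (?, 'open')",
    [(2015 + i % 10,) for i in range(10_000)],
)
conn.commit()

n_batches = update_in_batches(conn, batch_size=1000)
archived = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'archived'"
).fetchone()[0]
print(n_batches, archived)  # 10 5000
```

A single `UPDATE` over all rows would give the same final state, but would hold its locks for the whole run; batching trades a little total time for shorter lock windows.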

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

www.startdataengineering.com
Contents: Introduction; Pre-requisites; Setting up the data-ops pipeline; Conclusion and next steps; Further reading; References. With companies realizing the importance of having correct data, there has been a lot of atte...

Apache Superset Tutorial

www.startdataengineering.com
Contents: Why data exploration; Apache Superset architecture; Setup; Using Apache Superset; Pros and Cons; Conclusion. In most companies, the end users of a data warehouse are analysts, data scientists and business people. Visualizing data is a powerful tool ...

How to trigger a spark job from AWS Lambda

www.startdataengineering.com
Contents: Event driven pipelines; Lambda function to trigger spark jobs; Setup and run; Monitoring and logging; Teardown; Conclusion; Further reading; References. Event-driven systems represent a software design pattern where logic is...

Writing memory efficient data pipelines in Python

www.startdataengineering.com
Contents: Introduction; 1. Using generators; 2. Using distributed frameworks; Conclusion; Further reading; References. If you are wondering how to write memory-efficient data pipelines in Python, or are working with a dataset that is too large to fi...
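Generators let each pipeline stage pull one record at a time from the previous stage, so memory use stays roughly constant regardless of input size. The minimal sketch below chains three generator stages; the column names and the in-memory CSV are hypothetical, and in practice a file object (for example `open("orders.csv", newline="")`, a hypothetical path) would be passed instead.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def read_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Lazily parse CSV lines; only the current row is held in memory."""
    yield from csv.DictReader(lines)


def only_paid(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Filter stage: pass through rows whose status is 'paid'."""
    for row in rows:
        if row["status"] == "paid":
            yield row


def amounts(rows: Iterable[Dict[str, str]]) -> Iterator[float]:
    """Transform stage: convert the amount column to float."""
    for row in rows:
        yield float(row["amount"])


data = io.StringIO("id,status,amount\n1,paid,10.5\n2,refunded,3.0\n3,paid,4.5\n")
total = sum(amounts(only_paid(read_rows(data))))
print(total)  # 15.0
```

Nothing is read until `sum` starts consuming, and no stage ever builds a full list of rows.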

How to gather requirements to re-engineer a legacy data pipeline

www.startdataengineering.com
As a data engineer, you will have to re-engineer legacy data pipelines. While re-engineering data pipelines, if you have struggled with a lack of clarity about deliverables among the project’s stakeholders, or with constantly being qu...

Designing a Data Project to Impress Hiring Managers

www.startdataengineering.com
Contents: Introduction; Objective; Setup; Project; Future Work; Tear down infra; Conclusion; Further Reading; References. Building a data project for your portfolio is hard. Getting hiring managers to read through your GitHub code is ev...

How to make data pipelines idempotent

www.startdataengineering.com
Contents: What is an idempotent function; Pre-requisites; Why idempotency matters; Making your data pipeline idempotent; Conclusion; Further reading; References. What is an idempotent function? “Idempotence is the property of certain operations in mathematics and co...
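Applied to a data pipeline, idempotence means that rerunning the same job for the same input leaves the destination in the same state as running it once. A common way to get there, sketched below as an assumption rather than the article's exact method, is to overwrite the run's partition instead of appending to it. The dict-based "warehouse" and the partition date are hypothetical stand-ins for real tables.

```python
from typing import Dict, List

# Hypothetical warehouse: partition key (run date) -> rows in that partition
Warehouse = Dict[str, List[dict]]


def append_load(wh: Warehouse, run_date: str, rows: List[dict]) -> None:
    """NOT idempotent: every rerun appends the same rows again."""
    wh.setdefault(run_date, []).extend(rows)


def overwrite_load(wh: Warehouse, run_date: str, rows: List[dict]) -> None:
    """Idempotent: the run's partition is replaced, so reruns give the same state."""
    wh[run_date] = list(rows)


rows = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]

appended: Warehouse = {}
overwritten: Warehouse = {}
for _ in range(3):  # simulate the same run being retried three times
    append_load(appended, "2021-01-01", rows)
    overwrite_load(overwritten, "2021-01-01", rows)

print(len(appended["2021-01-01"]))     # 6, duplicated rows
print(len(overwritten["2021-01-01"]))  # 2, same as a single run
```

In SQL terms the overwrite corresponds to deleting (or replacing) the target partition before inserting, ideally within one transaction.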

4 Key Patterns to Load Data Into A Data Warehouse

www.startdataengineering.com
Loading data into a data warehouse is a key component of most data pipelines. If you are wondering how to handle SQL loads, or what patterns are used to load data into a data warehouse, then this post is for you. In this post, we go over...

How to Validate Datatypes in Python

www.startdataengineering.com
Contents: Introduction; Using Native Python; Using Pydantic; Pydantic Caveats; Conclusion; Further reading; References. Data type issues are one of the biggest concerns when processing data in Python. If you are wondering how to make sure that a column i...
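A "native Python" type check can be as small as comparing each value against an expected type with `isinstance`. The sketch below is one such approach, not necessarily the article's; the `SCHEMA` mapping and the sample rows are hypothetical. Note the special case for `bool`, which is a subclass of `int` in Python.

```python
from typing import Any, Dict, List

# Hypothetical expected schema: column name -> Python type
SCHEMA: Dict[str, type] = {"id": int, "name": str, "price": float}


def matches(value: Any, expected: type) -> bool:
    # isinstance(True, int) is True, so reject bools unless bool is expected.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def validate_row(row: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return readable type errors for one row; an empty list means valid."""
    errors = []
    for column, expected in schema.items():
        if column not in row:
            errors.append(f"{column}: missing")
        elif not matches(row[column], expected):
            got = type(row[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {got}")
    return errors


rows = [
    {"id": 1, "name": "widget", "price": 9.99},
    {"id": "2", "name": "gadget", "price": 5.0},
    {"id": 3, "name": "gizmo"},
]
results = [validate_row(r, SCHEMA) for r in rows]
for r in results:
    print(r)
```

This stays dependency-free; a library such as Pydantic adds coercion and nested models on top of the same idea.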