Content Recommender

All URLs in your content database.

Showing 20 of 636 URL(s) (Page 16 of 32)

Data Pipeline Design Patterns - #2. Coding patterns in Python

www.startdataengineering.com
Data Pipeline Design Patterns - #2. Coding patterns in Python - Introduction - Sample project - Code design patterns - Python helpers - Misc - Conclusion - Further reading - References Introduction Using the appropriate code design pattern can make your code easy to read, extensible, and seamless to...
💡 Top Recommendations:

Change Data Capture, with Debezium

www.startdataengineering.com
Change Data Capture, with Debezium Introduction Change data capture is a pattern where every change to a row in a table is captured and sent to downstream systems. If you have wondered How to ingest data from multiple databases into your data warehouse? How to make data available for analytical quer...
💡 Top Recommendations:

How to become a valuable data engineer

www.startdataengineering.com
How to become a valuable data engineer 1. Introduction So you are a new data engineer (or looking for a DE job) and want to better yourself as a data engineer. However, when you look at job postings or company tech stack, you are overwhelmed by the sheer amount of tools you have to learn! You feel o...
💡 Top Recommendations:

Data Engineering Project: Stream Edition

www.startdataengineering.com
Data Engineering Project: Stream Edition - 1. Introduction - 2. Sample project - 3. Streaming concepts - 4. Future work - 5. Conclusion - 6. Further reading - 7. References 1. Introduction Stream processing differs from batch; one needs to be mindful of the system’s memory, event order, and system r...
💡 Top Recommendations:

Data Engineering Best Practices - #1. Data flow & Code

www.startdataengineering.com
Data Engineering Best Practices - #1. Data flow & Code - 1. Introduction - 2. Sample project - 3. Best practices - 3.1. Use standard patterns that progressively transform your data - 3.2. Ensure data is valid before exposing it to its consumers (aka data quality checks) - 3.3. Avoid data duplicates ...
💡 Top Recommendations:

What is a self-serve data platform & how to build one

www.startdataengineering.com
What is a self-serve data platform & how to build one - 1. Introduction - 2. What is self-serve? - 3. Building a self-serve data platform - 4. Conclusion - 5. Further reading - 6. References 1. Introduction Most companies want to build a self-serve data platform. But what does a self-serve data plat...
💡 Top Recommendations:

What is an Open Table Format? & Why to use one?

www.startdataengineering.com
What is an Open Table Format? & Why to use one? - 1. Introduction - 2. What is an Open Table Format (OTF) - 3. Why use an Open Table Format (OTF) - 4. Conclusion - 5. Further reading - 6. References 1. Introduction If you are in the data space, you might have heard of open table formats such as Apac...
💡 Top Recommendations:

6 Steps to Avoid Messy Data in Your Warehouse

www.startdataengineering.com
6 Steps to Avoid Messy Data in Your Warehouse - 1. Introduction - 2. Six Steps for a Clean Data Warehouse - 2.1. Understand the business - 2.2. Make data easy to use with the appropriate data model - 2.3. Good input data is necessary for a good data warehouse - 2.4. Define Source of Truth (SOT) and ...
💡 Top Recommendations:

Uplevel your dbt workflow with these tools and techniques

www.startdataengineering.com
Uplevel your dbt workflow with these tools and techniques - 1. Introduction - 2. Setup - 3. Ways to uplevel your dbt workflow - 3.1. Reproducible environment - 3.2. Reduce feedback loop time when developing locally - 3.3. Reduce the amount of code to write using dbt packages - 3.4. Validate data bef...
💡 Top Recommendations:

Data Engineering Best Practices - #2. Metadata & Logging

www.startdataengineering.com
Data Engineering Best Practices - #2. Metadata & Logging - 1. Introduction - 2. Setup & Logging architecture - 3. Data Pipeline Logging Best Practices - 3.1. Metadata: Information about pipeline runs, & data flowing through your pipeline - 3.2. Obtain visibility into the code’s execution sequence us...
💡 Top Recommendations:

How to test PySpark code with pytest

www.startdataengineering.com
How to test PySpark code with pytest - 1. Introduction - 2. Ensure the code’s logic is working as expected with tests - 3. Conclusion - 4. Further Reading - 5. References 1. Introduction Have you worked, or are you working with a code base that “moved fast” but had zero to no tests? Every minor feat...
💡 Top Recommendations:

Docker Fundamentals for Data Engineers

www.startdataengineering.com
Docker Fundamentals for Data Engineers 1. Introduction Docker can be overwhelming to start with. Most data projects use Docker to set up the data infra locally (and often in production). Setting up data tools locally without Docker is (usually) a nightmare! The official Docker documentation, while ex...
💡 Top Recommendations:

How to reduce your Snowflake cost

www.startdataengineering.com
How to reduce your Snowflake cost - 1. Introduction - 2. Snowflake pricing and settings inheritance model - 3. Strategies to reduce Snowflake cost - 4. Conclusion - 5. Read more about using Snowflake - 6. References 1. Introduction Most data engineers love Snowflake; it is easy to get started, there...
💡 Top Recommendations:

Building Cost Efficient Data Pipelines with Python & DuckDB

www.startdataengineering.com
Building Cost Efficient Data Pipelines with Python & DuckDB - 1. Introduction - 2. Project demo - 3. TL;DR - 4. Considerations when building pipelines with DuckDB - 4.1. ⭐ Use DuckDB to process data, not for multiple users to access data - 4.2. ✅ Cost calculation: DuckDB + Ephemeral VMs = dirt cheap...
💡 Top Recommendations:

Enable stakeholder data access with Text-to-SQL RAGs

www.startdataengineering.com
Enable stakeholder data access with Text-to-SQL RAGs - 1. Introduction - 2. TL;DR - 3. Enabling Stakeholder data access with RAGs - 3.1. Set up - 3.2. Loading: Read raw data and convert them into LlamaIndex data structures - 3.3. Indexing: Generate & store numerical representation of your data - 3.4...
💡 Top Recommendations:

dbt(Data Build Tool) Tutorial

www.startdataengineering.com
dbt(Data Build Tool) Tutorial 1. Introduction If you are a student, analyst, engineer, or anyone in the data space and are curious about what dbt is and how to use it, then this post is for you. If you are keen to understand why dbt is widely used, please read this article. 2. Dbt, the T in ELT In ...
💡 Top Recommendations:

Build Data Engineering Projects, with Free Template

www.startdataengineering.com
Build Data Engineering Projects, with Free Template - 1. Introduction - 2. Run Data Pipeline - 3. Architecture and services in this template - 4. CI/CD setup - 5. Putting it all together with a Makefile - 6. Data projects using other tools and services - 7. Conclusion - 8. Further reading - 9. Refer...
💡 Top Recommendations:

Python Essentials for Data Engineers

www.startdataengineering.com
Python Essentials for Data Engineers - Introduction - Data is stored on disk and processed in memory - Practicing Python - Python basics - Python is used for extracting data from sources, transforming it, & loading it into a destination - [Extract & Load] Read and write data to any system - [Transfo...
💡 Top Recommendations:

Data Engineering Projects

www.startdataengineering.com
Data Engineering Projects 1. Introduction Whether you are new to data engineering or have been in the data field for a few years, one of the most challenging parts of learning new frameworks is setting them up! Data infra is notoriously hard to set up. You want to improve your skills on a specific t...
💡 Top Recommendations:

Data Engineering Project for Beginners - Batch edition

www.startdataengineering.com
Data Engineering Project for Beginners - Batch edition - 1. Introduction - 2. Objective - 3. Run Data Pipeline - 4. Architecture - 5. Code walkthrough - 6. Design considerations - 7. Next steps - 8. Conclusion - 9. Further reading - 10. References 1. Introduction An actual data engineering project u...
💡 Top Recommendations: