Similar Articles

Articles similar to the selected content.

Domain: www.startdataengineering.com Added: 2025-08-13 Status: ✓ Success
How to trigger a spark job from AWS Lambda - Event driven pipelines - Lambda function to trigger spark jobs - Setup and run - Monitoring and logging - Teardown - Conclusion - Further reading - Referen...
Similar Articles (10 found)
🔍 72.5% similar
How to submit Spark jobs to EMR cluster from Airflow
https://www.startdataengineering.com/post/how-to-submit-spark-jobs-to-emr-cluster-from-airflow/
How to submit Spark jobs to EMR cluster from Airflow Table of Contents Introduction I have been asked and seen the questions how others are automating...
🔍 62.9% similar
3 Key techniques, to optimize your Apache Spark code
https://www.startdataengineering.com/post/how-to-optimize-your-spark-jobs/
3 Key techniques, to optimize your Apache Spark code - Intro - Distributed Systems - Setup - Optimizing your spark code - Technique 1: reduce data shu...
🔍 61.9% similar
How to Pull Data from an API, Using AWS Lambda
https://www.startdataengineering.com/post/pull-data-from-api-using-lambda-s3/
How to Pull Data from an API, Using AWS Lambda Introduction If you are looking for a simple, cheap data pipeline to pull small amounts of data from a ...
🟠 HN
🔍 60.2% similar
Data Engineering Project for Beginners - Batch edition
https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/
Data Engineering Project for Beginners - Batch edition - 1. Introduction - 2. Objective - 3. Run Data Pipeline - 4. Architecture - 5. Code walkthrough...
🟠 HN
🔍 59.8% similar
Data Engineering Projects
https://www.startdataengineering.com/post/data-engineering-projects/
Data Engineering Projects 1. Introduction Whether you are new to data engineering or have been in the data field for a few years, one of the most chal...
🔍 59.0% similar
How to test PySpark code with pytest
https://www.startdataengineering.com/post/test-pyspark/
How to test PySpark code with pytest - 1. Introduction - 2. Ensure the code’s logic is working as expected with tests - 3. Conclusion - 4. Further Rea...
🔍 57.9% similar
Setting up end-to-end tests for cloud data pipelines
https://www.startdataengineering.com/post/setting-up-e2e-tests/
Setting up end-to-end tests for cloud data pipelines - 1. Introduction - 2. Setting up services locally - 3. Writing an end-to-end data pipeline test ...
🔍 57.2% similar
Why use Apache Airflow (or any orchestrator)?
https://www.startdataengineering.com/post/why-to-use-orchestrators/
Why use Apache Airflow (or any orchestrator)? - 1. Introduction - 2. Features crucial to building and maintaining data pipelines - 3. Conclusion - 4. ...
🔍 56.7% similar
Data Engineering Best Practices - #2. Metadata & Logging
https://www.startdataengineering.com/post/de_best_practices_log/
Data Engineering Best Practices - #2. Metadata & Logging - 1. Introduction - 2. Setup & Logging architecture - 3. Data Pipeline Logging Best Practices...
🔍 55.8% similar
Designing a "low-effort" ELT system, using stitch and dbt
https://www.startdataengineering.com/post/build-a-simple-data-engineering-platform/
Designing a "low-effort" ELT system, using stitch and dbt Intro A very common use case in data engineering is to build an ETL system for a data warehou...