Similar Articles

Articles similar to the selected content.

Domain: www.startdataengineering.com Added: 2025-08-13 Status: ✓ Success
3 Key techniques, to optimize your Apache Spark code - Intro - Distributed Systems - Setup - Optimizing your spark code - Technique 1: reduce data shuffle - Technique 2. Use caching, when necessary - ...
Similar Articles (10 found)
🔍 65.2% similar
How to submit Spark jobs to EMR cluster from Airflow
https://www.startdataengineering.com/post/how-to-submit-spark-jobs-to-emr-cluster-from-airflow/
How to submit Spark jobs to EMR cluster from Airflow Table of Contents Introduction I have been asked and seen the questions how others are automating...
🔍 62.9% similar
How to trigger a spark job from AWS Lambda
https://www.startdataengineering.com/post/trigger-emr-spark-job-from-lambda/
How to trigger a spark job from AWS Lambda - Event driven pipelines - Lambda function to trigger spark jobs - Setup and run - Monitoring and logging -...
🔍 62.2% similar
What do Snowflake, Databricks, Redshift, BigQuery actually do?
https://www.startdataengineering.com/post/sf-v-dbx/
What do Snowflake, Databricks, Redshift, BigQuery actually do? - 1. Introduction - 2. Analytical databases aggregate large amounts of data - 3. Most p...
🔍 61.3% similar
Building Cost Efficient Data Pipelines with Python & DuckDB
https://www.startdataengineering.com/post/cost-effective-pipelines/
Building Cost Efficient Data Pipelines with Python & DuckDB - 1. Introduction - 2. Project demo - 3. TL;DR - 4. Considerations when building pipelines...
🔍 59.8% similar
Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster
https://pub.towardsai.net/mastering-hadoop-part-3-hadoop-ecosystem-get-the-most-out-of-your-cluster-746a94cf5afd?source=rss----98111c9905da---4
Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster - Exploring the Hadoop ecosystem - key tools to maximize ...
🔍 59.6% similar
How to quickly set up a local Spark development environment?
https://www.startdataengineering.com/post/spark-local-setup/
How to quickly set up a local Spark development environment? - 1. Introduction - 2. Setup - 3. Use VSCode devcontainers to set up Spark environment - ...
🔍 59.1% similar
How to improve at SQL as a data engineer
https://www.startdataengineering.com/post/improve-sql-skills-de/
How to improve at SQL as a data engineer - 1. Introduction - 2. SQL skills - 3. Practice - 4. Conclusion - 5. Further reading - 6. References 1. Intro...
🔍 58.5% similar
How to quickly deliver data to business users? #1. Adv Data types & Schema evolution
https://www.startdataengineering.com/post/deliver-data-quickly-with-schema-evolution-and-adv-data-types/
How to quickly deliver data to business users? #1. Adv Data types & Schema evolution - 1. Introduction - 2. Use Schema evolution & advanced data types...
🔍 58.3% similar
Data Engineering Best Practices - #1. Data flow & Code
https://www.startdataengineering.com/post/de_best_practices/
Data Engineering Best Practices - #1. Data flow & Code - 1. Introduction - 2. Sample project - 3. Best practices - 3.1. Use standard patterns that pro...
🔍 57.9% similar
Data Engineering Project for Beginners - Batch edition
https://www.startdataengineering.com/post/data-engineering-project-for-beginners-batch-edition/
Data Engineering Project for Beginners - Batch edition - 1. Introduction - 2. Objective - 3. Run Data Pipeline - 4. Architecture - 5. Code walkthrough...