3 Key techniques to optimize your Apache Spark code
- Intro
- Distributed Systems
- Setup
- Optimizing your Spark code
- Technique 1: reduce data shuffle
- Technique 2: use caching when necessary - ...
Similar Articles (10 found)
🔍 65.2% similar
How to submit Spark jobs to EMR cluster from Airflow
Table of Contents
Introduction
I have been asked, and have seen questions about,
how others are automating...
🔍 62.9% similar
How to trigger a spark job from AWS Lambda
- Event driven pipelines
- Lambda function to trigger spark jobs
- Setup and run
- Monitoring and logging
-...
🔍 62.2% similar
What do Snowflake, Databricks, Redshift, BigQuery actually do?
- 1. Introduction
- 2. Analytical databases aggregate large amounts of data
- 3. Most p...
🔍 61.3% similar
Building Cost Efficient Data Pipelines with Python & DuckDB
- 1. Introduction
- 2. Project demo
- 3. TL;DR
- 4. Considerations when building pipelines...
🔍 59.8% similar
Mastering Hadoop, Part 3: Hadoop Ecosystem: Get the most out of your cluster
Exploring the Hadoop ecosystem — key tools to maximize ...
🔍 59.6% similar
How to quickly set up a local Spark development environment?
- 1. Introduction
- 2. Setup
- 3. Use VSCode devcontainers to set up Spark environment
- ...
🔍 59.1% similar
How to improve at SQL as a data engineer
- 1. Introduction
- 2. SQL skills
- 3. Practice
- 4. Conclusion
- 5. Further reading
- 6. References
1. Intro...
🔍 58.5% similar
How to quickly deliver data to business users? #1. Adv Data types & Schema evolution
- 1. Introduction
- 2. Use Schema evolution & advanced data types...
🔍 58.3% similar
Data Engineering Best Practices - #1. Data flow & Code
- 1. Introduction
- 2. Sample project
- 3. Best practices
- 3.1. Use standard patterns that pro...
🔍 57.9% similar
Data Engineering Project for Beginners - Batch edition
- 1. Introduction
- 2. Objective
- 3. Run Data Pipeline
- 4. Architecture
- 5. Code walkthrough...