Introduction

Welcome to the EMR Best Practices Guides. The goal of this project is to offer a set of best practices, templates, and guides for operating Amazon EMR. We elected to publish this guidance on GitHub so we could iterate quickly, provide timely and effective recommendations for a variety of concerns, and easily incorporate suggestions from the broader community.

We have currently published guides for the following topics:

  • Cost Optimizations
  • Reliability
  • Security
  • Features
    • Managed Scaling
    • Spot Usage
  • Applications
    • Hadoop
    • HBase
    • Hive
    • Spark
  • Architecture
    • Batch
    • Ad Hoc
    • Notebooks
    • Datalake Storage
  • Amazon EMR Utilities (GitHub)
  • Amazon EMR Benchmarking Guide

Contributing

We encourage you to contribute to these guides. If you have implemented a practice that has proven to be effective, please share it with us by opening an issue or a pull request. Similarly, if you discover an error or flaw in the guidance we've already published, please submit a PR to correct it.