
Introduction

This section provides best practice guidance and tools to migrate data processing applications from self-managed environments to Amazon EMR.

  1. EMR Migration Guide: A comprehensive technical document that provides guidance for migrating components such as data, applications, and security configurations from self-managed data processing applications to Amazon EMR.

  2. Data Migration: We recommend using AWS DataSync to migrate data from HDFS to Amazon S3. Start with the DataSync support for HDFS blog post to review DataSync capabilities and learn how to get started with data migration.

  3. Data Pipeline Migration: The following tools can be useful for migrating your current data pipelines to AWS:

    1. Oozie to MWAA
    2. Oozie to Step Functions
  4. Data Governance: The following tools can be helpful for migrating your current data catalogs to AWS:

    1. Migrate metadata between Hive metastore and AWS Glue Data Catalog
    2. Hive Glue Catalog Sync Agent
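For the HDFS-to-S3 data migration recommended in item 2, the sketch below shows roughly how a DataSync transfer might be wired up with boto3: create an HDFS source location, an S3 destination location, a task linking them, and then start one execution. It is a minimal illustration under stated assumptions, not a reference implementation. The NameNode host, agent ARN, bucket, IAM role, and the helper function names are hypothetical placeholders. The NameNode port of 8020 is only a common Hadoop RPC default, so check `fs.defaultFS` on your cluster. Simple (non-Kerberos) authentication is assumed.

```python
"""Minimal sketch: copy an HDFS directory to Amazon S3 with AWS DataSync via boto3.

Hostnames, ARNs, paths, and the helper function names are hypothetical placeholders.
Assumes a DataSync agent is already deployed and can reach the HDFS NameNode.
"""
from typing import Any, Dict, List


def hdfs_location_params(namenode_host: str, agent_arns: List[str],
                         subdirectory: str = "/", port: int = 8020,
                         simple_user: str = "hdfs") -> Dict[str, Any]:
    """Keyword arguments for DataSync CreateLocationHdfs using SIMPLE (non-Kerberos) auth."""
    return {
        "NameNodes": [{"Hostname": namenode_host, "Port": port}],
        "Subdirectory": subdirectory,
        "AuthenticationType": "SIMPLE",
        "SimpleUser": simple_user,
        "AgentArns": agent_arns,
    }


def s3_location_params(bucket_name: str, access_role_arn: str,
                       subdirectory: str = "/") -> Dict[str, Any]:
    """Keyword arguments for DataSync CreateLocationS3; the role must be able to write to the bucket."""
    return {
        "S3BucketArn": f"arn:aws:s3:::{bucket_name}",
        "Subdirectory": subdirectory,
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def migrate_hdfs_to_s3(namenode_host: str, agent_arns: List[str], bucket_name: str,
                       access_role_arn: str, hdfs_dir: str = "/", s3_prefix: str = "/",
                       task_name: str = "hdfs-to-s3") -> str:
    """Create both locations and a task, start one execution, and return its ARN.

    Requires boto3 and AWS credentials; nothing here waits for the copy to finish.
    """
    import boto3  # imported lazily so the parameter builders above work without boto3

    client = boto3.client("datasync")
    src = client.create_location_hdfs(
        **hdfs_location_params(namenode_host, agent_arns, subdirectory=hdfs_dir))
    dst = client.create_location_s3(
        **s3_location_params(bucket_name, access_role_arn, subdirectory=s3_prefix))
    task = client.create_task(
        SourceLocationArn=src["LocationArn"],
        DestinationLocationArn=dst["LocationArn"],
        Name=task_name,
    )
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]
```

Keeping the request parameters in plain functions makes them easy to review before anything touches AWS. For a Kerberized cluster, the HDFS location would instead use `AuthenticationType="KERBEROS"` together with `KerberosPrincipal`, `KerberosKeytab`, and `KerberosKrb5Conf`. Progress of a started execution can be polled with `describe_task_execution`.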

For further assistance, reach out to aws-bdms-emr@amazon.com.