Can I use Amazon S3 for Hadoop storage instead of HDFS?

I want to configure Amazon EMR to use Amazon Simple Storage Service (Amazon S3) as the Apache Hadoop storage system instead of the Hadoop Distributed File System (HDFS).

Resolution

Although the EMR File System (EMRFS) uses Amazon S3 as storage, you can't configure Amazon EMR to use Amazon S3 as the Hadoop storage layer in place of HDFS. HDFS is an implementation of the Hadoop FileSystem API that models POSIX file system behavior. EMRFS also exposes Amazon S3 through the Hadoop FileSystem API, but Amazon S3 is an object store, not a file system, so EMRFS doesn't provide the same semantics as HDFS. For more information, see Object Stores vs. FileSystems on the Apache Hadoop website.
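
To illustrate one difference between an object store and a file system, the following minimal Python sketch models an object store as a flat map of keys to data. It isn't EMRFS or Amazon S3 code. The ToyObjectStore class, its methods, and the key names are hypothetical and exist only for this example. The sketch assumes that a "directory" is only a shared key prefix, so renaming a directory means copying and deleting every object under that prefix, one at a time.

```python
class ToyObjectStore:
    """Hypothetical in-memory model: a flat key -> bytes map with no real directories."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def list_prefix(self, prefix):
        # A "directory listing" is only a filter over all keys.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def rename_prefix(self, src, dst):
        """Emulate a directory rename as one copy plus one delete per object.

        Returns the number of per-object operations performed.
        """
        ops = 0
        for key in self.list_prefix(src):
            new_key = dst + key[len(src):]
            self._objects[new_key] = self._objects[key]  # copy to the new key
            del self._objects[key]                       # delete the old key
            ops += 2
        return ops


store = ToyObjectStore()
for i in range(3):
    store.put(f"output/_temporary/part-{i}", b"data")

ops = store.rename_prefix("output/_temporary/", "output/final/")
print(ops)                           # 6
print(store.list_prefix("output/"))  # the three keys, now under output/final/
```

In this model, the cost of the rename grows with the number of objects, and a failure partway through the loop leaves some objects under the old prefix and some under the new one. A file system such as HDFS can rename a directory as a single atomic metadata operation.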

For more information about the storage layer in Amazon EMR, see Storage.

For information about when to use each file system, see Work with storage and file systems.

AWS OFFICIAL
Updated a year ago
2 Comments

EMRFS has "file system" in its name, but you are saying it is not a file system. This is very confusing and counterintuitive for users.

replied 2 years ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied a year ago