Which EC2 instance type to use for a data warehouse running PostgreSQL


Let's say I want to run a data warehouse on an EC2 instance.

It is PostgreSQL-based.

We cannot use RDS because it does not support the extensions that we require.

What would be the best instance type to use for this use case?

I think we need something with a fast local disk rather than EBS, but I am open to any suggestions.

Accepted Answer

Storage Optimized EC2 instances (specifically I3 and I3en) are usually the best choice for data warehousing use cases. If you look at the Storage Optimized Instances section of the Amazon EC2 Instance Types page, you will see that these provide local NVMe SSD storage, which is typically much faster than the SSD-backed volumes generally available in EBS.

Other factors and considerations may come into play, but as general guidance, Storage Optimized instances should be your first choice if they fulfill your requirements. You can also consider D2 and D3 instances, which are in the same Storage Optimized category and offer dense HDD-based local storage, if they fit your use case better than I3 and I3en.

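  • Just ensure you have a backup/BCP (business continuity) strategy, as local instance storage isn’t highly available: data on it is lost if the instance is stopped or terminated, or if the underlying hardware fails.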
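
As a rough illustration of the backup point above, here is a minimal Python sketch that builds a pg_basebackup command writing a compressed base backup to durable storage. The function name, the /mnt/ebs-backups mount point, and the default host and user are all hypothetical placeholders, and the --wal-method option assumes PostgreSQL 10 or later. This is only one possible approach, not a complete backup strategy.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def build_basebackup_command(backup_root, host="localhost", user="postgres", now=None):
    """Build (but do not run) a pg_basebackup command.

    backup_root is assumed to be on durable storage (for example an
    attached EBS volume), not on the instance's local NVMe disks.
    All names and defaults here are illustrative assumptions.
    """
    # Timestamped target directory so successive backups do not collide.
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%dT%H%M%SZ")
    target = PurePosixPath(backup_root) / f"base-{stamp}"
    return [
        "pg_basebackup",
        f"--host={host}",
        f"--username={user}",
        f"--pgdata={target}",   # output directory for the backup
        "--format=tar",         # write tar archives instead of a plain data directory
        "--gzip",               # compress the tar output
        "--wal-method=stream",  # stream WAL alongside the backup (PostgreSQL 10+)
    ]
```

To take the backup, you could pass the resulting list to subprocess.run(cmd, check=True) from a scheduled job and then copy the output to S3 or another durable location.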

