Hello,
As part of a SaaS solution, I'm currently setting up the structure for an S3 bucket which will contain multiple clients' data.
The idea is to use one access point per client, in order to isolate each client's data.
To be clear, the data is not made accessible to the client (not directly, at least). The bucket is only used to ingest data for processing and analysis purposes.
This data is saved into different folders depending on the source type, so, for example, a given access point could contain
/images/
/logs/
etc.
However, I'm unsure whether I should add extra partitioning to that, for a few reasons.
For example, one is file collisions. Suppose access point A has a file /images/tree.png and access point B then tries to add a file with the same path: how is the collision handled?
That could be solved with something like a hash suffix, but I'd still like to know what would happen.
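
To make the scenario concrete, here's a minimal sketch with boto3 (the region, account id, and access point names are placeholders I made up):

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical access point ARNs for two clients, both backed by the same bucket.
AP_A = "arn:aws:s3:eu-west-1:111122223333:accesspoint/client-a"
AP_B = "arn:aws:s3:eu-west-1:111122223333:accesspoint/client-b"

# Client A uploads an image.
s3.put_object(Bucket=AP_A, Key="images/tree.png", Body=b"...")

# Client B uploads a file with the same key through its own access point.
# My understanding is that both writes target the same underlying object
# in the shared bucket, so this would silently overwrite A's file unless
# versioning is enabled -- this is the part I'd like confirmed.
s3.put_object(Bucket=AP_B, Key="images/tree.png", Body=b"...")
```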
Then there is the question of scalability. This is not an issue per se, but I'm trying to think about what could happen in the future. It seems to me that having an extra partition on top of the access point would make things easier later on, if any migration or refactoring is needed.
My solution would be to add the organisation id as a prefix. Each access point would only have access (through its policy) to objects under a specific prefix, like /12345/*
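
Something like this is what I have in mind (a sketch with boto3; the role name, region, account id, and access point name are all placeholders):

```python
import json
import boto3

s3control = boto3.client("s3control")

ACCOUNT_ID = "111122223333"  # placeholder account id
AP_ARN = f"arn:aws:s3:eu-west-1:{ACCOUNT_ID}:accesspoint/client-a"
ORG_ID = "12345"             # the client's organisation id

# Restrict this access point to objects under the organisation's prefix.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT_ID}:role/client-a-ingest"},
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": f"{AP_ARN}/object/{ORG_ID}/*",
        }
    ],
}

s3control.put_access_point_policy(
    AccountId=ACCOUNT_ID,
    Name="client-a",
    Policy=json.dumps(policy),
)
```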
However, this means that the callers to the access point need to add that prefix too, which adds an extra step for every producer pushing data to the access point, instead of using the access point as if it were a bucket directly.
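
Concretely, every producer would have to do something like this (same placeholder ARN as above):

```python
import boto3

s3 = boto3.client("s3")

AP_A = "arn:aws:s3:eu-west-1:111122223333:accesspoint/client-a"

# Every producer now has to know and prepend the organisation id,
# rather than just writing to "images/tree.png" as if the access
# point were a plain bucket.
s3.put_object(Bucket=AP_A, Key="12345/images/tree.png", Body=b"...")
```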
I'm not sure which way to go, whether I'm overcomplicating things or whether there is a simpler solution, hence my question.
Any advice would be greatly appreciated!