When creating a data set within ADX via S3 access, certain directories do not load automatically and I have to force a reload to see them.


Hi Everyone,

When I create a data set with the data set type "Amazon S3 data access" and then go to configure the data access, I'm noticing an issue when selecting the S3 directory that I intend to use.

The top-level directories and the directories nested within those load automatically for me, but when I browse into a certain third-level directory, its subdirectories are not displayed automatically. I can force them to display by clicking on the page number. Directories below this level also work once I reload them.

[Screenshot: the button in question]

I do not believe this is a permissions issue, as I can load the directory after the forced refresh. So far I have not seen this on any other directories, just the one in particular.

I'm not sure where to start on an issue like this that seems to be on the console side, so feel free to let me know if any other information would be helpful. Any suggestions for things to check on this would be much appreciated!

asked 2 months ago · 40 views
1 Answer
1
Accepted Answer

The behavior you're experiencing is likely related to how Amazon S3 handles object listings and how AWS Data Exchange (ADX) interacts with S3 when browsing directories.

S3 doesn't have a true hierarchical folder structure. Instead, it uses a flat structure where the folder-like appearance is created by using key name prefixes with the '/' delimiter. When you're browsing through the S3 structure in ADX, it's essentially performing list operations on these prefixes.
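To make the flat-keyspace point concrete, here is a minimal pure-Python sketch of how a delimiter-based listing turns flat keys into folder-like "common prefixes". This simulates the grouping behavior only; it is not ADX's or S3's actual implementation, and the bucket keys are hypothetical.

```python
def list_common_prefixes(keys, prefix="", delimiter="/"):
    """Mimic how an S3 list with a delimiter groups flat keys into
    folder-like CommonPrefixes plus the objects directly under the prefix."""
    prefixes = set()
    objects = []
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if delimiter in remainder:
            # Everything up to and including the first delimiter becomes a "folder".
            prefixes.add(prefix + remainder.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(prefixes), objects

# A flat keyspace that a console would render as nested folders.
keys = [
    "data/2024/jan/report.csv",
    "data/2024/feb/report.csv",
    "data/readme.txt",
]
print(list_common_prefixes(keys, prefix="data/"))
# (['data/2024/'], ['data/readme.txt'])
```

The "directories" you click through in the ADX console are just these derived prefixes; nothing hierarchical exists in the bucket itself.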

For performance reasons, S3 list operations return a limited number of results per request (at most 1,000 keys per call). If a prefix contains a large number of objects or subdirectories, not all of them are returned in the initial request, and the caller must page through the rest using a continuation token. This could explain why you're not seeing all subdirectories automatically loaded at the third level.

When you click on the page number or force a reload, you're likely triggering additional list requests to S3, which then return the remaining subdirectories.
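The reload-completes-the-view effect can be sketched with a simulated paged listing (a hypothetical page size of 2 stands in for S3's real 1,000-key limit; this is an illustration, not the console's code):

```python
def list_page(prefixes, page_size, continuation_token=None):
    """Simulate a truncated S3-style list: return one page of results plus
    a token the caller must send back to fetch the remaining entries."""
    start = continuation_token or 0
    page = prefixes[start:start + page_size]
    next_token = start + page_size if start + page_size < len(prefixes) else None
    return page, next_token

subdirs = ["a/", "b/", "c/", "d/", "e/"]

# First request: only part of the listing comes back.
page1, token = list_page(subdirs, page_size=2)
print(page1, token)  # ['a/', 'b/'] 2

# A forced reload issues a follow-up request with the token,
# which is what finally surfaces the "missing" subdirectories.
page2, token = list_page(subdirs, page_size=2, continuation_token=token)
print(page2, token)  # ['c/', 'd/'] 4
```

If the UI stops after the first page, the remaining entries only appear once something (like your manual reload) drives the follow-up requests.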

To address this issue, you could try the following:

  1. Ensure that your S3 bucket structure is optimized for performance. Having a very large number of objects in a single prefix can slow down list operations.

  2. If possible, reorganize your data structure to reduce the depth of nesting or the number of objects at each level.

  3. If reorganization isn't feasible, you may need to account for this behavior in your workflow, understanding that manual reloads might be necessary for certain directories.

  4. Check if there are any S3 bucket policies or permissions that might be affecting how ADX can list objects in certain prefixes.
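As a quick diagnostic for point 1, you can count how many objects fall under each immediate subdirectory of a prefix to spot a disproportionately populated one. This pure-Python sketch works over any key listing; against a real bucket you would feed it the keys returned by a paginated list call (the sample keys here are hypothetical).

```python
from collections import Counter

def objects_per_subdir(keys, prefix="", delimiter="/"):
    """Count objects under each immediate 'subdirectory' of a prefix;
    one very hot prefix is a common cause of slow list operations."""
    counts = Counter()
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        if delimiter in remainder:
            counts[prefix + remainder.split(delimiter, 1)[0] + delimiter] += 1
    return counts

keys = ["logs/2024/%05d.json" % i for i in range(1500)] + ["archive/old.zip"]
print(objects_per_subdir(keys).most_common(1))  # [('logs/', 1500)]
```

A heavily skewed count suggests the directory that needs manual reloads is simply the one whose listing spans multiple pages.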

If the issue persists or significantly impacts your work, it would be worth contacting AWS support for further investigation, as there could be specific console-side optimizations that could be made to improve this experience.

answered 2 months ago
EXPERT reviewed 2 months ago
