I've used Processing Job ManifestFile inputs successfully in the past with multiple relative folders (for example, in the "Extract clean input images" section of this notebook; sorry for citing a large, sprawling sample, there are probably simpler ones out there).
I'm not sure exactly what could be going wrong here, so I'll describe how I understand the feature to work and hope that helps:
Given an S3 bucket containing:
s3://customer_bucket/some/prefix/relative/path1/custdata-1
s3://customer_bucket/some/prefix/relative/path2/custdata-2
...and a manifest file like:
[
  { "prefix": "s3://customer_bucket/some/prefix/" },
  "relative/path1/custdata-1",
  "relative/path2/custdata-2"
]
...for a processing input something like the below (or equivalent if you're using boto3/etc instead of the SageMaker Python SDK):
from sagemaker.processing import ProcessingInput

ProcessingInput(
    destination="/opt/ml/processing/input/mycoolinput",
    input_name="mycoolinput",
    s3_data_type="ManifestFile",
    source="s3://path-to-your-manifest-file",
)
...I'd expect your processing job to see files:
/opt/ml/processing/input/mycoolinput/relative/path1/custdata-1
/opt/ml/processing/input/mycoolinput/relative/path2/custdata-2
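To make that mapping concrete, here is a minimal sketch of how I'd expect manifest entries to translate into local paths. It only mirrors the example above and is not the service's actual implementation; `expected_local_paths` is a hypothetical helper name, and the rule "local path = destination + relative key" is my understanding of the behaviour rather than something I've confirmed in the SageMaker source.

```python
import json
import posixpath


def expected_local_paths(manifest_text, destination):
    """Map each S3 URI named by a manifest to the local path I'd expect.

    Assumption: the first manifest element is {"prefix": ...} and every
    later element is a key relative to that prefix.
    """
    entries = json.loads(manifest_text)
    prefix = entries[0]["prefix"]
    relative_keys = entries[1:]
    # The relative key (subfolders included) is appended to the destination.
    return {
        prefix + key: posixpath.join(destination, key)
        for key in relative_keys
    }


manifest_text = json.dumps([
    {"prefix": "s3://customer_bucket/some/prefix/"},
    "relative/path1/custdata-1",
    "relative/path2/custdata-2",
])

mapping = expected_local_paths(
    manifest_text, "/opt/ml/processing/input/mycoolinput"
)
for s3_uri, local_path in mapping.items():
    print(s3_uri, "->", local_path)
```

Comparing this expected listing with what the job actually sees should show quickly whether the subfolders are being dropped or the files are missing altogether.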
So in this sense it is possible to have files under the same prefix with different subfolders. In the above-mentioned sample, the raw_s3uri prefix contains credit card agreement PDFs categorized into folders by bank/provider, e.g. {raw_s3uri}/Bank1/Card1.pdf, {raw_s3uri}/CreditUnion2/Disclosures.pdf, etc.
To my knowledge it's not possible to have multiple { "prefix": "..." } entries in your manifest, but as I understood it didn't sound like you were trying to do that.
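If you want to rule that out before launching a job, a rough sanity check along these lines could help. It assumes the manifest layout shown earlier (one prefix object first, then relative key strings); `check_manifest` is a hypothetical helper, not part of the SageMaker SDK, and the rules it enforces reflect my understanding rather than an official schema.

```python
import json


def check_manifest(manifest_text):
    """Return a list of problems found in a ManifestFile-style JSON document."""
    entries = json.loads(manifest_text)
    if not isinstance(entries, list) or not entries:
        return ["manifest must be a non-empty JSON array"]

    problems = []
    first = entries[0]
    if not (isinstance(first, dict) and isinstance(first.get("prefix"), str)):
        problems.append("first element must be an object with a string 'prefix'")

    # Everything after the first element should be a plain relative key.
    for index, entry in enumerate(entries[1:], start=1):
        if isinstance(entry, dict) and "prefix" in entry:
            problems.append(f"entry {index}: extra prefix object")
        elif not isinstance(entry, str):
            problems.append(f"entry {index}: expected a relative key string")
    return problems
```

An empty list means the manifest at least matches the shape I've described; it doesn't prove the objects exist in S3.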
Apart from double-checking this overall setup (and maybe using Python os.walk() to recursively print() out the folder contents as your Processing job sees them), the only other thing I could suggest is to check if your S3 object keys have any special characters in them that could be causing issues when mapping to a local filesystem - such as files/folders with spaces at the end, or characters that aren't usually allowed in filenames?
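Both checks can be sketched roughly as follows. `list_files` and `suspicious_segments` are hypothetical helper names, and the set of "unusual" characters is only a heuristic I picked for illustration (leading/trailing whitespace, empty segments, control characters, and characters some filesystems reject), not a list of what SageMaker actually has trouble with.

```python
import os


def list_files(root):
    """Print and return every file path found under root, sorted."""
    found = []
    for current_dir, _subdirs, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(current_dir, name))
    found.sort()
    for path in found:
        print(path)
    return found


def suspicious_segments(key):
    """Return the segments of an S3 key that may not map cleanly to a filename."""
    flagged = []
    for segment in key.split("/"):
        if (
            segment == ""  # double slash or trailing slash in the key
            or segment != segment.strip()  # leading or trailing whitespace
            or any(ord(ch) < 32 or ch in '\\:*?"<>|' for ch in segment)
        ):
            flagged.append(segment)
    return flagged


# Inside the processing script: show what the job can actually see.
# os.walk() yields nothing if the folder does not exist, so no output
# here is itself a useful signal.
list_files("/opt/ml/processing/input/mycoolinput")
```

You could run `suspicious_segments` over each relative key in the manifest before submitting the job, and call `list_files` at the top of the processing script to compare against what you expected.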