I'm trying to use Dask in SageMaker because I have more than 1 billion rows in a single dataset. Creating a dask.dataframe works fine, and creating a client through Dask also works:

client = Client(n_workers=6, threads_per_worker=20)

However, when I try to use dask_ml.preprocessing.Categorizer, I get the error NoCredentialsError: Unable to locate credentials. I understand the issue might be that the Dask distributed client doesn't have authorization to the SageMaker client. It's confusing, but how do I get them to connect? Code below:

import dask.dataframe as dd
from dask_ml.preprocessing import Categorizer
df3 = dd.read_parquet('s3://bucket/parquetfiles2/data_*.parquet',
                     storage_options={'token': 'anon'})
if __name__ == "__main__":
    obj_df = df3.select_dtypes(include=['object','datetime64'])
    num_df = df3.select_dtypes(exclude=['object','datetime64'])
    ce = Categorizer()
    obj_df_1 = ce.fit_transform(obj_df)
Thank you for contacting us and for using Amazon SageMaker.

I understand that you encountered a "NoCredentialsError: Unable to locate credentials" when trying to use dask_ml.preprocessing.Categorizer.

It looks like you're running the code locally on your machine. Please feel free to correct me if I have misunderstood anything here. The error is usually seen when you don’t have AWS credentials correctly configured on your machine.
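As a rough way to see whether static credentials are visible to a process at all, something like the sketch below might help. It is a simplified check written with the standard library only: the function name `find_aws_credentials` is hypothetical, and it looks only at the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables and the shared credentials file (by default `~/.aws/credentials`). It does not cover IAM roles, SSO, or container credentials, so a `None` result here does not by itself prove that boto3 would fail.

```python
import configparser
import os
from pathlib import Path


def find_aws_credentials(env=None, credentials_file=None):
    """Report where static AWS credentials were found, or None.

    Simplified sketch: checks only environment variables and the shared
    credentials file, not the full credential chain used by the AWS SDKs.
    """
    env = os.environ if env is None else env

    # 1. Static keys exported in the environment.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # 2. Shared credentials file (the file that `aws configure` writes to).
    default_path = Path.home() / ".aws" / "credentials"
    path = Path(credentials_file or env.get("AWS_SHARED_CREDENTIALS_FILE", default_path))
    if path.is_file():
        parser = configparser.ConfigParser()
        parser.read(path)
        profile = env.get("AWS_PROFILE", "default")
        if parser.has_section(profile) and parser.has_option(profile, "aws_access_key_id"):
            return f"shared credentials file ({path})"

    return None


if __name__ == "__main__":
    print(find_aws_credentials())
```

If the check passes in the notebook process, the same function could be run on every worker with `client.run(find_aws_credentials)` from dask.distributed, to see whether the worker processes see the same environment and home directory.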

However, I see you're using the Dask distributed client with Amazon SageMaker Processing. We have built Dask-enabled containers for SageMaker Processing that you may want to look at.

You might need to run aws configure to set up IAM credentials (user) on your machine. However, to be able to replicate and investigate this further, we'd need your IAM role ARN and other details. For further investigation, I recommend opening a support case and providing more detail about your account information and script/config. For security reasons, we cannot discuss account-specific issues in public posts.

Please open a support case with AWS using the link:

Thank you

answered a month ago

