Installing s3fs causes errors in JupyterLab on SageMaker
On a freshly created JupyterLab Space instance (image: SageMaker Distribution 1.6) in SageMaker Studio on AWS, I open the terminal and run pip install s3fs, and pip reports dependency errors during the install:
sagemaker-user@default:~$ pip install s3fs
Collecting s3fs
Downloading s3fs-2024.3.1-py3-none-any.whl.metadata (1.6 kB)
Requirement already satisfied: aiobotocore<3.0.0,>=2.5.4 in /opt/conda/lib/python3.10/site-packages (from s3fs) (2.12.1)
Collecting fsspec==2024.3.1 (from s3fs)
Downloading fsspec-2024.3.1-py3-none-any.whl.metadata (6.8 kB)
Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /opt/conda/lib/python3.10/site-packages (from s3fs) (3.9.3)
Requirement already satisfied: botocore<1.34.52,>=1.34.41 in /opt/conda/lib/python3.10/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs) (1.34.51)
Requirement already satisfied: wrapt<2.0.0,>=1.10.10 in /opt/conda/lib/python3.10/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs) (1.16.0)
Requirement already satisfied: aioitertools<1.0.0,>=0.5.1 in /opt/conda/lib/python3.10/site-packages (from aiobotocore<3.0.0,>=2.5.4->s3fs) (0.11.0)
Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (1.3.1)
Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (23.2.0)
Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (1.4.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (6.0.5)
Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (1.9.4)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (4.0.3)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /opt/conda/lib/python3.10/site-packages (from botocore<1.34.52,>=1.34.41->aiobotocore<3.0.0,>=2.5.4->s3fs) (1.0.1)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /opt/conda/lib/python3.10/site-packages (from botocore<1.34.52,>=1.34.41->aiobotocore<3.0.0,>=2.5.4->s3fs) (2.9.0)
Requirement already satisfied: urllib3<2.1,>=1.25.4 in /opt/conda/lib/python3.10/site-packages (from botocore<1.34.52,>=1.34.41->aiobotocore<3.0.0,>=2.5.4->s3fs) (1.26.18)
Requirement already satisfied: idna>=2.0 in /opt/conda/lib/python3.10/site-packages (from yarl<2.0,>=1.0->aiohttp!=4.0.0a0,!=4.0.0a1->s3fs) (3.6)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.34.52,>=1.34.41->aiobotocore<3.0.0,>=2.5.4->s3fs) (1.16.0)
Downloading s3fs-2024.3.1-py3-none-any.whl (29 kB)
Downloading fsspec-2024.3.1-py3-none-any.whl (171 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 172.0/172.0 kB 18.4 MB/s eta 0:00:00
Installing collected packages: fsspec, s3fs
Attempting uninstall: fsspec
Found existing installation: fsspec 2023.6.0
Uninstalling fsspec-2023.6.0:
Successfully uninstalled fsspec-2023.6.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
jupyter-ai 2.11.0 requires faiss-cpu, which is not installed.
datasets 2.18.0 requires fsspec[http]<=2024.2.0,>=2023.1.0, but you have fsspec 2024.3.1 which is incompatible.
jupyter-scheduler 2.5.1 requires fsspec==2023.6.0, but you have fsspec 2024.3.1 which is incompatible.
Successfully installed fsspec-2023.6.0 s3fs-2024.3.1
After installation, it is no longer possible to import transformers.Trainer or datasets:
TypeError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1099, in _LazyModule._get_module(self, module_name)
1098 try:
-> 1099 return importlib.import_module("." + module_name, self.__name__)
1100 except Exception as e:
File /opt/conda/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
125 level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)
File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)
File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)
File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)
File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)
File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)
File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:162
161 if is_datasets_available():
--> 162 import datasets
164 if is_torch_tpu_available(check_device=False):
File /opt/conda/lib/python3.10/site-packages/datasets/__init__.py:18
16 __version__ = "2.18.0"
---> 18 from .arrow_dataset import Dataset
19 from .arrow_reader import ReadInstruction
File /opt/conda/lib/python3.10/site-packages/datasets/arrow_dataset.py:66
64 from tqdm.contrib.concurrent import thread_map
---> 66 from . import config
67 from .arrow_reader import ArrowReader
File /opt/conda/lib/python3.10/site-packages/datasets/config.py:41
40 DILL_VERSION = version.parse(importlib.metadata.version("dill"))
---> 41 FSSPEC_VERSION = version.parse(importlib.metadata.version("fsspec"))
42 PANDAS_VERSION = version.parse(importlib.metadata.version("pandas"))
File /opt/conda/lib/python3.10/site-packages/packaging/version.py:54, in parse(version)
46 """Parse the given version string.
47
48 >>> parse('1.0.dev1')
(...)
52 :raises InvalidVersion: When the version string is not a valid version.
53 """
---> 54 return Version(version)
File /opt/conda/lib/python3.10/site-packages/packaging/version.py:198, in Version.__init__(self, version)
197 # Validate the version and parse it into pieces
--> 198 match = self._regex.search(version)
199 if not match:
TypeError: expected string or bytes-like object
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from transformers import Trainer
File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)
File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1089, in _LazyModule.__getattr__(self, name)
1087 value = self._get_module(name)
1088 elif name in self._class_to_module.keys():
-> 1089 module = self._get_module(self._class_to_module[name])
1090 value = getattr(module, name)
1091 else:
File /opt/conda/lib/python3.10/site-packages/transformers/utils/import_utils.py:1101, in _LazyModule._get_module(self, module_name)
1099 return importlib.import_module("." + module_name, self.__name__)
1100 except Exception as e:
-> 1101 raise RuntimeError(
1102 f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
1103 f" traceback):\n{e}"
1104 ) from e
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
expected string or bytes-like object
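The TypeError at the bottom of the traceback comes from packaging's version parser being handed None: after the conflicting install, importlib.metadata.version("fsspec") evidently returns None instead of a version string, and Version(None) then runs a regex search on it. The failure mode can be reproduced with the stdlib alone (a sketch mirroring what packaging.version does internally; the pattern below is a stand-in, not packaging's real regex):

```python
import re

# packaging.version.Version.__init__ runs a compiled regex search on the
# version argument; if importlib.metadata returns None for a package whose
# dist-info was clobbered by the uninstall/reinstall, this is the result:
version_regex = re.compile(r"\d+(\.\d+)*")  # stand-in for packaging's pattern

try:
    version_regex.search(None)  # packaging does self._regex.search(version)
except TypeError as e:
    print(e)  # "expected string or bytes-like object" (wording varies by Python version)
```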
Because of this, I can't use datasets.load_from_disk or datasets.save_to_disk with an S3 bucket. I'm trying to create a Hugging Face Estimator training job from JupyterLab by first preprocessing the data, saving it to S3, and then calling huggingface_estimator.fit(), similar to this tutorial. Without s3fs installed, datasets.save_to_disk says it requires s3fs, but after installing s3fs, I can no longer import datasets.
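A quick way to check whether the environment has ended up in this broken state is to query the installed metadata directly (stdlib only; the package names are the ones from the log above):

```python
import importlib.metadata as md

# After the conflicting install, fsspec's recorded version can come back
# inconsistent (or missing) even though the module itself still imports.
for pkg in ("fsspec", "s3fs", "datasets"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "is not installed")
```

If the metadata is mismatched, reinstalling a matching pair (for example pip install "fsspec==2023.6.0" "s3fs==2023.6.0", which lines up with the fsspec version preinstalled in this image) may restore a consistent state, though I haven't verified that on Distribution 1.6.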
SageMaker offers a number of input modes for accessing training data on S3. File mode presents a file-system view of the dataset to the training container, which is similar to the functionality of s3fs.
Edit: if you are using a SageMaker Estimator, it supports reading from and writing to S3 buckets. Refer to the examples at Create and Run a Training Job.
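A minimal sketch of handing S3 data to an Estimator in File mode (the bucket name and prefixes below are placeholders, and the actual SDK call is left in comments since it needs the sagemaker package, a role, and AWS credentials):

```python
# Channels map channel names to S3 URIs; in File mode, SageMaker downloads
# each channel into the training container under /opt/ml/input/data/<name>.
channels = {
    "train": "s3://my-bucket/datasets/train",  # placeholder bucket/prefix
    "test": "s3://my-bucket/datasets/test",
}

# With the sagemaker SDK and credentials available, the call would look like:
# from sagemaker.huggingface import HuggingFace
# estimator = HuggingFace(entry_point="train.py", role=role,
#                         instance_type="ml.p3.2xlarge", instance_count=1,
#                         transformers_version="4.28", pytorch_version="2.0",
#                         py_version="py310")
# estimator.fit(channels)

for name, uri in channels.items():
    print(f"{name}: {uri} -> /opt/ml/input/data/{name}")
```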
Thank you, I will try that. Should I make a bug report somewhere about s3fs though?
s3fs is not an AWS product. You can check the issue log on their GitHub at https://github.com/fsspec/s3fs/issues
Oh OK, I actually already created an issue there before this post; I didn't know whether it would be better to report it to AWS.
Also, the input modes page you provided seems to be about how the training script called by the Estimator accesses files, which is helpful. But I also need to save the preprocessed dataset to S3 directly from the notebook that creates the job, before the Estimator runs, and I don't see how to enable File mode independently of an Estimator.
sagemaker.estimator.Estimator supports S3 buckets. I have updated my post.
Sorry, I don't think I was clear. I need to be able to pull data from S3 even without an Estimator or training job. Just grab raw data into a notebook for processing.
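For pulling raw objects into the notebook without any Estimator, one option is the AWS SDK directly rather than s3fs. A sketch (the bucket and key below are placeholders; the boto3 call itself is shown in a comment since it needs credentials and a real bucket):

```python
from urllib.parse import urlparse

def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket/prefix/key into (bucket, key)."""
    parsed = urlparse(uri)
    assert parsed.scheme == "s3", f"not an S3 URI: {uri}"
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_uri("s3://my-bucket/raw/data.csv")  # placeholder URI
print(bucket, key)  # my-bucket raw/data.csv

# With credentials available (as they are inside Studio), download via boto3:
# import boto3
# boto3.client("s3").download_file(bucket, key, "data.csv")
```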