OOM when generating embedding in Jupyter Lab


The notebook instance is an ml.m5d.2xlarge with 32GB of memory. However, we are hitting OOM errors in SageMaker Notebooks when generating embeddings with TensorFlow:

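import tensorflow_hub as hub
# Load the Universal Sentence Encoder, then embed every text in a single call.
tf_hub_embedding_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")
embeddings = tf_hub_embedding_layer(lr_cleaned_df.text.values)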

After loading the Universal Sentence Encoder, the virtual memory size is about 7GB. lr_cleaned_df.text.values contains 60K text snippets and takes about 122MB in memory.

Is there a default allocation of memory to a Jupyter notebook? If so, can it be overridden?

Asked 2 years ago, viewed 396 times
1 Answer

Hi there,

I was able to reproduce this behavior on a ml.m5d.2xlarge notebook instance using similar code.

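import tensorflow_hub as hub
# Same single-call pattern as in the question, applied to my own train_examples.
tf_hub_embedding_layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")
embeddings = tf_hub_embedding_layer(train_examples)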

In my case, I was able to run it with 25K lines of text. However, when I ran it with 50K lines of text (train_examples.repeat(2)), I also got OOM errors. Running free -h in a terminal while the code above was executing showed that the notebook instance had in fact run out of free memory, which explains the OOM errors.

              total        used        free      shared  buff/cache   available
Mem:            30G         22G        900M        676K        7.2G        7.6G
Swap:            0B          0B          0B
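
The same numbers can also be checked from inside the notebook. The following is only a minimal sketch, assuming a Linux host where /proc/meminfo is readable; parse_meminfo and available_gib are hypothetical helper names, not part of any SageMaker or Jupyter API.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field_name: integer_value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 [kB]" lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(path: str = "/proc/meminfo") -> float:
    """Return the kernel's MemAvailable estimate in GiB (assumes a Linux host)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info["MemAvailable"] / 1024 ** 2
```

Calling available_gib() in a cell before and after the embedding step gives a rough idea of how much headroom is left.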

To run the embedding code above on this much text in a single call, please consider choosing a larger instance type with more memory.
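
If a larger instance is not an option, embedding the texts in smaller batches may lower peak memory, because only one batch of intermediate tensors is alive at a time. This is an alternative I have not verified on the instance above, so treat it as a sketch: iter_batches, embed_in_batches and embed_with_use are hypothetical helper names, the batch size of 1024 is an arbitrary starting point, and the code assumes TensorFlow 2 in eager mode.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(texts: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `texts` with at most `batch_size` items each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def embed_in_batches(texts: Sequence, embed_fn: Callable, batch_size: int = 1024) -> List:
    """Apply `embed_fn` to one batch at a time and return the per-batch results."""
    return [embed_fn(batch) for batch in iter_batches(texts, batch_size)]


def embed_with_use(texts: Sequence, batch_size: int = 1024):
    """Hypothetical usage with the Universal Sentence Encoder from the question."""
    # Imported lazily so the helpers above work without TensorFlow installed.
    import numpy as np
    import tensorflow as tf
    import tensorflow_hub as hub

    layer = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")

    def embed_fn(batch):
        # Convert each batch to NumPy right away so the tensor can be freed.
        return layer(tf.constant(list(batch))).numpy()

    return np.concatenate(embed_in_batches(texts, embed_fn, batch_size), axis=0)
```

With the data from the question this would be called as embed_with_use(lr_cleaned_df.text.values); lowering batch_size trades speed for a smaller memory peak.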

Answered 2 years ago
  • Hi @Peter_X, I ended up running the experiment on an ml.m5d.4xlarge instance and it succeeded. That said, this does not answer the question of whether the memory allocated to a Jupyter notebook (or kernel) can be configured.


