
GC overhead limit exceeded


I have a modest-size dataset, and I am running a Jupyter notebook in SageMaker (instance type ml.c5.xlarge with a 200 GB volume). I receive the error message "GC overhead limit exceeded". Everything ran fine with a small data size. Also, I need to go through the DataFrame one row at a time using df.collect(), which seems to be an expensive operation. Would you suggest another way of accomplishing this? I would appreciate your help.
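One common alternative to `df.collect()`, which copies every row into driver memory at once, is to stream the rows and handle them in fixed-size batches. In PySpark, `DataFrame.toLocalIterator()` returns an iterator that brings rows to the driver one partition at a time, so driver memory is bounded by the largest partition rather than the whole DataFrame. The sketch below is stdlib-only so it runs without Spark: `fake_rows` is a hypothetical stand-in for that iterator, and `iter_batches` and `handle_batch` are hypothetical helpers, not Spark APIs.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items without materializing all rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))  # pull at most batch_size rows
        if not batch:
            return  # source exhausted
        yield batch


def fake_rows(n: int) -> Iterator[dict]:
    """Hypothetical stand-in for df.toLocalIterator(): yields rows lazily."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


def handle_batch(batch: List[dict]) -> int:
    """Hypothetical per-batch work: here, just sum the 'value' field."""
    return sum(row["value"] for row in batch)


# Only one batch (at most 4 rows) is held in memory at a time.
total = sum(handle_batch(b) for b in iter_batches(fake_rows(10), batch_size=4))
```

With Spark, you would pass `df.toLocalIterator()` in place of `fake_rows(...)`; each item is then a `pyspark.sql.Row` rather than a dict.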

  • What is the size of your dataset? The instance you chose has 8GB of RAM.

    Additionally, based on your error, you seem to be running Spark. Am I right to assume you are running Spark "locally" in that notebook? Please be mindful that Spark divides memory between Reserved Memory, User Memory and Spark Memory, so not all 8 GB is available to handle your data at any given time.

    As for avoiding iterating one row at a time, it's hard to advise without knowing what you want to achieve, but in general the first thing to look at is vectorizing your operations (https://www.geeksforgeeks.org/vectorization-in-python/).
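The memory split mentioned in the comment can be estimated with some arithmetic. This is a sketch assuming Spark 2.0+ unified-memory defaults (300 MiB reserved, `spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5). In local mode the driver and executor share one JVM, whose heap is set by `spark.driver.memory` (1g by default) rather than by the instance's total RAM. The 4096 MiB heap below is a hypothetical value, and `spark_memory_split` is an illustrative helper, not a Spark API.

```python
RESERVED_MIB = 300  # fixed reserved memory in Spark's unified memory model


def spark_memory_split(heap_mib: float,
                       memory_fraction: float = 0.6,
                       storage_fraction: float = 0.5) -> dict:
    """Approximate how a JVM heap (in MiB) is divided under Spark's defaults."""
    if heap_mib <= RESERVED_MIB:
        raise ValueError("heap must be larger than the reserved memory")
    usable = heap_mib - RESERVED_MIB
    spark_memory = usable * memory_fraction   # shared by execution and storage
    user_memory = usable - spark_memory       # user data structures, UDF objects, etc.
    storage = spark_memory * storage_fraction # cached data (soft boundary)
    execution = spark_memory - storage        # shuffles, joins, sorts (soft boundary)
    return {
        "reserved": RESERVED_MIB,
        "user_memory": user_memory,
        "spark_memory": spark_memory,
        "storage": storage,
        "execution": execution,
    }


# Hypothetical 4 GiB driver heap.
split = spark_memory_split(4096)
```

For a 4096 MiB heap this gives roughly 2277.6 MiB of Spark memory and 1518.4 MiB of user memory. Execution and storage can borrow from each other, so their 50/50 split is only a starting point, not a hard limit.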

1 Answer

The "GC overhead limit exceeded" error indicates that the JVM spent most of its time on garbage collection while recovering very little memory, so it throws this error to tell you that your program is making little progress and wasting time on unproductive garbage collection. Iterating through the DataFrame might be the problem: df.collect() copies every row into the driver's memory, and you may also be creating many temporary objects as you go through each row that cannot be garbage collected. What framework are you using? And what are you trying to do by going through the DataFrame row by row? Maybe you could process multiple rows in a batch, for example using vectorization or matrix operations as georgios_s suggested in the comment.
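To illustrate replacing a per-row loop with a matrix operation over a batch, here is a minimal NumPy sketch. It assumes the batch is already a 2-D numeric array (rows by features). The data, the `weights` vector, and both function names are hypothetical, chosen only to show the technique.

```python
import numpy as np

# Hypothetical batch: 5 rows x 3 numeric features, plus per-feature weights.
batch = np.arange(15, dtype=float).reshape(5, 3)
weights = np.array([0.5, 1.0, 2.0])


def scores_row_by_row(batch: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # One Python-level iteration per row, creating temporary objects each time.
    return np.array([sum(x * w for x, w in zip(row, weights)) for row in batch])


def scores_vectorized(batch: np.ndarray, weights: np.ndarray) -> np.ndarray:
    # A single matrix-vector product over the whole batch, run in compiled code.
    return batch @ weights


vectorized = scores_vectorized(batch, weights)
```

Both functions return the same per-row scores. The vectorized version avoids creating per-row Python objects, which is the kind of allocation churn that puts pressure on memory when the data grows.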

answered 23 days ago
