1 Answer
The "GC overhead limit exceeded" error means the JVM was spending almost all of its time on garbage collection while recovering very little memory (by default, HotSpot raises it when roughly 98% of the time goes to GC and less than 2% of the heap is reclaimed). It is thrown to tell you that your program is making almost no progress and is mostly wasting time on garbage collection. Iterating through the dataframe might be the problem: going through each line can create a lot of temporary objects, and if they are still referenced they can't be garbage collected. Which framework are you using? And what are you trying to do by going through the dataframe row by row? Maybe you could process multiple lines in a batch, for example with a vectorized or matrix operation, as georgios_s suggested in the comment.
Answered 2 years ago
What is the size of your dataset? The instance you chose has 8GB of RAM.
Additionally, based on your error you seem to be running Spark, am I right to assume you are running Spark "locally" on that notebook? Please be mindful that Spark allocates memory between Reserved Memory, User Memory and Spark Memory so not all 8GB are available to handle your data at any given time.
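To make the memory split concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes Spark's documented defaults for the unified memory manager (300 MB reserved, `spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5). The function name `spark_memory_breakdown` and the idea of giving the full 8 GiB to the JVM heap are illustrative assumptions only; in local mode the heap is whatever `spark.driver.memory` is set to (1g by default), so the real numbers are likely much smaller.

```python
# Approximate split of a JVM heap under Spark's unified memory manager.
# Assumes default settings; spark_memory_breakdown is a hypothetical helper.
RESERVED_MB = 300  # fixed amount Spark sets aside for its own internals


def spark_memory_breakdown(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return an approximate breakdown (in MB) of a heap of size heap_mb."""
    usable_mb = heap_mb - RESERVED_MB
    if usable_mb <= 0:
        raise ValueError("heap is smaller than Spark's reserved memory")
    spark_mb = usable_mb * memory_fraction   # execution + storage pool
    user_mb = usable_mb - spark_mb           # user data structures, UDF objects
    storage_mb = spark_mb * storage_fraction # cached data (soft boundary)
    execution_mb = spark_mb - storage_mb     # shuffles, joins, sorts, aggregations
    return {
        "reserved_mb": RESERVED_MB,
        "user_mb": user_mb,
        "spark_mb": spark_mb,
        "storage_mb": storage_mb,
        "execution_mb": execution_mb,
    }


# Upper bound only: pretends the entire 8 GiB instance is JVM heap.
breakdown = spark_memory_breakdown(8 * 1024)
for name, size_mb in breakdown.items():
    print(f"{name}: {size_mb:.1f}")
```

Even in this best case, well under 5 GB ends up in the pool Spark uses for execution and caching, and the operating system, the notebook kernel and the Python process need memory too.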
As for how to avoid iterating through one line at a time, it's hard to advise without knowing what you want to achieve, but in general the first thing to try is vectorizing your operations (https://www.geeksforgeeks.org/vectorization-in-python/).
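As a minimal illustration of what vectorizing means, the sketch below computes the same total twice with NumPy: once with an explicit Python loop and once with a single array operation. The data and the function names (`total_row_by_row`, `total_vectorized`) are made up for the example, and it uses NumPy rather than Spark; in Spark the analogous step would be to express the logic with built-in column operations instead of looping over collected rows.

```python
import numpy as np


def total_row_by_row(prices, quantities):
    """Sum price * quantity with an explicit Python loop (slow for large inputs)."""
    total = 0.0
    for price, quantity in zip(prices, quantities):
        total += price * quantity
    return total


def total_vectorized(prices, quantities):
    """Same computation as one array operation, with no per-row Python objects."""
    return float(np.dot(prices, quantities))


# Hypothetical sample data, for illustration only.
prices = np.array([2.0, 3.5, 1.25])
quantities = np.array([4, 2, 8])

loop_total = total_row_by_row(prices, quantities)
vec_total = total_vectorized(prices, quantities)
print(loop_total, vec_total)
```

Both functions return the same value; the vectorized one simply pushes the loop into compiled code, which is usually what removes the per-row overhead.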