Hello,
As per the stack trace, I believe the job encountered an issue while trying to read one of the documents in the collection. I checked various external sources for the possible cause of this error, and from this link it can be understood that a BsonInvalidOperationException occurs in either of two scenarios:
- the document does not contain the expected key
- the value is not of the expected type
Based on the stack trace, it looks like one of the documents in this particular collection does not contain the key 'avgObjSize'.
Since you say the issue is transient, it may be that the job fails only when that particular document is encountered. It might also be unexpected behavior on the connector's side. You could try using the latest JDBC connector for your job and see if the issue still repeats.
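If you want to check the "missing key / wrong type" hypothesis directly, a minimal sketch along these lines might help. It assumes pymongo is available and uses placeholder connection, database, and collection names, which are not from the original question:

```python
# Hypothetical verification sketch: scan a sample of documents and report any
# that lack the expected key or hold an unexpected value type.
from pymongo import MongoClient

# Placeholder URI / names - replace with your own cluster, database, collection.
client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
coll = client["mydb"]["mycollection"]

for doc in coll.find().limit(100):  # sample the first 100 documents
    if "avgObjSize" not in doc:
        print(f"document {doc['_id']} is missing the key")
    elif not isinstance(doc["avgObjSize"], (int, float)):
        print(f"document {doc['_id']} has unexpected type: {type(doc['avgObjSize'])}")
```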
Hi, I'm the original question asker, but for some reason (skill issue) I can't sign into my re:Post account. Thanks for the replies.
@Gonzales Herreros - Yes, I think you are correct based on some experimenting I did yesterday. The error seems to happen when the collection is empty (someone else on my team was emptying it in the testing environment and I didn't catch it). And yes, I think it's a bug in the connector: it happens in this setup (using a Glue connection object with MongoDB Atlas), but not when, for example, a URI string + username + password is used instead (which is what my local/testing pipeline uses, and why it wasn't caught there). In general, both in my experience and from what I've read from others, the Glue connector seems to have a few issues with Mongo (for example, it can't do pushdown predicates), although I'm not sure whether those are issues with the underlying PySpark tools or with the Glue implementation. Either way, I would suggest not using Glue to read data from MongoDB until these things are resolved.
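For anyone hitting the same thing, a rough sketch of the URI-plus-credentials approach described above could look like the following in a Glue PySpark job. All names, the URI, and the credentials are placeholders, and the exact connection option key for the URI may differ by Glue version (some versions document "uri", newer ones "connection.uri"), so treat this as an assumption to verify against your Glue release:

```python
# Sketch: read a MongoDB Atlas collection via explicit connection_options
# instead of a Glue connection object.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
glue_context = GlueContext(sc)

dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="mongodb",
    connection_options={
        "uri": "mongodb+srv://cluster0.example.mongodb.net",  # placeholder Atlas URI; may be "connection.uri" on newer Glue versions
        "database": "mydb",            # placeholder database name
        "collection": "mycollection",  # placeholder collection name
        "username": "my_user",         # placeholder; prefer Secrets Manager over hard-coding
        "password": "my_password",
    },
)
print(dyf.count())
```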
@Chaitu - avgObjSize is metadata for Mongo collections: https://www.mongodb.com/docs/manual/reference/command/dbStats/ It isn't a key that we intentionally added to the documents (which is what I originally thought it meant, and which confused me!). Yes, I think you're absolutely right that using another connector is a good idea; there seem to be a few issues with the Glue connector and MongoDB.
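To illustrate where that field actually lives, here is a small pymongo sketch (placeholder URI and names, not from the original setup). The point is that avgObjSize comes back from the server-side stats commands rather than from the documents, and for an empty collection it may be 0 or absent, which is presumably what trips up the connector:

```python
# Sketch: avgObjSize is returned by the stats commands, not stored on documents.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
db = client["mydb"]

coll_stats = db.command("collstats", "mycollection")  # per-collection stats
db_stats = db.command("dbstats")                      # per-database stats

print(coll_stats.get("avgObjSize"))  # None if the field is missing, e.g. empty collection
print(db_stats.get("avgObjSize"))
```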
To me it sounds like a bug in the Spark connector. Is it possible that the collection is empty and the connector doesn't handle that correctly?