Can't read DataFrame from DocumentDB: "Partitioner calling collStats command failed" through the Spark MongoDB connector

0

I am trying to connect to my DocumentDB cluster through the Spark MongoDB connector, but it looks like DocumentDB does not support collStats. How can I disable the collStats command so I can do my transformations with Spark?

dataFrame = spark.read.format("mongodb").option("spark.mongodb.database","testdb").option("spark.mongodb.collection", "collection1").load()

dataFrame.show()

It gives the following error:

Py4JJavaError: An error occurred while calling o83.showString. : com.mongodb.spark.sql.connector.exceptions.MongoSparkException: Partitioning failed. Partitioner calling collStats command failed

However, dataFrame.printSchema() does return the schema. I have already found out that collStats is not supported on DocumentDB, but how can I turn this call off in the MongoDB connector for Spark?

Asked 5 months ago · 382 views
1 Answer
3

Hello,

Yes, collStats behaves differently in Amazon DocumentDB and is unsupported as per the doc. If you are using the default partitioning values, the connector is probably going through the partitioner helper because no values are being passed. If you supply specific partitioning values, it might skip the partitioner helper and not call $collStats (as per this logic). I recommend testing this on your end, as I haven't tested it myself.

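Alternatively, you might use a different partitioner, such as the HashPartitioner, or disable partitioning by setting the spark.mongodb.input.partitioner configuration property to none. Note that spark.mongodb.input.* is the property prefix of the older 3.x connector; the 10.x connector, which is the one loaded by format("mongodb"), reads spark.mongodb.read.partitioner instead.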
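As a rough illustration of the "different partitioner" idea, here is a minimal sketch that assumes the 10.x connector and its SinglePartitionPartitioner, which reads the whole collection as one partition and so should not need collection statistics. The helper name read_without_collstats is made up for this example, and I have not run it against DocumentDB, so please treat it as something to try rather than a confirmed fix.

```python
# Assumption: connector 10.x, where this class ships with the connector and
# the read-side property is spark.mongodb.read.partitioner.
SINGLE_PARTITIONER = (
    "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner"
)


def read_without_collstats(spark, database, collection):
    """Hypothetical helper: load a collection as a single partition.

    `spark` is an existing SparkSession that already carries the
    connection URI for the DocumentDB cluster.
    """
    return (
        spark.read.format("mongodb")
        # Same option keys as in the question.
        .option("spark.mongodb.database", database)
        .option("spark.mongodb.collection", collection)
        # Swap the default sampling partitioner for the single-partition one.
        .option("spark.mongodb.read.partitioner", SINGLE_PARTITIONER)
        .load()
    )


# Usage, with the names from the question:
#   dataFrame = read_without_collstats(spark, "testdb", "collection1")
#   dataFrame.show()
```

The trade-off is that the initial read is not parallelized. For a large collection you may want to call dataFrame.repartition(...) right after loading so that the later transformations are spread across executors.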

AWS
Support Engineer
Answered 5 months ago
