Can't read DF from DocumentDB: "Partitioner calling collStats command failed" through the Spark MongoDB connector


I am trying to connect to my DocumentDB cluster through the Spark MongoDB connector, but it looks like DocumentDB does not support collStats. How can I disable the collStats command so I can do my transformations with Spark?

dataFrame = spark.read.format("mongodb").option("spark.mongodb.database","testdb").option("spark.mongodb.collection", "collection1").load()

dataFrame.show()

It gives the following error:

Py4JJavaError: An error occurred while calling o83.showString. : com.mongodb.spark.sql.connector.exceptions.MongoSparkException: Partitioning failed. Partitioner calling collStats command failed

However, dataFrame.printSchema() does return the schema. I have already found out that collStats is not supported on DocumentDB, but how can I turn this call off in the MongoDB connector for Spark?

Asked 5 months ago, 382 views
1 Answer

Hello,

Yes, $collStats behaves differently in Amazon DocumentDB and is listed as unsupported in the documentation. If you are using the default partitioning settings, the connector is probably falling back to its partitioner helper because no partitioning values are passed, and that helper is what calls $collStats. If you supply specific partitioning values, it might skip the partitioner helper and never call $collStats (as per this logic). I haven't tested this myself, so I recommend verifying it on your end.

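Alternatively, you might switch to a partitioner that does not call collStats, or disable partitioning altogether. Note that the property name depends on the connector version: spark.mongodb.input.partitioner is the 3.x setting, whereas the 10.x connector (implied by format("mongodb") and the com.mongodb.spark.sql.connector exception in your error) reads spark.mongodb.read.partitioner, or simply partitioner when passed with .option(). Setting it to com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner reads the whole collection as a single partition, which effectively disables partitioning at the cost of parallelism.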
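A minimal PySpark sketch of the partitioner change, assuming MongoDB Spark Connector 10.x and a connection URI already configured on the Spark session (as the question's code implies). The option keys database, collection and partitioner are the 10.x read-option names; the helper names documentdb_read_options and read_documentdb are hypothetical and only there to make the options easy to check.

```python
from typing import Dict

# Assumption: MongoDB Spark Connector 10.x. Fully qualified name of its
# single-partition partitioner, which produces one partition for the whole
# collection instead of sampling it.
SINGLE_PARTITIONER = (
    "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner"
)


def documentdb_read_options(database: str, collection: str) -> Dict[str, str]:
    """Build read options for one DocumentDB collection (hypothetical helper).

    Keys are 10.x read-option names, which DataFrameReader.option()/options()
    accept without the "spark.mongodb.read." prefix.
    """
    return {
        "database": database,
        "collection": collection,
        # Replace the default partitioner, whose collStats call fails on DocumentDB.
        "partitioner": SINGLE_PARTITIONER,
    }


def read_documentdb(spark, database: str, collection: str):
    """Load the collection as a DataFrame (hypothetical helper).

    `spark` is an existing SparkSession whose connection URI already points
    at the DocumentDB cluster.
    """
    options = documentdb_read_options(database, collection)
    return spark.read.format("mongodb").options(**options).load()
```

For example, dataFrame = read_documentdb(spark, "testdb", "collection1") followed by dataFrame.show() should avoid the default partitioner. Because everything lands in one partition, a large collection will be read by a single task; if that is too slow, another partitioner may be worth trying, provided it does not depend on collStats.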

AWS Support Engineer
Answered 5 months ago
