Can't read DataFrame from DocumentDB: "Partitioner calling collStats command failed" through the Spark MongoDB connector


I am trying to connect to my DocumentDB cluster through the Spark MongoDB connector, but it looks like DocumentDB does not support collStats. How can I disable the collStats command so I can do my transformations with Spark?

dataFrame = spark.read.format("mongodb").option("spark.mongodb.database","testdb").option("spark.mongodb.collection", "collection1").load()

dataFrame.show()

It gives the following error:

Py4JJavaError: An error occurred while calling o83.showString. : com.mongodb.spark.sql.connector.exceptions.MongoSparkException: Partitioning failed. Partitioner calling collStats command failed

However, dataFrame.printSchema() works and returns the schema. I have already found out that collStats is not supported on DocumentDB, but how can I turn it off in the MongoDB connector for Spark?

asked 5 months ago, 379 views
1 Answer

Hello,

Yes, collStats behaves differently in Amazon DocumentDB and is unsupported, as per the doc. If you are using the default partitioning values, the connector is probably going through the partitioner helper because no values are being passed. If you supply specific partitioning values, it might skip the partitioner helper and never call $collStats (as per this logic). I recommend testing this on your end, as I haven't tested it myself.

Alternatively, you might use a different partitioner, such as the HashPartitioner, or disable partitioning by setting the spark.mongodb.input.partitioner configuration property to none.
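As a rough sketch of the "different partitioner" idea, the snippet below builds the read options with an explicit partitioner and applies them to a reader. It rests on assumptions that are not confirmed in this thread. It assumes MongoDB Spark Connector 10.x, which the format("mongodb") call in the question suggests. In that version the read option is, to my knowledge, spark.mongodb.read.partitioner rather than spark.mongodb.input.partitioner, and com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner reads the whole collection as one partition. Whether this avoids collStats against DocumentDB is untested here. The helper names docdb_read_options and read_collection are made up for illustration.

```python
# Assumption: MongoDB Spark Connector 10.x class name; check it against your connector version.
SINGLE_PARTITION = (
    "com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner"
)


def docdb_read_options(database, collection, partitioner=SINGLE_PARTITION):
    """Build reader options that name the partitioner explicitly (hypothetical helper)."""
    return {
        "spark.mongodb.database": database,
        "spark.mongodb.collection": collection,
        # Assumption: 10.x read-side option key for choosing the partitioner.
        "spark.mongodb.read.partitioner": partitioner,
    }


def read_collection(spark, database, collection):
    """Apply the options to an existing SparkSession and load the DataFrame."""
    reader = spark.read.format("mongodb")
    for key, value in docdb_read_options(database, collection).items():
        reader = reader.option(key, value)
    return reader.load()


# Usage, given an existing SparkSession named `spark`:
# dataFrame = read_collection(spark, "testdb", "collection1")
# dataFrame.show()
```

A single partition means the collection is read without parallelism, so this is likely only reasonable for small collections or as a first test of whether the collStats error goes away.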

AWS
SUPPORT ENGINEER
answered 5 months ago
