Neptune takes forever to create 5000 nodes with openCypher


With `data` an array of 5000 elements, the following query takes forever to complete, yet with Neo4j I can run it in under 30 seconds. What is wrong with the AWS Neptune openCypher interface?

    UNWIND $data AS param
    MERGE (n:{label} {{id: param.id}})
    ON CREATE SET n = param
    ON MATCH SET n += param

The Python code:

        def merge_nodes(tx):
            query = f"""
            UNWIND $data AS param
            MERGE (n:{label} {{id: param.id}})
            ON CREATE SET n = param
            ON MATCH SET n += param
            """
            # print(query)
            tx.run(query, data=data)

        with self.driver.session() as session:
            return session.write_transaction(merge_nodes)
Asked 6 months ago · 285 views
1 Answer
Accepted Answer

Any mutation query in Neptune executes single-threaded. If you are looking for a more performant way to write data to Neptune in batches, I would suggest using Neptune's bulk loader [1], or issuing concurrent/parallel write requests in smaller batches. In our testing, parallel write requests of 100 to 200 objects per request, with parallelism matching the query-execution thread count on Neptune's writer instance (2x the number of vCPUs), yielded the best write throughput.
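The parallel-batch approach can be sketched as below, reusing the same neo4j Python driver as in the question. The batch size, worker count, and `label` variable are illustrative assumptions, not measured recommendations:

```python
# A minimal sketch of parallel batched MERGE writes, assuming the
# neo4j Python driver from the question. `driver` and `label` are
# supplied by the caller; batch_size/workers are tunables.
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def merge_batch(driver, label, batch):
    """MERGE one batch of node dicts in its own write transaction."""
    query = f"""
    UNWIND $data AS param
    MERGE (n:{label} {{id: param.id}})
    ON CREATE SET n = param
    ON MATCH SET n += param
    """
    with driver.session() as session:
        session.write_transaction(lambda tx: tx.run(query, data=batch).consume())


def merge_all(driver, label, data, batch_size=200, workers=4):
    """Issue the batches concurrently; workers ~= 2x writer vCPUs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(merge_batch, driver, label, batch)
                   for batch in chunks(data, batch_size)]
        for future in futures:
            future.result()  # re-raise any per-batch error
```

With 5000 elements and `batch_size=200`, this issues 25 requests instead of one large one, which is the pattern the answer describes.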

Both the bulk loader and parallel requests scale linearly with the size of the writer instance (more vCPUs means more query-execution threads). So if you only need to perform batch loads occasionally, you can scale up the writer instance for the load and scale it back down afterwards for steady-state write workloads.
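For the bulk-loader route, a load job is started with an HTTP POST to the cluster's loader endpoint (documented in [1]). A minimal sketch follows; the endpoint host, S3 bucket, and IAM role ARN are placeholders you would replace with your own:

```python
# Sketch of starting a Neptune bulk-load job from S3 via the
# loader HTTP endpoint. Endpoint, bucket, and role ARN below are
# placeholder values, not real resources.
import json
import urllib.request


def build_load_request(source, iam_role_arn, region):
    """Request body for the loader endpoint; "opencypher" is the
    format for openCypher CSV files staged in S3."""
    return {
        "source": source,
        "format": "opencypher",
        "iamRoleArn": iam_role_arn,
        "region": region,
    }


def start_bulk_load(endpoint, source, iam_role_arn, region):
    """POST the load job; the response includes a loadId that can be
    polled at GET https://{endpoint}:8182/loader/{loadId}."""
    body = json.dumps(build_load_request(source, iam_role_arn, region)).encode()
    req = urllib.request.Request(
        f"https://{endpoint}:8182/loader",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note the IAM role must be attached to the cluster and allowed to read the S3 bucket, per the bulk-load documentation.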

[1] https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html

AWS
Answered 6 months ago
