Neptune takes forever to create 5000 nodes with openCypher


With `data` being an array of 5000 elements, the following script takes forever to finish on Neptune, whereas with Neo4j I can run it within 30 seconds. What is wrong with the AWS Neptune openCypher interface?

            UNWIND $data AS param
            MERGE (n:{label} {{id:}})
            ON CREATE SET n = param
            ON MATCH SET n += param

The Python code:

        def merge_nodes(tx):
            query = f"""
            UNWIND $data AS param
            MERGE (n:{label} {{id:}})
            ON CREATE SET n = param
            ON MATCH SET n += param
            """
            # print(query)
            tx.run(query, data=data)
        with self.driver.session() as session:
            return session.write_transaction(merge_nodes)
asked 7 months ago, 303 views
1 Answer
Accepted Answer

Any mutation query in Neptune is executed on a single thread. For a more performant way to write data to Neptune in batches, I would suggest using Neptune's bulk loader [1], or issuing concurrent write requests in smaller batches. In our testing, the best write throughput came from parallel write requests in batches of 100 to 200 objects per request, with a degree of parallelism that matches the query execution thread count on Neptune's writer instance (2x the number of vCPUs).

Both the bulk loader and parallel requests scale linearly with the size of the writer instance (more vCPUs means more query execution threads). So if you only need to perform batch loads temporarily, you can scale up the writer instance for the loads and then scale it back down afterwards for steady-state write workloads.
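
The parallel-batch approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Neptune's API: `chunked`, `parallel_write`, and the `write_batch` callable are hypothetical names, and `writer_vcpus` must be set by hand to match your writer instance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def parallel_write(
    data: Sequence[dict],
    write_batch: Callable[[Sequence[dict]], None],  # hypothetical: sends one batch
    batch_size: int = 150,  # within the 100 to 200 range suggested above
    writer_vcpus: int = 2,  # assumption: vCPU count of your writer instance
) -> int:
    """Send `data` in concurrent batches and return the number of batches sent."""
    workers = 2 * writer_vcpus  # rule of thumb from the answer: 2x the vCPUs
    batches = list(chunked(data, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the results so an exception from any batch is re-raised here
        list(pool.map(write_batch, batches))
    return len(batches)
```

Here `write_batch` would wrap something like the question's `session.write_transaction(merge_nodes)`, with each batch passed as `data`. If different batches can touch the same node, concurrent writes may conflict, so some retry logic may be needed.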


AWS
answered 7 months ago
