Getting Neptune Stream records using 'neptunedata' Python boto3 client

0

Here is my code:

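import boto3

# Client for the Neptune data API, pointed at the cluster endpoint
neptuneData = boto3.client('neptunedata', endpoint_url='https://mycluster.asdf.amazon.com:8182')

# Read up to 100 stream records, starting at the given commit number
response = neptuneData.get_propertygraph_stream(
    limit=100,
    iteratorType='AT_SEQUENCE_NUMBER',
    commitNum=int(startingCommitNum)
)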

The code above runs in a Lambda function. I have already set up the Neptune cluster and instance and enabled streams. When I invoke the Lambda, it times out with an error, and no other messages are shown.

If the usage is incorrect, please let me know the correct way to use the neptunedata client to read Neptune stream records.

Edit 1: In addition to the question above, I noticed that I've only set up a Neptune cluster and a writer instance. Can I use the writer instance's endpoint to read from the stream? I read in the docs that the stream doesn't have its own nodes, i.e. it runs on the provisioned instances, so my intuition is that this is fine. Please confirm.

I referenced this documentation to write the code above: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata/client/get_propertygraph_stream.html

asked a year ago, 362 views
3 Answers
1

Hello.

How about increasing the timeout value of your Lambda function?
https://docs.aws.amazon.com/lambda/latest/dg/configuration-timeout.html

boto3 calls AWS APIs, so the Lambda function needs to be able to reach AWS service endpoints.
If the Lambda function is connected to a VPC, try setting up a NAT gateway so it can reach those endpoints.
https://repost.aws/knowledge-center/internet-access-lambda-function
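If it helps with debugging: botocore's default connect and read timeouts are 60 seconds each, and failed calls are retried, so a Lambda function with a short timeout can be stopped before boto3 ever raises an error. Below is a minimal sketch, assuming you want a networking problem to surface quickly as an exception. The timeout values and the `make_client` helper are illustrative choices of mine, not something from the question.

```python
# Hypothetical fail-fast settings: short timeouts and a single attempt,
# so an unreachable endpoint raises quickly instead of hanging.
FAIL_FAST_SETTINGS = {
    "connect_timeout": 5,                   # seconds to wait for the TCP connection
    "read_timeout": 30,                     # seconds to wait for a response
    "retries": {"total_max_attempts": 1},   # no automatic retries
}


def make_client(endpoint_url, region_name=None):
    """Create a neptunedata client that fails fast (illustrative helper)."""
    # Imported here so the settings above can be inspected without boto3 installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "neptunedata",
        endpoint_url=endpoint_url,
        region_name=region_name,
        config=Config(**FAIL_FAST_SETTINGS),
    )
```

With settings like these, a call made from outside the cluster's network should fail within a few seconds, which leaves the Lambda function time to log the exception.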

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • I increased the Lambda timeout.

    The Lambda isn't in a VPC, but I'm now getting the following timeout error: "ConnectTimeoutError: Connect timeout on endpoint URL: "

  • Since the Neptune cluster is running inside a VPC, I think that unless there is a public endpoint, the Lambda function needs to be connected to that VPC. https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html

  • That makes sense. I've put the Lambda into the same VPC as the Neptune cluster.

    Do you have any resources you'd recommend for setting up a NAT gateway for Lambda via CDK?

1

The most likely issue here is that the Lambda function is not hosted in the same VPC as the Neptune cluster. Neptune endpoints are not public, nor can they be made public. You can only access them from within the VPC in which the cluster is hosted.

Secondarily, the security group associated with each of the Neptune instances needs to allow traffic from the Lambda function. This is typically configured by creating a new security group and assigning it to the Lambda function. Then, in the security group assigned to the Neptune instances, allow inbound traffic (on port 8182, by default) from the Lambda function's security group.
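As a rough illustration of that security group rule, here is a sketch using the EC2 `authorize_security_group_ingress` call. The helper names and the security group IDs in the usage comment are placeholders I made up. The EC2 client is passed in so the rule can be inspected without touching AWS.

```python
def neptune_ingress_rule(lambda_sg_id, port=8182):
    """Build an inbound TCP rule that admits traffic from the Lambda's security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Reference the Lambda function's security group rather than a CIDR range
            "UserIdGroupPairs": [{"GroupId": lambda_sg_id}],
        }
    ]


def allow_lambda_to_neptune(ec2_client, neptune_sg_id, lambda_sg_id, port=8182):
    """Add the rule to the security group attached to the Neptune instances."""
    return ec2_client.authorize_security_group_ingress(
        GroupId=neptune_sg_id,
        IpPermissions=neptune_ingress_rule(lambda_sg_id, port),
    )


# Example usage (placeholder IDs, requires boto3 and AWS credentials):
#   import boto3
#   allow_lambda_to_neptune(boto3.client("ec2"), "sg-neptune-placeholder", "sg-lambda-placeholder")
```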

The /streams API can be accessed from any instance within the cluster. Also ensure that you have the streams feature enabled (https://docs.aws.amazon.com/neptune/latest/userguide/streams-using.html#streams-using-enabling).
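Once connectivity works, reading past the first batch means feeding the `lastEventId` of each response into the next request. The sketch below assumes the response shape documented for `get_propertygraph_stream` (`records` plus a `lastEventId` holding `commitNum` and `opNum`). The function names and the `max_pages` cap are my own, and it does not handle what happens when the end of the stream is reached.

```python
def next_stream_request(last_response, limit=100):
    """Build the request that continues right after the last event already read."""
    last = last_response["lastEventId"]
    return {
        "limit": limit,
        "iteratorType": "AFTER_SEQUENCE_NUMBER",
        # int() works whether the values arrive as strings or as numbers
        "commitNum": int(last["commitNum"]),
        "opNum": int(last["opNum"]),
    }


def read_stream_pages(client, starting_commit_num, max_pages=3, limit=100):
    """Read up to max_pages batches of stream records with a neptunedata client."""
    request = {
        "limit": limit,
        "iteratorType": "AT_SEQUENCE_NUMBER",
        "commitNum": int(starting_commit_num),
    }
    records = []
    for _ in range(max_pages):
        response = client.get_propertygraph_stream(**request)
        records.extend(response.get("records", []))
        request = next_stream_request(response, limit)
    return records
```

Passing the client in keeps the paging logic separate from how the client is created, so the same function works against the writer endpoint or any other instance in the cluster.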

AWS
answered a year ago
0

Hi,

Could you try to use the Apache TinkerPop Python Gremlin client, gremlinpython, to connect to your db?

See https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-python.html for a full code sample

from __future__ import print_function  # Python 2/3 compatibility

from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

graph = Graph()

remoteConn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin','g')
g = graph.traversal().withRemote(remoteConn)

print(g.V().limit(2).toList())
remoteConn.close()

Best,

Didier

EXPERT
answered a year ago
EXPERT
reviewed a year ago
  • I am trying to run the gremlin python code in my Lambda. I have all the code / imports included.

    However, I am getting this error: "errorMessage": "Unable to import module 'myLambdaHandler': No module named 'gremlin_python'"

    I added gremlinpython to a Lambda layer for my function, but I am still getting the same error.

    Do you know if there are any additional steps I should follow to get the gremlin python example working?

    Also, looking one step ahead: does the gremlin python module have support for reading from Neptune streams?
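One way to narrow down the import error is to check from inside the function whether the layer's contents are on the import path. Layers are extracted under `/opt`, and for Python runtimes the packages need to sit under a top-level `python/` folder in the layer zip. The pip package is named `gremlinpython`, while the import name is `gremlin_python`. The helper names below are hypothetical. This is a diagnostic sketch, not a fix.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module can be found on the current import path."""
    return importlib.util.find_spec(name) is not None


def import_report(name="gremlin_python"):
    """Summarize whether the module is importable and which /opt paths are searched."""
    return {
        "module": name,
        "available": module_available(name),
        # Lambda layers are mounted under /opt, so these are the layer-related entries
        "layer_paths": [p for p in sys.path if p.startswith("/opt")],
    }
```

Logging `import_report()` at the top of a temporary handler, before importing `gremlin_python`, shows whether the layer was attached and packaged with the expected folder layout.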
