Vertex Filtering Significant Performance Issues


Hello,

Let me share a very simple query that I'm getting poor performance on. I'll share the profile, and hopefully someone can explain why it takes over 10 seconds. Additional filters after the .hasLabel() hurt performance even further (2 to 3 times worse).

"gremlin":"g.V().has('__graph','123').hasLabel('MyType').count()"


Neptune Gremlin Profile

Query String

g.V().has('__graph','123').has(label,'MyType').count()

Original Traversal

[GraphStep(vertex,[]), HasStep([__graph.eq(123), ~label.eq(MyType)]), CountGlobalStep]

Optimized Traversal

Neptune steps:
[
NeptuneCountGlobalStep {
JoinGroupNode {
PatternNode[(?1, <~label>, ?2=<MyType>, <~>) . project ?1 .], {estimatedCardinality=1273722, expectedTotalOutput=1273722, indexTime=0, joinTime=1131, numSearches=1, actualTotalOutput=830889}
PatternNode[(?1, <__graph>, "123", ?) . project ask .], {estimatedCardinality=2854796, indexTime=887, joinTime=8371, numSearches=830889}
}, annotations={path=[Vertex(?1):GraphStep], joinStats=true, optimizationTime=0, maxVarId=3, executionTime=10614}
}
]

Physical Pipeline

NeptuneCountGlobalStep
|-- StartOp
|-- JoinGroupOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, <~label>, ?2=<MyType>, <~>) . project ?1 .], {estimatedCardinality=1273722, expectedTotalOutput=1273722})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, <__graph>, "123", ?) . project ask .], {estimatedCardinality=2854796})

Runtime (ms)

Query Execution: 10614.135

Traversal Metrics

Step Count Traversers Time (ms) % Dur

NeptuneCountGlobalStep 1 1 10614.027 100.00
>TOTAL - - 10614.027 -

Predicates

# of predicates: 189

Results

Count: 1
Output: 414752

Index Operations

Query execution:
# of statement index ops: 830890
# of unique statement index ops: 830890
Duplication ratio: 1.0
# of terms materialized: 0


asked 4 years ago · 367 views
5 Answers

Looking at the profile, here’s the relevant part:

PatternNode[(?1, <~label>, ?2=<MyType>, <~>) . project ?1 .], {estimatedCardinality=1273722, expectedTotalOutput=1273722, indexTime=0, joinTime=1131, numSearches=1, actualTotalOutput=830889}

PatternNode[(?1, <__graph>, "123", ?) . project ask .], {estimatedCardinality=2854796, indexTime=887, joinTime=8371, numSearches=830889}

Neptune first evaluates the scan for all vertices with label 'MyType', which returns about 830k nodes. This scan takes about 1.1s. The resulting 830k node IDs are then probed against the index for the second pattern, and all of them happen to satisfy the filter "__graph" == "123". Such per-ID "probing" lookups are generally more expensive (in this case, ~8.3s) than the first, continuous index scan.

A couple more thoughts:

  • Adding more filters may or may not help. If you add a selective filter (using your version of the query), you should see Neptune starting out with the most selective filter(s), so the overall number of processed solutions might become much lower, resulting in better execution time. If, on the other hand, none of your additional filters are selective, then each filter induces additional overhead. So when you say that adding more filters made execution slower, I would be curious whether you also observe performance problems when adding more selective filters (see the sketch after this list).
  • As a general callout, Neptune evaluates a Gremlin query using a single thread. So while you're seeing high latency for this query, your server (depending on the instance type) would be underutilized when running a single query alone. The query you're running looks more like an analytical query; if you evaluated a set of such analytical queries concurrently, the observed overall server throughput/utilization could become much better.
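
For illustration, a minimal sketch of a query with an additional selective filter. The property 'tenantId' and its value are made up for this example; substitute a property that only a small fraction of your vertices carry:

g.V().has('tenantId','t-42').has('__graph','123').hasLabel('MyType').count()

If 'tenantId' is selective, the optimizer should start from that pattern, so far fewer solutions flow into the '__graph' and label checks.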

Edited by: awsmichaels on Jul 8, 2020 4:21 PM (provided example was not working / semantically wrong)

AWS
answered 4 years ago

One thing you could try is using a query optimizer hint that enforces a hash join:

g.withSideEffect('Neptune#hashJoin', '__graph').V().has('__graph','123').has(label,'MyType').count()

When using this query, the second filter (i.e., '__graph' == '123') should be evaluated as a hash join. This means that Neptune would not "substitute in" solutions from the first pattern (performing 830k lookups) but would instead evaluate the pattern for the graph filter stand-alone, and then perform a hash join.

This might be more efficient as long as the set of triples with '__graph' == '123' is of comparable size to the set of triples with label 'MyType'. The reason Neptune is not automatically switching to this strategy in your case is that the estimated sizes are quite large, which is why the optimizer chooses a more memory-efficient plan (a trade-off between performance and memory).

AWS
answered 4 years ago

Hi Michael,

Thanks for the response. I've tried your suggestion to use the hashJoin sideEffect. Unfortunately the results were about the same. The profile can be found at the bottom of this post.

Our customers are going to want a "totalCount" when querying for their data, and the result set of this query isn't even half of what they will want to store in the future. I've been under the impression that filtering vertices by properties would perform consistently as the DB grows. If we cannot meet their requirements, we may need to look at alternative DB solutions.

To put things into perspective, our __graph property is how we partition our data. Each of these __graph partitions could hold around 4 million vertices. The 'MyType' vertex label is very common across the entire database, making up almost 1/4 of an entire __graph (e.g., 1 million hits). We're planning to have hundreds of these __graphs in our DB.

I'm looking for direction from you and your team on how we can keep simple filtering operations from taking 10+ seconds and from degrading as our data grows. Keeping it under 1 second is our goal. I'm hoping you can help us meet our requirements.


Neptune Gremlin Profile

Query String

g.withSideEffect('Neptune#hashJoin', '__graph').V().has('__graph','123').hasLabel('MyType').count()

Original Traversal

[GraphStep(vertex,[]), HasStep([__graph.eq(123), ~label.eq(MyType)]), CountGlobalStep]

Optimized Traversal

Neptune steps:
[
NeptuneCountGlobalStep {
JoinGroupNode {
PatternNode[(?1, <~label>, ?2=<MyType>, <~>) . project ?1 .], {estimatedCardinality=1274196, expectedTotalOutput=1274196, indexTime=0, joinTime=1239, numSearches=1, actualTotalOutput=830890}
PatternNode[(?1, <__graph>, "123", ?) . project ask .], {estimatedCardinality=2625528, indexTime=837, joinTime=9734, numSearches=830890}
}, annotations={path=[Vertex(?1):GraphStep], joinStats=true, optimizationTime=1, maxVarId=3, executionTime=12023}
}
]

Physical Pipeline

NeptuneCountGlobalStep
|-- StartOp
|-- JoinGroupOp
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, <~label>, ?2=<MyType>, <~>) . project ?1 .], {estimatedCardinality=1274196, expectedTotalOutput=1274196})
|-- SpoolerOp(1000)
|-- DynamicJoinOp(PatternNode[(?1, <__graph>, "123", ?) . project ask .], {estimatedCardinality=2625528})

Runtime (ms)

Query Execution: 12024.553

Traversal Metrics

Step Count Traversers Time (ms) % Dur

NeptuneCountGlobalStep 1 1 12024.375 100.00
>TOTAL - - 12024.375 -

Predicates

# of predicates: 189

Results

Count: 1
Output: [414752]

Index Operations

Query execution:
# of statement index ops: 830891
# of unique statement index ops: 830891
Duplication ratio: 1.0
# of terms materialized: 0

answered 4 years ago

I've been following a principle that Kelvin taught me a while ago:

"Take a sip from Neptune before drinking"

My interpretation of this phrase was that when fetching data, do so in two queries:

Query #1: perform whatever filters or traversals are needed to find the vertex IDs you want.

Query #2: retrieve the needed data using the vertex IDs from query #1 as your starting points (a sketch of both follows below).
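
For example, a rough sketch of how we apply the pattern (the range bounds and the hard-coded IDs in query #2 are placeholders):

Query #1, fetch only a page of IDs:
g.V().has('__graph','123').hasLabel('MyType').id().range(0, 1000)

Query #2, hydrate the data from those IDs:
g.V('id-1','id-2','id-3').valueMap(true)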

The first of the two is giving us the most trouble. It's impossible to see how many results there are, or even to paginate them, when the result set is large. The joins are very expensive and are causing CPU to spike over 50%.

Neptune's I/O is performing well; it's the fetching of vertex IDs / the counting that falters when many results are found.

Perhaps there are more techniques we could use when "sipping"? I look forward to hearing from your team.


answered 4 years ago

I was able to meet with the Neptune team yesterday. They were able to answer all of our questions. To summarize the discussion:

  • Gremlin is fast at filtering; however, it's inherently limited by being single-threaded. This can pose issues when the result set is extraordinarily large; the example provided in this thread returned more than 400k results. SPARQL evaluation is multi-threaded, so some performance can be gained by using that language.

  • Combining commonly used property filters into a single property (e.g., a multi-key) can boost performance when filtering for large numbers of results (see the sketch after this list).

  • Neptune is working on some upcoming features that can help optimize the problem discussed in this forum post.
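
For illustration, a rough sketch of the combined-property idea as I understood it. The property name '__graphLabel', the '::' separator, and vertexId are placeholders for this example:

Write time (set once per vertex at ingest):
g.V(vertexId).property(single, '__graphLabel', '123::MyType')

Query time (one exact-match index lookup instead of a join across two patterns):
g.V().has('__graphLabel','123::MyType').count()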

Hopefully this is helpful. @NeptuneTeam, correct me if I misspoke on anything.

Thanks,
Austin

answered 4 years ago
