Occasionally getting "MongoServerSelectionError: Server selection timed out..." errors
Hi,
We have a lambda application that uses DocumentDB as the database layer. The lambdas are set in the same VPC as the DocumentDB cluster, and we're able to connect, and do all query (CRUD) operations as normal. The cluster is a simple cluster with 1 db.t4g.medium instance.
One of the lambdas is triggered by an SNS topic and gets executed ~1M times over a 24h period. Each execution involves a database query, and the vast majority of them go fine.
The MongoClient is created outside of the handler in a separate file, as detailed here: https://www.mongodb.com/docs/atlas/manage-connections-aws-lambda/ so that "warm" lambda executions will re-use the same connection. Our lambdas are executed as async handlers, not using a callback. The MongoClient itself is created in its own file like so:
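```
import { MongoClient } from 'mongodb';

// Connection string is assembled from environment variables
const uri = `mongodb://${process.env.DB_USER}:${process.env.DB_PASSWORD}@${process.env.DB_ENDPOINT}:${process.env.DB_PORT}/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false`;
const client = new MongoClient(uri, {
  tlsCAFile: 'certs/rds-combined-ca-bundle.pem'
});
// Exported as a promise: created once per container and shared by warm invocations
export const mongoClient = client.connect();
```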
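To illustrate the re-use described above, here is a minimal self-contained sketch. `fakeConnect`, `FakeClient`, `cachedClient`, `handler` and `demo` are hypothetical names standing in for the real driver and handler, not part of our code. The promise is created once when the module loads, and every invocation awaits that same promise.

```typescript
// Hypothetical stand-in for a connected client; not the real driver type.
interface FakeClient {
  id: number;
}

let connectCalls = 0;

// Hypothetical stand-in for `client.connect()`: counts how often it runs.
function fakeConnect(): Promise<FakeClient> {
  connectCalls += 1;
  return Promise.resolve({ id: connectCalls });
}

// Created once at module load (a cold start), like `mongoClient` above.
const cachedClient: Promise<FakeClient> = fakeConnect();

// Each invocation awaits the same promise instead of connecting again.
async function handler(): Promise<number> {
  const client = await cachedClient;
  return client.id;
}

// Runs two "warm" invocations and reports the client ids and the connect count.
async function demo(): Promise<number[]> {
  const first = await handler();
  const second = await handler();
  return [first, second, connectCalls];
}
```

Since a promise settles only once, the same sharing applies to a rejection: if the initial connect fails, every later `await` in that container sees the same error.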
A sample handler would be something like this (TypeScript):
````
import type { SNSEvent } from "aws-lambda";
import { mongoClient } from "./mongo.client";

const DB_NAME = 'MyDB';

export const snsHandler = async (event: SNSEvent): Promise<void> => {
  // The SNS message body carries the collection name and the document id
  const notif = JSON.parse(event.Records[0].Sns.Message);
  const item = await mongoClient
    .then(client => client.db(DB_NAME).collection(notif.collection).findOne({ _id: notif.id }))
    .catch(err => {
      console.error(`Couldn't find item with id ${notif.id} from collection ${notif.collection}`, err);
      return null;
    });
  // do something with item
};
````
---
Every so often (~100 times a day), we get specific errors along the lines of:
```
MongoServerSelectionError: Server selection timed out after 30000 ms
at Timeout._onTimeout (/var/task/src/settlement.js:5:157446)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7)
```
or
```
[MongoClient] Error when connecting to mongo xg [MongoServerSelectionError]: Server selection timed out after 30000 ms
at Timeout._onTimeout (/var/task/src/settlement.js:5:157446)
at listOnTimeout (internal/timers.js:557:17)
at processTimers (internal/timers.js:500:7) {
reason: To {
type: 'ReplicaSetNoPrimary',
servers: Map(1) {
'[REDACTED].docdb.amazonaws.com:[REDACTED]' => [ry]
},
stale: false,
compatible: true,
heartbeatFrequencyMS: 10000,
localThresholdMS: 15,
setName: 'rs0',
logicalSessionTimeoutMinutes: undefined
},
code: undefined,
[Symbol(errorLabels)]: Set(0) {}
}
```
or
```
Lambda exited with error: exit status 128 Runtime.ExitError
```
---
In the Monitoring tab of the DocumentDB instance, the CPU doesn't go higher than 10%, and the database connections peak at ~170 (the connection limit on the db.t4g.medium is 500, unless I'm mistaken), with an average of around 30-40. For the lambda itself, concurrent executions peak at ~100. The errors aren't correlated with the peaks: they can happen at any time throughout the day.
Can anyone provide any insight as to why the connection might be timing out from time to time, please? The default parameters of the MongoClient should keep the connection alive as long as the lambda is still active, and we don't seem to be close enough to the max connection limit.
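For reference, the timing values that show up in the error output can be written out explicitly as connection string options, which makes it easier to see which ones are in play. This is only a sketch: it assumes the values printed in the error are the effective settings, and `buildQuery` is a hypothetical helper, not something from the driver.

```typescript
// Values copied from the URI and the error output in this post; treating
// them as the effective settings is an assumption.
const options: Record<string, string | number | boolean> = {
  tls: true,
  replicaSet: "rs0",
  readPreference: "secondaryPreferred",
  retryWrites: false,
  serverSelectionTimeoutMS: 30000, // "timed out after 30000 ms"
  heartbeatFrequencyMS: 10000, // from the error's `reason` object
  localThresholdMS: 15, // from the error's `reason` object
};

// Hypothetical helper: joins the options into a URI query string.
function buildQuery(opts: Record<string, string | number | boolean>): string {
  return Object.keys(opts)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(opts[key]))}`)
    .join("&");
}

const query = buildQuery(options);
```

The resulting `query` would replace everything after `/?` in the `uri` shown earlier.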
I'm assuming the way we have it set up is wrong, but I'm not sure how to go about fixing it.
Thanks