Problem Statement: I would like to use sqlite3 over EFS concurrently from Lambda, as a provisionless SQL database. I'm avoiding the term "serverless" here because Aurora Serverless exists, but you still have to provision ACUs. I'm looking for a "provisionless" SQL solution, i.e. one with no ACUs to provision or manage, where you pay only for the compute you actually use.
There is ample anecdotal evidence on the Internet that sqlite3 does not work well when multiple sqlite3 clients concurrently share the same database file on a remote filesystem. But this is exactly what I'm trying to do. I found the mountain of anecdotal evidence unconvincing and unsatisfying, so I decided to test it myself.
It took only a few minutes of testing to get the error:
Error: database is locked
It appears that some stale locks are left behind, as the database is NOT locked at the point where the error is encountered.
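For anyone who wants to reproduce this, here is a minimal sketch of the kind of writer each concurrent invocation could run. It is not the exact test code; the function name `write_rows`, the `hits` table and the parameters are made up for illustration, and the database path would be wherever the EFS volume is mounted in your Lambda.

```python
import sqlite3


def write_rows(db_path, worker_id, n_rows, timeout_s=5.0):
    """Insert n_rows, one transaction each; return how many hit 'database is locked'."""
    # timeout is the busy timeout: how long SQLite waits for a lock before giving up.
    conn = sqlite3.connect(db_path, timeout=timeout_s)
    locked = 0
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS hits (worker INTEGER, seq INTEGER)")
        conn.commit()
        for seq in range(n_rows):
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute("INSERT INTO hits VALUES (?, ?)", (worker_id, seq))
            except sqlite3.OperationalError as exc:
                if "locked" in str(exc):
                    locked += 1  # count the failure and keep going
                else:
                    raise
    finally:
        conn.close()
    return locked
```

Running the same function from a single process against a local file makes a useful baseline: if the locked count is only non-zero when several invocations share the EFS path, that points at the shared filesystem rather than at the code.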
My question is WHY does this happen?
Here's the source code of the Unix VFS module where SQLite accesses the filesystem: https://www.sqlite.org/src/file?name=src/os_unix.c
Ctrl-F to the comment section "Posix Advisory Locking" which begins with the phrase "POSIX advisory locks are broken by design" (insert facepalm emoji here) ... and you'll see that this clearly should work, as all known edge-cases and shortcomings of NFS locking have been thoroughly analyzed and handled in the code.
Some users have reported that EFS locking works as expected: https://stackoverflow.com/questions/53177938/is-it-safe-to-use-flock-on-aws-efs-to-emulate-a-critical-section
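One way to check the locking layer separately from sqlite3 is a small probe like the sketch below. It uses Python's `fcntl.lockf`, which goes through the fcntl() record-locking calls, the same family of POSIX advisory locks that os_unix.c uses by default. The helper name `lock_is_exclusive` is made up, the code assumes Linux, and it only tests two processes on the same client. A cross-client check would need the holder and the contender to run in separate invocations.

```python
import fcntl
import os


def lock_is_exclusive(path):
    """Return True if a second process is refused a lock that this process holds."""
    with open(path, "a+b") as holder:
        # Parent takes an exclusive, non-blocking POSIX advisory lock on the file.
        fcntl.lockf(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
        pid = os.fork()
        if pid == 0:
            # Child: record locks are not inherited across fork, so this is a real contender.
            status = 1  # 1 means the lock was NOT honored
            try:
                with open(path, "a+b") as contender:
                    try:
                        fcntl.lockf(contender, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    except OSError:
                        status = 0  # refused, which is the expected behavior
            finally:
                os._exit(status)
        _, wait_status = os.waitpid(pid, 0)
        return os.waitstatus_to_exitcode(wait_status) == 0
```

If this returns True on the EFS mount but sqlite3 still reports a locked database, the next thing to look at would be whether locks are released promptly when a Lambda execution environment is frozen or recycled.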
So, what could be the actual root cause of sqlite apparently failing to properly acquire and release locks over EFS?
Is there something broken in the way the Lambda container mounts the EFS volume? My test Lambda used python zip packaging, so it relies on the Lambda platform's container behind the scenes.
Can someone from AWS internal engineering please chime in on whether the issue could be caused by the mount options used by the Lambda platform when mounting EFS volumes? See: https://stackoverflow.com/questions/43914819/file-locks-support-in-docker-volumes-of-nfs4-shares
Are you running multiple Lambdas (or the same Lambda running concurrently) that access the same database file? Only one process can make changes to the file at a time.
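If the errors come from ordinary writer contention, a common mitigation is to take the write lock up front and retry with backoff. The sketch below assumes that case; `write_with_retry` and its parameters are hypothetical names, and it would not help if a lock really is stale.

```python
import sqlite3
import time


def write_with_retry(db_path, sql, params=(), attempts=5, base_delay_s=0.05, timeout_s=1.0):
    """Run one write statement in its own transaction; return the attempt that succeeded."""
    # isolation_level=None: the module issues no implicit BEGIN, so we control the transaction.
    conn = sqlite3.connect(db_path, timeout=timeout_s, isolation_level=None)
    try:
        for attempt in range(1, attempts + 1):
            try:
                conn.execute("BEGIN IMMEDIATE")  # ask for the write lock before doing any work
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc) or attempt == attempts:
                    raise
                time.sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
                continue
            try:
                conn.execute(sql, params)
                conn.execute("COMMIT")
            except Exception:
                conn.execute("ROLLBACK")
                raise
            return attempt
    finally:
        conn.close()
```

For example, `write_with_retry(path, "INSERT INTO t VALUES (?)", (1,))` returns 1 when the first attempt gets the lock, and raises `sqlite3.OperationalError` once all attempts are used up.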