
Questions tagged with Amazon DocumentDB


Advice for best database/data storage for historical data

Hi, I'm doing some research to find the best place to centralize the large volume of log data generated by my application, considering pricing, performance, and scalability. Today all my application data, including logs, is stored in an Oracle database, but I'm thinking of moving all the log-related data out of it to reduce its size and stop worrying about storage, performance, etc. I'd just put everything in an "infinite" store apart from my actual database, fed by CDC or a regular batch process.

**Below are some needs:**
- Only inserts are necessary (no updates or deletes)
- Customers will need access to this historical data
- Well-defined access pattern (one or two indexes at maximum)
- Latency of a few seconds is OK
- Avoid infrastructure, DBA, and performance bottlenecks in the long term
- Infinite retention period (meaning I don't want to worry about performance issues or storage size in the long term, but something that can handle a few terabytes of data)

**Use case example:** historical sales orders by item (id_item | id_customer | qty_sold | date_inserted ...), approximately 50 million records per day, where I would need to see the historical data by item and by customer, for example (two dimensions).

I've done some research on the options below:

**S3 + Athena** -> Put everything on S3, no worries about infrastructure or performance, but since I need to query by item and customer, I would probably have to break the files up by item or customer and generate millions of partitions to avoid the high cost of scanning every file.

**PostgreSQL** -> Not sure whether it would become a performance bottleneck once the tables get too big, even with partitioning strategies.

**DynamoDB** -> Not sure whether it's a good alternative for historical data in terms of pricing, given that a latency of seconds is acceptable.

**MongoDB / DocumentDB** -> Not very familiar with it (I'd prefer a SQL-type query language), but I know it scales well.

**Cassandra** -> Don't know it very well.

**Time-series databases such as InfluxDB, Timestream, etc.** -> Don't know them very well, but they seem appropriate for time-series data.

What option would you choose? Sorry in advance if I'm saying something wrong or impossible :) Thank you!
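One way to make the S3 + Athena option concrete is to partition only on the ingest date and let columnar files handle the per-item and per-customer filters, which avoids the millions-of-partitions concern. A minimal sketch, assuming Parquet files, a placeholder bucket name, and the Athena `default` database:

```
# Sketch only: bucket, database, and dates are placeholders. Assumes the CDC/batch
# job writes daily Parquet files under s3://my-log-bucket/sales_history/dt=YYYY-MM-DD/.
aws athena start-query-execution \
  --query-string "
    CREATE EXTERNAL TABLE IF NOT EXISTS default.sales_history (
      id_item       bigint,
      id_customer   bigint,
      qty_sold      double,
      date_inserted timestamp
    )
    PARTITIONED BY (dt string)
    STORED AS PARQUET
    LOCATION 's3://my-log-bucket/sales_history/'" \
  --result-configuration "OutputLocation=s3://my-log-bucket/athena-results/"

# Register each day's partition after the batch load completes (placeholder date).
aws athena start-query-execution \
  --query-string "ALTER TABLE default.sales_history ADD IF NOT EXISTS PARTITION (dt='2022-05-01')" \
  --result-configuration "OutputLocation=s3://my-log-bucket/athena-results/"
```

With roughly 365 partitions per year, queries that filter on a date range plus id_item or id_customer prune by partition and read only the needed columns; whether the resulting scan cost is acceptable depends on the actual file sizes per day.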
1
answers
0
votes
5
views
asked 2 months ago

mongodb-org-4.0.repo: No such file or directory when installing the mongo shell on my AWS Cloud9

I'm trying to connect to my DocumentDB cluster on AWS from AWS Cloud9, following [this tutorial][1]. But every time I try to connect, the connection fails:

```
(scr_env) me:~/environment/sephora $ mongo --ssl --host xxxxxxxxxxxxx:xxxxx --sslCAFile rds-combined-ca-bundle.pem --username username --password mypassword
MongoDB shell version v3.6.3
connecting to: mongodb://xxxxxxxxxxxxx:xxxxx/
2022-03-22T23:12:38.725+0000 W NETWORK  [thread1] Failed to connect to xxx.xx.xx.xxx:xxxxx after 5000ms milliseconds, giving up.
2022-03-22T23:12:38.726+0000 E QUERY    [thread1] Error: couldn't connect to server xxxxxxxxxxxxx:xxxxx, connection attempt failed :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:6
exception: connect failed
```

Indeed, it seems the VPC configuration is missing, so I tried to follow [this documentation][2]. But I do not know how to install the mongo shell on my AWS Cloud9. It seems that I cannot create the repository file with `echo -e "[mongodb-org-4.0] \name=MongoDB repository baseurl=...`; it returns `mongodb-org-4.0.repo: No such file or directory`. Also, when I tried to install the mongo shell (which I did not have) with `sudo yum install -y mongodb-org-shell`, it returns `repolist 0`.

[1]: https://www.youtube.com/watch?v=Ild9ay9U_vY
[2]: https://stackoverflow.com/a/17793856/4764604
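The `echo -e` quoted above never produces a valid file because the repo definition needs real line breaks and must be written with root privileges under `/etc/yum.repos.d/`. A minimal sketch, assuming the Cloud9 instance runs Amazon Linux 2 (the `baseurl` differs for other distributions):

```
# Write the MongoDB 4.0 repo definition; /etc/yum.repos.d is root-owned, hence sudo tee.
sudo tee /etc/yum.repos.d/mongodb-org-4.0.repo > /dev/null <<'EOF'
[mongodb-org-4.0]
name=MongoDB Repository
baseurl=https://repo.mongodb.org/yum/amazon/2/mongodb-org/4.0/x86_64/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.0.asc
EOF

# The repo should now show up, and the shell-only package can be installed.
sudo yum repolist
sudo yum install -y mongodb-org-shell
```

Installing the shell does not fix the connection timeout itself; that still requires the Cloud9 instance to be in (or routed to) the cluster's VPC with a security group that allows port 27017.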
2
answers
0
votes
2
views
asked 2 months ago

"Connect failed" when trying to connect to the DocumentDB cluster from AWS C9

In order to insert data into a cluster with a Python script, I'm trying to connect to my DocumentDB cluster on AWS from AWS Cloud9, following [this tutorial][1]. But every time I try to connect, the connection fails:

```
(scr_env) me:~/environment/sephora $ mongo --ssl --host xxxxxxxxxxxxx:xxxxx --sslCAFile rds-combined-ca-bundle.pem --username username --password mypassword
MongoDB shell version v3.6.3
connecting to: mongodb://xxxxxxxxxxxxx:xxxxx/
2022-03-22T23:12:38.725+0000 W NETWORK  [thread1] Failed to connect to xxx.xx.xx.xxx:xxxxx after 5000ms milliseconds, giving up.
2022-03-22T23:12:38.726+0000 E QUERY    [thread1] Error: couldn't connect to server xxxxxxxxxxxxx:xxxxx, connection attempt failed :
connect@src/mongo/shell/mongo.js:251:13
@(connect):1:6
exception: connect failed
```

I tried to remove the lock file to repair the instance, as they do in [this answer][2]:

```
(scr_env) me:~/environment/sephora $ sudo rm /var/lib/mongodb/mongod.lock
rm: cannot remove '/var/lib/mongodb/mongod.lock': No such file or directory
```

I know that a "failed to connect" error can have many causes, but usually they are:

- the service (mongo) might not be *running* on the destination server;
- the service (mongo) might be *listening* on a different port number;
- the service (mongo) might be *protected* by a firewall somewhere on the destination.

So how can I check that:

- mongo is *running* on the destination host?
- mongo is *listening* on the defined port?
- connections to the destination host on the defined port are *allowed* from the IP address or network range I'm connecting from?

[1]: https://www.youtube.com/watch?v=Ild9ay9U_vY
[2]: https://stackoverflow.com/a/17793856/4764604
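Because DocumentDB is a managed service, there is no `mongod` process or lock file on the client side to repair; the first two bullets are effectively guaranteed by the service, and the usual culprit is the third (VPC and security-group reachability on port 27017). A rough sketch of checks to run from the Cloud9 terminal, with placeholder endpoint and cluster names:

```
# 1. Does the cluster endpoint resolve from this machine?
nslookup mycluster.cluster-xxxxxxxx.eu-west-1.docdb.amazonaws.com

# 2. Is TCP port 27017 reachable? This times out if a security group,
#    route, or VPC boundary is blocking the path.
timeout 5 bash -c \
  'cat < /dev/null > /dev/tcp/mycluster.cluster-xxxxxxxx.eu-west-1.docdb.amazonaws.com/27017' \
  && echo "27017 reachable" || echo "27017 NOT reachable"

# 3. Which security groups protect the cluster, and do they allow inbound
#    27017 from the Cloud9 instance's security group or IP?
aws docdb describe-db-clusters \
  --db-cluster-identifier mycluster \
  --query 'DBClusters[0].VpcSecurityGroups'
```

If step 2 fails, the fix is on the networking side (Cloud9 in the same VPC, or an SSH tunnel through a host that is), not in the mongo client flags.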
1
answers
0
votes
12
views
asked 2 months ago

DMS Ignore Duplicate key errors while migrating data between DocumentDB instances

We need to replicate data between two collections in AWS DocumentDB to get rid of duplicate documents. Source and target are AWS DocumentDB instances, version 4.0.0. I've created a unique index on the target collection so that it only allows non-duplicate values. I needed to create the index before migrating the data to the new target, because our data size is ~1 TB and creating the index on the source collection is not feasible. The full load fails with the following error; the task status becomes "table error" and no further data is migrated to that collection.

```
2022-03-23T03:13:57 [TARGET_LOAD ]E: Execute bulk failed with errors: 'Multiple write errors: "E11000 duplicate key error collection: reward_users_v4 index: lockId", "E11000 duplicate key error collection: reward_users_v4 index: lockId"' [1020403] (mongodb_apply.c:153)
2022-03-23T03:13:57 [TARGET_LOAD ]E: Failed to handle execute bulk when maximum events per bulk '1000' was reached [1020403] (mongodb_apply.c:433)
```

```
"ErrorBehavior": {
    "FailOnNoTablesCaptured": false,
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "FailOnTransactionConsistencyBreached": false,
    "RecoverableErrorThrottlingMax": 1800,
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "ApplyErrorEscalationCount": 1000000000,
    "RecoverableErrorStopRetryAfterThrottlingMax": true,
    "RecoverableErrorThrottling": true,
    "ApplyErrorFailOnTruncationDdl": false,
    "DataTruncationErrorPolicy": "LOG_ERROR",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "ApplyErrorEscalationPolicy": "LOG_ERROR",
    "RecoverableErrorCount": 1000000000,
    "DataErrorEscalationCount": 1000000000,
    "TableErrorEscalationPolicy": "SUSPEND_TABLE",
    "RecoverableErrorInterval": 10,
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "TableErrorEscalationCount": 1000000000,
    "FullLoadIgnoreConflicts": true,
    "DataErrorPolicy": "LOG_ERROR",
    "TableErrorPolicy": "SUSPEND_TABLE"
},
```

How can I configure AWS DMS to continue even if such duplicate key errors keep happening? I tried modifying the TableErrorEscalationCount and many other error counts, but the load always stops at the first duplicate key error. I have 580k documents in the test workload for this task.
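DMS applies the full load in bulk batches, and a single E11000 in a batch fails the whole batch before the per-record error policies get a chance to skip it. One pragmatic alternative is to remove the duplicates on the source (or a restored copy of it) before migrating, so the unique index on the target never fires. A rough sketch with the mongo shell, assuming the duplicate field is `lockId` as the error message suggests (endpoint, credentials, and database name are placeholders; try it on a copy or the 580k test workload first):

```
# Sketch only: keeps the first document per lockId and deletes the rest.
# For the full ~1 TB collection this grouping may need to be run in batches.
mongo --ssl --host source-cluster-endpoint:27017 \
      --sslCAFile rds-combined-ca-bundle.pem \
      --username myuser --password mypassword --quiet --eval '
  var coll = db.getSiblingDB("mydb").reward_users_v4;
  // Group by lockId, collect the _ids in each group, keep groups with duplicates.
  coll.aggregate([
    { $group: { _id: "$lockId", ids: { $push: "$_id" }, count: { $sum: 1 } } },
    { $match: { count: { $gt: 1 } } }
  ]).forEach(function (dup) {
    dup.ids.shift();                                  // keep the first document
    coll.deleteMany({ _id: { $in: dup.ids } });       // delete the rest
  });
'
```

Adapt the "keep" rule if a different document should win within each duplicate group.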
1
answers
0
votes
4
views
asked 2 months ago

DocumentDB 'ReplicaSetNoPrimary' error

While using AWS Lambda with Node and Mongoose 5.x, we are experiencing the following error **randomly** (a group of errors every 10-15 minutes). Sometimes the connection establishes just fine, but other times it throws a "replica set no primary" error. The DocumentDB cluster is in the same VPC as the Lambdas. We have tried Mongoose 6.x as well; it performs less well. As far as I can tell this cannot be a firewall issue (since it works most of the time). The profiler / audit logs do not seem to offer any hints either. Any ideas how to troubleshoot this?

```
ReplicaSetNoPrimary
MongooseServerSelectionError: Server selection timed out after 5000 ms
    at NativeConnection.Connection.openUri (/opt/nodejs/node_modules/mongoose/lib/connection.js:847:32)
    at /opt/nodejs/node_modules/mongoose/lib/index.js:351:10
    at /opt/nodejs/node_modules/mongoose/lib/helpers/promiseOrCallback.js:32:5
    at new Promise (<anonymous>)
    at promiseOrCallback (/opt/nodejs/node_modules/mongoose/lib/helpers/promiseOrCallback.js:31:10)
    at Mongoose._promiseOrCallback (/opt/nodejs/node_modules/mongoose/lib/index.js:1149:10)
    at Mongoose.connect (/opt/nodejs/node_modules/mongoose/lib/index.js:350:20)
    at connectToMongoDB (/var/task/app/init/db.js:68:20)
    at Object.<anonymous> (/var/task/app/init/db.js:109:26)
    at Module._compile (internal/modules/cjs/loader.js:1085:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    at Module.load (internal/modules/cjs/loader.js:950:32)
    at Function.Module._load (internal/modules/cjs/loader.js:790:12)
    at Module.require (internal/modules/cjs/loader.js:974:19)
    at require (internal/modules/cjs/helpers.js:93:18)
    at Object.<anonymous> (/var/task/app/init/init.js:7:26)
    at Module._compile (internal/modules/cjs/loader.js:1085:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    at Module.load (internal/modules/cjs/loader.js:950:32)
    at Function.Module._load (internal/modules/cjs/loader.js:790:12)
    at Module.require (internal/modules/cjs/loader.js:974:19)
    at require (internal/modules/cjs/helpers.js:93:18)
    at Object.<anonymous> (/var/task/app/init/index.js:1:18)
    at Module._compile (internal/modules/cjs/loader.js:1085:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    at Module.load (internal/modules/cjs/loader.js:950:32)
    at Function.Module._load (internal/modules/cjs/loader.js:790:12)
    at Module.require (internal/modules/cjs/loader.js:974:19)
```

Our configuration looks like this:

```js
url: 'mongodb://**********.cluster-*************.********.docdb.amazonaws.com:27017/',
opts: {
  dbName: '************',
  user: '***************',
  pass: '************',
  tls: true,
  tlsCAFile: caPemFile,
  useNewUrlParser: true,
  useUnifiedTopology: true,
  replicaSet: 'rs0',
  readPreference: 'secondaryPreferred',
  retryWrites: false,
  monitorCommands: true,
  maxPoolSize: 5,
  minPoolSize: 1,
  serverSelectionTimeoutMS: 5000,
  connectTimeoutMS: 5000,
  bufferCommands: false,
  autoCreate: false,
  autoIndex: false,
  authSource: 'admin',
},
```
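Intermittent ReplicaSetNoPrimary with a 5000 ms server-selection timeout usually comes down to one of two things: the driver can reach the cluster endpoint but not every instance endpoint it discovers, or cold-start TLS plus replica-set discovery occasionally exceeds the 5 s budget. A rough way to rule out the network side is to run reachability checks from an EC2 or Cloud9 instance attached to the same subnets and security group as the Lambdas (hostnames below are placeholders):

```
# The driver connects to the individual instance endpoints after discovery,
# so every one of them must resolve and accept TCP 27017 from the Lambda's
# subnets, not just the cluster endpoint.
for host in \
  mycluster.cluster-xxxxxxxx.eu-west-1.docdb.amazonaws.com \
  mycluster-instance-1.xxxxxxxx.eu-west-1.docdb.amazonaws.com \
  mycluster-instance-2.xxxxxxxx.eu-west-1.docdb.amazonaws.com
do
  nslookup "$host" > /dev/null 2>&1 && dns=ok || dns=FAIL
  timeout 5 bash -c "cat < /dev/null > /dev/tcp/$host/27017" 2>/dev/null && tcp=ok || tcp=FAIL
  echo "$host  dns=$dns  tcp=$tcp"
done
```

If those checks pass, the usual next steps are raising serverSelectionTimeoutMS/connectTimeoutMS above 5000 ms and creating the Mongoose connection outside the Lambda handler so warm invocations reuse it instead of paying the full TLS handshake and discovery cost on each invoke.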
2
answers
0
votes
70
views
asked 4 months ago

Athena QuickSight query times out

I am using Athena to connect to my DocumentDB database with the intention of visualising the data in QuickSight. I can see the connection is correct, as the database name and table name load properly. When I go to create an analysis, or even preview the data, it times out. A refresh just produces the same result. The table only contains 40 or so documents, so it's not an issue of the amount of data. I have tried both direct query and SPICE. This is the error returned when attempting to reconnect from QuickSight:

```
Error Details
Error type: SQL_EXCEPTION
Error details: [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. GENERIC_USER_ERROR: Encountered an exception[com.amazonaws.SdkClientException] from your LambdaFunction[arn:aws:lambda:eu-west-1:717816899369:function:documentdb-connector] executed in context[retrieving meta-data] with message[Unable to execute HTTP request: Connect to s3.eu-west-1.amazonaws.com:443 [s3.eu-west-1.amazonaws.com/52.218.57.91] failed: connect timed out] [Execution ID: 56486351-0cba-4e81-9d9f-3ff4722c198f]
Ingestion Id: 061d9b7e-c7f3-49a7-88ba-531551f6cc9a
```

The SQL_EXCEPTION description states that it can be caused by a timeout, so I figure that's the issue:

`SQL_EXCEPTION – A general SQL error occurred. This error can be caused by query timeouts, resource constraints, unexpected data definition language (DDL) changes before or during a query, and other database errors. Check your database settings and your query, and try again.`

Could there be an issue with the Lambda function being used by Athena to connect to DocumentDB? Looking at the logs, the connection seems to be working. I thought QuickSight would just be able to pull in the data now that everything is linked up. Any advice is appreciated.
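The timeout in the error is the federation Lambda (documentdb-connector) failing to reach S3, not DocumentDB: a connector deployed into private VPC subnets (so it can see the cluster) has no route to S3 for its spill bucket and metadata calls unless those subnets have a NAT gateway or the VPC has an S3 gateway endpoint. A rough sketch of adding an S3 gateway endpoint, with placeholder IDs:

```
# Sketch only: substitute the VPC and route table(s) used by the
# documentdb-connector Lambda's subnets.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.eu-west-1.s3 \
  --route-table-ids rtb-0123456789abcdef0
```

Once the endpoint (or a NAT gateway) is in place, the connector's "retrieving meta-data" call should stop timing out; its CloudWatch logs will confirm whether the S3 request now succeeds.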
0
answers
0
votes
9
views
asked 4 months ago

Mongoose attempting to connect to instance instead of just cluster endpoint

We have our DocumentDB instance in a private VPC, so we use a bastion with port forwarding. I have the cluster endpoint set up in my SSH config and am able to connect via the mongo shell:

```
$ mongo --ssl --host localhost:27018 --sslCAFile rds-combined-ca-bundle.pem --sslAllowInvalidHostnames
MongoDB shell version v3.6.3
connecting to: mongodb://localhost:27018/
2020-07-15T16:14:11.063-0400 D NETWORK  [thread1] creating new connection to:localhost:27018
2020-07-15T16:14:11.266-0400 W NETWORK  [thread1] The server certificate does not match the host name. Hostname: localhost does not match SAN(s): <information redacted>
2020-07-15T16:14:11.266-0400 D NETWORK  [thread1] connected to server localhost:27018 (127.0.0.1)
2020-07-15T16:14:11.296-0400 D NETWORK  [thread1] connected connection!
MongoDB server version: 3.6.0
rs0:PRIMARY>
```

But when I try connecting programmatically via Mongoose, it attempts to connect to the instance directly instead of just the cluster endpoint.

With useUnifiedTopology enabled:

```
const connOpts = {
  replicaSet: 'rs0',
  readPreference: 'secondaryPreferred',
  loggerLevel: 'debug',
  ha: false,
  connectWithNoPrimary: true,
  useNewUrlParser: true,
  useUnifiedTopology: true
}
mongoose.createConnection('mongodb://localhost:27018/mydb', connOpts)

MongooseServerSelectionError: connection timed out
reason: TopologyDescription {
  type: 'ReplicaSetNoPrimary',
  setName: 'rs0',
  maxSetVersion: null,
  maxElectionId: null,
  servers: Map {
    'mydocdb-inst-1.[id redacted].[region redacted].docdb.amazonaws.com:27017' => [ServerDescription]
  },
  stale: false,
  compatible: true,
  compatibilityError: null,
  logicalSessionTimeoutMinutes: null,
  heartbeatFrequencyMS: 10000,
  localThresholdMS: 15,
  commonWireVersion: 6
}
```

With useUnifiedTopology disabled:

```
const connOpts = {
  replicaSet: 'rs0',
  readPreference: 'secondaryPreferred',
  loggerLevel: 'debug',
  ha: false,
  connectWithNoPrimary: true,
  useNewUrlParser: true,
  useUnifiedTopology: false
}
mongoose.createConnection('mongodb://localhost:27018/mydb', connOpts)

At the end of the debug output:

[INFO-Server:9749] 1595262374081 server mydocdb-inst-1.[id redacted].[region redacted].docdb.amazonaws.com:27017 fired event error out with message {"name":"MongoNetworkError"}
{
  type: 'info',
  message: 'server mydocdb-inst-1.[id redacted].[region redacted].docdb.amazonaws.com:27017 fired event error out with message {"name":"MongoNetworkError"}',
  className: 'Server',
  pid: 9749,
  date: 1595262374081
}
```

Is this due to some change in later versions of Mongoose or the MongoDB driver that isn't backwards compatible with MongoDB 3.6.x / DocumentDB? Is anyone on a specific version of Mongoose and has it working without needing to connect directly to the instances? Thanks
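The mongo shell works through the tunnel because it connects only to the host it was given, while the driver (with `replicaSet` set) performs replica-set discovery and then tries to reach the instance hostnames it learns, which are not resolvable or routable through a single forwarded port. A common workaround when tunnelling through a bastion is to skip discovery and talk directly to the forwarded port; a rough sketch, where `MONGO_URL` is a hypothetical variable the application would read and all names are placeholders:

```
# Sketch only: directConnection=true requires a reasonably recent Node driver
# (3.6+ / Mongoose 5.13+); on older drivers, simply omitting the replicaSet
# option from the connection options has a similar effect.
export MONGO_URL='mongodb://myuser:mypassword@localhost:27018/mydb?tls=true&directConnection=true&retryWrites=false'
node app.js
```

As with the shell example, certificate hostname validation has to be relaxed (the driver's equivalent of `--sslAllowInvalidHostnames`), and the trade-off is that all traffic goes to whichever node the tunnel targets, so readPreference-based routing to secondaries no longer applies.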
3
answers
0
votes
58
views
asked 2 years ago

race condition on array $push

Hello -- At work we are trying out DocumentDB and I believe I have found a race condition involving arrays and the **$push** operator. I've made a simplified example in the form of the following shell script (note that it relies on <https://www.gnu.org/software/parallel/>):

```
mongoExec () {
  cmd="mongo <CONNECTION CONFIG REDACTED> --quiet --eval $1"
  $cmd
}

pushElem () {
  mongoExec "db.arrayTest.update({id:'test_id',array:{\$not:{\$elemMatch:{name:'$1'}}}},{\$push:{array:{name:'$1'}}})" > /dev/null
}

export -f pushElem mongoExec

mongoExec "db.createCollection('arrayTest')" > /dev/null
mongoExec "db.arrayTest.insert({id:'test_id',array:[]})" > /dev/null

parallel --jobs 9 pushElem ::: A A A B B B C C C

result=`mongoExec "db.arrayTest.find({id:'test_id'})"`
echo $result

mongoExec "db.arrayTest.drop()" > /dev/null

if [ `echo $result | grep -o name | wc -l | xargs` -ne 3 ]
then
  exit 1
fi
```

The script creates a collection **arrayTest** and inserts the following document containing an empty array:

```
{
  "id": "test_id",
  "array": []
}
```

It then attempts to conditionally push an element to the array inside that document, using the following query, in parallel 9 times -- 3 times each with **A**, **B**, and **C** in place of **$1**:

```
db.arrayTest.update(
  {id:'test_id', array:{$not:{$elemMatch:{name:'$1'}}}},
  {$push:{array:{name:'$1'}}}
)
```

At the end, it prints out the final state of the document. An example of a successful result (the order of the array entries is irrelevant):

```
{ "_id" : ObjectId("5c895f5053527cc28b122451"), "id" : "test_id", "array" : [ { "name" : "B" }, { "name" : "C" }, { "name" : "A" } ] }
```

An example of an unsuccessful result:

```
{ "_id" : ObjectId("5c89616a4470486421678fa7"), "id" : "test_id", "array" : [ { "name" : "A" }, { "name" : "B" }, { "name" : "B" }, { "name" : "C" } ] }
```

We've found that, in practice, we see successes about half of the time (in batches of a few hundred runs). Running the same script against the Docker image **mongo:latest** has never failed so far. Please let me know if you need any more info. Thanks so much for looking into this!
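For the specific pattern in the script (add an element only if no element with that `name` exists), `$addToSet` expresses the membership check and the insert as a single atomic update on one document, which sidesteps the match-then-push window the parallel updates expose. A minimal sketch reusing the script's `mongoExec` helper (behaviour worth verifying against both DocumentDB and MongoDB):

```
# Each parallel call becomes one atomic update; an identical {name: ...}
# subdocument is never added twice.
pushElemAtomic () {
  mongoExec "db.arrayTest.update({id:'test_id'},{\$addToSet:{array:{name:'$1'}}})" > /dev/null
}
export -f pushElemAtomic

parallel --jobs 9 pushElemAtomic ::: A A A B B B C C C
```

Note that $addToSet deduplicates only exact subdocument matches; if the real array entries carry extra fields, the conditional $elemMatch update (and the apparent difference in its behaviour between DocumentDB and MongoDB) remains the question to answer.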
3
answers
0
votes
0
views
asked 3 years ago