Questions tagged with Amazon DocumentDB


DocumentDB mongorestore index fails to create with error 303: Field currently not supported

I have a DocDB instance running the 4.0 engine version. As stated here: [https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-index](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html#mongo-apis-index), 2dsphere indexes are supported in this engine version, and indeed I have been able to create them. The problem comes when restoring such an index. When the index is created, default option fields are added to its metadata; among them are the `v` field and the `2dsphereIndexVersion` field. These are the fields that are not supported when trying to restore an index definition from a mongodump containing a 2dsphere index.

Steps to reproduce. Create the index:

```
db.collection.createIndex({ loc: "2dsphere" })
```

Default option fields are added to the index:

```
{
    "v" : 4,
    "name" : "loc_2dsphere",
    "ns" : "db.field",
    "2dsphereIndexVersion" : 1
}
```

A mongodump will therefore contain this index metadata, and running a mongorestore of that dump produces an error like this:

```
{
    "ok" : 0.0,
    "code" : 303.0,
    "errmsg" : "Field '2dsphereIndexVersion' is currently not supported",
    "operationTime" : Timestamp(1657665107, 1)
}
```

or

```
{
    "ok" : 0.0,
    "code" : 303.0,
    "errmsg" : "Field 'v' is currently not supported",
    "operationTime" : Timestamp(1657665203, 1)
}
```

mongorestore can be run with the `--noIndexRestore` option, but is there a way to restore the indexes?
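One workaround worth trying, sketched below under assumptions rather than verified against every dump layout: mongodump writes each collection's index definitions into a `<collection>.metadata.json` file, so stripping the unsupported option fields from those files before running mongorestore should let the indexes be recreated. The field list is inferred from the errors above; inspect one metadata file from your own dump first.

```
// strip-index-options.ts — sketch of a pre-restore cleanup pass over a
// standard mongodump layout: <dump>/<db>/<collection>.metadata.json,
// each file containing an "indexes" array.
import { readdirSync, readFileSync, writeFileSync } from "fs";
import { join } from "path";

// Fields rejected in the errors above; extend if mongorestore reports more.
const UNSUPPORTED = ["v", "2dsphereIndexVersion"];

function cleanDir(dir: string): void {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      cleanDir(path); // recurse into per-database folders
    } else if (entry.name.endsWith(".metadata.json")) {
      const meta = JSON.parse(readFileSync(path, "utf8"));
      meta.indexes = (meta.indexes ?? []).map((idx: Record<string, unknown>) => {
        for (const field of UNSUPPORTED) delete idx[field];
        return idx;
      });
      writeFileSync(path, JSON.stringify(meta));
      console.log(`cleaned ${path}`);
    }
  }
}

cleanDir(process.argv[2] ?? "dump");
```

The alternative is to restore with `--noIndexRestore` and recreate each index afterward with `createIndex`, which is effectively what the cleaned metadata automates.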
1 answer · 0 votes · 36 views · puchito · asked 4 months ago

AWS 2dsphere limitation

Hi all, I am using DocumentDB with MongoDB compatibility on AWS, and our documents include geolocation data. We have read the AWS documentation on MongoDB API support [here](https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html), but although it says 2dsphere indexes are supported, we receive an error when creating the index:

*Command createIndexes failed: Index type not supported: 2dsphere*

The C# code that should generate the index is below:

```
var prefixIndexName = nameof(Account.Address) + "." + nameof(Account.Address.Geolocation);
if (!accountCollection.ExistIndex(prefixIndexName + "_2dsphere"))
{
    Console.WriteLine("Seeding Geolocation Geo2DSphere Index ...");
    var geoLocation = new StringFieldDefinition<Account>(prefixIndexName);
    var indexDefinition = new IndexKeysDefinitionBuilder<Account>().Geo2DSphere(geoLocation);
    var indexModel = new CreateIndexModel<Account>(indexDefinition, new CreateIndexOptions { Background = false });
    accountCollection.CreateIndex(indexModel);
}
```

The field we are trying to index is "Address", and it looks like this:

```
"Address": {
    "CountryId": number,
    "PostCode": string,
    "AddressLine1": string,
    "AddressLine2": string,
    "City": string,
    "State": string,
    "Geolocation": {
        "type": "Point",
        "coordinates": decimal[] // e.g. [xx.xxxxxxx, xx.xxxxxxx]
    }
}
```

The code works on my local MongoDB installation, so I believe I am missing something to make it run on AWS. Any help you could provide is valuable; thanks in advance for your time!
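One thing worth checking (an assumption based on the support matrix linked above, not something confirmed in the post): 2dsphere indexes are listed as supported only from the DocumentDB 4.0 engine, so a cluster still running 3.6 would reject them with exactly this error. A quick version check, sketched with the Node.js driver (`DOCDB_URI` is a placeholder for your cluster's connection string):

```
// check-engine-version.ts — minimal sketch; supply your own connection
// string with the appropriate TLS options.
import { MongoClient } from "mongodb";

async function main(): Promise<void> {
  const client = await MongoClient.connect(process.env.DOCDB_URI!);
  try {
    // buildInfo reports the MongoDB-compatible version (e.g. 3.6.0 vs 4.0.0).
    const info = await client.db("admin").command({ buildInfo: 1 });
    console.log(`engine reports version ${info.version}`);
  } finally {
    await client.close();
  }
}

main().catch(console.error);
```

If the cluster reports 3.6.0, upgrading the engine version (or migrating to a 4.0 cluster) would be the first thing to try.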
0 answers · 2 votes · 39 views · asked 5 months ago

Quicksight Athena - analysis error: "HIVE_UNSUPPORTED_FORMAT: Unable to create input format"

Hello. I'm trying to create an analysis from my DocumentDB instance, using the AWS services Glue, Athena and QuickSight. In Glue I have created a connection to the DocumentDB instance and a crawler for auto-creating tables. This works as expected, and the tables are created and displayed in Glue. (Even though I specify that the crawler should not give the tables any prefix, it does add the database name as a prefix.)

When I look at the Glue catalog in Athena (the default AwsDataCatalog), I do see the database that was created in Glue, but it does not show any tables. If I click on edit, it takes me to the correct database in Glue, which displays the tables created by the previously mentioned crawler. So my first question is: **Why don't the tables show up in Athena?** This is blocking me from running queries in Athena.

When I go to QuickSight and select the default Athena Glue catalog ("AwsDataCatalog"), I DO get the tables created by the crawler, and I can create datasets. However, when I try to create an analysis using these datasets, I get the error:

```
sourceErrorCode: 100071
sourceErrorMessage: [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client.
HIVE_UNSUPPORTED_FORMAT: Unable to create input format
```

I have looked around a bit, and some people said this error is due to the table properties **"Input format"** and **"Output format"** being empty (which they indeed are for me). I have tried entering almost all the different formats on the table, but I keep getting the QuickSight error above, only now with the input format at the end:

```
HIVE_UNSUPPORTED_FORMAT: Unable to create input format json
```

**So my second question is:** I do not see anywhere in the crawler where I can specify an input or output format. Does it have to be done manually? And what are the correct input and output formats for my setup?
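For the second question, the commonly cited fix is to fill in the table's input/output formats and SerDe by hand, since the crawler exposes no setting for them. Below is a sketch of doing that through the Glue API; the database and table names are placeholders, and the Hive class names are the standard ones for JSON text data, so whether they suit a table crawled from DocumentDB is an assumption. (On the first question: Athena natively scans only data stored in S3, plus federated sources via connectors, which may be why tables whose storage points at DocumentDB do not appear as queryable.)

```
// set-table-format.ts — sketch: fill in the empty Input/Output format
// properties on a Glue table. DATABASE and TABLE are placeholders.
import { GlueClient, GetTableCommand, UpdateTableCommand } from "@aws-sdk/client-glue";

const glue = new GlueClient({});
const DATABASE = "my_glue_database";
const TABLE = "my_crawled_table";

async function main(): Promise<void> {
  // Fetch the current definition first: UpdateTable replaces it wholesale.
  const { Table } = await glue.send(new GetTableCommand({ DatabaseName: DATABASE, Name: TABLE }));
  if (!Table?.StorageDescriptor) throw new Error("table has no storage descriptor");

  await glue.send(new UpdateTableCommand({
    DatabaseName: DATABASE,
    TableInput: {
      Name: TABLE,
      // Carry over the existing definition so nothing else is dropped.
      TableType: Table.TableType,
      Parameters: Table.Parameters,
      PartitionKeys: Table.PartitionKeys,
      StorageDescriptor: {
        ...Table.StorageDescriptor,
        // Standard Hive classes for newline-delimited JSON in text files:
        InputFormat: "org.apache.hadoop.mapred.TextInputFormat",
        OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        SerdeInfo: { SerializationLibrary: "org.openx.data.jsonserde.JsonSerDe" },
      },
    },
  }));
  console.log(`updated ${DATABASE}.${TABLE}`);
}

main().catch(console.error);
```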
0 answers · 0 votes · 98 views · asked 6 months ago

Occasionally getting "MongoServerSelectionError: Server selection timed out..." errors

Hi,

We have a Lambda application that uses DocumentDB as the database layer. The Lambdas are set in the same VPC as the DocumentDB cluster, and we're able to connect and do all query (CRUD) operations as normal. The cluster is a simple cluster with one db.t4g.medium instance.

One of the Lambdas is triggered by an SNS topic and gets executed ~1M times over a 24-hour period. There is a database query involved in each execution, and the vast majority of them go fine. The MongoClient is created outside of the handler in a separate file, as detailed here: https://www.mongodb.com/docs/atlas/manage-connections-aws-lambda/, so that "warm" Lambda executions re-use the same connection. Our Lambdas are executed as async handlers, not using a callback.

The MongoClient itself is created in its own file like so:

```
const uri = `mongodb://${process.env.DB_USER}:${process.env.DB_PASSWORD}@${process.env.DB_ENDPOINT}:${process.env.DB_PORT}/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false`;

const client = new MongoClient(uri, {
    tlsCAFile: 'certs/rds-combined-ca-bundle.pem'
});

export const mongoClient = client.connect()
```

A sample handler would be something like this (TypeScript):

```
import { mongoClient } from "./mongo.client";

const DB_NAME = 'MyDB';

export const snsHandler = async (event: SNSEvent): Promise<void> => {
    const notif = JSON.parse(event.Records[0].Sns.Message);

    const item = await mongoClient
        .then(client => client.db(DB_NAME).collection(notif.collection).findOne({ _id: notif.id }))
        .catch(err => {
            console.error(`Couldn't find item with id ${notif.id} from collection ${notif.collection}`, err);
            return null;
        });

    // do something with item
}
```

Every so often (~100 times a day), we get specific errors along the lines of:

```
MongoServerSelectionError: Server selection timed out after 30000 ms
    at Timeout._onTimeout (/var/task/src/settlement.js:5:157446)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7)
```

or

```
[MongoClient] Error when connecting to mongo xg [MongoServerSelectionError]: Server selection timed out after 30000 ms
    at Timeout._onTimeout (/var/task/src/settlement.js:5:157446)
    at listOnTimeout (internal/timers.js:557:17)
    at processTimers (internal/timers.js:500:7) {
  reason: To {
    type: 'ReplicaSetNoPrimary',
    servers: Map(1) { '[REDACTED].docdb.amazonaws.com:[REDACTED]' => [ry] },
    stale: false,
    compatible: true,
    heartbeatFrequencyMS: 10000,
    localThresholdMS: 15,
    setName: 'rs0',
    logicalSessionTimeoutMinutes: undefined
  },
  code: undefined,
  [Symbol(errorLabels)]: Set(0) {}
}
```

or

```
Lambda exited with error: exit status 128 Runtime.ExitError
```

In the Monitoring tab of the DocumentDB instance, the CPU doesn't go higher than 10%, and database connections peak at ~170 (the connection limit on the db.t4g.medium is 500, unless I'm mistaken), with an average of around 30-40. For the Lambda itself, the max concurrent executions peak at ~100. The errors aren't correlated with the peaks; they can happen at any time of day, throughout the day.

Can anyone provide any insight as to why the connection might be timing out from time to time, please? The default parameters of the MongoClient should keep the connection alive as long as the Lambda is still active, and we don't seem to be close to the max connection limit. I'm assuming the way we have set it up is wrong, but I'm not sure how to go about fixing it.

Thanks
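One pattern that often helps in this setup (a sketch; the option values are assumptions to tune, not recommendations): caching `client.connect()` at module scope means a single failed connect stays cached as a rejected promise for the container's lifetime, and a client frozen alongside an idle Lambda can wake with a stale view of the replica set (hence `ReplicaSetNoPrimary`). Resetting the cached promise on failure and lowering `serverSelectionTimeoutMS` lets warm invocations fail fast and retry instead of blocking for 30 s:

```
// mongo.client.ts — defensive variant (sketch); option values illustrative.
import { MongoClient } from "mongodb";

const uri = `mongodb://${process.env.DB_USER}:${process.env.DB_PASSWORD}@${process.env.DB_ENDPOINT}:${process.env.DB_PORT}/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false`;

let clientPromise: Promise<MongoClient> | null = null;

export function getClient(): Promise<MongoClient> {
  // Lazily connect; reuse the cached promise on warm invocations.
  if (!clientPromise) {
    clientPromise = new MongoClient(uri, {
      tlsCAFile: "certs/rds-combined-ca-bundle.pem",
      serverSelectionTimeoutMS: 5000, // fail fast; the default is 30000 ms
      maxPoolSize: 10,                // bound connections per warm container
    })
      .connect()
      .catch((err) => {
        clientPromise = null; // don't cache a rejected promise forever
        throw err;
      });
  }
  return clientPromise;
}
```

In the handler, `await getClient()` would replace the imported `mongoClient` promise, and a single retry around the query after a `MongoServerSelectionError` would additionally ride out brief failover windows.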
1 answer · 0 votes · 241 views · asked 6 months ago

Advice for best database/data storage for historical data

Hi, I'm doing some research to find the best place to centralize the large volume of data logs generated by my application, considering pricing, performance and scalability. Today all my application data, including logs, is stored in an Oracle database, but I'm thinking of moving all the log-related data out of it to reduce its size and stop worrying about storage, performance, etc.: just put everything in an "infinite" store apart from my actual database, using CDC or a regular batch process.

**Below are some needs:**

- Only inserts are necessary (no updates or deletes)
- Customers will need access to this historical data
- Well-defined pattern of access (one or two indexes at maximum)
- Latency of a few seconds is OK
- Avoid infrastructure, DBA and performance bottlenecks long term
- Infinite retention period (meaning I don't want to worry about performance issues or storage size in the long term, but something that can handle a few terabytes of data)

**Use case example:**

Historical sales orders by item (id_item | id_customer | qty_sold | date_inserted ...), approx. 50 million records per day, where I would need to see the historical data by item and by customer, for example (two dimensions).

I've done some research on the options below:

- **S3 + Athena**: Put everything on S3, no worries about infrastructure or performance issues. However, as I need to query by item and customer, it would probably be necessary to break files up by item or customer, generating millions of partitions to avoid the high cost of scanning every file, etc.
- **PostgreSQL**: Not sure whether it could become a performance bottleneck once tables get too big, even with partitioning strategies.
- **DynamoDB**: Not sure if it's a good alternative for historical data pricing-wise, given that seconds of latency are OK.
- **MongoDB / DocumentDB**: Not very familiar with it (I'd prefer an SQL-type language), but I know it has good scalability.
- **Cassandra**: Don't know much about it.
- **Time-series DBs such as InfluxDB, Timestream, etc.**: Don't know much about them, but they seem appropriate for time series.

Which option would you choose? Sorry in advance if I'm saying something wrong or impossible :) Thank you!
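On the S3 + Athena option above: a common alternative to per-item or per-customer partitions is to partition only by ingest date, which keeps the partition count bounded, and to rely on a columnar file format so the engine prunes the item/customer columns inside each file. A minimal sketch of that key layout (bucket name and record shape are hypothetical):

```
// put-batch.ts — sketch of a date-partitioned S3 layout for log batches.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

interface SaleRecord {
  id_item: number;
  id_customer: number;
  qty_sold: number;
  date_inserted: string;
}

const s3 = new S3Client({});

export async function putBatch(records: SaleRecord[], batchId: string): Promise<void> {
  const d = new Date();
  // One partition per day keeps the partition count bounded (~365/year),
  // unlike partitioning by item or customer (millions of values).
  const key = `sales/year=${d.getUTCFullYear()}/month=${d.getUTCMonth() + 1}/day=${d.getUTCDate()}/${batchId}.ndjson`;
  await s3.send(new PutObjectCommand({
    Bucket: "my-sales-logs", // hypothetical bucket
    Key: key,
    Body: records.map((r) => JSON.stringify(r)).join("\n"),
  }));
}
```

NDJSON keeps the sketch self-contained; in practice a conversion step to Parquet (for example a scheduled Glue job) is what keeps Athena scan costs manageable at ~50 million records per day.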
1 answer · 0 votes · 35 views · asked 8 months ago