
Time series records lost during AWS Timestream batch load job


We created a Timestream batch load job to backfill a table. The input was approximately 70 CSV files, each between 7 MB and 15 MB. The job finished successfully and reported that 0 records failed to be ingested.

However, during a manual sanity check, we discovered that some records in the CSV files were not processed.
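A sanity check of this kind can be sketched roughly as follows. This is only an illustration, not the exact check we ran: the column names (`time`, `device_id`, `measure_name`) are hypothetical, and `queried` stands in for whatever keys were read back from the table by a separate query.

```python
import csv
import io

# Hypothetical column names; adjust to the CSV header and data model mapping.
KEY_COLUMNS = ("time", "device_id", "measure_name")


def record_keys(csv_text, key_columns=KEY_COLUMNS):
    """Return the set of key tuples found in one CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[c] for c in key_columns) for row in reader}


def missing_keys(csv_text, ingested_keys, key_columns=KEY_COLUMNS):
    """Keys present in the CSV but absent from the keys read back from the table."""
    return sorted(record_keys(csv_text, key_columns) - set(ingested_keys))


# Made-up sample data for illustration only.
sample_csv = (
    "time,device_id,measure_name,value\n"
    "1000,a,temp,20.5\n"
    "2000,a,temp,21.0\n"
    "3000,b,temp,19.8\n"
)
# Keys assumed to have been returned by a query against the table.
queried = {("1000", "a", "temp"), ("3000", "b", "temp")}

print(missing_keys(sample_csv, queried))  # [('2000', 'a', 'temp')]
```

Running the same comparison after each load makes it easy to confirm whether the set of missing keys is identical between runs.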

We first thought that records might take some time to become queryable even after the job reports successful completion, but that did not seem to be the case: we waited a couple of days and the records were still not there.

We tried this with another, bigger table (approximately 100 CSV files of about 150 MB each), and records were dropped in the same way.

What is interesting is that the same records get dropped every time we ingest the same CSV files via a batch load job.

Not all records are dropped though, only a relatively small number of records.

We did not experience dropped records when ingesting to magnetic memory instead of using a batch load job.

Any ideas?

Thank you.

asked 2 years ago, 400 views
2 Answers
Accepted Answer

I tried it again and waited 48 hours, and the records showed up. More documentation from AWS on this aspect would help.

answered 2 years ago

This might be an idempotency issue. Did you check whether the batch load mapping and time granularity keep the combination of time + dimensions + measure/partition unique? Since you describe the problem as deterministic, that would be my first idea.
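One way to test this idea offline is to count how many CSV rows map to the same key. The sketch below is a minimal illustration under stated assumptions: the column names are hypothetical, timestamps are assumed to be integer epoch values, and `time_divisor` is a made-up parameter that simulates a coarser time granularity in the mapping (for example, 1000 to treat millisecond values as seconds).

```python
import csv
import io
from collections import Counter

# Hypothetical dimension columns; adjust to the batch load data model mapping.
DIMENSION_COLUMNS = ("device_id",)


def colliding_keys(csv_text, time_column="time", measure_column="measure_name",
                   dimension_columns=DIMENSION_COLUMNS, time_divisor=1):
    """Return {key: count} for keys shared by more than one CSV row.

    A key is (time bucket, dimension values, measure name). time_divisor
    truncates the integer timestamp to simulate a coarser time unit.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        bucket = int(row[time_column]) // time_divisor
        dims = tuple(row[c] for c in dimension_columns)
        counts[(bucket, dims, row[measure_column])] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample: three readings with millisecond timestamps.
sample = (
    "time,device_id,measure_name,value\n"
    "1000,a,temp,20.5\n"
    "1500,a,temp,20.7\n"
    "2000,a,temp,21.0\n"
)

print(colliding_keys(sample))                     # {}
print(colliding_keys(sample, time_divisor=1000))  # {(1, ('a',), 'temp'): 2}
```

If any keys are reported at the granularity the job actually uses, those rows are candidates for being collapsed into a single record, which would produce the same missing rows on every run.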

AWS
answered 2 years ago
  • The time, dimensions, and measure (there are no custom user-defined partitions) are unique for the records that were dropped. In other words, no two records share the same time, dimensions, and measure.

    Is that what you meant?
