Hello, I have automatic revision publishing working: when I drop a file into the bucket, the Lambda function publishes a new revision to my product. However, during the last run it published two revisions for a single file drop. At the time, the Lambda timeout was set to 61 seconds.
At one point, the function timed out:
2022-12-02T00:30:55.929Z 5db788f3-7660-44b4-8ae0-e45b3d061bcb Task timed out after 61.06 seconds
The CloudWatch logs then show it seemingly resuming and finishing successfully, yet my data product now has two revisions even though I dropped the file into the bucket only once.
I have since increased the Lambda timeout to 15 minutes. However, future revisions may be larger and take longer than that, so I need to make sure that dropping a file once never results in two published revisions. Thank you.
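A likely explanation for the duplicate: S3 invokes Lambda asynchronously, and by default Lambda retries a failed asynchronous invocation up to two more times, where a timeout counts as a failure. The first attempt may already have created a revision before it was cut off, and the retry then published a second one. You can set the function's asynchronous retry attempts to 0, but a more robust option is to make the handler idempotent. The following is a minimal sketch of that idea, not an AWS-provided pattern: `InMemoryClaimStore` is a hypothetical stand-in for a durable store (for example, a DynamoDB table written with a conditional put), and `publish_revision` is a hypothetical placeholder for your existing publishing code.

```python
import hashlib
from typing import Any, Callable, Dict, Set


class InMemoryClaimStore:
    """Hypothetical stand-in for a durable store such as a DynamoDB table.

    A real deployment needs a conditional write so that two concurrent
    invocations cannot both claim the same key; a set only works in-process.
    """

    def __init__(self) -> None:
        self._claimed: Set[str] = set()

    def try_claim(self, key: str) -> bool:
        """Return True the first time a key is seen, False afterwards."""
        if key in self._claimed:
            return False
        self._claimed.add(key)
        return True


def dedupe_key(record: Dict[str, Any]) -> str:
    """Build a key identifying one specific upload of one S3 object."""
    s3 = record["s3"]
    bucket = s3["bucket"]["name"]
    key = s3["object"]["key"]
    # The sequencer differs between PUTs of the same key, so re-uploading a
    # file still publishes, while a retried delivery of the same event does not.
    sequencer = s3["object"].get("sequencer", "")
    raw = f"{bucket}/{key}/{sequencer}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def handle_event(
    event: Dict[str, Any],
    store: InMemoryClaimStore,
    publish_revision: Callable[[str, str], None],  # hypothetical placeholder
) -> int:
    """Publish at most once per upload; return how many records were published."""
    published = 0
    for record in event.get("Records", []):
        if not store.try_claim(dedupe_key(record)):
            # Already handled by an earlier (possibly timed-out) invocation.
            continue
        publish_revision(record["s3"]["bucket"]["name"], record["s3"]["object"]["key"])
        published += 1
    return published


if __name__ == "__main__":
    calls = []
    event = {"Records": [{"s3": {
        "bucket": {"name": "my-bucket"},
        "object": {"key": "data/file.csv", "sequencer": "0055AED6DCD90281E5"},
    }}]}
    store = InMemoryClaimStore()
    handle_event(event, store, lambda b, k: calls.append((b, k)))
    handle_event(event, store, lambda b, k: calls.append((b, k)))  # simulated retry
    print(len(calls))  # 1
```

Claiming before publishing makes the handler at-most-once: if publishing fails after the claim, a retry will skip it. A fuller version would record a status (for example "in progress" vs. "done") alongside the key so that failed attempts can be retried deliberately.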
As you have observed, the time it takes to publish a new revision depends on the size of the files it includes. One option to explore is the publisher coordinator (https://github.com/awslabs/aws-data-exchange-publisher-coordinator), which creates a new revision whenever a new manifest file is created. The manifest file lets you control exactly which files go into the revision.
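The idea behind a manifest-driven trigger is that data files can land in the bucket without starting anything; only the manifest does, and it lists which objects belong in the revision. Below is a minimal sketch of that filtering step. The `.manifest.json` suffix and the `{"bucket": ..., "keys": [...]}` layout are made up for this example and are not the publisher coordinator's actual format; check the repository for the real schema.

```python
import json
from typing import Any, Dict, List
from urllib.parse import unquote_plus

# Hypothetical naming convention for this sketch only.
MANIFEST_SUFFIX = ".manifest.json"


def manifest_keys(event: Dict[str, Any]) -> List[str]:
    """Return the object keys in an S3 event that look like manifests."""
    keys = []
    for record in event.get("Records", []):
        # Object keys in S3 event notifications are URL-encoded (spaces as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(MANIFEST_SUFFIX):
            keys.append(key)
    return keys


def assets_from_manifest(body: str) -> List[Dict[str, str]]:
    """Parse the hypothetical manifest layout into bucket/key pairs."""
    manifest = json.loads(body)
    return [{"Bucket": manifest["bucket"], "Key": k} for k in manifest["keys"]]
```

With this split, uploading `data/file.csv` on its own does nothing, and uploading a manifest that lists it selects exactly that file for the next revision.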
Alternatively, you can explore AWS Data Exchange for Amazon S3, which grants subscribers access to the objects in your bucket directly, without publishing new revisions. Instead, you specify the bucket and, optionally, the keys or prefixes to share in the data set. You can find out more at https://aws.amazon.com/data-exchange/why-aws-data-exchange/s3/ and https://docs.aws.amazon.com/data-exchange/latest/userguide/publishing-products.html#publish-s3-data-access-product