Searching S3 objects

A customer wants to search S3 objects by filename, metadata and tags. Are there any solutions that can help with this?

Thank you.

asked 3 years ago · 2923 views
2 Answers
Accepted Answer

The problem with using API calls is that the customer is charged for the LIST requests, and that cost grows with the number of objects. Details about S3-related costs are here: https://aws.amazon.com/s3/pricing/
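As a rough illustration of why the request count grows, the sketch below counts the API calls a full scan would need. It relies on ListObjectsV2 returning at most 1,000 keys per request and returning neither user metadata nor tags, so those need one HeadObject and one GetObjectTagging call per object. The function name and the object count are hypothetical, and no prices are assumed; the per-request rates are on the pricing page linked above.

```python
import math

# ListObjectsV2 returns at most 1,000 keys per request.
LIST_PAGE_SIZE = 1000


def requests_for_full_scan(object_count, check_metadata=False, check_tags=False):
    """Count the API requests needed to scan every object in a bucket once.

    Hypothetical helper for illustration only; it makes no AWS calls.
    """
    list_calls = math.ceil(object_count / LIST_PAGE_SIZE)
    per_object_calls = 0
    if check_metadata:
        per_object_calls += object_count  # one HeadObject per object
    if check_tags:
        per_object_calls += object_count  # one GetObjectTagging per object
    return {"list": list_calls, "per_object": per_object_calls}


# Example with a made-up bucket of 10 million objects.
counts = requests_for_full_scan(10_000_000, check_metadata=True, check_tags=True)
print(counts)
```

Every search that is answered by scanning the bucket repeats these requests, which is what makes a precomputed report attractive.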

The easiest way is to enable S3 Inventory and run your searches against the inventory reports. More details about this feature, including which fields are available in the report, are here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-inventory.html
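A minimal sketch of searching such a report by filename, assuming the inventory is delivered in CSV format and has already been downloaded and decompressed. The CSV data files have no header row; the column order comes from the `fileSchema` entry in the report's `manifest.json`. The function name, the sample rows and the schema string below are made up for illustration.

```python
import csv
import fnmatch
import io
from urllib.parse import unquote_plus


def search_inventory(csv_text, file_schema, pattern):
    """Return inventory rows whose file name matches a glob pattern.

    csv_text:    contents of one decompressed inventory CSV file
    file_schema: the fileSchema string from manifest.json
    pattern:     case-insensitive glob applied to the last path segment
    """
    columns = [name.strip() for name in file_schema.split(",")]
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    matches = []
    for row in reader:
        # Keys in CSV inventory reports are URL-encoded. Decoding "+" as a
        # space is an assumption here; verify it against your own report.
        key = unquote_plus(row["Key"])
        filename = key.rsplit("/", 1)[-1]
        if fnmatch.fnmatchcase(filename.lower(), pattern.lower()):
            matches.append({**row, "Key": key})
    return matches


# Made-up sample data standing in for a real inventory file.
sample_csv = (
    '"demo","reports/2021/prescription.pdf","1024"\n'
    '"demo","images/dog%2Dphoto.png","2048"\n'
)
hits = search_inventory(sample_csv, "Bucket, Key, Size", "*.pdf")
for hit in hits:
    print(hit["Key"], hit["Size"])
```

Only columns that the report actually contains can be searched this way, so check the linked page for the fields your inventory configuration can include.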

AWS
answered 3 years ago

Although it is not an AWS-native service, Mixpeek runs text extraction tools such as Tika, Tesseract and ImageAI on your S3 files, then places the extracted text in a Lucene index to make it searchable.

You integrate it as follows:

  1. Download the module: https://github.com/mixpeek/mixpeek-python

  2. Import the module and your API keys:

     from mixpeek import Mixpeek, S3
     from config import mixpeek_api_key, aws
    
  3. Instantiate the S3 class (which uses boto3 and requests):

     s3 = S3(
         aws_access_key_id=aws['aws_access_key_id'],
         aws_secret_access_key=aws['aws_secret_access_key'],
         region_name='us-east-2',
         mixpeek_api_key=mixpeek_api_key
     )
    
  4. Index one or more existing S3 files:

     # index all S3 files in bucket "demo"
     s3.index(bucket_name="demo")

     # index a single file called "prescription.pdf" in bucket "demo"
     s3.index_one(s3_file_name="prescription.pdf", bucket_name="demo")
    
  5. Search using the Mixpeek module:

     # call the Mixpeek API directly
     mix = Mixpeek(
         api_key=mixpeek_api_key
     )
     # run a full-text search
     result = mix.search(query="Heartgard")
     print(result)
    
  6. The result looks like this:

     [
         {
             "_id": "REDACTED",
             "api_key": "REDACTED",
             "highlights": [
                 {
                     "path": "document_str",
                     "score": 0.8759502172470093,
                     "texts": [
                         {
                             "type": "text",
                             "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞  "
                         },
                         {
                             "type": "hit",
                             "value": "Heartgard"
                         },
                         {
                             "type": "text",
                             "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
                         }
                     ]
                 }
             ],
             "metadata": {
                 "date_inserted": "2021-10-07 03:19:23.632000",
                 "filename": "prescription.pdf"
             },
             "score": 0.13313256204128265
         }
     ] 
    

Then you parse the results.
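One way to do that parsing, as a minimal sketch: it assumes the response is a list shaped like the example above (`metadata.filename`, plus `highlights` whose `texts` entries have a `type` of `text` or `hit`). The helper name `summarize_hits` and the `**` marker are hypothetical and not part of the Mixpeek module, and the sample below is a trimmed copy of the example result.

```python
def summarize_hits(results, marker="**"):
    """Flatten search results into one readable snippet per highlight."""
    summaries = []
    for doc in results:
        filename = doc.get("metadata", {}).get("filename")
        for highlight in doc.get("highlights", []):
            parts = []
            for piece in highlight.get("texts", []):
                value = piece.get("value", "")
                if piece.get("type") == "hit":
                    # Wrap the matched term so it stands out in the snippet.
                    value = f"{marker}{value}{marker}"
                parts.append(value)
            # Collapse newlines and repeated spaces into single spaces.
            snippet = " ".join("".join(parts).split())
            summaries.append(
                {
                    "filename": filename,
                    "score": highlight.get("score"),
                    "snippet": snippet,
                }
            )
    return summaries


# Trimmed copy of the example response shown above.
sample_result = [
    {
        "highlights": [
            {
                "path": "document_str",
                "score": 0.8759502172470093,
                "texts": [
                    {"type": "text", "value": "Weight: 41.75\n\nSex: Female\n\n"},
                    {"type": "hit", "value": "Heartgard"},
                    {"type": "text", "value": " Plus Green 26-50 lbs"},
                ],
            }
        ],
        "metadata": {"filename": "prescription.pdf"},
    }
]

summaries = summarize_hits(sample_result)
for item in summaries:
    print(item["filename"], item["snippet"])
```

Using `.get` with defaults keeps the helper from failing on documents that come back without highlights or metadata.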

answered a year ago
