Searching S3 objects


A customer wants to search S3 objects by filename, metadata, and tags. Are there any solutions that can help with this?

Thank you.

asked 3 years ago, 2886 views
2 Answers
Accepted Answer

The problem with using API calls is that the customer is charged for every list request, and the number of requests grows with the number of objects in the bucket. Details about S3-related costs are here: https://aws.amazon.com/s3/pricing/
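To make the cost concern concrete, here is a rough sketch of how the number of list requests scales. It relies on the fact that a single ListObjectsV2 call returns at most 1,000 keys. The function name and the `price_per_1000_requests` parameter are hypothetical; no price is hard-coded, so take the current figure from the pricing page above.

```python
import math

# A single ListObjectsV2 call returns at most 1,000 keys.
MAX_KEYS_PER_LIST_REQUEST = 1000


def estimate_list_cost(object_count, searches, price_per_1000_requests):
    """Estimate the requests and cost of searching a bucket by listing it.

    price_per_1000_requests is a placeholder supplied by the caller;
    look up the real value on the S3 pricing page.
    """
    # Every search has to page through the whole bucket (or prefix).
    requests_per_search = max(1, math.ceil(object_count / MAX_KEYS_PER_LIST_REQUEST))
    total_requests = requests_per_search * searches
    cost = total_requests / 1000 * price_per_1000_requests
    return total_requests, cost
```

For example, a bucket with 2,500,000 objects needs 2,500 list requests per full scan, so ten searches issue 25,000 requests.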

The easiest way is to enable S3 Inventory on the bucket and run your searches against the inventory report instead of the live bucket. More details about this feature, including which fields are available in the report, are here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/configure-inventory.html
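As an illustration, searching a downloaded inventory report by filename could look roughly like the sketch below. It assumes a CSV-format report that has already been decompressed into a string, and a column list taken from the `fileSchema` entry of the report's manifest.json. The function name and the sample schema are hypothetical. Keys in CSV reports are URL-encoded, so the sketch decodes them with `unquote`; verify how your own reports encode spaces before relying on this.

```python
import csv
import io
from urllib.parse import unquote


def search_inventory(csv_text, file_schema, pattern):
    """Return the object keys whose filename contains `pattern`.

    csv_text:    decompressed contents of one CSV inventory data file
    file_schema: comma-separated column names (assumed to come from
                 the report's manifest.json)
    """
    columns = [name.strip() for name in file_schema.split(",")]
    # Inventory CSV files have no header row, so supply the column names.
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=columns)
    matches = []
    for row in reader:
        # Assumption: keys are percent-encoded in the CSV report.
        key = unquote(row["Key"])
        filename = key.rsplit("/", 1)[-1]
        if pattern.lower() in filename.lower():
            matches.append(key)
    return matches


sample_report = (
    '"demo","reports/2021/prescription.pdf","1024","2021-10-07T03:19:23.000Z"\n'
    '"demo","images/dog%20photo.png","2048","2021-10-08T00:00:00.000Z"\n'
)
found = search_inventory(sample_report, "Bucket, Key, Size, LastModifiedDate", "prescription")
print(found)
```

Because the search runs against a local file, it issues no list requests against the bucket.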

AWS
answered 3 years ago

Although it is not an AWS-native service, Mixpeek runs text extraction tools such as Tika, Tesseract, and ImageAI on your S3 files, then places the extracted text in a Lucene index to make the files searchable.

You integrate it as follows:

  1. Download the module: https://github.com/mixpeek/mixpeek-python

  2. Import the module and your API keys:

     from mixpeek import Mixpeek, S3
     from config import mixpeek_api_key, aws
    
  3. Instantiate the S3 class (which uses boto3 and requests):

     s3 = S3(
         aws_access_key_id=aws['aws_access_key_id'],
         aws_secret_access_key=aws['aws_secret_access_key'],
         region_name='us-east-2',
         mixpeek_api_key=mixpeek_api_key
     )
    
  4. Upload one or more existing S3 files:

     # upload all S3 files in bucket "demo"
     s3.index(bucket_name="demo")

     # upload a single file called "prescription.pdf" in bucket "demo"
     s3.index_one(s3_file_name="prescription.pdf", bucket_name="demo")
    
  5. Now simply search using the Mixpeek module:

     # call the Mixpeek API directly
     mix = Mixpeek(
         api_key=mixpeek_api_key
     )
     # search the indexed files
     result = mix.search(query="Heartgard")
     print(result)
    
  6. The result looks something like this:

     [
         {
             "_id": "REDACTED",
             "api_key": "REDACTED",
             "highlights": [
                 {
                     "path": "document_str",
                     "score": 0.8759502172470093,
                     "texts": [
                         {
                             "type": "text",
                             "value": "Vetco Prescription\nVetcoClinics.com\n\nCustomer:\n\nAddress: Canine\n\nPhone: Australian Shepherd\n\nDate of Service: 2 Years 8 Months\n\nPrescription\nExpiration Date:\n\nWeight: 41.75\n\nSex: Female\n\n℞  "
                         },
                         {
                             "type": "hit",
                             "value": "Heartgard"
                         },
                         {
                             "type": "text",
                             "value": " Plus Green 26-50 lbs (Ivermectin 135 mcg/Pyrantel 114 mg)\n\nInstructions: Give one chewable tablet by mouth once monthly for protection against heartworms, and the treatment and\ncontrol of roundworms, and hookworms. "
                         }
                     ]
                 }
             ],
             "metadata": {
                 "date_inserted": "2021-10-07 03:19:23.632000",
                 "filename": "prescription.pdf"
             },
             "score": 0.13313256204128265
         }
     ] 
    

You can then parse the results.
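A minimal sketch of that parsing step, assuming the response always has the shape of the sample in step 6 (a list of documents, each with `metadata.filename` and a `highlights` list of `texts` pieces typed `text` or `hit`). The `summarize_hits` helper is hypothetical and not part of the Mixpeek module.

```python
def summarize_hits(results):
    """Flatten search results into one short snippet per highlight."""
    summaries = []
    for doc in results:
        filename = doc.get("metadata", {}).get("filename")
        for highlight in doc.get("highlights", []):
            parts = []
            for piece in highlight.get("texts", []):
                value = piece.get("value", "")
                # Mark the matched term with brackets so it stands out.
                parts.append(f"[{value}]" if piece.get("type") == "hit" else value)
            # Collapse newlines and repeated spaces into single spaces.
            snippet = " ".join("".join(parts).split())
            summaries.append(
                {"filename": filename, "score": highlight.get("score"), "snippet": snippet}
            )
    return summaries


# Trimmed-down stand-in for the response shown in step 6.
example_result = [
    {
        "highlights": [
            {
                "path": "document_str",
                "score": 0.87,
                "texts": [
                    {"type": "text", "value": "Prescription\n\u211e  "},
                    {"type": "hit", "value": "Heartgard"},
                    {"type": "text", "value": " Plus Green 26-50 lbs"},
                ],
            }
        ],
        "metadata": {"filename": "prescription.pdf"},
    }
]

for hit in summarize_hits(example_result):
    print(hit["filename"], hit["snippet"])
```

Using `.get` with defaults keeps the helper from failing if a document comes back without highlights or metadata.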

answered a year ago
