How can I set a custom index in an OpenSearch Serverless pipeline?


Hello, I am building an OpenSearch pipeline that reads CSV data from an S3 bucket (using SQS events) and stores it in an OpenSearch Serverless collection. I am using the configuration below, and it works as expected. My problem is that I want to create the index dynamically, primarily based on the filename of the parsed file. I tried passing an additional field in the SQS message, but data-prepper rejects that field.

My architecture is currently S3 -> SQS -> OpenSearch Serverless. In the OpenSearch pipeline, the source is S3 (SQS) with a CSV processor, and the OpenSearch collection is the sink.

I was able to use {key}, which is the S3 key, but my key is formatted as dt=2023-10/filename_202310.csv. I just want 202310 as the index. Is there a way to generate this dynamically?

Configuration (using data-prepper 2)

version: "2"
log-pipeline:
  source:
    s3:
      codec:
        newline:
      compression: "none"
      aws:
        region: "my-region"
        sts_role_arn: "my-role"
      acknowledgments: true
      scan:
        buckets:
          - bucket:
              name: "my-bucket"
  processor:
    - csv:
        source: "message"
        delimiter: "\t"
        delete_header: false
  sink:
    - opensearch:
        hosts: [ "my-serverless-host" ]
        aws:
          sts_role_arn: "my-role"
          region: "my-region"
          serverless: true
          serverless_options:
             network_policy_name: "my-network-policy"
        index: "vector_index"  # want to make this dynamic, not sure how

https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/

1 Answer

Hello, I understand that you need dynamic index names based on the name of the file in the S3 bucket.

To do this, you will need to extract the relevant part of the file name from the S3 key. You can test the processor configurations listed below to implement your use case.

Here are some relevant links:

  1. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/grok/
  2. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/mutate-string/
  3. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/split-string/
  4. Index management features: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-features-overview.html#osis-features-index-management

I would also suggest testing the processors above on your end, since they are the ones relevant to your use case.
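
Before wiring a pattern into one of those processors, it may help to check the extraction logic on its own. The following is only a minimal Python sketch, not pipeline configuration: it assumes every key ends in an underscore, six digits, and ".csv" (as in dt=2023-10/filename_202310.csv), and the helper name index_suffix_from_key is hypothetical. The regular expression would still need to be ported to, and verified in, whichever processor you choose.

```python
import re
from typing import Optional

# Assumption: keys look like "dt=2023-10/filename_202310.csv", i.e. they end
# with "_", six digits (YYYYMM), and ".csv". Adjust the pattern if yours differ.
KEY_PATTERN = re.compile(r"_(?P<index_suffix>\d{6})\.csv$")


def index_suffix_from_key(key: str) -> Optional[str]:
    """Hypothetical helper: return the six-digit suffix of an S3 key, or None."""
    match = KEY_PATTERN.search(key)
    return match.group("index_suffix") if match else None


if __name__ == "__main__":
    # Key taken from the question; the second key shows the no-match case.
    print(index_suffix_from_key("dt=2023-10/filename_202310.csv"))
    print(index_suffix_from_key("dt=2023-10/readme.txt"))
```

Returning None for keys that do not match makes the failure case explicit, which is worth deciding up front: in the pipeline you would want a fallback index for such objects rather than an empty index name.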

Hope the above information and documentation helps!

Mahek_M
answered 4 months ago
