How can I set a custom index in an OpenSearch Serverless pipeline?

Hello, I am building an OpenSearch pipeline that reads CSV data from an S3 bucket (using SQS events) and stores it in an OpenSearch Serverless collection. The following configuration works as expected. My problem is that I want to create the index dynamically, primarily based on the filename of the parsed file. I tried passing an additional field in the SQS message, but Data Prepper rejects that field.

My architecture is currently S3 -> SQS -> OpenSearch Serverless. In the OpenSearch pipeline, the source is S3 (SQS) with a CSV processor, and the OpenSearch Serverless collection is the sink.

I was able to use {key}, which is the S3 key, but my key is formatted as dt=2023-10/filename_202310.csv. I just want 202310 as the index name. Is there a way to generate this dynamically?

Configuration (using Data Prepper 2):

version: "2"
log-pipeline:
  source:
    s3:
      codec:
        newline:
      compression: "none"
      aws:
        region: "my-region"
        sts_role_arn: "my-role"
      acknowledgments: true
      scan:
        buckets:
          - bucket:
              name: "my-bucket"
  processor:
    - csv:
        source: "message"
        delimiter: "\t"
        delete_header: false
  sink:
    - opensearch:
        hosts: [ "my-serverless-host" ]
        aws:
          sts_role_arn: "my-role"
          region: "my-region"
          serverless: true
          serverless_options:
            network_policy_name: "my-network-policy"
        index: "vector_index"  # <--- want to make this dynamic, not sure how

https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/

1 Answer

Hello, I understand that you need dynamic index naming based on the name of the file in the S3 bucket.

To do this, you would have to extract the relevant part of the file name from the S3 key, and then test the processors listed below to find a configuration that fits your use case.

Here are some relevant links:

  1. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/grok/
  2. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/mutate-string/
  3. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/split-string/
  4. Index management features: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-features-overview.html#osis-features-index-management

I would also suggest testing on your end with the processors above that are relevant to your use case.
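
Before wiring a pattern into any processor, it may help to check the extraction logic offline against a few sample keys. The following is only a minimal Python sketch of that check, not a Data Prepper configuration. The regular expression, the `index_from_key` helper, and the `unmatched` fallback name are all hypothetical, and the pattern assumes that every key ends in an underscore, six digits, and `.csv`, as in the example key from the question.

```python
import re

# Hypothetical pattern (an assumption, not taken from any Data Prepper docs):
# capture the six digits that sit between the last underscore and ".csv".
KEY_PATTERN = re.compile(r"_(?P<index_suffix>\d{6})\.csv$")


def index_from_key(key: str, default: str = "unmatched") -> str:
    """Return the index name derived from an S3 key, or `default` if no match."""
    match = KEY_PATTERN.search(key)
    return match.group("index_suffix") if match else default


if __name__ == "__main__":
    # Sample key copied from the question.
    print(index_from_key("dt=2023-10/filename_202310.csv"))  # prints 202310
```

Keys that do not match fall back to a fixed name here, so that a malformed key cannot produce an empty index name. Whether the same expression can be reused unchanged depends on the regex syntax of the processor you choose, so treat it as a starting point for your own testing.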

I hope the above information and documentation help!

Mahek_M
answered 5 months ago
