How can I set a custom index in an OpenSearch Serverless pipeline?


Hello, I am building an OpenSearch pipeline that reads CSV data from an S3 bucket (using SQS events) and stores it in an OpenSearch Serverless collection. I am using the following configuration, and it works as expected. My problem is that I want to create the index dynamically, primarily based on the filename of the parsed file. I tried passing an additional field in the SQS message, but Data Prepper rejects that field.

My architecture is currently S3 -> SQS -> OpenSearch Serverless. In the OpenSearch pipeline, the source is S3 (SQS) with a CSV processor, and the OpenSearch Serverless collection is the sink.

I was able to use {key}, which is the S3 key, but my key is formatted as dt=2023-10/filename_202310.csv. I just want 202310 as the index. Is there a way to generate this dynamically?

Configuration (using data-prepper 2)

version: "2"
log-pipeline:
  source:
    s3:
      codec:
        newline:
      compression: "none"
      aws:
        region: "my-region"
        sts_role_arn: "my-role"
      acknowledgments: true
      scan:
        buckets:
          - bucket:
              name: "my-bucket"
  processor:
    - csv:
        source: "message"
        delimiter: "\t"
        delete_header: false
  sink:
    - opensearch:
        hosts: [ "my-serverless-host" ]
        aws:
          sts_role_arn: "my-role"
          region: "my-region"
          serverless: true
          serverless_options:
            network_policy_name: "my-network-policy"
        index: "vector_index"  # want to make this dynamic, not sure how

https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/sinks/opensearch/

1 Answer

Hello, I understand that you need dynamic index naming based on the name of the file in the S3 bucket.

To implement this, you would need to extract the name from the S3 key, and you can test the processor configurations linked below for your use case.

Here are some relevant links:

  1. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/grok/
  2. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/mutate-string/
  3. https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/split-string/
  4. Index management feature: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/osis-features-overview.html#osis-features-index-management

I would also suggest testing on your side with the processors above that are relevant to your use case.
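As a starting point for that testing, here is a minimal, untested sketch of how the grok processor and a field-based index name might fit together. It assumes the S3 source adds the object key to each event under s3/key (its default metadata root) and that the sink accepts a ${/field} reference in index; the field name index_suffix is made up for this example, and the pattern should be verified against your real keys.

```yaml
version: "2"
log-pipeline:
  # source: unchanged from the configuration in the question
  processor:
    - csv:
        source: "message"
        delimiter: "\t"
        delete_header: false
    # Assumption: each event carries the S3 object key at s3/key.
    - grok:
        match:
          # For dt=2023-10/filename_202310.csv this is intended to capture
          # 202310 into index_suffix (a hypothetical field name).
          s3/key: [ '%{GREEDYDATA}_%{INT:index_suffix}\.csv' ]
  sink:
    - opensearch:
        # hosts and aws settings: unchanged from the question
        # Assumption: the index name is formatted from the event field.
        index: "${/index_suffix}"
```

Events whose key does not match the pattern would not get an index_suffix field, so it is worth checking how the sink behaves for them before relying on this.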

I hope the above information and documentation help!

Mahek_M
answered 5 months ago
