Hello.
It is difficult to parse a tab-separated (TSV) file with the CSV processor. By default, the processor splits on the CSV separator, a comma (,). When I tested a tab separator (\t) with the CSV processor, the result indicated that it is not supported.
[+] CSV processor : Using the processor
https://opensearch.org/docs/latest/ingest-pipelines/processors/csv/
Test: the sample requests and the result are shown below. The options used here are described in the documentation linked above.
PUT _ingest/pipeline/csv-processor
{
  "description": "Split resource usage into individual fields",
  "processors": [
    {
      "csv": {
        "field": "resource_usage",
        "target_fields": ["cpu_usage", "memory_usage", "disk_usage"],
        "separator": "\t"
      }
    }
  ]
}
PUT testindex1/_doc/1?pipeline=csv-processor
{
  "resource_usage": "60 70 80"
}
Result: the tab (\t) is not recognized as a separator, so the whole value ends up in the first target field.
{
  "_id" : "137",
  "_score" : 1.0,
  "_source" : {
    "resource_usage" : "60 70 80",
    "cpu_usage" : "60 70 80"
  }
}
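For comparison, the split that the pipeline was expected to produce can be sketched in Python. This only illustrates the intended result, not how the CSV processor works internally. It assumes the value of resource_usage contains real tab characters, which is worth checking because tabs are easily turned into spaces when a request is copied and pasted.

```python
import json

# Assumption: the field value holds real tab characters between the numbers.
raw = "60\t70\t80"
target_fields = ["cpu_usage", "memory_usage", "disk_usage"]

# Quick sanity check that the value really contains tabs, not spaces.
has_tabs = "\t" in raw

# The split the pipeline was expected to produce.
expected = dict(zip(target_fields, raw.split("\t")))
print(has_tabs, expected)

# In a JSON request body, a tab is written with the two-character escape \t.
payload = json.dumps({"separator": "\t"})
print(payload)
```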
If you want to use the CSV processor, the best approach seems to be to convert the TSV to CSV first and then apply the processor. Alternatively, if you want to bring a TSV source into OpenSearch directly, you can use Data Prepper.
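The first option, converting TSV to CSV before ingestion, can be sketched with Python's standard csv module. This is a minimal sketch: the function name tsv_to_csv is hypothetical, and it assumes the input fits in memory and uses double quotes for any quoted fields.

```python
import csv
import io


def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to comma-separated text."""
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    out = io.StringIO(newline="")
    # The writer quotes any field that itself contains a comma.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(tsv_to_csv("60\t70\t80\n"))
```

The converted value can then be indexed through a CSV processor pipeline that uses the default comma separator.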
[+] Announcing Data Prepper 2.0.0
https://opensearch.org/blog/Announcing-Data-Prepper-2.0.0/
Data Prepper can now import CSV or TSV formatted files from Amazon Simple Storage Service (Amazon S3) sources.
This is useful for systems like Amazon CloudFront, which write their access logs as TSV files. Now you can parse these logs using Data Prepper.
[+] Data Prepper
https://opensearch.org/docs/latest/data-prepper/index/
Thank you.
I figured it out. I had to use the csv codec to detect and use the header:
codec:
  csv:
    detect_header: true
    separator: "\t"
    quote_character: "\""
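To illustrate what header detection means for the resulting records, here is a rough Python stand-in using csv.DictReader. It is not Data Prepper code and does not claim to mirror the codec's implementation. The sample data and column names are hypothetical, reused from the earlier example in this thread.

```python
import csv
import io

# Hypothetical TSV input: the first line is the header row.
tsv_text = "cpu_usage\tmemory_usage\tdisk_usage\n60\t70\t80\n"

# DictReader takes the first row as field names, similar in spirit to
# detecting the header, and splits each following row on tabs.
reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quotechar='"')
records = [dict(row) for row in reader]
print(records)
```

Each data row becomes a record keyed by the header names instead of by generic column names.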
Hello Hyunjoong, I'm using the CSV processor as part of Data Prepper 2.0. It accepts \t as a valid delimiter. My problem is that I'm unable to use the TSV file's header.
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/pipeline-config-reference.html and https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/csv/
This is how I'm using it
This is generating the output as
instead of my required format