Hello,
Approximately 10 days ago my Elasticsearch cluster stopped accepting new documents for some indices.
I am writing log data to Elasticsearch from an ECS task using https://github.com/internetitem/logback-elasticsearch-appender, which formats log messages for submission to the Elasticsearch /_bulk REST API. I am using the AWS Java SDK version 1.11.632 to access the default credential provider chain and sign requests. The task is assigned a specific task role in the task definition, and this role has been added explicitly to my ES domain's access policy.
The ES cluster is managed in a "shared" AWS account while the ECS task is running in a separate "dev" account. The domain name is "logs".
The access policy looks like this (account numbers redacted):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::DEV_ACCT_NUM:role/ecsTaskExecutionRole",
          "arn:aws:iam::DEV_ACCT_NUM:role/MyTaskRole"
          //Plus some more roles for other tasks
        ]
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:ca-central-1:SHARED_ACCT_NUM:domain/logs/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::SHARED_ACCT_NUM:role/Cognito_Kibana_IdentitiesAuth_Role"
      },
      "Action": "*",
      "Resource": "arn:aws:es:ca-central-1:SHARED_ACCT_NUM:*"
    }
  ]
}
Logs are written to an index named {environment}-{date}, e.g. "dev-2020-04-23". I expect that if an index with that name doesn't exist, it will be created automatically when I first write to it. This worked until about 10 days ago.
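For illustration only, the naming scheme amounts to something like the following Java sketch. LogIndexName and forDate are hypothetical names used here for clarity; the appender itself builds the index name from its own configuration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of logback-elasticsearch-appender.
class LogIndexName {
    // Date portion of the index name, e.g. 2020-04-23.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Builds "{environment}-{date}", e.g. "dev-2020-04-23".
    static String forDate(String environment, LocalDate date) {
        return environment + "-" + DAY.format(date);
    }
}
```

A new index name is therefore produced once per environment per day, so every day depends on index creation succeeding.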
THE PROBLEM:
When I write to ES, I sometimes receive a 403 response indicating that the request signature is invalid. Nothing has changed about the way I retrieve credentials or sign requests, nor has my access policy changed.
More often, the response is 200 OK, but the logs do not appear in Kibana and the index does not appear in the AWS management console.
I can still write to existing indices, which leads me to believe the problem is in creating new indices and not in writing documents. I was not able to find any documented limit on the number of indices.
This behaviour was first observed when the cluster was running ES version 7.1. Upgrading to 7.4 did not help.
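A note on the 200 OK responses: the /_bulk API reports failures per item inside the response body, so the request as a whole can return 200 even when individual documents were rejected. Below is a minimal sketch of how the body could be checked, assuming the raw response is available as a string. BulkResponseCheck and its methods are hypothetical names, and the regex matching is a crude stand-in for a proper JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting a raw _bulk response body.
class BulkResponseCheck {
    // Top-level flag that is true when at least one item failed.
    private static final Pattern ERRORS_FLAG =
            Pattern.compile("\"errors\"\\s*:\\s*(true|false)");
    // Per-item HTTP-style status codes, e.g. "status":201 or "status":403.
    private static final Pattern ITEM_STATUS =
            Pattern.compile("\"status\"\\s*:\\s*(\\d{3})");

    // Returns true if the body carries "errors": true.
    static boolean hasItemFailures(String body) {
        Matcher m = ERRORS_FLAG.matcher(body);
        return m.find() && Boolean.parseBoolean(m.group(1));
    }

    // Collects every per-item status of 400 or above, in order of appearance.
    static List<Integer> failedStatuses(String body) {
        List<Integer> failed = new ArrayList<>();
        Matcher m = ITEM_STATUS.matcher(body);
        while (m.find()) {
            int status = Integer.parseInt(m.group(1));
            if (status >= 400) {
                failed.add(status);
            }
        }
        return failed;
    }
}
```

If hasItemFailures returns true for the responses I am getting, the "error" object on each failed item should say why the write to the new index was rejected.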
Thanks in advance for any assistance.