
AWS Support: Bedrock Kimi K2 / K2.5 Service Regression

Date: 2026-04-23
Severity: High — model unusable for production workloads
AWS Service: Amazon Bedrock, us-east-1
Model IDs affected:

  • moonshotai.kimi-k2.5
  • moonshot.kimi-k2-thinking

Problem Description

Both moonshotai.kimi-k2.5 and moonshot.kimi-k2-thinking consistently return all-padding output (!!!!...) via the Bedrock Converse API in us-east-1, regardless of prompt content or inference parameters. The model exhausts its maxTokens budget, in the kimi-k2-thinking case entirely within the reasoning block, and emits no usable text: only the padding character ! repeated until the token limit is reached.

Critically: the model was working correctly earlier on the same day. A successful job was recorded at approximately 09:53 UTC on 2026-04-23 (Bedrock-side outputTokens=566, valid JSON response). The regression appears to have occurred during the day.


Steps to Reproduce

Minimal reproduction using only the AWS CLI (no application code involved):

# Test 1: kimi-k2.5 via Converse API
aws bedrock-runtime converse \
  --model-id moonshotai.kimi-k2.5 \
  --messages '[{"role":"user","content":[{"text":"Return valid JSON in the format {\"word\": \"HELLO\"}. Nothing else."}]}]' \
  --inference-config '{"maxTokens":200,"temperature":0.1}' \
  --region us-east-1

# Test 2: kimi-k2-thinking via Converse API
aws bedrock-runtime converse \
  --model-id moonshot.kimi-k2-thinking \
  --messages '[{"role":"user","content":[{"text":"Return valid JSON in the format {\"word\": \"HELLO\"}. Nothing else."}]}]' \
  --inference-config '{"maxTokens":200,"temperature":0.1}' \
  --region us-east-1

Observed Behaviour

kimi-k2.5 response (Converse API)

{
  "output": {
    "message": {
      "role": "assistant",
      "content": [
        {
          "text": "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
        }
      ]
    }
  },
  "stopReason": "max_tokens",
  "usage": { "inputTokens": 34, "outputTokens": 200, "totalTokens": 234 }
}

kimi-k2-thinking response (Converse API)

{
  "output": {
    "message": {
      "role": "assistant",
      "content": [
        {
          "reasoningContent": {
            "reasoningText": {
              "text": "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
            }
          }
        }
      ]
    }
  },
  "stopReason": "max_tokens"
}

Both models exhaust maxTokens with ! padding. stopReason: "max_tokens" confirms the model is running but producing no real output.
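
For anyone who needs to flag this failure mode programmatically, a check along the following lines may help. It is a minimal sketch in Python, assuming the input is the parsed JSON response from the Converse API in the shape shown above. The names `collect_text` and `is_padding_response` and the `min_run` threshold are illustrative and not part of any AWS SDK.

```python
def collect_text(response):
    """Gather all text from a Converse-style response dict.

    Covers both plain "text" blocks (kimi-k2.5 output above) and
    "reasoningContent" blocks (kimi-k2-thinking output above).
    """
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    parts = []
    for block in blocks:
        if "text" in block:
            parts.append(block["text"])
        reasoning = block.get("reasoningContent", {}).get("reasoningText", {})
        if "text" in reasoning:
            parts.append(reasoning["text"])
    return "".join(parts)


def is_padding_response(response, pad_char="!", min_run=8):
    """Return True if the response text consists only of the padding character.

    min_run is an arbitrary guard (an assumption) so that a short,
    legitimate reply such as "!" is not flagged.
    """
    text = collect_text(response).strip()
    return len(text) >= min_run and set(text) == {pad_char}
```

The check looks only at the text content. Combining it with `stopReason == "max_tokens"` would make it stricter, at the cost of missing padding responses that end for another reason.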


Expected Behaviour

The model should return a valid assistant response — for the trivial prompt above, the expected output would be:

{ "word": "HELLO" }

Mitigations Attempted (All Ineffective)

All the following were tested via direct AWS CLI calls:

| Attempt | Parameter | Result |
| --- | --- | --- |
| Disable thinking (Moonshot native) | additionalModelRequestFields: { "enable_thinking": false } | Silently accepted, no effect |
| Disable thinking (Claude-style) | additionalModelRequestFields: { "thinking": { "type": "disabled" } } | Silently accepted, no effect |
| Reduce reasoning effort | additionalModelRequestFields: { "reasoning_effort": "low" } | Accepted, still produces !!! |
| Reduce reasoning effort | additionalModelRequestFields: { "reasoning_effort": "minimal" } | ValidationException: "not supported" |
| Assistant prefill | Append { "role": "assistant", "content": [{ "text": "{" }] } as last message | Accepted; model produces {{{{ continuation |
| InvokeModel API | OpenAI-compatible body with max_tokens: 200 | Same !!! output, finish_reason: "length" |

Note on reasoning_effort: The ValidationException message for an invalid value lists high, low, medium, minimal as valid values. However, minimal is separately rejected with "not supported". In practice only high, low, and medium are accepted, and none of them suppresses the !!! output.
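
To make the mitigation attempts easier to repeat, the request variants can be generated from one base request. The following is a sketch only: the helper `build_request` and the dictionary keys in `MITIGATION_FIELDS` are hypothetical names, while the field payloads and the prompt are the ones listed in this post.

```python
import copy

# Base request mirroring the CLI reproduction in this post.
BASE_REQUEST = {
    "modelId": "moonshotai.kimi-k2.5",
    "messages": [
        {
            "role": "user",
            "content": [
                {"text": 'Return valid JSON in the format {"word": "HELLO"}. Nothing else.'}
            ],
        }
    ],
    "inferenceConfig": {"maxTokens": 200, "temperature": 0.1},
}

# additionalModelRequestFields payloads from the mitigation attempts.
# The dictionary keys are illustrative labels only.
MITIGATION_FIELDS = {
    "enable_thinking_false": {"enable_thinking": False},
    "thinking_disabled": {"thinking": {"type": "disabled"}},
    "reasoning_effort_low": {"reasoning_effort": "low"},
    "reasoning_effort_minimal": {"reasoning_effort": "minimal"},
}


def build_request(extra_fields=None, prefill=None):
    """Return a copy of the base request with one mitigation applied."""
    request = copy.deepcopy(BASE_REQUEST)
    if extra_fields:
        request["additionalModelRequestFields"] = extra_fields
    if prefill is not None:
        # Assistant prefill: append a partial assistant turn as the last message.
        request["messages"].append(
            {"role": "assistant", "content": [{"text": prefill}]}
        )
    return request
```

Assuming boto3 is available, each variant could then be sent with `client.converse(**build_request(fields))`, where `client` is a `bedrock-runtime` client.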


Evidence of Prior Working State

The following successful invocation was recorded at approximately 09:53:23 UTC on 2026-04-23, demonstrating the model was functional earlier the same day:

  • Model: moonshotai.kimi-k2.5
  • Job ref: p-56742cf5-...
  • Output tokens: 566
  • Response: valid JSON

Impact

  • Production API failures with user-facing errors ("The model couldn't produce a valid response" / "Analysis timed out")
  • Lambda workers time out (240s) because Bedrock hangs before returning the !!! response
  • All kimi-k2-thinking and kimi-k2.5 requests are affected; fallback to alternative models is required
  • The issue resolved by itself the next day; most likely something was fixed on the provider side
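
The fallback mentioned above could look roughly like the sketch below. It assumes a client object exposing a `converse(modelId=..., messages=..., inferenceConfig=...)` method, as the boto3 `bedrock-runtime` client does. The names `looks_like_padding` and `converse_with_fallback`, the `attempts_per_model` default, and the `min_run` threshold are illustrative choices, not an established pattern from AWS.

```python
def looks_like_padding(response, pad_char="!", min_run=8):
    """True if every text or reasoning block holds only the padding character."""
    blocks = response.get("output", {}).get("message", {}).get("content", [])
    text = "".join(
        block.get("text", "")
        + block.get("reasoningContent", {}).get("reasoningText", {}).get("text", "")
        for block in blocks
    ).strip()
    return len(text) >= min_run and set(text) == {pad_char}


def converse_with_fallback(client, model_ids, messages, inference_config,
                           attempts_per_model=2):
    """Try each model in order and return (model_id, response) for the first
    response that is not all padding. Raises RuntimeError if none succeeds."""
    last_error = None
    for model_id in model_ids:
        for _ in range(attempts_per_model):
            try:
                response = client.converse(
                    modelId=model_id,
                    messages=messages,
                    inferenceConfig=inference_config,
                )
            except Exception as exc:  # broad on purpose; narrow this in real code
                last_error = exc
                continue
            if not looks_like_padding(response):
                return model_id, response
    raise RuntimeError("all models failed or returned padding") from last_error
```

Every retry is billed for input tokens again, so `attempts_per_model` is best kept small. With boto3, the client read timeout can also be set below the Lambda limit through `botocore.config.Config(read_timeout=...)`, so that a hanging call fails early enough to leave time for the fallback.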

asked a month ago · 260 views
3 Answers
1

Hello.

I ran the following commands in my AWS account and both returned a normal response.
This suggests the problem may be transient, for example a temporary shortage of Bedrock capacity on the AWS side.
Both commands succeeded as of 2026/04/23 01:57 PM.

~ $ aws bedrock-runtime converse \
>   --model-id moonshotai.kimi-k2.5 \
>   --messages '[{"role":"user","content":[{"text":"Return valid JSON in the format {\"word\": \"HELLO\"}. Nothing else."}]}]' \
>   --inference-config '{"maxTokens":200,"temperature":0.1}' \
>   --region us-east-1
{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "text": " {\"word\": \"HELLO\"}"
                }
            ]
        }
    },
    "stopReason": "end_turn",
    "usage": {
        "inputTokens": 44,
        "outputTokens": 8,
        "totalTokens": 52
    },
    "metrics": {
        "latencyMs": 596
    }
}
~ $ aws bedrock-runtime converse \
>   --model-id moonshot.kimi-k2-thinking \
>   --messages '[{"role":"user","content":[{"text":"Return valid JSON in the format {\"word\": \"HELLO\"}. Nothing else."}]}]' \
>   --inference-config '{"maxTokens":200,"temperature":0.1}' \
>   --region us-east-1
{
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {
                    "reasoningContent": {
                        "reasoningText": {
                            "text": " The user wants me to return valid JSON in a specific format: {\"word\": \"HELLO\"}.\nThey explicitly say \"Nothing else.\"\n\nSo I should just output that exact JSON string and nothing else. No extra text, no explanations, no markdown formatting.\n\nThe JSON is simple and valid. I will just return it as is."
                        }
                    }
                },
                {
                    "text": " {\"word\": \"HELLO\"}"
                }
            ]
        }
    },
    "stopReason": "end_turn",
    "usage": {
        "inputTokens": 24,
        "outputTokens": 77,
        "totalTokens": 101
    },
    "metrics": {
        "latencyMs": 1468
    }
}
EXPERT
answered a month ago
EXPERT
reviewed a month ago
  • Thanks for checking. It seems like it does not happen on every request, but I am still getting the same issue with more complex prompts.

0

Based on the information available about Amazon Bedrock's supported models, I can confirm that both Moonshot AI models you're referencing are indeed supported in Amazon Bedrock:

Kimi K2.5 (Model ID: moonshotai.kimi-k2.5) is Moonshot AI's multimodal model with improved reasoning, coding, and multilingual capabilities. It supports text and image inputs, text outputs, and streaming, and is available in us-east-1 among other regions.

Kimi K2 Thinking (Model ID: moonshot.kimi-k2-thinking) is Moonshot AI's reasoning model with chain-of-thought capabilities designed for complex problem solving in math, coding, and logic. It supports text input and output with streaming capabilities, and is also available in us-east-1.

However, the specific runtime behavior you're describing—where the models are returning only padding characters (!) and exhausting token budgets—represents an operational issue that is not documented in the available model specifications. The models are listed as having "N/A" for inference parameters, which means specific parameter controls may be limited or handled differently than other model families.

Given that you've documented a working state earlier the same day followed by this regression, and you've systematically ruled out configuration issues through your testing, this appears to be a service-side issue that requires investigation by AWS Support. I recommend opening a support case with AWS, providing the detailed reproduction steps, timestamps of the working vs. non-working states, and the specific model invocation logs you've collected. This will allow the Bedrock service team to investigate potential model deployment or infrastructure issues in the us-east-1 region.
Sources
Supported foundation models in Amazon Bedrock - Amazon Bedrock
Moonshot AI - Amazon Bedrock

answered a month ago
EXPERT
reviewed a month ago
0

I am running into the exact same problem with Kimi K2.5. At first it occurred only sporadically on larger conversations, and retrying often resolved it, at the cost of additional input tokens. For around 24 hours now, Kimi K2.5 has been completely unusable for me: the problem affects every conversation, no matter how small or how many retries. I could even reproduce it in the playground: playground test

answered a month ago
