Questions tagged with Amazon Simple Workflow (Amazon SWF)
We are using the Java Flow Framework for our SWF workflow and activities. The current workflow executes two activities. We now need to register a new activity and update the workflow implementation so that it also runs this activity when the workflow input meets a certain condition. There is no change to the other two activities and no change to the workflow interface itself; only the workflow implementation changes to invoke the new activity.
My question is: if we deploy this change, will in-flight workflow executions that started on the old version fail or time out because of the replay process? I am not sure whether this falls under https://docs.aws.amazon.com/amazonswf/latest/awsflowguide/java-flow-making-changes-solutions.html#use-feature-flags, in which case in-flight executions would not be affected when we deploy the change. Please see my code before and after the change below.
```java
// Before Change
@Workflow(dataConverter = ManualOperationSwfDataConverter.class)
@WorkflowRegistrationOptions(defaultExecutionStartToCloseTimeoutSeconds = MAX_WAIT_TIME_SECONDS,
        defaultTaskStartToCloseTimeoutSeconds = DEFAULT_TASK_START_TO_CLOSE_TIMEOUT_SECONDS)
public interface MyWorkflowDefinition {
    @Execute(version = "1.0")
    void MyWorkflow(Input input);
}

// Workflow implementation (excerpt)
@Override
@Asynchronous
public void MyWorkflow(Input input) {
    new TryCatch() {
        @Override
        protected void doTry() {
            final Promise<Input> promise = client.runActivity1(input);
            final Promise<Void> result2 = client.runActivity2(promise);
        }

        @Override
        protected void doCatch(final Throwable e) throws Throwable {
            handleError(e);
            throw e;
        }
    };
}
```
```java
// After Change
@Workflow(dataConverter = ManualOperationSwfDataConverter.class)
@WorkflowRegistrationOptions(defaultExecutionStartToCloseTimeoutSeconds = MAX_WAIT_TIME_SECONDS,
        defaultTaskStartToCloseTimeoutSeconds = DEFAULT_TASK_START_TO_CLOSE_TIMEOUT_SECONDS)
public interface MyWorkflowDefinition {
    @Execute(version = "1.0")
    void MyWorkflow(Input input);
}

// Workflow implementation (excerpt)
@Override
@Asynchronous
public void MyWorkflow(Input input) {
    new TryCatch() {
        @Override
        protected void doTry() {
            if (input.client == eligibleClient) {
                // New branch: run activity3 first, then the existing two activities
                final Promise<Input> promise1 = client.runActivity3(input);
                final Promise<Input> promise2 = client.runActivity1(promise1);
                final Promise<Void> result2 = client.runActivity2(promise2);
            } else {
                // Unchanged path
                final Promise<Input> promise = client.runActivity1(input);
                final Promise<Void> result2 = client.runActivity2(promise);
            }
        }

        @Override
        protected void doCatch(final Throwable e) throws Throwable {
            handleError(e);
            throw e;
        }
    };
}
```
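To make the replay concern concrete, the sketch below models each version of the workflow as the ordered list of activities it schedules, and checks whether an in-flight execution's recorded history is still a prefix of what the new code would schedule. It is a deliberately simplified, framework-free model (the Flow framework's real replay matches decisions to history events with its own bookkeeping), and `Input`, `useActivity3` and `ELIGIBLE_CLIENT` are hypothetical names, not part of the question's code.

```java
import java.util.List;

// Framework-free model of deterministic replay (NOT the Flow framework itself).
// Decisions are re-derived from the code on every decision task, and for an
// in-flight execution they must line up with the events already in its history.
final class ReplayModel {

    // Hypothetical stand-in for the workflow input.
    static final class Input {
        final String client;
        final boolean useActivity3; // hypothetical feature flag, set only by new starters

        Input(String client, boolean useActivity3) {
            this.client = client;
            this.useActivity3 = useActivity3;
        }
    }

    // Hypothetical value standing in for "eligibleClient" in the question.
    static final String ELIGIBLE_CLIENT = "eligible";

    // Order in which the code deployed before the change schedules activities.
    static List<String> oldDecisions(Input input) {
        return List.of("runActivity1", "runActivity2");
    }

    // The changed code as written in the question: gated only on the client.
    static List<String> newDecisionsByClient(Input input) {
        if (ELIGIBLE_CLIENT.equals(input.client)) {
            return List.of("runActivity3", "runActivity1", "runActivity2");
        }
        return List.of("runActivity1", "runActivity2");
    }

    // Same change, additionally gated on a flag that in-flight executions never carry.
    static List<String> newDecisionsByFlag(Input input) {
        if (input.useActivity3 && ELIGIBLE_CLIENT.equals(input.client)) {
            return List.of("runActivity3", "runActivity1", "runActivity2");
        }
        return List.of("runActivity1", "runActivity2");
    }

    // True if the recorded history is a prefix of what the new code would schedule.
    static boolean replaysCleanly(List<String> recorded, List<String> replayed) {
        return replayed.size() >= recorded.size()
                && replayed.subList(0, recorded.size()).equals(recorded);
    }
}
```

Under this model, an in-flight execution for an eligible client that has already scheduled `runActivity1` would, on replay, be asked to schedule `runActivity3` first, so the decisions and the history diverge; executions for other clients replay unchanged. Gating the new branch on something that only new executions carry is one way to apply a feature-flag style gate; how to carry such a flag without changing the workflow interface is left open here.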
Hi,
Our team at Amazon uses a CloudWatch metric that tracks all active/pending/scheduled SWF workflows. The metric had been working fine since last year and tracked ongoing SWF workflows accurately. However, since Sept 6th the metric has been constantly in ALARM (meaning it always reports a pending SWF workflow), although the SWF console shows no active workflows.
Additionally, the 'PendingTasks' CloudWatch metric count has kept increasing since Sept 6th. I could not find any active workflows in the SWF console to terminate. I would appreciate any help getting the CloudWatch monitor to track active SWF workflows accurately again.
Thanks
I am using @ExponentialRetry on the activity, but the activity is not retried. Here is my code (running in IntelliJ IDEA).
This is my Activities interface:
```java
@ExponentialRetry(
initialRetryIntervalSeconds = 5,
exceptionsToRetry = IllegalStateException.class,
maximumAttempts = 5)
String printHello();
```
Implementation class:
```java
private boolean check = true; // declared at class level

@Override
public String printHello() {
    System.out.println("In activity 1111");
    if (check) {
        // Fail only on the first call so that a retry would succeed
        check = false;
        System.out.println("121212121");
        throw new IllegalStateException("showing this for the first time from this activity");
    }
    return "Hello World";
}
```
This is my workflow implementation class (the workflow entry point), from which I call this activity through the activities client:
```java
@Override
public void helloWorld() {
    handleUnreliableActivity();
}

private void handleUnreliableActivity() {
    new TryCatch() {
        @Override
        protected void doTry() throws Throwable {
            Promise<String> result = client.printHello();
            client.printBye(result);
        }

        @Override
        protected void doCatch(Throwable throwable) throws Throwable {
            System.out.println("In doCatch **************************");
            throw throwable;
        }
    };
}
```
Can anyone help?
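When checking whether the retry is applied at all, it can help to know what schedule to expect. The sketch below computes the retry delays of an exponential policy using the question's `initialRetryIntervalSeconds = 5` and `maximumAttempts = 5`. The backoff coefficient of 2.0, the 100-second cap, and the assumption that `maximumAttempts` counts the first attempt are illustrative assumptions, not values read from the Flow framework. Separately, and only as something worth ruling out: in the Flow framework, `@ExponentialRetry` (like `@Asynchronous`) relies on AspectJ weaving, so a run configuration without weaving may leave the annotation with no effect.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified exponential-retry schedule for reasoning about what the
// ExponentialRetry annotation in the question should produce when active.
// Assumptions (not Flow framework facts): maximumAttempts includes the first
// attempt, and each delay is the previous one times backoffCoefficient,
// capped at maximumIntervalSeconds.
final class RetrySchedule {

    // Delay in seconds before each retry, i.e. before attempts 2..maximumAttempts.
    static List<Long> delaysSeconds(long initialIntervalSeconds,
                                    double backoffCoefficient,
                                    long maximumIntervalSeconds,
                                    int maximumAttempts) {
        List<Long> delays = new ArrayList<>();
        double next = initialIntervalSeconds;
        for (int attempt = 2; attempt <= maximumAttempts; attempt++) {
            delays.add(Math.min((long) next, maximumIntervalSeconds));
            next *= backoffCoefficient;
        }
        return delays;
    }
}
```

With the values above, `delaysSeconds(5, 2.0, 100, 5)` yields delays of 5, 10, 20 and 40 seconds. If the worker output shows "In activity 1111" only once and no second attempt about 5 seconds later, that suggests the retry logic is not taking effect, rather than the schedule being longer than expected.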
1) I have 150 ETL jobs that need to be moved from on-premises to the AWS Cloud. What is the best way to schedule them in AWS: AWS Batch, SWF, Step Functions, or some other service?
Which tool/service has all the capabilities of a full-fledged job scheduler?
2) The AWS Batch documentation says that it supports jobs that run as Docker containers. Does that mean AWS Batch is only meant for Docker container jobs? In my case, all jobs are either bash/ksh scripts or regular ETL jobs that read files on EC2 and load the data into an RDS database.
I know there's a major outage going on in us-east-1 right now, per https://status.aws.amazon.com/. The notice at the top of the page says "Services impacted include: EC2, Connect, DynamoDB, Glue, Athena, Timestream, and Chime and other AWS Services in US-EAST-1."
Per that page's status table (and this RSS feed: https://status.aws.amazon.com/rss/swf-us-east-1.rss ), SWF is currently working.
However, we're seeing tons of 500s in our app logs, and none of our processes that use SWF seem to be making progress.
Is SWF one of those "other impacted services"?
I don't know of any successful requests going through, so I think it is, but an official confirmation would be nice.
An edited example error from our PHP log follows:
```
Error: start_workflow_execution: exception 'Aws\Swf\Exception\SwfException' with message 'Error executing "StartWorkflowExecution" on "https://swf.us-east-1.amazonaws.com";
AWS HTTP error: Server error: `POST https://swf.us-east-1.amazonaws.com` resulted in a `500 Internal Server Error` response:
{"__type":"com.amazon.coral.service#InternalFailure","message":"All attempts to invoke operation: StartWorkflowExecution (truncated...)
```
Hi,
We currently have long-running processes/jobs that are executed via workflows. A workflow implementation calls the jobs to run; the call and the job run asynchronously. By default, we have set the activity to time out after an hour.
We recently saw an issue where, if a job has been running for more than an hour, SWF throws an ActivityTaskTimedOutException in our workflow implementation. We expected this to also stop the long-running job, but only the workflow implementation catches the exception; the job itself keeps running in the background.
My question is: why did this not terminate the long-running process? Is this an issue with how SWF times out processes, or do we need logic on our end that checks for these time-out exceptions and terminates the process? If it is the latter, is there any guidance on how we should implement it?
We badly need some help with this, as we have been searching for a solution for some time now.
Thank you!
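As far as we know, SWF only records the timeout; it has no way to stop a process running on the worker host, so the worker has to notice that its task is no longer valid and stop itself. A common pattern is to do the work in chunks and check between chunks, typically through a heartbeat call (in the Flow framework, `ActivityExecutionContext.recordActivityHeartbeat`), which should fail once the task has timed out or been cancelled. The sketch below models that check as a hypothetical `BooleanSupplier stillValid` so it runs without SWF; `CooperativeJob` and `deadline` are hypothetical helpers, and a deadline is only a local approximation of what the heartbeat would report.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Sketch of a long-running job that stops itself cooperatively.
// 'stillValid' stands in for a heartbeat-style check (hypothetical here).
final class CooperativeJob {

    // Runs up to totalChunks units of work, checking stillValid before each one.
    // Returns the number of chunks actually completed.
    static int run(int totalChunks, BooleanSupplier stillValid, Runnable doChunk) {
        int done = 0;
        while (done < totalChunks) {
            if (!stillValid.getAsBoolean()) {
                // The task has timed out or been cancelled: further work is wasted,
                // and its result could not be reported to SWF anyway.
                break;
            }
            doChunk.run();
            done++;
        }
        return done;
    }

    // Hypothetical local check: valid while less than timeoutNanos has elapsed
    // since startNanos, e.g. derived from the activity's start-to-close timeout.
    static BooleanSupplier deadline(long startNanos, long timeoutNanos, LongSupplier clock) {
        return () -> clock.getAsLong() - startNanos < timeoutNanos;
    }
}
```

In a real worker the clock would be `System::nanoTime`, and `stillValid` would more likely wrap the heartbeat call and return `false` when it throws, so that a cancellation requested from the workflow also stops the job.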
Hello,
We have hit the limit "Maximum workflow and activity types". Although we can still register activities in other domains, switching domains is too complicated in our case because the domain name is used in many places outside SWF :-)
We use the Java Flow Framework with the same version across all workflows and activities. So whenever we change activities, we bump the version number, which leads to hundreds of new activity types being registered.
While we wait for the limit increase, what would be the recommended practice to minimize the number of activity types registered?
Thank you
Hello,
I would like to wait on a list of promises, but the list is updated dynamically (items are added to it) by a signal handler.
I tried two approaches: @Wait on the method's list parameter, and waiting on an AndPromise. In the AndPromise case, the AndPromise is a member variable of the workflow class and is recreated whenever the signal handler adds a new promise to the list.
In both cases, the method starts executing as soon as the first promise in the list is ready.
How can I fix this?
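A likely explanation, not verified against the Flow framework source, is that a wait is bound to the promises that exist when the waiting call is made: recreating the AndPromise member later does not change what an already-invoked asynchronous method is waiting on. One workaround is to wait on a snapshot, then check whether the list grew and, if so, wait again. The sketch shows that pattern with `java.util.concurrent.CompletableFuture` standing in for Flow's `Promise` and `AndPromise` (a swapped-in API, not the Flow one); in Flow the same shape might be an `@Asynchronous` method that waits on an AndPromise of the current list and calls itself again when more promises have been added. `GrowingWait` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Waits until every future in a list that may keep growing has completed.
// An AND-combinator only covers the futures present when it is built, so we
// wait on a snapshot, then re-check the list size and wait again if it grew.
final class GrowingWait {

    // 'items' should be safe to read while another thread appends to it
    // (e.g. a CopyOnWriteArrayList); in Flow the decider is single-threaded.
    static CompletableFuture<Void> awaitAll(List<CompletableFuture<?>> items) {
        CompletableFuture<?>[] snapshot = items.toArray(new CompletableFuture<?>[0]);
        int awaited = snapshot.length;
        return CompletableFuture.allOf(snapshot).thenCompose(ignored ->
                items.size() > awaited
                        ? awaitAll(items)                               // list grew: wait again
                        : CompletableFuture.<Void>completedFuture(null)); // nothing new: done
    }
}
```

This only notices growth that happens before the latest snapshot completes, so a real workflow would also need its own "no more items" condition (for example, a final signal), which is left out here.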
Hello
We are designing a long-running workflow. We have a "parent" workflow that only does setup and cleanup; the majority of the work is done in a child workflow.
The complete task may run for up to a year, so we are thinking of implementing the child workflow as a continuous workflow (https://docs.aws.amazon.com/amazonswf/latest/awsflowguide/continuous.html) that continues as new after completing each "chunk", while keeping the parent workflow running (with its timeout set to one year, the maximum allowed).
We have tested the approach and it seems to work. The parent workflow seems to be unaware of the child restarting (which is exactly what we need), and the history stays small.
Is there any drawback to this approach? Anything we should consider?
Thank you,
Akbar
Because there is a 32K limit on activity results, activities simply time out when the result is larger. That is acceptable for normal results, but when an activity throws an exception, I cannot see the failure reason.
Specifically, we are using the SWF Java Flow Framework together with Hibernate and Spring. When the data access layer throws an exception, its text (including the full stack trace) is longer than 32K. In this case we see no activity failure, just a timeout. We could resolve this with a custom serializer, but I would prefer a solution that uses fewer custom components.
Is there such a solution?
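One lower-tech option, sketched here under assumptions, is to catch the data-access exception at the activity boundary and rethrow a new exception whose message is a compact, length-capped summary (the exception chain plus the first few stack frames). Whether that alone keeps the failure under the limit depends on how the data converter serializes the rethrown exception (as far as we understand, its own stack trace is serialized too), so treat this as a starting point rather than a fix. `FailureDetails`, `describe`, `truncate` and `MAX_CHARS` are hypothetical names, and 32,768 is taken from the "32K" in the question.

```java
// Builds a compact, length-capped description of a Throwable.
final class FailureDetails {

    // Assumed limit matching the "32K" in the question; check the SWF API
    // reference for the exact limit of the field you are filling.
    static final int MAX_CHARS = 32_768;
    private static final String MARKER = "...[truncated]";
    private static final int MAX_CAUSES = 10; // guards against cyclic cause chains

    // Exception chain first (Hibernate/Spring errors usually wrap the root cause),
    // then at most maxFrames frames of the outermost exception.
    static String describe(Throwable t, int maxFrames, int maxChars) {
        StringBuilder sb = new StringBuilder();
        int depth = 0;
        for (Throwable c = t; c != null && depth < MAX_CAUSES; c = c.getCause(), depth++) {
            if (depth > 0) {
                sb.append("\nCaused by: ");
            }
            sb.append(c.getClass().getName()).append(": ").append(c.getMessage());
        }
        StackTraceElement[] frames = t.getStackTrace();
        for (int i = 0; i < Math.min(maxFrames, frames.length); i++) {
            sb.append("\n\tat ").append(frames[i]);
        }
        return truncate(sb.toString(), maxChars);
    }

    // Cuts s to at most maxChars characters, ending with a visible marker.
    static String truncate(String s, int maxChars) {
        if (maxChars < MARKER.length()) {
            throw new IllegalArgumentException("maxChars too small: " + maxChars);
        }
        if (s.length() <= maxChars) {
            return s;
        }
        int end = maxChars - MARKER.length();
        if (end > 0 && Character.isHighSurrogate(s.charAt(end - 1))) {
            end--; // do not split a surrogate pair
        }
        return s.substring(0, end) + MARKER;
    }
}
```

A hypothetical use inside the activity would be `catch (RuntimeException e) { throw new IllegalStateException(FailureDetails.describe(e, 20, 4_000)); }`, choosing a cap well below the limit to leave room for serialization overhead.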