Workflows
Amorphic Workflows helps you visualize and orchestrate complex analytical pipelines built from Amorphic jobs (ETL), machine learning model inference tasks, email notification tasks, and Textract, Translate, Comprehend and Medical Comprehend tasks.
Amorphic Workflows manages the execution and monitoring of all its components. You can create a dependency chain (a directed acyclic graph) of components such as Jobs, ML model inference jobs, Email notifications, Textract, Translate, Comprehend and Medical Comprehend to perform complex analytical tasks.
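To make the idea of a dependency chain concrete, the following is a minimal sketch, not Amorphic's implementation, of how a directed acyclic graph of nodes yields a valid run order. The node names are hypothetical, and the sketch uses Python's standard `graphlib` module (Python 3.9+) rather than any Amorphic API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical node names; each key maps to the set of nodes it depends on.
dependencies = {
    "extract_text": set(),
    "translate_text": {"extract_text"},
    "load_results": {"translate_text"},
    "notify_by_email": {"load_results"},
}


def execution_order(deps):
    """Return one valid run order in which every node follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency map has no cycles, i.e. it is a valid DAG."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError:
        return False
    return True


order = execution_order(dependencies)
print(order)
```

A dependency map that contains a cycle (for example, two nodes that depend on each other) cannot be ordered this way, which is why the chain must be acyclic.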
The Amorphic Workflows page provides options to list workflows or create a new one. You can sort the workflow list by fields such as name, created by, and creation time.
Create Workflow
You can create new workflows using the "Create Workflow" functionality of the Amorphic application.
A new workflow needs at least one node.
To create a node, you can import a pre-existing module provided by Amorphic. The following fields are needed to create a node:
Attribute | Description |
---|---|
Module Type | Module is a pre-defined entity on which a node is built. As of now Amorphic supports these module types: ETL Jobs, ML model inference jobs, Email notifications, Textract, Translate, Comprehend, Medical Comprehend, Sync To S3 and File Load Validation. |
Resource | Based on the selected module type, a list of resources is shown. For example, if the module type ETL Job is selected, all the ETL jobs that the user has access to are displayed for the user to choose from. |
Node Name | Name given to the node for quick and easy identification. |
Input Configurations | Arguments which can be used in the job. |
The image below shows how to create a new workflow:
Users can also create a workflow by using the "Navigator", which takes them to the workflow creation page from anywhere in the application. To display it, double-tap the "Ctrl" key on the keyboard.
Below is a simple graphic that demonstrates the Navigator.
Workflow execution properties
Workflow execution properties are the key-value properties that can be defined while creating a workflow or editing an existing workflow. The properties can be retrieved and optionally modified programmatically during the workflow execution.
Retrieving workflow execution properties:
import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_client = boto3.client("glue")

# Resolve the job arguments, including the workflow name and run ID
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']

# Fetch the key-value properties of this workflow run
workflow_params = glue_client.get_workflow_run_properties(
    Name=workflow_name, RunId=workflow_run_id
)["RunProperties"]

# Read the individual properties defined on the workflow
email_to = workflow_params['email_to']
email_body = workflow_params['email_body']
email_subject = workflow_params['email_subject']
file_name_ml_model_inference = workflow_params['file_name_ml_model_inference']
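Indexing the properties directly raises a bare `KeyError` if a property was never defined on the workflow. One way to make that failure clearer is sketched below. The helper `read_properties` and the sample values are hypothetical and not part of Amorphic or AWS Glue. It only assumes that the run properties arrive as a plain dictionary, as in the snippet above.

```python
def read_properties(run_properties, required, defaults=None):
    """Validate required keys and fill in defaults for optional ones.

    run_properties: dict as returned under "RunProperties".
    required: iterable of keys that must be present.
    defaults: optional dict of fallback values for missing optional keys.
    """
    missing = [key for key in required if key not in run_properties]
    if missing:
        raise KeyError(
            "Missing workflow execution properties: " + ", ".join(missing)
        )
    resolved = dict(defaults or {})
    resolved.update(run_properties)  # values set on the workflow win over defaults
    return resolved


# Stand-in for the dictionary fetched from the workflow run (illustrative values).
workflow_params = {"email_to": "team@example.com", "email_subject": "Daily report"}

props = read_properties(
    workflow_params,
    required=["email_to", "email_subject"],
    defaults={"email_body": ""},
)
print(props["email_to"], repr(props["email_body"]))
```

The function works on an ordinary dictionary, so it can be tried locally without a Glue environment.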
Modifying workflow execution properties:
import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_client = boto3.client("glue")

# Resolve the job arguments, including the workflow name and run ID
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']

# Fetch the current properties of this workflow run
workflow_params = glue_client.get_workflow_run_properties(
    Name=workflow_name, RunId=workflow_run_id
)["RunProperties"]

# Change a property and write the full set back to the workflow run
workflow_params['email_subject'] = 'Coupon: Grab and go!'
glue_client.put_workflow_run_properties(
    Name=workflow_name, RunId=workflow_run_id, RunProperties=workflow_params
)
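The read-modify-write pattern above can be wrapped in a small helper, sketched here under stated assumptions. `update_run_properties` and `FakeGlueClient` are hypothetical names. The fake client only mimics the two boto3 Glue calls used above so that the sketch runs locally. In a real job you would pass `boto3.client("glue")` instead. Values are converted to strings because Glue run properties are string-to-string maps.

```python
def update_run_properties(glue_client, workflow_name, run_id, updates):
    """Merge `updates` into the current run properties and write them back."""
    current = glue_client.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id
    )["RunProperties"]
    # Existing keys are kept; keys in `updates` are added or overwritten.
    merged = {**current, **{key: str(value) for key, value in updates.items()}}
    glue_client.put_workflow_run_properties(
        Name=workflow_name, RunId=run_id, RunProperties=merged
    )
    return merged


class FakeGlueClient:
    """In-memory stand-in for the Glue client, for local experimentation only."""

    def __init__(self, properties):
        self._properties = dict(properties)

    def get_workflow_run_properties(self, Name, RunId):
        return {"RunProperties": dict(self._properties)}

    def put_workflow_run_properties(self, Name, RunId, RunProperties):
        self._properties = dict(RunProperties)
        return {}


client = FakeGlueClient({"email_subject": "Old subject", "email_to": "team@example.com"})
merged = update_run_properties(
    client,
    "demo-workflow",  # hypothetical workflow name
    "run-1",          # hypothetical run ID
    {"email_subject": "Coupon: Grab and go!", "retry_count": 2},
)
print(merged)
```

Writing back the merged dictionary, rather than only the changed keys, mirrors the original snippet, which sends the full property set to `put_workflow_run_properties`.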
Workflow nodes
Edit Workflow
Workflow metadata can be changed, and nodes can be added to or deleted from a workflow, by clicking the Edit Workflow button.
Run Workflow
Amorphic workflows can be triggered on demand or on a schedule. The Run Workflow button can be found on the workflow details page.
On-demand execution
A workflow can be triggered on demand using the Run button, and its executions are listed under the Executions tab as shown below:
Scheduled execution
A schedule can be created to trigger a workflow periodically. The schedule can be enabled or disabled at any time.
Stop Workflow execution
A workflow execution can be stopped using the 'Stop Execution' option under the more options icon (vertical ellipsis).
Once the workflow execution has stopped successfully, the user can restart it using the 'Resume Execution' option under the same more options icon.
Flexibility to trigger nodes
With this feature, users can choose whether a node runs on the preceding node's success or failure. This provides the flexibility to solve a wide variety of use cases, from something as simple as triggering an email node when an ETL job fails to orchestrating complex ETL workflows.
The following graphic shows how a sample workflow is created. Here the SendPromotionalEmails ETL job is configured to run after ReadCustomerDetails succeeds. If the ReadCustomerDetails job fails, an email node called FailureAlertGenerator sends emails to the concerned parties. In the following workflow execution, because the ReadCustomerDetails job failed, the email node FailureAlertGenerator was triggered and the SendPromotionalEmails job stays in the not_started state.
The graphic below shows a sample complex ETL process:
In the above workflow:
- node_one runs when the workflow starts.
- node_two runs only when node_one succeeds.
- node_three runs only when node_two succeeds.
- node_four runs only when all of the following are true: node_six succeeds, node_two succeeds and node_three fails.
- node_five runs when the workflow starts.
- node_six runs only when both of the following are true: node_five fails and node_one succeeds.
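The trigger rules listed above can be expressed as a small simulation. This is an illustrative sketch, not how Amorphic evaluates triggers internally. The `TRIGGERS` mapping and the `simulate` function are hypothetical, and the sketch assumes that a node whose conditions are not met stays in the not_started state.

```python
from graphlib import TopologicalSorter

# For each node: the preceding nodes it waits on and the outcome it requires.
TRIGGERS = {
    "node_one": {},
    "node_two": {"node_one": "success"},
    "node_three": {"node_two": "success"},
    "node_four": {"node_six": "success", "node_two": "success", "node_three": "failure"},
    "node_five": {},
    "node_six": {"node_five": "failure", "node_one": "success"},
}


def simulate(triggers, outcomes):
    """Return the final state of every node.

    outcomes maps a node to the result it would produce if it ran
    ("success" or "failure"); nodes not listed are assumed to succeed.
    """
    states = {}
    # Visit nodes so that every predecessor is evaluated before its dependents.
    order = TopologicalSorter(
        {node: set(conditions) for node, conditions in triggers.items()}
    ).static_order()
    for node in order:
        conditions = triggers[node]
        if all(states[pred] == wanted for pred, wanted in conditions.items()):
            states[node] = outcomes.get(node, "success")
        else:
            states[node] = "not_started"
    return states


# node_three and node_five fail, so node_six and then node_four are triggered.
print(simulate(TRIGGERS, {"node_three": "failure", "node_five": "failure"}))

# Every node that runs succeeds, so node_six and node_four never start.
print(simulate(TRIGGERS, {}))
```

The same function covers the simpler email-alert case: a node that requires a preceding job's failure runs only when that job fails, while the node that requires its success stays in not_started.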
All existing workflows and workflow executions continue to work as usual without user intervention. Users can edit existing workflows and set up node triggers of their choice.
Execution Logs
Amorphic supports log retrieval only for nodes of type ETL Job. Logs can be downloaded from the execution details of the workflow. Users can download execution output logs (if any) and error logs (if any) through the more (three dots) option. There are three log options:
- Output Logs (latest 1 MB): The latest 1 MB of the output logs for the job execution. This option downloads the latest 1 MB of output logs immediately. If no logs are available, the message 'No output logs available for the execution' is displayed.
- Output Logs (All): All of the output logs for the job execution. This option initiates log file creation. If there are no output logs, the log file will be empty.
- Error Logs: Error logs for the job execution. If no logs are available, the message 'No error logs available for the execution' is displayed.
If the user opts to download the full logs, log file creation is initiated and the status is set to 'triggered'. Once the log file has been created, the status changes to 'available' and the user receives an email. The user can then download the full logs using the same 'Output Logs (All)' option.
View Workflow executions
All the executions of a workflow are listed under the "Executions" tab on the workflow details page.
Clicking on an execution shows the details of that execution, such as the visual workflow, execution statistics, and the execution status of each node.
Node-level details such as node execution time, error messages, workflow ID (in case of a workflow node), start time, and end time can be found by clicking on more details in the node list grid below the visual workflow. Also, ChildResourceName will be displayed as WorkflowName and ChildResourceExecutionId as WorkflowExecutionId.
List Workflows
Users can see the list of workflows they have access to. They can limit the results shown per page using the Results Per Page option, and sort them by the desired field and order.
View Workflow Details
Authorized Users
This tab shows the list of users authorized to perform operations on the workflow. An owner (the user who created the workflow or who has owner access to it) can grant workflow access to any other user in the system.
There are two access types:
Access Type | Description |
---|---|
Owner | This user has permission to edit the workflow and to grant other users access to it. |
Read-only | This user has limited permissions on the workflow, such as viewing the details of the selected workflow. |
Authorized Groups
This tab shows the list of groups authorized to perform operations on workflows. A group is a list of users given access to a resource. Groups are created by going to User Profile -> Profile & Settings -> Groups.
There are two access types:
Access Type | Description |
---|---|
Owner | This group of users has permission to edit the resources and to grant other users or groups access to them. |
Read-only | This group has limited permissions on the resources, such as viewing their details. |
Clone Workflow
A user can clone a workflow in Amorphic by clicking the Clone button in the top right corner of the workflow details page.
The clone workflow page is auto-populated with the metadata of the workflow being cloned, reducing the effort of filling in every field required to register the workflow.
The only field the user must change is the "Workflow Name", because a workflow cannot be created with a name that already exists. The user can edit any other field before clicking the "Submit" button in the bottom right corner of the form.
Below is a graphic showing the populated fields in the clone workflow form.
Once the user clicks the "Submit" button, a new workflow is created. It then shows up on the workflows page.
Delete Workflow
A workflow can be deleted using the "Delete" (trash) icon in the right corner of the page. Once workflow deletion is triggered, all the related metadata is deleted immediately.