Version: v2.5

Workflows

info

Starting with version 2.2, encryption (in flight and at rest) is enabled for all jobs and the catalog. All existing jobs, both user-created and system-created, were updated with encryption-related settings, and all newly created jobs have encryption enabled automatically.

Amorphic Workflows allow you to organize and visualize complex analytical processes using Amorphic Jobs (ETL), machine learning model inference tasks, email notifications, Textract, Translate, Comprehend, and Medical Comprehend tasks.

The Workflows feature lets you manage, execute, and monitor all of a workflow's components. Dependencies between tasks form a Directed Acyclic Graph (DAG): each task runs only after the tasks it depends on have finished, and no chain of dependencies can loop back on itself. Supported task types include Jobs, ML model inference jobs, email notifications, Textract, Translate, Comprehend, and Medical Comprehend, which lets you run intricate multi-step analytical processes.
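To make the DAG idea concrete, here is a minimal sketch that derives a valid run order from a set of dependencies and rejects cycles. It uses Python's standard `graphlib` module (Python 3.9+); the node names are hypothetical, and this is an illustration of the concept, not how Amorphic schedules nodes internally.

```python
from graphlib import TopologicalSorter, CycleError


def execution_order(dependencies):
    """Return one valid run order for nodes.

    `dependencies` maps each node to the set of nodes it depends on.
    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical node names for a three-step chain
deps = {
    "extract_text": set(),
    "analyze_text": {"extract_text"},
    "notify": {"analyze_text"},
}

print(execution_order(deps))  # a chain has exactly one valid order

try:
    execution_order({"a": {"b"}, "b": {"a"}})  # a -> b -> a is a cycle
except CycleError as err:
    print("not a DAG:", err.args[1])
```

A cycle is reported instead of silently picking an order, which mirrors why a workflow's dependency graph must be acyclic.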

The Amorphic Workflows page lets you view existing workflows, create new ones, and filter workflows by name, creator, and creation time.

How to create workflows?

Create workflow

To create a node, import a predefined module provided by Amorphic. The following fields are needed to create a node:

| Attribute | Description |
| --- | --- |
| Module Type | A module in Amorphic is a predefined building block for creating nodes. Modules support various task types, such as ETL jobs, machine learning model inference jobs, email notifications, Textract, Translate, Comprehend, Medical Comprehend, syncing to S3, and File Load Validation. |
| Resource | The list of resources shown depends on the selected module type. For example, if you select the ETL Job module type, all the ETL jobs you have access to are displayed for you to choose from. |
| Node Name | A name for the node, for quick and easy identification. |
| Input Configurations | Arguments that can be used in the job. |
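The four fields in the table can be pictured as a small record with basic validation. The sketch below is a hypothetical local model for illustration only: `NodeSpec`, the module-type strings, and the example resource and argument names are assumptions, not part of an Amorphic API.

```python
from dataclasses import dataclass, field
from typing import Dict

# Illustrative labels for the module types listed in the table
# (plain strings, not identifiers from an Amorphic API).
MODULE_TYPES = {
    "ETL Job", "ML Model Inference", "Email", "Textract", "Translate",
    "Comprehend", "Medical Comprehend", "Sync to S3", "File Load Validation",
}


@dataclass
class NodeSpec:
    """Hypothetical model of the fields needed to create a node."""
    module_type: str
    resource: str      # e.g. the name of an ETL job the user can access
    node_name: str
    input_configurations: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Reject values that could not identify a valid node
        if self.module_type not in MODULE_TYPES:
            raise ValueError(f"unknown module type: {self.module_type!r}")
        if not self.node_name.strip():
            raise ValueError("node_name must not be empty")
        if not self.resource.strip():
            raise ValueError("resource must not be empty")


# Hypothetical ETL job and argument names
node = NodeSpec("ETL Job", "find_highest_paying_job", "highest_salary",
                {"--year": "2023"})
print(node)
```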

Workflow execution properties

Workflow execution properties are key-value pairs that you can define when creating a workflow or editing an existing one. Jobs in the workflow can retrieve these properties during execution and, if necessary, modify their values.


Retrieving workflow execution properties:

import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_client = boto3.client("glue")

# When the job runs as part of a workflow, the workflow name and run ID are passed as job arguments
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']

# Fetch the key-value properties of the current workflow run
workflow_params = glue_client.get_workflow_run_properties(Name=workflow_name,
                                                          RunId=workflow_run_id)["RunProperties"]

# Read individual properties (these keys are examples defined on the workflow)
email_to = workflow_params['email_to']
email_body = workflow_params['email_body']
email_subject = workflow_params['email_subject']
file_name_ml_model_inference = workflow_params['file_name_ml_model_inference']

Modifying workflow execution properties:

import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_client = boto3.client("glue")

# When the job runs as part of a workflow, the workflow name and run ID are passed as job arguments
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']

# Fetch the current run properties
workflow_params = glue_client.get_workflow_run_properties(Name=workflow_name,
                                                          RunId=workflow_run_id)["RunProperties"]

# Change a property and write the properties back to the workflow run
workflow_params['email_subject'] = 'Coupon: Grab and go!'
glue_client.put_workflow_run_properties(Name=workflow_name, RunId=workflow_run_id, RunProperties=workflow_params)
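The read-modify-write steps above can be wrapped in a helper that you can also exercise locally. This is a sketch under assumptions: `update_run_properties` and `FakeGlueClient` are hypothetical names, and the fake client is an in-memory stand-in for testing only; in a real job you would pass the boto3 Glue client, whose `get_workflow_run_properties` and `put_workflow_run_properties` calls are the ones used above.

```python
from typing import Any, Dict


def update_run_properties(glue_client: Any, workflow_name: str, run_id: str,
                          updates: Dict[str, str]) -> Dict[str, str]:
    """Merge `updates` into the current run properties and write them back."""
    props = glue_client.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id)["RunProperties"]
    props.update(updates)  # keep existing keys, overwrite the changed ones
    glue_client.put_workflow_run_properties(
        Name=workflow_name, RunId=run_id, RunProperties=props)
    return props


class FakeGlueClient:
    """In-memory stand-in for local testing (not part of boto3)."""

    def __init__(self, props: Dict[str, str]):
        self._props = dict(props)

    def get_workflow_run_properties(self, Name: str, RunId: str):
        return {"RunProperties": dict(self._props)}

    def put_workflow_run_properties(self, Name: str, RunId: str,
                                    RunProperties: Dict[str, str]):
        self._props = dict(RunProperties)
        return {}


client = FakeGlueClient({"email_to": "team@example.com",
                         "email_subject": "Weekly report"})
update_run_properties(client, "my_workflow", "run_1",
                      {"email_subject": "Coupon: Grab and go!"})
print(client.get_workflow_run_properties(Name="my_workflow", RunId="run_1"))
```

Injecting the client keeps the helper testable without AWS credentials while leaving the production call path unchanged.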

Workflow nodes


View list of workflow nodes

In Amorphic, a node is constructed using a pre-defined module and can encompass various tasks, including ETL jobs, machine learning model inference, email notifications, Textract, Translate, Comprehend, Medical Comprehend, syncing to S3, and File Load Validation. Nodes are interconnected to form a workflow, which represents a sequence of tasks executed in a specific order to accomplish a particular objective. Each node can be individually configured, monitored, and managed. Furthermore, nodes can be dependent on other nodes within the workflow.

| Node Name | Description |
| --- | --- |
| ETL Job Node | Runs an ETL job and accepts arguments to be used within the job. For instance, if an ETL job finds the highest-paying job and its salary, you can use this node to run that job as part of a larger workflow and to trigger subsequent jobs based on its output. |
| ML Model Node | Runs machine learning models and accepts arguments to be used within the model. |
| Email Node | Sends an email using the recipient, subject, and body given as arguments. |
| Textract Node | Extracts text from documents, images, and other types of files. |
| Rekognition Node | Analyzes images and videos to detect and identify objects, people, and text. |
| Translate Node | Translates text from one language to another. |
| Comprehend Node | Extracts insights and relationships from text. |
| Medical Comprehend Node | Performs natural language processing on medical text to extract insights and relationships. |
| Transcribe Node | Transcribes audio files into text. |
| Medical Transcribe Node | Transcribes audio files with medical content into text. |
| Workflow Node | Runs a previously created workflow as part of the current one, so that workflows can be combined to run in parallel or sequentially. |
| Sync to S3 Node | Synchronizes data to Amazon S3 storage. |
| File Load Validation Node | Validates and checks data before it is loaded into the system. |

Run Workflow

Amorphic workflows can be run immediately or on a schedule. To run a workflow automatically at predefined times, configure a schedule; you can enable or disable the schedule as needed. To halt a workflow execution that is in progress, click the three dots associated with the workflow and select Stop Execution.


Flexibility to trigger nodes

Flexible node triggers let you specify whether a node runs based on the success or failure of its preceding node. This makes it possible to handle errors and branch within a workflow: for example, you can set up an email node that runs when an ETL job fails, or build complex ETL workflows.

The following diagram illustrates how this feature can be used in a sample workflow:

In this example, the "SendPromotionalEmails" ETL job is configured to run after the "ReadCustomerDetails" job is successful. If the "ReadCustomerDetails" job fails, an email node called "FailureAlertGenerator" will send emails to the appropriate recipients.

Workflow demonstrating flexible node trigger

In this workflow, when the "ReadCustomerDetails" job failed, the email node "FailureAlertGenerator" was activated and the "SendPromotionalEmails" job remained in the "not_started" state.

Workflow execution demonstrating flexible node trigger
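The trigger rule in this example can be expressed as a per-node condition on the upstream node's final state. The sketch below is a hypothetical model of that rule, not Amorphic's implementation; apart from "not_started", which appears in the execution view, the state strings are illustrative assumptions.

```python
SUCCEEDED, FAILED, NOT_STARTED = "succeeded", "failed", "not_started"

# Node triggers from the example: node -> {upstream node: required state}
TRIGGERS = {
    "SendPromotionalEmails": {"ReadCustomerDetails": SUCCEEDED},
    "FailureAlertGenerator": {"ReadCustomerDetails": FAILED},
}


def should_trigger(conditions, states):
    """True when every upstream node has ended in its required state."""
    return all(states.get(node) == required
               for node, required in conditions.items())


def downstream_states(read_customer_details_state):
    """Report which downstream nodes fire for a given upstream outcome."""
    states = {"ReadCustomerDetails": read_customer_details_state}
    return {
        node: "triggered" if should_trigger(conds, states) else NOT_STARTED
        for node, conds in TRIGGERS.items()
    }


print(downstream_states(FAILED))
```

With a failed upstream job, only the alert node fires and the promotional-email job stays in "not_started", matching the execution described above.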

The following diagram shows a sample complex ETL process:

Workflow demonstrating complex ETL process

In this workflow:

  • node_one runs when the workflow starts.
  • node_two runs only when node_one succeeds.
  • node_three runs only when node_two succeeds.
  • node_four runs only when all of the following are true: node_six succeeds, node_two succeeds, and node_three fails.
  • node_five runs when the workflow starts.
  • node_six runs only when both of the following are true: node_five fails and node_one succeeds.
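The trigger rules in the list above can be checked with a small simulation. This sketch assumes a node runs only if every upstream node ended in its required state and otherwise stays "not_started"; the scripted `outcomes` stand in for real job results. It is a model of the described behavior, not Amorphic's execution engine.

```python
from graphlib import TopologicalSorter

SUCCEEDED, FAILED, NOT_STARTED = "succeeded", "failed", "not_started"

# Trigger conditions from the example: node -> {upstream node: required state}.
# Nodes without conditions run when the workflow starts.
CONDITIONS = {
    "node_one": {},
    "node_five": {},
    "node_two": {"node_one": SUCCEEDED},
    "node_three": {"node_two": SUCCEEDED},
    "node_six": {"node_five": FAILED, "node_one": SUCCEEDED},
    "node_four": {"node_six": SUCCEEDED, "node_two": SUCCEEDED,
                  "node_three": FAILED},
}


def simulate(conditions, outcomes):
    """Return the final state of every node.

    `outcomes` maps each node to the state it ends in if it runs
    (hypothetical scripted results).
    """
    graph = {node: set(required) for node, required in conditions.items()}
    states = {}
    # Upstream nodes are always visited before the nodes that depend on them
    for node in TopologicalSorter(graph).static_order():
        required = conditions[node]
        if all(states[up] == want for up, want in required.items()):
            states[node] = outcomes[node]
        else:
            states[node] = NOT_STARTED
    return states


outcomes = {"node_one": SUCCEEDED, "node_two": SUCCEEDED,
            "node_three": FAILED, "node_four": SUCCEEDED,
            "node_five": FAILED, "node_six": SUCCEEDED}
print(simulate(CONDITIONS, outcomes))
```

Changing a single outcome, for example making node_five succeed, leaves node_six and therefore node_four in "not_started".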

Existing workflows and workflow executions continue to function as before, with no manual intervention required. You can edit existing workflows to set node triggers of your choice as needed.

Workflow use case


Let's say you want to create a workflow where certain tasks are completed in a specific order, and each task is dependent on the successful completion of the previous task. The workflow consists of three tasks:

  1. Extract text from a picture using the Textract node
  2. Perform data extraction on a text file using the Comprehend node
  3. Send an email indicating that the work has been completed using the Email node.

These three tasks run sequentially, so each task depends on the successful completion of the one before it: the Textract node must finish without errors before the Comprehend node starts, and the Comprehend node must complete successfully before the Email node is triggered. This ensures that the workflow progresses only when each preceding task has succeeded.
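The success-only chaining described here can be sketched as a loop that stops at the first failure. This is a hypothetical local model: each task is a plain Python callable that "fails" by raising an exception, and the stub functions stand in for the Textract, Comprehend, and Email nodes.

```python
def run_sequential(steps):
    """Run (node_name, task) pairs in order, stopping at the first failure.

    Nodes after a failed node remain "not_started".
    """
    states = {name: "not_started" for name, _ in steps}
    for name, task in steps:
        try:
            task()
        except Exception:
            states[name] = "failed"
            break
        states[name] = "succeeded"
    return states


# Hypothetical stand-ins for the three nodes
def extract_text():
    pass


def analyze_text():
    raise RuntimeError("Comprehend step failed")


def send_email():
    pass


print(run_sequential([("Textract", extract_text),
                      ("Comprehend", analyze_text),
                      ("Email", send_email)]))
```

Because the Comprehend stub raises, the Email node is never run, which is exactly the guarantee the sequential arrangement provides.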

Execution Logs

Download workflow execution logs

Amorphic lets you retrieve execution logs for ETL Job nodes. You can download these logs from the workflow's execution details page. The "more" (three dots) menu provides the following log options:

| Attribute | Description |
| --- | --- |
| Full Logs | Initiates the creation of a log file; the status changes to "triggered". |
| Output Logs (latest 1 MB) | Immediately downloads the latest 1 MB of output logs for the job execution. |
| Output Logs (All) | Initiates the creation of a log file containing all of the output logs for the job execution. |
| Error Logs | Downloads the error logs for the job execution. |

Note: These log options are only available for nodes of type ETL Job.