Version: v2.5

Schedules

Amorphic Schedules automate data ingestion: you can schedule batch and streaming data ingestion to run on a regular basis. This eliminates the need for manual intervention and helps keep data up to date. You can set up custom schedules based on your specific needs.

How to create a Schedule?

Schedules Home Page

Click + New Schedule to create a schedule, then fill in the information shown below.

Type | Description
Schedule Name | A unique name that identifies the schedule's specific purpose.
Job Type | The type of job to schedule, picked from the dropdown list (details are given in the Job Type table below).
Schedule Type | There are two schedule types:
  • Time based - Executes the schedule at specific times, as required.
  • On Demand - Runs the schedule as needed.
Schedule Expression | Time-based schedules require a schedule expression, e.g., every 15 minutes, daily, etc.
info

If the schedule job type is 'Data Ingestion' and the dataset is of 'reload' type, the schedule execution loads the data and then reloads it automatically.

Type | Description
ETL Job | This option is used to schedule an ETL job.
JDBC CDC | This option is used to synchronize data between a data warehouse and S3 for Change Data Capture (CDC) tasks. Only tasks with the "SyncToS3" option set to "yes" are visible and can be scheduled.
Data Ingestion | This option is used to schedule a data ingestion job for normal JDBC, S3, and external API connections.
JDBC FullLoad | This option is used to schedule a JDBC Bulk Data Load full-load task.
Forecast Predictors | This option is used to schedule a forecast predictor.
Forecast Reports | This option is used to schedule a forecast report.
Workflows | This option is used to schedule a workflow.
HCLS-Store | This option is used to schedule an import job for a HealthLake Store, Omics Storage: Sequence Store, Omics Analytics: Variant Store, Annotation Store, or HealthImaging Store.
Health Image Data Conversion | This option is used to schedule a job that converts DICOM files in a dataset to NDJSON format and stores them in a different dataset.

Health Image Data Conversion

This type of schedule job converts DICOM files in a dataset to NDJSON format so that they can be uploaded to a HealthLake store. HealthLake stores only support the NDJSON file format when importing data. The input dataset for these jobs is a dataset that contains DICOM files.

You must specify the output dataset ID in the arguments with the key outputDatasetId; its value should be the ID of a valid S3 'other' type dataset. The converted NDJSON files are stored in the specified output dataset. An optional argument, selectFiles, with the value all selects every file in the input dataset for conversion. Its default value is latest, which selects only the files uploaded to the dataset after the last job run.
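The arguments described above can be sketched as a small key/value builder. This is a minimal illustration, not Amorphic's API: the function name build_conversion_arguments and the example dataset ID are hypothetical, and only the keys outputDatasetId and selectFiles and the values all and latest come from the text. How the arguments are actually submitted (UI form or API payload) is not shown here.

```python
# Hypothetical helper: assembles the arguments for a
# Health Image Data Conversion schedule as a plain dictionary.
# Only the key names and the "all"/"latest" values come from the docs.

VALID_SELECT_FILES = {"all", "latest"}


def build_conversion_arguments(output_dataset_id, select_files="latest"):
    """Return the schedule arguments, rejecting obviously invalid input."""
    if not output_dataset_id:
        raise ValueError("outputDatasetId is required")
    if select_files not in VALID_SELECT_FILES:
        raise ValueError("selectFiles must be 'all' or 'latest'")
    return {
        "outputDatasetId": output_dataset_id,
        "selectFiles": select_files,
    }


# Example with a placeholder dataset ID (not a real one).
arguments = build_conversion_arguments("example-output-dataset-id", "all")
print(arguments)
```

Omitting the second parameter mirrors the documented default, so only files uploaded since the last job run would be selected.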

info

If the schedule job type is 'Data Ingestion':

  • An argument 'MaxTimeOut' can be provided during creation to override the timeout setting of the connection for the specific schedule. It accepts values from 1 to 2880.
  • If the dataset is of 'reload' type, the schedule execution loads the data and then reloads it automatically.

Schedule details


Once you have created a schedule, you can view it on the schedules listing page, and perform various actions on it, such as running, disabling, enabling, editing, cloning, or deleting the schedule.

Run Schedule

Schedule run

To run a schedule, use the Run Schedule option in the top right corner of the page. After running the schedule, you can review its status in the Execution Status tab, which indicates whether the job is still running or has completed, either successfully or with a failure.

Schedule execution

info
  • Schedule execution will error out if the related S3 connection uses any of the Amorphic S3 buckets as its source, for example <projectshortname-region-accountid-env-dlz>.
  • For Data Ingestion Schedules, the following arguments can be provided during schedule runs:
    • MaxTimeOut: This argument allows users to override the timeout setting of the connection for the specific run. It accepts values from 1 to 2880.
    • FileConcurrency: This argument configures the number of parallel file ingestions that occur for S3 connections. It accepts values from 1 to 100 and has a default value of 20.
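The documented ranges for these run arguments can be checked before a run is triggered. The sketch below is an assumption-laden illustration: validate_run_arguments is a hypothetical helper, not part of Amorphic, and only the argument names and their ranges (1 to 2880 and 1 to 100) come from the text.

```python
# Hypothetical pre-flight check for Data Ingestion run arguments.
# Ranges are taken from the documentation; the function itself is illustrative.

ARGUMENT_RANGES = {
    "MaxTimeOut": (1, 2880),
    "FileConcurrency": (1, 100),  # documented default is 20 when omitted
}


def validate_run_arguments(arguments):
    """Return the arguments as integers, raising on unknown keys or bad values."""
    validated = {}
    for key, value in arguments.items():
        if key not in ARGUMENT_RANGES:
            raise KeyError(f"unsupported argument: {key}")
        low, high = ARGUMENT_RANGES[key]
        number = int(value)  # accepts 120 as well as "120"
        if not low <= number <= high:
            raise ValueError(f"{key} must be between {low} and {high}")
        validated[key] = number
    return validated


checked = validate_run_arguments({"MaxTimeOut": "120", "FileConcurrency": 50})
print(checked)
```

Checking values locally like this only catches range mistakes early; the platform remains the authority on which arguments a given connection type accepts.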

Schedule use case

When a schedule execution completes, an email notification is sent based on the notification settings and the schedule execution status. You can also view the execution logs of each schedule run, which include Output Logs, Output Logs (Full), and Error Logs.

For example, if you need to create a schedule that runs an ETL job and sends out important emails every 4 hours, you can create a workflow with an ETL Job Node followed by a Mail Node. This workflow can then be scheduled to run every 4 hours, every day.
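To make "every 4 hours, every day" concrete, the following sketch lists the trigger times such a schedule would produce over one day. It is only an illustration of the interval: the function trigger_times and the start date are hypothetical, and it does not use or represent Amorphic's schedule expression syntax.

```python
from datetime import datetime, timedelta


def trigger_times(start, interval_hours, count):
    """Return `count` run times spaced `interval_hours` apart, beginning at `start`."""
    return [start + timedelta(hours=interval_hours * i) for i in range(count)]


# Six runs cover one full day at a 4-hour interval (arbitrary example date).
runs = trigger_times(datetime(2024, 1, 1, 0, 0), 4, 6)
for run in runs:
    print(run.strftime("%H:%M"))
```

Each of these runs would execute the whole workflow, so the Mail Node sends its email once per run after the ETL Job Node finishes.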
