Schedules
Amorphic Schedules automate data ingestion: you can run batch and streaming data ingestion on a regular basis. This eliminates the need for manual intervention and ensures that data is always up to date. You can set up custom schedules based on your specific needs.
How to create a Schedule?
Click on + New Schedule to create a schedule and fill in the information shown below.
Field | Description |
---|---|
Schedule Name | A unique name that identifies the schedule's specific purpose |
Job Type | Pick a specific job type from the dropdown list (details are given in the Job Type table below) |
Schedule Type | There are two schedule types to choose from |
Schedule Expression | Time-based schedules require a schedule expression, e.g. every 15 minutes, daily, etc. |
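Schedule expressions typically follow a cron-style syntax; the examples below are a sketch assuming a standard five-field cron format, so check the expression syntax your deployment actually accepts:

```
*/15 * * * *    # every 15 minutes
0 0 * * *       # daily at midnight
0 */4 * * *     # every 4 hours
```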
Type | Description |
---|---|
ETL Job | This option is used to schedule an ETL job. |
JDBC CDC | This option is utilized to synchronize data between a data warehouse and S3 for tasks related to Change Data Capture (CDC). It's important to note that only tasks with the "SyncToS3" option set to "yes" will be visible and can be scheduled. |
Data Ingestion | This option is used to schedule a data ingestion job for normal JDBC, S3 and external API connections. |
JDBC FullLoad | This option is used to schedule a JDBC Bulk Data Load full-load task. |
Forecast Predictors | This option is used to schedule a forecast predictor. |
Forecast Reports | This option is used to schedule a forecast report. |
Workflows | This option is used to schedule a workflow. |
HCLS-Store | This option is used to schedule an import job for Healthlake Store, Omics Storage: Sequence Store, Omics Analytics: Variant Store, Annotation Store, HealthImaging store |
Health Image Data Conversion | This option is used to schedule a job which converts DICOM files in a dataset to NDJSON format and stores them in a different dataset. |
Health Image Data Conversion
This type of schedule job converts DICOM files in a dataset to NDJSON format so they can be uploaded to a Healthlake store. Healthlake stores only support NDJSON file formats when importing data.
The input dataset of such a job is a dataset that contains DICOM files. You must specify the output dataset ID in the arguments with the key `outputDatasetId`; its value should be the ID of a valid S3 'other' type dataset. Converted NDJSON files will be stored in the specified output dataset. An optional argument `selectFiles` with the value `all` selects all files in the input dataset during data conversion. The default value of this key is `latest`, which selects only the files in the dataset that were uploaded after the last job run.
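The argument keys above can be illustrated with a small sketch. The helper function below is hypothetical (it is not part of Amorphic); only the `outputDatasetId` and `selectFiles` keys and their values come from the documentation:

```python
# Hypothetical sketch of the argument map a Health Image Data Conversion
# schedule expects; Amorphic itself defines no such helper function.
def build_conversion_args(output_dataset_id, select_files="latest"):
    """Build the arguments for a DICOM-to-NDJSON conversion schedule."""
    if select_files not in ("latest", "all"):
        raise ValueError("selectFiles must be 'latest' or 'all'")
    return {
        # ID of a valid S3 'other' type dataset that receives the NDJSON files
        "outputDatasetId": output_dataset_id,
        # 'all' = convert every file; 'latest' = only files uploaded since the last run
        "selectFiles": select_files,
    }

# Example: convert every file in the input dataset ("d1b2c3" is a made-up ID)
args = build_conversion_args("d1b2c3", select_files="all")
```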
If the schedule job type is 'Data Ingestion':
- An argument 'MaxTimeOut' can be provided during creation to override the connection's timeout setting for this specific schedule. It accepts values from 1 to 2880.
- If the dataset is of 'reload' type, then schedule execution will load the data and also reload it automatically.
Schedule details
Once you have created a schedule, you can view it on the schedules listing page, and perform various actions on it, such as running, disabling, enabling, editing, cloning, or deleting the schedule.
Run Schedule
To schedule a job, you can utilize the Run Schedule option located in the top right corner of the page. After running the schedule, you can review its status in the Execution Status tab. This tab will indicate whether the job is currently running, or if it has completed either successfully or with a failure.
- Schedule execution will error out if the related S3 connection uses any of the Amorphic S3 buckets as a source, for example: `<projectshortname-region-accountid-env-dlz>`
- For Data Ingestion Schedules, the following arguments can be provided during schedule runs:
- MaxTimeOut: This argument allows users to override the timeout setting of the connection for the specific run. It accepts values from 1 to 2880.
- FileConcurrency: This argument enables users to configure the number of parallel file ingestions that occur for S3 connections. It accepts values from 1 to 100 and defaults to 20.
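The documented ranges for these run arguments can be sketched as a small validation helper. The function itself is hypothetical and not part of Amorphic's API; only the argument names, ranges, and the FileConcurrency default of 20 come from the documentation:

```python
# Hypothetical check of the documented ranges for Data Ingestion run
# arguments; the MaxTimeOut and FileConcurrency names come from the docs,
# but Amorphic does not expose this function.
def validate_run_args(max_timeout=None, file_concurrency=20):
    """Validate Data Ingestion schedule-run arguments against documented ranges."""
    if max_timeout is not None and not 1 <= max_timeout <= 2880:
        raise ValueError("MaxTimeOut must be between 1 and 2880")
    if not 1 <= file_concurrency <= 100:
        raise ValueError("FileConcurrency must be between 1 and 100")
    return {"MaxTimeOut": max_timeout, "FileConcurrency": file_concurrency}

# Example: override the connection timeout, keep the default concurrency of 20
run_args = validate_run_args(max_timeout=120)
```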
Schedule use case
When the schedule execution is completed, an email notification will be sent out, based on the notification setting and schedule execution status. You can also view the execution logs of each schedule run, which includes Output Logs, Output Logs (Full), and Error Logs.
For example, if you need to create a schedule that runs an ETL job and sends out important emails every 4 hours, you can create a workflow with an ETL Job Node followed by a Mail Node. This workflow can then be scheduled to run every 4 hours, every day.