Version: v2.7

Schedules

Amorphic Schedules automate data ingestion: you can schedule batch and streaming data ingestion on a regular basis. This eliminates the need for manual intervention and ensures that data is always up to date. You can set up custom schedules based on your specific needs.

How to create a Schedule?

Schedules Home Page

Click on + New Schedule to create a schedule and fill in the information shown below.

• Schedule Name: A unique name that identifies the schedule's specific purpose.
• Job Type: Pick a specific job type from the dropdown list (details are given in the Job Type list below).
• Schedule Type: There are two schedule types:
  • Time based: Execute the schedule at a specific time, as per requirement.
  • On Demand: Run the schedule whenever needed.
• Schedule Expression: Time-based schedules require a schedule expression, e.g. every 15 minutes, daily, etc. (see the illustrative expressions below).
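
For illustration only: assuming the schedule expression follows an AWS-style rate/cron syntax (an assumption, since this page does not define the exact syntax), the examples above could be written as in the sketch below.

```python
# Hypothetical schedule expressions; the exact syntax accepted by the
# Schedule Expression field is an assumption (AWS-style rate/cron shown here).
EVERY_15_MINUTES = "rate(15 minutes)"
DAILY_AT_MIDNIGHT_UTC = "cron(0 0 * * ? *)"
EVERY_4_HOURS = "rate(4 hours)"
```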
info

If the schedule job type is 'Data Ingestion' and the dataset is of 'reload' type, then the schedule execution will load the data and reload it automatically.

The available job types are:

• ETL Job: Used to schedule an ETL job.
• JDBC CDC: Used to synchronize data between a data warehouse and S3 for Change Data Capture (CDC) tasks. Note that only tasks with the "SyncToS3" option set to "yes" are visible and can be scheduled.
• Data Ingestion: Used to schedule a data ingestion job for normal JDBC, S3, and external API connections.
• JDBC FullLoad: Used to schedule a JDBC Bulk Data Load full-load task.
• Forecast Predictors: Used to schedule a forecast predictor.
• Forecast Reports: Used to schedule a forecast report.
• Workflows: Used to schedule a workflow.
• HCLS-Store: Used to schedule an import job for Healthlake Store, Omics Storage (Sequence Store), Omics Analytics (Variant Store, Annotation Store), and HealthImaging store.
• Health Image Data Conversion: Used to schedule a job that converts DICOM files in a dataset to NDJSON format and stores them in a different dataset.
  • Data Ingestion

    This schedule type is used to schedule a data ingestion job for the normal data load connection type. The supported arguments for this schedule are listed below, followed by an illustrative sketch.

    • For JDBC connection schedules

      • NumberOfWorkers : This parameter specifies the number of worker nodes allocated for your Glue job. Allowed values are between 2 and 100.
      • WorkerType: This parameter specifies the type of worker (computing resources) you want to use for the jobs. The worker type determines the amount of memory, CPU, and overall processing power allocated to each worker. Allowed values are Standard, G.1X, G.2X only.
      • query : Users can use this argument to specify a SELECT SQL query; the data retrieved from the source database by that query is ingested.
      • prepareQuery : This argument specifies a prefix that is combined with the query argument to form the final SQL query, which offers a way to run more complex queries. Read here for more information.
    • For S3 and Ext-API connection schedules

      • MaxTimeOut : Can be provided during creation to override the timeout setting of the connection for the specific schedule. It accepts values from 1 to 2880.
      • MaxCapacity : MaxCapacity is a parameter that defines the number of AWS Glue data processing units (DPUs) that can be allocated when the job runs. Allowed values are 1 and 0.0625 only.
      • FileConcurrency : This argument is unique to S3 connections; it determines the number of parallel file uploads that happen during S3 ingestion.
info

If the dataset is of 'reload' type, then the schedule execution will load the data and reload it automatically.
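
As an illustration, the arguments above are supplied to the schedule as key/value pairs. The sketch below is hypothetical: the keys and allowed ranges come from the lists above, while the sample values and the grouping into two dictionaries are assumptions.

```python
# Hypothetical sketch of Data Ingestion schedule arguments.
# Keys and allowed ranges come from the documentation above; the sample values
# are illustrative assumptions.

# For JDBC connection schedules
jdbc_ingestion_arguments = {
    "NumberOfWorkers": 10,   # Glue worker nodes; allowed range is 2-100
    "WorkerType": "G.1X",    # one of Standard, G.1X, G.2X
    # prepareQuery is a prefix that is combined with `query` to form the final SQL
    "prepareQuery": "WITH recent_customers AS (SELECT * FROM customers WHERE updated_at > '2024-01-01')",
    "query": "SELECT id, name, updated_at FROM recent_customers",
}

# For S3 and Ext-API connection schedules
s3_or_extapi_ingestion_arguments = {
    "MaxTimeOut": 120,       # overrides the connection timeout for this schedule (1-2880)
    "MaxCapacity": 1,        # AWS Glue DPUs; allowed values are 1 and 0.0625 only
    "FileConcurrency": 20,   # S3 connections only; number of parallel file uploads
}
```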

  • Health Image Data Conversion

    This type of schedule job is used to convert DICOM files in a dataset to NDJSON format so that they can be imported into a Healthlake store; the Healthlake store only supports NDJSON file formats when importing data. The input dataset for these jobs is a dataset that contains DICOM files. Users have to specify the output dataset ID in the arguments with the key outputDatasetId; its value should be the ID of a valid S3 'other' type dataset. The converted NDJSON files will be stored in the specified output dataset. An optional argument selectFiles with the value all selects all files in the input dataset during data conversion; the default value of this key is latest, which only selects the files in the dataset that were uploaded after the last job run. An illustrative sketch of these arguments follows.
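
For illustration, and assuming these arguments are supplied as simple key/value pairs, the two arguments described above could look like the following sketch (the dataset ID value is a placeholder).

```python
# Hypothetical sketch of Health Image Data Conversion schedule arguments.
# Key names come from the text above; the dataset ID value is a placeholder.
health_image_conversion_arguments = {
    "outputDatasetId": "<id-of-a-valid-s3-others-type-dataset>",  # required: output dataset for the converted NDJSON files
    "selectFiles": "all",  # optional; default is "latest", which converts only files uploaded after the last run
}
```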

Schedule details


Once you have created a schedule, you can view it on the schedules listing page, and perform various actions on it, such as running, disabling, enabling, editing, cloning, or deleting the schedule.

Run Schedule

Schedule run

To run a schedule, use the Run Schedule option located in the top right corner of the page. After running the schedule, you can review its status in the Execution Status tab, which indicates whether the job is currently running, or whether it has completed successfully or failed.

Schedule execution

info
  • Schedule execution will error out if the related S3 connection uses any of the Amorphic S3 buckets as a source, e.g. <projectshortname-region-accountid-env-dlz>.
  • For Data Ingestion Schedules, the following arguments can be provided during schedule runs (an illustrative sketch follows the list):
    • MaxTimeOut: This argument allows users to override the timeout setting of the connection for the specific run. It accepts values from 1 to 2880.
    • FileConcurrency: This argument enables users to configure the number of parallel file ingestions that occur for S3 connections. It accepts values from 1 to 100 and has a default value of 20.
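
As a hypothetical illustration (the values are assumptions), the run-time overrides described above could be supplied for a single S3 ingestion run as follows.

```python
# Hypothetical sketch of run-time arguments for a Data Ingestion schedule run.
# Keys and ranges come from the note above; the values are illustrative.
run_arguments = {
    "MaxTimeOut": 60,        # overrides the connection timeout for this run (1-2880)
    "FileConcurrency": 50,   # parallel file ingestions for S3 connections (1-100, default 20)
}
```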

Schedule use case

When the schedule execution is completed, an email notification will be sent out, based on the notification setting and schedule execution status. You can also view the execution logs of each schedule run, which includes Output Logs, Output Logs (Full), and Error Logs.

For example, if you need to create a schedule that runs an ETL job and sends out important emails every 4 hours, you can create a workflow with an ETL Job Node followed by a Mail Node. This workflow can then be scheduled to run every 4 hours, every day.
