Version: v2.4
ML Models
Amorphic's ML Models portal is a tool that helps you create and import machine learning models using Amazon Web Services (AWS) SageMaker. The resulting machine learning model object can be used to make predictions on datasets.
How to Create an ML Model?
To create ML Models:
- Click on + New ML Model.
- Fill in the details shown in the table below:
Attribute | Description |
---|---|
Model Name | This is the model name in the Amorphic portal. |
Description | Describe the model's purpose and important details. |
Model Resource | There are three ways to integrate a SageMaker model with the Amorphic portal: Existing Model Resource, Artifact Location, or Select file. |
Existing Model Resource | To import a model from the AWS SageMaker Marketplace into the Amorphic portal, submit a request to the administrator. The administrator will raise a support ticket for the AWS Marketplace model via support@amorphicdata.com, and the Amorphic team will then make the model available for selection. |
Artifact Location | Models created using Notebooks are saved to an S3 location; use this option to load a model file directly from that location. Refer to the Notebooks section for the respective bucket details. |
Select file | By selecting this option, you can upload a SageMaker model tar file directly into the Amorphic portal. Any tar or tar.gz file can be uploaded; a packaging sketch follows this table. |
Output Type | You have two options: Dataset Data or Metadata. Select Dataset Data when you need to run a model on a dataset file. Select Metadata when you want to view AI/ML Results, such as metadata on dataset files (which will be explained later). Most of the time, you should use Dataset Data. |
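For the Artifact Location and Select file options, the model must be packaged as a SageMaker-style tar.gz archive. The snippet below is only a minimal sketch of packaging a locally trained model and uploading it to an S3 artifact location; the bucket name, key, and local paths are placeholders, not Amorphic-defined values.

```python
# A minimal sketch, assuming a trained model saved locally under ./model/.
# The bucket name and key below are placeholders, not Amorphic-defined values.
import tarfile

import boto3

ARTIFACT_BUCKET = "example-notebook-bucket"    # hypothetical bucket
ARTIFACT_KEY = "models/my-model/model.tar.gz"  # hypothetical S3 key

# Package the model directory into the tar.gz layout SageMaker expects.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model/", arcname=".")

# Upload the archive to the S3 location used as the Artifact Location.
boto3.client("s3").upload_file("model.tar.gz", ARTIFACT_BUCKET, ARTIFACT_KEY)
```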
Dataset Data requires two additional inputs: Input Schema and Output Schema.
Attribute | Description |
---|---|
Input Schema | Identifies the schema of the dataset on which the pre-processing ETL job or the model will be run. |
Output Schema | Identifies the schema of the dataset where the post-processing job or model output will be saved. |
Both schemas should follow the format below, matching the respective Datasets:
```json
[
  {
    "type": "Date",
    "name": "CheckoutDate",
    "Description": "description"
  },
  {
    "type": "String",
    "name": "MajorProdDesc",
    "Description": "description"
  },
  {
    "type": "Double",
    "name": "counts",
    "Description": "description"
  }
]
```
:::info Note
You can import the schema of an Amorphic dataset using the "Import from Dataset" functionality
:::
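As an illustration only, the following sketch derives a schema in the format shown above from a local CSV file using pandas; the file name and the dtype-to-type mapping are assumptions, not part of the Amorphic schema specification.

```python
# A minimal sketch, assuming the dataset file is a local CSV named "sales.csv".
# The dtype-to-type mapping is an assumption based on the format shown above.
import json

import pandas as pd

TYPE_MAP = {
    "object": "String",
    "float64": "Double",
    "int64": "Double",
    "datetime64[ns]": "Date",
}

df = pd.read_csv("sales.csv")  # hypothetical input file

schema = [
    {
        "type": TYPE_MAP.get(str(dtype), "String"),
        "name": column,
        "Description": "description",
    }
    for column, dtype in df.dtypes.items()
]

print(json.dumps(schema, indent=2))
```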
Attribute | Description |
---|---|
Algorithm Used | The platform currently supports all major AWS SageMaker models. |
Supported file formats | Select the appropriate file type for predictions. If you need a file format other than the available options, select "Others", which defaults to requiring no specific file type for batch predictions. Note: if a model is created with the "Others" file type, it can only be run on a dataset of the "Others" file type. |
Preprocess Glue Job | Select the pre-processing ETL jobs created using Amorphic ETL functionality; a minimal job sketch follows this table. |
Postprocess Glue Job | Select the post-processing ETL jobs created using Amorphic ETL functionality. |
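The Preprocess and Postprocess Glue Jobs themselves are authored through Amorphic's ETL functionality. The skeleton below is only a minimal sketch of what a pre-processing Glue (PySpark) job might look like; the S3 paths and the drop-missing-rows transformation are illustrative placeholders.

```python
# A minimal sketch of a pre-processing Glue (PySpark) job.
# The S3 paths and the dropna() step are illustrative placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

spark = glue_context.spark_session

# Read the raw dataset file, drop rows with missing values, and write the
# cleaned data to the location the model will read from (hypothetical paths).
raw = spark.read.csv("s3://example-input-bucket/raw/", header=True)
raw.dropna().write.mode("overwrite").csv(
    "s3://example-staging-bucket/clean/", header=True
)

job.commit()
```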
Apply Amorphic Model
Once an Amorphic model object is created, you can run the model on a Dataset file in the Amorphic portal by following these steps (a conceptual sketch of the batch prediction step follows the list):
- Select a Dataset in the Amorphic portal.
- Go to the Files tab and select the file on which you want to run the model.
- Click on the top right options for the file.
- Click on Apply ML.
- Select the ML model from the model dropdown. All Amorphic model objects that match the corresponding input schema of the Dataset will be available for selection.
- Select the required instance types. Note: certain AWS Marketplace subscribed models run on specific instance family types.
- Select the Target Dataset. The Datasets matching the output schema of the Amorphic model object will be available for selection.
- Click on “Submit”.
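Conceptually, applying a model to a dataset file resembles running a SageMaker batch transform job with the selected model, instance type, and target location. The sketch below uses plain boto3 with placeholder names; it is not the Amorphic API, only an illustration of the kind of job that performs the prediction.

```python
# A minimal sketch of a SageMaker batch transform job; the job name, model
# name, S3 URIs, and instance type are hypothetical placeholders.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_transform_job(
    TransformJobName="example-apply-ml-run",
    ModelName="example-sagemaker-model",
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-dataset-bucket/file.csv",
            }
        },
        "ContentType": "text/csv",
    },
    TransformOutput={"S3OutputPath": "s3://example-target-dataset-bucket/output/"},
    TransformResources={"InstanceType": "ml.m5.large", "InstanceCount": 1},
)
```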
How does the ML Pipeline work in Amorphic?
The figure below shows what a typical ML pipeline on the Amorphic platform looks like. During Amorphic model object creation, the pre-processing and post-processing ETL job functionality provides a drag-and-drop way to build ETL workflows for smooth user access.
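As a rough illustration of the ordering only, the sketch below chains a pre-processing Glue job, the model run, and a post-processing Glue job; the job names are placeholders and error handling is omitted.

```python
# A rough sketch of the pipeline ordering; job names are placeholders and
# error handling is omitted.
import time

import boto3

glue = boto3.client("glue")


def wait_for_glue_job(job_name: str, run_id: str) -> None:
    """Poll a Glue job run until it finishes running."""
    while glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"] in (
        "STARTING",
        "RUNNING",
        "STOPPING",
    ):
        time.sleep(30)


# 1. Pre-processing ETL job prepares the dataset file for the model.
run = glue.start_job_run(JobName="example-preprocess-job")
wait_for_glue_job("example-preprocess-job", run["JobRunId"])

# 2. The model runs on the prepared file (for example, the batch transform
#    sketched in the previous section).

# 3. Post-processing ETL job shapes the model output into the target dataset.
run = glue.start_job_run(JobName="example-postprocess-job")
wait_for_glue_job("example-postprocess-job", run["JobRunId"])
```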