External API Connections
From version 2.2, encryption (in-flight and at-rest) is enabled for all jobs and the catalog. All existing jobs (both user-created and system-created) were updated with encryption-related settings, and all newly created jobs have encryption enabled automatically.
External API connections are used to import data from APIs into an Amorphic Dataset. Only API authentication of type BASIC is supported as of now.
Below are the ways to create an External API connection:
BASIC
To create an External API connection, the user has to enter the API Endpoint, HTTP Method, and Query String Parameters. The image below shows how to create an External API connection.
Attribute | Description |
---|---|
Connection Name | Name of the connection in Amorphic |
Connection Type | Type of connection; in this case it is ExternalAPI |
Description | Connection-related information the user wants to store |
Authorized Users | Amorphic users who should have access to this connection |
API Endpoint | Endpoint URL from which data needs to be extracted |
API Authentication | As of version 1.1.3, only BASIC is supported |
Method | HTTP method; as of version 1.1.3, only GET and POST are allowed |
Query Params | Query string parameters which the API URL takes as input |
Version | Enables the user to select which version of the ingestion scripts to use (Amorphic-specific). For any new feature/Glue version added to the underlying ingestion script, a new version is added to Amorphic. |
Additionally, the timeout for the ingestion process can be set during connection creation by adding a key IngestionTimeout to ConnectionDetails in the input payload. The value is expected in minutes and must be between 1 and 2880. If no value is provided, the default of 480 (8 hours) is used. Please note that this feature is available exclusively via the API.
```json
{
  "ConnectionDetails": {
    "url": "https://example.com/datafile.csv",
    "auth_mechanism": "basic",
    "query_parameters": {},
    "method": "GET",
    "IngestionTimeout": 222
  }
}
```
This timeout can be overridden during schedule creation and schedule runs by providing the argument MaxTimeOut.
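As a sketch, the payload above could be assembled and validated before calling the connection-creation API. The `build_connection_payload` helper is illustrative only (it is not part of Amorphic); the 1–2880 minute range comes from the documentation.

```python
def build_connection_payload(url, timeout_minutes=None):
    """Return an External API connection payload with an optional IngestionTimeout.

    Illustrative helper, not an Amorphic API. The timeout range (1-2880
    minutes) matches the documented constraint; omitting it means the
    service default of 480 minutes (8 hours) applies.
    """
    details = {
        "url": url,
        "auth_mechanism": "basic",
        "query_parameters": {},
        "method": "GET",
    }
    if timeout_minutes is not None:
        if not 1 <= timeout_minutes <= 2880:
            raise ValueError("IngestionTimeout must be between 1 and 2880 minutes")
        details["IngestionTimeout"] = timeout_minutes
    return {"ConnectionDetails": details}

# Payload matching the JSON example above
payload = build_connection_payload("https://example.com/datafile.csv", 222)
```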
External API details
On the details page, the Estimated Cost of the Connection is also displayed to show the approximate cost incurred since its creation.
Edit
There is an option to edit an External API connection. To edit it, click the edit button in the top right corner.
The Description and Authorized Users of an External API connection can be changed.
Upgrade
Users have the option to upgrade a connection. The upgrade option is displayed only when a new version is available; otherwise, it is not shown.
A connection upgrade updates the underlying Glue version and the data ingestion script with new features.
Downgrade
Users can downgrade a connection to a previous version if they find that an upgrade isn't meeting their requirements. Note that a connection can only be downgraded if it has previously been upgraded; newly created connections cannot be downgraded. If a connection is eligible for downgrading, the downgrade option appears in the top right corner.
Deletion
In the upper right corner, there is a button with a trash can icon. Click it to delete the connection.
Connection Versions
1.1
This version of External API connections added an auto-reload feature for datasets of type reload.
From this version onwards, the data reload process triggers automatically as soon as a file upload through an External API connection finishes, so users no longer need to manually trigger the reload after the upload completes.
1.2
In this version, we made code changes in the underlying Glue script to support dataset custom partitioning.
From this version onwards, data will be loaded into the S3 LZ with a prefix containing the partition keys (if any were specified) for targets that support dataset partitioning.
E.g., for the partition keys `KeyA` and `KeyB` with the values `ValueA` and `ValueB` respectively, the S3 prefix will be in the format `Domain/DatasetName/KeyA=ValueA/KeyB=ValueB/upload_date=Unix_Timestamp/UserName/FileType/`.
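The prefix layout above can be sketched as a small helper. The function and argument names are illustrative, not part of Amorphic; only the segment order follows the documented format.

```python
def build_partitioned_prefix(domain, dataset, partition_keys, upload_ts, user, file_type):
    """Build an S3 LZ prefix in the documented format.

    partition_keys is an ordered mapping of partition key -> value,
    e.g. {"KeyA": "ValueA", "KeyB": "ValueB"}. Illustrative sketch only.
    """
    parts = [domain, dataset]
    # One "Key=Value" segment per partition key, in order
    parts += [f"{key}={value}" for key, value in partition_keys.items()]
    parts += [f"upload_date={upload_ts}", user, file_type]
    return "/".join(parts) + "/"
```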
To learn more about custom data partitioning, read the dataset custom partitioning docs here.
1.3
In this version of External API connections, we added support for the Skip LZ feature.
This feature enables users to upload data directly to the data lake zone, skipping data validation. Please refer to the Skip LZ docs for more details.
1.4
No major changes were made to the underlying Glue script or design, but logging has been enhanced.
1.5
The update in this version is specifically to ensure FIPS compliance, with no changes made to the script.