Version: v2.7

Resource Sync

Resource Sync enables selected resources created from the AWS console to be synchronized with Amorphic Data Cloud. An automated process runs once per day to identify resources created from the AWS console and add their metadata to Amorphic. Amorphic relies on tags to identify these resources. The mandatory tags for each resource created from the AWS console are:

Tag Key | Value
Source | This field must have the value AWSConsole.
Owner | This field must define a valid Amorphic user name. Please ensure this user has the permissions to create the particular resource in Amorphic.
Note

As of version 2.0, the supported resources are Glue Tables (Datasets), Glue Jobs (ETL Jobs), Appflow Flows/Connector Profiles (Connections Apps), Redshift Tables (Datasets), Data streams (Streams), and Delivery streams (Consumers). Synchronization is asynchronous, so resources created from the AWS console can take up to 24 hours to appear in Amorphic.
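
As an illustration of the mandatory tags, the following minimal sketch checks a resource's tag dictionary before relying on the daily sync. The function name `sync_tag_problems` is hypothetical, and treating the Source value as an exact, case-sensitive match is an assumption; the text only states that the value must be AWSConsole.

```python
REQUIRED_SOURCE = "AWSConsole"  # value required by the Source tag


def sync_tag_problems(tags: dict) -> list:
    """Return human-readable problems with the mandatory Resource Sync tags."""
    problems = []
    # Assumption: the Source value is compared exactly as written.
    if tags.get("Source") != REQUIRED_SOURCE:
        problems.append("Source tag must have the value AWSConsole")
    # The Owner tag must be present and non-empty; whether the user exists
    # in Amorphic can only be verified by Amorphic itself.
    if not str(tags.get("Owner", "")).strip():
        problems.append("Owner tag must name an Amorphic user")
    return problems


print(sync_tag_problems({"Source": "AWSConsole", "Owner": "jdoe"}))
print(sync_tag_problems({"Source": "console"}))
```

An empty list means both mandatory tags are present. The check cannot confirm that the owner has the required permissions in Amorphic.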

Glue Tables

Currently, Amorphic supports syncing of S3Athena type datasets only. As Glue Tables do not support tags in AWS, Amorphic relies on tags provided in specific formats in the description of the Glue Table to identify the tables to be synced. For synchronization of Glue Table resources created from the AWS console, the following points have to be noted:

  1. Only tables created under databases that correspond to existing domains in Amorphic are synced.

  2. Every table to be synced must carry tags in its description in the format {source: awsconsole, owner:<valid_amorphic_user_id>}. The tags are case insensitive. Newlines between the tags are not supported, and no tags other than source and owner are supported.

  3. The provided user should have enough permissions in Amorphic to create a new dataset and also should have owner permissions to the provided domain.

  4. The S3 bucket provided should be the Amorphic DLZ bucket and the prefix must follow the format /<domain_name>/<dataset_name>/.

  5. The table should be partitioned with upload_date as the partition key. Users can add more partition keys if required, but upload_date should be the last partition key.

  6. The Table Update property of synced datasets will be Append. Other options, such as Data Profiling, Data Validation, and Malware Detection, will default to Disabled and cannot be edited later.

  7. Synced datasets can be edited or deleted from the Amorphic console. Additionally, Dataset Repair operation can also be performed on synced datasets.

  8. Edits in the Glue table after the table has been synced to Amorphic will be synced back as long as the mandatory tags are present.

  9. If a Glue Table is deleted from the AWS Console:

    • If the Glue Table was created from the AWS console as well, then its metadata will be removed from Amorphic as well.
    • If the Glue Table was created from Amorphic, then an email will be sent to all admins as this is not recommended.
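
The description tag format from point 2 can be sketched as a small parser. This is an illustration only, not Amorphic's actual parsing logic: the function name `parse_description_tags` is hypothetical, and allowing other text around the braces is an assumption.

```python
import re
from typing import Optional

# A single-line "{...}" block; newlines inside the braces are rejected.
_TAG_BLOCK = re.compile(r"\{([^{}\n]*)\}")


def parse_description_tags(description: str) -> Optional[dict]:
    """Return {'source': ..., 'owner': ...}, or None if the tags are invalid."""
    match = _TAG_BLOCK.search(description)
    if match is None:
        return None
    tags = {}
    for part in match.group(1).split(","):
        key, sep, value = part.partition(":")
        if not sep:
            return None
        tags[key.strip().lower()] = value.strip()  # keys are case insensitive
    # Only the source and owner tags are supported.
    if set(tags) != {"source", "owner"}:
        return None
    if tags["source"].lower() != "awsconsole" or not tags["owner"]:
        return None
    return tags


print(parse_description_tags("{source: awsconsole, owner:jdoe}"))
print(parse_description_tags("{source: awsconsole, owner:jdoe, team: x}"))
```

The second call returns None because it carries a tag other than source and owner.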

Admins will be notified via email about any errors that occur during Glue Table synchronization. This email will contain the Id, Type, Name and Owner of the resource as well as an error message. The possible causes could be:

  • Tagged Owner does not exist or does not have the roles with sufficient permissions
  • Tagged Owner does not have access to the domain under which the dataset is to be created.
  • Deletion of Amorphic-created Glue Tables from the AWS Console
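
Some sync failures can be avoided by checking the table layout before tagging it. The sketch below checks the S3 location and partition keys described in the points above. The function name and the `dlz_bucket` parameter are hypothetical, and it assumes that upload_date is the key that must come last.

```python
from urllib.parse import urlparse


def glue_table_layout_problems(location, dlz_bucket, partition_keys):
    """Return a list of layout problems for a Glue Table meant to be synced."""
    problems = []
    parsed = urlparse(location)
    if parsed.scheme != "s3" or parsed.netloc != dlz_bucket:
        problems.append("table location is not in the DLZ bucket")
    # Expect exactly two path segments: <domain_name>/<dataset_name>.
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) != 2:
        problems.append("prefix is not /<domain_name>/<dataset_name>/")
    # Assumption: upload_date must be the last partition key.
    if not partition_keys or partition_keys[-1] != "upload_date":
        problems.append("upload_date is not the last partition key")
    return problems


print(glue_table_layout_problems(
    "s3://my-dlz/sales/orders/", "my-dlz", ["region", "upload_date"]))
```

The bucket name my-dlz and the domain and dataset names are placeholders.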

Glue Jobs

Glue Jobs that are created from the AWS Console and synced to Amorphic can be executed, edited, and deleted from both Amorphic and the AWS Console. However, for jobs created within Amorphic, only job executions are supported from the AWS Console. To ensure successful synchronization of Glue Job resources created from the AWS Console, please take note of the following points:

  1. While creating a new Glue Job from AWS Console:

    • It must be tagged with the aforementioned tags.
    • Only jobs of type Spark and Python Shell are supported.
    • For the job to be edited from Amorphic, the attached role should follow the Amorphic naming convention ({PROJECT_SHORT_NAME}-custom-*)
    • Editing job script from Amorphic is allowed only for jobs with Script path in Amorphic ETL Bucket
    • Bookmark enabled/paused jobs can be run from Amorphic only if the Temporary path is defined in an Amorphic created bucket
  2. All the Job Runs triggered from AWS Console also get synchronized.

  3. If a Glue Job is deleted from the AWS Console:

    • If the Glue Job was created from the AWS console as well, then its metadata will be removed from Amorphic as well.
    • If the Glue Job was created from Amorphic, then an email will be sent to all admins as this is not recommended.
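
The job type and role naming conditions above can be sketched as a check against the job definition returned by the Glue `get_job` API. The function name `job_sync_report` is hypothetical. Mapping "Spark" to the `glueetl` command name and "Python Shell" to `pythonshell` is an assumption about how Amorphic classifies job types.

```python
from fnmatch import fnmatchcase

# Glue command names assumed to correspond to the supported job types.
SUPPORTED_COMMANDS = {"glueetl": "Spark", "pythonshell": "Python Shell"}


def job_sync_report(job: dict, project_short_name: str) -> dict:
    """Summarize whether a Glue job definition meets the sync conditions."""
    command = job.get("Command", {}).get("Name", "")
    # Role may be a bare role name or a full ARN; keep only the name.
    role_name = job.get("Role", "").rsplit("/", 1)[-1]
    return {
        "supported_type": command in SUPPORTED_COMMANDS,
        "editable_from_amorphic": fnmatchcase(
            role_name, f"{project_short_name}-custom-*"
        ),
    }


job = {
    "Command": {"Name": "glueetl"},
    "Role": "arn:aws:iam::123456789012:role/acme-custom-etl-role",
}
print(job_sync_report(job, "acme"))
```

The project short name acme and the role ARN are placeholders.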

Admins will be notified via email about any errors that occur during Glue Job synchronization. This email will contain Id, Type, Name and Owner of the resource as well as an error message. The possible causes could be:

  • Incorrect/Missing tags
  • Tagged Owner does not exist or does not have the roles with sufficient permissions
  • Deletion of Amorphic created Jobs from AWS Console

Appflow Flows/Connector Profiles

Appflow Flows that are synchronized with Amorphic can be executed, updated, or deleted from either Amorphic or the AWS Console. For the synchronization of Appflow resources created from the AWS console, various scenarios are possible, and the appropriate actions must be taken accordingly:

  1. When creating a new Appflow Flow, the following needs to be ensured:

    • It must be tagged with the aforementioned tags.
    • Currently, Amorphic only supports S3 as destination for Appflow Flows, so the destination must be selected as S3 in AWS Console as well.
    • Either the DLZ or LZ bucket can be selected as the S3 bucket, depending on whether the desired dataset has SkipLZProcess set to true or false.
    • The bucket prefix must follow the format: domain_name/dataset_name.
  2. If a new Appflow Connector Profile is also created, Amorphic will synchronize it, provided the associated flows are correctly tagged. However, modifying the profile from the AWS Console after it has been synced is not recommended, and any metadata changes will not be reflected in Amorphic.

  3. If a Flow/Connector Profile is deleted from the AWS Console:

    • If the Flow/Connector Profile was created from the AWS console as well, then its metadata will be removed from Amorphic as well.
    • If the Flow/Connector Profile was created from Amorphic, then an email will be sent to all admins as this is not recommended.
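
The destination requirements above can be sketched as a helper that reads the S3 destination from a flow description shaped like the AppFlow `describe_flow` response. The function name `appflow_s3_target` is hypothetical, and this is not Amorphic's own validation.

```python
def appflow_s3_target(flow: dict):
    """Return (bucket, domain_name, dataset_name) for an S3 destination, else None."""
    for dest in flow.get("destinationFlowConfigList", []):
        if dest.get("connectorType") != "S3":
            continue  # only S3 destinations are supported
        s3 = dest.get("destinationConnectorProperties", {}).get("S3", {})
        # The prefix must have exactly two parts: domain_name/dataset_name.
        parts = s3.get("bucketPrefix", "").strip("/").split("/")
        if len(parts) == 2 and all(parts):
            return s3.get("bucketName"), parts[0], parts[1]
    return None


flow = {
    "destinationFlowConfigList": [
        {
            "connectorType": "S3",
            "destinationConnectorProperties": {
                "S3": {"bucketName": "my-lz", "bucketPrefix": "sales/orders"}
            },
        }
    ]
}
print(appflow_s3_target(flow))
```

A return value of None means the flow has no S3 destination with a two-part prefix. The bucket and prefix values are placeholders.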

Admins will be notified via email about any errors that occur during Appflow Flows/Connector Profiles synchronization. This email will contain Id, Name and Owner of the resource as well as an error message. The possible causes could be:

  1. Incorrectly tagged flows. For example, tagging a flow with an Owner who has no access to create Connection Apps resources in Amorphic.
  2. Specifying an incorrect dataset as the destination for the flow. The S3 bucket and prefix should point towards a dataset that the user has access to.

Apart from this, as mentioned above, if an Amorphic flow/connector profile is deleted from the AWS console, that will also trigger an email alert to all admins.
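
The deletion rule described in this section can be sketched as a small decision function. The names `Origin` and `action_on_aws_delete` are hypothetical, and the returned strings are labels for illustration rather than actual Amorphic operations.

```python
from enum import Enum


class Origin(Enum):
    AWS_CONSOLE = "aws_console"
    AMORPHIC = "amorphic"


def action_on_aws_delete(origin: Origin) -> str:
    """Describe what happens when a synced resource is deleted from the AWS Console."""
    if origin is Origin.AWS_CONSOLE:
        # Resource was created outside Amorphic: its metadata is removed.
        return "remove_metadata"
    # Resource was created in Amorphic: admins are alerted by email.
    return "email_admins"


print(action_on_aws_delete(Origin.AWS_CONSOLE))
print(action_on_aws_delete(Origin.AMORPHIC))
```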

Redshift Tables

Amorphic relies on tags in specific formats within the description of Redshift Tables to identify which tables should be synchronized. To ensure the successful synchronization of Redshift Table resources created from the AWS console, please take note of the following points:

  1. Only tables created under databases and schemas that correspond to existing tenants and domains in Amorphic are synced.

  2. Every table to be synced should carry tags in its description in the format {source: awsconsole, owner:<valid_amorphic_user_id>}. The tags are case insensitive. Newlines between the tags are not supported, and no tags other than source and owner are supported.

  3. The provided user should have enough permissions in Amorphic to create a new dataset and also should have owner permissions to the provided domain.

  4. The Table Update property of synced datasets will be Append, File Type will be csv, and File Delimiter will be a comma (,). Other options, such as Data Profiling, Data Validation, and Malware Detection, will default to Disabled.

  5. Synced datasets can be edited and deleted from the Amorphic console. Additionally, Dataset Repair operation can also be performed on synced datasets.

  6. Any edits in the Redshift table after the table has been synced to Amorphic will not be synced.

  7. If a Redshift Table is deleted from the AWS Console:

    • If the Redshift Table was created from the AWS console, then its metadata will be removed from Amorphic as well.
    • If the Redshift Table was created from Amorphic, then an email will be sent to all admins as this is not recommended.
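
One way to attach the description tags from point 2 is the Redshift `COMMENT ON TABLE` statement. The sketch below builds that statement as a string. It assumes that the table description Amorphic reads is the one set through COMMENT, which the text does not state, and the function names are hypothetical.

```python
def _quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def redshift_sync_comment(schema: str, table: str, owner: str) -> str:
    """Build a COMMENT statement that carries the source and owner tags."""
    if "\n" in owner:
        raise ValueError("newlines are not supported inside the tags")
    description = "{source: awsconsole, owner:" + owner + "}"
    # Escape single quotes for use inside a SQL string literal.
    literal = "'" + description.replace("'", "''") + "'"
    return f"COMMENT ON TABLE {_quote_ident(schema)}.{_quote_ident(table)} IS {literal};"


print(redshift_sync_comment("sales", "orders", "jdoe"))
```

The schema, table, and owner values are placeholders. The owner must be a valid Amorphic user id.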

Admins will be notified via email about any errors that occur during Redshift Table synchronization. This email will contain the Id, Type, Name and Owner of the resource as well as an error message. The possible causes could be:

  • Tagged Owner does not exist or does not have the roles with sufficient permissions
  • Tagged Owner does not have access to the domain under which the dataset is to be created.

Streams

Kinesis Data streams that are created, updated, or deleted from the AWS console can be synced to Amorphic. For the successful synchronization of Data streams from AWS console, the following points have to be noted:

  1. While creating a new Data stream from the AWS Console:

    • It must be tagged with the aforementioned tags.
    • The provided user (tagged as "Owner") should have enough permissions in Amorphic to create new streams.
  2. Configuration details

    • Amorphic does not create IAM credentials for Data streams created from the AWS Console; the credentials will show as N/A in Amorphic.
  3. Bi-directional edits are supported for Data streams created from the AWS console.

    • Data streams can be edited from both Amorphic and the AWS console. Edits made to a Data stream after it has been synced to Amorphic will be synced back as long as the mandatory tags are present. Important points to remember:
      • Edits made from the AWS Console are synchronized only for Data streams that were also created from the AWS Console.
      • If a user modifies an Amorphic-created Data stream from the AWS Console, those changes will not be reflected in Amorphic. This is not recommended.
  4. If a Data stream is deleted from the AWS Console:

    • If it was created from the AWS console, then its metadata will be removed from Amorphic as well.
    • If it was created from Amorphic, then an email will be sent to all admins, as this is not recommended. The alert is repeated until the user manually deletes that Stream from Amorphic.
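
The mandatory tags can also be attached to a Data stream programmatically rather than through the console. The sketch below uses the Kinesis `add_tags_to_stream` call and takes the client as a parameter so that it can be tried without AWS access. The function name `tag_stream_for_sync` and the stream and owner values are hypothetical.

```python
def tag_stream_for_sync(kinesis_client, stream_name: str, owner: str) -> dict:
    """Attach the mandatory Resource Sync tags to a Kinesis Data stream."""
    tags = {"Source": "AWSConsole", "Owner": owner}
    # kinesis_client is expected to behave like boto3.client("kinesis").
    kinesis_client.add_tags_to_stream(StreamName=stream_name, Tags=tags)
    return tags


# With real AWS credentials, usage would look like:
#   import boto3
#   tag_stream_for_sync(boto3.client("kinesis"), "orders-stream", "jdoe")
```

The tagged stream should then be picked up by the next daily sync, provided the owner has permission to create streams in Amorphic.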

Admins will be notified via email about any errors that occur during Data stream synchronization. This email will contain Id, Name and Owner of the resource, as well as an error message. The possible causes could be:

  1. Incorrectly tagged Stream. For example, tagging a Data stream with an Owner who has no access to create Stream resources or does not exist in Amorphic.

If the cause of any synchronization failure is something that you cannot resolve, please do reach out to Amorphic's technical support team for assistance. Additionally, as previously mentioned, if an Amorphic Stream is deleted from the AWS console, this action will trigger an email alert regarding the inconsistency, which will be sent to all administrators.

Consumers

Kinesis Delivery streams (Consumers) that are created, updated, or deleted from the AWS Console can be synced to Amorphic.

Data streams and Delivery streams are distinct entities in AWS, but in Amorphic, they function as an almost integrated resource. In Amorphic, AWS Delivery streams are referred to as Consumers, and they are displayed within their associated Streams. It's important to note that if a stream does not exist in Amorphic, all associated consumers will also be absent. For successful synchronization of Delivery stream resources created from the AWS console, the following points have to be noted:

  1. While creating a new Delivery stream from the AWS Console:

    • It must be tagged with the aforementioned tags.
    • The provided user (tagged as "Owner") should have enough permissions in Amorphic to create a new Consumer and also should have owner permissions on its associated Stream.
    • If its associated Data stream was also created from the AWS Console, that stream must also be properly tagged and synced to Amorphic.
  2. Configuration details

    • Any Delivery stream created with the proper tags will be reflected in Amorphic. However, Amorphic only supports S3 and Redshift as target locations, so the configuration details of synchronized Delivery streams with any other target location will show as N/A.
    • If streamed data needs to be visible in an Amorphic dataset, the destination S3 bucket must be the Amorphic LZ or DLZ bucket. In addition:
      • For the LZ bucket, the prefix must follow the format: domain_name/dataset_name/upload_date=epoch/user_id/file_type/random-string=!{timestamp:yyyy-MM-dd-HHmm}/
      • For the DLZ bucket, the prefix must follow the format: domain_name/dataset_name/upload_date=epoch/user_id/filename_with_extension/random-string=!{timestamp:yyyy-MM-dd-HHmm}/
      • The error output prefix should follow a format like: domain_name/dataset_name/upload_date=epoch/user_id/file_type/error-type=!{firehose:error-output-type}/
    • If the dataset name is not provided, or the provided name cannot be fetched from Amorphic, the dataset will also show as N/A in Amorphic.
  3. Bi-directional edits are supported for Delivery streams created from the AWS console.

    • Delivery streams can be edited from both Amorphic and the AWS console. Edits made to a Delivery stream after it has been synced to Amorphic will be synced back as long as the mandatory tags are present. Important points to remember:
      • Edits made from the AWS Console are synchronized only for Delivery streams that were also created from the AWS Console.
      • If a user modifies an Amorphic-created Delivery stream from the AWS Console, those changes will not be reflected in Amorphic. This is not recommended.
    • If the Delivery stream was created from the AWS Console:
      • its associated dataset cannot be edited from Amorphic.
      • it can be edited from Amorphic only if its target location is S3.
  4. If a Delivery stream is deleted from the AWS Console:

    • If it was created from the AWS console, then its metadata will be removed from Amorphic as well.
      • Users must create or select an IAM role when creating a Delivery stream from the AWS Console. Amorphic does not delete this role; the user has to delete it manually from IAM.
    • If it was created from Amorphic, then an email will be sent to all admins, as this is not recommended. The alert is repeated until the user manually deletes that Consumer from Amorphic.
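
The S3 prefix formats listed under the configuration details can be sketched as string builders. The function names are hypothetical. The text does not say how random-string is chosen, so it is treated here as an arbitrary label supplied by the caller. The `!{timestamp:...}` and `!{firehose:error-output-type}` expressions are left literal because Firehose evaluates them at delivery time.

```python
TIMESTAMP_EXPR = "!{timestamp:yyyy-MM-dd-HHmm}"
ERROR_TYPE_EXPR = "!{firehose:error-output-type}"


def _base(domain, dataset, epoch, user_id):
    return f"{domain}/{dataset}/upload_date={epoch}/{user_id}"


def lz_prefix(domain, dataset, epoch, user_id, file_type, label):
    """Prefix for the LZ bucket; label stands in for 'random-string'."""
    return f"{_base(domain, dataset, epoch, user_id)}/{file_type}/{label}={TIMESTAMP_EXPR}/"


def dlz_prefix(domain, dataset, epoch, user_id, filename, label):
    """Prefix for the DLZ bucket; filename includes its extension."""
    return f"{_base(domain, dataset, epoch, user_id)}/{filename}/{label}={TIMESTAMP_EXPR}/"


def error_prefix(domain, dataset, epoch, user_id, file_type):
    """Error output prefix using the Firehose error-output-type expression."""
    return f"{_base(domain, dataset, epoch, user_id)}/{file_type}/error-type={ERROR_TYPE_EXPR}/"


print(lz_prefix("sales", "orders", 1700000000, "jdoe", "csv", "part"))
print(error_prefix("sales", "orders", 1700000000, "jdoe", "csv"))
```

All argument values shown are placeholders.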

Admins will be notified via email about any errors that occur during Delivery stream synchronization. This email will contain Id, Name and Owner of the resource, as well as an error message. The possible causes could be:

  1. Incorrectly tagged Consumer. For example, tagging a Delivery stream with an Owner who has no access to create Consumer resources or does not exist in Amorphic.
  2. The user does not have sufficient permissions to access the Amorphic dataset that was specified in the destination configuration when the Consumer was created. The user must have owner permissions on the dataset.
  3. The associated stream of this Consumer does not yet exist in Amorphic due to incorrect tags or some other reason.
  4. The associated stream for this Consumer does not exist in AWS at all. In that case, delete this Delivery stream from the AWS Console.

If you encounter a synchronization failure due to an issue you cannot resolve, please don't hesitate to reach out to Amorphic's technical support team for assistance. Furthermore, as previously mentioned, if an Amorphic Consumer is deleted from the AWS console, this action will trigger an email alert regarding the inconsistency, which will be sent to all administrators.

Reference

Component / Sub-Component | Identifier: Create | Identifier: Update | Identifier: Delete | Example | Comments
Glue Jobs | Tags | Tags & Timestamp | Tags | Attach tags as per the above table | The job has to be tagged properly. Edits are detected by checking the LastModifiedOn parameter from its boto3 call.
Job Executions | N/A | N/A | N/A | N/A | Dependent on the Glue Job. For each Glue Job, Amorphic compares its job runs with the AWS Console job runs and then syncs.
Appflow (Connector profile) | N/A | Not supported | N/A | N/A | Tags are not required; it is dependent on Flows. Edits from the AWS Console after syncing will not be synced.
Flows | Tags | Tags & Timestamp | Tags | Attach tags as per the above table | Edits are detected by checking the lastUpdatedAt parameter from its boto3 call.
Datasets: Glue table (S3Athena) | Tags | Tags & Timestamp | Tags | {source: awsconsole, owner:<valid_amorphic_user_id>} | Tags should be provided in specific formats in the description. Edits are detected by checking the LastModifiedOn parameter from its boto3 call.
Datasets: Redshift | Tags | Not supported | Tags | {source: awsconsole, owner:<valid_amorphic_user_id>} | Tags should be provided in specific formats in the description. Edits from the AWS Console after syncing will not be synced.
Streams | Tags | Tags only | Tags | Attach tags as per the above table | A stream has no update timestamp parameter in its description, so its metadata is synced every time.
Consumer (DeliveryStream) | Tags | Tags & Timestamp | Tags | Attach tags as per the above table | Edits are detected by checking the UpdateTimeStamp parameter from its boto3 call.
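
The update rules in the reference table can be sketched as a single decision function. This illustrates the table and is not Amorphic's implementation. The resource kind names and the function `needs_resync` are hypothetical, and comparing the resource timestamp against the time of the last sync is an assumption; the table only says that a timestamp parameter is checked.

```python
from datetime import datetime, timezone
from typing import Optional

TIMESTAMP_TRACKED = {"glue_job", "flow", "glue_table", "consumer"}
ALWAYS_RESYNC = {"stream"}  # no update timestamp is available
NEVER_RESYNC = {"connector_profile", "redshift_table"}  # edits not supported


def needs_resync(kind: str, last_modified: Optional[datetime],
                 last_synced: datetime) -> bool:
    """Decide whether a tagged resource's metadata should be synced again."""
    if kind in ALWAYS_RESYNC:
        return True
    if kind in NEVER_RESYNC:
        return False
    if kind in TIMESTAMP_TRACKED:
        # Assumption: resync when the resource changed after the last sync.
        return last_modified is not None and last_modified > last_synced
    raise ValueError(f"unknown resource kind: {kind}")


synced = datetime(2024, 1, 1, tzinfo=timezone.utc)
edited = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(needs_resync("glue_job", edited, synced))
print(needs_resync("redshift_table", edited, synced))
```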