Version: v1.14

Resource Sync

Resource Sync enables select resources created from the AWS console to be synchronized with Amorphic Data Cloud. An automated process runs once per day to identify resources created from the AWS console and add their metadata to Amorphic. Amorphic relies on tags to identify these resources, so every resource created from the AWS console must carry the following mandatory tags:

Tag Key | Value
Source  | This field must have the value AWSConsole.
Owner   | This field must define a valid Amorphic user name. Please ensure this user has permissions to create the particular resource in Amorphic.
Note

As of version 1.14, the supported resources are Glue Tables (Datasets), Glue Jobs (ETL Jobs), and Appflow Flows/Connector Profiles (Connections Apps). This is an asynchronous process, so resources created from the AWS console can take up to 24 hours to appear in Amorphic.
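A minimal sketch of a pre-check for the two mandatory tags is shown below. The function name `missing_sync_tags` is hypothetical and not part of Amorphic. It assumes the tag keys and the `AWSConsole` value are matched exactly as written in the table, and it cannot verify that the Owner is a real Amorphic user with the right permissions.

```python
from __future__ import annotations

REQUIRED_SOURCE = "AWSConsole"  # value required by the Source tag


def missing_sync_tags(tags: dict[str, str]) -> list[str]:
    """Return a list of problems with the mandatory Resource Sync tags.

    `tags` is a plain key/value mapping of the resource's AWS tags.
    Exact-case matching is an assumption made for this sketch.
    """
    problems = []
    if tags.get("Source") != REQUIRED_SOURCE:
        problems.append("Source must have the value AWSConsole")
    if not tags.get("Owner", "").strip():
        problems.append("Owner must define a valid Amorphic user name")
    return problems


if __name__ == "__main__":
    print(missing_sync_tags({"Source": "AWSConsole", "Owner": "jdoe"}))
    print(missing_sync_tags({"Source": "Console"}))
```

An empty list means both tags are present in the expected shape; the daily sync still decides whether the Owner is acceptable.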

Glue Tables

Currently, Amorphic supports syncing of S3Athena type datasets only. Because Glue Tables do not support tags in AWS, Amorphic relies on tags written in a specific format in the description of the Glue Table to identify the tables to be synced. For synchronization of Glue Table resources created from the AWS console, note the following points:

  1. Only tables created under databases that correspond to existing domains in Amorphic are synced.

  2. Every table to be synced must include tags in its description in the format {source: awsconsole, owner:<valid_amorphic_user_id>}. The tags are case insensitive. Newlines between the tags are not supported, and no tags other than source and owner are supported.

  3. The specified user must have sufficient permissions in Amorphic to create a new dataset and must also have owner permissions on the specified domain.

  4. The S3 bucket provided should be the Amorphic DLZ bucket and the prefix must follow the format /<domain_name>/<dataset_name>/.

  5. The table should be partitioned with upload_date as the partition key. Users can add more partition keys if required, but upload_date should be the last partition key.

  6. The Table Update property of synced datasets is set to Append. Other options such as Data Profiling, Data Validation, and Malware Detection default to Disabled. These options cannot be edited later.

  7. Editing synced datasets is not currently supported; however, they can be deleted from the Amorphic console. The Dataset Repair operation can also be performed on synced datasets.

  8. Any edits made to the Glue Table after it has been synced to Amorphic will not be synced.

  9. If a Glue Table is deleted from the AWS Console:

    • If the Glue Table was also created from the AWS console, its metadata will be removed from Amorphic.
    • If the Glue Table was created from Amorphic, an email will be sent to all admins, because deleting it from the AWS Console is not recommended.
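The description-tag format, the S3 location rule, and the partition-key rule from the list above can be checked before the daily sync runs. The sketch below is illustrative only: the function names and the example bucket name are hypothetical, the regular expression assumes the tags appear in the documented order (source, then owner), and the partition check assumes `upload_date` is the required last key.

```python
from __future__ import annotations

import re

# Matches {source: awsconsole, owner:<user_id>} on a single line, ignoring case.
# Only spaces and tabs are allowed around the separators, because newlines
# between the tags are not supported.
TAG_PATTERN = re.compile(
    r"\{[ \t]*source[ \t]*:[ \t]*awsconsole[ \t]*,"
    r"[ \t]*owner[ \t]*:[ \t]*([^\s,{}]+)[ \t]*\}",
    re.IGNORECASE,
)


def parse_owner(description: str | None) -> str | None:
    """Return the owner id from a Glue Table description, or None if untagged."""
    match = TAG_PATTERN.search(description or "")
    return match.group(1) if match else None


def location_matches(location: str, dlz_bucket: str, domain: str, dataset: str) -> bool:
    """Check that the table location is <dlz_bucket>/<domain_name>/<dataset_name>/."""
    expected = f"s3://{dlz_bucket}/{domain}/{dataset}/"
    return location.rstrip("/") + "/" == expected


def partition_keys_ok(keys: list[str]) -> bool:
    """Check that upload_date is present and is the last partition key."""
    return bool(keys) and keys[-1] == "upload_date"


if __name__ == "__main__":
    print(parse_owner("Daily orders {source: awsconsole, owner:jdoe}"))
    print(location_matches("s3://my-dlz/sales/orders/", "my-dlz", "sales", "orders"))
    print(partition_keys_ok(["region", "upload_date"]))
```

These checks only cover the shape of the metadata; whether the owner exists and holds the required permissions is still decided by Amorphic during the sync.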

Admins will be notified via email about any errors that occur during Glue Table synchronization. This email will contain the Id, Type, Name and Owner of the resource as well as an error message. Possible causes include:

  • Tagged Owner does not exist or does not have roles with sufficient permissions
  • Tagged Owner does not have access to the domain under which the dataset is to be created
  • Deletion of Amorphic-created Glue Tables from AWS Console

Glue Jobs

Glue Jobs that are synced to Amorphic can be executed and deleted from Amorphic. However, updates of any kind to these jobs (Edit Job Details, Edit Script, Update External Libs, Update Extra Resource Access) are not supported. Updating Glue Jobs from the AWS Console after they have been synced is also not recommended. For successful synchronization of Glue Job resources created from the AWS console, note the following points:

  1. While creating a new Glue Job from AWS Console:

    • It must carry the mandatory Source and Owner tags.
    • Only Amorphic-created S3 buckets should be used for Script path and Temporary path.
    • Libraries and Job Parameters are not supported.
  2. All Job Runs triggered from the AWS Console are also synchronized.

  3. If a Glue Job is deleted from the AWS Console:

    • If the Glue Job was also created from the AWS console, its metadata will be removed from Amorphic.
    • If the Glue Job was created from Amorphic, an email will be sent to all admins, because deleting it from the AWS Console is not recommended.
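The creation-time requirements above can be sketched as a check over a job definition. This is a rough illustration, not Amorphic's own validation: `job_sync_problems` and the bucket names are hypothetical, the `job` dictionary is assumed to follow the shape of the `Job` object returned by the AWS Glue `GetJob` API, and only the library arguments listed in `LIBRARY_ARGS` are inspected. Custom Job Parameters are not checked, because they cannot be reliably told apart from Glue's built-in arguments here.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Glue special parameters that attach extra libraries to a job.
LIBRARY_ARGS = ("--extra-py-files", "--extra-jars", "--additional-python-modules")


def job_sync_problems(job: dict, tags: dict, amorphic_buckets: set[str]) -> list[str]:
    """Return reasons a console-created Glue Job may fail to sync."""
    problems = []
    if tags.get("Source") != "AWSConsole" or not tags.get("Owner"):
        problems.append("missing or incorrect Source/Owner tags")

    args = job.get("DefaultArguments", {})
    paths = {
        "Script path": job.get("Command", {}).get("ScriptLocation", ""),
        "Temporary path": args.get("--TempDir", ""),
    }
    for label, path in paths.items():
        # For s3://bucket/key URLs, the bucket name is the netloc component.
        if urlparse(path).netloc not in amorphic_buckets:
            problems.append(f"{label} is not in an Amorphic-created bucket")

    used = [name for name in LIBRARY_ARGS if args.get(name)]
    if used:
        problems.append("libraries are not supported: " + ", ".join(used))
    return problems


if __name__ == "__main__":
    job = {
        "Command": {"ScriptLocation": "s3://amorphic-etl/scripts/job.py"},
        "DefaultArguments": {"--TempDir": "s3://amorphic-etl/tmp/"},
    }
    print(job_sync_problems(job, {"Source": "AWSConsole", "Owner": "jdoe"}, {"amorphic-etl"}))
```

The set of Amorphic-created buckets has to be supplied by the caller; the sketch makes no attempt to discover it.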

Admins will be notified via email about any errors that occur during Glue Job synchronization. This email will contain the Id, Type, Name and Owner of the resource as well as an error message. Possible causes include:

  • Incorrect/Missing tags
  • Tagged Owner does not exist or does not have roles with sufficient permissions
  • Deletion of Amorphic-created Jobs from AWS Console

Appflow Flows/Connector Profiles

Appflow Flows that are synced to Amorphic can be executed and deleted from Amorphic. However, updating them is not supported. For synchronization of Appflow resources created from the AWS console, the following scenarios are possible and the appropriate action must be taken:

  1. When creating a new Appflow Flow, ensure the following:

    • It must carry the mandatory Source and Owner tags.
    • Currently, Amorphic only supports S3 as the destination for Appflow Flows, so S3 must be selected as the destination in the AWS Console as well.
    • Either the DLZ or LZ bucket can be selected as the S3 bucket, depending on whether the desired dataset has SkipLZProcess = true or false.
    • The bucket prefix must follow the format: domain_name/dataset_name.
  2. If a new Appflow Connector Profile is also created, Amorphic will synchronize it, provided the associated flows were correctly tagged.

  3. If a Flow/Connector Profile is deleted from the AWS Console:

    • If the Flow/Connector Profile was also created from the AWS console, its metadata will be removed from Amorphic.
    • If the Flow/Connector Profile was created from Amorphic, an email will be sent to all admins, because deleting it from the AWS Console is not recommended.
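The destination rules for a new flow can be sketched as a small helper that derives the expected bucket and prefix for a dataset. The function name and bucket names are hypothetical. The mapping of SkipLZProcess = true to the DLZ bucket and false to the LZ bucket is an assumption based on reading the two options in the order they are listed, so confirm it against your Amorphic deployment.

```python
from __future__ import annotations


def expected_destination(
    domain: str,
    dataset: str,
    skip_lz_process: bool,
    dlz_bucket: str,
    lz_bucket: str,
) -> tuple[str, str]:
    """Return the (bucket, prefix) an Appflow Flow should write to.

    Assumption: datasets with SkipLZProcess = true write to the DLZ bucket,
    all others write to the LZ bucket.
    """
    bucket = dlz_bucket if skip_lz_process else lz_bucket
    prefix = f"{domain}/{dataset}"  # documented format: domain_name/dataset_name
    return bucket, prefix


if __name__ == "__main__":
    print(expected_destination("sales", "orders", True, "my-dlz", "my-lz"))
    print(expected_destination("sales", "orders", False, "my-dlz", "my-lz"))
```

Comparing this pair with the destination configured on the flow in the AWS Console is one way to catch an incorrect dataset destination before the sync reports it.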

Admins will be notified via email about any errors that occur during Appflow Flows/Connector Profiles synchronization. This email will contain the Id, Name and Owner of the resource as well as an error message. Possible causes include:

  1. Incorrectly tagged flows. For example, tagging a flow with an Owner who does not have permission to create Connections Apps resources in Amorphic.
  2. Specifying an incorrect dataset as the destination for the flow. The S3 bucket and prefix should point to a dataset the user has access to.

In addition, as mentioned above, deleting an Amorphic-created flow/connector profile from the AWS console also triggers an email alert to all admins.