Version: v1.13

Files

Upload

The way files are added to a dataset depends on the Update Method selected when the dataset was created. Datasets with target location AuroraMySQL or Redshift support three update methods; datasets with target location S3 or S3-Athena support two.

  • Reload: When Reload is selected as the update method, the existing files are deleted every time new files are added. Once the files are uploaded, the user can either process or discard them.

    The image below shows the Reload file upload mechanism:

    Reload File Upload

  • Append / Latest Record: When Append or Latest Record is selected as the update method, newly added files are appended to the files already present in the dataset.

    The image below shows the file upload mechanism for datasets with the update method Append or Latest Record:

    Append/LatestRecord File Upload
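The difference between the update methods can be illustrated with a small model of a dataset's file list. This is only a sketch of the behavior described above, not Amorphic code: the function name `apply_upload` and the file names are hypothetical, and it assumes the uploaded files have been processed rather than discarded.

```python
def apply_upload(existing_files, new_files, update_method):
    """Return the dataset's file list after the new files are processed.

    Hypothetical helper that models the documented behavior only.
    """
    method = update_method.lower().replace(" ", "")
    if method == "reload":
        # Reload: the new batch replaces everything already in the dataset.
        return list(new_files)
    if method in ("append", "latestrecord"):
        # Append / Latest Record: new files are added to the existing ones.
        return list(existing_files) + list(new_files)
    raise ValueError(f"Unknown update method: {update_method!r}")


current = ["sales_2023.csv", "sales_2024.csv"]
print(apply_upload(current, ["sales_2025.csv"], "Reload"))
# ['sales_2025.csv']
print(apply_upload(current, ["sales_2025.csv"], "Append"))
# ['sales_2023.csv', 'sales_2024.csv', 'sales_2025.csv']
```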

Users can also custom-partition the data. See the documentation on Dataset Custom Partitioning.

If Skip LZ (Validation) Process is enabled (True/Yes) for a dataset, the file upload skips the entire validation (LZ) process and the file is uploaded directly to the DLZ bucket. This avoids unnecessary S3 copies and validations. Enabling it automatically disables the MalwareDetection and IsDataValidationEnabled (for S3Athena and Lake Formation datasets) functionality. It applies only to append and update type datasets.

Note

As of Amorphic 1.13, this applies ONLY to dataset file uploads through the Amorphic UI (manual file upload) and the ETL (file write) process. It does not apply to other file upload scenarios such as Ingestion, Streams, Bulkloadv2, and Appflow. The SkipLZ feature is planned for those scenarios in upcoming releases.
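The interaction between Skip LZ and the two validation flags can be summarized in a few lines. The sketch below is illustrative only: `effective_settings` and the `SkipLZProcess` key are hypothetical names, and rejecting Skip LZ for other update methods with an error is an assumption about how the restriction might be enforced. Only `MalwareDetection` and `IsDataValidationEnabled` are names taken from the text above.

```python
# Update methods for which Skip LZ is applicable, per the text above.
SKIP_LZ_UPDATE_METHODS = {"append", "update"}


def effective_settings(update_method, skip_lz, malware_detection, data_validation):
    """Return the settings that take effect for a dataset (hypothetical model)."""
    if skip_lz and update_method.lower() not in SKIP_LZ_UPDATE_METHODS:
        # Assumption: an unsupported combination is rejected outright.
        raise ValueError("Skip LZ applies only to append and update datasets")
    return {
        "SkipLZProcess": skip_lz,  # hypothetical key name
        # Enabling Skip LZ auto-disables both validation features.
        "MalwareDetection": malware_detection and not skip_lz,
        "IsDataValidationEnabled": data_validation and not skip_lz,
    }


print(effective_settings("append", True, True, True))
# {'SkipLZProcess': True, 'MalwareDetection': False, 'IsDataValidationEnabled': False}
```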

Truncate

When Truncate is selected, all files in the dataset are deleted. This functionality applies only to datasets with target location S3 or S3-Athena.

The image below shows the truncate dataset functionality:

Truncate Dataset

Delete

This option allows the user to delete a set of files from a dataset. Deleted files can be restored using the Restore functionality. This functionality applies only to datasets with target location S3 or S3-Athena.

The image below shows the delete dataset files functionality:

Delete Dataset Files

Note

All files marked as deleted are permanently deleted after four weeks; the action completes with eventual consistency. Permanent deletion removes the file data along with its metadata and cannot be undone. Users can check the time remaining for each file under the Deleted files section and take any required action before then.
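As a worked example of the four-week window, the remaining time before a deleted file is removed permanently can be computed from its deletion timestamp. This is a minimal sketch with hypothetical names and dates; the UI value is authoritative, and because the action is eventually consistent, the actual removal may lag the computed time.

```python
from datetime import datetime, timedelta, timezone

# Retention window for files marked as deleted, per the note above.
RETENTION = timedelta(weeks=4)


def time_until_permanent_delete(deleted_at, now):
    """Return the time left before permanent deletion (never negative)."""
    remaining = deleted_at + RETENTION - now
    return max(remaining, timedelta(0))


deleted_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 1, 22, tzinfo=timezone.utc)
print(time_until_permanent_delete(deleted_at, now).days)  # 7
```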

Restore

This functionality allows the user to restore files that were deleted from the dataset. It applies to datasets with target location S3 or S3-Athena.

The image below shows the functionality of restoring deleted files:

Restore Dataset Files

Restore files from Archive

This functionality allows the user to restore files stored in archival storage classes such as Glacier and Deep Archive. It applies to datasets with target location S3, S3-Athena, or Lakeformation.

The following attributes are required to restore archived files:

  • File Copy Type: The copy type of the restored file.
    • Temporary: A temporary copy is available for the specified number of days (the restoration expiration).
    • Permanent: The restored file is copied permanently to the Standard storage class and is always available.
  • Restore Expiration Days: The number of days for which the temporary copy of the file is available for use or download. Applicable only when File Copy Type is Temporary.
  • Retrieval Option: The retrieval option used when restoring an archived object. The time and price of a restore depend on the option chosen. See Object retrieval options for more details.
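
The rules that tie these attributes together can be sketched as a small validation helper. Everything named here is hypothetical: `build_restore_request` and the dictionary keys are illustrative rather than Amorphic's API, and the set of retrieval options assumes the choices mirror the Amazon S3 retrieval tiers (Bulk, Standard, Expedited).

```python
# Assumption: the retrieval options mirror the Amazon S3 retrieval tiers.
RETRIEVAL_OPTIONS = {"Bulk", "Standard", "Expedited"}


def build_restore_request(file_copy_type, retrieval_option, expiration_days=None):
    """Validate the restore attributes and return them as a dict (hypothetical)."""
    copy_type = file_copy_type.lower()
    if copy_type not in ("temporary", "permanent"):
        raise ValueError("File Copy Type must be Temporary or Permanent")
    if retrieval_option not in RETRIEVAL_OPTIONS:
        raise ValueError(f"Unknown retrieval option: {retrieval_option!r}")

    request = {"FileCopyType": copy_type, "RetrievalOption": retrieval_option}
    if copy_type == "temporary":
        # Expiration days are required for a temporary copy.
        if not isinstance(expiration_days, int) or expiration_days < 1:
            raise ValueError("Temporary copies need a positive number of days")
        request["RestoreExpirationDays"] = expiration_days
    elif expiration_days is not None:
        # Expiration days are meaningless for a permanent copy.
        raise ValueError("Restore Expiration Days applies only to temporary copies")
    return request


print(build_restore_request("Temporary", "Standard", expiration_days=5))
# {'FileCopyType': 'temporary', 'RetrievalOption': 'Standard', 'RestoreExpirationDays': 5}
```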

Note

Temporarily restored objects cannot be queried. See the AWS Athena limitations.

Also, users cannot query or create views on transition-related datasets.

The image below shows the functionality of restoring files from archive:

Restore Dataset Files

Permanent Delete

This option allows the user to permanently delete files from a dataset. It applies to datasets with target location S3 or S3-Athena.

The image below shows the functionality of deleting files permanently:

Permanent Delete Files