Version: v1.14

Dataset Lifecycle (Beta)

A dataset lifecycle policy helps you manage objects so that they are stored cost-effectively throughout their lifecycle. You control the transition and expiration of objects in a dataset through a set of rules that define the actions Amazon S3 applies to a group of objects. By default, all objects in a dataset are stored in the S3 Standard storage class. S3 supports several other storage classes that can store objects more cost-efficiently.

Two types of rules can be defined when a lifecycle policy is enabled for a dataset:

  • Transition Rules -> The user specifies the number of Days after which objects in the dataset are transitioned to a given StorageClass, chosen according to how the data is used. The transition time is counted from the time the objects are uploaded to the dataset.
  • Expiration Rules -> The user specifies the number of Days after which objects in the dataset expire and are permanently deleted. The expiration time is counted from the time the objects are uploaded to the dataset.
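
The two rule types above correspond to the `Transitions` and `Expiration` fields of an Amazon S3 lifecycle rule. The following is a minimal sketch of how such a rule could be assembled, not a description of how Amorphic builds it internally. The helper name `build_lifecycle_rule`, the rule ID format, and the `sales/orders/` prefix are hypothetical; the dictionary layout follows the S3 `PutBucketLifecycleConfiguration` request shape.

```python
from typing import List, Optional, Tuple


def build_lifecycle_rule(
    dataset_prefix: str,
    transitions: Optional[List[Tuple[int, str]]] = None,
    expiration_days: Optional[int] = None,
) -> dict:
    """Build one S3 lifecycle rule scoped to a dataset's key prefix.

    `transitions` is a list of (days, storage_class) pairs.
    The prefix-per-dataset layout is an assumption made for illustration.
    """
    rule = {
        # Hypothetical ID scheme: slashes in the prefix become dashes.
        "ID": "lifecycle-" + dataset_prefix.strip("/").replace("/", "-"),
        "Filter": {"Prefix": dataset_prefix},
        "Status": "Enabled",
    }
    if transitions:
        # Sort by day count so earlier transitions are listed first.
        rule["Transitions"] = [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in sorted(transitions)
        ]
    if expiration_days is not None:
        rule["Expiration"] = {"Days": expiration_days}
    return rule


rule = build_lifecycle_rule(
    "sales/orders/",
    transitions=[(90, "GLACIER"), (30, "STANDARD_IA")],
    expiration_days=365,
)
lifecycle_configuration = {"Rules": [rule]}
# With boto3 and valid AWS credentials, this could be applied with:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="<bucket-name>", LifecycleConfiguration=lifecycle_configuration)
```

In this example, objects under the prefix would move to STANDARD_IA 30 days after upload, to GLACIER after 90 days, and expire after 365 days.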

Important Points

  • Lifecycle rules can be applied to at most 1000 datasets across the application. Once the limit is reached, the only option is to delete (not merely disable) the lifecycle policy of an existing dataset before enabling a new lifecycle policy.
  • Files restored after a temporary delete are treated as new objects uploaded to the dataset. Because the UploadDate metadata of a restored file changes, the lifecycle rule applies from the date the file was restored, not from the date it was originally uploaded. The storage class of the file is also not retained after a temporary delete.
  • A file can be transitioned in only one direction across storage classes. It cannot be transitioned back to a previous storage class. Refer to the AWS documentation on lifecycle transitions for details.
  • Files transitioned to the GLACIER or DEEP_ARCHIVE storage class cannot be retrieved or accessed from the S3 bucket through an ETL job script or the Amorphic UI. For the same reason, users cannot temporarily delete such files from a dataset. Users can, however, permanently delete those files or truncate the dataset.
  • In the S3 Intelligent-Tiering storage class, objects smaller than 128 KB are not monitored and are not eligible for auto-tiering. These smaller objects are always stored in the Frequent Access tier. For more information, see S3 Intelligent-Tiering access tiers.
  • The S3 Standard-IA and S3 One Zone-IA storage classes are suitable for objects larger than 128 KB that you plan to store for at least 30 days. If an object is less than 128 KB, Amazon S3 charges you for 128 KB. If you delete an object before the end of the 30-day minimum storage duration period, you are charged for 30 days. For pricing information, see Amazon S3 pricing.
  • A lifecycle policy currently cannot be enabled through bulkload v1 connections, because such a connection creates a new dataset. To enable a lifecycle policy for the newly created dataset, the user must edit that dataset manually.
  • While the LifeCyclePolicyStatus of a dataset is Enabling, Disabling, or Deleting, files cannot be deleted from that dataset, and the truncate feature does not work either. Wait for the lifecycle policy operation to complete before deleting any files from the dataset.
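
The minimum charges described above for S3 Standard-IA and S3 One Zone-IA can be worked through with a little arithmetic. The sketch below only computes the billable size and duration implied by the 128 KB and 30-day minimums; it does not compute prices, and the function name `billable_usage` is hypothetical.

```python
KB = 1024
MIN_BILLABLE_BYTES = 128 * KB  # objects smaller than this are charged as 128 KB
MIN_BILLABLE_DAYS = 30         # objects deleted earlier are charged for 30 days


def billable_usage(size_bytes: int, days_stored: int) -> tuple:
    """Return (billable_bytes, billable_days) for one Standard-IA/One Zone-IA object."""
    return (
        max(size_bytes, MIN_BILLABLE_BYTES),
        max(days_stored, MIN_BILLABLE_DAYS),
    )


# A 50 KB object deleted after 10 days is billed as 128 KB for 30 days.
print(billable_usage(50 * KB, 10))   # (131072, 30)
# A 500 KB object kept for 45 days is billed at its actual size and duration.
print(billable_usage(500 * KB, 45))  # (512000, 45)
```

This is why transitioning many small or short-lived files to these storage classes may cost more than leaving them in S3 Standard.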

Enable/Disable Dataset Lifecycle Policy

A user can enable or disable the lifecycle policy of a dataset at any time, as shown below.

[Image: Enable dataset lifecycle policy]

A dataset lifecycle policy has the following fields:

  • Enable Life Cycle Policy: Select Yes to enable or No to disable the lifecycle policy of the dataset. If the lifecycle policy is enabled, the user must provide Expiration Days, Transition Rules, or both.
  • Expiration Days: Number of days after the upload date/time of a file at which the file is deleted from the dataset.
  • Transition Rules: Rules for transitioning a file to a given storage class a specified number of days after the upload date/time of the file.
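
The requirement that an enabled policy carries Expiration Days, Transition Rules, or both can be expressed as a small input check. This is an illustrative sketch only: the function `validate_policy_input` and its parameter names are hypothetical and do not reflect the Amorphic API, and the non-negative check is an assumed sanity check that is not stated in this document.

```python
from typing import List, Optional, Tuple


def validate_policy_input(
    enabled: bool,
    expiration_days: Optional[int] = None,
    transition_rules: Optional[List[Tuple[int, str]]] = None,
) -> List[str]:
    """Return a list of problems with the given lifecycle policy fields."""
    errors: List[str] = []
    if not enabled:
        # Nothing else is required when the policy is being disabled.
        return errors
    if expiration_days is None and not transition_rules:
        errors.append("Provide Expiration Days, Transition Rules, or both.")
    # Assumed sanity checks (not taken from the document): no negative day counts.
    if expiration_days is not None and expiration_days < 0:
        errors.append("Expiration Days cannot be negative.")
    for days, storage_class in transition_rules or []:
        if days < 0:
            errors.append(f"Transition to {storage_class} cannot use negative days.")
    return errors


print(validate_policy_input(enabled=True))
# ['Provide Expiration Days, Transition Rules, or both.']
print(validate_policy_input(enabled=True, transition_rules=[(30, "STANDARD_IA")]))
# []
```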

Delete Dataset Lifecycle Policy

A user can delete the lifecycle policy of a dataset at any time by clicking the Delete Life-Cycle Policy button, as shown below.

[Image: Delete dataset lifecycle policy]

Notification Alerts & Error Handling

If an error occurs while enabling, disabling, or deleting a lifecycle policy, an ErrorMessage is displayed on the dataset details page under the Life Cycle Policy Details section, and the policy is reverted to its previous state. In addition, users who have subscribed to email alerts in the Amorphic application receive an email on completion of each lifecycle policy operation (enable, disable, or delete) performed on the dataset.