Version: v2.5

Dataset Lifecycle

The Dataset Lifecycle policy is a feature that helps manage objects in Amazon S3 to keep costs low throughout their lifecycle. It allows users to control the transition and expiration of objects in the dataset using a set of rules that define actions for Amazon S3 to take.

By default, all objects in the dataset are stored in the S3 Standard storage class, which is a basic storage option. However, there are many other storage classes supported by S3 that offer more cost-efficient storage options for different types of objects and use cases.

For example, an object that is only accessed once every three months can be moved to a cheaper storage class, even if it takes longer to retrieve the object. For more information, see Storage Classes.

If a lifecycle policy is enabled for a dataset, two types of rules can be defined:

  • Transition Rules -> Set the number of Days after which objects in the dataset move to a given StorageClass. The transition time for an object is calculated from the moment it is uploaded to the dataset.
  • Expiration Rules -> Set the number of Days after which objects in the dataset expire and are removed permanently. The expiration time is calculated from the moment an object is added to the dataset.
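Under the hood, these rules correspond to an S3 lifecycle configuration. The sketch below shows what a combined transition and expiration rule might look like; the helper function, rule ID, and bucket name are illustrative assumptions, not part of the Amorphic API.

```python
# Sketch: expressing a dataset's transition and expiration rules as an
# S3 lifecycle configuration (all names here are illustrative).
import json

def build_lifecycle_rules(transition_days, storage_class, expiration_days):
    """Build a single S3 lifecycle rule combining a transition and an expiration."""
    rule = {
        "ID": "dataset-lifecycle",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # apply to every object in the dataset
    }
    if transition_days is not None:
        rule["Transitions"] = [
            {"Days": transition_days, "StorageClass": storage_class}
        ]
    if expiration_days is not None:
        rule["Expiration"] = {"Days": expiration_days}
    return {"Rules": [rule]}

# Move objects to GLACIER 90 days after upload, expire them after 365 days.
config = build_lifecycle_rules(90, "GLACIER", 365)
print(json.dumps(config, indent=2))

# With boto3, such a configuration could be applied like this (not run here):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-dataset-bucket", LifecycleConfiguration=config)
```

Both day counts are measured from the object's upload date, matching how the transition and expiration times are calculated above.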

Important Points

  • A maximum of 1000 datasets can have lifecycle rules. Once that limit is reached, you must delete the lifecycle policy for an existing dataset before enabling one on another.
  • Files restored after a temporary delete are treated as new objects uploaded to the dataset. Because the restored file's metadata (Upload Date) changes, lifecycle rules apply from the date the file was restored. The file also does not retain its previous storage class after a temporary delete.
  • A file moves through storage classes in one direction only; it cannot transition back to an earlier storage class. For more information, check out this AWS Documentation.
  • Files stored in GLACIER/DEEP_ARCHIVE cannot be accessed from S3 by an ETL Job script or Amorphic-UI. Users cannot temporarily delete files in GLACIER/DEEP_ARCHIVE from the dataset, but they can permanently delete files or truncate the dataset.
  • Objects with a size less than 128 KB are not monitored and are stored in the Frequent Access tier. For more information on S3 Intelligent-Tiering and its access tiers, please refer to S3 Intelligent-Tiering access tiers.
  • The S3 Standard-IA and S3 One Zone-IA storage classes are suitable for objects larger than 128 KB that you plan to store for at least 30 days. If an object is less than 128 KB, Amazon S3 charges you for 128 KB. If you delete an object before the end of the 30-day minimum storage duration period, you are charged for 30 days. For pricing information, see Amazon S3 pricing.
  • Lifecycle policies cannot currently be enabled for bulkload v1 type connections, because enabling one would create a new dataset. To enable a lifecycle policy for that new dataset, users must edit it manually.
  • While the LifeCyclePolicyStatus of a dataset is in the Enabling, Disabling, or Deleting state, files cannot be deleted from that dataset, and the dataset's truncate feature is also unavailable. Please wait for the lifecycle policy to be applied before deleting any files from the dataset.
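The minimum-size and minimum-duration charges mentioned above (128 KB minimum billable size, 30-day minimum storage duration for S3 Standard-IA and S3 One Zone-IA) can be illustrated with a small calculation; the function name is illustrative.

```python
# Illustration of the Standard-IA / One Zone-IA minimum charges described
# above: objects smaller than 128 KB are billed as 128 KB, and objects
# deleted before 30 days are billed for the full 30 days.
MIN_BILLABLE_BYTES = 128 * 1024   # 128 KB minimum billable object size
MIN_BILLABLE_DAYS = 30            # 30-day minimum storage duration

def billable_usage(size_bytes, days_stored):
    """Return the (bytes, days) the storage class actually charges for."""
    return (max(size_bytes, MIN_BILLABLE_BYTES),
            max(days_stored, MIN_BILLABLE_DAYS))

# A 64 KB object deleted after 10 days is billed as 128 KB for 30 days.
print(billable_usage(64 * 1024, 10))   # → (131072, 30)
```

This is why these classes are only cost-effective for objects larger than 128 KB that are kept for at least 30 days.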

Enable/Disable Dataset Lifecycle Policy

You can enable or disable a dataset's lifecycle policy at any time, as shown in the GIF below.

Enable dataset lifecycle policy

The following fields make up a dataset lifecycle policy:

  • Enable Life Cycle Policy: Select Yes to turn the policy on or No to turn it off. If the policy is on, you must provide Expiration Days, Transition Rules, or both.
  • Expiration Days: The number of days after upload that a file is deleted from the dataset.
  • Transition Rules: Rules that move a file to a different storage class a set number of days after its upload date.
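The constraint above (an enabled policy needs Expiration Days, Transition Rules, or both) can be sketched as a small validation check; the function and field names are illustrative, not the actual Amorphic implementation.

```python
# Sketch of the validation implied by the fields above: an enabled policy
# must carry at least one of Expiration Days or Transition Rules.
def validate_policy(enabled, expiration_days=None, transition_rules=None):
    """Raise ValueError for an enabled policy with no rules; return True otherwise."""
    if enabled and expiration_days is None and not transition_rules:
        raise ValueError(
            "Provide Expiration Days, Transition Rules, or both "
            "when the lifecycle policy is enabled")
    return True

validate_policy(True, expiration_days=365)                   # valid
validate_policy(True, transition_rules=[
    {"Days": 90, "StorageClass": "GLACIER"}])                # valid
validate_policy(False)                                       # disabled: no rules needed
# validate_policy(True)  # would raise ValueError
```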

Users can also bulk update or delete lifecycle policies for datasets through Bulk Management. See Bulk update/delete lifecycle policies.

Delete Dataset Lifecycle Policy

You can delete a dataset's lifecycle policy at any time by clicking the Delete Life-Cycle Policy button.

Delete dataset lifecycle policy

Notification Alerts & Error Handling

If an error occurs while enabling, disabling, or deleting a lifecycle policy, an ErrorMessage will be displayed on the dataset details page under the Life Cycle Policy Details section, and the policy will be reverted to its previous state.

Additionally, if the user has subscribed to email alerts in the Amorphic application, they will receive an email upon completion of each operation (enable, disable, delete) performed on the dataset related to the lifecycle policy.

Dataset lifecycle use case

A company has a large dataset of customer information stored in Amazon S3. The data is frequently accessed and updated, so it needs to be stored in the S3 Standard storage class for fast retrieval. However, after a certain period of time, the company realizes that some of the data is no longer frequently accessed and could be moved to a cheaper storage class without affecting its performance.

The company can use the Dataset Lifecycle policy to transition objects in the dataset to a cheaper storage class after a certain period, say 90 days from upload. This way, the company can save on storage costs without sacrificing access to its data. If a file is needed again later, it can be restored; the restored file is then treated as a new upload and returns to the S3 Standard storage class.

This use case demonstrates how the Dataset Lifecycle policy helps companies save on storage costs while keeping their data accessible when required.