
Repair dataset(s) in Amorphic

The Repair Dataset feature in Amorphic lets users repair datasets individually or globally.

Repair a dataset

A user can repair a dataset by clicking the 'Dataset Repair' button on the dataset details page. The following issues are repaired:

  • AccessRequests: Resolves inconsistencies in the dataset's access requests, such as orphan access requests that remain after a dataset is deleted or access requests that were approved by another owner.
  • DatasetLoads: Handles dataset file load failures and updates the load status to a safe, relevant state.
  • MissingFiles: Compares the dataset files in S3 with the file metadata in DynamoDB, deletes irrelevant or orphan files from S3 and updates DynamoDB accordingly, and vice versa.
  • InconsistentDatasetMetadata: Moves dataset attributes that are stuck in a specific state (e.g., S3DataSyncStatus stuck in the inprogress state) to the failed state.
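
The MissingFiles comparison is essentially a two-way set difference between the object keys found in S3 and the file entries recorded in DynamoDB. The following is a minimal sketch of that idea only, not Amorphic's actual implementation: the function name `plan_missing_files_repair`, the action names, and the sample keys are all hypothetical, and plain Python lists stand in for the real S3 and DynamoDB lookups.

```python
def plan_missing_files_repair(s3_keys, metadata_keys):
    """Work out which entries exist on only one side (illustrative only).

    s3_keys:       object keys found in S3 for the dataset (assumed input)
    metadata_keys: file keys recorded in the metadata store (assumed input)
    """
    s3_keys = set(s3_keys)
    metadata_keys = set(metadata_keys)
    return {
        # Files in S3 with no metadata entry: orphan files to delete.
        "delete_from_s3": sorted(s3_keys - metadata_keys),
        # Metadata entries with no file in S3: stale records to clean up.
        "delete_from_metadata": sorted(metadata_keys - s3_keys),
    }


# Hypothetical sample data: one orphan file and one stale metadata entry.
plan = plan_missing_files_repair(
    s3_keys=["sales/a.csv", "sales/b.csv", "sales/tmp.csv"],
    metadata_keys=["sales/a.csv", "sales/b.csv", "sales/c.csv"],
)
print(plan)
```

Computing the plan separately from applying it keeps the comparison easy to inspect before anything is deleted.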

Repair dataset

Global datasets repair

A user can repair all the datasets they own by clicking the 'Global Dataset repair' button on the right side of the datasets listing page. The following issues are repaired:

  • AccessRequests: Resolves inconsistencies in the dataset's access requests, such as orphan access requests that remain after a dataset is deleted or access requests that were approved by another owner.
  • DatasetLoads: Handles dataset file load failures and updates the load status to a safe, relevant state.
  • MissingFiles: Compares the dataset files in S3 with the file metadata in DynamoDB, deletes irrelevant or orphan files from S3 and updates DynamoDB accordingly, and vice versa.
  • InvalidDatasets: Deletes irrelevant or orphan dataset metadata present in DynamoDB.
  • InconsistentDatasetMetadata: Moves dataset attributes that are stuck in a specific state (e.g., S3DataSyncStatus stuck in the inprogress state) to the failed state.
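
The InconsistentDatasetMetadata repair can be pictured as a rule that moves a status left in the inprogress state to failed. The sketch below is an illustration under stated assumptions, not Amorphic's implementation: the `LastModifiedTime` field, the six-hour `max_age` threshold, and the function name `repair_stuck_status` are hypothetical. Only the `S3DataSyncStatus` attribute and the inprogress and failed states come from the description above.

```python
from datetime import datetime, timedelta, timezone


def repair_stuck_status(record, now, max_age=timedelta(hours=6)):
    """Return a copy of record with a stale in-progress status set to failed.

    The age check and its threshold are assumptions made for this sketch.
    """
    repaired = dict(record)  # leave the caller's record untouched
    age = now - record["LastModifiedTime"]
    if record.get("S3DataSyncStatus") == "inprogress" and age > max_age:
        repaired["S3DataSyncStatus"] = "failed"
    return repaired


# Hypothetical records: one stuck for a day, one updated five minutes ago.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
stale = {"S3DataSyncStatus": "inprogress",
         "LastModifiedTime": now - timedelta(days=1)}
fresh = {"S3DataSyncStatus": "inprogress",
         "LastModifiedTime": now - timedelta(minutes=5)}

print(repair_stuck_status(stale, now)["S3DataSyncStatus"])
print(repair_stuck_status(fresh, now)["S3DataSyncStatus"])
```

Moving the attribute to a terminal failed state, rather than leaving it in progress, lets the operation be retried.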

Global dataset Repair

Note

Global dataset repair runs asynchronously in the background. An email with the full repair report is sent to the user who triggered it. If a dataset has multiple owners, the email notification is sent to all of the owners.