This documentation is for the current version of the product.
For the latest version (v2.7) documentation, click here.

Repair dataset(s)/ Generate dataset(s) report in Amorphic

The Repair Dataset/ Generate Dataset Report feature in Amorphic lets users repair datasets, or generate a report of their inconsistencies, either for an individual dataset or globally.

Note

This feature is not supported for Iceberg and Hudi Datasets.

Repair a dataset/ Generate dataset report

Users can repair a dataset or generate its report using the 'Repair Dataset/ Generate Dataset Report' button on the dataset details page. The following issues are repaired, or identified and included in the report:

  • AccessRequests: Inconsistencies in the dataset's access requests, such as orphan access requests that remain after the dataset is deleted, or an access request approved by another owner.
  • DatasetLoads: Handles dataset file load failures and updates the load status to a safe, relevant state.
  • MissingFiles: Compares the dataset files in S3 with the file metadata in DynamoDB, deletes irrelevant/orphan files in S3 and updates DynamoDB accordingly, and vice versa.
  • InconsistentDatasetMetadata: Moves dataset attributes that are stuck in a specific state (e.g., S3DataSyncStatus stuck in the inprogress state) to the failed state.
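
The MissingFiles check described above is essentially a two-way set comparison. The following is a minimal sketch of that idea only, not Amorphic's implementation: the function name `reconcile`, the `ReconcileResult` type, and the use of plain key strings are all assumptions made for illustration. In practice the two inputs would come from listing the dataset's S3 prefix and reading its file metadata from DynamoDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconcileResult:
    """Hypothetical result type: keys found on only one side."""
    orphan_files: list[str]      # present in S3, no metadata record
    orphan_metadata: list[str]   # metadata record, no matching S3 object


def reconcile(s3_keys, metadata_keys):
    """Compare file keys in storage with file keys recorded in metadata.

    Both arguments are iterables of strings. Nothing is deleted here;
    the function only reports what a repair step would need to act on.
    """
    s3 = set(s3_keys)
    meta = set(metadata_keys)
    return ReconcileResult(
        orphan_files=sorted(s3 - meta),
        orphan_metadata=sorted(meta - s3),
    )


# Example with made-up keys
result = reconcile(
    s3_keys=["sales/2024/a.csv", "sales/2024/b.csv"],
    metadata_keys=["sales/2024/b.csv", "sales/2024/c.csv"],
)
print(result.orphan_files)     # keys a repair might delete from S3
print(result.orphan_metadata)  # records a repair might fix in DynamoDB
```

Separating detection from the actual repair mirrors the report/repair split of the feature: a report only needs the two lists, while a repair would go on to act on them.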


Additionally, users have the option to repair partitions during dataset repair.

Global dataset repair/ Generate global report

Users can repair all the datasets they own, or generate a metadata inconsistency report for them, by clicking the 'Global Dataset Repair/Report' button on the right side of the datasets listing page and selecting either the 'Global Dataset Repair' or the 'Generate Global Report' option. The following issues are then repaired or reported:

  • AccessRequests: Inconsistencies in the dataset's access requests, such as orphan access requests that remain after the dataset is deleted, or an access request approved by another owner.
  • DatasetLoads: Handles dataset file load failures and updates the load status to a safe, relevant state.
  • MissingFiles: Compares the dataset files in S3 with the file metadata in DynamoDB, deletes irrelevant/orphan files in S3 and updates DynamoDB accordingly, and vice versa.
  • InvalidDatasets: Deletes irrelevant/orphan dataset metadata present in DynamoDB.
  • InconsistentDatasetMetadata: Moves dataset attributes that are stuck in a specific state (e.g., S3DataSyncStatus stuck in the inprogress state) to the failed state.
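
The InconsistentDatasetMetadata repair can be pictured as a small state reset. The sketch below is illustrative only and rests on several assumptions that the documentation does not state: that "stuck" means the record has not changed for longer than some timeout, that the record carries a `LastModifiedTime` timestamp, and that the status values are the literal strings `inprogress` and `failed`. The function name `repair_stuck_statuses` and the six-hour default are hypothetical.

```python
from datetime import datetime, timedelta, timezone

IN_PROGRESS = "inprogress"  # assumed literal status value
FAILED = "failed"           # assumed literal status value


def repair_stuck_statuses(metadata, status_fields, now,
                          timeout=timedelta(hours=6)):
    """Return (repaired_copy, changed_fields) for one dataset record.

    A status field is treated as stuck when it still reads "inprogress"
    and the record's (hypothetical) LastModifiedTime is older than
    `timeout`. The input dict is never modified.
    """
    repaired = dict(metadata)
    changed = []
    last_modified = metadata.get("LastModifiedTime")
    if last_modified is None or now - last_modified < timeout:
        return repaired, changed  # too recent (or unknown) to call stuck
    for field in status_fields:
        if repaired.get(field) == IN_PROGRESS:
            repaired[field] = FAILED
            changed.append(field)
    return repaired, changed


# Example with a made-up record last touched a day ago
now = datetime(2024, 1, 2, tzinfo=timezone.utc)
record = {
    "S3DataSyncStatus": "inprogress",
    "LastModifiedTime": datetime(2024, 1, 1, tzinfo=timezone.utc),
}
repaired, changed = repair_stuck_statuses(record, ["S3DataSyncStatus"], now)
print(repaired["S3DataSyncStatus"], changed)
```

Returning the list of changed fields alongside the repaired copy means the same function could serve both modes: a report would print `changed`, while a repair would write `repaired` back.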


Note

Global dataset repair/report runs asynchronously in the background. An email with the full repair report is sent to the user who triggered it. If a dataset has multiple owners, the email notification is sent to all of the owners.
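
One way to read the notification rule above is as a de-duplicated union of the triggering user and every owner of the affected datasets. The sketch below illustrates that reading only; the function name `notification_recipients` and the shape of the `owners_by_dataset` mapping are assumptions, not part of Amorphic.

```python
def notification_recipients(triggered_by, owners_by_dataset):
    """Return the sorted, de-duplicated list of users to notify.

    `triggered_by` is the user who started the repair/report;
    `owners_by_dataset` maps a dataset name to a list of its owners.
    """
    recipients = {triggered_by}
    for owners in owners_by_dataset.values():
        recipients.update(owners)
    return sorted(recipients)


# Example with made-up users and datasets
print(notification_recipients(
    "alice",
    {"sales": ["alice", "bob"], "hr": ["carol"]},
))
```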