Repair dataset(s) / Generate dataset report(s) in Amorphic
The Repair Dataset / Generate Dataset Report feature in Amorphic lets users repair datasets, or generate a report of their metadata inconsistencies, either for an individual dataset or globally for all the datasets they own.
This feature is not supported for Iceberg and Hudi Datasets.
Repair a dataset / Generate a dataset report
Users can repair a dataset or generate its report using the 'Repair Dataset/ Generate Dataset Report' button on the dataset details page. The following issues are repaired, or identified and included in the report:
- AccessRequests: Resolves inconsistencies in the dataset's access requests, such as orphan access requests that remain after the dataset is deleted, or access requests approved by another owner.
- DatasetLoads: Handles dataset file load failures and updates their status to a safe, relevant state.
- MissingFiles: Compares the dataset files in S3 with the metadata in DynamoDB, deletes irrelevant/orphan files from S3 and updates DynamoDB accordingly, and vice versa.
- InconsistentDatasetMetadata: Moves dataset attributes that are stuck in a specific state (e.g. S3DataSyncStatus stuck in the inprogress state) to the failed state.
Additionally, users have the option to repair partitions during dataset repair.
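The MissingFiles check described above is, at its core, a two-way comparison between the object keys found in S3 and the file keys recorded in DynamoDB. The following is a minimal sketch of that comparison only, not Amorphic's actual implementation: the names `reconcile` and `ReconcileResult` and the sample keys are hypothetical, and plain Python collections stand in for the S3 listing and the DynamoDB records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconcileResult:
    """Hypothetical result type; not part of any Amorphic API."""
    orphan_files: frozenset    # keys present in S3 with no metadata record
    orphan_records: frozenset  # metadata records with no matching S3 object


def reconcile(s3_keys, metadata_keys):
    """Compare the two key listings and report what is orphaned on each side."""
    s3 = frozenset(s3_keys)
    meta = frozenset(metadata_keys)
    return ReconcileResult(orphan_files=s3 - meta, orphan_records=meta - s3)


# Illustrative keys only; a real run would list them from S3 and DynamoDB.
result = reconcile(
    s3_keys=["sales/2024/a.csv", "sales/2024/b.csv", "sales/2024/tmp.csv"],
    metadata_keys=["sales/2024/a.csv", "sales/2024/b.csv", "sales/2024/c.csv"],
)
print(sorted(result.orphan_files))    # ['sales/2024/tmp.csv']
print(sorted(result.orphan_records))  # ['sales/2024/c.csv']
```

In this sketch, a report run would only list the two orphan sets, while a repair run would go on to act on them.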
Global dataset repair / Generate global report
Users can repair all the datasets they own, or generate a metadata inconsistency report for them, by clicking the 'Global Dataset Repair/Report' button on the right side of the datasets listing page and selecting the 'Global Dataset Repair' or 'Generate Global Report' option. The following issues are then repaired or reported:
- AccessRequests: Resolves inconsistencies in the dataset's access requests, such as orphan access requests that remain after the dataset is deleted, or access requests approved by another owner.
- DatasetLoads: Handles dataset file load failures and updates their status to a safe, relevant state.
- MissingFiles: Compares the dataset files in S3 with the metadata in DynamoDB, deletes irrelevant/orphan files from S3 and updates DynamoDB accordingly, and vice versa.
- InvalidDatasets: Deletes irrelevant/orphan dataset metadata present in DynamoDB.
- InconsistentDatasetMetadata: Moves dataset attributes that are stuck in a specific state (e.g. S3DataSyncStatus stuck in the inprogress state) to the failed state.
Global dataset repair/report runs asynchronously in the background. An email with the full repair report is sent to the user who triggered it. If a dataset has multiple owners, the email notification is sent to all of them.
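To illustrate the InconsistentDatasetMetadata repair listed above, the sketch below moves a status attribute that has been in progress for too long to a failed state. It is a hedged illustration, not Amorphic's implementation. The `LastModifiedTime` field, the six-hour `STUCK_AFTER` threshold, the literal values `"inprogress"` and `"failed"`, and the function name `reset_stuck_status` are all assumptions. Only the `S3DataSyncStatus` attribute name comes from the text.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold after which an in-progress status counts as stuck.
STUCK_AFTER = timedelta(hours=6)


def reset_stuck_status(dataset, now, status_field="S3DataSyncStatus"):
    """Return a copy of the record with a stuck in-progress status set to failed."""
    repaired = dict(dataset)
    age = now - dataset["LastModifiedTime"]  # hypothetical timestamp field
    if dataset.get(status_field) == "inprogress" and age > STUCK_AFTER:
        repaired[status_field] = "failed"
    return repaired


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
stuck = {
    "DatasetName": "sales",
    "S3DataSyncStatus": "inprogress",
    "LastModifiedTime": now - timedelta(hours=8),
}
fresh = {
    "DatasetName": "orders",
    "S3DataSyncStatus": "inprogress",
    "LastModifiedTime": now - timedelta(minutes=5),
}
print(reset_stuck_status(stuck, now)["S3DataSyncStatus"])  # failed
print(reset_stuck_status(fresh, now)["S3DataSyncStatus"])  # inprogress
```

Returning a copy rather than mutating the record keeps the original state available, so the same check could feed a report without changing anything.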