Assets
An asset is any valuable data entity in the catalog, such as a dataset or a view. A data asset can be structured or unstructured and is accompanied by metadata detailing its content, origin, structure, and usage context. This metadata enables efficient discovery, comprehension, and governance of the data, allowing users to access and use the information for analysis, reporting, and decision-making.
Users can discover and explore assets through the Discover feature. The following image shows the overview for an asset discovered through the global search.
Request Access
If users have access to the resource in Amorphic, they can navigate to it directly. If they do not have access, they can request permission to gain access to the resource.
Users can view detailed information about assets in the catalog, including schema, lineage, dependent resources, and exploration tools.
Schema
Users can view the columns available in datasets and views if they have full access to the resources.
AI Suggestions
Amorphic provides AI-powered, column-level suggestions for assets in the following categories:
- Column descriptions - provides an auto-generated one-line description of the kind of data stored in the column.
- Column classifications - classifies the data in a column according to the kind of data stored. The AI suggests multiple classifications relevant to the data, drawn from a predefined list of 50+ categories (see Appendix 1.a).
- PII entities detected - detects personally identifiable information in the dataset and classifies it into entities from a list of 250+ categories (see Appendix 1.b).
These suggestions give users recommendations about their data and can be found under the Schema section of the asset details.
- AI suggestions are available only for datasets that have Data Profiling enabled and one of the following target locations: S3Athena, Redshift, Lakeformation, or DynamoDB.
- Data profiling must be run at least once on a dataset before suggestions can be shown for it.
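The two prerequisites above can be read as a single eligibility check. The following is a minimal sketch of that rule only. The `Dataset` class, its field names, and the exact target-location strings are hypothetical and are not part of the Amorphic API.

```python
from dataclasses import dataclass

# Target locations named in the text above; the exact string values
# used by the platform are an assumption.
SUPPORTED_TARGETS = {"S3Athena", "Redshift", "Lakeformation", "DynamoDB"}


@dataclass
class Dataset:
    # Hypothetical fields, introduced only for illustration.
    target_location: str
    profiling_enabled: bool
    profiling_runs: int  # number of completed data profiling runs


def ai_suggestions_available(ds: Dataset) -> bool:
    """Return True when all prerequisites listed above are met."""
    return (
        ds.profiling_enabled
        and ds.target_location in SUPPORTED_TARGETS
        and ds.profiling_runs >= 1
    )
```

Under these assumptions, a Redshift dataset with profiling enabled but never run would not yet qualify for suggestions.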
Users can review (approve or decline) the AI suggestions for each column as well as for the entire asset. Approved suggestions are incorporated into the asset schema, while declined suggestions are no longer displayed.
Additionally, approved suggestions can be searched from the Discover page.
- Approve/decline options are available only for column description and classification suggestions. For PII entities, the AI continuously evaluates the data across periodic uploads and notifies the user when PII data is detected.
- Currently, AI suggestions cannot be regenerated for columns and categories that have already been approved or declined. However, users can still edit a description from the Dataset Details page (under the Profile section).
- Auto-generated descriptions require the Anthropic Claude V3 Sonnet model to be enabled in the AWS account. If the model is disabled, users still receive suggestions for PII entities and classification categories.
- AI-generated suggestions can be inconsistent. Users are responsible for approving or declining them.
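The review rules described above (approval merges a suggestion into the schema, declining hides it, PII suggestions are not reviewable, and a reviewed suggestion is final) can be modeled in a few lines. This is an illustrative sketch, not Amorphic's implementation. The class names, field names, and status values are all hypothetical.

```python
from dataclasses import dataclass, field

# Only these categories can be approved or declined; PII entities cannot.
REVIEWABLE = {"description", "classification"}


@dataclass
class Suggestion:
    column: str
    category: str  # "description", "classification", or "pii"
    value: str
    status: str = "pending"  # "pending", "approved", or "declined"


@dataclass
class Asset:
    schema: dict = field(default_factory=dict)  # column -> {category: value}
    suggestions: list = field(default_factory=list)

    def review(self, suggestion: Suggestion, approve: bool) -> None:
        if suggestion.category not in REVIEWABLE:
            raise ValueError("PII suggestions cannot be approved or declined")
        if suggestion.status != "pending":
            # Reviewed suggestions are final and are not regenerated.
            raise ValueError("suggestion was already reviewed")
        if approve:
            suggestion.status = "approved"
            # Approved suggestions become part of the asset schema.
            column_meta = self.schema.setdefault(suggestion.column, {})
            column_meta[suggestion.category] = suggestion.value
        else:
            suggestion.status = "declined"

    def visible_suggestions(self) -> list:
        # Declined suggestions are no longer displayed.
        return [s for s in self.suggestions if s.status != "declined"]
```

Keeping the status on the suggestion itself makes the "cannot be regenerated once reviewed" rule a simple state check.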
Lineage
Users can view the lineage of a dataset, which shows every connection from the point where the data is ingested to the jobs that use it.
Dependent Resources
This section displays all the resources that depend on the datasets and views.
Explore
This section shows all the notebooks and studios attached to datasets and views. Users can navigate to a notebook or studio if the session is available.
Repair Catalog Metadata (API Only)
This option allows users to repair catalog metadata stored in the OS cluster.
It involves the following steps:
- Deleting the index.
- Reading information from the asset metadata tables and re-indexing the data.
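The two steps above amount to a drop-and-rebuild of the search index. The sketch below shows that pattern with plain in-memory dictionaries standing in for the search index and the metadata tables. The function name, the `id` key, and the `source_table` field are hypothetical, and the real service is not assumed to work this way internally.

```python
def repair_catalog_index(search_index: dict, metadata_tables: dict) -> int:
    """Rebuild `search_index` from `metadata_tables`.

    search_index:    document id -> indexed document (stand-in for the index)
    metadata_tables: table name -> list of asset records, each with an "id"
    Returns the number of documents indexed.
    """
    # Step 1: delete the existing index, including any stale entries.
    search_index.clear()

    # Step 2: read every record from the metadata tables and re-index it.
    for table_name, records in metadata_tables.items():
        for record in records:
            search_index[record["id"]] = {**record, "source_table": table_name}

    return len(search_index)
```

Because the index is rebuilt entirely from the metadata tables, entries that no longer have a backing record disappear after the repair.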
Resource Path: /catalog/reindex
HTTP Method: POST
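As a rough illustration, the call could be prepared with Python's standard library as shown below. Only the resource path and HTTP method come from this page. The base URL, the `Authorization` header, and the empty request body are assumptions, so check the API reference for the required headers and payload.

```python
import urllib.request


def build_reindex_request(base_url: str, auth_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request to /catalog/reindex."""
    url = base_url.rstrip("/") + "/catalog/reindex"
    return urllib.request.Request(
        url,
        data=b"",  # assumption: no request body is required
        method="POST",
        headers={"Authorization": auth_token},  # assumption: token-based auth
    )


# Placeholder values; replace with the real API endpoint and token.
req = build_reindex_request("https://api.example.com", "<token>")
# To send the request: urllib.request.urlopen(req)
```

Since the first repair step deletes the index, search results may be incomplete until re-indexing finishes.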