DeepSearch Indices
DeepSearch Indices enables users to create an index in AWS Kendra to enable deep search techniques within a dataset consisting of data files.
For example, suppose you have a file stating product information stored in a dataset in Amazon Web Services (AWS) S3. This file contains information such as product name, product description, price, and reviews. To simplify the process of searching through this data, you can create an index in AWS Kendra using Deepsearch Indices.
This will allow you to perform deep searches over the product information by product name, description, price, and reviews, among other fields. For instance, you can search for all products that cost less than $50, or search for all products that contain the word "laptop" in their description. By creating an index, you can enhance the ease of finding the information you require within your dataset.
How to Create DeepSearch Index?
To create a new Deepsearch Index follow the steps stated below:
- Clicking on
+ New Deepsearch Index
- Complete the information below, excluding the name and description of the index:
Auto Terminate: This enables Deepsearch Index termination to save resource costs based on the termination time value provided by the user. Auto termination process is triggered every hour and looks for any Deepsearch Indices that needs to be notified or deleted and sends an email when one of the below criteria is met. The user will receive a notification email in the following scenarios:
- If the difference between the auto-terminate process trigger run (every whole hour) and the termination time is less than 30 minutes.
- If the auto-termination process was successfully able to delete the Deepsearch Index after the termination time
- If the auto-termination process wasn't able to delete the Deepsearch Index due to some fatal errors.
Auto Termination Time: You can set auto termination time for Deepsearch Index of less than 168 hours (7 days). If current time is greater than termination time, the index will be deleted within the next whole hour.
- The auto-termination process is scheduled to run every hour, precisely on the hour (e.g: 6:00, 7:00, 8:00, 9:00).
- User will receive a email notification only when the user is subscribed to alerts. Please refer to Alert Preferences to enable alerts.
- When the termination time elapses, auto termination process will terminate/delete the Deepsearch Index and also deletes all the metadata related to the Deepsearch Index and this process cannot be undone.
Once the index is created with the above metadata, the index will be in the CREATING state. It will take a few minutes to complete the creation process, after successful completion the index and the status will become ACTIVE. It will contain the following information:
Type | Description |
---|---|
Index Name | Deepsearch Index Name, which uniquely identifies the index. |
Description | A brief explanation of the index typically the type of data added to it. |
Status | Status of index. Ex: CREATING, ACTIVE etc. |
Documents Count | Number of the documents added to the index through datasets. |
Auto Terminate | Status of the auto-termination Ex: Enabled, Disabled |
Remaining Time | Amount of time (in hr) left for auto-termination |
Auto Termination Time | Time at which the system auto terminates the deepsearch index. |
CreatedBy | User who created the index. |
LastModifiedBy | User who has recently updated the index. |
Deepsearch Indices Operations
Amorphic DeepSearch Indices provides below operations for a DeepSearch index. However, you should have sufficient permission to perform these operations.
Type | Description |
---|---|
Create Index | Create an Deepsearch Index in AWS Kendra. |
View Index | View an existing index. |
Add Datasets | Add datasets to the index. |
Sync Jobs | Sync the documents to the index. |
Search | Search for the indexed document data. |
Delete Index | Delete an existing index. |
Add Datasets
You can add datasets to the index using the Add datasets to Index option. Note that only datasets containing unstructured files can be indexed. For more information on the types of supported documents, refer to the AWS Documentation.
Sync Jobs
Once the datasets are added to the index, you can run the Sync job
option to add the documents (structured and unstructured) from the datasets to the index.
Once the sync job operation is triggered and the job is in the SYNCING state, you can take the following actions:
- Refresh the jobs using the 'Refresh' button on the right side to get the latest status of the job.
- Stop the syncing of the datasets using the 'Stop' (red) button beside the SYNCING status of the respective job.
You can also view the details of the job, such as the Status, Datasets synced, Error message (if failed), etc.
Search
Once you've executed the sync-job and indexed documents, you can search for data within the documents using the search bar in the 'Search' tab. Upon entering a keyword, the search will be conducted through all indexed documents and results will be displayed with the following options:
- Add/Remove tags: You can add tags to the files and remove tags from the files.
- Download: You can download the file.
- File Preview: You can preview the file. Supported file formats are txt, json, docx, pdf and html.
- View Dataset: You can navigate to the respective dataset in which the file resides.
Please refer to the animation below on how to search for data:
Index Details
If the user has sufficient permissions to view an index, only then the index information can be viewed.
When the user enables auto-termination on the deepsearch index, the following details will be displayed: Remaining Time, which denotes the amount of time (rounded to the nearest upper hour) left for auto-termination.
In the below image, the auto-termination time is set to June-2-2021 18:51, but the deepsearch index will be deleted at 2021-06-02 19:00, as the termination process is scheduled to run at the whole hour.
In the details page, the Estimated Cost of the deepsearch index is also displayed, to show the approximate cost incurred since the creation time.