Query Engine
The Query Engine lets you use standard SQL to analyze data stored in Amazon S3 or Amazon Redshift. You can generate sample queries from existing datasets, run SQL queries to view or analyze the contents of datasets, and keep track of the history of query executions.
To run queries on Redshift datasets or views, select 'Redshift' as the Query Target Location in the Query Editor. You can view and download the results from the Amorphic UI. Redshift queries can return at most 5000 rows of results.
Redshift query results are stored temporarily and expire 24 hours after the query runs. Redshift datasets or views cannot be queried together with non-Redshift datasets or views. Cross-tenant querying is supported only for Redshift clusters with "ra3" node types.
Query Editor
The Query Editor is where you type the SQL queries you want to execute on the selected datasets.
You can clear the editor from its bottom-right corner.
Generate Sample Query
If you're unsure what to write in the query editor, you can generate a sample query to get an idea of the query structure.
To generate a sample query, select the Domain and Resource you want to query; select the Tenant if you want to query the Redshift cluster. A sample query referring to the selected dataset will then appear in the query editor.
When using reserved keywords in a query, enclose them in double quotes.
Example: 'GROUP' is a reserved keyword. To query a dataset where one of the columns is named 'GROUP':
Wrong query: select GROUP from domain.dataset
Correct query: select "GROUP" from domain.dataset
Query Engine dropdowns will display only the user-accessible domains/datasets/views.
For a multi-tenancy deployment, you can use Amorphic to run queries and create views across different tenants or databases. To access an object from another tenant or database, you need to use the full path notation, which includes the tenant name, schema name, and object name (e.g. tenant.schema.object).
When querying a Redshift resource (dataset/view), use tenant_name.schema_name.resource_name. When querying an S3/Athena resource, use schema_name.resource_name.
To access a particular column, use tenant_name.schema_name.object_name.column_name.
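For illustration, a minimal sketch of these path notations, using hypothetical names (a tenant called sales, a schema/domain called marketing, and datasets called orders and campaigns):

```sql
-- Redshift resource in another tenant: tenant_name.schema_name.resource_name
select order_id, order_total from sales.marketing.orders;

-- S3/Athena resource: schema_name.resource_name
select campaign_id from marketing.campaigns;

-- Referencing a particular column via the full path
select sales.marketing.orders.order_id from sales.marketing.orders;
```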
If you encounter issues regarding data types, it might be because the query uses data types other than the supported ones. Please check the AWS Athena supported data types.
The Amorphic Query Engine currently supports only SQL queries starting with the following commands (an example follows the list):
- For S3Athena and Redshift Datasets and Views: "SELECT", "MSCK REPAIR TABLE", "CREATE TABLE", "CREATE EXTERNAL TABLE", "SHOW PARTITIONS"
- For Iceberg Datasets: "SELECT", "INSERT INTO", "UPDATE", "DELETE FROM", "DESCRIBE", "SHOW TBLPROPERTIES", "OPTIMIZE", "MERGE INTO", "SHOW COLUMNS", "VACUUM"
- For Hudi Datasets: "SELECT", "INSERT INTO", "UPDATE", "DELETE FROM", "DESCRIBE", "SHOW TBLPROPERTIES", "SHOW COLUMNS", "ALTER TABLE"
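For instance, a hedged sketch of an Iceberg upsert with MERGE INTO, using hypothetical names (domain.iceberg_orders as the target dataset and domain.orders_staging as the source):

```sql
-- Upsert staged rows into an Iceberg dataset (all names are hypothetical)
merge into domain.iceberg_orders as t
using domain.orders_staging as s
  on t.order_id = s.order_id
when matched then update set order_total = s.order_total
when not matched then insert (order_id, order_total) values (s.order_id, s.order_total);
```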
Stop Query
Users can stop a running query whenever necessary.
Select Query Target Location
You can run the query with either AWS Athena or Amazon Redshift by selecting 'Athena' or 'Redshift' as the Query Target Location, depending on your needs. The default value is 'Athena'.
Select Workgroup
You can also select the workgroup used to execute Athena queries. The default value is 'primary'. Governed datasets and Iceberg datasets are queried using the AmazonAthenaEngineV3 workgroup; other datasets can be queried using the default 'primary' workgroup.
Assume Role
In Amorphic, users can run queries on user-accessible datasets through the Query Engine, which reads the underlying data of the dataset stored in S3.
When Assume Role is enabled, Amorphic uses the user's IAM role to submit the query and consume the data instead of a generic Lambda role. Unlike the generic Lambda role, the user role has permissions specific to the resources the user has access to.
When querying a view with target location Athena that is created on top of a LakeFormation dataset, make sure to use 'Assume IAM Role'. This role cannot be used for querying system datasets, nor when the user has more than 220 datasets and views.
Query Results
Once the query run succeeds, you can review the result on the Amorphic platform or download the file for use on an external platform. Both options appear on the console page after the query executes. On the Amorphic platform, only up to 50 records can be viewed; to access all records, download the full result.
API Usage
You can perform a POST call on /queries to run queries against the query engine. The following parameters are required:
- QueryString
- Encoding
Valid values for Encoding are "base64" and "none". Most queries can be run with Encoding set to "none". However, you may encounter a "forbidden" issue due to WAF rules if the QueryString contains strings like '%%'. To avoid this issue, we suggest setting Encoding to "base64" and providing a base64-encoded QueryString.
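As a minimal sketch in Python, assuming a JSON request body and token-based authentication (the base URL, header names, and response handling below are placeholders, not confirmed API details):

```python
import base64

import requests

# Placeholder values: substitute your deployment's API endpoint and credentials.
API_BASE = "https://your-amorphic-api.example.com"
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

# Queries containing strings like '%%' can be blocked by WAF rules when sent
# as plain text, so base64-encode the query and declare the encoding.
query = "select * from domain.dataset where note like '%%priority%%'"
payload = {
    "QueryString": base64.b64encode(query.encode("utf-8")).decode("utf-8"),
    "Encoding": "base64",
}

response = requests.post(f"{API_BASE}/queries", json=payload, headers=HEADERS)
print(response.status_code, response.text)
```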
History
The History tab, located next to the query editor, lists all previous query executions.
Common Errors
The Amorphic Query Engine uses the AWS Athena service in the back end. Occasionally, queries may fail with common errors such as a data-schema mismatch or invalid SQL commands.
For more details on errors, refer to common errors.
For Hive-related issues, refer to Hive data and metadata errors and search for the specific error.
The following error can occur when a user queries a LakeFormation-governed dataset after deleting a file in the dataset:
GENERIC_INTERNAL_ERROR: Encountered an exception executed in context[...] with message[Encountered an exception executed in
context[retrieving blocks] with message[Encountered an exception executed in context[executing] with message[File not found: Request failed because the following file is missing:
https://<bucketname>.s3.<region>.amazonaws.com/<domain name>/<dataset name>/<partition column>%<partition value>/<username>_<dataset id>_<partition value>_<filename>.
(Service: AWSLakeFormation; Status Code: 400; Error Code: InvalidInputException; Request ID: <request_id>; Proxy: null)]]]