Skip to main content
info
This documentation is for version v2.5 of the product.
For the latest version(v2.7) documentation click here
Version: v2.5 print this page

Query Engine

The Query Engine lets you use standard SQL to analyze data stored in Amazon S3 or Amazon Redshift. It also allows you to generate sample queries from existing datasets. You can run SQL queries to view or analyze the contents of datasets and keep track of the history of query executions.

If you want to run queries on redshift datasets or views, you have to select the Query Target Location as “redshift" in the Query Editor. You can view and download the results from Amorphic UI. Redshift queries can only return up to 5000 rows of results.

info

Redshift query results are stored temporarily and they get expired after 24 hours of running the query. Redshift datasets or views cannot be queried along with non-redshift datasets or views. Cross-tenant querying is supported only for redshift nodes of type "ra3".

Query_Engine

Query Editor

Query editor is where you can type the SQL queries which you want to execute on the selected datasets. The query should be in SQL format.

You can clear the editor from the bottom right corner of the editor.

Generate Sample Query

If you're unsure of what to write in the query editor, you can generate a sample query to get an idea of the query's structure.

For generating the sample query, you need to select the Domain and Resource on which you want to run the query. Select the Tenant if you want to query the Redshift cluster. You can generate a sample query. After that, a sample query referring to the dataset will appear in the query editor.

Note

When using reserved keywords in the query, enclose the keywords with double-quotes.

example:
'GROUP' is a reserved keyword, querying dataset where one of the column is 'GROUP'
Wrong query: select GROUP from domain.dataset
Correct query: select "GROUP" from domain.dataset

Reserved keywords : redshift, athena

Note

Query Engine dropdowns will display only the user-accessible domains/datasets/views.

For a Multi tenancy deployment, you can use Amorphic to run queries and create views across different tenants or databases. To access an object from another tenant or database, you need to use the full path notation, which includes the tenant name, schema name, and object name. (e.g. tenant.schema.object.)

Note

When querying a redshift resource(dataset/view), use tenant_name.schema_name.resource_name. When querying an s3 athena resource, use schema_name.resource_name.

To access a particular column, use tenant_name.schema_name.object_name.column_name.

Note

If you encounter any issues regarding data types it might be due to usage of different data types other than the supported ones. Please check the AWS Athena supported data types.

Select Query Target Location

You can choose to run the query with either AWS Athena or AWS Redshift. The default value is 'Athena'. Select either 'Athena' or 'Redshift' as the Query Target Location, depending on your needs.

Select Workgroup

You can also select workgroups to execute Athena queries. The default value is 'primary'.

Governed datasets and Iceberg datasets are queried by selecting AmazonAthenaEngineV3 workgroup.

Other datasets can be queried by selecting default value primary workgroup.

Assume Role

In Amorphic, user can run certain queries on top of user accessible datasets through Query engine which reads the underlying data of the dataset stored in S3.

When Assume role is enabled, Amorphic will use User IAM role for submitting the query to consume the data instead of a generic lambda role. Unlike the generic lambda role, the user role has permissions specific to the resources which the user has access to.

Note

While trying to query data from an view with target location athena which is created on top of a LakeFormation dataset, make sure to use the 'Assume IAM Role'. This role cannot be used for querying system datasets. This role also cannot be used when the user has more than 220 datasets and views

Query Results

Once the Query Run is successful, you can review the result on the Amorphic platform or download the file to use on an external platform. Both options will appear on the console page after the query is executed. Queries with a Query Target Location of Redshift can only return up to 5000 rows.

API Usage

You can perform a POST call on /queries to run queries against the query engine. The following parameters are required:

  1. QueryString
  2. Encoding

Valid values for Encoding are "base64" and "none". Most queries can be run using Encoding set to "none". However, you may encounter a "forbidden" issue due to WAF rules if QueryString contains strings like '%%'. To avoid this issue, we suggest using Encoding set to "base64" and providing a "base64" encoded QueryString.

History

The History tab, located next to the query editor, contains a listing of all historic query executions.

Common Errors

The Amorphic Query Engine uses AWS Athena service in the back end. Occasionally, queries may fail and users may encounter common errors, such as data-schema mismatch or invalid SQL commands.

For more details on errors, refer common errors.

For more details, refer Hive data and metadata errors, and search for the specific error.

Hive errors

For LakeFormation Governed datasets, The following error can happen when user tries to query a Lakeformation-Governed dataset after deleting a file in the dataset:

GENERIC_INTERNAL_ERROR: Encountered an exception executed in context[...] with message[Encountered an exception executed in
context[retrieving blocks] with message[Encountered an exception executed in context[executing] with message[File not found: Request failed because the following file is missing:
https://<bucketname>.s3.<region>.amazonaws.com/<domain name>/<dataset name>/<partition column>%<partition value>/<username>_<dataset id>_<partition value>_<filename>.
(Service: AWSLakeFormation; Status Code: 400; Error Code: InvalidInputException; Request ID: <request_id>; Proxy: null)]]]