Query Engine
The Query Engine lets you use standard SQL to analyze data stored in Amazon S3 or Amazon Redshift. It can also generate sample queries from existing datasets. You can run SQL queries to view or analyze the contents of datasets and keep track of the history of query executions.
If you want to run queries on Redshift datasets or views, you must select "Redshift" as the Query Target Location in the Query Editor. You can view and download the results from the Amorphic UI. Redshift queries can return at most 5000 rows of results.
Redshift query results are stored temporarily and expire 24 hours after the query runs. Redshift datasets or views cannot be queried together with non-Redshift datasets or views. Cross-tenant querying is supported only for Redshift nodes of type "ra3".
Query Editor
The query editor is where you type the SQL query you want to run on the selected dataset.
You can clear the editor from its bottom-right corner.
Generate Sample Query
If you're unsure of what to write in the query editor, you can generate a sample query to get an idea of the query's structure.
To generate a sample query, select the Domain and Resource you want to query, and select the Tenant if you want to query the Redshift cluster. A sample query referring to the dataset will then appear in the query editor.
The Query Engine dropdowns display only the domains, datasets, and views the user has access to.
In a multi-tenancy deployment, you can use Amorphic to run queries and create views across different tenants or databases. To access an object from another tenant or database, use the full path notation, which includes the tenant name, schema name, and object name (e.g., tenant.schema.object).
When querying a Redshift resource (dataset/view), use tenant_name.schema_name.resource_name. When querying an S3 Athena resource, use schema_name.resource_name.
To access a particular column, use tenant_name.schema_name.object_name.column_name.
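For example, the two notations above can be sketched as follows. The tenant name sales, schema retail, and dataset orders are hypothetical names used only for illustration:

```sql
-- Cross-tenant Redshift resource: tenant_name.schema_name.resource_name
SELECT order_id, order_total
FROM sales.retail.orders
LIMIT 100;

-- S3 Athena resource: schema_name.resource_name (no tenant prefix)
SELECT order_id, order_total
FROM retail.orders
LIMIT 100;
```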
If you encounter issues with datatypes, it may be due to the use of datatypes other than the supported ones. Please check the AWS Athena supported datatypes.
Select Query Target Location
You can run the query with either AWS Athena or AWS Redshift. Select either 'Athena' or 'Redshift' as the Query Target Location, depending on your needs; the default is 'Athena'.
Select Workgroup
You can select the workgroup on which to run Athena queries. The default value is 'primary'. Governed datasets and Iceberg datasets are queried by selecting the AmazonAthenaEngineV3 workgroup. Other datasets can be queried using the default 'primary' workgroup.
Assume Role
In Amorphic, users can run queries on user-accessible datasets through the Query Engine, which reads the underlying data of the dataset stored in S3.
When Assume Role is enabled, Amorphic submits the query using the user's IAM role instead of a generic Lambda role. Unlike the generic Lambda role, the user role has permissions specific to the resources the user has access to.
When querying data from a view with target location Athena that is created on top of a LakeFormation dataset, make sure to use the 'Assume IAM Role'. This role cannot be used for querying system datasets, nor when the user has more than 220 datasets and views.
Query Results
Once the query run is successful, you can review the result on the Amorphic platform or download the file for use on an external platform. Both options appear on the console page after the query is executed. Queries with a Query Target Location of Redshift can return at most 5000 rows.
API Usage
You can perform a POST call on /queries to run queries against the query engine. The following parameters are required: QueryString and Encoding. Valid values for Encoding are "base64" and "none". Most queries can be run with Encoding set to "none". However, you may encounter a "forbidden" error due to WAF rules if QueryString contains strings like '%%'. To avoid this issue, set Encoding to "base64" and provide a base64-encoded QueryString.
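The call above can be sketched in Python. This is a minimal sketch, not a definitive client: the API base URL, the authorization header, and the response shape are assumptions; only the /queries path and the QueryString/Encoding parameters come from the documentation above.

```python
import base64
import json
from urllib import request

API_BASE = "https://api.example.amorphic.com"  # hypothetical base URL
AUTH_TOKEN = "<your-token>"                    # hypothetical auth token


def build_query_payload(sql: str) -> dict:
    """Build the /queries request body, base64-encoding the query
    to avoid WAF "forbidden" errors on strings like '%%'."""
    return {
        "QueryString": base64.b64encode(sql.encode()).decode(),
        "Encoding": "base64",
    }


def run_query(sql: str) -> dict:
    """POST the query to /queries and return the parsed JSON response
    (response shape is an assumption of this sketch)."""
    req = request.Request(
        f"{API_BASE}/queries",
        data=json.dumps(build_query_payload(sql)).encode(),
        headers={
            "Authorization": AUTH_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

Because the whole query travels as base64, even queries containing '%%' pass the WAF unchanged; the engine decodes them server-side per the Encoding parameter.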
History
The History tab, located next to the query editor, contains a listing of all historic query executions.
Common Errors
The Amorphic Query Engine uses the AWS Athena service on the back end. Occasionally, queries may fail and users may encounter common errors, such as data-schema mismatches or invalid SQL commands.
For more details on errors, refer to common errors. For Hive-related issues, refer to Hive data and metadata errors and search for the specific error.
For LakeFormation-Governed datasets, the following error can occur when a user tries to query the dataset after deleting one of its files:
GENERIC_INTERNAL_ERROR: Encountered an exception executed in context[...] with message[Encountered an exception executed in
context[retrieving blocks] with message[Encountered an exception executed in context[executing] with message[File not found: Request failed because the following file is missing:
https://<bucketname>.s3.<region>.amazonaws.com/<domain name>/<dataset name>/<partition column>%<partition value>/<username>_<dataset id>_<partition value>_<filename>.
(Service: AWSLakeFormation; Status Code: 400; Error Code: InvalidInputException; Request ID: <request_id>; Proxy: null)]]]