Skip to main content
info
This documentation is for version current of the product.
For the latest version(v2.7) documentation click here
 print this page

Create Athena Tables using Query Engine

Users can now create athena tables using Query Engine using either the CREATE TABLE AS SELECT(CTAS) or CREATE EXTERNAL TABLE queries. The created tables will be synced back to Amorphic datasets using the resource sync functionality if the query execution is successful.

CREATE TABLE AS SELECT Queries

These queries can be used to create new athena tables containing results of a SELECT query using existing athena or lakeformation datasets. The new datasets being created would have lower case names irrespective of the case being used in the query. So please use lower case table names in order to avoid confusion.

After the query execution is submitted, the users would see a dataset with the same name in the datasets list populated only with some basic information. If the query execution is successful, resource sync would then update the necessary details in the dataset and users can only view the query results under the Files tab in the dataset. If the query execution fails, resource sync would delete this dataset. Users need to wait until the delete operation is complete before they can run the query again or the query would fail with the error saying dataset already exists.

Query Syntax

CREATE TABLE [IF NOT EXISTS] <target_domain>.<new_table_name>
[ WITH (property_name = expression [, ...]) ]
AS SELECT <list_of_required_columns_seperated_by_commas>, upload_date
FROM <source_domain>.<existing_table_name> [ WHERE <condition> ]
[ WITH NO DATA ]

Please ensure that upload_date is extracted in the SELECT query as it is being added as a default partition. All properties are optional but external_location and partitioned_by will be added by default from backend if not specified.

property_nameDescription
table_typeCurrently only 'HIVE' table creation is supported
external_locationThe location where Athena saves the query results. It would be the S3 DLZ location for the new dataset being created. The format to be followed - 's3://<DLZ_BUCKET_NAME>/<target_domain>/<new_table_name>/' This property will be added by default from the backend if not specified
formatStorage format for CTAS query results. Currently only 'PARQUET' is supported
partitioned_byAn array list of columns by which the CTAS table will be partitioned. The partitions must be of type 'String'. Ensure that the columns being used as partitions are included in the Select query in the exact order as that in the partition array. To keep consistency with existing amorphic datasets, partition 'upload_date' will be added by default as the last partition in backend, so please ensure that 'upload_date' is included in the select query

Please refer to the AWS documentation for more information regarding the supported properties

Use Cases

  1. Reducing a large dataset into a smaller, more efficient dataset.
  2. Selecting a subset of the columns and rows to only deliver what the consumer of the data really needs.
  3. Partitioning a dataset that is not currently optimized to improve performance and reduce the cost.

CREATE EXTERNAL TABLE

This query can be used to create new athena tables with properties specified in the query.

Query Syntax

CREATE EXTERNAL TABLE [IF NOT EXISTS]
<domain_name>.<new_table_name> [(col_name data_type [COMMENT col_comment] [, ...] )]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[ROW FORMAT row_format]
[STORED AS file_format]
[WITH SERDEPROPERTIES (...)]
[LOCATION 's3://<DLZ_BUCKET_NAME>/<target_domain>/<new_table_name>/']
[TBLPROPERTIES ( property_name=property_value [, ...] ) ]

Ensure that the keywords in the query are used in the exact order mentioned above. LOCATION must be of the format s3://<DLZ_BUCKET_NAME>/<target_domain>/<new_table_name>/. upload_date will be added as a default partition to keep the table consistent with Amorphic datasets. Please ensure that the properties specified are supported by Amorphic or else table will not be synced back Please refer to the AWS documentation for more information regarding the supported properties