Dataload Limits
How data load limits behave depends mainly on the Data load throttling setting.
- If Data load throttling is enabled, the dataset files will be processed one by one, and the time taken to complete the process will depend on the number of files uploaded by the user.
- Files uploaded from the Amorphic UI directly from the dataset details page will be processed immediately, even if data throttling is enabled. However, files uploaded through any jobs, ingestion, etc., still undergo the normal data load throttling procedures.
- If Data load throttling is disabled, the dataset files are processed right away.
The dataset file upload process depends entirely on the functionality described below. It applies ONLY to append and update type datasets.
For use cases with a large number of files, the best data load limit for Redshift is 90-100.
No action is required to enable or disable data load throttling. The Amorphic system will automatically adjust it based on the number of files being processed in the application. However, users have the option to manually enable or disable it if needed.
If Data load throttling is enabled manually by a user, it will not turn off automatically.
Dataload Throttling automatic process
- If the number of processing files is within the throttle limit, then the Amorphic system will disable throttling.
- If the number of processing files exceeds the limit, then the Amorphic system will enable throttling and process the files in a queue.
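The automatic behaviour described above, together with the rule that manually enabled throttling is never switched off automatically, can be sketched roughly as follows. This is only an illustration of the decision logic: `ThrottleState`, `auto_adjust`, and the parameter names are hypothetical and are not part of the Amorphic API.

```python
from dataclasses import dataclass


@dataclass
class ThrottleState:
    # Hypothetical model of the throttling setting (not an Amorphic type).
    enabled: bool
    manually_enabled: bool = False


def auto_adjust(state: ThrottleState, processing_files: int, throttle_limit: int) -> ThrottleState:
    """Return the throttling state after one automatic adjustment."""
    if state.manually_enabled:
        # Throttling enabled manually by a user is not turned off automatically.
        return ThrottleState(enabled=True, manually_enabled=True)
    # Within the limit: throttling is disabled. Above the limit: it is enabled.
    return ThrottleState(enabled=processing_files > throttle_limit)


print(auto_adjust(ThrottleState(enabled=False), 250, 300))  # within the limit
print(auto_adjust(ThrottleState(enabled=False), 450, 300))  # exceeds the limit
```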
Dataload Limits
Dataload limits let you set a batch limit for processing the files you upload to a dataset. These limits differ for each target location (S3, S3Athena, Lakeformation, Dynamodb, AuroraMySQL).
For example, if a user uploads 1000 files to an S3 type of dataset, the files will be processed based on the S3 limit. If the limit is 300, up to 300 files will be processed in parallel at the system level, and the remaining files will be queued. The system will poll the queue every 3 minutes and trigger the processing of files according to the limits mentioned below.
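To make the arithmetic in this example concrete, here is a small simulation of how 1000 queued files might drain with a limit of 300 and a 3-minute polling interval. It assumes, purely for illustration, that every batch finishes before the next poll so a full batch can be triggered each cycle; real timings depend on how long each file takes to process. The function name `drain_schedule` is hypothetical.

```python
def drain_schedule(total_files: int, limit: int, poll_minutes: int = 3) -> list[tuple[int, int]]:
    """Return (minute, files_triggered) pairs, one per polling cycle.

    Assumption: each batch completes before the next poll, so up to
    `limit` files can be triggered at every cycle.
    """
    schedule = []
    remaining = total_files
    minute = 0
    while remaining > 0:
        batch = min(limit, remaining)  # never trigger more than the limit
        schedule.append((minute, batch))
        remaining -= batch
        minute += poll_minutes
    return schedule


for minute, count in drain_schedule(1000, 300):
    print(f"minute {minute}: trigger {count} files")
```

Under that assumption, the 1000 files are triggered in four cycles: three batches of 300 followed by one batch of 100.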
The specified ranges for each target location are calculated based on AWS limits and performance tests.
You can also view the count of recent dataload executions and the number of messages waiting in the SQS queues to be processed under Infra Management.
Update Dataload Limits
You can update the data load limits for all applicable target locations by using the "Set Limits" option. In the Set Data Load Limits popup, enter the new values in the respective target location fields and click Update Limits to apply the changes.
- The minimum and maximum values for limits will be displayed in the respective helper tooltips.
- If the page does not show the new limits after a successful update, refresh the limits after a few seconds. The delay may be caused by the AWS SSM (Parameter Store) service.
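If limits are checked from a script rather than the UI, the same "refresh after a few seconds" advice can be expressed as a short retry loop. The sketch below is generic: `read_limit` stands in for whatever call returns the currently stored limit, and it, along with `wait_for_limit` and the default retry values, is an assumption rather than an Amorphic or AWS API.

```python
import time
from typing import Callable


def wait_for_limit(
    read_limit: Callable[[], int],
    expected: int,
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> bool:
    """Re-read the limit until it matches `expected` or attempts run out."""
    for attempt in range(attempts):
        if read_limit() == expected:
            return True
        if attempt < attempts - 1:
            # Give the parameter store a moment to reflect the update.
            time.sleep(delay_seconds)
    return False


# Usage with a stand-in reader that returns a stale value twice.
stale_then_fresh = iter([300, 300, 400])
print(wait_for_limit(lambda: next(stale_then_fresh), 400, delay_seconds=0))
```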
View Lambda Concurrency and Dataload Statistics
You can view the account-level Lambda concurrency and refresh it to get the latest value. The latest value is also fetched on every page load.
You can view the following dataload statistics for all applicable target locations:
- Recent Executions: Number of recent file load executions across the application. Hover on the 'time' icon beside the count to view the most recent execution time.
- Current Messages in Queue: Number of messages available in the queue across the application.