SageMaker Studio
The Amorphic platform integrates with Amazon SageMaker Studio to accelerate machine learning workflows.
Amazon SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, and deploy models to production without leaving SageMaker Studio. It allows you to quickly switch environments and collaborate seamlessly within your organization to build ML models at scale.
Using SageMaker Studio through Amorphic streamlines the workflow because Amorphic creates the numerous required configurations for you. Users get the complete capabilities of Amazon SageMaker and notebooks for advanced development of machine learning models and pipelines.
Studio Operations
Amorphic Studio provides the following operations.
Operation | Description |
---|---|
Create Studio | Create a studio domain and the required resources in Amazon SageMaker. |
Update Studio | Update the attributes of the studio. |
Update Resource Access | Update resource access permissions for the studio. |
Delete Studio | Delete the studio components. |
- Default service quotas:
  - Total domains: 2
  - User Profiles: 2
  - Domains with RStudioServerPro Apps: 1
- Refer to the AWS service quotas and raise a quota increase request with AWS based on your use case.
- If a service quota is exceeded, studio creation fails with an error similar to this:
LimitExceededError: Domain-level App [arn:aws:sagemaker:<region>:<>:app/<>/domain-shared/RStudioServerPro/default] failed to start: [The account-level service limit 'RStudioServerPro Apps running on system instances' is 1 Apps, with current utilization of 1 Apps and a request delta of 1 Apps. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota.].
- Sharing studios with tags is currently not supported.
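When studio creation fails with a quota error like the one above, it can help to pull the quota name and numbers out of the message before raising a request with AWS. The following is a minimal sketch, not part of Amorphic: the `parse_quota_error` helper and the `QuotaError` type are hypothetical names, and the regular expression assumes the message keeps the wording shown in the sample error.

```python
import re
from typing import NamedTuple, Optional


class QuotaError(NamedTuple):
    """Numbers extracted from a SageMaker quota error (hypothetical helper type)."""
    quota_name: str
    limit: int
    utilization: int
    requested_delta: int

    @property
    def required_limit(self) -> int:
        # Smallest quota value that would let the failed request succeed.
        return self.utilization + self.requested_delta


# Assumes the wording of the sample error message; other messages return None.
_QUOTA_PATTERN = re.compile(
    r"service limit '(?P<name>[^']+)' is (?P<limit>\d+) \w+, "
    r"with current utilization of (?P<used>\d+) \w+ "
    r"and a request delta of (?P<delta>\d+) \w+"
)


def parse_quota_error(message: str) -> Optional[QuotaError]:
    """Return the quota details found in `message`, or None if it does not match."""
    match = _QUOTA_PATTERN.search(message)
    if match is None:
        return None
    return QuotaError(
        quota_name=match["name"],
        limit=int(match["limit"]),
        utilization=int(match["used"]),
        requested_delta=int(match["delta"]),
    )
```

The `required_limit` value is only a lower bound for the increase to request; the right target depends on how many studios you plan to run.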
Create Studio
To create a Studio:
- Click on + New Studio
- Fill in the details shown in the table:
Attribute | Description |
---|---|
Studio Name | Unique name for Studio. |
Description | Describe the studio's purpose and important details. |
Allowed Instances List | Select the list of ML compute instances with which apps can be created in the studio. |
Direct Internet Access | Sets whether SageMaker provides internet access to the studio. |
Volume Size (GB) | Default storage volume size (in GB) for apps created in the studio. |
Max Volume Size (GB) | Maximum storage volume size (in GB) for apps created in the studio. Defaults to 100 GB. |
RStudio Access | Select whether to enable or disable access to the RStudio app in the studio. |
Datasets Access | Select the datasets required for the studio, with read or write access. |
Parameter Access | Select the SSM parameters that can be used inside the studio environment. |
Jupyter Lab Instance Type | Select the instance type used to create the Jupyter Lab app in the studio. Defaults to the first value in the Allowed Instances List if not selected. |
- Studio creation provisions multiple underlying resources and can take around 5-10 minutes to reach InService status.
- Read access cannot be granted to the studio for datasets that have Lake Formation as the target location.
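The defaults described in the attribute table can be illustrated with a small sketch. This is not Amorphic's API: the `StudioRequest` class, its field names, and `normalize_studio_request` are hypothetical, and the check that the default volume size must not exceed the maximum is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional

DEFAULT_MAX_VOLUME_GB = 100  # "Defaults to 100 GB" in the attribute table


@dataclass(frozen=True)
class StudioRequest:
    """Hypothetical model of the Create Studio form; field names are illustrative."""
    name: str
    volume_size_gb: int
    allowed_instances: List[str] = field(default_factory=list)
    max_volume_size_gb: int = DEFAULT_MAX_VOLUME_GB
    jupyter_lab_instance_type: Optional[str] = None


def normalize_studio_request(request: StudioRequest) -> StudioRequest:
    """Apply the documented defaults and a few assumed sanity checks."""
    if not request.name.strip():
        raise ValueError("Studio Name is required")
    # Assumption: the default volume size cannot exceed the maximum volume size.
    if request.volume_size_gb > request.max_volume_size_gb:
        raise ValueError("Volume Size (GB) exceeds Max Volume Size (GB)")
    # Documented: the Jupyter Lab instance type defaults to the first allowed instance.
    if request.jupyter_lab_instance_type is None and request.allowed_instances:
        return replace(request, jupyter_lab_instance_type=request.allowed_instances[0])
    return request


if __name__ == "__main__":
    studio = normalize_studio_request(
        StudioRequest(
            name="demo-studio",
            volume_size_gb=5,
            allowed_instances=["ml.t3.medium", "ml.m5.large"],
        )
    )
    print(studio.jupyter_lab_instance_type, studio.max_volume_size_gb)
```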
Studio Details
When a new studio is created, Amorphic creates an Amazon SageMaker domain and the underlying resources (user profiles, spaces, and apps) for consumption.
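If you have direct access to the AWS account and know the ID of the SageMaker domain behind a studio (an assumption, since Amorphic normally tracks the status for you), you could poll the domain until it is in service. The sketch below takes any client object with a `describe_domain` method, such as `boto3.client("sagemaker")`; the function name and the polling defaults are illustrative.

```python
import time
from typing import Callable


def wait_for_domain_in_service(
    sagemaker_client,
    domain_id: str,
    poll_seconds: int = 30,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe_domain until the domain status is InService.

    Raises RuntimeError on a failed status and TimeoutError after max_polls.
    The defaults give roughly 15 minutes, which covers the 5-10 minutes
    that studio creation usually takes.
    """
    for _ in range(max_polls):
        status = sagemaker_client.describe_domain(DomainId=domain_id)["Status"]
        if status == "InService":
            return status
        # SageMaker reports Failed, Update_Failed, or Delete_Failed on errors.
        if status.endswith("Failed"):
            raise RuntimeError(f"Domain {domain_id} is in status {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"Domain {domain_id} did not reach InService in time")


# Example usage (requires boto3 and AWS credentials; the domain ID is a placeholder):
#   import boto3
#   wait_for_domain_in_service(boto3.client("sagemaker"), "d-xxxxxxxxxxxx")
```

Passing the client and the `sleep` function as arguments keeps the helper easy to exercise without calling AWS.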
Users can launch the Studio IDE using the User Profile URL available in the details page.
Note: If a user does not have access to any of the underlying datasets or parameters attached to the Studio, they will not be able to access the URL and will see an error message listing the resources they don't have access to.
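The access rule in this note amounts to a set difference between the resources attached to the studio and the resources the user can access. The sketch below only illustrates that logic; the function names and the error text are hypothetical and do not reflect Amorphic's implementation.

```python
from typing import Iterable, List


def missing_resources(studio_resources: Iterable[str], user_resources: Iterable[str]) -> List[str]:
    """Return the studio's datasets/parameters that the user cannot access, sorted."""
    return sorted(set(studio_resources) - set(user_resources))


def check_studio_access(studio_resources: Iterable[str], user_resources: Iterable[str]) -> None:
    """Raise PermissionError naming every attached resource the user lacks."""
    missing = missing_resources(studio_resources, user_resources)
    if missing:
        raise PermissionError("No access to: " + ", ".join(missing))


if __name__ == "__main__":
    # Resource names are placeholders for illustration.
    attached = ["dataset/sales", "dataset/customers", "parameter/api-key"]
    granted = ["dataset/sales"]
    print(missing_resources(attached, granted))
```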
Users cannot create their own spaces within the studio. Amorphic will create collaborative spaces by default for users to use.
A default Jupyter Lab app is created for the studio using the Jupyter Lab Instance Type. This is a collaborative app, so multiple users can use it.
If users want to update the configuration or stop the Jupyter Lab app, they can use the Stop space button and make the necessary changes.
- Notify all users before stopping the Jupyter Lab app. Users working in the space will lose any work that is in memory or unsaved, and they will need to refresh their page to learn of the shutdown.
- The Jupyter Lab Instance Type attribute shown in the studio details is the default instance type with which the Jupyter Lab app is created. If users modify the configuration from within the studio, this attribute is not updated.
- Currently, only Jupyter Lab and RStudio (if enabled) can be used from the Studio IDE. Users can also launch these apps directly from the Studio Apps tab on the studio details page in Amorphic.
Using RStudio IDE
To use the RStudio IDE in Studios, you need a valid license provisioned through AWS License Manager. Follow the instructions mentioned in the documentation.
- RStudio has two types of users: admins and users. The user who creates the studio is an admin by default.
- In Amorphic, a user with owner access to the studio is an RStudio admin, and a user with read-only access to the studio is an RStudio user.
- Admin users can access a dashboard that shows details such as the number of sessions, users, and instance utilization.
- The dashboard can be opened using the Admin Dashboard URL available on the Studio Apps page.
- The application can be accessed using the App URL available on the Studio Apps page or from the Studio IDE using the User Profile URL.
- RStudio sessions can only be created with instances that are specified in the Allowed Instances List of the studio.
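The mapping from Amorphic studio access to RStudio role described above can be written down as a small lookup. This is an illustrative sketch only: the access strings `"owner"` and `"read-only"` and the function names are assumptions, not Amorphic identifiers.

```python
from enum import Enum


class RStudioRole(Enum):
    ADMIN = "Admin"
    USER = "User"


def rstudio_role_for(studio_access: str) -> RStudioRole:
    """Map a studio access level to the RStudio role described in this section."""
    access = studio_access.strip().lower()
    if access == "owner":
        return RStudioRole.ADMIN
    if access == "read-only":
        return RStudioRole.USER
    raise ValueError(f"Unknown studio access level: {studio_access!r}")


def can_open_admin_dashboard(studio_access: str) -> bool:
    """Only RStudio admins can use the Admin Dashboard URL."""
    return rstudio_role_for(studio_access) is RStudioRole.ADMIN
```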
Update Studio
- Users can update the Allowed Instances List used for creating apps within the studio, if needed.
Delete Studio
- Studio deletion can take 10-15 minutes, depending on the number of linked users and apps created in the studio.
Update Resource Access
Users can attach Amorphic datasets to a studio in read-only or write mode to access them inside the IDE. Parameters can also be updated in this manner.
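Once a parameter is attached, code running inside the studio could read it through AWS Systems Manager. The sketch below is hedged: it assumes the studio's execution role is allowed to call `ssm:GetParameter` for attached parameters, and the helper name and the parameter name in the usage comment are placeholders. It accepts any client with a `get_parameter` method, such as `boto3.client("ssm")`.

```python
def read_parameter(ssm_client, name: str) -> str:
    """Return the value of an SSM parameter, decrypting SecureString values."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


# Example usage inside a notebook (requires boto3; the name is a placeholder):
#   import boto3
#   value = read_parameter(boto3.client("ssm"), "my-attached-parameter")
```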
By default, a studio allows all instance types when creating an app inside it. From the Amorphic UI, users can provide a list of instance types that may be used inside the Studio IDE to avoid accidentally creating high-cost instances.
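The effect of the allowed instances list can be sketched as a simple guard. This is an illustration rather than Amorphic's enforcement logic; it assumes that an empty list means no restriction, matching the default behavior described above, and the function name is hypothetical.

```python
from typing import Sequence


def is_instance_allowed(instance_type: str, allowed_instances: Sequence[str]) -> bool:
    """Return True if an app may be created with `instance_type`.

    Assumption: an empty allowed list means every instance type is permitted.
    """
    if not allowed_instances:
        return True
    return instance_type in allowed_instances


if __name__ == "__main__":
    allowed = ["ml.t3.medium", "ml.m5.large"]
    for candidate in ("ml.t3.medium", "ml.p4d.24xlarge"):
        print(candidate, is_instance_allowed(candidate, allowed))
```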
Studio Benefits
Amazon SageMaker Studio offers a unified experience for ML development. ML teams can perform the complete ML workflow in a single web-based visual interface.
Access to pre-trained ML models, built-in algorithms, and prebuilt ML solutions
Studio Use Cases
Unify your end-to-end ML development in SageMaker Studio with the most comprehensive ML tools all in one place. SageMaker offers high-performing MLOps tools to help you automate and standardize ML workflows and governance tools to support transparency and auditability across your organization.
Build foundation models faster in SageMaker Studio with access to a wide range of publicly available models, notebooks backed by high performance compute for fine-tuning, and ability to scale to distributed training directly from Studio notebooks.
SageMaker Studio offers a unified experience to perform all data analytics and ML workflows. Create, browse, and connect to Amazon EMR clusters. Build, test, and run interactive data preparation and analytics applications with AWS Glue interactive sessions. Monitor and debug Spark jobs using familiar tools such as Spark UI, all from SageMaker Studio notebooks.