Version: v2.4 print this page

Omics Storage

Omics storage enables storing and sharing petabytes of raw genomics data efficiently. There are two types of Omics Storage:

Reference Store
Sequence Store

Reference Store

Reference store is used for storing reference genomes. There can only be one reference store in a given environment, so users will have to share the same reference store. The genomic files must be in FASTA format.

Sequence Store

Sequence store is used for storing genomic sequences of interest for further analysis. There can be any number of sequence stores in a given environment. The genomic files can be in FASTQ (gzip-only), BAM, or CRAM formats.

Note

To use Omics Storage, HCLS should be enabled in the environment.

How to create an Omics Storage store?

Click on + New Omics Storage.
Fill in the required fields (Details listed below).

Create Storage Store

Following fields are needed to create an Omics Storage store:

Properties	Details
Store Name	A name for the Omics Storage store.
Store Type	Type of Storage store - Omics Reference/Omics Sequence.
Description	Description of the store being created.
Keywords	Keywords indexed & searchable in app. Choose meaningful keywords to flag related stores & easily find them later.

How to import data into an Omics Storage store?

Import job is the process of importing genomic data into the Reference/Sequence Store that you created. To start an import job you need a dataset which contains your genomic data, and this dataset must be of type s3 with file type as others. This is an asynchronous process and will take some time to complete.

Importing a Reference Genome

A Reference Store import job can ingest only 1 file at a time, and the desired file must be specified from the input dataset. Successful completion of this job will create a Reference Genome, which can be viewed under the Reference Genomes tab. Following details are required for a Reference Store import job:

Properties	Details
Dataset	The dataset containing the desired reference genome file.
Job Name	The name for your import job.
Description	An optional description for the import job.
File Name	The genomic data file that you want to import from the selected dataset.

Following image depicts how to import a reference genome: Reference Import

Importing a Read Set

A Sequence Store import job will ingest all files in the input dataset, and it also supports pair ended sequences. Each valid sequence in the input dataset will result in an individual read set, provided the sequence is compatible with the specified reference genome. Hence, the completion of this job will create Read Set(s), which can be viewed under the Read Sets tab. Following details are required for a Sequence Store import job:

Properties	Details
Dataset	The dataset containing the desired genome files.
Job Name	The name for your import job.
Description	An optional description for the import job.
Readset Name	The desired name for the imported readset.
Reference Store	Specifies the Reference Store.
Reference Genome	The desired valid reference genome for this particular sequence.
Sample Id	The desired sample id for the imported readset.
Subject Id	The desired subject id for the imported readset.

Note

For a pair ended sequence, the file names in the input dataset must be of the format fileName_1, fileName_2 indicating that they belong as a pair. A single import job can be used to ingest multiple sequences, however the Readset Name, Sample Id and Subject Id will remain the same.

Following image depicts how to import a sequence: Sequence Import

Reference Store​

Sequence Store​

How to create an Omics Storage store?​

How to import data into an Omics Storage store?​

Importing a Reference Genome​

Importing a Read Set​