Version: v2.5

Omics Storage

Omics storage enables storing and sharing petabytes of raw genomics data efficiently. There are two types of Omics Storage:

Reference Store

A Reference Store holds reference genomes. Each environment can have only one Reference Store, so all users share it. The genomic files must be in FASTA format.

Sequence Store

A Sequence Store holds genomic sequences of interest for further analysis. An environment can have any number of Sequence Stores. The genomic files can be in FASTQ (gzip-compressed only), BAM, or CRAM format.
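As a quick illustration of the format rules for the two store types, the sketch below checks a file name against the formats each store accepts. The function name, the store type keys, and the exact extension lists are assumptions made for this example; they are not part of the product.

```python
from pathlib import PurePosixPath

# Formats come from the descriptions above; the concrete extension
# spellings (.fa, .fna, .fq.gz, ...) are assumptions for illustration.
ALLOWED_EXTENSIONS = {
    "reference": {".fasta", ".fa", ".fna"},
    "sequence": {".fastq.gz", ".fq.gz", ".bam", ".cram"},
}


def is_supported(file_name: str, store_type: str) -> bool:
    """Return True if file_name looks like a format the store type accepts."""
    name = PurePosixPath(file_name).name.lower()
    return any(name.endswith(ext) for ext in ALLOWED_EXTENSIONS[store_type])
```

A check like this can be run over a dataset listing before starting an import, so that an uncompressed FASTQ file, for example, is caught early.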

Note

To use Omics Storage, HCLS must be enabled in the environment.

How to create an Omics Storage store?

  1. Click on + New Omics Storage.
  2. Fill in the required fields (Details listed below).

Create Storage Store

The following fields are required to create an Omics Storage store:

Properties | Details
Store Name | A name for the Omics Storage store.
Store Type | Type of Storage store - Omics Reference/Omics Sequence.
Description | Description of the store being created.
Keywords | Keywords indexed & searchable in app. Choose meaningful keywords to flag related stores & easily find them later.
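For readers who script around the form, the fields above can be modeled as a small record. This is a minimal sketch: `OmicsStoreRequest` is a hypothetical name and not a type from the product, and the empty defaults for description and keywords are an assumption (the form may require them).

```python
from dataclasses import dataclass, field

# The two store types listed in the table above.
STORE_TYPES = ("Omics Reference", "Omics Sequence")


@dataclass
class OmicsStoreRequest:
    """Hypothetical record mirroring the form fields; not a product API type."""

    store_name: str
    store_type: str
    description: str = ""  # assumed default; the form may require a value
    keywords: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject blank names and unknown store types before submitting.
        if not self.store_name.strip():
            raise ValueError("Store Name must not be empty")
        if self.store_type not in STORE_TYPES:
            raise ValueError(f"Store Type must be one of {STORE_TYPES}")
```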

How to import data into an Omics Storage store?

An import job imports genomic data into the Reference or Sequence Store that you created. To start an import job, you need a dataset that contains your genomic data; this dataset must be of type s3 with file type others. Import jobs run asynchronously and take some time to complete.
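Because an import job is asynchronous, a script that depends on the imported data has to wait for the job to finish. The sketch below shows one generic way to poll for completion. It assumes the caller supplies a `get_status` function that returns the job's current state; that function and the status names used here are hypothetical and not taken from the product.

```python
import time
from typing import Callable


def wait_for_import(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    terminal = {"COMPLETED", "FAILED", "CANCELLED"}  # hypothetical status names
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"import job still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter exists only so the loop can be exercised without real delays; in normal use the default `time.sleep` applies.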

Importing a Reference Genome

  • A Reference Store import job can ingest only one file at a time, and the desired file must be specified from the input dataset. Successful completion of this job creates a Reference Genome, which can be viewed under the Reference Genomes tab. The following details are required for a Reference Store import job:

Properties | Details
Dataset | The dataset containing the desired reference genome file.
Job Name | The name for your import job.
Description | An optional description for the import job.
File Name | The genomic data file that you want to import from the selected dataset.

The following image shows how to import a reference genome: Reference Import

Importing a Read Set

  • A Sequence Store import job ingests all files in the input dataset and supports paired-end sequences. Each valid sequence in the input dataset produces its own read set, provided the sequence is compatible with the specified reference genome. Successful completion of this job therefore creates one or more Read Sets, which can be viewed under the Read Sets tab. The following details are required for a Sequence Store import job:

Properties | Details
Dataset | The dataset containing the desired genome files.
Job Name | The name for your import job.
Description | An optional description for the import job.
Readset Name | The desired name for the imported readset.
Reference Store | Specifies the Reference Store.
Reference Genome | The desired valid reference genome for this particular sequence.
Sample Id | The desired sample id for the imported readset.
Subject Id | The desired subject id for the imported readset.
Note

For a paired-end sequence, the file names in the input dataset must follow the format fileName_1, fileName_2 to indicate that they belong together as a pair. A single import job can ingest multiple sequences; however, the Readset Name, Sample Id, and Subject Id will be the same for all of them.
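To preview how the files in a dataset would pair up under the fileName_1 / fileName_2 convention, a small helper can group them before the import is started. This is a sketch under two assumptions: the `_1` / `_2` suffix sits directly before the file extension, and the part of the name before the extension contains no dots. The function name is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Matches stems such as "sampleA_1" or "sampleA_2".
PAIR_SUFFIX = re.compile(r"^(?P<stem>.+)_(?P<mate>[12])$")


def group_reads(file_names):
    """Group file names by sample stem, putting name_1 and name_2 together."""
    groups = defaultdict(list)
    for file_name in sorted(file_names):
        base = PurePosixPath(file_name).name
        stem = base.split(".", 1)[0]  # drop extensions such as .fastq.gz
        match = PAIR_SUFFIX.match(stem)
        key = match.group("stem") if match else stem
        groups[key].append(file_name)
    return dict(groups)
```

Groups with two entries correspond to paired-end sequences, and groups with one entry to single files. Since all read sets from one job share the same Readset Name, Sample Id, and Subject Id, a grouping like this can also help decide whether a dataset should be split across several jobs.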

The following image shows how to import a sequence: Sequence Import