graph TD;
  GenoHub[**GenoHub** Demultiplexed and compressed sequence reads in FASTQ format. Files should end in “.fastq.gz” or “.fq.gz”]
  Metadata[**Metadata CSV** Information for all samples in the GenoHub project. Must include the following columns: ***ID:*** GenoHub sample name ***R1:*** read 1 FASTQ file name ***R2:*** read 2 FASTQ file name ***Taxon:*** your best guess at taxonomic assignment ***UniqID:*** unique identifier linked to a voucher/tissue sample ]
  Analyses(Run quality/adapter trimming, mitogenome assembly, etc)
  Scratch[(**Hydra Scratch** /scratch/nmnh_ocean_dna 40 TB. Not backed up, no automatic file purging.)]
  Store[(**Hydra Store** /store/public/oceandna 40 TB. Not backed up, no automatic file purging. For large raw data files and inactive projects. Drive system is slower, can't be used for active analysis)]
  PDrive[(**P Drive** P:\NMNH-OCEAN-DNA 80 TB. Incrementally backed up daily, fully backed up weekly. Only accessible from SI computers.)]
  Move1[download raw FASTQ files]
  Move3[copy important results]
  Move4[Dan M. runs monthly and on-demand backup]
  Metadata-->Store
  GenoHub-->Move1
  Move1-->Store
  Move1-->Scratch
  subgraph " "
    Scratch-->Analyses
    Analyses-->Move3
    Move3-->Store
  end
  Store-->Move4
  Move4-->PDrive
  classDef process stroke:black,color:white,fill:#159BD7,stroke-dasharray: 5 5
  classDef storage stroke:black,color:white,fill:#159BD7
  classDef ccr stroke:black,color:white,fill:#159BD7
  class Rename,Analyses,Move1,Move2,Move3,Move4 process
  class Metadata,GenoHub,Scratch,Store,PDrive storage
  click Rename "bestpractices.html"
  linkStyle default stroke:grey, stroke-width:4px
Data management workflow
The following is a proposed data management guide for SI Ocean DNA sequence data. This workflow was designed for genome skimming datasets, but could be adapted for other project types.
Important
Please see the README.md in the Ocean DNA Hydra Store directory for details about where to upload raw data and how to name files and directories.
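The Metadata CSV requirements listed in the diagram (columns ID, R1, R2, Taxon, and UniqID, with FASTQ file names ending in ".fastq.gz" or ".fq.gz") could be checked before upload with a short script. The following is only a minimal sketch, not an official Ocean DNA tool: the function name `check_metadata`, the rule that every cell must be non-empty, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Column names and file endings are taken from the workflow diagram.
REQUIRED_COLUMNS = ["ID", "R1", "R2", "Taxon", "UniqID"]
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_metadata(handle):
    """Return a list of human-readable problems found in a metadata CSV.

    `handle` is any open text file or file-like object.
    This helper is hypothetical and not part of the workflow itself.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Data rows start on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        # Assumption: every required cell should be filled in.
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        # Read file names should use one of the accepted endings.
        for col in ("R1", "R2"):
            name = (row[col] or "").strip()
            if name and not name.endswith(FASTQ_SUFFIXES):
                problems.append(
                    f"line {line_no}: {col} '{name}' does not end in "
                    ".fastq.gz or .fq.gz"
                )
    return problems


# Made-up sample rows, for illustration only.
example = io.StringIO(
    "ID,R1,R2,Taxon,UniqID\n"
    "S1,S1_R1.fastq.gz,S1_R2.fastq.gz,Gadus morhua,VOUCHER-0001\n"
    "S2,S2_R1.fq,S2_R2.fq.gz,,VOUCHER-0002\n"
)
for problem in check_metadata(example):
    print(problem)
```

To check a real file, the same function can be called on an open file handle, for example `check_metadata(open("metadata.csv", newline=""))`, where `metadata.csv` is a placeholder name.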