Data management workflow

The following is a proposed data management guide for SI Ocean DNA sequence data. This workflow was designed for genome skimming datasets, but could be adapted for other project types.

Important

Please see the README.md the Ocean DNA Hydra Store directory for details about where to upload raw data and how to name files and directories.

graph TD;

  GenoHub[**GenoHub**
Demultiplexed and compressed sequence reads in FASTQ format. Files should end in “.fastq.gz” or “.fq.gz”]
  Metadata[**Metadata CSV**
Information for all samples in the GenoHub project. Must include the following columns:
***ID:*** GenoHub sample name
***R1:*** read 1 FASTQ file name
***R2:*** read 2 FASTQ file name
***Taxon:*** your best guess at taxonomic assignment
***UniqID:*** unique identifier linked to a voucher/tissue sample
  ]
  Analyses(Run quality/adapter trimming, mitogenome assembly, etc)
  Scratch[(**Hydra Scratch**
/scratch/nmnh_ocean_dna
40 TB. Not backed up, no automatic file purging.)]
  Store[(**Hydra Store**
/store/public/oceandna
40 TB. Not backed up, no automatic file purging. For large raw data files and inactive projects. Drive system is slower, can't be used for active analysis)]
  PDrive[(**P Drive**
P:\NMNH-OCEAN-DNA
80 TB. Incrementally backed up daily, fully backed up weekly. Only accessible from SI computers.)]
  
  Move1[download raw FASTQ files]  
  Move3[copy important results]
  Move4[Dan M. runs monthly and on-demand backup]

  Metadata-->Store
  GenoHub-->Move1
  Move1-->Store
  Move1-->Scratch
  subgraph " "
    Scratch-->Analyses
    Analyses-->Move3
    Move3-->Store
  end
  Store-->Move4
  Move4-->PDrive

  classDef process stroke:black,color:white,fill:#159BD7,stroke-dasharray: 5 5
  classDef storage stroke:black,color:white,fill:#159BD7
  classDef ccr stroke:black,color:white,fill:#159BD7

  class Rename,Analyses,Move1,Move2,Move3,Move4 process
  class Metadata,GenoHub,Scratch,Store,PDrive storage

  click Rename "bestpractices.html"

  linkStyle default stroke:grey, stroke-width:4px