This codebook.txt file was generated on 20190325 by Chris Shaffer

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset

   Genome Assembly of phage JustBecause

2. Author Information

   Principal Investigator
   Christopher Shaffer
   Box 1137; Dept of Biology
   Washington University in St. Louis
   St. Louis MO 63130
   shaffer@wustl.edu

3. Date of data collection (single date, range, approximate date)

   Fall 2017

4. Geographic location of data collection (where was data collected?):

   Phage isolated at 38.645708 N, 90.311579 W

5. Information about funding sources that supported the collection of the data:

   Supported in part by the Biology Dept, Washington University, and by the
   Howard Hughes Medical Institute, Chevy Chase, MD

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

1. Licenses/restrictions placed on the data:

   No restrictions on use

2. Links to publications that cite or use the data:

3. Links to other publicly accessible locations of the data:

   NA

4. Links/relationships to ancillary data sets:

   http://phagesdb.org/phages/JustBecause

5. Was data derived from another source?

   NA

6. Recommended citation for the data:

---------------------
DATA & FILE OVERVIEW
---------------------

1. File List

   A. Filename: finishing_notes_on_assembly.txt
      Short description: Comments and analysis results for genome assembly

   B. Filename: JustBecause_fasta
      Short description: Final DNA sequence for phage JustBecause

   C. Filename: justbecuase_100k.fastq
      Short description: Raw read data of genome in fastq format

   D. Filename: justbecause_coverage.png
      Short description: Visualization

   E. Filename: justbecause_alignment.png
      Short description: Visualization

   F. Filename: justbecuase_assembly (level 2 sub-folder)
      Short description: Folder with all sub-folders and files properly
      organized for use with Consed:

      454AlignmentInfo.tsv
      454AllContigs.fna
      454AllContigs.qual
      454AssemblyProject.xml
      454ContigGraph.txt
      454LargeContigs.fna
      454LargeContigs.qual
      454NewblerMetrics.txt
      454NewblerProgress.txt
      454PairStatus.txt
      454ReadStatus.txt
      454Scaffolds.fna
      454Scaffolds.qual
      454Scaffolds.txt
      454TrimStatus.txt
      consed (level 3 sub-folder)
         chromat_dir (level 4 sub-folder)
            [empty]
         edit_dir (level 4 sub-folder)
            phd.ball
            justbecause_right.fasta
            justbecause_left.fasta
            justbecause_draft.fasta
            contig00001.fasta
            badLibraries.txt
            454Contigs.ace.3.wrk
            454Contigs.ace.3
            454Contigs.ace.2.wrk
            454Contigs.ace.2
            454Contigs.ace.1.wrk
            454Contigs.ace.1
            454Contigs.ace.0
         phd_dir (level 4 sub-folder)
            [empty]

2. Relationship between files:

   All files pertain to the sequencing and assembly of phage JustBecause.

3. Additional related data collected that was not included in the current data package:

   NA

4. Are there multiple versions of the dataset?

   NO

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Description of methods used for collection/generation of data:

   Genome assembly of Illumina MiSeq reads using Newbler. A complete Consed
   package was created.

2. Methods for processing the data:

   Assembly by Newbler (commercial software, no longer available)

3. Instrument- or software-specific information needed to interpret the data:

   Visualization and assessment of the assembly by Consed version 25, see:
   Gordon and Green. 2013. Consed: A Graphical Editor for Next-Generation
   Sequencing. Bioinformatics. Volume 29, Number 22, pp. 2936-2937.

4. Standards and calibration information, if appropriate:

   NA

5. Environmental/experimental conditions:

   NA

6. Describe any quality-assurance procedures performed on the data:

   NA

7. People involved with sample collection, processing, analysis and/or submission:

   Saransh Gothi
   John H. Alarcon
   Kelly Hartigan
   Christopher Shaffer