This is my first lab notebook post using GitHub’s markdown.
My goals today are to:
- Get my first lab notebook post out
- Start on my first task: familiarizing myself with the lab’s general DNA methylation analysis pipeline for paired-end (PE) bisulfite sequencing data
- Try to find a monitor and keyboard from the surplus room in SAFS
This is an example of how to include images in posts and of how much I love shellfish
Let’s see how this goes!
End-of-day Update:
- Post worked!
- Progress on first task:
  - Got server credentials and was able to mount the servers to access data
  - Set up Jupyter Notebook. Found this tutorial helpful. Still need to work out how to sync with GitHub so that large data files and analyses can be saved on the server.
  - Was able to download Bismark and get it to run locally on a subset of the data (10K reads). See details below. I did not do further analysis because I wanted to get the Jupyter notebook fully set up and run the analysis on the server before getting too deep. Will continue to work on the Jupyter notebook setup and analysis tomorrow.
- Went down to SAFS to ask Laurie about the surplus room, but she was in a meeting. Will try again tomorrow.
####################RUNNING BISMARK LOCALLY#####################################
##needed to install bowtie2 locally
conda install -c bioconda bowtie2
##needed to install samtools locally
conda install -c bioconda samtools
##install bismark
conda install -c bioconda bismark
##download genome locally
curl http://owl.fish.washington.edu/halfshell/genomic-databank/Cvirginica_v300.fa > Cvirginica_v300.fa
##bismark genome preparation
bismark_genome_preparation --path_to_bowtie /Users/Shelly/anaconda3/bin/ --verbose /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300
###output from command => see below
#running bismark
bismark -u 10000 --non_directional --score_min L,0,-0.6 --genome /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/ -1 /Volumes/web/seashell/bu-serine-wd/18-04-07/R1.fastq.gz -2 /Volumes/web/seashell/bu-serine-wd/18-04-07/R2.fastq.gz
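Since the plan is to move this analysis into a Jupyter notebook, here is a minimal Python sketch of how the same Bismark call could be assembled as an argument list. The flags and paths are copied from the command above; the function name `build_bismark_command` and its parameters are hypothetical names introduced only for illustration, and the sketch prints the command rather than running it.

```python
import shlex


def build_bismark_command(genome_dir, read1, read2, max_reads=10000,
                          score_min="L,0,-0.6", non_directional=True):
    """Return a Bismark paired-end command as a list of arguments.

    Hypothetical helper: it only mirrors the flags used in the
    command above and does not validate any of the paths.
    """
    cmd = ["bismark", "-u", str(max_reads)]
    if non_directional:
        cmd.append("--non_directional")
    cmd += ["--score_min", score_min,
            "--genome", genome_dir,
            "-1", read1,
            "-2", read2]
    return cmd


cmd = build_bismark_command(
    "/Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/",
    "/Volumes/web/seashell/bu-serine-wd/18-04-07/R1.fastq.gz",
    "/Volumes/web/seashell/bu-serine-wd/18-04-07/R2.fastq.gz",
)

# Print the command as it would be typed in a shell.
print(shlex.join(cmd))

# To actually run it (requires Bismark, Bowtie 2 and samtools on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list makes it easy to change the number of reads or drop `--non_directional` from a notebook cell without retyping the long paths.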
###############output from bismark alignment#######################
Path to Bowtie 2 specified as: bowtie2
Bowtie seems to be working fine (tested command ‘bowtie2 --version’ [2.3.4])
Output format is BAM (default)
Alignments will be written out in BAM format. Samtools found here: ‘/Users/Shelly/anaconda3/bin/samtools’
Reference genome folder provided is /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/ (absolute path is ‘/Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/)’
FastQ format assumed (by default)
Processing sequences up to read no. 10000 from the input file
Input files to be analysed (in current folder ‘/Users/Shelly/Desktop/personal/Career/StevenRobetsLab/data_analysis/Cvirg_Apr2018/Bismark_attempt1’):
/Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R1.fastq.gz
/Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R2.fastq.gz
Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)
Setting parallelization to single-threaded (default)
Current working directory is: /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/data_analysis/Cvirg_Apr2018/Bismark_attempt1
Now reading in and storing sequence information of the genome specified in: /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/
chr NC_035780.1 (65668440 bp)
chr NC_035781.1 (61752955 bp)
chr NC_035782.1 (77061148 bp)
chr NC_035783.1 (59691872 bp)
chr NC_035784.1 (98698416 bp)
chr NC_035785.1 (51258098 bp)
chr NC_035786.1 (57830854 bp)
chr NC_035787.1 (75944018 bp)
chr NC_035788.1 (104168038 bp)
chr NC_035789.1 (32650045 bp)
chr NC_007175.2 (17244 bp)
Single-core mode: setting pid to 1
Paired-end alignments will be performed
=======================================
The provided filenames for paired-end alignments are /Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R1.fastq.gz and /Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R2.fastq.gz
Input files are in FastQ format
Processing reads up to sequence no. 10000 from /Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R1.fastq.gz
Writing a C -> T converted version of the input file zr2096_1_s1_R1.fastq.gz to zr2096_1_s1_R1.fastq.gz_C_to_T.fastq
Created C -> T converted version of the FastQ file zr2096_1_s1_R1.fastq.gz (10001 sequences in total)
Processing reads up to sequence no. 10000 from /Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R2.fastq.gz
Writing a G -> A converted version of the input file zr2096_1_s1_R2.fastq.gz to zr2096_1_s1_R2.fastq.gz_G_to_A.fastq
Created G -> A converted version of the FastQ file zr2096_1_s1_R2.fastq.gz (10001 sequences in total)
Input files are zr2096_1_s1_R1.fastq.gz_C_to_T.fastq and zr2096_1_s1_R2.fastq.gz_G_to_A.fastq (FastQ)
Now running 2 instances of Bowtie 2 against the bisulfite genome of /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500
Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from zr2096_1_s1_R1.fastq.gz_C_to_T.fastq and zr2096_1_s1_R2.fastq.gz_G_to_A.fastq, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --norc))
Found first alignment:
HWI-C00124:321:CC781ANXX:1:1101:1249:2156_1:N:0:CGATGT/1 77 * 0 0 * * 0 GAGTTTTTTTGATTATTTGTTGTTTGTTGTTTGTTTGNNNNTNNNTNNGTTTGTTTGTTTGTTTGTTTGTTTGTAAATTTTTTATATTTTTATTTTTTTT BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF####<###<##<<<FFFFFFFFFFFFFFFF<FFFFFF/</FFFFFF/B/FFFFF/FFFFFFFF YT:Z:UP
HWI-C00124:321:CC781ANXX:1:1101:1249:2156_2:N:0:CGATGT/2 141 * 0 0 * * 0 CCCCTTAAAAAAAAACAAAACCCTTCATTCAAACAAACTTAAATCCCCTTCACCTAAAAATACTTTATATCAAATTTAATTAAAATTAACCCAATAATTC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF YT:Z:UP
Now starting a Bowtie 2 paired-end alignment for CTread1GAread2GAgenome (reading in sequences from zr2096_1_s1_R1.fastq.gz_C_to_T.fastq and zr2096_1_s1_R2.fastq.gz_G_to_A.fastq, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --nofw))
Found first alignment:
HWI-C00124:321:CC781ANXX:1:1101:1249:2156_1:N:0:CGATGT/1 77 * 0 0 * * 0 GAGTTTTTTTGATTATTTGTTGTTTGTTGTTTGTTTGNNNNTNNNTNNGTTTGTTTGTTTGTTTGTTTGTTTGTAAATTTTTTATATTTTTATTTTTTTT BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF####<###<##<<<FFFFFFFFFFFFFFFF<FFFFFF/</FFFFFF/B/FFFFF/FFFFFFFF YT:Z:UP
HWI-C00124:321:CC781ANXX:1:1101:1249:2156_2:N:0:CGATGT/2 141 * 0 0 * * 0 CCCCTTAAAAAAAAACAAAACCCTTCATTCAAACAAACTTAAATCCCCTTCACCTAAAAATACTTTATATCAAATTTAATTAAAATTAACCCAATAATTC BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF YT:Z:UP
>>> Writing bisulfite mapping results to zr2096_1_s1_R1_bismark_bt2_pe.bam <<<
Reading in the sequence files /Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R1.fastq.gz and /Volumes/web/seashell/bu-serine-wd/18-04-07/zr2096_1_s1_R2.fastq.gz
10000 reads; of these:
10000 (100.00%) were paired; of these:
9720 (97.20%) aligned concordantly 0 times
163 (1.63%) aligned concordantly exactly 1 time
117 (1.17%) aligned concordantly >1 times
2.80% overall alignment rate
10000 reads; of these:
10000 (100.00%) were paired; of these:
9717 (97.17%) aligned concordantly 0 times
163 (1.63%) aligned concordantly exactly 1 time
120 (1.20%) aligned concordantly >1 times
2.83% overall alignment rate
Processed 10000 sequences in total
Successfully deleted the temporary files zr2096_1_s1_R1.fastq.gz_C_to_T.fastq and zr2096_1_s1_R2.fastq.gz_G_to_A.fastq
Final Alignment report
======================
Sequence pairs analysed in total: 10000
Number of paired-end alignments with a unique best hit: 298
Mapping efficiency: 3.0%
Sequence pairs with no alignments under any condition: 9562
Sequence pairs did not map uniquely: 140
Sequence pairs which were discarded because genomic sequence could not be extracted: 0
Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 152 ((converted) top strand)
GA/CT/CT: 0 (complementary to (converted) top strand)
GA/CT/GA: 0 (complementary to (converted) bottom strand)
CT/GA/GA: 146 ((converted) bottom strand)
Number of alignments to (merely theoretical) complementary strands being rejected in total: 0
Final Cytosine Methylation Report
=================================
Total number of C’s analysed: 10572
Total methylated C’s in CpG context: 1191
Total methylated C’s in CHG context: 48
Total methylated C’s in CHH context: 117
Total methylated C’s in Unknown context: 0
Total unmethylated C’s in CpG context: 295
Total unmethylated C’s in CHG context: 2453
Total unmethylated C’s in CHH context: 6468
Total unmethylated C’s in Unknown context: 13
C methylated in CpG context: 80.1%
C methylated in CHG context: 1.9%
C methylated in CHH context: 1.8%
C methylated in unknown context (CN or CHN): 0.0%
Bismark completed in 0d 0h 0m 51s
====================
Bismark run complete
====================
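As a sanity check on the report above, the headline percentages can be recomputed from the raw counts. The sketch below is not part of Bismark; the counts are copied by hand from the report, and the variable names and the `percent` helper are my own. It assumes that mapping efficiency is unique hits over total pairs, and that each context's methylation percentage is methylated over (methylated + unmethylated) calls.

```python
# Counts copied by hand from the Bismark report above.
pairs_total = 10000
unique_hits = 298
no_alignment = 9562
not_unique = 140

methylated = {"CpG": 1191, "CHG": 48, "CHH": 117}
unmethylated = {"CpG": 295, "CHG": 2453, "CHH": 6468}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Mapping efficiency: uniquely aligned pairs over all pairs analysed.
mapping_efficiency = percent(unique_hits, pairs_total)

# Per-context methylation: methylated calls over all calls in that context.
methylation_pct = {
    ctx: percent(methylated[ctx], methylated[ctx] + unmethylated[ctx])
    for ctx in methylated
}

# The three alignment outcomes should account for every pair.
pairs_accounted = unique_hits + no_alignment + not_unique

# Total C's across the three known contexts (unknown context excluded).
total_c = sum(methylated.values()) + sum(unmethylated.values())

print(f"Pairs accounted for: {pairs_accounted} of {pairs_total}")
print(f"Mapping efficiency: {mapping_efficiency:.1f}%")
for ctx, pct in methylation_pct.items():
    print(f"C methylated in {ctx} context: {pct:.1f}%")
print(f"Total C's in CpG/CHG/CHH contexts: {total_c}")
```

The recomputed values agree with the report (3.0% mapping efficiency; 80.1%, 1.9% and 1.8% methylation), and the total of 10572 C's matches only when the 13 unknown-context calls are left out. These percentages rest on just 298 uniquely mapped pairs from the 10K-read subset.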
###############output from bismark genome preparation#######################
#Path to genome folder specified as: /Users/Shelly/Desktop/personal/Career/StevenRobetsLab/GENOMES/Cvirginica/v300/
#Aligner to be used: Bowtie 2 (default)
#Writing bisulfite genomes out into a single MFA (multi FastA) file
#Bismark Genome Preparation - Step I: Preparing folders
#Path to Bowtie 2 specified: /Users/Shelly/anaconda3/bin/