Purpose:
- Confirm that the NF-core pipeline Methylseq is actually using the parameters I am specifying in my
.config
file - Check whether the alignment score threshold should be set to -0.8 or -0.6.
Methods:
- create a samplesheet with just one sample
sample,fastq_1,fastq_2
POR-216-TP1,/gscratch/scrubbed/strigg/analyses/20250714_methylseq_test/data/POR-216-TP1_R1_001.fastp-trim.fq.gz,/gscratch/scrubbed/strigg/analyses/20250714_methylseq_test/data/POR-216-TP1_R2_001.fastp-trim.fq.gz
- copy data to klone
#create a new directory
mkdir 20250714_methylseq_test
#change dir
cd 20250714_methylseq_test
#create a new dir for data
mkdir data
#copy data
wget https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/output/01.00-E-Peve-WGBS-trimming-fastp-FastQC-MultiQC/POR-216-TP1_R1_001.fastp-trim.fq.gz
wget https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/output/01.00-E-Peve-WGBS-trimming-fastp-FastQC-MultiQC/POR-216-TP1_R2_001.fastp-trim.fq.gz
- modify .config file to use alignment score threshold -0.6
vim /gscratch/srlab/strigg/bin/uw_hyak_srlab.config
params {
config_profile_description = 'UW Hyak Roberts labs cluster profile provided by nf-core/configs.'
config_profile_contact = 'Shelly A. Wanamaker @shellywanamaker'
config_profile_url = 'https://faculty.washington.edu/sr320/'
}
process {
errorStrategy = 'retry'
maxSubmitAwait = '60 min'
maxRetries = 2
executor = 'slurm'
queue = { task.attempt < 4 ? (task.attempt < 3 ? 'ckpt-all' : 'cpu-g2' ) : 'cpu-g2-mem2x' }
clusterOptions = { task.attempt < 4 ? (task.attempt < 3 ? "-A srlab" : "-A coenv" ) : "-A srlab" }
scratch = '/gscratch/scrubbed/srlab/'
resourceLimits = [
cpus: 16,
memory: '400.GB',
time: '72.h'
]
withName:preseq {
errorStrategy = 'ignore'
}
withName: BISMARK_ALIGN {
ext.args = {
[
'--score_min L,0,-0.6',
'--non_directional'
].join(' ')
}
ext.prefix = { "${meta.id}" }
}
withName: BISMARK_METHYLATIONEXTRACTOR {
ext.args = {
[
'--merge_non_CpG',
'--comprehensive',
'--multicore 48',
'--buffer_size 75%'
].join(' ')
}
ext.prefix = { "${meta.id}" }
}
withName: BISMARK_COVERAGE2CYTOSINE {
ext.args = {
[
'--merge_CpG',
'--zero_based'
].join(' ')
}
ext.prefix = { "${meta.id}" }
}
}
executor {
queuesize = 50
submitRateLimit = '1 sec'
}
singularity {
enabled = true
autoMounts = true
cacheDir = '/gscratch/scrubbed/srlab/.apptainer'
}
trace {
enabled = true
file = 'pipeline_trace.txt'
fields = 'task_id,hash,name,status,exit,submit,duration,realtime,%cpu,rss,vmem,rchar,wchar,disk,queue,hostname,cpu_model'
}
debug {
cleanup = false
}
- run pipeline
nextflow run nf-core/methylseq \
-c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250714_methylseq_test/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250714_methylseq_test \
--fasta /gscratch/srlab/strigg/GENOMES/Porites_evermanni_v1.fa \
--em_seq \
-resume \
-with-report nf_report.html \
-with-trace \
-with-timeline nf_timeline.html \
--skip_trimming \
--nomeseq
- copy data to gannet
rsync --archive --verbose --progress --include='*/' --exclude='*.gz' --exclude='*.bam' --exclude='BismarkIndex' --exclude=data/ 20250714_methylseq_test shellytrigg@gannet.fish.washington.edu:/volume2/web/metacarcinus/E5/Pevermanni
Results
ouptput: https://gannet.fish.washington.edu/metacarcinus/E5/Pevermanni/20250714_methylseq_test/
Using the pipeline_trace.txt file, we can look in the work cache directories to see what exact commands are called for BISMARK_ALIGN
, BISMARK_DEDUPLICATE
, BISMARK_METHYLATIONEXTRACTOR
and BISMARK_COVERAGE2CYTOSINE
to make sure they are the same as SR lab parameters:
BISMARK_ALIGN
2a/8c3c04BISMARK_DEDUPLICATE
41/d5c00cBISMARK_METHYLATIONEXTRACTOR
14/be7f0eBISMARK_COVERAGE2CYTOSINE
13/82eba5635f7a2abc3c55dde1ef8760/.command.sh
bismark \
-1 POR-216-TP1_R1_001.fastp-trim.fq.gz -2 POR-216-TP1_R2_001.fastp-trim.fq.gz \
--genome BismarkIndex \
--bam \
--score_min L,0,-0.6 --non_directional --prefix POR-216-TP1 --multicore 3
bismark_methylation_extractor \
POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bam \
--bedGraph \
--counts \
--gzip \
--report \
-p \
--merge_non_CpG --comprehensive --multicore 48 --buffer_size 75%
deduplicate_bismark \
\
-p \
--bam POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_pe.bam
coverage2cytosine \
POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bismark.cov.gz \
--genome BismarkIndex \
--output POR-216-TP1 \
--gzip \
--merge_CpG --zero_based
This confirms the pipeline is using the same parameters as SR lab pipeline
Comparing -0.6 to -0.8 alignment scores
report from using -0.8 https://gannet.fish.washington.edu/metacarcinus/E5/Pevermanni/20250619_methylseq/bismark/reports/POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_PE_report.html
report from using -0.6 https://gannet.fish.washington.edu/metacarcinus/E5/Pevermanni/20250714_methylseq_test/bismark/reports/POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_PE_report.html
Alignment score threshold | -0.8 | -0.6 |
---|---|---|
Sequence pairs analysed in total | 3.47E+07 | 3.47E+07 |
Paired-end alignments with a unique best hit | 2.31E+07 | 2.21E+07 |
Pairs without alignments under any condition | 8.51E+06 | 9.65E+06 |
Pairs that did not map uniquely | 3.09E+06 | 2.98E+06 |
Genomic sequence context not extractable (edges of chromosomes) | 4.96E+03 | 3.72E+03 |
Total C’s analysed | 1.18E+09 | 1.13E+09 |
Methylated C’s in CpG context | 1.44E+07 | 1.39E+07 |
Methylated C’s in CHG context | 1.26E+06 | 1.15E+06 |
Methylated C’s in CHH context | 5.68E+06 | 5.12E+06 |
Methylated C’s in Unknown context | 4.35E+04 | 2.64E+04 |
Unmethylated C’s in CpG context | 1.77E+08 | 1.70E+08 |
Unmethylated C’s in CHG context | 2.06E+08 | 1.98E+08 |
Unmethylated C’s in CHH context | 7.73E+08 | 7.40E+08 |
Unmethylated C’s in Unknown context | 2.16E+06 | 1.43E+06 |
Percentage methylation (CpG context) | 7.50% | 7.60% |
Percentage methylation (CHG context) | 0.60% | 0.60% |
Percentage methylation (CHH context) | 0.70% | 0.70% |
Methylated C’s in Unknown context | 2.00% | 1.80% |
Comparing nf-core to Sam’s pipeline
Sam’s output and nf-core output for Apul WGBS are identical (both used AS -0.6)