Purpose:

  1. Confirm that the NF-core pipeline Methylseq is actually using the parameters I am specifying in my .config file
  2. Check whether the alignment score threshold should be set to -0.8 or -0.6.

Methods:

  1. create a samplesheet with just one sample
sample,fastq_1,fastq_2
POR-216-TP1,/gscratch/scrubbed/strigg/analyses/20250714_methylseq_test/data/POR-216-TP1_R1_001.fastp-trim.fq.gz,/gscratch/scrubbed/strigg/analyses/20250714_methylseq_test/data/POR-216-TP1_R2_001.fastp-trim.fq.gz
  1. copy data to klone
#create a new directory
mkdir 20250714_methylseq_test

#change dir 
cd 20250714_methylseq_test

#create a new dir for data
mkdir data

#copy data
wget https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/output/01.00-E-Peve-WGBS-trimming-fastp-FastQC-MultiQC/POR-216-TP1_R1_001.fastp-trim.fq.gz

wget https://gannet.fish.washington.edu/gitrepos/urol-e5/timeseries_molecular/E-Peve/output/01.00-E-Peve-WGBS-trimming-fastp-FastQC-MultiQC/POR-216-TP1_R2_001.fastp-trim.fq.gz

  1. modify .config file to use alignment score threshold -0.6

vim /gscratch/srlab/strigg/bin/uw_hyak_srlab.config

params {
    config_profile_description = 'UW Hyak Roberts labs cluster profile provided by nf-core/configs.'
    config_profile_contact = 'Shelly A. Wanamaker @shellywanamaker'
    config_profile_url = 'https://faculty.washington.edu/sr320/'
}
 
process {
    errorStrategy = 'retry'
    maxSubmitAwait = '60 min'
    maxRetries = 2
    executor = 'slurm'
    queue  = { task.attempt < 4 ? (task.attempt < 3 ? 'ckpt-all' : 'cpu-g2' ) : 'cpu-g2-mem2x' }    
    clusterOptions = { task.attempt < 4 ? (task.attempt < 3 ? "-A srlab" : "-A coenv" ) : "-A srlab" } 
    scratch = '/gscratch/scrubbed/srlab/'
    resourceLimits = [
        cpus: 16,
        memory: '400.GB',
        time: '72.h'
   ]
 
   withName:preseq { 
        errorStrategy = 'ignore' 
    }  
  
   withName: BISMARK_ALIGN {
        ext.args = {
          [
            '--score_min L,0,-0.6',
	    '--non_directional'
          ].join(' ')
        }
        ext.prefix = { "${meta.id}" }
    }
   
   withName: BISMARK_METHYLATIONEXTRACTOR {
        ext.args = {
	  [
	   '--merge_non_CpG',
	   '--comprehensive',
	   '--multicore 48',
	   '--buffer_size 75%'
	  ].join(' ')
	}
	ext.prefix = { "${meta.id}" }
    }

   withName: BISMARK_COVERAGE2CYTOSINE {
        ext.args = {
          [
            '--merge_CpG',
            '--zero_based'
          ].join(' ')
        }
        ext.prefix = { "${meta.id}" }
    }
}


 
executor {
    queuesize = 50
    submitRateLimit = '1 sec'
}
 
singularity {
    enabled = true
    autoMounts = true
    cacheDir = '/gscratch/scrubbed/srlab/.apptainer'
}

trace {
    enabled = true
    file = 'pipeline_trace.txt'
    fields = 'task_id,hash,name,status,exit,submit,duration,realtime,%cpu,rss,vmem,rchar,wchar,disk,queue,hostname,cpu_model'
}
 
debug {
    cleanup = false
}
  1. run pipeline
nextflow run nf-core/methylseq \
-c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250714_methylseq_test/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250714_methylseq_test \
--fasta /gscratch/srlab/strigg/GENOMES/Porites_evermanni_v1.fa \
--em_seq \
-resume \
-with-report nf_report.html \
-with-trace \
-with-timeline nf_timeline.html \
--skip_trimming \
--nomeseq 
  1. copy data to gannet
rsync --archive --verbose --progress --include='*/' --exclude='*.gz' --exclude='*.bam' --exclude='BismarkIndex' --exclude=data/ 20250714_methylseq_test shellytrigg@gannet.fish.washington.edu:/volume2/web/metacarcinus/E5/Pevermanni

Results

ouptput: https://gannet.fish.washington.edu/metacarcinus/E5/Pevermanni/20250714_methylseq_test/

Using the pipeline_trace.txt file, we can look in the work cache directories to see what exact commands are called for BISMARK_ALIGN, BISMARK_DEDUPLICATE, BISMARK_METHYLATIONEXTRACTOR and BISMARK_COVERAGE2CYTOSINE to make sure they are the same as SR lab parameters:

bismark \
    -1 POR-216-TP1_R1_001.fastp-trim.fq.gz -2 POR-216-TP1_R2_001.fastp-trim.fq.gz \
    --genome BismarkIndex \
    --bam \
    --score_min L,0,-0.6 --non_directional --prefix POR-216-TP1 --multicore 3

bismark_methylation_extractor \
    POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bam \
    --bedGraph \
    --counts \
    --gzip \
    --report \
    -p \
    --merge_non_CpG --comprehensive --multicore 48 --buffer_size 75%    

deduplicate_bismark \
     \
    -p \
    --bam POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_pe.bam    
    
coverage2cytosine \
    POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_pe.deduplicated.bismark.cov.gz \
    --genome BismarkIndex \
    --output POR-216-TP1 \
    --gzip \
    --merge_CpG --zero_based

This confirms the pipeline is using the same parameters as SR lab pipeline

Comparing -0.6 to -0.8 alignment scores

report from using -0.8 https://gannet.fish.washington.edu/metacarcinus/E5/Pevermanni/20250619_methylseq/bismark/reports/POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_PE_report.html

report from using -0.6 https://gannet.fish.washington.edu/metacarcinus/E5/Pevermanni/20250714_methylseq_test/bismark/reports/POR-216-TP1.POR-216-TP1_R1_001.fastp-trim_bismark_bt2_PE_report.html

Alignment score threshold -0.8 -0.6
Sequence pairs analysed in total 3.47E+07 3.47E+07
Paired-end alignments with a unique best hit 2.31E+07 2.21E+07
Pairs without alignments under any condition 8.51E+06 9.65E+06
Pairs that did not map uniquely 3.09E+06 2.98E+06
Genomic sequence context not extractable (edges of chromosomes) 4.96E+03 3.72E+03
Total C’s analysed 1.18E+09 1.13E+09
Methylated C’s in CpG context 1.44E+07 1.39E+07
Methylated C’s in CHG context 1.26E+06 1.15E+06
Methylated C’s in CHH context 5.68E+06 5.12E+06
Methylated C’s in Unknown context 4.35E+04 2.64E+04
Unmethylated C’s in CpG context 1.77E+08 1.70E+08
Unmethylated C’s in CHG context 2.06E+08 1.98E+08
Unmethylated C’s in CHH context 7.73E+08 7.40E+08
Unmethylated C’s in Unknown context 2.16E+06 1.43E+06
Percentage methylation (CpG context) 7.50% 7.60%
Percentage methylation (CHG context) 0.60% 0.60%
Percentage methylation (CHH context) 0.70% 0.70%
Methylated C’s in Unknown context 2.00% 1.80%

Comparing nf-core to Sam’s pipeline

Sam’s output and nf-core output for Apul WGBS are identical (both used AS -0.6)