Purpose
This post is related to github issue # 2174
The purpose of this entry is to align RNAseq and EM-seq data to the P. tuahiniensis genome using nf-core pipelines on Klone.
Results
RNAseq
- mutliqc: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250421_RNAseq/multiqc/multiqc_report.html
- RNAseq output and counts matrices: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250421_RNAseq/
- Samples with poor alignment, duplicated reads and overrepresented seqs
- POC-201-TP3 44.9% & 41.6% (unmapped: too short; low coverage remaining: 15.8M)
- POC-219-TP3 18.1% & 17.1% (unmapped: too short; low coverage remaining: 8.34M)
- POC-52-TP1 17.1% & 15.9% (unmapped: too short; low coverage: 5.4M)
- POC-255-TP3 51.6% & 47.4% (overrepresented seqs)
- Samples with GC bias
- all samples above and POC-42-TP4 (low coverage: 21.15M)
EM-seq
- multiqc: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250422_methylseq/multiqc/bismark/multiqc_report.html
- Methylseq output and counts matrices: https://gannet.fish.washington.edu/metacarcinus/E5/Ptuahiniensis/20250422_methylseq/
Methods
Copy genome files to klone
# show path
pwd
/gscratch/srlab/strigg/GENOMES
# copy genome
wget https://owl.fish.washington.edu/halfshell/genomic-databank/Pocillopora_meandrina_HIv1.assembly.fasta
#copy gtf file
wget https://github.com/urol-e5/timeseries_molecular/raw/d5f546705e3df40558eeaa5c18b122c79d2f4453/F-Ptua/data/Pocillopora_meandrina_HIv1.genes-validated.gtf
# copy gff file
wget https://github.com/urol-e5/timeseries_molecular/raw/d5f546705e3df40558eeaa5c18b122c79d2f4453/F-Ptua/data/Pocillopora_meandrina_HIv1.genes-validated.gff3
Note: I had to remove the ‘3’ from the file extention of the gff and zip it to run the nf-core RNAseq pipeline
Copy RNAseq data to Klone
# open screen session (reopened existing session)
screen -r RNAseq
# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=16:00:00
# copy data
rsync --progress --verbose --archive shellytrigg@gannet.fish.washington.edu:/volume2/web/gitrepos/urol-e5/timeseries_molecular/F-Ptua/output/01.00-F-Ptua-RNAseq-trimming-fastp-FastQC-MultiQC/*.gz /gscratch/scrubbed/strigg/analyses/20250421_RNAseq
Copy WGBS to klone
# open screen session
screen -r methylseq
# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=16:00:00
# copy data
rsync --progress --verbose --archive shellytrigg@gannet.fish.washington.edu:/volume2/web/gitrepos/urol-e5/timeseries_molecular/F-Ptua/output/01.00-F-Ptua-WGBS-trimming-fastp-FastQC-MultiQC/*.gz /gscratch/scrubbed/strigg/analyses/20250421_methylseq
Run RNAseq pipeline
# open screen session
screen -r RNAseq
# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=24:00:00
# activate conda environment
mamba activate nextflow
# run pipeline
nextflow run nf-core/rnaseq -resume \
-c /gscratch/srlab/nextflow/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250421_RNAseq/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250421_RNAseq \
--gtf /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.genes-validated.gtf \
--gff /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.genes-validated.gff.gz \
--fasta /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.assembly.fasta \
--skip_trimming \
--aligner star_salmon \
--skip_pseudo_alignment \
--multiqc_title Pmeandrina_RNAseq \
--deseq2_vst
Run methylseq pipeline
Had to rerun the pipeline because my first iteration omitted a backslash after the --em_seq
parameter so it included trimming by default.
Second iteration run in the same screen session and interactive node initiated in the first iteration.
nextflow run nf-core/methylseq \
-c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250422_methylseq/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250422_methylseq \
--fasta /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.assembly.fasta \
--em_seq \
-resume \
-with-report nf_report.html \
-with-trace \
-with-timeline nf_timeline.html \
--skip_trimming \
--nomeseq
First iteration (started on 2025-04-21)
# open screen session
screen -r methylseq
# start interactive node
salloc -A srlab -p cpu-g2-mem2x -N 1 -c 1 --mem=16GB --time=72:00:00
#activate conda environment
mamba activate nextflow
#run methylseq pipeline
nextflow run nf-core/methylseq \
-c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config \
--input /gscratch/scrubbed/strigg/analyses/20250421_methylseq/samplesheet.csv \
--outdir /gscratch/scrubbed/strigg/analyses/20250421_methylseq \
--fasta /gscratch/srlab/strigg/GENOMES/Pocillopora_meandrina_HIv1.assembly.fasta \
--em_seq
-resume \
-with-report nf_report.html \
-with-trace \
-with-timeline nf_timeline.html \
--skip_trimming \
--nomeseq