This post is related to GitHub issue #694.

The main problem was that the ID_CpG_labelled outputs could not be combined into one file, as Sam had done in his original CpGoe analysis using this script.

Yaamini created new CpGoe files for the CAP, CDS, GENE, Exons, and windows features of the genome mentioned in this post, following this markdown file. After running into problems with the script's steps for adding ID_CpG file headers and joining the files together, I attempted to run this script

on Mox:

  1. First I copied the data and script over to Mox:
    • mkdir /gscratch/srlab/strigg/data/Cvirg/FROGER_CAP_CpGoe
    • cd /gscratch/srlab/strigg/data/Cvirg/FROGER_CAP_CpGoe
    • rsync --archive --progress --verbose yaamini@172.25.149.226:/Volumes/web/spartina/2019-05-21-FROGER/CAP_CpGoe .
  2. Next I turned this script (/gscratch/srlab/strigg/data/Cvirg/FROGER_CAP_CpGoe/CAP_CpGoe/2019-05-22-CAP-File-Naming-Script.sh) into a Mox job (/gscratch/srlab/strigg/jobs/20190523_CAP_File_Naming.sh) and ran it on May 23. It errored out in the join step after running for 3-4 hours, having joined 26 of the 90 files.

    • error message: “Broken pipe join --nocheck-order ID_CpG_labelled_all ${file}ID_CpG_labelled”
    • slurm file is here: /gscratch/srlab/strigg/data/Cvirg/FROGER_CAP_CpGoe/CAP_CpGoe/slurm-863764.out
  3. I copied everything to my folder on Gannet: https://gannet.fish.washington.edu/metacarcinus/Cvirginica/FROGER/

  4. I realized that some of the sample names contain more than one underscore, and that non-unique column names could have been problematic.

  5. We also learned that certain bash commands (sed and maybe even the join command) may not run the same way on a Mac (how Yaamini and I tried running the script) as they do on a PC (how Sam ran it).

  6. I made changes to the script to
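
The naming and portability issues in steps 4 and 5 can be illustrated with a small shell sketch. This is not the updated script: the `_ID_CpG` suffix, the sample names, and the two-column layout are hypothetical stand-ins, and only the `ID_CpG_labelled_all` output name comes from the error message above. It strips a known suffix instead of splitting on underscores, sorts each file before `join`, and writes the header with `printf` and `cat` rather than an in-place `sed` edit, since `sed -i` takes its backup-suffix argument differently in BSD (macOS) and GNU sed.

```shell
#!/bin/sh
# Sketch only: file names, sample IDs, and column layout are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"
tab=$(printf '\t')

# Toy inputs: column 1 = feature ID, column 2 = CpG O/E value.
# The sample names ("HC_1", "HC_2") contain an underscore of their own.
printf 'gene2\t0.91\ngene1\t0.45\n' > HC_1_ID_CpG
printf 'gene1\t0.52\ngene2\t0.88\n' > HC_2_ID_CpG

for file in *_ID_CpG; do
    # Remove only the known suffix so underscores inside the name survive.
    sample=${file%_ID_CpG}
    # join expects both inputs sorted on the join field; fixing the locale
    # keeps sort and join in agreement on macOS and Linux alike.
    LC_ALL=C sort -k1,1 "$file" > "${sample}.sorted"
done

header="ID"
first=1
for sorted in *.sorted; do
    sample=${sorted%.sorted}
    header="${header}${tab}${sample}"
    if [ "$first" -eq 1 ]; then
        cp "$sorted" joined.tmp
        first=0
    else
        # Join the running table with the next sample on column 1.
        LC_ALL=C join -t "$tab" joined.tmp "$sorted" > joined.next
        mv joined.next joined.tmp
    fi
done

# Prepend the header without relying on sed -i.
{ printf '%s\n' "$header"; cat joined.tmp; } > ID_CpG_labelled_all
cat ID_CpG_labelled_all
```

Because the inputs are sorted first, `--nocheck-order` is not needed here, and each sample contributes exactly one uniquely named column.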

On Ostrich:

  1. I used this Jupyter notebook to:
    • run the updated script on the CAP data
    • run the updated script on the GENE data to duplicate what Sam already did so I could compare the output of the new script to Sam’s.
  2. QC: In R, I compared the output of the new script with the output of Sam’s original script, which I reformatted in R.
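
A quick command-line check in the same spirit as the QC step might look like the sketch below, assuming both outputs are plain tab-delimited text files. The file names and contents are hypothetical, and it only tests whether the two files hold the same rows regardless of row order; it does not reproduce the comparison done in R.

```shell
#!/bin/sh
# Sketch only: the two files below are toy stand-ins for the real outputs.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

printf 'ID\tHC_1\ngene1\t0.45\ngene2\t0.91\n' > original_output.tsv
printf 'ID\tHC_1\ngene2\t0.91\ngene1\t0.45\n' > updated_output.tsv

# Sort both files the same way so row order does not matter.
LC_ALL=C sort original_output.tsv > original.sorted
LC_ALL=C sort updated_output.tsv > updated.sorted

if cmp -s original.sorted updated.sorted; then
    result="identical"
else
    result="different"
    # Show the first differing rows for inspection.
    diff original.sorted updated.sorted | head -n 20
fi
echo "$result"
```

Differences in column order or number formatting would still show up as mismatches here, so a failed check is a prompt to look closer rather than proof that the values differ.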

CONCLUSIONS:

  • The updated script gives the same output as the original script and can now be used on the remaining CpGoe analyses (CDS, Exons, windows).