app oligo4sshic

A command-line tool to design oligonucleotides for ssHi-C experiments

1 unstable release

new 0.1.0 Nov 22, 2024

#302 in Command line utilities

MIT license

150KB
3.5K SLoC

Design annealing oligonucleotides for ssHi-C

oligo4sshic is a small Rust program that generates oligonucleotides for single-strand Hi-C (ssHi-C) experiments.

Installation

You need cargo installed. Use whichever of the following commands matches your system:

  • curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  • sudo apt install cargo
  • sudo pacman -S rust
  • brew install rust

Then run the following commands:

git clone git@gitbio.ens-lyon.fr:LBMC/GM/oligo4sshic.git
cargo install --path oligo4sshic

From a fasta file, a restriction site (e.g. 'GATC'), and a list of secondary restriction sites, this program produces a list of oligonucleotides of size N with the following characteristics:

  • We choose a strand of DNA (the reference or its reverse complement)
  • If the strand is the reverse one, the final coordinates of the oligo are converted to the forward strand
  • The restriction site is located X nt from the start of the oligo
  • ~5 SNPs are placed after the restriction site, uniformly distributed
  • No overlapping oligonucleotides; in the case of an overlap, take the one on the right
  • If a site from the secondary sites list (e.g. CAATTG,AATATT,GANTC) or its reverse complement is present, introduce SNPs to inactivate it
  • At least one site from the block list must be present in the oligo sequence (forward or reverse)
  • SNPs do not create new restriction sites
  • Oligonucleotides are equidistant on the fasta sequence
  • Oligonucleotides are not reverse complements of one another (on more than Y nt)
  • When two oligos are reverse complements, remove the one whose removal keeps the uniformity score highest
  • No SNP closer than Z nt to the borders
  • Remove oligos with N in the sequence
  • The selected oligonucleotides have the highest melting temperature
  • The selected oligonucleotides have homogeneous melting temperatures relative to one another
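
Several of the rules above rely on three basic operations: taking a reverse complement, locating a restriction site (including degenerate sites such as GANTC), and mapping a position found on the reverse strand back to forward coordinates. The following is a minimal sketch of those operations, not the code used by oligo4sshic. The function names are hypothetical, positions are assumed to be 0-based, and only 'N' is treated as a wildcard in a site.

```rust
/// Reverse complement of a DNA sequence.
/// Bases other than A, C, G, T (for example N) are kept unchanged.
fn reverse_complement(seq: &str) -> String {
    seq.chars()
        .rev()
        .map(|base| match base.to_ascii_uppercase() {
            'A' => 'T',
            'T' => 'A',
            'C' => 'G',
            'G' => 'C',
            other => other,
        })
        .collect()
}

/// True when `site` matches `window` base by base.
/// An 'N' in the site matches any base (assumption: no other IUPAC codes).
fn site_matches(site: &[u8], window: &[u8]) -> bool {
    site.len() == window.len()
        && site
            .iter()
            .zip(window)
            .all(|(s, w)| *s == b'N' || s.eq_ignore_ascii_case(w))
}

/// 0-based start positions of every occurrence of `site` in `seq`.
fn find_sites(seq: &str, site: &str) -> Vec<usize> {
    let (seq, site) = (seq.as_bytes(), site.as_bytes());
    if site.is_empty() || site.len() > seq.len() {
        return Vec::new();
    }
    seq.windows(site.len())
        .enumerate()
        .filter(|(_, window)| site_matches(site, window))
        .map(|(pos, _)| pos)
        .collect()
}

/// Convert the 0-based start of an oligo found on the reverse-complement
/// strand into the 0-based start of the same oligo on the forward strand.
/// Returns None when the oligo does not fit inside the sequence.
fn reverse_to_forward_start(rev_start: usize, oligo_len: usize, seq_len: usize) -> Option<usize> {
    seq_len.checked_sub(rev_start.checked_add(oligo_len)?)
}
```

For example, in the 12 nt sequence AAGATCTTGATC, find_sites reports GATC at positions 2 and 8. A 4 nt oligo starting at position 0 of the reverse complement corresponds to position 8 on the forward strand.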

Usage

Usage: oligo4sshic [OPTIONS] --fasta <FASTA> --output-snp <OUTPUT_SNP> --output-raw <OUTPUT_RAW>

Options:
  -f, --fasta <FASTA>
          fasta file of the genome
      --forward-intervals <FORWARD_INTERVALS>
          comma separated list of chromosomic interval to work on, on the forward strand (e.g. chr_a:1-100,chr_b:200-300) [default: all]
      --reverse-intervals <REVERSE_INTERVALS>
          comma separated list of chromosomic interval to work on, on the reverse strand (e.g. chr_a:1-100,chr_b:200-300) [default: ]
      --output-snp <OUTPUT_SNP>
          output file with the list of oligos sequence in fasta format with snp
      --output-raw <OUTPUT_RAW>
          output file with the list of oligos sequence in fasta format without snp
      --site <SITE>
          sequence of the site to look for [default: GATC]
      --secondary-sites <SECONDARY_SITES>
          comma separated list of site sequences that will be disabled by SNPs [default: CAATTG,AATATT,GANTC]
      --size <SIZE>
          size of the oligonucleotides [default: 75]
      --site-start <SITE_START>
          site start position within the oligonucleotide sequences [default: 5]
      --no-snp-zone <NO_SNP_ZONE>
          number of nucleotides that will not be transformed in SNPs after the site and before the end of the oligonucleotide sequences [default: 5]
      --complementary-size <COMPLEMENTARY_SIZE>
          maximum number of complementary bases between two oligonucleotides [default: 7]
      --snp-number <SNP_NUMBER>
          number of snp to add to the oligonucleotide sequence [default: 5]
      --tries <TRIES>
          number of runs to try to find the highest number of oligos [default: 10]
  -v, --verbose
          work with the reverse complement of the fasta file
  -h, --help
          Print help
  -V, --version
          Print version
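
The --forward-intervals and --reverse-intervals options take a comma separated list such as chr_a:1-100,chr_b:200-300. As an illustration of that format only, here is a minimal sketch of how such a list could be parsed. The Interval type and parse_intervals function are hypothetical and are not taken from the oligo4sshic source. The sketch does not handle the special value "all", and it makes no assumption about whether coordinates are 0-based or 1-based.

```rust
/// One interval as written on the command line, e.g. `chr_a:1-100`.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct Interval {
    chrom: String,
    start: u64,
    end: u64,
}

/// Parse a comma separated list such as `chr_a:1-100,chr_b:200-300`.
/// Empty items are skipped, so an empty string yields an empty list.
fn parse_intervals(list: &str) -> Result<Vec<Interval>, String> {
    list.split(',')
        .filter(|item| !item.trim().is_empty())
        .map(|item| -> Result<Interval, String> {
            let item = item.trim();
            // Split on the last ':' so that names containing ':' still parse.
            let (chrom, range) = item
                .rsplit_once(':')
                .ok_or_else(|| format!("missing ':' in '{item}'"))?;
            let (start, end) = range
                .split_once('-')
                .ok_or_else(|| format!("missing '-' in '{item}'"))?;
            let start: u64 = start
                .parse()
                .map_err(|e| format!("bad start in '{item}': {e}"))?;
            let end: u64 = end
                .parse()
                .map_err(|e| format!("bad end in '{item}': {e}"))?;
            if start > end {
                return Err(format!("start is after end in '{item}'"));
            }
            Ok(Interval {
                chrom: chrom.to_string(),
                start,
                end,
            })
        })
        .collect()
}
```

Collecting into a Result stops at the first malformed item, so a typo in one interval is reported instead of being silently dropped.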

Dependencies

~1.5–2.2MB
~40K SLoC