Version 0.1.0, released Nov 22, 2024 (1 unstable release)
150KB, 3.5K SLoC
Design annealing oligonucleotides for ssHi-C
oligo4sshic is a small Rust program that generates oligonucleotides for single-strand Hi-C (ssHi-C) experiments.
Installation
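You need cargo installed. Any one of the following works (rustup on any platform, Debian/Ubuntu, Arch Linux, or macOS with Homebrew):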
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
sudo apt install cargo
sudo pacman -S rust
brew install rust
Then run the following commands:
git clone git@gitbio.ens-lyon.fr:LBMC/GM/oligo4sshic.git
cargo install --path oligo4sshic
From a FASTA file, a restriction site (e.g. 'GATC') and a list of secondary restriction sites, this program produces a list of oligonucleotides of size N with the following characteristics:
- We choose a strand of DNA (the reference or its reverse complement)
- If the reverse strand is chosen, the final coordinates of the oligo are converted to forward-strand coordinates
- The restriction site is located X nt from the start of the oligo
- About 5 SNPs, uniformly distributed after the restriction site
- No overlapping oligonucleotides; in case of an overlap, the one on the right is kept
- If a site from the secondary sites list (e.g. CAATTG, AATATT, GANTC) or its reverse complement is present, SNPs are introduced to inactivate it
- At least one site from the block list must be present in the oligo sequence (forward or reverse)
- SNPs do not create new restriction sites
- Oligonucleotides are equidistant along the FASTA sequence
- Oligonucleotides are not reverse complements of one another over more than Y nt
- In case of reverse complementarity, the oligo whose removal keeps the highest uniformity score is removed
- No SNP within Z nt of the borders
- Oligos containing N (an undetermined base) in their sequence are removed
- The selected oligonucleotides have the highest melting temperature
- The selected oligonucleotides have homogeneous melting temperatures
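Several of the rules above depend on two primitives: taking a reverse complement, and finding a site on either strand when the site may contain `N` (matching any base, as in `GANTC`). The Rust sketch below shows one way to write them, plus the conversion of a reverse-strand start back to forward-strand coordinates. The function names are hypothetical and not taken from oligo4sshic's source; 0-based coordinates are an assumption.

```rust
// Hypothetical helpers for illustration; not oligo4sshic's actual code.
// Coordinates are assumed to be 0-based.

/// Complement of one base; anything other than A/C/G/T (e.g. N) maps to N.
fn complement(base: u8) -> u8 {
    match base.to_ascii_uppercase() {
        b'A' => b'T',
        b'T' => b'A',
        b'C' => b'G',
        b'G' => b'C',
        _ => b'N',
    }
}

/// Reverse complement of a DNA sequence.
fn reverse_complement(seq: &str) -> String {
    seq.bytes().rev().map(|b| complement(b) as char).collect()
}

/// True if `window` matches `site`, where an N in the site matches any base.
fn matches_site(window: &[u8], site: &[u8]) -> bool {
    window.len() == site.len()
        && window
            .iter()
            .zip(site)
            .all(|(w, s)| s.eq_ignore_ascii_case(&b'N') || w.eq_ignore_ascii_case(s))
}

/// Start positions in `seq` where `site` or its reverse complement occurs.
fn find_site_both_strands(seq: &str, site: &str) -> Vec<usize> {
    let rc = reverse_complement(site);
    let (seq, site, rc) = (seq.as_bytes(), site.as_bytes(), rc.as_bytes());
    if site.is_empty() {
        return Vec::new(); // `windows(0)` would panic
    }
    seq.windows(site.len())
        .enumerate()
        .filter(|(_, w)| matches_site(w, site) || matches_site(w, rc))
        .map(|(i, _)| i)
        .collect()
}

/// Map the start of a `len`-nt feature on the reverse strand to the start of
/// the same bases on the forward strand (assumes rev_start + len <= seq_len).
fn reverse_to_forward_start(seq_len: usize, rev_start: usize, len: usize) -> usize {
    seq_len - rev_start - len
}
```

For example, `GATC` sits at index 3 of `AAAGATCCC` and at index 2 of its reverse complement `GGGATCTTT`, and `reverse_to_forward_start(9, 2, 4)` maps the latter back to 3. Note that `GATC`, `CAATTG`, `AATATT` and `GANTC` are all their own reverse complements, so for the default sites both strands yield the same hits.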
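The rule that oligonucleotides must not be reverse complements of one another over more than Y nt (the `--complementary-size` option, default 7) can be checked by measuring the longest stretch over which two oligos pair antiparallel. That stretch is the longest common substring of one oligo and the reverse complement of the other. The sketch below assumes the limit applies to a contiguous run of pairing bases; oligo4sshic may count complementarity differently, and the function names are hypothetical.

```rust
// Hypothetical helpers for illustration; not oligo4sshic's actual code.
// Assumption: "complementary bases" means one contiguous pairing stretch.

/// Reverse complement; any base other than A/C/G/T becomes N.
fn reverse_complement(seq: &str) -> String {
    seq.bytes()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => 'T',
            b'T' => 'A',
            b'C' => 'G',
            b'G' => 'C',
            _ => 'N',
        })
        .collect()
}

/// Longest contiguous antiparallel pairing between `a` and `b`, computed as the
/// longest common substring of `a` and revcomp(`b`). N never counts as pairing.
fn longest_complementary_stretch(a: &str, b: &str) -> usize {
    let a = a.to_ascii_uppercase().into_bytes();
    let rc_b = reverse_complement(b).into_bytes();
    // cur[j] = length of the common run ending at a[i - 1] and rc_b[j - 1];
    // prev holds the same values for the previous row (i - 1).
    let mut prev = vec![0usize; rc_b.len() + 1];
    let mut best = 0usize;
    for i in 1..=a.len() {
        let mut cur = vec![0usize; rc_b.len() + 1];
        for j in 1..=rc_b.len() {
            if a[i - 1] == rc_b[j - 1] && a[i - 1] != b'N' {
                cur[j] = prev[j - 1] + 1;
                best = best.max(cur[j]);
            }
        }
        prev = cur;
    }
    best
}

/// True if two oligos pair over more than `max_complementary` bases.
fn too_complementary(a: &str, b: &str, max_complementary: usize) -> bool {
    longest_complementary_stretch(a, b) > max_complementary
}
```

The dynamic-programming table costs O(len(a) × len(b)) time, which is small for 75-nt oligos; comparing every pair of candidates is the expensive part when many oligos are selected.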
Usage
Usage: oligo4sshic [OPTIONS] --fasta <FASTA> --output-snp <OUTPUT_SNP> --output-raw <OUTPUT_RAW>
Options:
-f, --fasta <FASTA>
fasta file of the genome
--forward-intervals <FORWARD_INTERVALS>
comma-separated list of chromosomal intervals to work on, on the forward strand (e.g. chr_a:1-100,chr_b:200-300) [default: all]
--reverse-intervals <REVERSE_INTERVALS>
comma-separated list of chromosomal intervals to work on, on the reverse strand (e.g. chr_a:1-100,chr_b:200-300) [default: ]
--output-snp <OUTPUT_SNP>
output file with the list of oligo sequences in FASTA format, with SNPs
--output-raw <OUTPUT_RAW>
output file with the list of oligo sequences in FASTA format, without SNPs
--site <SITE>
sequence of the site to look for [default: GATC]
--secondary-sites <SECONDARY_SITES>
comma-separated list of site sequences that will be disabled by SNPs [default: CAATTG,AATATT,GANTC]
--size <SIZE>
size of the oligonucleotides [default: 75]
--site-start <SITE_START>
site start position within the oligonucleotide sequences [default: 5]
--no-snp-zone <NO_SNP_ZONE>
number of nucleotides that will not be turned into SNPs after the site and before the end of the oligonucleotide sequences [default: 5]
--complementary-size <COMPLEMENTARY_SIZE>
maximum number of complementary bases between two oligonucleotides [default: 7]
--snp-number <SNP_NUMBER>
number of SNPs to add to the oligonucleotide sequence [default: 5]
--tries <TRIES>
number of runs used to try to find the highest number of oligos [default: 10]
-v, --verbose
print more information while running (verbose mode)
-h, --help
Print help
-V, --version
Print version
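`--forward-intervals` and `--reverse-intervals` take a comma-separated list such as `chr_a:1-100,chr_b:200-300`. The sketch below parses that format into a hypothetical `Interval` type. It is not oligo4sshic's parser: it does not handle the `all` default, and it keeps coordinates exactly as written because the help text does not say whether they are 0- or 1-based.

```rust
// Hypothetical parser for illustration; not oligo4sshic's actual code.

/// One chromosomal interval such as `chr_a:1-100` (coordinates kept as written).
#[derive(Debug, PartialEq)]
struct Interval {
    chrom: String,
    start: u64,
    end: u64,
}

/// Parse "chr_a:1-100,chr_b:200-300"; an empty string yields an empty list.
fn parse_intervals(spec: &str) -> Result<Vec<Interval>, String> {
    spec.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|item| -> Result<Interval, String> {
            // Split on the last ':' so chromosome names containing ':' still work.
            let (chrom, range) = item
                .rsplit_once(':')
                .ok_or_else(|| format!("missing ':' in interval '{item}'"))?;
            let (start, end) = range
                .split_once('-')
                .ok_or_else(|| format!("missing '-' in interval '{item}'"))?;
            let start: u64 = start
                .parse()
                .map_err(|e| format!("bad start in '{item}': {e}"))?;
            let end: u64 = end
                .parse()
                .map_err(|e| format!("bad end in '{item}': {e}"))?;
            if start > end {
                return Err(format!("start > end in interval '{item}'"));
            }
            Ok(Interval { chrom: chrom.to_string(), start, end })
        })
        .collect()
}
```

Collecting into `Result<Vec<_>, _>` stops at the first malformed interval and reports it, which is usually preferable to silently skipping part of a user-supplied region list.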
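One way to read `--size`, `--site-start`, `--no-snp-zone` and `--snp-number` together is as a window inside the oligo where `--snp-number` SNPs are spread evenly. This sketch assumes `--site-start` is a 0-based offset and that the no-SNP zones sit right after the site and at the end of the oligo. It only computes an initial, evenly spaced layout: the real tool also adjusts SNPs (for example so they create no new restriction site), which is not modeled here, and the function name is hypothetical.

```rust
// Hypothetical helper for illustration; not oligo4sshic's actual code.
// Assumes 0-based positions within the oligo.

/// Spread `snp_number` positions evenly over [lo, hi), where
/// lo = site end + no_snp_zone and hi = size - no_snp_zone.
/// Returns None if the window is too small to hold distinct positions.
fn snp_positions(
    size: usize,
    site_start: usize,
    site_len: usize,
    no_snp_zone: usize,
    snp_number: usize,
) -> Option<Vec<usize>> {
    let lo = site_start + site_len + no_snp_zone;
    let hi = size.checked_sub(no_snp_zone)?; // exclusive upper bound
    if hi <= lo || hi - lo < snp_number {
        return None;
    }
    let len = hi - lo;
    // Centre of each of `snp_number` equal-width bins over [lo, hi).
    Some(
        (0..snp_number)
            .map(|k| lo + (2 * k + 1) * len / (2 * snp_number))
            .collect(),
    )
}
```

With the defaults (size 75, site start 5, a 4-nt GATC site, no-SNP zone 5, 5 SNPs) the window is [14, 70) and `snp_positions(75, 5, 4, 5, 5)` returns positions 19, 30, 42, 53 and 64.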
Dependencies: ~1.5–2.2MB, ~40K SLoC