Olivar is a Python 3 tool for multiplex PCR tiling design. It implements a novel algorithm that reduces non-specific amplification in PCR while avoiding primer design at SNPs and other undesired regions. Olivar also optimizes against primer dimers with the SADDLE algorithm. Olivar is also published as an article in Nature Communications.
A web interface is available at olivar.rice.edu, although it does not support all available functions at the moment.
1. Install Miniconda if not installed already (quick command line install)
conda create -n olivar olivar --channel conda-forge --channel bioconda --channel defaults --strict-channel-priority
Tip
Setting channel priority is important for Bioconda packages to function properly. You may also persist channel priority settings for all package installations by modifying your ~/.condarc file. For more information, check the Bioconda documentation.
conda activate olivar
olivar --help
python >=3.8
blast >=2.12.0
biopython
numpy
pandas
plotly >=5.13.0
tqdm
- (Required) Reference sequence in fasta format (example). Ambiguous bases are not supported and may raise errors.
- (Optional) List of sequence variations to be avoided, in csv format (example). Columns "START" and "STOP" are required; "FREQ" is treated as 1.0 if empty. Other columns are not required. Coordinates are 1-based.
- (Optional) A BLAST database of non-specific sequences. More details can be found in Prepare a BLAST database.
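Since ambiguous bases in the reference may raise errors, it can help to scan the fasta file before running Olivar. The following is a minimal sketch, not part of Olivar: the function names `read_fasta_sequence` and `find_ambiguous` are hypothetical, and it assumes that only A, C, G and T count as unambiguous and that only the first record in the file is of interest.

```python
UNAMBIGUOUS = set("ACGT")  # assumption: everything else is treated as ambiguous


def read_fasta_sequence(lines):
    """Return the upper-cased sequence of the first fasta record."""
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
            continue
        parts.append(line.upper())
    return "".join(parts)


def find_ambiguous(seq):
    """Return (1-based position, base) for every base that is not A, C, G or T."""
    return [(pos, base) for pos, base in enumerate(seq, start=1)
            if base not in UNAMBIGUOUS]


# In practice, pass an open file handle instead of this in-memory example.
example = [">demo", "ACGTN", "acgtR"]
print(find_ambiguous(read_fasta_sequence(example)))  # [(5, 'N'), (10, 'R')]
```

An empty list means the reference contains no ambiguous bases under the assumption above.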
Note
To reproduce the results in example_output (primers used in the publication), use BLAST v2.12.0 or v2.13.0 and follow the instructions and commands below.
To specify the version of BLAST when installing Olivar,
conda install olivar blast=2.13.0
Git LFS is needed to clone the example BLAST database.
The Olivar CLI tool comprises four sub-commands: build, tiling, save and validate. Descriptions of command-line arguments can be found in Command-line parameters.
Tip
build, tiling, and validate support multiprocessing with the -p option.
A fasta reference sequence is required; coordinates of sequence variations and a BLAST database are optional.
olivar build example_input/EPI_ISL_402124.fasta -v example_input/delta_omicron_loc.csv -d example_input/Human/GRCh38_primary -o example_output -p 1
An Olivar reference file (olivar-ref.olvr) will be generated. Use multiple CPU cores (-p) to accelerate this process.
In this step, the input reference sequence is chopped into kmers, and GC content, sequence complexity and BLAST hits are calculated for each kmer. Sequence variations are also labeled if coordinates are provided. A risk score is assigned to each nucleotide of the reference sequence, guiding the placement of primer design regions.
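To make the k-mer based GC calculation described above concrete, here is an illustrative sketch. It is not Olivar's implementation: the function name `kmer_gc`, the choice of k, and the definition of GC content as the fraction of G and C bases in each k-mer are all assumptions made for this example.

```python
def kmer_gc(seq, k):
    """Return (1-based start, GC fraction) for every k-mer in seq.

    Uses a sliding window so each step only updates the count for the
    base that enters and the base that leaves the window.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    result = []
    gc = sum(base in "GC" for base in seq[:k])  # GC count of the first window
    for start in range(len(seq) - k + 1):
        if start > 0:
            gc += (seq[start + k - 1] in "GC") - (seq[start - 1] in "GC")
        result.append((start + 1, gc / k))
    return result


print(kmer_gc("GGCCAATT", 4))
# [(1, 1.0), (2, 0.75), (3, 0.5), (4, 0.25), (5, 0.0)]
```

K-mers with very high or very low values are the kind of region that an extreme-GC risk component would be expected to penalize.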
An Olivar reference file generated in step 1 is required. Set a random seed (--seed) to make the results reproducible. Use multiple CPU cores (-p) to accelerate this process. Output files are listed below (coordinates are 1-based).
olivar tiling example_output/olivar-ref.olvr -o example_output --max-amp-len 420 --min-amp-len 252 --check-var --seed 10 -p 1
Default name | Description |
---|---|
olivar-design.olvd | Olivar design file, keeping all intermediate results during the design. |
olivar-design.csv | Sequences, coordinates (1-based) and pool assignment of primers, inserts and amplicons. |
olivar-design.json | Design configurations. |
olivar-design.fasta | Reference sequence. |
olivar-design.html | An interactive plot to view primers and the risk array. |
olivar-design_Loss.html | Loss of PDR optimization and primer dimer optimization. |
olivar-design_risk.csv | Risk scores of each risk component. |
olivar-design.scheme.bed | Primer sequences and coordinates in ARTIC/PrimalScheme format. |
In this step, the placement of primer design regions (PDRs) is optimized based on the risk array (Fig.1d), and primer candidates are generated by SADDLE for each PDR in the optimized PDR set. SADDLE also minimizes primer dimers by exploring different combinations of primer candidates.
Output files in step 2 can be regenerated at any time as long as the Olivar design file (.olvd) is provided.
olivar save example_output/olivar-design.olvd -o example_output
Warning
.olvr and .olvd files are generated with pickle. Do NOT load those files from untrusted sources.
Input should be a csv file with four required columns: "amplicon_id" (amplicon name), "fP" (sequence of forward primer), "rP" (sequence of reverse primer) and "pool" (primer pool number, e.g., 1). This could be an Olivar-designed primer pool generated in step 2, or a primer pool that was not designed by Olivar. Output files are listed below (coordinates are 1-based). Use multiple CPU cores (-p) to accelerate this process.
olivar validate example_output/olivar-design.csv --pool 1 -d example_input/Human/GRCh38_primary -o example_output -p 1
Default name | Description |
---|---|
olivar-val_pool-1.csv | Basic information of each single primer, including dG, dimer score, BLAST hits, etc. |
olivar-val_pool-1_ns-amp.csv | Predicted non-specific amplicons. |
olivar-val_pool-1_ns-pair.csv | Predicted non-specific primer pairs. |
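When validating a primer pool that was not designed by Olivar, a quick check of the input csv can catch missing columns early. The sketch below is a hypothetical helper (`load_pool` is not an Olivar function) that relies only on the four required column names listed above; it assumes pool numbers are stored as plain integers in the "pool" column.

```python
import csv
import io

REQUIRED_COLUMNS = ("amplicon_id", "fP", "rP", "pool")


def load_pool(handle, pool):
    """Return the rows of one primer pool, or raise if a required column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    # Compare as strings; assumes the pool column holds values such as "1" or "2".
    return [row for row in reader if row["pool"].strip() == str(pool)]


# In-memory example; with a real file use open(path, newline="").
demo = io.StringIO(
    "amplicon_id,fP,rP,pool\n"
    "amp_1,ACGTACGT,TTGACCAA,1\n"
    "amp_2,GGCCTTAA,AATTCCGG,2\n"
)
for row in load_pool(demo, pool=1):
    print(row["amplicon_id"], row["fP"], row["rP"])  # amp_1 ACGTACGT TTGACCAA
```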
Olivar can also be imported as a Python package, which provides four functions with the same names and parameters as the four sub-commands in the CLI.
from olivar import build, tiling, save, validate
Refer to example.py for more details.
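The exact keyword arguments of the Python functions are documented in example.py and are not repeated here. As an alternative for scripted pipelines, the documented CLI can be driven from Python with the standard library. The sketch below is an assumption-labeled wrapper, not part of Olivar: `build_command` and `run` are hypothetical helper names, and only flags listed in Command-line parameters are used.

```python
import shlex
import subprocess


def build_command(fasta, var=None, db=None, output="./", title="olivar-ref", threads=1):
    """Assemble the argument list for `olivar build` from documented flags."""
    cmd = ["olivar", "build", str(fasta)]
    if var is not None:
        cmd += ["--var", str(var)]
    if db is not None:
        cmd += ["--db", str(db)]
    cmd += ["--output", str(output), "--title", title, "--threads", str(threads)]
    return cmd


def run(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command(
        "example_input/EPI_ISL_402124.fasta",
        var="example_input/delta_omicron_loc.csv",
        output="example_output",
    )
    print(shlex.join(cmd))  # inspect the command before executing it
    # run(cmd)  # uncomment once Olivar is installed and the input files exist
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.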
olivar build fasta-file [--var <string>] [--db <string>] [--output <string>]
[--title <string>] [--threads <int>]
Argument | Default | Description |
---|---|---|
fasta-file | | Positional argument. Path to the fasta reference sequence. |
--var, -v | None | Optional, path to the csv file of SNP coordinates and frequencies. Required columns: "START", "STOP", "FREQ". "FREQ" is considered as 1.0 if empty. Coordinates are 1-based. |
--db, -d | None | Optional, path to the BLAST database. Note that this path should end with the name of the BLAST database (e.g., "example_input/Human/GRCh38_primary"). |
--output, -o | ./ | Output directory (output to current directory by default). |
--title, -t | olivar-ref | Name of the Olivar reference file. |
--threads, -p | 1 | Number of threads. |
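A variation file for --var can be produced with the standard csv module. The following sketch is a hypothetical helper (`write_variations` is not an Olivar function); it writes the three columns named above, leaves "FREQ" empty when no frequency is known (documented as equivalent to 1.0), and assumes STOP is never smaller than START.

```python
import csv
import io


def write_variations(rows, handle):
    """Write (start, stop, freq) tuples as a START/STOP/FREQ csv.

    Coordinates are 1-based; freq may be None to leave the field empty.
    """
    writer = csv.writer(handle)
    writer.writerow(["START", "STOP", "FREQ"])
    for start, stop, freq in rows:
        if start < 1 or stop < start:
            raise ValueError(f"invalid 1-based coordinates: {start}-{stop}")
        writer.writerow([start, stop, "" if freq is None else freq])


# In-memory example; with a real file use open(path, "w", newline="").
buffer = io.StringIO()
write_variations([(100, 100, 0.5), (200, 205, None)], buffer)
print(buffer.getvalue())
```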
olivar tiling olvr-file [--output <string>] [--title <string>] [--max-amp-len <int>]
[--min-amp-len <int>] [--w-egc <float>] [--w-lc <float>] [--w-ns <float>] [--w-var <float>]
[--temperature <float>] [--salinity <float>] [--dg-max <float>] [--min-gc <float>]
[--max-gc <float>] [--min-complexity <float>] [--max-len <int>] [--check-var]
[--fp-prefix <DNA>] [--rp-prefix <DNA>] [--seed <int>] [--threads <int>]
Argument | Default | Description |
---|---|---|
olvr-file | | Positional argument. Path to the Olivar reference file (.olvr). |
--output, -o | ./ | Output path (output to current directory by default). |
--title, -t | olivar-design | Name of design. |
--max-amp-len | 420 | Maximum amplicon length. |
--min-amp-len | None | Minimum amplicon length. 0.9*{max-amp-len} if not provided. |
--w-egc | 1.0 | Weight for extreme GC content. |
--w-lc | 1.0 | Weight for low sequence complexity. |
--w-ns | 1.0 | Weight for non-specificity. |
--w-var | 1.0 | Weight for variations. |
--temperature | 60.0 | PCR annealing temperature. |
--salinity | 0.18 | Concentration of monovalent ions, in molar (M). |
--dg-max | -11.8 | Maximum free energy change of a primer in kcal/mol. |
--min-gc | 0.2 | Minimum GC content of a primer. |
--max-gc | 0.75 | Maximum GC content of a primer. |
--min-complexity | 0.4 | Minimum sequence complexity of a primer. |
--max-len | 36 | Maximum length of a primer. |
--check-var | False | Boolean flag. Filter out primer candidates with variations within 5 nt of the 3' end. NOT recommended when many variations are provided, since this would significantly reduce the number of primer candidates. |
--fp-prefix | None | Prefix of forward primer. Empty by default. |
--rp-prefix | None | Prefix of reverse primer. Empty by default. |
--seed | 10 | Random seed for optimizing PDRs and SADDLE. |
--threads, -p | 1 | Number of threads. |
olivar save olvd-file [--output <string>]
Argument | Default | Description |
---|---|---|
olvd-file | | Positional argument. Path to the Olivar design file (.olvd). |
--output, -o | ./ | Output directory (output to current directory by default). |
olivar validate csv-file [--pool <int>] [--db <string>] [--output <string>]
[--title <string>] [--max-amp-len <int>] [--temperature <float>] [--threads <int>]
Argument | Default | Description |
---|---|---|
csv-file | | Positional argument. Path to the csv file of a primer pool. Required columns: "amplicon_id" (amplicon name), "fP" (sequence of forward primer), "rP" (sequence of reverse primer), "pool" (pool number, e.g., 1). |
--pool | 1 | Primer pool number. |
--db, -d | None | Optional, path to the BLAST database. Note that this path should end with the name of the BLAST database (e.g., "example_input/Human/GRCh38_primary"). |
--output, -o | ./ | Output directory (output to current directory by default). |
--title, -t | olivar-val | Name of validation. |
--max-amp-len | 1500 | Maximum length of predicted non-specific amplicon. Ignored if no BLAST database is provided. |
--temperature | 60.0 | PCR annealing temperature. |
--threads, -p | 1 | Number of threads. |
Tip
All BLAST related commands/scripts are installed along with Olivar.
- To make your own BLAST database with the makeblastdb command, check out the NCBI BLAST User Manual.
The example BLAST database is created with 23 Chromosomes and MT of human genome assembly GRCh38, with the command (BLAST version 2.12.0):
makeblastdb -in GRCh38_primary.fasta -dbtype nucl -title GRCh38_primary -parse_seqids -hash_index -out GRCh38_primary -max_file_sz 4GB -logfile makeblastdb.out -taxid 9606
- To download a pre-built BLAST database from NCBI (e.g., RefSeq representative genomes for viruses), use the update_blastdb.pl script:
update_blastdb.pl --decompress ref_viruses_rep_genomes
For more details about update_blastdb.pl, check the BLAST Help.
For more pre-built databases, check the NCBI FTP site.
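Because --db expects a path that ends with the database name rather than a directory, a quick check of the prefix can save a failed run. The sketch below is a heuristic, not part of Olivar or BLAST: `looks_like_nucl_db` is a hypothetical name, and the check assumes the file extensions that BLAST+ normally writes for nucleotide databases (.nhr, .nin, .nsq, or a .nal alias file for multi-volume databases).

```python
from pathlib import Path

CORE_EXTENSIONS = ("nhr", "nin", "nsq")  # assumption: typical nucleotide db files


def looks_like_nucl_db(prefix):
    """Heuristically check that `prefix` names a nucleotide BLAST database."""
    prefix = Path(prefix)
    parent, name = prefix.parent, prefix.name
    if not parent.is_dir():
        return False
    # Multi-volume databases are usually addressed through an alias file.
    if (parent / f"{name}.nal").is_file():
        return True
    # Single-volume database: all three core files share the prefix.
    return all((parent / f"{name}.{ext}").is_file() for ext in CORE_EXTENSIONS)


if __name__ == "__main__":
    print(looks_like_nucl_db("example_input/Human/GRCh38_primary"))
```

A False result for a directory path usually means the database name still has to be appended to the path.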