Vision framework which brings a more robust, Deep Learning-based approach to some usual OpenCV use cases


DeepCV (Work In Progress)

Project authored by Paul-Emmanuel SOTIR
This project is under the Open Source MIT License, see ./LICENSE for more details.

It's not the best choice, it's DeepCV choice!

Given that DeepCV is still at an early stage of development, anyone interested in this project right now should treat it as a project template, à la Kedro project template, rather than as a fully stable vision/deep learning framework.

DeepCV is a Kedro/PyTorch project which aims to simplify the implementation of simple vision tasks. DeepCV allows you to easily create, train, debug and deploy vision processing pipelines by leveraging recent DeepLearning algorithms along with the usual OpenCV tooling.

Some of DeepCV's main features which are already implemented are:

  • The deepcv.meta python module contains various utilities that make it easier to define models, train them with ignite, search hyperparameters with NNI, preprocess and augment data, schedule learning rate(s) with the One Cycle policy, and perform meta-learning with tools like HyperparametersEmbedding, as well as meta deep-learning abstractions (e.g. Experiment, DatasetStats, Task, Hyperparameters, HyperparameterSpace) stored for each experiment in a 'metadataset', ...
  • deepcv.meta.base_module.DeepcvModule: a base class for easier DeepCV model definition. Model sub-modules (NN blocks or layers) can be defined in a simple and generic manner in ./conf/base/parameters.yml. This model base class greatly simplifies PyTorch model definition without losing expressivity.
  • DeepCV contains tooling common to most Machine Learning / Deep Learning / Vision projects, such as Machine Learning experiment management, state-of-the-art Deep Learning architecture definition, and model training and insights. Learned weights can be transferred between any DeepCV image models by sharing, training, forking and/or merging these shared weights.
  • ./conf/base/parameters.yml can also specify data augmentation and preprocessing recipes.
  • Quickly follow, visualize and compare training experiments and vision pipelines with MLFlow Web UI, Kedro-viz, TensorboardX, NNI Web UI, Jupyter notebooks, ...
  • ...
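
The One Cycle learning-rate policy mentioned above can be illustrated with a short, dependency-free sketch. This is not DeepCV's implementation: the function name one_cycle_lr is hypothetical, the parameter names and defaults are borrowed from PyTorch's torch.optim.lr_scheduler.OneCycleLR, cosine interpolation is assumed for both phases, and momentum cycling is left out.

```python
import math


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a cosine-interpolated one-cycle schedule (illustrative)."""
    if not 0 <= step < total_steps:
        raise ValueError("step must satisfy 0 <= step < total_steps")

    initial_lr = max_lr / div_factor          # learning rate at step 0
    min_lr = initial_lr / final_div_factor    # learning rate at the last step
    warmup_steps = max(1, int(pct_start * total_steps))

    def cos_interp(start, end, pct):
        # Smoothly moves from `start` (pct=0) to `end` (pct=1)
        return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))

    if step < warmup_steps:
        # Phase 1: warm up from initial_lr to max_lr
        return cos_interp(initial_lr, max_lr, step / warmup_steps)

    # Phase 2: anneal from max_lr down to min_lr
    decay_steps = max(1, total_steps - warmup_steps - 1)
    return cos_interp(max_lr, min_lr, (step - warmup_steps) / decay_steps)


# Print a few points of a 100-step schedule
for s in (0, 15, 30, 65, 99):
    print(s, one_cycle_lr(s, total_steps=100, max_lr=0.1))
```

With these assumed defaults, a 100-step run with max_lr=0.1 starts at 0.004, peaks at step 30 and ends close to zero.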

Install instructions

DeepCV manages its dependencies with conda, so a conda distribution (e.g. Anaconda or Miniconda) must be installed in your working environment.
To run DeepCV code, you then need to activate the conda environment with conda activate deepcv.
DeepCV is based on the Kedro machine learning project template. We modified the kedro install command to better support conda environments: during kedro installation, a conda environment for DeepCV is created from the ./src/environment.yml conda env file. Feel free to modify it according to your needs (e.g. add dependencies, ...) and then run kedro install to either update the conda environment or create a new one.

Once installed, you can use DeepCV as a project template or as a dependency in your code, and run it either through the kedro run command, which runs machine learning pipelines, or directly from your Python source files. You can also run a deepcv source file to test it: e.g. python -o ./src/deepcv/synchronization/ will run some unit tests related to audio synchronization tasks.

Method #1: Install from this repository

git clone
cd ./DeepCV

# Install DeepCV dependencies and conda environment (if you want to customize conda environment you can either modify default YAML conda env file at ./src/environment.yml or specify a new env file (see `--conda-yml` option in `kedro install -h`))
kedro install

# You then need to activate conda environment (make sure to have a conda distro installed)
conda activate deepcv

# You can then run the tests of any deepcv module to verify that DeepCV was installed successfully (may not work for now, stay tuned 📡):
python deepcv/classification/

Method #2: Install our package from Anaconda repository

TODO: Package DeepCV project and upload it to a conda cloud repository

conda install deepcv

# Install DeepCV dependencies and conda environment (if you want to customize conda environment you can either modify default YAML conda env file at ./src/environment.yml or specify a new env file (see `--conda-yml` option in `kedro install -h`))
kedro install

# You then need to activate conda environment (make sure to have a conda distro installed)
conda activate deepcv

# You can then run the tests of any deepcv module to verify that DeepCV was installed successfully (may not work for now, stay tuned 📡):
python deepcv/classification/

Usage example

Make sure to run conda activate deepcv (or activate your own conda env with deepcv as a dependency) before running DeepCV code.
Here is an example usage of deepcv from your own Python source:

import deepcv

def main():
    # TODO: Show example usage of deepcv
    raise NotImplementedError

if __name__ == "__main__":
    main()


DeepCV is based on the Kedro machine learning project template, which enforces and simplifies the definition, configuration, training and inference of machine learning pipelines.
Moreover, DeepCV uses MLFlow for better experiment versioning and visualization.

See the hosted Sphinx documentation for more details, or the local version at ./docs/build/html/index.html (WIP: not hosted for now, nor very informative, stay tuned 📡)

Alternatively, if you need documentation from a specific branch or updated documentation with your own contributions, you can build sphinx documentation by following these instructions:

git clone
cd ./DeepCV
kedro install
kedro build-docs
# Once the documentation has been successfully built, you can browse to ./docs/build/html/index.html

Contribution guide

Contributions are welcome, but keep in mind that this project is still at a very early stage of development.
Feel free to submit issues if you have feature suggestions, ideas or application areas, or if you have difficulty reusing it.

📝TODO List📝

DeepCV Features and code refactoring TODO List 💥(☞゚ヮ゚)☞💥

👍 = DONE; ♻ = WIP; 💤 = TODO

  • 👍 Implement conda activate when kedro is called in (WIP: testing and debug)
  • 👍Finalize image classifier model definition + move generic code of ImageClassifier to DeepCVModule base class
  • 👍parse and process recipes from parameters.yml
  • 👍refactor augmentation operators of AugMix on PIL images in
  • 👍make possible to specify dense and/or residual links in NN architecture configuration (parameters.yml) and process it accordingly in forward method of deepcv.meta.base_module.DeepcvModule model base class
  • 👍Improve dense/residual link support by reducing its memory footprint: store deepcv.meta.base_module.DeepcvModule sub-modules output features only if they are actually needed by a residual/dense link deeper in NN architecture
  • 👍Improve deepcv.meta.base_module.DeepcvModule model base class to parse YAML NN architecture definition of parameters.yml in a more powerful/generic way to allow siamese NNs, residual/dense links down/up-sampling, attention gates, multi-scale/depth inputs/outputs, ... (for now, DeepcvModule only supports YAML NN architecture definitions made of a list of sub-modules with optional residual/dense links)
  • 👍Setup and download various torchvision datasets (at least CIFAR10/100)
  • 👍fix tests/deepcv module imports (make 'tests' like a third party module apart from deepcv or move 'tests' into deepcv module)
  • 👍fix code and YAML config files in order to be able to run basic kedro pipelines and build documentation
  • 👍Run and Debug whole image classifier pipeline
  • 👍Look into possible implementation of AugMix into deepcv (+ see any improved versions of AugMix)
  • 👍Full Experiment tracking MLFlow integration
  • ♻parse and process recipes from parameters.yml
  • ♻Implement HRNet architecture, create multiscale fusion submodule similar to HRNet's features fusion
  • ♻Basic NNI Hyperparameter search integration (and remove any hyperopt usage)
  • ♻Save training directory to MLFlow and delete it once training is done (+ allow to recover it & name it after run id)
  • ♻Implement more tooling for hyperparameter scheduling: allow multiple schedulers and tools for easier "super-convergence"
  • ♻Improve Hyperparameters/HyperparameterSpace/HyperparameterEmbedding/GeneralizationAcrossScalesPredictor implementations
  • ♻Fully implement HybridConnectivityGatedNet model (+ refactor it to make usage of newest version of deepcv.meta.base_module.DeepcvModule model base class)
  • ♻Train image classifier model + perform its hp search + Human/object detection model
  • ♻Train HRNet implementation on CIFAR10/100 and try to reproduce their results
  • ♻Implement OneCycle Policy along with optional learning rate scales varying for each layer or conv block + momentum and eventually investigate similar policies for other hyperparameters (e.g. dropout_prob, L2, ...) + consider adding fastai to deepcv dependencies in order to reuse its OneCycle policy implementation?
  • ♻Implement architectures templates/patterns for multiscale neural net inputs and outputs + eventually gaussian blur kernels applied to convolutions activations with decreasing blur kernel size during training steps (+ rapid SOTA review from citing papers of these techniques)
  • ♻Train, visualize and investigate the effect of various contrastive losses like deepcv.meta.contrastive.JensenShannonDivergenceConsistencyLoss or deepcv.meta.contrastive.TripletMarginLoss for embeddings and supervised training setups (à la contrastive learning for pretraining or additional loss term in supervised training setup) -> look for other contrastive or generative approaches and combine those approaches with classical supervised losses
  • ♻Use UDA (UnsupervisedDataAugmentation) and replace/append-to its underlying image augmentation method (RandAugment) with a custom model distilled from SinGAN (i.e. use SinGAN under UDA framework) 🎓🧪:
  • ♻Implement basic image feature/keypoints detection and matching and compare it against classical/non-ML vision approaches like SIFT, ORB, ...: Feature extraction and matching using lightweight CNNs, improving the reliability and reproducibility of image processing pipelines compared to implementations that rely on classical feature extractors such as SIFT, ORB, ...
  • ♻Implement a mechanism to choose which pipelines/models/third-party projects/dependencies to enable or not (i.e. optional plugins to DeepCV) by following Kedro "modular pipelines" guidelines
  • ♻Implement more unit tests and sanity checks
  • ♻Implement a web app merging and managing all web UIs (Tensorboard, Kedro Viz, MLFlow UI, NNI UI, Jupyter (Lab)), and allow custom dashboards with Streamlit (or Dash, by the Plotly developers) + eventually a simple UI for third party git repository installation (Flask interface to (+ Google facets, GGPlot, Seaborn, plotly, matplotlib tooling in notebook/custom dashboards)
  • ♻Integrate NNI HP/NAS APIs for hyperparameter search and neural architecture search (+ NNI compression/pruning/quantization), allowing easier NNI usage on DeepCV training pipelines. NNI tooling in DeepCV features:
    • 👍NNI YAML Config generated from common/default config template for each training pipelines
    • 👍Allow NNI NAS Mutable Layer(s)/input(s) usage in 'deepcv.meta.base_module.DeepcvModule' YAML model specification (multiple alternatives in submodules architectures spec.)
    • 👍Automated NNI NAS search space generation from model through nnictl ss-gen-space call from Python
    • 👍Tooling for easier NNI NAS (single-shot or classic NAS algorithm) training
    • 👍Simple NNI compression (pruning and/or quantization) support in 'deepcv.meta.ignite_training.train' training procedure
    • ♻Make possible to easily run a NNI HP search which itself performs Single-Shot NNI NAS training for each HP trial (HP search over NAS search)
    • 👍Consistent MLFlow logging during NNI usage (NNI and MLFlow experiment/run names/ids are similar and NNI results are logged to MLFlow if DeepCV pipeline is tagged with 'train')
    • ♻Make sure DeepCV tooling/integration for NNI won't raise issues during usage and won't interfere with regular nnictl calls (+ NNI experiments can be resumed, cleanly stopped and run after code or config changes) (+ test NNI Web UI visualization for all use cases)
  • ♻Implement/extend 'deepcv.meta.nn' tooling with Pyramidal convolution kernels (PyConv), and more broadly, grouped convs, so that 'deepcv.meta.nn.conv' allows simple definition of any regular conv, grouped conv, pyramidal conv and conv on multiple resolutions of input/output feature maps (i.e. HRNet siamese conv branches implemented as a single branch of multi-resolution convolutions). (PyramidalConv/PyConv features multiple kernel sizes, each of them being associated with a kernel depth/count chosen to have the same computational cost as a regular convolution. I.e., the computational cost of larger kernels is balanced using more convolution groups, given that kernel depth determines how many groups are needed to perform a convolution. See 'PyConv' original paper: (TMP NOTE: look into paper for more implementation details: e.g. probably use zero padding with varying size w.r.t. kernel size in order to preserve constant output feature size))
  • 💤Reuse outlier filtering technique for feature keypoint matching as a third party; see original paper: "AdaLAM: Revisiting Handcrafted Outlier Detection" and its respective github repository (PyTorch implementation): (TMP NOTES: Keypoint = Feature vector (made of scale and orientation sub-vectors) associated to a position vector (e.g. 2D position for 2D image keypoints). Similarity/distance metrics between two keypoints are often a function of the subtraction between orientation sub-vectors and the division of scale sub-vectors.)
  • 💤Start Ensembling and stacking utilities module implementation
  • ♻Create jupyter notebook(s) for basic prototyping and training results visualization + implement utility tools for jupyter notebooks
  • 💤Implement uncertainty estimation utilities in the deepcv.meta.uncertainty.estimation module so that most DeepCV modules infer uncertainty estimates, and exploit the estimated uncertainty to perform a kind of active learning and/or boosting and to develop rules/lightweight meta-models which improve generalization and accuracy. See: and PhD thesis:
  • 💤Implement or integrate distillation with optional quantization tools + distillation from ensembles of teacher networks (see NNI, Apex and built-in PyTorch quantization/compression tooling)
  • 💤Setup Continuous Integration using Travis CI
  • 💤Create or find an ECA implementation: channel attention gate on convolution gate using sigmoid of 1D convolution output as attention gate (element-wise multiplication of each channels with their respective gating scale) (kernel size of 1D conv: k << ChannelCount with k=Func(C))
  • 💤Add image completion/reconstruction/generation/combination to DeepCV with a custom model distilled and/or quantized from SinGAN + for data augmentation setup: integrate it to AugMix augmentation transforms
  • 💤Implement PyTorch model profiling tooling which merges results from a few OpenSource tools, like torchprof, torch-scan, torchstat, pytorch-summary, THOP: pytorch-OpCounter and/or flops-counter.pytorch. See also: How fast is my model? and Deep Learning Memory Usage and Pytorch Optimization Tricks
  • 💤implement wrappers over DeepCV model pipelines to allow scikit model interface usage and better integration along with OpenCV code + fine-tuning tooling of whole pipelines on a small amount of custom data
  • 💤Custom lossless Image and video compression codec using learned arithmetic encoder policies to minimize image and video sizes (which means faster streaming, faster reading from disk, lower storage size) : Implement (fork lepton and L3C for better AC and DC compression using deterministic shallow-NN prediction from context) or add better JPEG and MPEG compression codecs (for use cases where storage-size/bandwidth is the priority, e.g. for faster video/image processing or streaming pipelines, or smaller media storage (= priority to size, then, prioritize decompression time vs compression time)) and/or look for algorithms which could be applied directly on compressed images/frames (see Lepton and L3C ) + utilities to convert files to our codec for faster processing:
    • must be lossless to preserve visual quality when encoding back to jpeg, but should match the benefits from any existing lossy jpeg compression (e.g. lossless algorithm built on top of jpeg's tiles)
    • keep in mind the possibility of progressive image/frame loading/streaming in a future implementation
    • benchmark performances on imagenet, compare speed and size with L3C (use benchmarking code from
  • 💤add a simple open-source implementation of wave function collapsing, optimize it -> Future work : Procedural Content Generation: Use a GAN to generates slots (learn scenes manifold by semantic clusters) used by Wave Function Collapse (+ Growing Grids as space filling algorithm to determine tile shapes)
  • ♻Finish implementation of 'meta convolution layers' and try to train it
  • ♻Fix, test DeepCV project packaging and deployment and improve modularity/portability
  • 💤Integrate Kornia (PyTorch-based vision library) as a third party dependency
  • 💤Find best performing video loading library (at least faster ffmpeg, e.g. video decoding directly to GPU memory) for a broadly used codec and implement video processing tooling (conversion, preprocessing, image pipeline application, input and target interpolation, distributed video processing, ...)
  • 💤Implement various basic video pipeline processing tools
  • 💤Create more sophisticated video inference interpolation by avoiding end-to-end model inference on each frame and by making inference conditioned on previous frames' embedding and inference
  • 💤Implement timeseries models for high-level (low dimensionality) video features understanding
  • ♻Implement a pipeline for video stitching and add support for video stabilization, audio-and/or-visual synchronization, image compression (lossless or lossy), watermark removal, visual tracking, pose estimation
  • 💤Implement more tools for faster deep learning model convergence and generalization, thanks to active learning, boosting and meta-learning techniques
  • Read more papers and implement DeepCV accordingly ;-) (generic papers which seem relevant but which I haven't read yet: Fantastic Generalization Measures and Where to Find Them, etc.)
  • 💤.../♻.../👍... And many more features/subpackages are already implemented, WIP, or to be done
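
The memory optimization for residual/dense links listed above (keeping a sub-module's output only when a deeper link needs it) can be sketched without PyTorch. Everything here is hypothetical and is not DeepCV's code: the (name, function, residual source) spec format, the forward_with_links function, and the use of plain floats in place of tensors. Freeing a stored output right after its last use is an extra assumption beyond what the item describes.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical spec entry: (name, function, name of an earlier output to add, or None)
SubModule = Tuple[str, Callable[[float], float], Optional[str]]


def forward_with_links(specs: List[SubModule], x: float) -> Tuple[float, int]:
    """Run sub-modules in order; return the output and the peak number of stored outputs."""
    # Index of the last sub-module that consumes each residual source
    last_use: Dict[str, int] = {
        src: i for i, (_, _, src) in enumerate(specs) if src is not None
    }
    stored: Dict[str, float] = {}
    peak = 0
    for i, (name, fn, src) in enumerate(specs):
        x = fn(x)
        if src is not None:
            if src not in stored:
                raise ValueError(f"'{name}' links to unknown or later sub-module '{src}'")
            x = x + stored[src]  # residual link: add the earlier output
            if last_use[src] == i:
                del stored[src]  # no deeper link needs it anymore
        if name in last_use and last_use[name] > i:
            stored[name] = x  # keep only outputs that a deeper link will read
        peak = max(peak, len(stored))
    return x, peak


specs: List[SubModule] = [
    ("conv1", lambda v: v * 2, None),
    ("conv2", lambda v: v + 1, None),
    ("conv3", lambda v: v * 3, "conv1"),  # conv3 output + conv1 output
    ("head", lambda v: v - 1, None),
]
output, peak_stored = forward_with_links(specs, 1.0)
print(output, peak_stored)
```

In this toy run only conv1 is ever stored, because it is the only output referenced by a later link.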

Interesting third party projects which could be integrated into DeepCV

Eventually create submodule(s) for the following github projects under third_party directory (see + script to update submodule to latest release commit?):

External dependencies TODO: Remove this section

kedro/mlflow/ignite/pytorch/tensorboard//NNI/Apex/Scikit-learn/Numpy/pandas/Jupyter/.../Python/Conda/CUDA + DeepCV with ffmpeg(+ faster/hardware-accelerated video h264/VP9/AV1 decompression lib?) + DeepCV docker image with or without GPU acceleration + keep in mind portability (e.g. to android, ARM, jetson, ...)

