
Data challenge to predict the binding nature of DNA sequence regions to specific transcription factors, using kernel methods


HabibSlim/AdvancedLearningModels


Transcription factor binding prediction with kernel methods

Python Scipy

[Report]

Summary

Introduction

For this data challenge, our task was to predict whether given DNA sequence regions bind to specific transcription factors. We implemented from scratch and compared various string kernels operating on DNA sequences, alongside support vector machine (SVM), kernel ridge regression (KRR), and kernel logistic regression (KLR) classifiers.
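As an illustration of what a string kernel on DNA sequences looks like, here is a minimal sketch of the spectrum kernel from reference 1, written in plain Python. It is not taken from the repository's code: the function names (`kmer_counts`, `spectrum_kernel`, `gram_matrix`) and the default `k=3` are assumptions chosen for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Iterate over the smaller profile; a Counter returns 0 for missing k-mers.
    if len(cx) > len(cy):
        cx, cy = cy, cx
    return sum(count * cy[kmer] for kmer, count in cx.items())


def gram_matrix(seqs, k=3):
    """Symmetric Gram matrix of the spectrum kernel, as a list of lists."""
    n = len(seqs)
    K = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # The kernel is symmetric, so fill both halves at once.
            K[i][j] = K[j][i] = spectrum_kernel(seqs[i], seqs[j], k)
    return K
```

For example, `spectrum_kernel("ACGT", "CGTA", k=2)` counts the two shared 2-mers `CG` and `GT`. A practical implementation would likely use sparse feature vectors or NumPy arrays rather than nested lists; this version only aims to make the definition concrete.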

This project was developed by David Emukpere and Habib Slim in the context of the Kernel Methods for Machine Learning course, taught by Julien Mairal at Université Grenoble Alpes (UGA).

Dependencies

This project uses the following external dependencies:

  • scipy, for the linalg package
  • cvxopt, for QP solving
  • pandas and numpy

Usage

To reproduce our results, run the "submission 1" Jupyter notebook (for the KRR submission) and the "submission 2" Jupyter notebook (for the bagged KRR submission). For the latter, the Gram matrices are pre-computed and stored in compressed form in the ./data/ folder, since we re-used them frequently. As a result, the second notebook produces the CSV submission much faster than the first.
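To show why pre-computed Gram matrices make the bagged KRR submission fast, here is a rough sketch of bagged kernel ridge regression that only indexes into an existing Gram matrix. It is an illustration under stated assumptions, not the notebooks' actual code: the function names, the `lam * n` regularization scaling, the number of models, and the use of bootstrap resampling are all choices made for this example.

```python
import numpy as np


def krr_fit(K, y, lam):
    """Solve (K + lam * n * I) alpha = y for the dual coefficients."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), y)


def bagged_krr_scores(K_train, y_train, K_test, lam=1e-2, n_models=10, seed=0):
    """Average KRR decision scores over bootstrap resamples.

    K_train: (n, n) Gram matrix between training points.
    K_test:  (m, n) kernel values between test and training points.
    y_train: (n,) labels in {-1, +1} (assumed encoding).
    """
    rng = np.random.default_rng(seed)
    n = K_train.shape[0]
    scores = np.zeros(K_test.shape[0])
    for _ in range(n_models):
        # Bootstrap sample: draw n training indices with replacement.
        idx = rng.choice(n, size=n, replace=True)
        # Sub-select rows and columns of the Gram matrix; no kernel is recomputed.
        alpha = krr_fit(K_train[np.ix_(idx, idx)], y_train[idx], lam)
        scores += K_test[:, idx] @ alpha
    return scores / n_models


def to_labels(scores):
    """Threshold averaged scores at zero to get labels in {-1, +1}."""
    return np.where(scores >= 0, 1, -1)
```

Each bagged model costs only one linear solve on a slice of the stored matrix, which is consistent with the second notebook running much faster than one that has to compute kernels first.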

References

  1. [Leslie et al., 2001] The Spectrum Kernel: A string kernel for SVM protein classification.
  2. [Lodhi et al., 2002] Text classification using string kernels.
  3. [Ratsch et al., 2004] Accurate Splice Site Prediction for Caenorhabditis Elegans.
  4. [Leslie et al., 2004] Mismatch string kernels for discriminative protein classification.
