@ML4GLand

Machine Learning for Genomics

A collection of tools for investigating how DNA encodes function with machine learning

Welcome to the land of machine learning for genomics!

ML4GLand is a community that develops and maintains tools (primarily in Python) for sequence-based machine learning in genomics.

Why?

Deep learning has become a popular tool for investigating gene regulation, including DNA- and RNA-protein binding specificity, chromatin state and architecture, and transcriptional activity. However, executing a typical workflow for building and interpreting deep learning models remains a challenge. Training nuances specific to genomics data, along with complex preprocessing and interpretation methods, create an especially steep learning curve, and the heterogeneous implementations of most code accompanying publications hinder reproducibility and extensibility. A tool that exposes existing data, models, and methods to computational scientists, and that can also serve as a platform for development, would greatly improve our ability to use sequence-based machine learning to interrogate gene regulatory mechanisms.

We aim to build a framework for developing sequence-to-function deep learning models

Previous work has shown the utility of such frameworks: DeepChem and scverse are excellent examples. Our mission is to build a similar ecosystem for sequence-based genomics.

Core packages

  • SeqPro -- a Python package for processing DNA/RNA sequences for machine learning.
  • SeqData -- a Python package for preparing machine learning-ready genomic sequence datasets.
  • SeqExplainer -- a Python package for interpreting sequence-to-function machine learning models.
  • EUGENe -- a Python package for streamlining and customizing end-to-end deep-learning sequence analyses in regulatory genomics.
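Preprocessing and interpretation are two of the steps these packages are built around. As a rough illustration of what such steps involve, the plain-Python sketch below one-hot encodes a DNA sequence, takes its reverse complement, and runs in-silico saturation mutagenesis (scoring every single-base substitution). The helper names (`one_hot`, `reverse_complement`, `ism`) are hypothetical and are not the SeqPro or SeqExplainer API, and `toy_model` is a made-up stand-in for a trained sequence-to-function model.

```python
from typing import Callable, List

# Hypothetical helpers for illustration only; NOT the SeqPro / SeqExplainer API.
ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def one_hot(seq: str) -> List[List[int]]:
    """Encode DNA as a (length x 4) matrix in A, C, G, T order; N becomes an all-zero row."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq.upper()]


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string; only A, C, G, T and N are supported."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def ism(seq: str, score: Callable[[str], float]) -> List[List[float]]:
    """In-silico saturation mutagenesis.

    Returns a (length x 4) matrix where entry [i][j] is
    score(seq with position i set to ALPHABET[j]) - score(seq),
    and 0.0 where ALPHABET[j] is already the reference base.
    """
    seq = seq.upper()
    reference = score(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = []
        for alt in ALPHABET:
            if alt == ref_base:
                row.append(0.0)
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row.append(score(mutant) - reference)
        effects.append(row)
    return effects


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts occurrences of 'GATA'."""
    return float(seq.upper().count("GATA"))


if __name__ == "__main__":
    print(one_hot("ACGN"))
    print(reverse_complement("AACG"))  # CGTT
    for position, row in enumerate(ism("AGATAC", toy_model)):
        print(position, row)
```

In a real workflow the model would be a trained neural network operating on the one-hot matrix rather than on the string, and the mutagenesis scores would typically be batched through the model instead of scored one mutant at a time.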

Ecosystem packages

  • SeqDatasets -- a repository for downloading datasets and loading them with SeqData.
  • MotifData -- a Python package for handling motifs.
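A common motif-handling task is turning a position frequency matrix (per-position base counts) into a log-odds position weight matrix and scanning a sequence with it. The sketch below shows that idea in plain Python, assuming a uniform 0.25 background and a 0.25 pseudocount; `pfm_to_pwm`, `scan`, and the toy GATA counts are hypothetical illustrations, not MotifData's API or real motif data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration only; NOT the MotifData API.
ALPHABET = "ACGT"


def pfm_to_pwm(
    counts: List[Dict[str, float]], pseudocount: float = 0.25, background: float = 0.25
) -> List[Dict[str, float]]:
    """Convert per-position base counts to log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        pwm.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in ALPHABET}
        )
    return pwm


def scan(seq: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every forward-strand window of len(pwm); windows with non-ACGT bases are skipped."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in ALPHABET for base in window):
            continue
        hits.append((start, sum(pwm[i][base] for i, base in enumerate(window))))
    return hits


# Toy counts for a made-up "GATA" motif: 10 observations of the consensus base per position.
GATA_COUNTS = [{b: (10 if b == c else 0) for b in ALPHABET} for c in "GATA"]

if __name__ == "__main__":
    pwm = pfm_to_pwm(GATA_COUNTS)
    hits = scan("TTGATACC", pwm)
    print(max(hits, key=lambda hit: hit[1]))  # best window starts at position 2
```

A fuller implementation would also scan the reverse complement and convert scores to p-values, but the log-odds construction above is the core of PWM scoring.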

Usage repositories

  • tutorials -- a repository of tutorials for ML4GLand tools.
  • use cases -- a repository of use cases that showcase ML4GLand tools and potential ecosystem packages.

Pinned

  1. EUGENe Public

    Elucidating the Utility of Genomic Elements with Neural Nets

    Jupyter Notebook · 61 stars · 4 forks

  2. SeqExplainer Public

    Interpreting sequence-to-function machine learning models

    Jupyter Notebook · 2 stars · 1 fork

  3. SeqPro Public

    Genomic sequence preprocessing toolkit

    Python · 7 stars · 1 fork

  4. tutorials Public

    A set of tutorials for how to use all the tools in ML4GLand

    Jupyter Notebook

  5. SeqData Public

    Annotated sequence data

    Jupyter Notebook · 10 stars · 1 fork

  6. use_cases Public

    Repository documenting applications of the ML4GLand suite on published datasets

    Jupyter Notebook

Repositories

Showing 10 of 23 repositories
  • SeqPro Public

    Genomic sequence preprocessing toolkit

    Python · 7 stars · MIT license · 1 fork · 1 open issue · 0 open pull requests · Updated Aug 19, 2024
  • SeqData Public

    Annotated sequence data

    Jupyter Notebook · 10 stars · MIT license · 1 fork · 0 open issues · 0 open pull requests · Updated Aug 6, 2024
  • Horlacher_HepG2_CLIP Public

    Predicting protein-RNA interaction via sequence-to-signal learning from CLIP-seq data

    Jupyter Notebook · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jul 20, 2024
  • pbmc3k_10X-Multiome

    Jupyter Notebook · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jun 10, 2024
  • K562_ATAC-seq Public

    ENCODE K562_ATAC-seq dataset

    Jupyter Notebook · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jun 10, 2024
  • K562_STARR-seq

    Jupyter Notebook · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jun 10, 2024
  • HepG2_U2AF2-eCLIP

    Jupyter Notebook · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jun 3, 2024
  • K562_CTCF-ChIP-seq

    Jupyter Notebook · 0 stars · 0 forks · 0 open issues · 0 open pull requests · Updated Jun 3, 2024
  • EUGENe Public

    Elucidating the Utility of Genomic Elements with Neural Nets

    Jupyter Notebook · 61 stars · MIT license · 4 forks · 8 open issues · 0 open pull requests · Updated Apr 16, 2024
  • SeqExplainer Public

    Interpreting sequence-to-function machine learning models

    Jupyter Notebook · 2 stars · MIT license · 1 fork · 0 open issues · 0 open pull requests · Updated Jan 25, 2024

People

This organization has no public members. You must be a member to see who’s a part of this organization.
