Nico-Curti/PhDthesis

Author: N. Curti
Project: PhDthesis
Documentation: docs
Build Status: Linux: travisCI; Windows: miss
Supervisor: Prof. D. Remondini
Co-Supervisor: Prof. G. Castellani, Prof. A. Bazzani

Implementation and optimization of algorithms in Biomedical Big Data Analytics

Abstract

Big Data Analytics poses many challenges to the research community, which has to handle several computational problems related to the vast amount of data. There is increasing interest in biomedical data, with the aim of achieving so-called "personalized medicine", where therapy plans are designed around the specific genotype and phenotype of the individual patient; algorithm optimization plays a key role in this goal. In this work we discuss several topics related to Biomedical Big Data Analytics, with special attention to the numerical issues and algorithmic solutions related to them. We introduce a novel feature selection algorithm tailored to omics datasets and demonstrate its efficiency on synthetic and real high-throughput genomic datasets. The proposed algorithm is a supervised signature identification method based on a bottom-up combinatorial approach that exploits the discriminant power of all variable pairs. We tested our algorithm against other state-of-the-art models, and it matches or outperforms their results.
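
To give a rough idea of what "exploiting the discriminant power of all variable pairs" can look like in practice, the sketch below scores every pair of variables on a toy dataset. It is only an illustration under stated assumptions, not the algorithm developed in the thesis: the function name `score_variable_pairs`, the nearest-centroid classifier, the use of training accuracy as the score, and the synthetic data are all hypothetical choices made for this example.

```python
from itertools import combinations

import numpy as np


def score_variable_pairs(X, y):
    """Rank all variable pairs by how well they separate the classes.

    Hypothetical helper (not from the thesis): each pair is scored by the
    training accuracy of a nearest-centroid classifier restricted to the
    two variables of the pair.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    scores = []
    for i, j in combinations(range(X.shape[1]), 2):
        pair = X[:, [i, j]]
        # One centroid per class in the 2D subspace spanned by the pair
        centroids = np.stack([pair[y == c].mean(axis=0) for c in classes])
        # Distance of every sample to every centroid: (n_samples, n_classes)
        dist = np.linalg.norm(pair[:, None, :] - centroids[None, :, :], axis=2)
        predicted = classes[dist.argmin(axis=1)]
        scores.append(((i, j), float((predicted == y).mean())))
    # Best-scoring pairs first
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Synthetic data (an assumption for the example): 40 samples, 5 variables
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 5))
# Variables 0 and 1 carry the class signal; the others are pure noise
X[:, 0] += 4.0 * y
X[:, 1] -= 4.0 * y

ranking = score_variable_pairs(X, y)
best_pair, best_score = ranking[0]
```

In a bottom-up approach, a ranking of this kind would be the starting point from which top-scoring pairs are combined into a larger signature; how that combination is done is described in the thesis itself and is not reproduced here.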

We also implement and optimize different types of deep learning models, testing their efficiency on biomedical image processing tasks. Three customized frameworks for the development of deep learning neural network models are discussed and used to describe the numerical improvements proposed on the various topics. In the first application we optimize two Super Resolution models and show their results on NMR images, demonstrating their efficiency in generalization tasks without retraining. The second optimization involves a state-of-the-art Object Detection neural network architecture, for which we obtain a significant speedup in computational performance. We also highlight how Super Resolution models can overcome object detection issues and increase detection performance. In the third application we discuss the femur head segmentation problem on CT images: a semi-automated pipeline for image annotation is proposed, and a deep learning neural network model is trained on the annotated images.

The last section of this work presents the implementation of a novel biomedical database obtained by harmonizing multiple data sources that provide network-like relationships between biomedical entities. The data involved in this project, related to diseases, symptoms and other related biological entities, were mined using web-scraping methods, and a novel natural language processing pipeline was designed to maximize the overlap between the different data sources. We describe the key steps that led us to this network-of-networks database and discuss its potential applications to biomedical research.

Installation

To compile the project, run the make command with the provided Makefile. All the figures in the img directory will be converted to pdf_tex format and the full PDF document will be generated.

The online version of the thesis is available on GitBook or directly on GitHub.

License

The Implementation and optimization of algorithms in Biomedical Big Data Analytics document is licensed under the MIT "Expat" License.

Acknowledgment

Thanks go to all contributors to this project.

Citation

Please cite Implementation and optimization of algorithms in Biomedical Big Data Analytics if you use it in your research.

@misc{PhDtheis,
  author = {Nico Curti},
  title = {Implementation and optimization of algorithms in Biomedical Big Data Analytics},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://nico-curti2.gitbook.io/phd-thesis/}},
}