[release] 0.10.0 🎉 #127

Merged 51 commits on Dec 30, 2020.

Commits
- `6653ca7` [patch:lib] Switch out pomegranate HMM backend to hmmlearn (#105) (eonu, Dec 25, 2020)
- `cad2810` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 25, 2020)
- `714b54e` [patch:lib] Re-implement KNNClassifier (#106) (eonu, Dec 27, 2020)
- `e91a834` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 27, 2020)
- `370c867` [rm:pkg] Remove h5py dependency + add Python v3.9 (#107) (eonu, Dec 28, 2020)
- `fa790dc` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `5c3ebd6` [add:docs] Use intersphinx for external documentation links (#108) (eonu, Dec 28, 2020)
- `c8a88d6` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `591d677` [patch:docs] Change List[x] documentation syntax to list of x (#109) (eonu, Dec 28, 2020)
- `8009527` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `c07f922` [patch:docs] Fix minor typos in README.md (#110) (eonu, Dec 28, 2020)
- `6d99f04` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `33293b8` [patch:docs] Add warning to TrimZeros documentation (#111) (eonu, Dec 28, 2020)
- `e72d662` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `46a6974` [patch:lib] Change min-max scale bounds to floats (#112) (eonu, Dec 28, 2020)
- `688ec2e` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `878934b` [patch:lib] Allow array-like of transforms instead of list (#113) (eonu, Dec 28, 2020)
- `f3ede88` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `2383c0b` [patch:docs] Always have a newline after class docstring (#114) (eonu, Dec 28, 2020)
- `ae2b38a` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `c0f544a` [patch:test] Add docstrings to test cases (#115) (eonu, Dec 28, 2020)
- `f4a9114` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 28, 2020)
- `b45929d` [add:lib] Add HMMClassifier serialization/deserialization (#116) (eonu, Dec 29, 2020)
- `cd8916a` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 29, 2020)
- `e422f09` [patch:docs] Change input format notebook (#117) (eonu, Dec 29, 2020)
- `6733a63` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 29, 2020)
- `40f27bb` [patch:lib] Use relative (fractional) Sakoe–Chiba window width (#118) (eonu, Dec 29, 2020)
- `05382a0` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 29, 2020)
- `6625027` [patch:lib] Add __repr__ function to all classifiers (#120) (eonu, Dec 29, 2020)
- `faa8ec7` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 29, 2020)
- `1368c09` [patch:lib] Use feature-independent warping (DTWI) (#123) (eonu, Dec 29, 2020)
- `f1e71bc` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 29, 2020)
- `0027344` [patch:docs] Update copyright and move references (#125) (eonu, Dec 29, 2020)
- `677a213` Merge branch 'dev' of https://github.com/eonu/sequentia into dev (eonu, Dec 29, 2020)
- `3f5cada` Bump version (eonu, Dec 29, 2020)
- `d72b883` Add changelog entry (eonu, Dec 29, 2020)
- `d50320d` [patch:lib] Ensure minimum Sakoe-Chiba band width is 1 (#126) (eonu, Dec 29, 2020)
- `c5234d7` Merge branch 'dev' of https://github.com/eonu/sequentia into release/… (eonu, Dec 29, 2020)
- `5446f35` Add #126 to changelog (eonu, Dec 29, 2020)
- `63fa0ee` Add Cython and numpy to setup_requires (eonu, Dec 30, 2020)
- `b16dc22` Add scipy to setup_requires (eonu, Dec 30, 2020)
- `3262932` SandboxViolation fix (eonu, Dec 30, 2020)
- `626d7bd` Fix setuptools indentation (eonu, Dec 30, 2020)
- `7f3214f` make fix_setuptools work for Python 3 (eonu, Dec 30, 2020)
- `3a307d4` Add Cython to install_requires (eonu, Dec 30, 2020)
- `55921d2` Don't use Cython on RTD (eonu, Dec 30, 2020)
- `d44b9fe` Add Cython to setup_requires (eonu, Dec 30, 2020)
- `1ea55df` Re-add fix_setuptools (eonu, Dec 30, 2020)
- `088059d` Don't use Cython above 0.29 (eonu, Dec 30, 2020)
- `20f2748` Unrestrict Cython (eonu, Dec 30, 2020)
- `ce60f66` Add .readthedocs.yml config file (eonu, Dec 30, 2020)
8 changes: 8 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,8 @@
version: 2
python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt
    - method: pip
      path: .
  system_packages: true
3 changes: 2 additions & 1 deletion .travis.yml
@@ -4,9 +4,10 @@ sudo: false
# Specify Python and versions to test
language: python
python:
- 3.5
- 3.6
- 3.7
- 3.8
- 3.9

# Installation scripts
install:
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,27 @@
## [0.10.0](https://github.com/eonu/sequentia/releases/tag/v0.10.0)

#### Major changes

- Switch out [`pomegranate`](https://github.com/jmschrei/pomegranate) HMM backend to [`hmmlearn`](https://github.com/hmmlearn/hmmlearn). ([#105](https://github.com/eonu/sequentia/pull/105))
- Remove separate HMM and GMM-HMM implementations – only keep a single GMM-HMM implementation (in the `GMMHMM` class) and treat multivariate Gaussian emission HMM as a special case of GMM-HMM. ([#105](https://github.com/eonu/sequentia/pull/105))
- Support string and numeric labels by using label encodings (from [`sklearn.preprocessing.LabelEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)), as sketched after this list. ([#105](https://github.com/eonu/sequentia/pull/105))
- Add support for Python v3.6, v3.7, v3.8, v3.9 and remove support for v3.5. ([#105](https://github.com/eonu/sequentia/pull/105))
- Switch from approximate DTW algorithm ([`fastdtw`](https://github.com/slaypni/fastdtw)) to exact implementation ([`dtaidistance`](https://github.com/wannesm/dtaidistance)) for `KNNClassifier`. ([#106](https://github.com/eonu/sequentia/pull/106))
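
As a rough illustration of the new label handling (this is plain scikit-learn usage, not Sequentia's internal code):

```python
from sklearn.preprocessing import LabelEncoder

# String (or numeric) labels are accepted directly by the classifiers;
# internally they are mapped to integers along these lines.
encoder = LabelEncoder()
y = ['hello', 'world', 'hello']
y_encoded = encoder.fit_transform(y)              # array([0, 1, 0])
y_decoded = encoder.inverse_transform(y_encoded)  # back to ['hello', 'world', 'hello']
```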

#### Minor changes

- Switch to use duck-typing for iterables instead of requiring lists. ([#105](https://github.com/eonu/sequentia/pull/105))
- Rename 'strict left-right' HMM topology to 'linear'. ([#105](https://github.com/eonu/sequentia/pull/105))
- Switch `m2r` to `m2r2`, as `m2r` is no longer maintained. ([#105](https://github.com/eonu/sequentia/pull/105))
- Change `covariance` to `covariance_type`, to match `hmmlearn`. ([#105](https://github.com/eonu/sequentia/pull/105))
- Use `numpy.random.RandomState(seed=None)` as default instead of `numpy.random.RandomState(seed=0)`. ([#105](https://github.com/eonu/sequentia/pull/105))
- Switch `KNNClassifier` serialization from HDF5 to pickling. ([#106](https://github.com/eonu/sequentia/pull/106))
- Use [`intersphinx`](https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) for external documentation links, e.g. to `numpy`. ([#108](https://github.com/eonu/sequentia/pull/108))
- Change `MinMaxScale` bounds to floats. ([#112](https://github.com/eonu/sequentia/pull/112))
- Add `__repr__` function to `GMMHMM`, `HMMClassifier` and `KNNClassifier`. ([#120](https://github.com/eonu/sequentia/pull/120))
- Use feature-independent warping (DTWI). ([#121](https://github.com/eonu/sequentia/pull/121))
- Ensure minimum Sakoe-Chiba band width is 1, as sketched below. ([#126](https://github.com/eonu/sequentia/pull/126))
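
Combining the relative (fractional) Sakoe–Chiba window width from #118 (see the commit list above) with the minimum band width of 1 from #126, the band-width logic can be pictured roughly as follows. This is an illustrative helper with a made-up name and signature, not Sequentia's actual implementation:

```python
def sakoe_chiba_window(len_a: int, len_b: int, frac: float = 0.1) -> int:
    """Band width as a fraction of the longer sequence's length, floored at 1."""
    return max(1, int(frac * max(len_a, len_b)))

sakoe_chiba_window(100, 80, frac=0.05)  # -> 5
sakoe_chiba_window(10, 8, frac=0.05)    # -> 1 (the floor introduced by #126)
```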

## [0.7.2](https://github.com/eonu/sequentia/releases/tag/v0.7.2)

#### Major changes
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2019-2021 Edwin Onuonga <ed@eonu.net>
Copyright (c) 2019-2022 Edwin Onuonga <ed@eonu.net>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
59 changes: 39 additions & 20 deletions README.md
@@ -30,35 +30,46 @@

## Introduction

Sequential data is one of the most commonly observed forms of data. These can range from time series (sequences of observations occurring through time) to non-temporal sequences such as DNA nucleotides. Time series such as audio signals and stock prices are often of particular interest as changing patterns over time naturally provide many interesting opportunities and challenges for machine learning.
Sequential data is a commonly-observed, yet difficult-to-handle form of data. These can range from time series (sequences of observations occurring through time) to non-temporal sequences such as DNA nucleotides.

This library specifically aims to tackle classification problems for isolated sequences by creating an interface to a number of classification algorithms.
Time series data such as audio signals, stock prices and electro-cardiogram signals are often of particular interest to machine learning practitioners and researchers, as changing patterns over time naturally provide many interesting opportunities and challenges for machine learning prediction and statistical inference.

Despite these types of sequences sounding very specific, you probably observe some of them on a regular basis!
**Sequentia is a Python package that specifically aims to tackle classification problems for isolated sequences by providing implementations of a number of classification algorithms**.

**Some examples of classification problems for isolated sequences include classifying**:
Examples of such classification problems include:

- a word utterance by its speech audio signal,
- a hand-written character according to its pen-tip trajectory,
- a hand or head gesture in a video or motion-capture recording.
- classifying a spoken word based on its audio signal (or some other representation such as MFCCs),
- classifying a hand-written character according to its pen-tip trajectory,
- classifying a hand or head gesture in a motion-capture recording,
- classifying the sentiment of a phrase or sentence in natural language.

Compared to the classification of fixed-size inputs (e.g. a vector, or images), sequence classification problems face two major hurdles:

1. the sequences are generally of different duration to each other (see the brief sketch after this list),
2. the observations within a given sequence may have temporal dependencies on observations that occurred earlier within the same sequence, and these dependencies may be arbitrarily long.
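
As a quick illustration of the first point, a toy dataset of three multivariate sequences with differing durations might be represented as follows (the shapes and labels here are made up purely for illustration):

```python
import numpy as np

# Three sequences, each with D = 3 features, but with different durations (T = 40, 55, 62).
X = [np.random.random((T, 3)) for T in (40, 55, 62)]
y = ['on', 'off', 'on']  # labels may be strings or numbers
```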

Sequentia aims to provide interpretable out-of-the-box machine learning algorithms suitable for these tasks, which require minimal configuration.

In recent times, variants of the Recurrent Neural Network (particularly LSTMs and GRUs) have generally proven to be the most successful in modelling long-term dependencies in sequences. However, the design of RNN architectures is highly opinionated and requires a great deal of configuration and engineering, so RNNs are not included in this package.

## Features

Sequentia offers the use of multivariate observation sequences with varying durations using the following methods:
The following algorithms provided within Sequentia support the use of multivariate observation sequences with different durations.

### Classification algorithms

- [x] Hidden Markov Models (via [Pomegranate](https://github.com/jmschrei/pomegranate) [[1]](#references))<br/>Learning with the Baum-Welch algorithm <a href="#references">[2]</a>
- [x] Multivariate Gaussian emissions
- [x] Gaussian Mixture Model emissions (full and diagonal covariances)
- [x] Left-right and ergodic topologies
- [x] Approximate Dynamic Time Warping k-Nearest Neighbors (implemented with [FastDTW](https://github.com/slaypni/fastdtw) [[3]](#references))
- [x] Hidden Markov Models (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))<br/><em>Learning with the Baum-Welch algorithm [[1]](#references)</em>
- [x] Gaussian Mixture Model emissions
- [x] Linear, left-right and ergodic topologies
- [x] Dynamic Time Warping k-Nearest Neighbors (via [`dtaidistance`](https://github.com/wannesm/dtaidistance))
- [x] Sakoe–Chiba band global warping constraint
- [x] Feature-independent warping (DTWI)
- [x] Custom distance-weighted predictions
- [x] Multi-processed predictions

<p align="center">
<img src="https://i.ibb.co/jVD2S4b/classifier.png" width="60%"/><br/>
Example of a classification algorithm: a multi-class HMM isolated sequence classifier
<img src="/docs/_static/classifier.svg" width="60%"/><br/>
Example of a classification algorithm: <em>a multi-class HMM sequence classifier</em>
</p>

### Preprocessing methods
@@ -81,32 +92,40 @@ Documentation for the package is available on [Read The Docs](https://sequentia.

For tutorials and examples on the usage of Sequentia, [look at the notebooks here](https://nbviewer.jupyter.org/github/eonu/sequentia/tree/master/notebooks/).

## Acknowledgments

In earlier versions of the package (<0.10.0), an approximate dynamic time warping algorithm implementation ([`fastdtw`](https://github.com/slaypni/fastdtw)) was used in hopes of speeding up k-NN predictions, as the authors of the original FastDTW paper [[2]](#references) claim that approximate DTW alignments can be computed in linear memory and time, compared to the O(N^2) runtime complexity of the usual exact DTW implementation.

However, I was recently contacted by [Prof. Eamonn Keogh](https://www.cs.ucr.edu/~eamonn/) (at _University of California, Riverside_), whose recent work [[3]](#references) makes the surprising revelation that FastDTW is generally slower than the exact DTW algorithm that it approximates. Upon switching from the `fastdtw` package to [`dtaidistance`](https://github.com/wannesm/dtaidistance) (a very solid implementation of exact DTW with fast pure C compiled functions), DTW k-NN prediction times were indeed reduced drastically.
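
As a rough sketch of the kind of exact DTW computation that `dtaidistance` provides (illustrative only; the `dtw_i` helper below is a made-up name, not Sequentia's internal code):

```python
import numpy as np
from dtaidistance import dtw

x = np.random.random(100)
y = np.random.random(120)

# Exact DTW distance between two univariate sequences,
# constrained to a Sakoe-Chiba band of width 10.
d = dtw.distance(x, y, window=10)

# DTWI (feature-independent warping): warp each feature dimension of
# two (T x D) sequences independently and sum the per-feature distances.
def dtw_i(a, b, window=None):
    return sum(dtw.distance(a[:, i], b[:, i], window=window) for i in range(a.shape[1]))
```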

I would like to thank Prof. Eamonn Keogh for directly reaching out to me regarding this finding!

## References

<table>
<tbody>
<tr>
<td>[1]</td>
<td>
<a href="http://jmlr.org/papers/volume18/17-636/17-636.pdf">Jacob Schreiber. <b>"pomegranate: Fast and Flexible Probabilistic Modeling in Python."</b> Journal of Machine Learning Research 18 (2018), (164):1-6.</a>
<a href="https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf">Lawrence R. Rabiner. <b>"A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"</b> <em>Proceedings of the IEEE 77 (1989)</em>, no. 2, pp. 257-86.</a>
</td>
</tr>
<tr>
<td>[2]</td>
<td>
<a href="https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf">Lawrence R. Rabiner. <b>"A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition"</b> Proceedings of the IEEE 77 (1989), no. 2, pp. 257-86.</a>
<a href="https://pdfs.semanticscholar.org/05a2/0cde15e172fc82f32774dd0cf4fe5827cad2.pdf">Stan Salvador & Philip Chan. <b>"FastDTW: Toward accurate dynamic time warping in linear time and space."</b> <em>Intelligent Data Analysis 11.5 (2007)</em>, 561-580.</a>
</td>
</tr>
<tr>
<td>[3]</td>
<td>
<a href="https://pdfs.semanticscholar.org/05a2/0cde15e172fc82f32774dd0cf4fe5827cad2.pdf">Stan Salvador, and Philip Chan. <b>"FastDTW: Toward accurate dynamic time warping in linear time and space."</b> Intelligent Data Analysis 11.5 (2007), 561-580.</a>
<a href="https://arxiv.org/ftp/arxiv/papers/2003/2003.11246.pdf">Renjie Wu & Eamonn J. Keogh. <b>"FastDTW is approximate and Generally Slower than the Algorithm it Approximates"</b> <em>IEEE Transactions on Knowledge and Data Engineering (2020)</em>, 1–1.</a>
</td>
</tr>
</tbody>
</table>

# Contributors
## Contributors

All contributions to this repository are greatly appreciated. Contribution guidelines can be found [here](/CONTRIBUTING.md).

@@ -130,6 +149,6 @@ All contributions to this repository are greatly appreciated. Contribution guide
---

<p align="center">
<b>Sequentia</b> &copy; 2019-2021, Edwin Onuonga - Released under the <a href="https://opensource.org/licenses/MIT">MIT</a> License.<br/>
<b>Sequentia</b> &copy; 2019-2022, Edwin Onuonga - Released under the <a href="https://opensource.org/licenses/MIT">MIT</a> License.<br/>
<em>Authored and maintained by Edwin Onuonga.</em>
</p>
2 changes: 1 addition & 1 deletion docs/_includes/examples/classifiers/gmmhmm.py
@@ -5,7 +5,7 @@
X = [np.random.random((10 * i, 3)) for i in range(1, 4)]

# Create and fit a left-right HMM with random transitions and initial state distribution
hmm = GMMHMM(label='class1', n_states=5, n_components=3, covariance='diagonal', topology='left-right')
hmm = GMMHMM(label='class1', n_states=10, n_components=3, topology='left-right', covariance_type='diag')
hmm.set_random_initial()
hmm.set_random_transitions()
hmm.fit(X)
11 changes: 0 additions & 11 deletions docs/_includes/examples/classifiers/hmm.py

This file was deleted.

6 changes: 3 additions & 3 deletions docs/_includes/examples/classifiers/hmm_classifier.py
@@ -1,13 +1,13 @@
import numpy as np
from sequentia.classifiers import HMM, HMMClassifier
from sequentia.classifiers import GMMHMM, HMMClassifier

# Set of possible labels
labels = ['class{}'.format(i) for i in range(5)]

# Create and fit some sample HMMs
hmms = []
for i, label in enumerate(labels):
hmm = HMM(label=label, n_states=(i + 3), topology='left-right')
hmm = GMMHMM(label=label, n_states=(i + 3), n_components=2, topology='left-right')
hmm.set_random_initial()
hmm.set_random_transitions()
hmm.fit([np.arange((i + j * 20) * 30).reshape(-1, 3) for j in range(1, 4)])
@@ -21,4 +21,4 @@
clf = HMMClassifier()
clf.fit(hmms)
predictions = clf.predict(X)
accuracy, confusion = clf.evaluate(X, y, labels=labels)
accuracy, confusion = clf.evaluate(X, y)
2 changes: 1 addition & 1 deletion docs/_includes/examples/classifiers/knn_classifier.py
@@ -6,7 +6,7 @@
y = ['class0', 'class1', 'class1']

# Create and fit the classifier
clf = KNNClassifier(k=1, radius=5)
clf = KNNClassifier(k=1, classes=list(set(y)))
clf.fit(X, y)

# Predict labels for the training data (just as an example)
Binary file removed docs/_static/classifier.png
3 changes: 3 additions & 0 deletions docs/_static/classifier.svg
Binary file added docs/_static/covariance_types.png
Binary file added docs/_static/hmm.png