diff --git a/README.md b/README.md
index aed7e59c..a24a4670 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@

- A machine learning interface for isolated temporal sequence classification algorithms in Python.
+ A machine learning interface for isolated sequence classification algorithms in Python.

@@ -30,32 +30,31 @@
 ## Introduction
-Temporal sequences are sequences of observations that occur over time. Changing patterns over time naturally provide many interesting opportunities and challenges for machine learning.
+Sequential data is one of the most commonly observed forms of data. These can range from time series (sequences of observations occurring through time) to non-temporal sequences such as DNA nucleotides. Time series such as audio signals and stock prices are often of particular interest as changing patterns over time naturally provide many interesting opportunities and challenges for machine learning.
-This library specifically aims to tackle classification problems for isolated temporal sequences by creating an interface to a number of classification algorithms.
+This library specifically aims to tackle classification problems for isolated sequences by creating an interface to a number of classification algorithms. Despite these types of sequences sounding very specific, you probably observe some of them on a regular basis!
-**Some examples of classification problems for isolated temporal sequences include classifying**:
+**Some examples of classification problems for isolated sequences include classifying**:
-- word utterances in speech audio signals,
-- hand-written characters according to their pen-tip trajectories,
-- hand or head gestures in a video or motion-capture recording.
+- a word utterance by its speech audio signal,
+- a hand-written character according to its pen-tip trajectory,
+- a hand or head gesture in a video or motion-capture recording.
 ## Features
-Sequentia offers the use of multivariate observation sequences with varying durations in conjunction with the following algorithms and methods:
+Sequentia offers the use of multivariate observation sequences with varying durations using the following methods:
 ### Classification algorithms
-- [x] Hidden Markov Models (via [Pomegranate](https://github.com/jmschrei/pomegranate) [[1]](#references))
+- [x] Hidden Markov Models (via [Pomegranate](https://github.com/jmschrei/pomegranate) [[1]](#references)) Learning with the Baum-Welch algorithm [2]
   - [x] Multivariate Gaussian emissions
   - [x] Gaussian Mixture Model emissions (full and diagonal covariances)
   - [x] Left-right and ergodic topologies
-- [x] Approximate Dynamic Time Warping k-Nearest Neighbors (implemented with [FastDTW](https://github.com/slaypni/fastdtw) [[2]](#references))
+- [x] Approximate Dynamic Time Warping k-Nearest Neighbors (implemented with [FastDTW](https://github.com/slaypni/fastdtw) [[3]](#references))
   - [x] Custom distance-weighted predictions
   - [x] Multi-processed predictions
-- [ ] Long Short-Term Memory Networks (_soon!_)


@@ -94,6 +93,12 @@ For tutorials and examples on the usage of Sequentia, [look at the notebooks her
 [2]
+
+ Lawrence R. Rabiner. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition" Proceedings of the IEEE 77 (1989), no. 2, pp. 257-86.
+
+
+
+ [3]
 Stan Salvador, and Philip Chan. "FastDTW: Toward accurate dynamic time warping in linear time and space." Intelligent Data Analysis 11.5 (2007), 561-580.
diff --git a/docs/index.rst b/docs/index.rst
index 7afd0567..202dbdcc 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -11,7 +11,7 @@
 About
 =====
-Sequentia is a collection of machine learning algorithms for performing the classification of isolated temporal sequences.
+Sequentia is a collection of machine learning algorithms for performing the classification of isolated sequences.
 Each isolated sequence is generally modeled as a section of a longer multivariate time series that represents the entire sequence.
 Naturally, this fits the description of many types of problems such as:
diff --git a/lib/sequentia/classifiers/hmm/gmmhmm.py b/lib/sequentia/classifiers/hmm/gmmhmm.py
index 08844dd5..4c7bc773 100644
--- a/lib/sequentia/classifiers/hmm/gmmhmm.py
+++ b/lib/sequentia/classifiers/hmm/gmmhmm.py
@@ -2,7 +2,7 @@ from .hmm import HMM
 class GMMHMM(HMM):
-    """A hidden Markov model representing an isolated temporal sequence class,
+    """A hidden Markov model representing an isolated sequence class,
     with mixtures of multivariate Gaussian components representing state emission distributions.
     Parameters
diff --git a/lib/sequentia/classifiers/hmm/hmm.py b/lib/sequentia/classifiers/hmm/hmm.py
index 1a4071f1..ec9586c1 100644
--- a/lib/sequentia/classifiers/hmm/hmm.py
+++ b/lib/sequentia/classifiers/hmm/hmm.py
@@ -5,7 +5,7 @@ from ...internals import _Validator
 class HMM:
-    """A hidden Markov model representing an isolated temporal sequence class.
+    """A hidden Markov model representing an isolated sequence class.
     Parameters
     ----------
diff --git a/notebooks/Pen-Tip Trajectories (Example).ipynb b/notebooks/Pen-Tip Trajectories (Example).ipynb
index 89d9da10..743b6fe0 100644
--- a/notebooks/Pen-Tip Trajectories (Example).ipynb
+++ b/notebooks/Pen-Tip Trajectories (Example).ipynb
@@ -371,7 +371,7 @@
     "\n",
     "The $k$-Nearest Neighbor ($k$-NN) classifier is a conceptually simple machine learning algorithm that is also easy to implement. As a result, it is often used as a baseline, despite often being able to perform much better than more complex algorithms.\n",
     "\n",
-    "However, applying $k$-NN to isolated temporal observation sequences is not so straightforward since different observation sequences may have different durations, making it difficult to come up with a distance measure that can be used to compare the two sequences. \n",
+    "However, applying $k$-NN to isolated observation sequences is not so straightforward since different observation sequences may have different durations, making it difficult to come up with a distance measure that can be used to compare the two sequences. \n",
     "\n",
     "One such appropriate distance measure is [Dynamic Time Warping](https://en.wikipedia.org/wiki/Dynamic_time_warping). However, due to the non-parametric nature of $k$-NN, it may take very long to predict new observation sequences. In an effort to reduce this wait, Sequentia uses the [FastDTW](https://github.com/slaypni/fastdtw) implementation of the Dynamic Time Warping algorithm, which allows for faster, configurable approximatations to the DTW distance calculations which can save memory and time. \n",
     "\n",
diff --git a/setup.py b/setup.py
index 4047993b..fd50677e 100644
--- a/setup.py
+++ b/setup.py
@@ -13,7 +13,7 @@
     version = VERSION,
     author = 'Edwin Onuonga',
     author_email = 'ed@eonu.net',
-    description = 'A machine learning interface for isolated temporal sequence classification algorithms in Python.',
+    description = 'A machine learning interface for isolated sequence classification algorithms in Python.',
     long_description = long_description,
     long_description_content_type = 'text/markdown',
     url = 'https://github.com/eonu/sequentia',