Merge branch 'master' of https://github.com/eonu/sequentia into master
eonu committed Jan 7, 2021
2 parents 2e23549 + da74076 commit 403f0e6
Showing 10 changed files with 178 additions and 78 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,17 @@
## [0.10.2](https://github.com/eonu/sequentia/releases/tag/v0.10.2)

#### Major changes

- Add support for dependent feature warping (addresses [#124](https://github.com/eonu/sequentia/pull/124)). ([#135](https://github.com/eonu/sequentia/pull/135))
- Add multi-processed predictions for `HMMClassifier` (addresses [#121](https://github.com/eonu/sequentia/pull/121)). ([#136](https://github.com/eonu/sequentia/pull/136))
- Re-order `predict()` and `evaluate()` arguments. ([#138](https://github.com/eonu/sequentia/pull/138))

#### Minor changes

- Add `original_labels` documentation to `KNNClassifier`. ([#133](https://github.com/eonu/sequentia/pull/133))
- Simplify `GMMHMM` documentation. ([#134](https://github.com/eonu/sequentia/pull/134))
- Fix posterior comment in `classifier.svg`. ([#137](https://github.com/eonu/sequentia/pull/137))

## [0.10.1](https://github.com/eonu/sequentia/releases/tag/v0.10.1)

#### Minor changes
5 changes: 3 additions & 2 deletions README.md
@@ -58,12 +58,13 @@ The following algorithms provided within Sequentia support the use of multivaria

### Classification algorithms

- [x] Hidden Markov Models (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))<br/><em>Learning with the Baum-Welch algorithm [[1]](#references)</em>
- [x] Hidden Markov Models (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))<br/><em>Learning with the Baum-Welch algorithm</em> [[1]](#references)
- [x] Gaussian Mixture Model emissions
- [x] Linear, left-right and ergodic topologies
- [x] Multi-processed predictions
- [x] Dynamic Time Warping k-Nearest Neighbors (via [`dtaidistance`](https://github.com/wannesm/dtaidistance))
- [x] Sakoe–Chiba band global warping constraint
- [x] Feature-independent warping (DTWI)
- [x] Dependent and independent feature warping (DTWD & DTWI)
- [x] Custom distance-weighted predictions
- [x] Multi-processed predictions
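
As a quick illustration of the two features highlighted above (dependent feature warping and multi-processed predictions), here is a minimal, hypothetical usage sketch. The toy data and labels are invented; the `KNNClassifier` constructor and `predict()` keyword arguments follow the diffs to `knn_classifier.py` below, and the import assumes the classifiers are re-exported at the package root as `lib/sequentia/__init__.py` suggests.

```python
import numpy as np
from sequentia import KNNClassifier  # assumed top-level export

# Toy dataset: 10 observation sequences, each a (length, n_features) array
X = [np.random.rand(np.random.randint(20, 30), 3) for _ in range(10)]
y = ['a', 'b'] * 5

# independent=False selects dependent feature warping (DTWD);
# independent=True selects feature-independent warping (DTWI)
clf = KNNClassifier(k=1, classes=['a', 'b'], independent=False)
clf.fit(X, y)

# n_jobs=-1 spreads the predictions over all available CPU cores
predictions = clf.predict(X, n_jobs=-1, verbose=False)
```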

2 changes: 1 addition & 1 deletion docs/_static/classifier.svg
(Diff not rendered for this SVG file.)
21 changes: 10 additions & 11 deletions docs/sections/classifiers/gmmhmm.rst
@@ -70,17 +70,16 @@ Note that even in the case that multiple Gaussian densities are not needed, the
can be adjusted so that irrelevant Gaussians are omitted and only a single Gaussian remains.
However, the default setting of the :class:`~GMMHMM` class is a single Gaussian.

Then a GMM-HMM is completely determined by the learnable parameters
:math:`\lambda=(\boldsymbol{\pi}, A, B)` where :math:`B=(C,\Pi,\Psi)` and
Then a GMM-HMM is completely determined by
:math:`\lambda=(\boldsymbol{\pi}, A, B)`, where :math:`B` is a collection of
:math:`M` emission distributions (one for each state :math:`m=1,\ldots,M`), which are each
parameterized by a collection of

- :math:`C=\big(c_1^{(m)}, \ldots, c_K^{(m)}\big)_{m=1}^M` is
a collection of the mixture weights,
- :math:`\Pi=\big(\boldsymbol\mu_1^{(m)}, \ldots, \boldsymbol\mu_K^{(m)}\big)_{m=1}^M` is
a collection of the mean vectors,
- :math:`\Psi=\big(\Sigma_1^{(m)}, \ldots, \Sigma_K^{(m)}\big)_{m=1}^M` is
a collection of the covariance matrices,
- mixture weights :math:`c_1^{(m)}, \ldots, c_K^{(m)}`,
- mean vectors :math:`\boldsymbol\mu_1^{(m)}, \ldots, \boldsymbol\mu_K^{(m)}`,
- covariance matrices :math:`\Sigma_1^{(m)}, \ldots, \Sigma_K^{(m)}`,

for every mixture component of each state of the HMM.
for each of the :math:`1,\ldots,K` mixture components of each state.
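
For reference (an aside, not part of this diff), the emission density these parameters define for state :math:`m` is the usual Gaussian mixture:

.. math::

    b_m(\mathbf{o}) = \sum_{k=1}^{K} c_k^{(m)} \, \mathcal{N}\big(\mathbf{o} \,;\, \boldsymbol\mu_k^{(m)}, \Sigma_k^{(m)}\big),
    \qquad \sum_{k=1}^{K} c_k^{(m)} = 1.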

Usually if :math:`K` is large enough, a mixture of :math:`K` Gaussian densities can effectively
model any probability density function. With large enough :math:`K`, we can also restrict the
@@ -89,8 +88,8 @@ and at the same time decrease the number of parameters that need to be updated d

The covariance matrix type can be specified by a string parameter ``covariance_type`` in the
:class:`~GMMHMM` constructor that takes values `'spherical'`, `'diag'`, `'full'` or `'tied'`.
The various types are explained well in this `StackExchange answer <https://stats.stackexchange.com/a/326678>`_,
and summarized in the below image (also courtesy of the same StackExchange answerer).
The various types are explained well `here <https://stats.stackexchange.com/a/326678>`_,
and summarized in the below image (also courtesy of the author of the response in the previous link).

.. image:: /_static/covariance_types.png
:alt: Covariance Types
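
As a brief aside on what the four settings conventionally mean for a component covariance matrix :math:`\Sigma_k^{(m)}` (this is the usual GMM reading; the exact per-state behaviour is determined by hmmlearn, so treat it as a summary rather than a definitive statement):

- ``'spherical'``: :math:`\Sigma = \sigma^2 I`, a single variance per component
- ``'diag'``: :math:`\Sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_D^2)`, one variance per feature
- ``'full'``: an unrestricted symmetric positive-definite matrix
- ``'tied'``: one full covariance matrix shared by all mixture components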
2 changes: 1 addition & 1 deletion lib/sequentia/__init__.py
@@ -1,4 +1,4 @@
__version__ = '0.10.1'
__version__ = '0.10.2'

from .classifiers import *
from .preprocessing import *
53 changes: 47 additions & 6 deletions lib/sequentia/classifiers/hmm/hmm_classifier.py
@@ -1,4 +1,6 @@
import numpy as np, pickle
import tqdm, tqdm.auto, numpy as np, pickle
from joblib import Parallel, delayed
from multiprocessing import cpu_count
from .gmmhmm import GMMHMM
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
@@ -43,7 +45,7 @@ def fit(self, models):
self._encoder = LabelEncoder()
self._encoder.fit([model.label for model in models])

def predict(self, X, prior='frequency', return_scores=False, original_labels=True):
def predict(self, X, prior='frequency', return_scores=False, original_labels=True, verbose=True, n_jobs=1):
"""Predicts the label for an observation sequence (or multiple sequences) according to maximum likelihood or posterior scores.
Parameters
@@ -66,6 +68,18 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru
original_labels: bool
Whether to inverse-transform the labels to their original encoding.
verbose: bool
Whether to display a progress bar or not.
.. note::
If both ``verbose=True`` and ``n_jobs > 1``, then the progress bars for each process
are always displayed in the console, regardless of where you are running this function from
(e.g. a Jupyter notebook).
n_jobs: int > 0 or -1
| The number of jobs to run in parallel.
| Setting this to -1 will use all available CPU cores.
Returns
-------
prediction(s): str/numeric or :class:`numpy:numpy.ndarray` (str/numeric)
@@ -92,6 +106,9 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru
else:
self._val.one_of(prior, ['frequency', 'uniform'], desc='prior')
self._val.boolean(return_scores, desc='return_scores')
self._val.boolean(original_labels, desc='original_labels')
self._val.boolean(verbose, desc='verbose')
self._val.restricted_integer(n_jobs, lambda x: x == -1 or x > 0, 'number of jobs', '-1 or greater than zero')

# Create look-up for prior probabilities
if prior == 'frequency':
@@ -105,10 +122,15 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru
# Convert single observation sequence to a singleton list
X = [X] if isinstance(X, np.ndarray) else X

# Lambda for calculating the log un-normalized posteriors as a sum of the log forward probabilities (likelihoods) and log priors
posteriors = lambda x: np.array([model.forward(x) + np.log(prior[model.label]) for model in self._models])

# Calculate log un-normalized posteriors as a sum of the log forward probabilities (likelihoods) and log priors
# Perform the MAP classification rule and return labels to original encoding if necessary
posteriors = lambda x: np.array([model.forward(x) + np.log(prior[model.label]) for model in self._models])
scores = np.array([posteriors(x) for x in X])
n_jobs = min(cpu_count() if n_jobs == -1 else n_jobs, len(X))
X_chunks = [list(chunk) for chunk in np.array_split(np.array(X, dtype=object), n_jobs)]
scores = Parallel(n_jobs=n_jobs)(delayed(self._chunk_predict)(i+1, posteriors, chunk, verbose) for i, chunk in enumerate(X_chunks))
scores = np.concatenate(scores)
best_idxs = np.atleast_1d(scores.argmax(axis=1))
labels = self._encoder.inverse_transform(best_idxs) if original_labels else best_idxs
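
The new parallel path above splits the batch into one chunk per job, scores each chunk in a separate process, then stitches the per-chunk score arrays back together; each score is the log forward likelihood plus the log class prior, and the row-wise arg max gives the MAP label. A minimal standalone sketch of the same chunking pattern (the scoring function is a stand-in, not Sequentia's):

```python
import numpy as np
from joblib import Parallel, delayed
from multiprocessing import cpu_count

def score_chunk(chunk):
    # Stand-in for _chunk_predict: produce one score per sequence in the chunk
    return np.array([seq.sum() for seq in chunk])

# Toy batch of variable-length observation sequences
X = [np.random.rand(np.random.randint(10, 20), 3) for _ in range(7)]

n_jobs = -1
n_jobs = min(cpu_count() if n_jobs == -1 else n_jobs, len(X))  # never more jobs than sequences
X_chunks = [list(chunk) for chunk in np.array_split(np.array(X, dtype=object), n_jobs)]
scores = Parallel(n_jobs=n_jobs)(delayed(score_chunk)(chunk) for chunk in X_chunks)
scores = np.concatenate(scores)  # one score per sequence, original order preserved
```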

@@ -117,7 +139,7 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru
else:
return (labels, scores) if return_scores else labels

def evaluate(self, X, y, prior='frequency'):
def evaluate(self, X, y, prior='frequency', verbose=True, n_jobs=1):
"""Evaluates the performance of the classifier on a batch of observation sequences and their labels.
Parameters
Expand All @@ -137,6 +159,18 @@ def evaluate(self, X, y, prior='frequency'):
Alternatively, class prior probabilities can be specified in an iterable of floats, e.g. `[0.1, 0.3, 0.6]`.
verbose: bool
Whether to display a progress bar or not.
.. note::
If both ``verbose=True`` and ``n_jobs > 1``, then the progress bars for each process
are always displayed in the console, regardless of where you are running this function from
(e.g. a Jupyter notebook).
n_jobs: int > 0 or -1
| The number of jobs to run in parallel.
| Setting this to -1 will use all available CPU cores.
Returns
-------
accuracy: float
@@ -146,7 +180,7 @@ def evaluate(self, X, y, prior='frequency'):
The confusion matrix representing the discrepancy between predicted and actual labels.
"""
X, y = self._val.observation_sequences_and_labels(X, y)
predictions = self.predict(X, prior=prior, return_scores=False, original_labels=False)
predictions = self.predict(X, prior=prior, return_scores=False, original_labels=False, verbose=verbose, n_jobs=n_jobs)
cm = confusion_matrix(self._encoder.transform(y), predictions, labels=self._encoder.transform(self._encoder.classes_))
return np.sum(np.diag(cm)) / np.sum(cm), cm

@@ -183,6 +217,13 @@ def load(cls, path):
with open(path, 'rb') as file:
return pickle.load(file)

def _chunk_predict(self, process, posteriors, chunk, verbose): # Requires fit
"""Makes predictions (scores) for a chunk of the observation sequences, for a given subprocess."""
return np.array([posteriors(x) for x in tqdm.auto.tqdm(
chunk, desc='Classifying examples (process {})'.format(process),
disable=not(verbose), position=process-1
)])

@property
def models(self):
try:
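
Taken together, a hypothetical end-to-end use of the new `predict()`/`evaluate()` signatures might look like the sketch below. Only the `HMMClassifier` calls, the `label` attribute and `forward()` method mirror the diff above; the `GMMHMM` constructor and initialisation calls, the data and the import path are assumptions made for illustration.

```python
import numpy as np
from sequentia import GMMHMM, HMMClassifier  # assumed top-level exports

# Hypothetical per-class training data: each sequence is a (length, n_features) array
data = {'a': [np.random.rand(25, 3) for _ in range(5)],
        'b': [np.random.rand(25, 3) for _ in range(5)]}

# One GMMHMM per class (constructor/initialisation arguments are assumptions)
models = []
for label, sequences in data.items():
    model = GMMHMM(label=label, n_states=3)
    model.set_random_initial()
    model.set_random_transitions()
    model.fit(sequences)
    models.append(model)

clf = HMMClassifier()
clf.fit(models)

X_test = [np.random.rand(25, 3) for _ in range(4)]
y_test = ['a', 'b', 'a', 'b']

# New keyword arguments from this commit: verbose toggles the per-process progress
# bars, n_jobs sets the number of parallel processes (-1 = all available CPU cores)
predictions = clf.predict(X_test, prior='frequency', verbose=False, n_jobs=-1)
accuracy, confusion = clf.evaluate(X_test, y_test, prior='frequency', verbose=False, n_jobs=-1)
```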
38 changes: 28 additions & 10 deletions lib/sequentia/classifiers/knn/knn_classifier.py
@@ -1,7 +1,7 @@
import warnings, tqdm, tqdm.auto, numpy as np, types, pickle, marshal
from joblib import Parallel, delayed
from multiprocessing import cpu_count
from dtaidistance import dtw
from dtaidistance import dtw, dtw_ndim
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
from ...internals import _Validator
@@ -20,7 +20,7 @@ class KNNClassifier:
k: int > 0
Number of neighbors.
classes: array-liike of str/numeric
classes: array-like of str/numeric
The complete set of possible classes/labels.
weighting: 'uniform' or callable
@@ -60,6 +60,9 @@ class KNNClassifier:
pip install -vvv --upgrade --no-cache-dir --force-reinstall dtaidistance
independent: bool
Whether or not to allow features to be warped independently from each other. See `here <https://www.cs.ucr.edu/~eamonn/Multi-Dimensional_DTW_Journal.pdf>`_ for a good overview of both approaches.
random_state: numpy.random.RandomState, int, optional
A random state object or seed for reproducible randomness.
@@ -84,7 +87,7 @@ class KNNClassifier:
The complete set of possible classes/labels.
"""

def __init__(self, k, classes, weighting='uniform', window=1., use_c=False, random_state=None):
def __init__(self, k, classes, weighting='uniform', window=1., use_c=False, independent=False, random_state=None):
self._val = _Validator()
self._k = self._val.restricted_integer(
k, lambda x: x > 0, desc='number of neighbors', expected='greater than zero')
@@ -116,6 +119,9 @@ def __init__(self, k, classes, weighting='uniform', window=1., use_c=False, rand
warnings.warn('DTAIDistance C library not available – using Python implementation', ImportWarning)
self._use_c = False

self._independent = self._val.boolean(independent, 'independent')
self._dtw = self._dtwi if independent else self._dtwd

def fit(self, X, y):
"""Fits the classifier by adding labeled training observation sequences.
@@ -131,14 +137,17 @@ def fit(self, X, y):
self._X, self._y = X, self._encoder.transform(y)
self._n_features = X[0].shape[1]

def predict(self, X, verbose=True, original_labels=True, n_jobs=1):
def predict(self, X, original_labels=True, verbose=True, n_jobs=1):
"""Predicts the label for an observation sequence (or multiple sequences).
Parameters
----------
X: numpy.ndarray (float) or list of numpy.ndarray (float)
An individual observation sequence or a list of multiple observation sequences.
original_labels: bool
Whether to inverse-transform the labels to their original encoding.
verbose: bool
Whether to display a progress bar or not.
@@ -165,16 +174,17 @@ def predict(self, X, verbose=True, original_labels=True, n_jobs=1):
raise RuntimeError('The classifier needs to be fitted before predictions are made')

X = self._val.observation_sequences(X, allow_single=True)
self._val.boolean(original_labels, desc='original_labels')
self._val.boolean(verbose, desc='verbose')
self._val.restricted_integer(n_jobs, lambda x: x == -1 or x > 0, 'number of jobs', '-1 or greater than zero')

if isinstance(X, np.ndarray):
distances = np.array([self._dtw(X, x) for x in tqdm.auto.tqdm(self._X, desc='Calculating distances', disable=not(verbose))])
return self._output(self._find_nearest(distances), original_labels)
else:
n_jobs = cpu_count() if n_jobs == -1 else n_jobs
n_jobs = min(cpu_count() if n_jobs == -1 else n_jobs, len(X))
X_chunks = [list(chunk) for chunk in np.array_split(np.array(X, dtype=object), n_jobs)]
labels = Parallel(n_jobs=min(n_jobs, len(X)))(delayed(self._chunk_predict)(i+1, chunk, verbose) for i, chunk in enumerate(X_chunks))
labels = Parallel(n_jobs=n_jobs)(delayed(self._chunk_predict)(i+1, chunk, verbose) for i, chunk in enumerate(X_chunks))
return self._output(np.concatenate(labels), original_labels) # Flatten the resulting array

def evaluate(self, X, y, verbose=True, n_jobs=1):
@@ -205,7 +215,7 @@ def evaluate(self, X, y, verbose=True, n_jobs=1):
"""
X, y = self._val.observation_sequences_and_labels(X, y)
self._val.boolean(verbose, desc='verbose')
predictions = self.predict(X, verbose=verbose, original_labels=False, n_jobs=n_jobs)
predictions = self.predict(X, original_labels=False, verbose=verbose, n_jobs=n_jobs)
cm = confusion_matrix(self._encoder.transform(y), predictions, labels=self._encoder.transform(self._encoder.classes_))
return np.sum(np.diag(cm)) / np.sum(cm), cm

@@ -235,6 +245,7 @@ def save(self, path):
'weighting': marshal.dumps((self._weighting.__code__, self._weighting.__name__)),
'window': self._window,
'use_c': self._use_c,
'independent': self._independent,
'random_state': self._random_state,
'X': self._X,
'y': self._y,
@@ -259,7 +270,7 @@ def load(cls, path):
data = pickle.load(file)

# Check deserialized object dictionary and keys
keys = set(('k', 'classes', 'weighting', 'window', 'use_c', 'random_state', 'X', 'y', 'n_features'))
keys = set(('k', 'classes', 'weighting', 'window', 'use_c', 'independent', 'random_state', 'X', 'y', 'n_features'))
if not isinstance(data, dict):
raise TypeError('Expected deserialized object to be a dictionary - make sure the object was serialized with the save() function')
else:
@@ -277,6 +288,7 @@ def load(cls, path):
weighting=weighting,
window=data['window'],
use_c=data['use_c'],
independent=data['independent'],
random_state=data['random_state']
)
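
One detail worth noting in `save()`/`load()`: because the distance-weighting callable cannot be pickled reliably, its code object is serialized with `marshal`. A self-contained sketch of that round trip follows; the reconstruction side is an assumption made for illustration, since the corresponding line of `load()` is not shown in this hunk.

```python
import marshal, types

def weighting(distances):
    # Example distance-weighting function
    return 1.0 / (1.0 + distances)

# Serialize the function's code object together with its name, as save() does
payload = marshal.dumps((weighting.__code__, weighting.__name__))

# Reconstruct a callable from the payload (assumed to mirror what load() does)
code, name = marshal.loads(payload)
restored = types.FunctionType(code, globals(), name)

print(restored(3.0))  # 0.25
```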

@@ -290,11 +302,16 @@ def _dtw_1d(self, a, b, window): # Requires fit
"""Computes the DTW distance between two univariate sequences."""
return dtw.distance(a, b, use_c=self._use_c, window=window)

def _dtw(self, A, B): # Requires fit
"""Computes the multivariate DTW distance as the sum of the pairwise per-feature DTW distances."""
def _dtwi(self, A, B): # Requires fit
"""Computes the multivariate DTW distance as the sum of the pairwise per-feature DTW distances, allowing each feature to be warped independently."""
window = max(1, int(self._window * max(len(A), len(B))))
return np.sum([self._dtw_1d(A[:, i], B[:, i], window=window) for i in range(self._n_features)])

def _dtwd(self, A, B): # Requires fit
"""Computes the multivariate DTW distance so that the warping of the features depends on each other, by modifying the local distance measure."""
window = max(1, int(self._window * max(len(A), len(B))))
return dtw_ndim.distance(A, B, use_c=self._use_c, window=window)
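
In dtaidistance terms, the difference between the two warping modes comes down to the two calls above; a minimal standalone comparison on toy data (the window fraction of 0.5 is arbitrary, mirroring how the methods above derive the Sakoe–Chiba window from the longer sequence):

```python
import numpy as np
from dtaidistance import dtw, dtw_ndim

A = np.random.rand(50, 3)  # two multivariate sequences with 3 features each
B = np.random.rand(60, 3)

# Sakoe–Chiba window as a fraction of the longer sequence, as in the methods above
window = max(1, int(0.5 * max(len(A), len(B))))

# DTWI: each feature is warped independently and the per-feature distances are summed
dtwi = sum(dtw.distance(A[:, i], B[:, i], window=window) for i in range(A.shape[1]))

# DTWD: all features are warped together under a single multivariate local distance
dtwd = dtw_ndim.distance(A, B, window=window)

print(dtwi, dtwd)
```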

def _argmax(self, a):
"""Same as numpy.argmax but returns all occurrences of the maximum, and is O(n) instead of O(2n).
From: https://stackoverflow.com/a/58652335
@@ -391,6 +408,7 @@ def __repr__(self):
('k', repr(self._k)),
('window', repr(self._window)),
('use_c', repr(self._use_c)),
('independent', repr(self._independent)),
('classes', repr(list(self._encoder.classes_)))
]
try: