diff --git a/CHANGELOG.md b/CHANGELOG.md index ace06b6e..e292e9a8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,17 @@ +## [0.10.2](https://github.com/eonu/sequentia/releases/tag/v0.10.2) + +#### Major changes + +- Add support for dependent feature warping (addresses [#124](https://github.com/eonu/sequentia/pull/124)). ([#135](https://github.com/eonu/sequentia/pull/135)) +- Add multi-processed predictions for `HMMClassifier` (addresses [#121](https://github.com/eonu/sequentia/pull/121)). ([#136](https://github.com/eonu/sequentia/pull/136)) +- Re-order `predict()` and `evaluate()` arguments. ([#138](https://github.com/eonu/sequentia/pull/138)) + +#### Minor changes + +- Add `original_labels` documentation to `KNNClassifier`. ([#133](https://github.com/eonu/sequentia/pull/133)) +- Simplify `GMMHMM` documentation. ([#134](https://github.com/eonu/sequentia/pull/134)) +- Fix posterior comment in `classifier.svg`. ([#137](https://github.com/eonu/sequentia/pull/137)) + ## [0.10.1](https://github.com/eonu/sequentia/releases/tag/v0.10.1) #### Minor changes diff --git a/README.md b/README.md index 935fcb0b..2b096231 100644 --- a/README.md +++ b/README.md @@ -58,12 +58,13 @@ The following algorithms provided within Sequentia support the use of multivaria ### Classification algorithms -- [x] Hidden Markov Models (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))
Learning with the Baum-Welch algorithm [[1]](#references) +- [x] Hidden Markov Models (via [`hmmlearn`](https://github.com/hmmlearn/hmmlearn))
Learning with the Baum-Welch algorithm [[1]](#references) - [x] Gaussian Mixture Model emissions - [x] Linear, left-right and ergodic topologies + - [x] Multi-processed predictions - [x] Dynamic Time Warping k-Nearest Neighbors (via [`dtaidistance`](https://github.com/wannesm/dtaidistance)) - [x] Sakoe–Chiba band global warping constraint - - [x] Feature-independent warping (DTWI) + - [x] Dependent and independent feature warping (DTWD & DTWI) - [x] Custom distance-weighted predictions - [x] Multi-processed predictions diff --git a/docs/_static/classifier.svg b/docs/_static/classifier.svg index 6538744b..a802b033 100644 --- a/docs/_static/classifier.svg +++ b/docs/_static/classifier.svg @@ -1,3 +1,3 @@ -
[Text content of docs/_static/classifier.svg, previous version. Flowchart labels: Class 1, Class 2, ..., Class C; Split training data by class; Train a HMM for each class (Baum-Welch algorithm); Calculate un-normalized posterior (likelihood prior) for a new observation sequence (Forward algorithm); Find the HMM that maximizes the posterior probability; Assign O' to the class c' represented by the model with the highest posterior.]
\ No newline at end of file +
[Text content of docs/_static/classifier.svg, new version. Flowchart labels: Class 1, Class 2, ..., Class C; Split training data by class; Train a HMM for each class (Baum-Welch algorithm); Calculate un-normalized posterior (likelihood prior) for a new observation sequence (Forward algorithm); Find the HMM that maximizes the posterior probability; Assign O' to the class c' represented by the model with the highest posterior.]
\ No newline at end of file diff --git a/docs/sections/classifiers/gmmhmm.rst b/docs/sections/classifiers/gmmhmm.rst index bb8ac95a..2a65f74f 100644 --- a/docs/sections/classifiers/gmmhmm.rst +++ b/docs/sections/classifiers/gmmhmm.rst @@ -70,17 +70,16 @@ Note that even in the case that multiple Gaussian densities are not needed, the can be adjusted so that irrelevant Gaussians are omitted and only a single Gaussian remains. However, the default setting of the :class:`~GMMHMM` class is a single Gaussian. -Then a GMM-HMM is completely determined by the learnable parameters -:math:`\lambda=(\boldsymbol{\pi}, A, B)` where :math:`B=(C,\Pi,\Psi)` and +Then a GMM-HMM is completely determined by +:math:`\lambda=(\boldsymbol{\pi}, A, B)`, where :math:`B` is a collection of +:math:`M` emission distributions (one for each state :math:`m=1,\ldots,M`), which are each +parameterized by a collection of -- :math:`C=\big(c_1^{(m)}, \ldots, c_K^{(m)}\big)_{m=1}^M` is - a collection of the mixture weights, -- :math:`\Pi=\big(\boldsymbol\mu_1^{(m)}, \ldots, \boldsymbol\mu_K^{(m)}\big)_{m=1}^M` is - a collection of the mean vectors, -- :math:`\Psi=\big(\Sigma_1^{(m)}, \ldots, \Sigma_K^{(m)}\big)_{m=1}^M` is - a collection of the covariance matrices, +- mixture weights :math:`c_1^{(m)}, \ldots, c_K^{(m)}`, +- mean vectors :math:`\boldsymbol\mu_1^{(m)}, \ldots, \boldsymbol\mu_K^{(m)}`, +- covariance matrices :math:`\Sigma_1^{(m)}, \ldots, \Sigma_K^{(m)}`, -for every mixture component of each state of the HMM. +for each of the :math:`1,\ldots,K` mixture components of each state. Usually if :math:`K` is large enough, a mixture of :math:`K` Gaussian densities can effectively model any probability density function. 
With large enough :math:`K`, we can also restrict the @@ -89,8 +88,8 @@ and at the same time decrease the number of parameters that need to be updated d The covariance matrix type can be specified by a string parameter ``covariance_type`` in the :class:`~GMMHMM` constructor that takes values `'spherical'`, `'diag'`, `'full'` or `'tied'`. -The various types are explained well in this `StackExchange answer `_, -and summarized in the below image (also courtesy of the same StackExchange answerer). +The various types are explained well `here `_, +and summarized in the below image (also courtesy of the author of the response in the previous link). .. image:: /_static/covariance_types.png :alt: Covariance Types diff --git a/lib/sequentia/__init__.py b/lib/sequentia/__init__.py index e7c389e7..8a295646 100644 --- a/lib/sequentia/__init__.py +++ b/lib/sequentia/__init__.py @@ -1,4 +1,4 @@ -__version__ = '0.10.1' +__version__ = '0.10.2' from .classifiers import * from .preprocessing import * \ No newline at end of file diff --git a/lib/sequentia/classifiers/hmm/hmm_classifier.py b/lib/sequentia/classifiers/hmm/hmm_classifier.py index 8f2ef552..96d1a6c8 100644 --- a/lib/sequentia/classifiers/hmm/hmm_classifier.py +++ b/lib/sequentia/classifiers/hmm/hmm_classifier.py @@ -1,4 +1,6 @@ -import numpy as np, pickle +import tqdm, tqdm.auto, numpy as np, pickle +from joblib import Parallel, delayed +from multiprocessing import cpu_count from .gmmhmm import GMMHMM from sklearn.metrics import confusion_matrix from sklearn.preprocessing import LabelEncoder @@ -43,7 +45,7 @@ def fit(self, models): self._encoder = LabelEncoder() self._encoder.fit([model.label for model in models]) - def predict(self, X, prior='frequency', return_scores=False, original_labels=True): + def predict(self, X, prior='frequency', return_scores=False, original_labels=True, verbose=True, n_jobs=1): """Predicts the label for an observation sequence (or multiple sequences) according to maximum likelihood or 
posterior scores. Parameters @@ -66,6 +68,18 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru original_labels: bool Whether to inverse-transform the labels to their original encoding. + verbose: bool + Whether to display a progress bar or not. + + .. note:: + If both ``verbose=True`` and ``n_jobs > 1``, then the progress bars for each process + are always displayed in the console, regardless of where you are running this function from + (e.g. a Jupyter notebook). + + n_jobs: int > 0 or -1 + | The number of jobs to run in parallel. + | Setting this to -1 will use all available CPU cores. + Returns ------- prediction(s): str/numeric or :class:`numpy:numpy.ndarray` (str/numeric) @@ -92,6 +106,9 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru else: self._val.one_of(prior, ['frequency', 'uniform'], desc='prior') self._val.boolean(return_scores, desc='return_scores') + self._val.boolean(original_labels, desc='original_labels') + self._val.boolean(verbose, desc='verbose') + self._val.restricted_integer(n_jobs, lambda x: x == -1 or x > 0, 'number of jobs', '-1 or greater than zero') # Create look-up for prior probabilities if prior == 'frequency': @@ -105,10 +122,15 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru # Convert single observation sequence to a singleton list X = [X] if isinstance(X, np.ndarray) else X + # Lambda for calculating the log un-normalized posteriors as a sum of the log forward probabilities (likelihoods) and log priors + posteriors = lambda x: np.array([model.forward(x) + np.log(prior[model.label]) for model in self._models]) + # Calculate log un-normalized posteriors as a sum of the log forward probabilities (likelihoods) and log priors # Perform the MAP classification rule and return labels to original encoding if necessary - posteriors = lambda x: np.array([model.forward(x) + np.log(prior[model.label]) for model in self._models]) - scores = 
np.array([posteriors(x) for x in X]) + n_jobs = min(cpu_count() if n_jobs == -1 else n_jobs, len(X)) + X_chunks = [list(chunk) for chunk in np.array_split(np.array(X, dtype=object), n_jobs)] + scores = Parallel(n_jobs=n_jobs)(delayed(self._chunk_predict)(i+1, posteriors, chunk, verbose) for i, chunk in enumerate(X_chunks)) + scores = np.concatenate(scores) best_idxs = np.atleast_1d(scores.argmax(axis=1)) labels = self._encoder.inverse_transform(best_idxs) if original_labels else best_idxs @@ -117,7 +139,7 @@ def predict(self, X, prior='frequency', return_scores=False, original_labels=Tru else: return (labels, scores) if return_scores else labels - def evaluate(self, X, y, prior='frequency'): + def evaluate(self, X, y, prior='frequency', verbose=True, n_jobs=1): """Evaluates the performance of the classifier on a batch of observation sequences and their labels. Parameters @@ -137,6 +159,18 @@ def evaluate(self, X, y, prior='frequency'): Alternatively, class prior probabilities can be specified in an iterable of floats, e.g. `[0.1, 0.3, 0.6]`. + verbose: bool + Whether to display a progress bar or not. + + .. note:: + If both ``verbose=True`` and ``n_jobs > 1``, then the progress bars for each process + are always displayed in the console, regardless of where you are running this function from + (e.g. a Jupyter notebook). + + n_jobs: int > 0 or -1 + | The number of jobs to run in parallel. + | Setting this to -1 will use all available CPU cores. + Returns ------- accuracy: float @@ -146,7 +180,7 @@ def evaluate(self, X, y, prior='frequency'): The confusion matrix representing the discrepancy between predicted and actual labels. 
""" X, y = self._val.observation_sequences_and_labels(X, y) - predictions = self.predict(X, prior=prior, return_scores=False, original_labels=False) + predictions = self.predict(X, prior=prior, return_scores=False, original_labels=False, verbose=verbose, n_jobs=n_jobs) cm = confusion_matrix(self._encoder.transform(y), predictions, labels=self._encoder.transform(self._encoder.classes_)) return np.sum(np.diag(cm)) / np.sum(cm), cm @@ -183,6 +217,13 @@ def load(cls, path): with open(path, 'rb') as file: return pickle.load(file) + def _chunk_predict(self, process, posteriors, chunk, verbose): # Requires fit + """Makes predictions (scores) for a chunk of the observation sequences, for a given subprocess.""" + return np.array([posteriors(x) for x in tqdm.auto.tqdm( + chunk, desc='Classifying examples (process {})'.format(process), + disable=not(verbose), position=process-1 + )]) + @property def models(self): try: diff --git a/lib/sequentia/classifiers/knn/knn_classifier.py b/lib/sequentia/classifiers/knn/knn_classifier.py index cf8c5049..882eace1 100644 --- a/lib/sequentia/classifiers/knn/knn_classifier.py +++ b/lib/sequentia/classifiers/knn/knn_classifier.py @@ -1,7 +1,7 @@ import warnings, tqdm, tqdm.auto, numpy as np, types, pickle, marshal from joblib import Parallel, delayed from multiprocessing import cpu_count -from dtaidistance import dtw +from dtaidistance import dtw, dtw_ndim from sklearn.metrics import confusion_matrix from sklearn.preprocessing import LabelEncoder from ...internals import _Validator @@ -20,7 +20,7 @@ class KNNClassifier: k: int > 0 Number of neighbors. - classes: array-liike of str/numeric + classes: array-like of str/numeric The complete set of possible classes/labels. weighting: 'uniform' or callable @@ -60,6 +60,9 @@ class KNNClassifier: pip install -vvv --upgrade --no-cache-dir --force-reinstall dtaidistance + independent: bool + Whether or not to allow features to be warped independently from each other. 
See `here `_ for a good overview of both approaches. + random_state: numpy.random.RandomState, int, optional A random state object or seed for reproducible randomness. @@ -84,7 +87,7 @@ class KNNClassifier: The complete set of possible classes/labels. """ - def __init__(self, k, classes, weighting='uniform', window=1., use_c=False, random_state=None): + def __init__(self, k, classes, weighting='uniform', window=1., use_c=False, independent=False, random_state=None): self._val = _Validator() self._k = self._val.restricted_integer( k, lambda x: x > 0, desc='number of neighbors', expected='greater than zero') @@ -116,6 +119,9 @@ def __init__(self, k, classes, weighting='uniform', window=1., use_c=False, rand warnings.warn('DTAIDistance C library not available – using Python implementation', ImportWarning) self._use_c = False + self._independent = self._val.boolean(independent, 'independent') + self._dtw = self._dtwi if independent else self._dtwd + def fit(self, X, y): """Fits the classifier by adding labeled training observation sequences. @@ -131,7 +137,7 @@ def fit(self, X, y): self._X, self._y = X, self._encoder.transform(y) self._n_features = X[0].shape[1] - def predict(self, X, verbose=True, original_labels=True, n_jobs=1): + def predict(self, X, original_labels=True, verbose=True, n_jobs=1): """Predicts the label for an observation sequence (or multiple sequences). Parameters @@ -139,6 +145,9 @@ def predict(self, X, verbose=True, original_labels=True, n_jobs=1): X: numpy.ndarray (float) or list of numpy.ndarray (float) An individual observation sequence or a list of multiple observation sequences. + original_labels: bool + Whether to inverse-transform the labels to their original encoding. + verbose: bool Whether to display a progress bar or not. 
@@ -165,6 +174,7 @@ def predict(self, X, verbose=True, original_labels=True, n_jobs=1): raise RuntimeError('The classifier needs to be fitted before predictions are made') X = self._val.observation_sequences(X, allow_single=True) + self._val.boolean(original_labels, desc='original_labels') self._val.boolean(verbose, desc='verbose') self._val.restricted_integer(n_jobs, lambda x: x == -1 or x > 0, 'number of jobs', '-1 or greater than zero') @@ -172,9 +182,9 @@ def predict(self, X, verbose=True, original_labels=True, n_jobs=1): distances = np.array([self._dtw(X, x) for x in tqdm.auto.tqdm(self._X, desc='Calculating distances', disable=not(verbose))]) return self._output(self._find_nearest(distances), original_labels) else: - n_jobs = cpu_count() if n_jobs == -1 else n_jobs + n_jobs = min(cpu_count() if n_jobs == -1 else n_jobs, len(X)) X_chunks = [list(chunk) for chunk in np.array_split(np.array(X, dtype=object), n_jobs)] - labels = Parallel(n_jobs=min(n_jobs, len(X)))(delayed(self._chunk_predict)(i+1, chunk, verbose) for i, chunk in enumerate(X_chunks)) + labels = Parallel(n_jobs=n_jobs)(delayed(self._chunk_predict)(i+1, chunk, verbose) for i, chunk in enumerate(X_chunks)) return self._output(np.concatenate(labels), original_labels) # Flatten the resulting array def evaluate(self, X, y, verbose=True, n_jobs=1): @@ -205,7 +215,7 @@ def evaluate(self, X, y, verbose=True, n_jobs=1): """ X, y = self._val.observation_sequences_and_labels(X, y) self._val.boolean(verbose, desc='verbose') - predictions = self.predict(X, verbose=verbose, original_labels=False, n_jobs=n_jobs) + predictions = self.predict(X, original_labels=False, verbose=verbose, n_jobs=n_jobs) cm = confusion_matrix(self._encoder.transform(y), predictions, labels=self._encoder.transform(self._encoder.classes_)) return np.sum(np.diag(cm)) / np.sum(cm), cm @@ -235,6 +245,7 @@ def save(self, path): 'weighting': marshal.dumps((self._weighting.__code__, self._weighting.__name__)), 'window': self._window, 'use_c': 
self._use_c, + 'independent': self._independent, 'random_state': self._random_state, 'X': self._X, 'y': self._y, @@ -259,7 +270,7 @@ def load(cls, path): data = pickle.load(file) # Check deserialized object dictionary and keys - keys = set(('k', 'classes', 'weighting', 'window', 'use_c', 'random_state', 'X', 'y', 'n_features')) + keys = set(('k', 'classes', 'weighting', 'window', 'use_c', 'independent', 'random_state', 'X', 'y', 'n_features')) if not isinstance(data, dict): raise TypeError('Expected deserialized object to be a dictionary - make sure the object was serialized with the save() function') else: @@ -277,6 +288,7 @@ def load(cls, path): weighting=weighting, window=data['window'], use_c=data['use_c'], + independent=data['independent'], random_state=data['random_state'] ) @@ -290,11 +302,16 @@ def _dtw_1d(self, a, b, window): # Requires fit """Computes the DTW distance between two univariate sequences.""" return dtw.distance(a, b, use_c=self._use_c, window=window) - def _dtw(self, A, B): # Requires fit - """Computes the multivariate DTW distance as the sum of the pairwise per-feature DTW distances.""" + def _dtwi(self, A, B): # Requires fit + """Computes the multivariate DTW distance as the sum of the pairwise per-feature DTW distances, allowing each feature to be warped independently.""" window = max(1, int(self._window * max(len(A), len(B)))) return np.sum([self._dtw_1d(A[:, i], B[:, i], window=window) for i in range(self._n_features)]) + def _dtwd(self, A, B): # Requires fit + """Computes the multivariate DTW distance so that the warping of the features depends on each other, by modifying the local distance measure.""" + window = max(1, int(self._window * max(len(A), len(B)))) + return dtw_ndim.distance(A, B, use_c=self._use_c, window=window) + def _argmax(self, a): """Same as numpy.argmax but returns all occurrences of the maximum, and is O(n) instead of O(2n). 
From: https://stackoverflow.com/a/58652335 @@ -391,6 +408,7 @@ def __repr__(self): ('k', repr(self._k)), ('window', repr(self._window)), ('use_c', repr(self._use_c)), + ('independent', repr(self._independent)), ('classes', repr(list(self._encoder.classes_))) ] try: diff --git a/lib/test/lib/classifiers/knn/test_knn_classifier.py b/lib/test/lib/classifiers/knn/test_knn_classifier.py index 43057888..a1a9c987 100644 --- a/lib/test/lib/classifiers/knn/test_knn_classifier.py +++ b/lib/test/lib/classifiers/knn/test_knn_classifier.py @@ -20,7 +20,8 @@ 'k=1': KNNClassifier(k=1, classes=classes, random_state=rng), 'k=2': KNNClassifier(k=2, classes=classes, random_state=rng), 'k=3': KNNClassifier(k=3, classes=classes, random_state=rng), - 'weighted': KNNClassifier(k=3, classes=classes, weighting=(lambda x: np.exp(-x)), random_state=rng) + 'weighted': KNNClassifier(k=3, classes=classes, weighting=(lambda x: np.exp(-x)), random_state=rng), + 'independent': KNNClassifier(k=1, classes=classes, independent=True, random_state=rng) } for _, clf in clfs.items(): @@ -96,6 +97,18 @@ def test_predict_single_weighted_no_verbose(capsys): assert 'Calculating distances' not in capsys.readouterr().err assert prediction == 'c1' +def test_predict_single_independent_verbose(capsys): + """Verbosely predict a single observation sequence with independent warping""" + prediction = clfs['independent'].predict(x, verbose=True) + assert 'Calculating distances' in capsys.readouterr().err + assert prediction == 'c1' + +def test_predict_single_independent_no_verbose(capsys): + """Silently predict a single observation sequence with independent warping""" + prediction = clfs['independent'].predict(x, verbose=False) + assert 'Calculating distances' not in capsys.readouterr().err + assert prediction == 'c1' + def test_predict_multiple_k1_verbose(capsys): """Verbosely predict multiple observation sequences (k=1)""" predictions = clfs['k=1'].predict(X, verbose=True) @@ -124,25 +137,37 @@ def
test_predict_multiple_k3_verbose(capsys): """Verbosely predict multiple observation sequences (k=3)""" predictions = clfs['k=3'].predict(X, verbose=True) assert 'Classifying examples' in capsys.readouterr().err - assert list(predictions) == ['c1', 'c1', 'c1', 'c0', 'c0', 'c0'] + assert list(predictions) == ['c1', 'c1', 'c1', 'c1', 'c0', 'c1'] def test_predict_multiple_k3_no_verbose(capsys): """Silently predict multiple observation sequences (k=3)""" predictions = clfs['k=3'].predict(X, verbose=False) assert 'Classifying examples' not in capsys.readouterr().err - assert list(predictions) == ['c1', 'c1', 'c1', 'c0', 'c0', 'c0'] + assert list(predictions) == ['c1', 'c1', 'c1', 'c1', 'c0', 'c1'] def test_predict_multiple_weighted_verbose(capsys): """Verbosely predict multiple observation sequences (weighted)""" predictions = clfs['weighted'].predict(X, verbose=True) assert 'Classifying examples' in capsys.readouterr().err - assert list(predictions) == ['c1', 'c1', 'c0', 'c0', 'c0', 'c1'] + assert list(predictions) == ['c1', 'c1', 'c0', 'c1', 'c0', 'c1'] def test_predict_multiple_weighted_no_verbose(capsys): """Silently predict multiple observation sequences (weighted)""" predictions = clfs['weighted'].predict(X, verbose=False) assert 'Classifying examples' not in capsys.readouterr().err - assert list(predictions) == ['c1', 'c1', 'c0', 'c0', 'c0', 'c1'] + assert list(predictions) == ['c1', 'c1', 'c0', 'c1', 'c0', 'c1'] + +def test_predict_multiple_independent_verbose(capsys): + """Verbosely predict multiple observation sequences with independent warping""" + predictions = clfs['independent'].predict(X, verbose=True) + assert 'Classifying examples' in capsys.readouterr().err + assert list(predictions) == ['c1', 'c1', 'c0', 'c1', 'c1', 'c0'] + +def test_predict_multiple_independent_no_verbose(capsys): + """Silently predict multiple observation sequences with independent warping""" + predictions = clfs['independent'].predict(X, verbose=False) + assert 'Classifying 
examples' not in capsys.readouterr().err + assert list(predictions) == ['c1', 'c1', 'c0', 'c1', 'c1', 'c0'] def test_predict_single(): """Predict a single observation sequence and don't return the original labels""" @@ -157,12 +182,12 @@ def test_predict_single_original_labels(): def test_predict_multiple(): """Predict multiple observation sequences and don't return the original labels""" predictions = clfs['k=3'].predict(X, verbose=False, original_labels=False) - assert list(predictions) == [1, 1, 1, 0, 0, 0] + assert list(predictions) == [1, 1, 1, 1, 0, 1] def test_predict_multiple_original_labels(): """Predict multiple observation sequences and return the original labels""" predictions = clfs['k=3'].predict(X, verbose=False, original_labels=True) - assert list(predictions) == ['c1', 'c1', 'c1', 'c0', 'c0', 'c0'] + assert list(predictions) == ['c1', 'c1', 'c1', 'c1', 'c0', 'c1'] # ======================== # # KNNClassifier.evaluate() # @@ -173,8 +198,8 @@ def test_evaluate(): acc, cm = clfs['k=3'].evaluate(X, y) assert acc == 0.5 assert_equal(cm, np.array([ - [1, 1, 0, 0, 0], - [2, 2, 0, 0, 0], + [0, 2, 0, 0, 0], + [1, 3, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0] @@ -249,6 +274,7 @@ def test_load_valid_no_weighting(): assert list(clf._encoder.classes_) == classes assert clf._window == 1. assert clf._use_c == False + assert clf._independent == False assert deepcopy(clf._random_state).normal() == deepcopy(rng).normal() assert_all_equal(clf._X, X) assert_equal(clf._y, clf._encoder.transform(y)) @@ -271,6 +297,7 @@ def test_load_valid_weighting(): assert list(clf._encoder.classes_) == classes assert clf._window == 1. 
assert clf._use_c == False + assert clf._independent == False assert deepcopy(clf._random_state).normal() == deepcopy(rng).normal() assert_all_equal(clf._X, X) assert_equal(clf._y, clf._encoder.transform(y)) diff --git a/notebooks/Pen-Tip Trajectories (Example).ipynb b/notebooks/Pen-Tip Trajectories (Example).ipynb index 5e8aadb7..cf1e023a 100644 --- a/notebooks/Pen-Tip Trajectories (Example).ipynb +++ b/notebooks/Pen-Tip Trajectories (Example).ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 3, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ @@ -56,7 +56,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -93,7 +93,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 3, "metadata": {}, "outputs": [ { @@ -120,7 +120,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -142,7 +142,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -184,7 +184,7 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -218,7 +218,7 @@ }, { "cell_type": "code", - "execution_count": 9, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -263,7 +263,7 @@ }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -293,7 +293,7 @@ }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -323,7 +323,7 @@ }, { "cell_type": "code", - "execution_count": 12, + "execution_count": 10, "metadata": {}, "outputs": [ { @@ -344,7 +344,7 @@ }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 11, "metadata": {}, "outputs": [], "source": [ @@ -383,7 +383,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 12, "metadata": {}, "outputs": [], "source": [ 
@@ -403,13 +403,13 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "0f3bf4349086417dbba1fba33f5fff2a", + "model_id": "3ddd5bf499294bf0bc0963ab4dbb6ece", "version_major": 2, "version_minor": 0 }, @@ -426,7 +426,7 @@ "'w'" ] }, - "execution_count": 15, + "execution_count": 13, "metadata": {}, "output_type": "execute_result" } @@ -438,13 +438,13 @@ }, { "cell_type": "code", - "execution_count": 16, + "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "2be0f98157f84e43af99b6895902c651", + "model_id": "c2f3a92e26114f1fbb4d45058698b747", "version_major": 2, "version_minor": 0 }, @@ -461,8 +461,8 @@ "text": [ "w c d e a e b h s v c y w e v v w v v b o e l c d c p n h p y p m h d a y d b n m m a g o g c n l y\n", "\n", - "CPU times: user 5.16 s, sys: 229 ms, total: 5.39 s\n", - "Wall time: 5.43 s\n" + "CPU times: user 1.68 s, sys: 195 ms, total: 1.88 s\n", + "Wall time: 1.98 s\n" ] } ], @@ -482,7 +482,7 @@ }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 15, "metadata": {}, "outputs": [ { @@ -491,8 +491,8 @@ "text": [ "w c d e a e b h s v c y w e v v w v v b o e l c d c p n h p y p m h d a y d b n m m a g o g c n l y\n", "\n", - "CPU times: user 705 ms, sys: 85.8 ms, total: 791 ms\n", - "Wall time: 5.1 s\n" + "CPU times: user 721 ms, sys: 90.5 ms, total: 811 ms\n", + "Wall time: 3.24 s\n" ] } ], @@ -512,7 +512,7 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 16, "metadata": { "scrolled": true }, @@ -521,8 +521,8 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 576 ms, sys: 20 ms, total: 596 ms\n", - "Wall time: 40.4 s\n" + "CPU times: user 556 ms, sys: 17.3 ms, total: 573 ms\n", + "Wall time: 10.1 s\n" ] } ], @@ -533,12 +533,12 @@ }, { "cell_type": "code", - "execution_count": 19, + 
"execution_count": 17, "metadata": {}, "outputs": [ { "data": { - "image/png": "\n", + "image/png": "\n", "text/plain": [ "
" ] @@ -552,7 +552,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Accuracy: 97.03%\n" + "Accuracy: 97.73%\n" ] } ], @@ -568,7 +568,7 @@ "\n", "While the fast C compiled functions in the [`dtaidistance`](https://github.com/wannesm/dtaidistance) package (along with the multiprocessing capabilities of Sequentia's `KNNClassifier`) help to speed up classification **a lot**, the practical use of $k$-NN becomes more limited as the dataset grows larger. \n", "\n", - "In this case, since our dataset is relatively small, classifying all test examples was completed in $\\approx40s$, which is even faster than the HMM classifier that we show below. " + "In this case, since our dataset is relatively small, classifying all test examples was completed in $\\approx10s$, which is even faster than the HMM classifier that we show below. " ] }, { @@ -599,13 +599,13 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { - "model_id": "6daa1c40652d4c539f9fda73cebb3076", + "model_id": "30052beed780408db9e7b1f1212b6404", "version_major": 2, "version_minor": 0 }, @@ -638,7 +638,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 19, "metadata": {}, "outputs": [], "source": [ @@ -648,26 +648,26 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 3min 20s, sys: 16.2 s, total: 3min 36s\n", - "Wall time: 2min 2s\n" + "CPU times: user 197 ms, sys: 13.5 ms, total: 210 ms\n", + "Wall time: 55.4 s\n" ] } ], "source": [ "%%time\n", - "acc, cm = clf.evaluate(X_test, y_test)" + "acc, cm = clf.evaluate(X_test, y_test, n_jobs=-1)" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 21, "metadata": {}, "outputs": [ { diff --git a/setup.py b/setup.py index aa862eae..063e68bc 100644 --- a/setup.py +++ 
b/setup.py @@ -19,7 +19,7 @@ 'joblib>=0.14,<1' ] -VERSION = '0.10.1' +VERSION = '0.10.2' with open('README.md', 'r') as fh: long_description = fh.read()
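
Note on the new `independent` flag: the difference between `_dtwd` (dependent warping, the new default) and `_dtwi` (independent warping) may be easier to see in a small standalone sketch. The code below is not the `dtaidistance` implementation used by this patch. It is a pure-Python illustration that assumes a squared-difference local cost with a final square root and omits the Sakoe-Chiba window entirely. The function names `dtw_frames`, `dtwd_sketch` and `dtwi_sketch` are hypothetical and exist only for this example.

```python
import math

def dtw_frames(A, B):
    """DTW distance between two sequences of frames (lists of feature values).

    Assumption for this sketch: the local cost is the squared Euclidean
    distance between frames, and the square root is taken at the end.
    No warping window is applied.
    """
    n, m = len(A), len(B)
    inf = float('inf')
    # D[i][j] holds the cheapest accumulated cost of aligning A[:i] with B[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((p - q) ** 2 for p, q in zip(A[i - 1], B[j - 1]))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

def dtwd_sketch(A, B):
    """Dependent warping: one shared warping path over whole frames."""
    return dtw_frames(A, B)

def dtwi_sketch(A, B):
    """Independent warping: a separate univariate DTW per feature, summed."""
    n_features = len(A[0])
    return sum(
        dtw_frames([[frame[i]] for frame in A], [[frame[i]] for frame in B])
        for i in range(n_features)
    )

# Two 2-feature sequences whose peaks occur in opposite order across features
A = [[0, 0], [1, 0], [0, 1], [0, 0]]
B = [[0, 0], [0, 1], [1, 0], [0, 0]]

if __name__ == '__main__':
    print('independent:', dtwi_sketch(A, B))
    print('dependent:  ', dtwd_sketch(A, B))
```

In this toy case each feature can be aligned perfectly on its own, so the independent distance is zero, while no single shared warping path can align both features at once, so the dependent distance is strictly positive. This is one reason the expected predictions in `test_knn_classifier.py` change once dependent warping becomes the default.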