Skip to content

Commit

Permalink
update scikit-learn to v. 1.5.0
Browse files Browse the repository at this point in the history
  • Loading branch information
maridda committed Jun 21, 2024
1 parent 303a3b5 commit 505e2df
Show file tree
Hide file tree
Showing 10 changed files with 594 additions and 7 deletions.
407 changes: 407 additions & 0 deletions .ipynb_checkpoints/nerval_jupyter-checkpoint.ipynb

Large diffs are not rendered by default.

Binary file added dist/nerval-1.1.2-py3-none-any.whl
Binary file not shown.
Binary file added dist/nerval-1.1.2.tar.gz
Binary file not shown.
167 changes: 167 additions & 0 deletions nerval.egg-info/PKG-INFO
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
Metadata-Version: 2.1
Name: nerval
Version: 1.1.2
Summary: Entity-level confusion matrix and classification report to evaluate Named Entity Recognition (NER) models.
Home-page: https://github.com/maridda/nerval
Author: Mariangela D'Addato
Author-email: mdadda.py@gmail.com
Project-URL: Bug Tracker, https://github.com/maridda/nerval/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENCE.txt

# nerval
Entity-level confusion matrix and classification report to evaluate Named Entity Recognition (NER) models.


## Labelling schemes supported:
- IO
- BIO1 (IOB1)
- BIO2 (IOB2)
- IOE1
- IOE2
- IOBES
- BILOU
- BMEWO


## Options for the 'scheme' argument:
- **BIO** for the following schemes: IO / BIO1 (IOB1) / BIO2 (IOB2) / IOBES / BILOU / BMEWO
- **IOE** for the following schemes: IOE1 / IOE2
- **BIO** is the default scheme.


## Output:
- Classification report
- Confusion matrix
- Labels for the confusion matrix
- Confusion matrix plot


## How to use it:

```python
>>> from nerval import crm

>>> y_true = [['O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O', 'B-LOC', 'I-LOC']]
>>> y_pred = [['O', 'B-PER', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC']]

>>> cr, cm, cm_labels = crm(y_true, y_pred, scheme='BIO')
True Entities: 2
Pred Entities: 2

True Entities with 3 or more tags: 0
Pred Entities with 3 or more tags: 0

True positives: 0
False positives (true = 'O'): 1
False positives (true <> pred): 1
ToT False positives: 2
False negatives: 1

>>> print(cr)
precision recall f1_score true_entities pred_entities
PER 0.00 0.00 0.00 1.00 0.00
LOC 0.00 0.00 0.00 1.00 1.00
PER__ 0.00 0.00 0.00 0.00 1.00
micro_avg 0.00 0.00 0.00 2.00 2.00
macro_avg 0.00 0.00 0.00 2.00 2.00
weighted_avg 0.00 0.00 0.00 2.00 2.00

>>> print(cm)
[[0 1 0 0]
[1 0 0 0]
[0 0 0 1]
[0 0 0 0]]

>>> print(cm_labels)
['LOC', 'O', 'PER', 'PER__']
```

```python
>>> from nerval import plot_confusion_matrix

>>> y_true = [['O', 'B-PER', 'I-PER', 'O', 'O', 'O', 'O', 'B-LOC', 'I-LOC']]
>>> y_pred = [['O', 'B-PER', 'O', 'O', 'O', 'O', 'O', 'O', 'B-LOC']]

>>> plot_confusion_matrix(cm, cm_labels, show=True, save=False, img_path=None, normalize=None, decimal_places=2, figsize=(15,15), SMALL_SIZE=8, MEDIUM_SIZE=12, BIGGER_SIZE=14, cmap='OrRd', xticks_rotation='vertical', title='Confusion Matrix')

>>> plot_confusion_matrix(cm, cm_labels, show=True, save=True, img_path=None)

>>> plot_confusion_matrix(cm, cm_labels, show=True, save=True, img_path=r'C:\Users\...\my_conf_matrix.png')

>>> plot_confusion_matrix(cm, cm_labels, show=False, save=True, img_path=None)

>>> plot_confusion_matrix(cm, cm_labels, show=False, save=True, img_path=r'C:\Users\...\my_conf_matrix.png')
```

### Note 1:
**y_true** and **y_pred** could be:
- flat lists
- lists of flat lists
- lists of nested lists.

Flat lists and lists of nested lists will be converted to lists of lists.


### Note 2:
The __ at the end of some entities means that true and pred have the same entity name for the first token but the prediction is somewhat different from the true label.
Examples:
```python
>>> y_true = ['B-ORG', 'I-ORG', 'I-ORG']
>>> y_pred = ['B-ORG']

>>> y_true = ['B-ORG', 'I-ORG', 'I-ORG']
>>> y_pred = ['B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG']

>>> y_true = ['B-ORG', 'I-ORG', 'I-ORG']
>>> y_pred = ['B-ORG', 'I-PER']

>>> y_true = ['B-ORG', 'I-ORG', 'I-ORG']
>>> y_pred = ['I-ORG', 'I-PER']
```

### Note 3:
The normalize argument could be:
- None
- 'true'
- 'pred'
- 'all'

Default is None.


### Note 4:
In case of division by zero, the result will default to zero.

### Note 5:
Parameters in function plot_confusion_matrix():
- show: show the plot (default: True)
- save: save the plot (default: False)
- img_path: where to save the plot - e.g. r'C:\Users\User\...\my_conf_matrix.png' (default: None - this means save the plot in current dir)


## Installation
```bash
pip install nerval
conda install -c conda-forge nerval
```


## License
[MIT](https://github.com/maridda/nerval/blob/main/LICENCE.txt)


## Citation
```text
@misc{nerval,
title={{nerval}: Entity-level confusion matrix and classification report to evaluate Named Entity Recognition (NER) models.},
url={https://github.com/maridda/nerval},
note={Software available from https://github.com/maridda/nerval},
author={Mariangela D'Addato},
year={2022},
}
```
11 changes: 11 additions & 0 deletions nerval.egg-info/SOURCES.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
LICENCE.txt
README.md
pyproject.toml
setup.cfg
setup.py
nerval/__init__.py
nerval/nerval.py
nerval.egg-info/PKG-INFO
nerval.egg-info/SOURCES.txt
nerval.egg-info/dependency_links.txt
nerval.egg-info/top_level.txt
1 change: 1 addition & 0 deletions nerval.egg-info/dependency_links.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@

1 change: 1 addition & 0 deletions nerval.egg-info/top_level.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
nerval
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
numpy==1.22.3
pandas==1.4.4
matplotlib-base==3.5.3
scikit-learn==1.1.1
scikit-learn==1.5.0
6 changes: 3 additions & 3 deletions setup.cfg
Original file line number Diff line number Diff line change
@@ -1,14 +1,14 @@
[metadata]
name = nerval
version = 1.1.1
version = 1.1.2
author = Mariangela D'Addato
author_email = mdadda.py@gmail.com
description = Entity-level confusion matrix and classification report to evaluate Named Entity Recognition (NER) models.
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/mdadda/nerval
url = https://github.com/maridda/nerval
project_urls =
Bug Tracker = https://github.com/mdadda/nerval/issues
Bug Tracker = https://github.com/maridda/nerval/issues
classifiers =
Programming Language :: Python :: 3
License :: OSI Approved :: MIT License
Expand Down
6 changes: 3 additions & 3 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,15 +5,15 @@

setuptools.setup(
name="nerval",
version="1.1.1",
version="1.1.2",
author="Mariangela D'Addato",
author_email="mdadda.py@gmail.com",
description="Entity-level confusion matrix and classification report to evaluate Named Entity Recognition (NER) models.",
long_description=long_description,
long_description_content_type="text/markdown",
url="https://github.com/mdadda/nerval",
url="https://github.com/maridda/nerval",
project_urls={
"Bug Tracker": "https://github.com/mdadda/nerval/issues",
"Bug Tracker": "https://github.com/maridda/nerval/issues",
},
classifiers=[
"Programming Language :: Python :: 3",
Expand Down

0 comments on commit 505e2df

Please sign in to comment.