Releases: Lightning-AI/torchmetrics
Minor patch release
[0.9.2] - 2022-06-29
Fixed
- Fixed mAP calculation for areas with 0 predictions (#1080)
- Fixed bug where avg precision state and auroc state were not merged when using MetricCollections (#1086)
- Skip box conversion if no boxes are present in MeanAveragePrecision (#1097)
- Fixed inconsistency in docs and code when setting average="none" in AveragePrecision metric (#1116)
Contributors
@23pointsNorth, @kouyk, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor PL compatibility patch
[0.9.1] - 2022-06-08
Added
- Added specific RuntimeError when metric object is on the wrong device (#1056)
- Added an option to specify own n-gram weights for BLEUScore and SacreBLEUScore instead of using uniform weights only. (#1075)
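To picture what per-order n-gram weights change, here is a minimal pure-Python sketch of a sentence-level BLEU score with a single reference. It is not the TorchMetrics implementation: the function names, the `weights` parameter, and the lack of smoothing are assumptions made for illustration only.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all contiguous n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def weighted_bleu(pred, target, weights):
    """Sentence-level BLEU against one reference (hypothetical helper).

    len(weights) sets the highest n-gram order; weights are expected to sum to 1.
    """
    log_precision = 0.0
    for n, weight in enumerate(weights, start=1):
        pred_counts = ngram_counts(pred, n)
        target_counts = ngram_counts(target, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, target_counts[gram]) for gram, count in pred_counts.items())
        total = sum(pred_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precision += weight * math.log(matches / total)
    # Brevity penalty punishes predictions shorter than the reference.
    brevity = 1.0 if len(pred) > len(target) else math.exp(1 - len(target) / len(pred))
    return brevity * math.exp(log_precision)


pred = "the cat sat on the mat".split()
target = "the cat is on the mat".split()
uniform = weighted_bleu(pred, target, [0.5, 0.5])        # equal weight on 1- and 2-grams
unigram_heavy = weighted_bleu(pred, target, [0.9, 0.1])  # mostly unigram precision
```

In this example the unigram precision (5/6) is higher than the bigram precision (3/5), so shifting weight toward unigrams raises the score.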
Fixed
- Fixed aggregation metrics when input only contains zero (#1070)
- Fixed TypeError when providing superclass arguments as kwargs (#1069)
- Fixed bug related to state reference in metric collection when using compute groups (#1076)
Contributors
@jlcsilva, @SkafteNicki, @stancld
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Faster forward
Highlights
TorchMetrics v0.9 is now out, and it brings significant changes to how the forward method works. This blog post goes over these improvements and how they affect both users of TorchMetrics and users that implement custom metrics. TorchMetrics v0.9 also includes several new metrics and bug fixes.
Blog: TorchMetrics v0.9 — Faster forward
The Story of the Forward Method
Since the beginning of TorchMetrics, forward has served the dual purpose of calculating the metric on the current batch and accumulating in a global state. Internally, this was achieved by calling update twice, once for each purpose, which meant repeating the same computation. However, for many metrics, calling update twice is unnecessary to obtain both the local batch statistics and the global accumulation, because the global statistics are simple reductions of the local batch states.
In v0.9, we have finally implemented logic that takes advantage of this and calls update only once before making a simple reduction. As you can see in the figure below, this can make a single call of forward up to 2x faster in v0.9 compared to v0.8 for the same metric.
Figure: With the improvements to forward, many metrics have become significantly faster (up to 2x).
It should be noted that this change mainly benefits metrics (for example, ConfusionMatrix) where calling update is expensive.
We went through all existing metrics in TorchMetrics and enabled this feature for all appropriate metrics, which was almost 95% of all metrics. We want to stress that if you are using metrics from TorchMetrics, nothing has changed in the API, and no code changes are necessary.
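The difference between the two forward strategies can be sketched with a toy metric in plain Python. This is a simplified illustration, not the library's code: the class and the method names `forward_full_state` and `forward_reduce_state` are hypothetical, and the state is a single running sum so that the "simple reduction" is just an addition.

```python
class SumMetric:
    """Toy metric whose global state is a running sum (illustrative only)."""

    def __init__(self):
        self.total = 0.0
        self.update_calls = 0

    def update(self, batch):
        self.update_calls += 1
        self.total += sum(batch)

    def compute(self):
        return self.total

    def forward_full_state(self, batch):
        """Older strategy: one update for the global state, one for the batch value."""
        self.update(batch)                     # 1st call: accumulate globally
        saved_total = self.total
        self.total = 0.0
        self.update(batch)                     # 2nd call: batch-only statistics
        batch_value = self.compute()
        self.total = saved_total               # restore the global state
        return batch_value

    def forward_reduce_state(self, batch):
        """Newer strategy: one update on a fresh state, then merge by reduction."""
        saved_total = self.total
        self.total = 0.0
        self.update(batch)                     # only call: batch-only statistics
        batch_value = self.compute()
        self.total = saved_total + self.total  # reduction: add batch state to global state
        return batch_value


old, new = SumMetric(), SumMetric()
for batch in ([1.0, 2.0], [3.0, 4.0]):
    old_batch_value = old.forward_full_state(batch)
    new_batch_value = new.forward_reduce_state(batch)
```

Both variants return the same batch value and end with the same global state, but the second needs half as many update calls, which is where the speed-up for expensive updates comes from.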
[0.9.0] - 2022-05-31
Added
- Added RetrievalPrecisionRecallCurve and RetrievalRecallAtFixedPrecision to retrieval package (#951)
- Added class property full_state_update that determines whether forward should call update once or twice (#984, #1033)
- Added support for nested metric collections (#1003)
- Added Dice to classification package (#1021)
- Added support to segmentation type segm as IOU for mean average precision (#822)
Changed
- Renamed reduction argument to average in Jaccard score and added additional options (#874)
Removed
- Removed deprecated compute_on_step argument (#962, #967, #979, #990, #991, #993, #1005, #1004, #1007)
Fixed
- Fixed non-empty state dict for a few metrics (#1012)
- Fixed bug when comparing states while finding compute groups (#1022)
- Fixed torch.double support in stat score metrics (#1023)
- Fixed FID calculation for non-equal size real and fake input (#1028)
- Fixed case where KLDivergence could output NaN (#1030)
- Fixed deterministic for PyTorch<1.8 (#1035)
- Fixed default value for mdmc_average in Accuracy (#1036)
- Fixed missing copy of property when using compute groups in MetricCollection (#1052)
Contributors
@Borda, @burglarhobbit, @charlielito, @gianscarpe, @MrShevan, @phaseolud, @razmikmelikbekyan, @SkafteNicki, @tanmoyio, @vumichien
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
[0.8.2] - 2022-05-06
Fixed
- Fixed multi-device aggregation in PearsonCorrCoef (#998)
- Fixed MAP metric when using a custom list of thresholds (#995)
- Fixed compatibility between compute groups in MetricCollection and prefix/postfix arg (#1007)
- Fixed compatibility with future PyTorch 1.12 in safe_matmul (#1011, #1014)
Contributors
@ben-davidson-6, @Borda, @SkafteNicki, @tanmoyio
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
[0.8.1] - 2022-04-27
Changed
- Reimplemented the signal_distortion_ratio metric, which removed the absolute requirement of fast-bss-eval (#964)
Fixed
- Fixed "Sort currently does not support bool dtype on CUDA" error in MAP for empty preds (#983)
- Fixed BinnedPrecisionRecallCurve when thresholds argument is not provided (#968)
- Fixed CalibrationError to work on logit input (#985)
Contributors
@DuYicong515, @krshrimali, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Faster collection and more metrics!
We are excited to announce that TorchMetrics v0.8 is now available. The release includes several new metrics in the classification and image domains and some performance improvements for those working with metrics collections.
Metric collections just got faster
Common wisdom dictates that you should never evaluate the performance of your models using only a single metric but instead a collection of metrics. For example, it is common to simultaneously evaluate the accuracy, precision, recall, and F1 score in classification. TorchMetrics has long provided the MetricCollection object for chaining such metrics together behind an easy interface that calculates them all at once. However, in many cases, the metrics in such a collection share some of the underlying computations, which were repeated for every metric in the collection. In TorchMetrics v0.8 we have introduced the concept of compute_groups to MetricCollection, which by default auto-detects and groups metrics that share some of the same computations.
Thus, if you are using MetricCollections in your code, upgrading to TorchMetrics v0.8 should automatically make your code run faster without any code changes.
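The idea behind compute groups can be sketched in plain Python: precision and recall both derive from the same true-positive, false-positive, and false-negative counts, so one shared state only needs to be updated once per batch. The classes below are hypothetical stand-ins and hard-code a single group; the automatic group detection described above is not modelled.

```python
class StatScores:
    """Shared state for binary labels: true positives, false positives, false negatives."""

    def __init__(self):
        self.tp = self.fp = self.fn = 0
        self.update_calls = 0

    def update(self, preds, target):
        self.update_calls += 1
        for p, t in zip(preds, target):
            self.tp += int(p == 1 and t == 1)
            self.fp += int(p == 1 and t == 0)
            self.fn += int(p == 0 and t == 1)


def precision(state):
    return state.tp / (state.tp + state.fp)


def recall(state):
    return state.tp / (state.tp + state.fn)


class GroupedCollection:
    """Updates one shared state and derives every metric in the group from it."""

    def __init__(self, compute_fns):
        self.state = StatScores()
        self.compute_fns = compute_fns

    def update(self, preds, target):
        self.state.update(preds, target)  # a single update serves all metrics

    def compute(self):
        return {name: fn(self.state) for name, fn in self.compute_fns.items()}


collection = GroupedCollection({"precision": precision, "recall": recall})
collection.update([1, 0, 1, 1], [1, 0, 0, 1])
collection.update([0, 1, 0, 1], [1, 1, 0, 0])
results = collection.compute()  # tp=3, fp=2, fn=1
```

Without grouping, each of the two metrics would run its own update on every batch; with the shared state, two batches cost two updates regardless of how many metrics are in the group.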
Many exciting new metrics
TorchMetrics v0.8 includes several new metrics within the classification and image domain, both for the functional and modular API. We refer to the documentation for the full description of all metrics if you want to learn more about them.
- SpectralAngleMapper or SAM was added to the image package. This metric can calculate the spectral similarity between given reference spectra and estimated spectra.
- CoverageError was added to the classification package. This metric can be used when you are working with multi-label data. The metric works similarly to the sklearn counterpart and computes how far you need to go through ranked scores such that all true labels are covered.
- LabelRankingAveragePrecision and LabelRankingLoss were added to the classification package. Both metrics are used in multi-label ranking problems, where the goal is to give a better rank to the labels associated with each sample. Each metric gives a measure of how well your model is doing this.
- ErrorRelativeGlobalDimensionlessSynthesis or ERGAS was added to the image package. This metric can be used to calculate the accuracy of pan-sharpened images considering the normalized average error of each band of the resulting image.
- UniversalImageQualityIndex was added to the image package. This metric can assess the difference between two images, which considers three different factors when computed: loss of correlation, luminance distortion, and contrast distortion.
- ClasswiseWrapper was added to the wrapper package. This wrapper can be used in combination with metrics that return multiple values (such as classification metrics with the average=None argument). The wrapper will unwrap the result into a dict with a label for each value.
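As a worked illustration of the coverage error described above, the following pure-Python sketch computes, per sample, how many labels rank at or above the lowest-scored true label and then averages over samples. The function is a hypothetical helper written for this note, not the TorchMetrics or sklearn implementation, and it assumes every sample has at least one true label.

```python
def coverage_error(scores, labels):
    """Average depth in the score ranking needed to cover every true label.

    scores: per-sample lists of predicted label scores.
    labels: per-sample lists of 0/1 ground-truth indicators.
    """
    depths = []
    for sample_scores, sample_labels in zip(scores, labels):
        # The lowest-scored true label decides how far down the ranking we must go.
        lowest_true_score = min(s for s, y in zip(sample_scores, sample_labels) if y == 1)
        # Depth = number of labels ranked at or above that score.
        depths.append(sum(s >= lowest_true_score for s in sample_scores))
    return sum(depths) / len(depths)


scores = [[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]]
labels = [[1, 0, 0], [0, 0, 1]]
result = coverage_error(scores, labels)  # (2 + 3) / 2 = 2.5
```

In the first sample the true label has the second-highest score, so two labels must be traversed; in the second sample the true label is ranked last, so all three are needed.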
[0.8.0] - 2022-04-14
Added
- Added WeightedMeanAbsolutePercentageError to regression package (#948)
- Added new classification metrics:
- Added new image metric:
- Added support for MetricCollection in MetricTracker (#718)
- Added support for 3D image and uniform kernel in StructuralSimilarityIndexMeasure (#818)
- Added smart update of MetricCollection (#709)
- Added ClasswiseWrapper for better logging of classification metrics with multiple output values (#832)
- Added **kwargs argument for passing additional arguments to base class (#833)
- Added negative ignore_index for the Accuracy metric (#362)
- Added adaptive_k for the RetrievalPrecision metric (#910)
- Added reset_real_features argument to image quality assessment metrics (#722)
- Added new keyword argument compute_on_cpu to all metrics (#867)
Changed
- Made num_classes in jaccard_index a required argument (#853, #914)
- Added normalizer, tokenizer to ROUGE metric (#838)
- Improved shape checking of permutation_invariant_training (#864)
- Allowed reduction None (#891)
- MetricTracker.best_metric will now give a warning when computing on metrics that do not have a best (#913)
Deprecated
- Deprecated argument compute_on_step (#792)
- Deprecated passing in dist_sync_on_step, process_group, dist_sync_fn as direct arguments (#833)
Removed
- Removed support for versions of Lightning lower than v1.5 (#788)
- Removed deprecated functions, and warnings in Text (#773)
  - WER and functional.wer
- Removed deprecated functions and warnings in Image (#796)
  - SSIM and functional.ssim
  - PSNR and functional.psnr
- Removed deprecated functions, and warnings in classification and regression (#806)
  - FBeta and functional.fbeta
  - F1 and functional.f1
  - Hinge and functional.hinge
  - IoU and functional.iou
  - MatthewsCorrcoef
  - PearsonCorrcoef
  - SpearmanCorrcoef
- Removed deprecated functions, and warnings in detection and pairwise (#804)
  - MAP and functional.pairwise.manhatten
- Removed deprecated functions, and warnings in Audio (#805)
  - PESQ and functional.audio.pesq
  - PIT and functional.audio.pit
  - SDR and functional.audio.sdr and functional.audio.si_sdr
  - SNR and functional.audio.snr and functional.audio.si_snr
  - STOI and functional.audio.stoi
Fixed
- Fixed device mismatch for MAP metric in specific cases (#950)
- Improved testing speed (#820)
- Fixed compatibility of ClasswiseWrapper with the prefix argument of MetricCollection (#843)
- Fixed BestScore on GPU (#912)
- Fixed Lsum computation for ROUGEScore (#944)
Contributors
@ankitaS11, @ashutoshml, @Borda, @hookSSi, @justusschock, @lucadiliello, @quancs, @rusty1s, @SkafteNicki, @stancld, @vumichien, @weningerleon, @yassersouri
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
[0.7.3] - 2022-03-22
Fixed
- Fixed unsafe log operation in TweedieDevianceScore for power=1 (#847)
- Fixed bug in MAP metric related to either no ground truth or no predictions (#884)
- Fixed ConfusionMatrix, AUROC and AveragePrecision on GPU when running in deterministic mode (#900)
- Fixed NaN or Inf results returned by signal_distortion_ratio (#899)
- Fixed memory leak when using update method with tensor where requires_grad=True (#902)
Contributors
@mtailanian, @quancs, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
JOSS paper
[0.7.2] - 2022-02-10
Fixed
- Minor patches in JOSS paper.
Improve mAP performance
[0.7.1] - 2022-02-03
Changed
- Used torch.bucketize in calibration error when torch>1.8 for faster computations (#769)
- Improved mAP performance (#742)
Fixed
- Fixed check for available modules (#772)
- Fixed Matthews correlation coefficient when the denominator is 0 (#781)
Contributors
@Borda, @ramonemiliani93, @SkafteNicki, @twsl
If we forgot someone due to not matching commit email with GitHub account, let us know :]
New NLP metrics and improved API
We are excited to announce that TorchMetrics v0.7 is now publicly available. This release is pretty significant. It includes several new metrics (mainly for NLP), naming and import changes, general improvements to the API, and some other great features. TorchMetrics now has more than 60 metrics, and the package is more user-friendly than ever.
NLP metrics - Text package
The Text package has been a part of TorchMetrics since v0.5. With the growing capability of language generation models, there is also a real need for reliable evaluation metrics. With several added metrics and a unified API, TorchMetrics makes the usage of various metrics even easier! TorchMetrics v0.7 newly includes a couple of machine translation metrics such as chrF, chrF++, Translation Edit Rate, and Extended Edit Distance. Furthermore, it also supports other metrics: Match Error Rate, Word Information Lost, Word Information Preserved, and SQuAD evaluation metrics. Last but not least, we also made it possible to evaluate the ROUGE score using multiple references.
Argument unification
Importantly, all text metrics now assume the input order preds, target, with these as explicit keyword arguments. Any different naming used before v0.7 is deprecated and will be completely removed in v0.8.
Import and naming changes
TorchMetrics v0.7 brings both extensive and minor changes to how metrics should be imported. The import changes directly impact v0.7, meaning that you will most likely need to change the import statement for some specific metrics. All naming changes follow our standard deprecation process, meaning that in v0.7, any metric that is renamed will still work but raise a warning asking you to use the new metric name. From v0.8, the old metric names will no longer be available.
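A rename that keeps the old name working for one release can be pictured as a thin subclass under the old name. The sketch below uses the WER to WordErrorRate rename from this release as its example, but the class bodies and the warning text are hypothetical, and it assumes the old name emits a DeprecationWarning rather than failing.

```python
import warnings


class WordErrorRate:
    """Stand-in for a metric under its new name (illustrative only)."""

    def __init__(self):
        self.errors = 0
        self.total = 0


class WER(WordErrorRate):
    """Old name kept for one release: still works, but warns on construction."""

    def __init__(self):
        warnings.warn(
            "`WER` was renamed to `WordErrorRate` and will be removed in a later release.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        super().__init__()


# Capture the warning to show that the old name still constructs a working object.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_metric = WER()
```

Code written against the old name keeps running during the deprecation window, and removing the shim class in the following release completes the rename.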
[0.7.0] - 2022-01-17
Added
- Added NLP metrics:
- Added MultiScaleSSIM into image metrics (#679)
- Added Signal to Distortion Ratio (SDR) to audio package (#565)
- Added MinMaxMetric to wrappers (#556)
- Added ignore_index to retrieval metrics (#676)
- Added support for multi references in ROUGEScore (#680)
- Added a default VSCode devcontainer configuration (#621)
Changed
- Scalar metrics will now consistently have additional dimensions squeezed (#622)
- Metrics having third party dependencies removed from global import (#463)
- BLEUScore now takes untokenized input, consistent with all the other text metrics (#640)
- Arguments reordered for TER, BLEUScore, SacreBLEUScore, CHRFScore; the expected input order is now predictions first and target second (#696)
- Changed dtype of metric state from torch.float to torch.long in ConfusionMatrix to accommodate larger values (#715)
- Unified the naming of the preds, target input arguments across all text metrics (#723, #727)
  - bert, bleu, chrf, sacre_bleu, wip, wil, cer, ter, wer, mer, rouge, squad
Deprecated
- Renamed IoU -> Jaccard Index (#662)
- Renamed text WER metric: (#714)
  - functional.wer -> functional.word_error_rate
  - WER -> WordErrorRate
- Renamed correlation coefficient classes: (#710)
  - MatthewsCorrcoef -> MatthewsCorrCoef
  - PearsonCorrcoef -> PearsonCorrCoef
  - SpearmanCorrcoef -> SpearmanCorrCoef
- Renamed audio STOI metric: (#753, #758)
  - audio.STOI -> audio.ShortTimeObjectiveIntelligibility
  - functional.audio.stoi -> functional.audio.short_time_objective_intelligibility
- Renamed audio PESQ metrics: (#751)
  - functional.audio.pesq -> functional.audio.perceptual_evaluation_speech_quality
  - audio.PESQ -> audio.PerceptualEvaluationSpeechQuality
- Renamed audio SDR metrics: (#711)
  - functional.sdr -> functional.signal_distortion_ratio
  - functional.si_sdr -> functional.scale_invariant_signal_distortion_ratio
  - SDR -> SignalDistortionRatio
  - SI_SDR -> ScaleInvariantSignalDistortionRatio
- Renamed audio SNR metrics: (#712)
  - functional.snr -> functional.signal_noise_ratio
  - functional.si_snr -> functional.scale_invariant_signal_noise_ratio
  - SNR -> SignalNoiseRatio
  - SI_SNR -> ScaleInvariantSignalNoiseRatio
- Renamed F-score metrics: (#731, #740)
  - functional.f1 -> functional.f1_score
  - F1 -> F1Score
  - functional.fbeta -> functional.fbeta_score
  - FBeta -> FBetaScore
- Renamed Hinge metric: (#734)
  - functional.hinge -> functional.hinge_loss
  - Hinge -> HingeLoss
- Renamed image PSNR metrics (#732)
  - functional.psnr -> functional.peak_signal_noise_ratio
  - PSNR -> PeakSignalNoiseRatio
- Renamed audio PIT metric: (#737)
  - functional.pit -> functional.permutation_invariant_training
  - PIT -> PermutationInvariantTraining
- Renamed image SSIM metric: (#747)
  - functional.ssim -> functional.structural_similarity_index_measure
  - SSIM -> StructuralSimilarityIndexMeasure
- Renamed detection MAP to MeanAveragePrecision metric (#754)
- Renamed Fidelity & LPIPS image metric: (#752)
  - image.FID -> image.FrechetInceptionDistance
  - image.KID -> image.KernelInceptionDistance
  - image.LPIPS -> image.LearnedPerceptualImagePatchSimilarity
Removed
- Removed embedding_similarity metric (#638)
- Removed argument concatenate_texts from wer metric (#638)
- Removed arguments newline_sep and decimal_places from rouge metric (#638)
Fixed
- Fixed MetricCollection kwargs filtering when no kwargs are present in update signature (#707)
Contributors
@ashutoshml, @Borda, @cuent, @Fariborzzz, @getgaurav2, @janhenriklambrechts, @justusschock, @karthikrangasai, @lucadiliello, @mahinlma, @mathemusician, @mona0809, @mrleu, @puhuk, @quancs, @SkafteNicki, @stancld, @twsl
If we forgot someone due to not matching commit email with GitHub account, let us know :]