The PyTorch Lightning team has launched TorchMetrics v0.7. The fresh release includes several new metrics (mainly for NLP), naming and import changes, and general improvements to the API, among other features.

Version 0.7 offers 60+ metrics covering eight domains: audio, classification, image, pairwise, detection, regression, information retrieval, and text. The team has also unified the TorchMetrics API across all domains. The project has passed 600 GitHub stars, and around 1,500 repositories currently use TorchMetrics.
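In practice, the unified API means every metric follows the same update/compute life cycle regardless of domain. A minimal sketch, assuming the v0.7-era API in which Accuracy can be constructed without arguments:

```python
import torch
import torchmetrics

# Accuracy stands in here for any metric; all share the same interface.
metric = torchmetrics.Accuracy()

preds = torch.tensor([0, 1, 1, 0])
target = torch.tensor([0, 1, 0, 0])

metric.update(preds, target)  # accumulate state, e.g. once per batch
print(metric.compute())       # tensor(0.7500)
metric.reset()                # clear state before the next epoch
```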

The NLP changes centre on the text package, which in v0.7 gains several machine translation metrics: chrF, chrF++, Translation Edit Rate, and Extended Edit Distance. It now also supports further metrics — Match Error Rate, Word Information Lost, Word Information Preserved, and the SQuAD evaluation metrics. The team has also made it possible to evaluate the ROUGE score against multiple references.
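For instance, scoring one hypothesis against several references looks roughly like this (a sketch assuming a v0.7 install with the text dependencies; the sentences are invented and exact import paths may differ slightly between releases):

```python
from torchmetrics import CHRFScore
from torchmetrics.text.rouge import ROUGEScore

preds = ["the cat sat on the mat"]
# v0.7 lets one prediction be scored against several references:
target = [["a cat sat on the mat", "the cat was on the mat"]]

chrf = CHRFScore()
print(chrf(preds, target))   # character n-gram F-score as a tensor

rouge = ROUGEScore()
print(rouge(preds, target))  # dict of ROUGE-1/2/L precision, recall, f-measure
```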

Among the new metrics, Translation Edit Rate (TER) represents the normalized minimum number of edits required to change a hypothesis so that it matches one of the provided references. Edits comprise insertions, deletions, and substitutions at the word or word n-gram level. Extended Edit Distance (EED), another addition, operates at the character level and extends Levenshtein distance with jump operations.
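A short sketch of both metrics in use; the class names follow the v0.7 documentation, while the example sentences are invented:

```python
from torchmetrics import ExtendedEditDistance, TranslationEditRate

preds = ["the cat sat on the mat"]

# TER scores each hypothesis against one or more references:
ter = TranslationEditRate()
print(ter(preds, [["the cat is on the mat", "a cat sat on the mat"]]))

# EED compares at the character level, with jump operations:
eed = ExtendedEditDistance()
print(eed(preds, ["the cat is on the mat"]))
```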

Match Error Rate (MER), a metric used to evaluate automatic speech recognition (ASR) systems, has been added. Word Information Lost (WIL) and Word Information Preserved (WIP) are two interwoven metrics used predominantly to evaluate ASR output. SQuAD evaluation is another update: SQuAD, the Stanford Question Answering Dataset, is a machine comprehension dataset of 100,000+ question-answer pairs posed on Wikipedia articles, and its metrics are used to evaluate extractive question-answering models.
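A minimal sketch of how these metrics are called, based on the v0.7 documentation; the sentence pair and the SQuAD id below are made up for illustration:

```python
from torchmetrics import MatchErrorRate, SQuAD, WordInfoLost, WordInfoPreserved

# ASR-style metrics compare transcripts word by word:
preds = ["this is the prediction"]
target = ["this is the reference"]

print(MatchErrorRate()(preds, target))
print(WordInfoLost()(preds, target))
print(WordInfoPreserved()(preds, target))

# SQuAD evaluation follows the official format: ids plus answer texts.
squad_preds = [{"prediction_text": "1976", "id": "id1"}]
squad_target = [{"answers": {"answer_start": [97], "text": ["1976"]}, "id": "id1"}]
print(SQuAD()(squad_preds, squad_target))  # exact-match and F1 scores
```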

TorchMetrics v0.7 also brings more extensive changes to how metrics are imported. These changes take effect in v0.7 and require developers to update the import statements for some specific metrics. All naming changes follow the standard deprecation process, and from v0.8 the old metric names will no longer be available.
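As one illustration, the changelog lists IoU being renamed to JaccardIndex; under the deprecation process, the old name still works in v0.7 with a warning but disappears in v0.8:

```python
# Deprecated in v0.7, removed in v0.8:
#   from torchmetrics import IoU
#   metric = IoU(num_classes=2)

# New name from v0.7 onwards:
from torchmetrics import JaccardIndex

metric = JaccardIndex(num_classes=2)
```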

A complete list of metrics undergoing a naming change can be found in the changelog.