This repository contains a Jupyter-based tool for calculating the Greedy Matching, Vector Extrema and Embedding Average evaluation metrics for generative AI chatbots.

aron-radvanyi/GENERATIVE_AI_Embedding_Metrics_Calculator

Description

This Jupyter Notebook calculates three widely used word-embedding-based metrics in Python. They evaluate how closely a generative conversational chatbot's answer matches a reference answer in dialogue texts.

The 3 metrics implemented:

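  • Greedy Matching score: each word vector in one answer is matched to its most similar word vector (by cosine similarity) in the other answer, the best-match similarities are averaged, and the final score is the mean over both directions (reference to chatbot answer, and chatbot answer to reference). The word vectors are 300-dimensional.

  • Embedding Average score: the cosine similarity between the mean word vector of the reference answer and the mean word vector of the chatbot's answer

  • Vector Extrema score: the cosine similarity between two answer-level vectors, each built by taking, for every dimension, the most extreme value (largest in absolute value) among that answer's word vectors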
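
The three metrics can be sketched in a few lines of plain Python. This is a minimal illustration of their common formulation in the literature, not the notebook's own code: the function names (`cosine`, `mean_vector`, `extrema_vector`, `embedding_average`, `vector_extrema`, `greedy_matching`) and the tiny 3-dimensional `toy_vectors` table are assumptions made for the example, standing in for real 300-dimensional pretrained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vecs):
    """Per-dimension mean of a list of word vectors."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def extrema_vector(vecs):
    """Per-dimension value with the largest absolute value."""
    return [max(col, key=abs) for col in zip(*vecs)]


def embedding_average(ref_vecs, hyp_vecs):
    return cosine(mean_vector(ref_vecs), mean_vector(hyp_vecs))


def vector_extrema(ref_vecs, hyp_vecs):
    return cosine(extrema_vector(ref_vecs), extrema_vector(hyp_vecs))


def greedy_matching(ref_vecs, hyp_vecs):
    def one_way(src, tgt):
        # For each source word, keep its best cosine match in the target.
        return sum(max(cosine(s, t) for t in tgt) for s in src) / len(src)

    # Average the two directions so the score is symmetric.
    return (one_way(ref_vecs, hyp_vecs) + one_way(hyp_vecs, ref_vecs)) / 2


# Hypothetical toy embeddings; a real run would look words up in a
# pretrained 300-dimensional model instead.
toy_vectors = {
    "hello": [0.9, 0.1, 0.0],
    "hi": [0.8, 0.2, 0.1],
    "there": [0.1, 0.7, 0.3],
    "friend": [0.2, 0.6, 0.5],
}

reference = [toy_vectors[w] for w in ["hello", "there"]]
answer = [toy_vectors[w] for w in ["hi", "friend"]]

print("Greedy Matching:  ", greedy_matching(reference, answer))
print("Embedding Average:", embedding_average(reference, answer))
print("Vector Extrema:   ", vector_extrema(reference, answer))
```

Each function takes two lists of word vectors, one per answer. How out-of-vocabulary words and tokenization are handled is left out of this sketch and would need to follow whatever the notebook does.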

Example Usage:

(see "EMBEDDING_METRICS_TEST_EXAMPLE")

Screenshot: pic

References:

  • A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics. Vasile Rus, Mihai Lintean. 2012. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, NAACL 2012.
  • Bootstrapping Dialog Systems with Word Embeddings. G. Forgues, J. Pineau, J. Larcheveque, R. Tremblay. 2014. Workshop on Modern Machine Learning and Natural Language Processing, NIPS 2014.
  • A Survey of Evaluation Metrics Used for NLG Systems. A. B. Sai, A. K. Mohankumar, M. M. Khapra. 2022. ACM Computing Surveys (CSUR), 55(2):1–39.