IcelandicEval

A repository of utilities to generate Icelandic evaluation data sets for LLMs. The data sets are mostly about word inflection and grammatical correctness.

calc-freq.py

This utility program generates evaluation data for LLMs, typically OpenAI's GPT-4, to test proficiency in Icelandic. The data consists of lists of noun phrases, where each phrase contains an adjective and a noun, and the task is to inflect the adjective and noun together in all four cases (nominative, accusative, dative, genitive), in singular as well as plural.

The final output of the program is a set of three JSONL files, each containing a number of samples. The samples are bucketed into three categories, easy, medium and hard, depending on the frequency of the adjectives and nouns used in each sample. Each sample is an LLM prompt and an ideal completion.

Usage

Clone this repo into a directory, create a virtualenv and install the requirements:

    git clone https://github.com/mideind/IcelandicEval.git
    cd IcelandicEval
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

The nouns.csv and adjectives.csv files need to be present in the data directory. They were originally created by querying the BÍN database (bin.arnastofnun.is), for example (in the psql command line):

    psql> \copy (select ord, ofl from bin2023
        where ofl in ('kk', 'kvk', 'hk')) to 'data/nouns.csv' with csv;

    psql> \copy (select ord from bin2023 where ofl = 'lo')
        to 'data/adjectives.csv' with csv;

Then, given those files, this program is run to generate randomly sampled, bucketed lists of nouns and adjectives respectively. The buckets are created by frequency of occurrence of the word forms in the icegrams database, with bucket 0 containing the least frequent words and bucket 2 the most frequent. The bucket files are created in the data directory, under the names nouns-{0,1,2}.txt and adj-{0,1,2}.txt.

    python calc-freq.py --nouns
    python calc-freq.py --adjectives

Finally, after the buckets 0-2 have been created, the final evaluation samples can be generated. The number of samples desired from each bucket can be passed in as a command line parameter, defaulting to 20.

    python calc-freq.py --generate [N, default 20]

The results are found in three JSONL files, named data/icelandic-inflection-{easy,medium,hard}/samples.jsonl. They are in a format that is suitable for use with OpenAI's evals suite (see github.com/openai/evals).

License

This software is under the MIT License. Consult the LICENSE.md file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
calc-freq.py		calc-freq.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

IcelandicEval

calc-freq.py

Usage

License

About

Releases

Packages

Languages

License

mideind/IcelandicEval

Folders and files

Latest commit

History

Repository files navigation

IcelandicEval

calc-freq.py

Usage

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages