AI Memory Overflow

A project by @tercmd (on Twitter)


Understanding the memory context of AI models by testing prompt lengths exceeding their context limits. Contributions welcome!

Table of Contents

Testing ChatGPT

Test Circumstances

The gpt-3.5-turbo model has a context length of 4096 tokens. Each prompt, deliberately longer than 4096 tokens, consisted of a list of five-character blocks separated by - and asked the model to return the first and last block in the list. At this context length, each prompt averaged 973.33 blocks.
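As a rough sketch of that setup (the repo's main.py is the authoritative implementation; the block generation and the exact prompt wording here are assumptions), the prompt can be grown until it crosses the token limit:

```python
import random
import string

import tiktoken  # pip install tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def make_block(n_chars: int = 5) -> str:
    """Return one random five-character block (illustrative only)."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=n_chars))

# Append "-"-separated blocks until the prompt exceeds the 4096-token window.
blocks = [make_block()]
prompt = blocks[0]
while len(enc.encode(prompt)) <= 4096:
    blocks.append(make_block())
    prompt = "-".join(blocks)

print(f"{len(blocks)} blocks, {len(enc.encode(prompt))} tokens")
print("first:", blocks[0], "last:", blocks[-1])
```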

So, how does ChatGPT perform over a hundred tests?

Figure: a line graph titled "% Blocks Remembered" (prompt > 4097 tokens, 100 tests), showing the percentage of blocks ChatGPT remembered across 100 tests. The line hovers around 78–79%, with occasional dips below 50% and one drop below 20%.


At most, ChatGPT retains context for 79.45% of blocks; with prompts averaging 973.33 blocks, that is roughly 773 blocks remembered.

It must be noted that these results pertain only to the gpt-3.5-turbo model. Additionally, the results do not account for instances where ChatGPT responded with code instead of a direct answer.

In instances where the model responded with blocks not present in the list, I retested the prompt in a new conversation; when ChatGPT attempted to fill in the remaining characters and produced an invalid block, I tested with a new prompt instead.

In cases where it responded with a truncated block, or with a block with incorrect capitalization, the correct, complete version of the block was counted.

Also, with a prompt of 3000 tokens (below the 4096-token context length), ChatGPT retains context for all blocks, responding with the correct first and last block every time.
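A control prompt under the limit could be produced the same way, for example by trimming blocks until the token count fits. This is a sketch; trim_to_budget is a hypothetical helper, not part of the repo:

```python
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def trim_to_budget(blocks: list[str], budget: int = 3000) -> str:
    """Drop blocks from the end until the "-"-joined prompt fits the budget."""
    prompt = "-".join(blocks)
    while blocks and len(enc.encode(prompt)) > budget:
        blocks.pop()
        prompt = "-".join(blocks)
    return prompt
```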

Try it yourself!

To run a test on an OpenAI model (the script uses OpenAI's tiktoken library for tokenization), follow these steps:

  1. Install requirements using pip install tiktoken.

  2. Clone this repo into a folder (git clone https://github.com/terminalcommandnewsletter/ai-memory-overflow.git), as the script requires the files in utils/ to run.

  3. In the terminal, run python main.py -m [MODEL]. Requires Python 3.8 or later (tiktoken does not support older versions).

To see all options, run python main.py -h.

Use the --old/-u flag to use the same concatenation used in the original tests (not perfect, since it adds the separator at the beginning of the prompt).
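Presumably the difference looks something like this (an assumption based on the description above, not the repo's actual code):

```python
blocks = ["abcde", "fghij", "klmno"]

old_style = "".join("-" + b for b in blocks)  # "-abcde-fghij-klmno" (leading separator)
new_style = "-".join(blocks)                  # "abcde-fghij-klmno"
```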

Also, to check the index of a block, use the --check/-x flag, which asks for the last block the AI sees and prints the result as "<block index>,<number of blocks>,<percentage blocks remembered>".
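One line of that output could be parsed like so (parse_check_line is a hypothetical helper and the sample values are made up):

```python
def parse_check_line(line: str) -> tuple[int, int, float]:
    """Split "<block index>,<number of blocks>,<percentage blocks remembered>"."""
    index, total, pct = line.strip().split(",")
    return int(index), int(total), float(pct)

idx, total, pct = parse_check_line("771,972,79.32")  # made-up sample values
print(f"last block seen: {idx} of {total} ({pct}% remembered)")
```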

If you have extensively tested a model that is not yet in the repo, you can contribute your results.

Final Test Results

| Test Number | Tested in | Model Tested | Context Length | Characters per Block | Mean % Remembered | Lowest % Remembered | Highest % Remembered | Standard Deviation | Number of Tests | Tested by | Test data |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | May 2023 | gpt-3.5-turbo | 4096 tokens | 5 | 75.99% | 13.04% | 79.45% | 11.24% | 100 | @terminalcommandnewsletter | Test data |

Test data format (no header row): "last" block index,number of blocks,percentage remembered.
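The summary statistics in the table can be reproduced from the per-test percentages in the test data; here is a sketch using Python's statistics module (the values below are illustrative, not the actual data):

```python
import statistics

# One percentage per test, taken from the third field of each test-data line.
pct_remembered = [79.45, 78.90, 77.12, 13.04]  # illustrative values only

print(f"mean:    {statistics.mean(pct_remembered):.2f}%")
print(f"stdev:   {statistics.stdev(pct_remembered):.2f}%")
print(f"lowest:  {min(pct_remembered):.2f}%")
print(f"highest: {max(pct_remembered):.2f}%")
```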

License

The license for this repo can be found in the COPYING file.

This program comes with ABSOLUTELY NO WARRANTY; for details, check COPYING. This is free software, and you are welcome to redistribute it under certain conditions; for details, check COPYING.
