Clearer error message for unknown example type #1202

Merged
milocress merged 6 commits into mosaicml:main from milo/unknown-example-type
May 14, 2024

milocress
Contributor

@milocress milocress commented May 14, 2024

Manual tests:

The run ift-mpt-7b-lrhex4-hsukuh fails with

[rank0]: RemoteTraceback:
[rank0]: """
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/lib/python3/dist-packages/multiprocess/pool.py", line 125, in
[rank0]: worker
[rank0]:     result = (True, func(*args, **kwds))
[rank0]:                     ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib/python3/dist-packages/datasets/utils/py_utils.py", line 678, in
[rank0]: _write_generator_to_queue
[rank0]:     for i, result in enumerate(func(**kwargs)):
[rank0]:   File "/usr/lib/python3/dist-packages/datasets/arrow_dataset.py", line 3517, in
[rank0]: _map_single
[rank0]:     example = apply_function_on_filtered_inputs(example, i, offset=offset)
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib/python3/dist-packages/datasets/arrow_dataset.py", line 3416, in
[rank0]: apply_function_on_filtered_inputs
[rank0]:     processed_inputs = function(*fn_args, *additional_args, **fn_kwargs)
[rank0]:                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llm-foundry/llmfoundry/data/finetuning/tasks.py", line 889, in
[rank0]: dataset_mapper
[rank0]:     return tokenize_formatted_example(example, tokenizer)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llm-foundry/llmfoundry/data/finetuning/tasks.py", line 408, in
[rank0]: tokenize_formatted_example
[rank0]:     example_format = _get_example_type(example)
[rank0]:                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/llm-foundry/llmfoundry/data/finetuning/tasks.py", line 150, in
[rank0]: _get_example_type
[rank0]:     raise UnknownExampleTypeError(str(example.keys()))
[rank0]: llmfoundry.utils.exceptions.UnknownExampleTypeError: "Found keys
[rank0]: KeysView({'prompt': 'hello, ', 'response': 'world!', 'random_extra_key': 'sup'})
[rank0]: in dataset. Unknown example type. For prompt and response finetuning, the valid
[rank0]: prompt keys are {'prompt'} and the valid response keys are {'completion',
[rank0]: 'response'}. For chat finetuning, the allowed keys are {'messages'}"
[rank0]: """

which is the error message we want.

Before this change, we had been getting this error instead:

Traceback (most recent call last):
  File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3/dist-packages/multiprocess/pool.py", line 579, in _handle_results
    task = get()
           ^^^^^
  File "/usr/lib/python3/dist-packages/multiprocess/connection.py", line 254, in recv
    return _ForkingPickler.loads(buf.getbuffer())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/dill/_dill.py", line 303, in loads
    return load(file, ignore, **kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/dill/_dill.py", line 289, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/dill/_dill.py", line 444, in load
    obj = StockUnpickler.load(self)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/llmfoundry/utils/exceptions.py", line 85, in __init__
    f'Found keys {example.keys()} in dataset. Unknown example type. For prompt and response '
                  ^^^^^^^^^^^^

This PR fixes that by checking whether example is a string before calling keys() on it.
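As a rough illustration of why the string check helps: the second traceback ends inside the exception's __init__ while dill is loading a result from a worker process. Exceptions are normally re-created on unpickling by calling the class with the saved args, so __init__ can receive the already-formatted message string instead of the original example. The sketch below reproduces that rebuild step without any multiprocessing. The class name UnknownExampleTypeSketch, the helper rebuild_like_pickle, and the shortened message are hypothetical stand-ins, not the actual llm-foundry code.

```python
from typing import Any, Mapping, Union


class UnknownExampleTypeSketch(Exception):
    """Hypothetical stand-in for the real exception class (an assumption)."""

    def __init__(self, example: Union[Mapping[str, Any], str]) -> None:
        if isinstance(example, str):
            # Rebuilt from self.args (for instance during unpickling): the
            # argument is already the formatted message and has no keys().
            message = example
        else:
            message = (
                f'Found keys {sorted(example.keys())} in dataset. '
                'Unknown example type.'
            )
        super().__init__(message)


def rebuild_like_pickle(err: Exception) -> Exception:
    """Re-create an exception the way pickle does: call cls(*args)."""
    cls, args = err.__reduce__()[:2]
    return cls(*args)


original = UnknownExampleTypeSketch(
    {'prompt': 'hello, ', 'response': 'world!', 'random_extra_key': 'sup'}
)
restored = rebuild_like_pickle(original)
print(restored)
```

Without the isinstance branch, the rebuild step would call keys() on a str and raise AttributeError, hiding the original message from the user.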

@milocress milocress changed the title from "Milo/unknown example type" to "Clearer error message for unknown example type" May 14, 2024
@milocress milocress requested review from KuuCi and dakinggg May 14, 2024 15:40
@milocress milocress merged commit 8274c6c into mosaicml:main May 14, 2024
9 checks passed
@milocress milocress deleted the milo/unknown-example-type branch May 14, 2024 18:26