Clear cache upon version upgrade #566

lapp0 · 2024-01-22T19:09:52Z

Fixes #561

Smoke test:

[nix-shell:~/p/outlines]$ git branch --show-current
clear-cache-upon-version-upgrade


[nix-shell:~/p/outlines]$ python3 -c "from outlines.fsm.fsm import RegexFSM; from outlines.models.transformers import TransformerTokenizer; RegexFSM('a', TransformerTokenizer('gpt2'))" && echo "success"
success


[nix-shell:~/p/outlines]$ git checkout benl/fsm-enhancements
[nix-shell:~/p/outlines]$ pip install . -q
[nix-shell:~/p/outlines]$ python3 -c "from outlines.fsm.fsm import RegexFSM; from outlines.models.transformers import TransformerTokenizer; RegexFSM('a', TransformerTokenizer('gpt2'))" && echo "success"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/andrew/p/outlines/outlines/fsm/fsm.py", line 120, in __init__
    self.states_to_token_maps, self.empty_token_ids = create_states_mapping(
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)


[nix-shell:~/p/outlines]$ git merge clear-cache-upon-version-upgrade
Merge made by the 'ort' strategy.

[nix-shell:~/p/outlines]$ pip install . -q
[nix-shell:~/p/outlines]$ python3 -c "from outlines.fsm.fsm import RegexFSM; from outlines.models.transformers import TransformerTokenizer; RegexFSM('a', TransformerTokenizer('gpt2'))" && echo "success"
success
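
The `ValueError` in the smoke test is consistent with a value cached on disk by one version of the code being unpacked by another version that expects a different number of elements. The following is a minimal sketch of the idea in this PR's title, not the actual outlines implementation: store the package version next to the cached entries and discard them when the version differs. The class `VersionedCache`, the `cache.json` layout, and the version strings are all hypothetical and used only for illustration.

```python
import json
import tempfile
from pathlib import Path


class VersionedCache:
    """Tiny JSON-file cache (hypothetical) that is wiped when the version changes."""

    def __init__(self, directory, version):
        self.path = Path(directory) / "cache.json"
        self.version = version
        self.data = {}
        if self.path.exists():
            stored = json.loads(self.path.read_text())
            # Keep old entries only if they were written by the same version.
            if stored.get("version") == version:
                self.data = stored.get("entries", {})
        # Always rewrite the file so the stored version stamp is current.
        self._save()

    def _save(self):
        payload = {"version": self.version, "entries": self.data}
        self.path.write_text(json.dumps(payload))

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        self._save()


with tempfile.TemporaryDirectory() as tmp:
    # An "old" release caches a 2-element result.
    old = VersionedCache(tmp, version="0.0.1")
    old.set("regex:a", [1, 2])

    # Re-opening with the same version keeps the entry.
    same = VersionedCache(tmp, version="0.0.1")
    hit_same_version = same.get("regex:a")

    # Re-opening with a different version starts from an empty cache,
    # so the stale 2-element value is never unpacked by newer code.
    new = VersionedCache(tmp, version="0.0.2")
    hit_after_upgrade = new.get("regex:a")
```

With this layout a stale entry is recomputed after an upgrade instead of being returned in the old shape.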

lapp0 · 2024-01-22T19:51:29Z

The ~~pytest-mocker~~ pytest-mock dependency was missing from pyproject.toml. I expect pytest to pass now.

rlouf · 2024-01-22T19:52:57Z

Thanks! Would you mind rebasing your branch on main to get rid of the merge commit?

lapp0 · 2024-01-22T21:11:51Z

The errors are related to transformers. I suspect they appeared because I changed pyproject.toml, which resulted in a few packages being upgraded:

tests/generate/test_integration_transfomers.py:20: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
outlines/models/transformers.py:220: in transformers
    model = AutoModelForCausalLM.from_pretrained(model_name, **model_kwargs)
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:566: in from_pretrained
    return model_class.from_pretrained(
/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/transformers/modeling_utils.py:3527: in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

checkpoint_file = '/home/runner/.cache/huggingface/hub/models--hf-internal-testing--tiny-random-GPTJForCausalLM/snapshots/1fff390baa45cb187903ebdd269c975bb9ed7386/pytorch_model.bin'

    def load_state_dict(checkpoint_file: Union[str, os.PathLike]):
        """
        Reads a PyTorch checkpoint file, returning properly formatted errors if they arise.
        """
        if checkpoint_file.endswith(".safetensors") and is_safetensors_available():
            # Check format of the archive
            with safe_open(checkpoint_file, framework="pt") as f:
                metadata = f.metadata()
            if metadata.get("format") not in ["pt", "tf", "flax"]:
                raise OSError(
                    f"The safetensors archive passed at {checkpoint_file} does not contain the valid metadata. Make sure "
                    "you save your model with the `save_pretrained` method."
                )
            return safe_load_file(checkpoint_file)
        try:
            if (
                is_deepspeed_zero3_enabled() and torch.distributed.is_initialized() and torch.distributed.get_rank() > 0
            ) or (is_fsdp_enabled() and not is_local_dist_rank_0()):
                map_location = "meta"
            else:
                map_location = "cpu"
            extra_args = {}
            # mmap can only be used with files serialized with zipfile-based format.
            if (
                isinstance(checkpoint_file, str)
                and map_location != "meta"
                and version.parse(torch.__version__) >= version.parse("2.1.0")
                and is_zipfile(checkpoint_file)
            ):
                extra_args = {"mmap": True}
            return torch.load(
                checkpoint_file,
                map_location=map_location,
                weights_only=is_torch_greater_or_equal_than_1_13,
                **extra_args,
            )
        except Exception as e:
            try:
                with open(checkpoint_file) as f:
                    if f.read(7) == "version":
                        raise OSError(
                            "You seem to have cloned a repository without having git-lfs installed. Please install "
                            "git-lfs and run `git lfs install` followed by `git lfs pull` in the folder "
                            "you cloned."
                        )
                    else:
                        raise ValueError(
                            f"Unable to locate the file {checkpoint_file} which is necessary to load this pretrained "
                            "model. Make sure you have saved the model properly."
                        ) from e
            except (UnicodeDecodeError, ValueError):
>               raise OSError(
                    f"Unable to load weights from pytorch checkpoint file for '{checkpoint_file}' "
                    f"at '{checkpoint_file}'. "
                    "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True."
                )
E               OSError: Unable to load weights from pytorch checkpoint file for '/home/runner/.cache/huggingface/hub/models--hf-internal-testing--tiny-random-GPTJForCausalLM/snapshots/1fff390baa45cb187903ebdd269c975bb9ed7386/pytorch_model.bin' at '/home/runner/.cache/huggingface/hub/models--hf-internal-testing--tiny-random-GPTJForCausalLM/snapshots/1fff390baa45cb187903ebdd269c975bb9ed7386/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

Investigating...

lapp0 · 2024-01-22T21:27:38Z

Works for me on transformers==4.36.2, but errors on 4.37.0 (released 11 hours ago).

I'll pin transformers to 4.36.2 in this PR as well.
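
A minimal sketch of how one might confirm locally that the installed release matches the pin before re-running the tests. The helper names `parse_version`, `matches_pin`, and `installed_version` are hypothetical, and the parser assumes purely numeric dotted versions such as `4.36.2`.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '4.36.2' into (4, 36, 2); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def matches_pin(installed, pinned="4.36.2"):
    """Return True when the installed version equals the pinned one."""
    return parse_version(installed) == parse_version(pinned)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
else:
    print(f"transformers {found}; matches pin: {matches_pin(found)}")
```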

rlouf · 2024-01-22T21:46:29Z

I get the same error locally. We should open an issue for that.

lapp0 · 2024-01-22T21:53:22Z

Done
#568

rlouf · 2024-01-23T06:04:16Z

Thank you so much for fixing this!

lapp0 force-pushed the clear-cache-upon-version-upgrade branch from 1295221 to 4ae0c8f January 22, 2024 19:54

rlouf changed the title ~~clear cache upon version upgrade~~ Clear cache upon version upgrade Jan 22, 2024

Andrew Lapp added 6 commits January 22, 2024 16:06

clear cache upon version upgrade

ee9578a

add explanatory comment

ec3126e

ensure we reset cache at start of test

52e923f

add pytest-mocker to dependencies

712c6ad

use correct pytest-mock name...

f4270d6

pin transformers to avoid test errors

4302f7e

lapp0 force-pushed the clear-cache-upon-version-upgrade branch from fb3f7e5 to 4302f7e January 22, 2024 22:06

rlouf merged commit 8a0bafc into outlines-dev:main Jan 23, 2024
5 checks passed

lapp0 mentioned this pull request Jan 25, 2024

Error in outlines.generate.choice: create_states_mapping throws ValueError: not enough values to unpack (expected 3, got 2) #585

Closed
