Add gemma 2 #31659

Merged: 16 commits merged into main from add-gemma-2, Jun 27, 2024

Conversation

ArthurZucker (Collaborator)

What does this PR do?

Adds support for gemma2

@LysandreJik LysandreJik (Member) left a comment

Good! Only need to update to Gemma 2 in the integration tests.

Comment on lines +50 to +55
class Gemma2ModelTester(GemmaModelTester):
    config_class = Gemma2Config
    model_class = Gemma2Model
    for_causal_lm_class = Gemma2ForCausalLM
    for_sequence_class = Gemma2ForSequenceClassification
    for_token_class = Gemma2ForTokenClassification
Member

Ok!

@ArthurZucker ArthurZucker marked this pull request as ready for review June 27, 2024 15:18
@LysandreJik LysandreJik merged commit 0cf60f1 into main Jun 27, 2024
8 of 25 checks passed
@LysandreJik LysandreJik deleted the add-gemma-2 branch June 27, 2024 15:36
LysandreJik added a commit that referenced this pull request Jun 27, 2024
* initial commit
* Add doc
* protect?
* fixup stuffs
* update tests
* fix build documentation
* mmmmmmm config attributes
* style
* nit
* update
* nit
* Fix docs
* protect some stuff

Co-authored-by: Lysandre <lysandre@huggingface.co>

@turboderp (Contributor)

Please excuse this if I'm just not reading the code correctly, but I'm struggling to understand the intended behavior of the hybrid cache. Here in modeling_gemma2.py, as well as in the two other attention functions, the past keys/values are updated like so:

        if past_key_value is not None:
            # sin and cos are specific to RoPE models; cache_position needed for the static cache
            cache_kwargs = {
                "sin": sin,
                "cos": cos,
                "sliding_window": self.sliding_window,
                "cache_position": cache_position,
            }
            key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)

While the function definition for HybridCache.update looks like:

    def update(
        self,
        key_states: torch.Tensor,
        value_states: torch.Tensor,
        layer_idx: int,
        cache_kwargs: Optional[Dict[str, Any]] = None,
        sliding_window: Optional[int] = None,
    ) -> Tuple[torch.Tensor]:
        cache_position = cache_kwargs.get("cache_position")
        self.key_cache[layer_idx] = self.key_cache[layer_idx].to(device=key_states.device)
        self.value_cache[layer_idx] = self.value_cache[layer_idx].to(device=value_states.device)
        k_out = self.key_cache[layer_idx]
        v_out = self.value_cache[layer_idx]
        if sliding_window:
            update_fn = self._sliding_update
        else:
            update_fn = self._static_update
    ...

It doesn't read the sliding_window argument from the kwargs, so the default value (None) is always used and the _sliding_update function is never selected, even on layers that use a sliding window. Is this right?

@ArthurZucker (Collaborator, Author)

That is a good catch and not intended; we will update this and fix it. The issue is that we should either pass the sliding window directly, or get the sliding window from the cache kwargs. I think this stems from an attempt to make it compile-compatible, and a typo!
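
For illustration, a minimal sketch of the second option (reading sliding_window out of cache_kwargs inside HybridCache.update) could look like the following. This is only an assumption about the shape of the fix, not the change that was actually merged, and it is abbreviated the same way as the excerpt above:

    def update(
        self,
        key_states: torch.Tensor,
        value_states: torch.Tensor,
        layer_idx: int,
        cache_kwargs: Optional[Dict[str, Any]] = None,
    ) -> Tuple[torch.Tensor]:
        cache_kwargs = cache_kwargs or {}
        cache_position = cache_kwargs.get("cache_position")
        # Read the per-layer sliding window from the kwargs the attention
        # module already passes in, rather than from a separate (unused) argument.
        sliding_window = cache_kwargs.get("sliding_window")
        self.key_cache[layer_idx] = self.key_cache[layer_idx].to(device=key_states.device)
        self.value_cache[layer_idx] = self.value_cache[layer_idx].to(device=value_states.device)
        k_out = self.key_cache[layer_idx]
        v_out = self.value_cache[layer_idx]
        if sliding_window:
            update_fn = self._sliding_update
        else:
            update_fn = self._static_update
        # ... rest of the method unchanged

The first option would instead keep the signature unchanged and pass sliding_window=self.sliding_window explicitly at the call site in the attention modules.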

@ArthurZucker ArthurZucker mentioned this pull request Jul 26, 2024