Embeddings and hidden states of Agro-NT model (New to this field so please excuse my question if it is really naive/stupid.) #70

Closed
NikeeShrestha opened this issue Jun 14, 2024 · 3 comments


@NikeeShrestha

How do the embedding layers saved from the inference notebook in the GitHub repo and the hidden states from the Hugging Face inference notebook differ from each other? When I compare these two outputs for a sequence, they are different. If I want to do a downstream classification task, which one would be best to work with? They both have the same dimensions.

GitHub inference model loading and usage:

parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name=model_name,
    embeddings_layers_to_save=(20,),
    attention_maps_to_save=((1, 4), (7, 18)),
    max_positions=26,
    # output_hidden_states=True,
    # If the progress bar gets stuck at the start of the model weights download,
    # you can set verbose=False to download without the progress bar.
    verbose=True,
)
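
(For context, a minimal sketch of how the embeddings saved by this loading code are typically pulled out afterwards, following the pattern in the repo README; `sequences` is an assumed list of DNA strings and the exact calls in the inference notebook may differ.)

import haiku as hk
import jax
import jax.numpy as jnp

# Transform the forward function and tokenize the input sequences.
forward_fn = hk.transform(forward_fn)
tokens_ids = [b[1] for b in tokenizer.batch_tokenize(sequences)]
tokens = jnp.asarray(tokens_ids, dtype=jnp.int32)

# Run inference; each requested layer comes back under a key like "embeddings_20".
outs = forward_fn.apply(parameters, jax.random.PRNGKey(0), tokens)
embeddings = outs["embeddings_20"]  # shape: (batch_size, max_positions, embed_dim)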

Hugging Face Inference:

outs = agro_nt_model(
    torch_batch_tokens,
    attention_mask=attention_mask,
    encoder_attention_mask=attention_mask,
    output_hidden_states=True,
)

@dallatt
Collaborator

dallatt commented Aug 19, 2024

Hello @NikeeShrestha

What is referred to as embeddings in the output of the model in this GitHub repo is strictly equivalent to the hidden_states in the output of the model on Hugging Face. Hugging Face returns the embeddings coming out of every transformer block, while here you specify the ones you want.

If you compare the last hidden state out of the HF model with the embedding from the 40th layer of the agro_nt model, you should get the same value!
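
For example, here is a minimal sketch of that comparison; `outs_jax` and `outs_hf` are assumed names for the outputs of the GitHub model (loaded with embeddings_layers_to_save=(40,)) and the Hugging Face model (called with output_hidden_states=True) on the same tokenized sequence:

import numpy as np

# Final-layer embedding from the JAX/Haiku model vs. last hidden state from Hugging Face.
jax_final = np.asarray(outs_jax["embeddings_40"])
hf_final = outs_hf.hidden_states[-1].detach().cpu().numpy()

print(np.allclose(jax_final, hf_final, atol=1e-4))  # expected: True, up to float precision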

Do not hesitate to reach out if you have any other questions :)

@dallatt closed this as completed Aug 19, 2024
@hongruhu

Hi, just wanted to ask a follow-up question:

I was wondering, when using the hidden_states from the Hugging Face models, would the embedding be the output from the last layer?

For example,
(1) for the 500M human model, the embedding should be output['hidden_states'][-1] (the 25th entry, counting the initial embedding layer as the first) with shape [batch_size, max_length, 1280];
(2) for the 2.5B multi-species model, it should be output['hidden_states'][-1] (the 33rd entry) with shape [batch_size, max_length, 2560].

If so, I was wondering why the README on the GitHub main page uses '20' for the 500M human model:

# Get pretrained model
parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name="500M_human_ref",
    embeddings_layers_to_save=(20,),
    max_positions=32,
)
forward_fn = hk.transform(forward_fn)

For the embedding, should we use the [max_length, 1280] 2D matrix as each sequence's embedding? Or should we average over the max_length dimension so that each sequence's embedding becomes a 1280-element vector?

@dallatt
Collaborator

dallatt commented Aug 22, 2024

Hello @hongruhu ,

The 20th layer is an arbitrary choice, as the embeddings of intermediate layers can also be interesting to use. Indeed, if you want the final embedding layer of the "500M human model", since there are 24 layers, you should use embeddings_layers_to_save=(24,).
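
For example, a minimal sketch following the README snippet above, just with the final layer requested instead of the 20th:

# Get pretrained model, saving the final (24th) transformer layer
parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name="500M_human_ref",
    embeddings_layers_to_save=(24,),
    max_positions=32,
)
forward_fn = hk.transform(forward_fn)

# After the forward pass, the final-layer embeddings come back under the key
# "embeddings_24" with shape (batch_size, max_positions, 1280).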

As for the representation, a very common practice is to average the embeddings of the tokens across the sequence length dimension! You can find an example of this in the example notebook here.
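
A minimal sketch of that averaging (mean pooling), assuming `embeddings` has shape (batch, seq_len, dim) and `padding_mask` is 1 for real tokens and 0 for padding:

import jax.numpy as jnp

def mean_pool(embeddings, padding_mask):
    # Zero out padding positions, then average over the sequence length dimension.
    mask = padding_mask[:, :, None].astype(embeddings.dtype)  # (batch, seq_len, 1)
    summed = jnp.sum(embeddings * mask, axis=1)               # (batch, dim)
    counts = jnp.maximum(jnp.sum(mask, axis=1), 1.0)          # avoid division by zero
    return summed / counts                                    # (batch, dim), e.g. (batch, 1280)

Each sequence then ends up as a single 1280-element vector that can be fed to a downstream classifier.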

Best regards,
Hugo
