Embeddings and hidden states of Agro-NT model (New to this field so please excuse my question if it is really naive/stupid.) #70

Closed
NikeeShrestha opened this issue Jun 14, 2024 · 3 comments


@NikeeShrestha

How do the embedding layers saved from the inference notebook in the GitHub repo and the hidden states from the Hugging Face inference notebook differ from each other? When I compare these two outputs for a sequence, they are different. If I want to do a downstream classification task, which one would be best to work with? They both have the same dimensions.

GitHub inference model loading and usage:

parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name=model_name,
    embeddings_layers_to_save=(20,),
    attention_maps_to_save=((1, 4), (7, 18)),
    max_positions=26,
    # output_hidden_states=True,
    # If the progress bar gets stuck at the start of the model weights download,
    # you can set verbose=False to download without the progress bar.
    verbose=True,
)
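
(For context, a minimal sketch of how the embeddings saved by this loading code are typically pulled out afterwards, following the pattern in the repo README; `sequences` is an assumed list of DNA strings and the exact calls in the inference notebook may differ.)

import haiku as hk
import jax
import jax.numpy as jnp

# Transform the forward function and tokenize the input sequences.
forward_fn = hk.transform(forward_fn)
tokens_ids = [b[1] for b in tokenizer.batch_tokenize(sequences)]
tokens = jnp.asarray(tokens_ids, dtype=jnp.int32)

# Run inference; each requested layer comes back under a key like "embeddings_20".
outs = forward_fn.apply(parameters, jax.random.PRNGKey(0), tokens)
embeddings = outs["embeddings_20"]  # shape: (batch_size, max_positions, embed_dim)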

Hugging Face Inference:

outs = agro_nt_model(
    torch_batch_tokens,
    attention_mask=attention_mask,
    encoder_attention_mask=attention_mask,
    output_hidden_states=True,
)

@dallatt
Collaborator

dallatt commented Aug 19, 2024

Hello @NikeeShrestha

What is referred to as embeddings in the output of the model in this GitHub repo is strictly equivalent to the hidden_states in the output of the model on Hugging Face. Hugging Face returns the embeddings coming out of every transformer block, while here you specify the ones you want.

If you compare the last hidden state out of the HF model with the embedding from the 40th layer of the agro_nt model, you should get the same value!
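
For example, here is a minimal sketch of that comparison; `outs_jax` and `outs_hf` are assumed names for the outputs of the GitHub model (loaded with embeddings_layers_to_save=(40,)) and the Hugging Face model (called with output_hidden_states=True) on the same tokenized sequence:

import numpy as np

# Final-layer embedding from the JAX/Haiku model vs. last hidden state from Hugging Face.
jax_final = np.asarray(outs_jax["embeddings_40"])
hf_final = outs_hf.hidden_states[-1].detach().cpu().numpy()

print(np.allclose(jax_final, hf_final, atol=1e-4))  # expected: True, up to float precision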

Do not hesitate to reach out if you have any other questions :)

@dallatt closed this as completed Aug 19, 2024
@hongruhu

Hi, just wanted to ask a follow-up question:

I was wondering, when using the hidden_states from the Hugging Face models, would the embedding be the output from the last layer?

For example,
(1) for the 500M human model, the embedding should be output['hidden_states'][-1] (the 25th entry, counting the initial embedding layer as the first) with shape [batch_size, max_length, 1280];
(2) for the 2.5B multi-species model, it should be output['hidden_states'][-1] (the 33rd entry) with shape [batch_size, max_length, 2560].

If so, I was wondering why the README on the GitHub main page uses '20' for the 500M human model:

# Get pretrained model
parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name="500M_human_ref",
    embeddings_layers_to_save=(20,),
    max_positions=32,
)
forward_fn = hk.transform(forward_fn)

For the embedding, should we use the [max_length, 1280] 2D matrix as each sequence's embedding? Or should we average over the max_length dimension so that each sequence's embedding becomes a 1280-element vector?

@dallatt
Collaborator

dallatt commented Aug 22, 2024

Hello @hongruhu ,

The 20th layer is an arbitrary choice, as the embeddings of intermediate layers can also be interesting to use. Indeed, if you want the final embedding layer of the "500M human model", since there are 24 layers, you should use embeddings_layers_to_save=(24,).
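
For example, a minimal sketch following the README snippet above, just with the final layer requested instead of the 20th:

# Get pretrained model, saving the final (24th) transformer layer
parameters, forward_fn, tokenizer, config = get_pretrained_model(
    model_name="500M_human_ref",
    embeddings_layers_to_save=(24,),
    max_positions=32,
)
forward_fn = hk.transform(forward_fn)

# After the forward pass, the final-layer embeddings come back under the key
# "embeddings_24" with shape (batch_size, max_positions, 1280).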

As for the representation, a very common practice is to average the embeddings of the tokens across the sequence length dimension! You can find an example of this in the example notebook here.
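
A minimal sketch of that averaging (mean pooling), assuming `embeddings` has shape (batch, seq_len, dim) and `padding_mask` is 1 for real tokens and 0 for padding:

import jax.numpy as jnp

def mean_pool(embeddings, padding_mask):
    # Zero out padding positions, then average over the sequence length dimension.
    mask = padding_mask[:, :, None].astype(embeddings.dtype)  # (batch, seq_len, 1)
    summed = jnp.sum(embeddings * mask, axis=1)               # (batch, dim)
    counts = jnp.maximum(jnp.sum(mask, axis=1), 1.0)          # avoid division by zero
    return summed / counts                                    # (batch, dim), e.g. (batch, 1280)

Each sequence then ends up as a single 1280-element vector that can be fed to a downstream classifier.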

Best regards,
Hugo
