The influence of t (temperature) in the E5 Model paper #1588

Open
daegonYu opened this issue Jun 27, 2024 · 4 comments

@daegonYu

Describe
Model I am using (UniLM, MiniLM, LayoutLM ...): E5

Hello. I am a student studying sentence similarity.

“Paper: Text Embeddings by Weakly-Supervised Contrastive Pre-training”
While reading this paper, a question arose about the temperature, which is set to t = 0.01. In the SimCSE paper the temperature is set to 0.05 for the sentence similarity task (STS), and other papers use 0.02, but this paper uses 0.01. Could you tell us what effects can be achieved by lowering the temperature?

@intfloat
Contributor

Hi @daegonYu ,

This is a hyperparameter for tuning. Empirically, we observe that a lower temperature leads to better performance but may cause training instability under float16 precision for large models. A lower temperature allows the logits to vary in a wider range and thus has more flexibility.
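
For readers who want to see where t enters, here is a minimal sketch of an InfoNCE-style contrastive loss with in-batch negatives (an illustration under assumed tensor shapes, not the actual E5 training code): the cosine similarities are divided by the temperature before the softmax, so a smaller t stretches the logits over a wider range.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                  temperature: float = 0.01) -> torch.Tensor:
    """InfoNCE with in-batch negatives; a sketch, not the E5 training code.

    query_emb, passage_emb: (batch_size, dim) tensors where row i of
    passage_emb is the positive passage for row i of query_emb.
    """
    # L2-normalize so the dot products below are cosine similarities in [-1, 1].
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)

    # Cosine-similarity matrix scaled by 1 / temperature.
    # With t = 0.01 the logits span roughly [-100, 100];
    # with t = 0.05 (SimCSE's choice) roughly [-20, 20].
    logits = q @ p.T / temperature

    # Row i's positive sits on the diagonal; the rest of the row
    # are in-batch negatives.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```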

@daegonYu
Author

daegonYu commented Jul 1, 2024

“A lower temperature allows the logits to vary in a wider range and thus has more flexibility.” I interpret this as meaning that the embeddings can more easily learn diverse representations. But the model card at https://huggingface.co/intfloat/multilingual-e5-base says:

3. Why does the cosine similarity scores distribute around 0.7 to 1.0?

This is a known and expected behavior as we use a low temperature 0.01 for InfoNCE contrastive loss.

For text embedding tasks like text retrieval or semantic similarity, what matters is the relative order of the scores instead of the absolute values, so this should not be an issue.

If the embeddings can be expressed over a wider range, I would expect the cosine similarities to be distributed over a wide range as well, yet they are concentrated between 0.7 and 1.0. This seems contradictory, which makes it difficult for me to understand. Simply put, I wonder why lowering the temperature allows learning a wider range of logits.

@intfloat
Contributor

intfloat commented Jul 1, 2024

The logits are calculated as cosine_similarity / t. Therefore, the logits will fall in [-100, 100] with t = 0.01 and [-50, 50] with t = 0.02, etc.

However, this does not mean the learned cosine similarity will be in a wider range. On the contrary, the cosine similarity tends to concentrate as the temperature becomes lower.
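
A quick numeric check of that scaling (an illustration with made-up similarity values, not numbers from the paper): dividing the same similarities by t = 0.01 instead of t = 0.05 stretches the gaps five-fold, so the softmax already saturates even when the raw cosine similarities sit in a narrow band like 0.7 to 1.0.

```python
import torch
import torch.nn.functional as F

# Cosine similarities for one query: the positive first, then two in-batch
# negatives. The values are made up but sit inside the narrow 0.7-1.0 band
# mentioned in the model card.
cos_sims = torch.tensor([0.95, 0.80, 0.75])

for t in (0.05, 0.01):
    probs = F.softmax(cos_sims / t, dim=0)
    spread = (cos_sims.max() - cos_sims.min()).item() / t
    print(f"t = {t}: logit spread = {spread:.0f}, P(positive) = {probs[0]:.4f}")

# Approximate output:
# t = 0.05: logit spread = 4, P(positive) = 0.9362
# t = 0.01: logit spread = 20, P(positive) = 1.0000
```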

@daegonYu
Author

daegonYu commented Jul 4, 2024

All right, I understand what you said, but why does the cosine similarity tend to concentrate as the temperature becomes lower? Can you explain why this happens?
