Noted that this scaling is in a different place than the scaling described in "Attention is all you need".
When rereading the paper, I noticed this hidden in the Section 3.4 (Embeddings and Softmax) text:
"In the embedding layers, we multiply those weights by √dmodel."
tutorials/beginner_source/translation_transformer.py, line 135 (at commit 5e772fa):
src = self.embedding(src) * math.sqrt(self.d_model)
Shouldn't this be
src = self.embedding(src) / math.sqrt(self.d_model)
At least, that is the impression I got when reading the "Attention is all you need" paper.
Or is there some new research finding that multiplying is better?
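For context, here is a minimal sketch of what the multiplicative scaling looks like as a standalone embedding module. The `TokenEmbedding` name, the `emb_size` argument, and the usage lines are illustrative assumptions, not necessarily the tutorial's exact code:

```python
import math
import torch
from torch import nn


class TokenEmbedding(nn.Module):
    """Embedding layer that scales its output by sqrt(d_model).

    Sketch of the multiplicative scaling from Section 3.4 of
    "Attention is all you need": "In the embedding layers, we
    multiply those weights by sqrt(d_model)."
    """

    def __init__(self, vocab_size: int, emb_size: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.emb_size = emb_size

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Multiply (not divide) by sqrt(d_model), as quoted above.
        return self.embedding(tokens) * math.sqrt(self.emb_size)


# Hypothetical usage:
# emb = TokenEmbedding(vocab_size=10000, emb_size=512)
# src = emb(torch.tensor([[1, 2, 3]]))  # shape: (1, 3, 512)
```

Note that this is separate from the 1/√dk scaling inside scaled dot-product attention; the paper applies that division in the attention scores, not in the embedding layers.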
cc @sekyondaMeta @svekars @kit1980 @subramen @albanD