
[Layers] Add qwenRope support for Qwen1.0 in CB mode #449

Merged (1 commit) on Jun 17, 2024

Conversation

@abenmao (Contributor) commented Jun 14, 2024

No description provided.


#pragma omp parallel for collapse(2)
for (int head = 0; head < heads; ++head) {
for (int seq = 0; seq < totSeqLen; ++seq) {
Contributor

As a next step, considering the first-token phase, should we swap the two loops so that each thread accesses contiguous memory? It may be worth testing that implementation.
OK for the current version.

Contributor Author

Got it~

@changqi1 (Contributor) commented Jun 17, 2024

I think these kernel APIs for the continuous batching version and the non-continuous batching version are the same. As a next step, we could merge the two into one kernel API.
OK for the current version.

@abenmao (Contributor, Author) commented Jun 17, 2024

> I think these kernel APIs for the continuous batching version and the non-continuous batching version are the same. As a next step, we could merge the two into one kernel API. OK for the current version.

Yes, maybe we can remove the older version in the next step.

@abenmao abenmao merged commit 8bd8d68 into intel:main Jun 17, 2024
1 check passed