{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":539057023,"defaultBranch":"main","name":"TransformerEngine","ownerLogin":"NVIDIA","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2022-09-20T15:20:26.000Z","ownerAvatar":"https://github.com/avatars/u/1728152?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1725904185.0","currentOid":""},"activityList":{"items":[{"before":"9acad13c0b76e4e7a18a94a4a87f6200f3362162","after":"0c63affc51303ff9e1de0654739dc0c045d7274d","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-09T19:22:18.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://github.com/avatars/u/4879686?s=80&v=4"},"commit":{"message":"Merge branch 'main' into te_llama_tutorial_enhancement","shortMessageHtmlLink":"Merge branch 'main' into te_llama_tutorial_enhancement"}},{"before":"047a50722780e7b647f9107783e210021190edc3","after":"2a9845e1d93440d3c0f65427985e66208d09eff8","ref":"refs/heads/main","pushedAt":"2024-09-09T18:34:45.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://github.com/avatars/u/8398980?s=80&v=4"},"commit":{"message":"Added Adobe analytics to the documentation (#1162)\n\nSigned-off-by: Przemyslaw Tredak ","shortMessageHtmlLink":"Added Adobe analytics to the documentation (#1162)"}},{"before":"0ec5eed716fac735b15c660671fa1fb45bd4a455","after":"9acad13c0b76e4e7a18a94a4a87f6200f3362162","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-09T17:54:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pre-commit-ci[bot]","name":null,"path":"/apps/pre-commit-ci","primaryAvatarUrl":"https://github.com/avatars/in/68672?s=80&v=4"},"commit":{"message":"[pre-commit.ci] auto fixes from pre-commit.com hooks\n\nfor more information, see https://pre-commit.ci","shortMessageHtmlLink":"[pre-commit.ci] auto fixes from pre-commit.com hooks"}},{"before":"a67805d337b3114ada85d52998edc7db1a671cb0","after":"0ec5eed716fac735b15c660671fa1fb45bd4a455","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-09T17:54:18.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://github.com/avatars/u/4879686?s=80&v=4"},"commit":{"message":"Merge branch 'main' into te_llama_tutorial_enhancement","shortMessageHtmlLink":"Merge branch 'main' into te_llama_tutorial_enhancement"}},{"before":null,"after":"a67805d337b3114ada85d52998edc7db1a671cb0","ref":"refs/heads/te_llama_tutorial_enhancement","pushedAt":"2024-09-09T17:49:45.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"sudhakarsingh27","name":"Sudhakar Singh","path":"/sudhakarsingh27","primaryAvatarUrl":"https://github.com/avatars/u/4879686?s=80&v=4"},"commit":{"message":"allow tutorial to download the model weights automatically\n\nSigned-off-by: Sudhakar Singh ","shortMessageHtmlLink":"allow tutorial to download the model weights automatically"}},{"before":"bdea56fc023014eaf52a171047b641b3bfdded70","after":"047a50722780e7b647f9107783e210021190edc3","ref":"refs/heads/main","pushedAt":"2024-09-09T14:30:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"[PyTorch] Propagate fp8 scale-inverse modification to `GroupedLinear` (#1128)\n\n* propagate scale_inv modification to GroupedLinear\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* optimization for separate scale_inv of weights and single output\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* let grouped gemm support different input combinations\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* fix type\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* add contiguous check\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* use len() instead of isinstance\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* fix ut\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n---------\r\n\r\nSigned-off-by: Xin Yao \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"[PyTorch] Propagate fp8 scale-inverse modification to GroupedLinear ("}},{"before":"206c1d9220ed70bd5d4959194934e9cb6740e0fd","after":"bdea56fc023014eaf52a171047b641b3bfdded70","ref":"refs/heads/main","pushedAt":"2024-09-05T18:40:00.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Revert \"[C] Suppress 128-D warning from cudnn-frontend\" (#1161)\n\nRevert \"[C] Suppress 128-D warning from cudnn-frontend (#1158)\"\r\n\r\nThis reverts commit 206c1d9220ed70bd5d4959194934e9cb6740e0fd.\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Revert \"[C] Suppress 128-D warning from cudnn-frontend\" (#1161)"}},{"before":"e192e616a97dee458eba499814da2f930529214b","after":"3b9db334ba2c8ef1d063d8d859c6cb5b7414bd5a","ref":"refs/heads/revert-1158-suppress_compile_warning","pushedAt":"2024-09-05T18:32:40.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Revert \"[C] Suppress 128-D warning from cudnn-frontend (#1158)\"\n\nThis reverts commit 206c1d9220ed70bd5d4959194934e9cb6740e0fd.\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Revert \"[C] Suppress 128-D warning from cudnn-frontend (#1158)\""}},{"before":null,"after":"e192e616a97dee458eba499814da2f930529214b","ref":"refs/heads/revert-1158-suppress_compile_warning","pushedAt":"2024-09-05T18:30:14.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Revert \"[C] Suppress 128-D warning from cudnn-frontend (#1158)\"\n\nThis reverts commit 206c1d9220ed70bd5d4959194934e9cb6740e0fd.","shortMessageHtmlLink":"Revert \"[C] Suppress 128-D warning from cudnn-frontend (#1158)\""}},{"before":"215db88dcd294ef4ddbbb635cd51956a35fc1e4f","after":"206c1d9220ed70bd5d4959194934e9cb6740e0fd","ref":"refs/heads/main","pushedAt":"2024-09-05T18:21:33.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"[C] Suppress 128-D warning from cudnn-frontend (#1158)\n\nsuppress 128D warning from cudnn-frontend\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>","shortMessageHtmlLink":"[C] Suppress 128-D warning from cudnn-frontend (#1158)"}},{"before":"454e389502ad4ed4f90b0990a631fe12bdf968fd","after":"215db88dcd294ef4ddbbb635cd51956a35fc1e4f","ref":"refs/heads/main","pushedAt":"2024-09-05T18:03:24.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"phu0ngng","name":"Phuong Nguyen","path":"/phu0ngng","primaryAvatarUrl":"https://github.com/avatars/u/36155692?s=80&v=4"},"commit":{"message":"[PyTorch] Implement Fp8 padding and unpadding module (#1129)\n\n* [TE/PyTorch][MoE] Add FP8 padding and unpadding module \r\n\r\n 1. Add multi-tensor padding kernel for FP8 with padding size = 16.\r\n 2. Add FP8Padding and Fp8Unpadding module\r\n 3. Add Padded GroupedLinear unit tests\r\n\r\n---------\r\n\r\nSigned-off-by: beinggod \r\nCo-authored-by: Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>","shortMessageHtmlLink":"[PyTorch] Implement Fp8 padding and unpadding module (#1129)"}},{"before":"5fafeb0efef60d6f10574bb4366cdc5a8db7192d","after":"454e389502ad4ed4f90b0990a631fe12bdf968fd","ref":"refs/heads/main","pushedAt":"2024-09-05T17:54:08.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Added offloading support FP8 attention (#1131)\n\n* Added offloading support FP8 attention\r\n\r\nSigned-off-by: Selvaraj Anandaraj \r\n\r\n* Update transformer_engine/pytorch/attention.py\r\n\r\nCo-authored-by: Kirthi Shankar Sivamani \r\nSigned-off-by: Selvaraj Anandaraj \r\n\r\n* Fix\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n---------\r\n\r\nSigned-off-by: Selvaraj Anandaraj \r\nSigned-off-by: Selvaraj Anandaraj \r\nSigned-off-by: Kirthi Shankar Sivamani \r\nCo-authored-by: Selvaraj Anandaraj \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Added offloading support FP8 attention (#1131)"}},{"before":"247850e8bd9f13ccce729fd8204066193d27a167","after":"5fafeb0efef60d6f10574bb4366cdc5a8db7192d","ref":"refs/heads/main","pushedAt":"2024-09-05T05:57:30.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"yaox12","name":"Xin Yao","path":"/yaox12","primaryAvatarUrl":"https://github.com/avatars/u/3831900?s=80&v=4"},"commit":{"message":"[PyTorch] FP8 MHA with RoPE and Miscellaneous Improvements (#1100)\n\n* fp8 mha with rope\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* avoid index select in cast ops\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* avoid index select in fused_attn_fwd\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* rename is_first_module_in_mha to fp8_output\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* resolve comments\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* resolve comments\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* move transpose to backward for fp8 input\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* fix ut\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* resolve comments\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* update argument list for CP\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* fix for FA3\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* remove unnecessary copy of scale_inv\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* skip fp8 dpa/mha tests when fa3 is not available\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n* fix a merge bug\r\n\r\nSigned-off-by: Xin Yao \r\n\r\n---------\r\n\r\nSigned-off-by: Xin Yao \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"[PyTorch] FP8 MHA with RoPE and Miscellaneous Improvements (#1100)"}},{"before":"af9f2fae2206816575ee6f0cdd4ed310a8e086f0","after":"247850e8bd9f13ccce729fd8204066193d27a167","ref":"refs/heads/main","pushedAt":"2024-09-04T05:04:51.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"yaox12","name":"Xin Yao","path":"/yaox12","primaryAvatarUrl":"https://github.com/avatars/u/3831900?s=80&v=4"},"commit":{"message":"Add user to TE CI (#1155)\n\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Add user to TE CI (#1155)"}},{"before":"232101816e52fc63ca9b0885ef19db2c37770b5a","after":null,"ref":"refs/heads/dependabot/github_actions/dot-github/workflows/actions/download-artifact-4.1.7","pushedAt":"2024-09-04T00:05:26.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://github.com/avatars/in/29110?s=80&v=4"}},{"before":"ddc5774d522133e27392b515fc77650f635a7b11","after":"af9f2fae2206816575ee6f0cdd4ed310a8e086f0","ref":"refs/heads/main","pushedAt":"2024-09-04T00:05:19.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://github.com/avatars/u/8398980?s=80&v=4"},"commit":{"message":"Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows (#1154)\n\nBumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3 to 4.1.7.\r\n- [Release notes](https://github.com/actions/download-artifact/releases)\r\n- [Commits](https://github.com/actions/download-artifact/compare/v3...v4.1.7)\r\n\r\n---\r\nupdated-dependencies:\r\n- dependency-name: actions/download-artifact\r\n dependency-type: direct:production\r\n...\r\n\r\nSigned-off-by: dependabot[bot] \r\nCo-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows (#…"}},{"before":null,"after":"232101816e52fc63ca9b0885ef19db2c37770b5a","ref":"refs/heads/dependabot/github_actions/dot-github/workflows/actions/download-artifact-4.1.7","pushedAt":"2024-09-03T22:12:11.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"dependabot[bot]","name":null,"path":"/apps/dependabot","primaryAvatarUrl":"https://github.com/avatars/in/29110?s=80&v=4"},"commit":{"message":"Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows\n\nBumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3 to 4.1.7.\n- [Release notes](https://github.com/actions/download-artifact/releases)\n- [Commits](https://github.com/actions/download-artifact/compare/v3...v4.1.7)\n\n---\nupdated-dependencies:\n- dependency-name: actions/download-artifact\n dependency-type: direct:production\n...\n\nSigned-off-by: dependabot[bot] ","shortMessageHtmlLink":"Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows"}},{"before":"669b8164b4cb4591ed01f8ba45b4aeebc090b334","after":"a7e9d3e7d9015f9233c5e768263c8f7b9c26953e","ref":"refs/heads/release_v1.10","pushedAt":"2024-09-03T21:12:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Improvements for building wheels (#1148)\n\n* Improvements for wheels\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* fix\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* Fixes for wheel build\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* Move package finder to common\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* format\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* Fixes\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* Lint\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* fix\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* FIx\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* Fix CI and distributed test\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* fix\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n* fix paddle ci\n\nSigned-off-by: Kirthi Shankar Sivamani \n\n---------\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Improvements for building wheels (#1148)"}},{"before":"93f00a79933ed2260ee2442b602dc72d42019eed","after":"ddc5774d522133e27392b515fc77650f635a7b11","ref":"refs/heads/main","pushedAt":"2024-09-03T16:25:29.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"[PyTorch] Add contiguous check for `te_grouped_gemm` (#1146)\n\n[PyTorch] Add contiguous check for grouped gemm\r\n\r\nSigned-off-by: beinggod \r\nCo-authored-by: beinggod \r\nCo-authored-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"[PyTorch] Add contiguous check for te_grouped_gemm (#1146)"}},{"before":"9437ceb2b7947857c979d5a7a2ed60cd4e667a88","after":"93f00a79933ed2260ee2442b602dc72d42019eed","ref":"refs/heads/main","pushedAt":"2024-09-03T16:24:52.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Improvements for building wheels (#1148)\n\n* Improvements for wheels\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* fix\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Fixes for wheel build\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Move package finder to common\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* format\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Fixes\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Lint\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* fix\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* FIx\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* Fix CI and distributed test\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* fix\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n* fix paddle ci\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani \r\n\r\n---------\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Improvements for building wheels (#1148)"}},{"before":"61f8415f502e9f6bb2b0b58eb27d28921735acf3","after":"669b8164b4cb4591ed01f8ba45b4aeebc090b334","ref":"refs/heads/release_v1.10","pushedAt":"2024-09-01T15:14:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://github.com/avatars/u/8398980?s=80&v=4"},"commit":{"message":"Update cudnn-frontend to v1.6.1 (#1108)\n\n* update FE to 1.6\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* update to 1.6.1-rc for testing\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* update to fe 1.6.1\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n---------\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Update cudnn-frontend to v1.6.1 (#1108)"}},{"before":"932878643c050664ead6a2ba9b4a616a8e2bae8e","after":"61f8415f502e9f6bb2b0b58eb27d28921735acf3","ref":"refs/heads/release_v1.10","pushedAt":"2024-08-31T00:18:37.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"ptrendx","name":"Przemyslaw Tredak","path":"/ptrendx","primaryAvatarUrl":"https://github.com/avatars/u/8398980?s=80&v=4"},"commit":{"message":"Fix QKV dtype in the bwd of FP8+CP (#1134)\n\n* fix qkv_dtype of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* config cp correction dtype of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* code style change\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* always do FP8 CP correction in FP32\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n---------\r\n\r\nSigned-off-by: Xiaowei Ren \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>\r\nCo-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>","shortMessageHtmlLink":"Fix QKV dtype in the bwd of FP8+CP (#1134)"}},{"before":"aecd5a8fae4cbb73c5fa53dc607c9c83ac3626d3","after":"9437ceb2b7947857c979d5a7a2ed60cd4e667a88","ref":"refs/heads/main","pushedAt":"2024-08-30T05:44:16.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cyanguwa","name":"Charlene Yang","path":"/cyanguwa","primaryAvatarUrl":"https://github.com/avatars/u/8636796?s=80&v=4"},"commit":{"message":"Fix QKV dtype in the bwd of FP8+CP (#1134)\n\n* fix qkv_dtype of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* config cp correction dtype of FP8+CP\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* code style change\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n* always do FP8 CP correction in FP32\r\n\r\nSigned-off-by: Xiaowei Ren \r\n\r\n---------\r\n\r\nSigned-off-by: Xiaowei Ren \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>\r\nCo-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>","shortMessageHtmlLink":"Fix QKV dtype in the bwd of FP8+CP (#1134)"}},{"before":"8ddac3df41c0304ac9efe0c1e9b23c93326979a8","after":"aecd5a8fae4cbb73c5fa53dc607c9c83ac3626d3","ref":"refs/heads/main","pushedAt":"2024-08-30T05:39:59.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cyanguwa","name":"Charlene Yang","path":"/cyanguwa","primaryAvatarUrl":"https://github.com/avatars/u/8636796?s=80&v=4"},"commit":{"message":"[PyTorch] Fix FP8 logic related to FA2/FA3 (#1141)\n\n* fix FP8 logic when FA3 is not installed\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* minor tweak to make logic more explicit\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* minor fixes\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* limit FA3 warning to Hopper and NVTE_FLASH_ATTN=1\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* prefer fused attn for FP8\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n---------\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>","shortMessageHtmlLink":"[PyTorch] Fix FP8 logic related to FA2/FA3 (#1141)"}},{"before":null,"after":"3844009d62940e0844f7779a717a4e5c52777113","ref":"refs/heads/proxy-tensor","pushedAt":"2024-08-30T01:07:42.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"timmoon10","name":"Tim Moon","path":"/timmoon10","primaryAvatarUrl":"https://github.com/avatars/u/4406448?s=80&v=4"},"commit":{"message":"Fix linter warnings\n\nSigned-off-by: Tim Moon ","shortMessageHtmlLink":"Fix linter warnings"}},{"before":"4ddb0a7bea787294282d0fe0715adf5ea4a39779","after":"8ddac3df41c0304ac9efe0c1e9b23c93326979a8","ref":"refs/heads/main","pushedAt":"2024-08-29T22:06:53.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"phu0ngng","name":"Phuong Nguyen","path":"/phu0ngng","primaryAvatarUrl":"https://github.com/avatars/u/36155692?s=80&v=4"},"commit":{"message":"[PyTorch] Remove `dtype` from args of permutation (#1145)\n\n* remove dtype from args\r\n* update docs with permutation ops\r\n\r\n---------\r\n\r\nSigned-off-by: Xin Yao ","shortMessageHtmlLink":"[PyTorch] Remove dtype from args of permutation (#1145)"}},{"before":"7fc50f489b8184fbd93efd4e48140ad0264e362b","after":"4ddb0a7bea787294282d0fe0715adf5ea4a39779","ref":"refs/heads/main","pushedAt":"2024-08-27T13:50:06.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Hide non-necessary symbols from shared object (#1136)\n\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Hide non-necessary symbols from shared object (#1136)"}},{"before":"4ec66c77752f716188eeb20059d72917946ea6b0","after":"7fc50f489b8184fbd93efd4e48140ad0264e362b","ref":"refs/heads/main","pushedAt":"2024-08-24T04:13:43.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"ksivaman","name":"Kirthi Shankar Sivamani","path":"/ksivaman","primaryAvatarUrl":"https://github.com/avatars/u/36168853?s=80&v=4"},"commit":{"message":"Bump cudnn-frontend version to 1.6.1 (#1133)\n\nbump cudnn-frontend version\r\n\r\nSigned-off-by: Kirthi Shankar Sivamani ","shortMessageHtmlLink":"Bump cudnn-frontend version to 1.6.1 (#1133)"}},{"before":"901e5d2b335878aa11f81dab5ddb12fbfad4322a","after":"4ec66c77752f716188eeb20059d72917946ea6b0","ref":"refs/heads/main","pushedAt":"2024-08-24T00:01:58.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"timmoon10","name":"Tim Moon","path":"/timmoon10","primaryAvatarUrl":"https://github.com/avatars/u/4406448?s=80&v=4"},"commit":{"message":"Let user limit number of architectures, to improve build time (#1126)\n\n* Limit number of architectures build\r\n\r\nSigned-off-by: Lukasz Pierscieniewski \r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Lukasz Pierscieniewski \r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>\r\nCo-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>","shortMessageHtmlLink":"Let user limit number of architectures, to improve build time (#1126)"}},{"before":"2215fa5c7557b66034068816020f9f611019e457","after":"901e5d2b335878aa11f81dab5ddb12fbfad4322a","ref":"refs/heads/main","pushedAt":"2024-08-23T22:00:41.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"cyanguwa","name":"Charlene Yang","path":"/cyanguwa","primaryAvatarUrl":"https://github.com/avatars/u/8636796?s=80&v=4"},"commit":{"message":"Add support for flash-attn 3 (#1019)\n\n* WIP: add fa3\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* WIP: clean up\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* WIP: add benchmarks\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* differentiate func/varlen_func\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* WIP: add FP8 fwd support\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* add FA3 FP8 fwd code and test\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* fix assert for FA3\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* fix FA3 FP8 logic and add tests\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* update FA2 to <=2.6.3\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* tweak unit tests for base/mask\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* fix lint\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* fix lint\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n* fix lint\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* set constraints for FA3 for sm90 and causal_bottom_right\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* revert debug changes in benchmark script\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\n\r\n* [pre-commit.ci] auto fixes from pre-commit.com hooks\r\n\r\nfor more information, see https://pre-commit.ci\r\n\r\n---------\r\n\r\nSigned-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>\r\nCo-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>","shortMessageHtmlLink":"Add support for flash-attn 3 (#1019)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEsUbrGgA","startCursor":null,"endCursor":null}},"title":"Activity · NVIDIA/TransformerEngine"}