Issues: microsoft/DeepSpeed
Issues list
[BUG] 1-bit LAMB not compatible with bf16
bug
Something isn't working
training
#5708
opened Jun 28, 2024 by
catid
[BUG] Fine-tuned model outputs are empty.
bug
Something isn't working
training
#5706
opened Jun 28, 2024 by
IYIAscension
on Activation Checkpointing
bug
Something isn't working
training
#5704
opened Jun 28, 2024 by
ChaunceyWang
[BUG] Mixed-precision: fp16 will cast input_ids into torch.cuda.HalfTensor instead of Long or Int.
#5701
opened Jun 28, 2024 by
zhaoyang02
Tensor (hidden states) missing across GPU in Pipeline Parallelism Training [BUG]
bug
Something isn't working
training
#5696
opened Jun 25, 2024 by
Youngluc
[BUG] Regression: 0.14.3 causes grad_norm to be zero
bug
Something isn't working
training
#5692
opened Jun 21, 2024 by
rosario-purple
[ERROR] [launch.py:321:sigkill_handler] exits with return code = -11
bug
Something isn't working
training
#5690
opened Jun 21, 2024 by
shag1802
Running out of CPU memory. Dataset is loaded for each created process
bug
Something isn't working
training
#5689
opened Jun 21, 2024 by
MikeMitsios
[BUG] inference ValueError
bug
Something isn't working
inference
#5685
opened Jun 19, 2024 by
zxrneu
[BUG] Logs full of FutureWarning when training with nightly PyTorch
bug
Something isn't working
training
#5682
opened Jun 18, 2024 by
rosario-purple
[BUG] Using and Building DeepSpeedCPUAdam
bug
Something isn't working
training
#5677
opened Jun 18, 2024 by
oabuhamdan
[BUG] 'Invalidate trace cache' with Seq2SeqTrainer+predict_with_generate+Zero3
bug
Something isn't working
inference
#5662
opened Jun 14, 2024 by
Osterlohe
does DeepSpeed support AMSP (a new DP shard strategy)
enhancement
New feature or request
#5661
opened Jun 14, 2024 by
guoyejun
Fail to use zero_init to construct llama2 with deepspeed zero3 and bnb!
#5660
opened Jun 14, 2024 by
CHNRyan
RuntimeError: Error building extension 'cpu_adam', because /usr/bin/ld: can not find -lcurand,help!
#5659
opened Jun 14, 2024 by
hekaijie123
[BUG] Running llama2-7b step3 with tensor parallel and HE fails due to incompatible shapes
bug
Something isn't working
deepspeed-chat
Related to DeepSpeed-Chat
#5656
opened Jun 13, 2024 by
ShellyNR
[BUG] oneapi/ccl.hpp: No such file or directory.
bug
Something isn't working
training
#5653
opened Jun 12, 2024 by
weiji14
RuntimeError: still have inflight params[BUG]
bug
Something isn't working
training
#5648
opened Jun 12, 2024 by
iszengxin
Inference with the MoE based GPT model trained by ds_pretrain_gpt_345M_MoE128.sh [BUG]
bug
Something isn't working
inference
#5647
opened Jun 12, 2024 by
haoranlll
[BUG] File not found in autotuner cache in multi-node setting on SLURM
bug
Something isn't working
training
#5646
opened Jun 12, 2024 by
jubueche
Why doesn't deepspeed stage 3 allow a batch size of 1 with multiple GPUs?
bug
Something isn't working
training
#5645
opened Jun 12, 2024 by
AceMcAwesome77
[BUG] RuntimeError encountered when generating tokens from a Meta-Llama-3-8B-Instruct model initialized with 4-bit or 8-bit quantization
bug
Something isn't working
compression
#5644
opened Jun 11, 2024 by
Atry