Error Running Demo #23

Closed
AlexKashi opened this issue Apr 11, 2022 · 19 comments

@AlexKashi

After following the installation instructions, I get the following error when running with CUDA 11.6 on an RTX 2080 Ti:

Traceback (most recent call last):
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 332, in <module>
    main()  # pylint: disable=no-value-for-parameter
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 317, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "/home/alex/Spring-2022/CV/DogeGAN/resources/stylegan_xl/train.py", line 104, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/train.py", line 49, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/training_loop.py", line 339, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/training/loss.py", line 121, in accumulate_gradients
    loss_Gmain.backward()
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/alex/miniconda3/envs/sgxl/lib/python3.9/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 144, in backward
    grad_weight = Conv2dGradWeight.apply(grad_output, input)
  File "/home/alex/Spring-2022/CV/GAN/resources/stylegan_xl/torch_utils/ops/conv2d_gradfix.py", line 173, in forward
    return torch._C._jit_get_operation(name)(weight_shape, grad_output, input, padding, stride, dilation, groups, *flags)
RuntimeError: No such operator aten::cudnn_convolution_transpose_backward_weight
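
The last frame of the traceback shows `torch._C._jit_get_operation(name)` failing to find the operator by name. One possibility consistent with the reports in this thread is that the installed PyTorch build does not register this operator, but the thread does not establish the cause. The following is a minimal sketch for checking that directly. `operator_available` and its `lookup` parameter are hypothetical names introduced here, and `torch._C._jit_get_operation` is a private call whose behavior may differ between releases.

```python
def operator_available(name, lookup=None):
    """Return True if the operator lookup succeeds for `name`."""
    if lookup is None:
        # Same private call that conv2d_gradfix.py makes in the traceback.
        import torch
        lookup = torch._C._jit_get_operation
    try:
        result = lookup(name)
    except RuntimeError:
        # This is the "No such operator ..." case from the traceback.
        return False
    # Assumption: some releases may return None (or a tuple whose first
    # element is None) instead of raising, so treat that as missing too.
    if result is None:
        return False
    if isinstance(result, tuple) and result and result[0] is None:
        return False
    return True


# Usage (requires PyTorch to be installed):
#   operator_available("aten::cudnn_convolution_transpose_backward_weight")
```

If this returns `False` in the training environment, clearing caches or changing the CUDA toolkit alone is unlikely to help, and a different PyTorch version is the more promising thing to try.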

AlexKashi changed the title from "Error Running" to "Error Running Demo" on Apr 11, 2022
@xl-sr
Contributor

xl-sr commented Apr 14, 2022

This seems like an issue with your combination of CUDA toolkit and PyTorch. You could try downgrading to CUDA 11.3.

@AlexKashi
Author

I downgraded to CUDA 11.3:

nvidia-smi
Driver Version: 465.31 CUDA Version: 11.3
nvcc -V
release 11.3, V11.3.58

I still get the error when running:

python train.py --outdir=./training-runs/pokemon --cfg=stylegan3-t --data=./data/pokemon64.zip --gpus=1 --batch=1 --mirror=1 --snap 10 --batch-gpu 1 --kimg 10000 --syn_layers 10

@xl-sr
Contributor

xl-sr commented Apr 16, 2022

Do you get the same error when running the StyleGAN3 repo?

@chae-won-kim

Try removing the $HOME/.cache/torch_extensions folder and running the training code again.

I encountered the same error when my PyTorch version was 1.8. I updated my PyTorch version from 1.8 to 1.10, then tried running the training code, but still had the same issue. Once I removed $HOME/.cache/torch_extensions and re-ran the code, I was able to solve the error.
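
The cache removal described above can be sketched in Python as follows. This assumes the cache lives at `$HOME/.cache/torch_extensions` as stated in the comment, unless the `TORCH_EXTENSIONS_DIR` environment variable points elsewhere. The function names are hypothetical and only illustrate the step.

```python
import os
import shutil
from pathlib import Path


def torch_extensions_dir():
    """Best-guess location of the compiled-extension cache."""
    # Assumption: TORCH_EXTENSIONS_DIR, when set, overrides the default
    # location; otherwise use the path mentioned in this thread.
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "torch_extensions"


def clear_extension_cache(cache_dir=None):
    """Delete the cache directory; return True if something was removed."""
    cache_dir = Path(cache_dir) if cache_dir is not None else torch_extensions_dir()
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    print("removed:", clear_extension_cache())
```

The directory is expected to reappear on the next training run, because the custom ops are compiled again. The point of deleting it is to force that rebuild against the currently installed PyTorch rather than reusing binaries built for an older version.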

@ghost

ghost commented Apr 19, 2022

Hi there, when will the large model be released?

@xl-sr
Contributor

xl-sr commented Apr 19, 2022

@chae-won-kim Thanks for the tip, I will add it to the README :)

@19Ply3 I just added the ImageNet512 and FFHQ512 models. The megapixel models will be added soon; see the README.

@xl-sr xl-sr closed this as completed Apr 19, 2022
@Youzebin

Hello, when I removed $HOME/.cache/torch_extensions and re-ran the code, I still had the same issue. Do you know why?

@Youzebin

When I re-ran the code, $HOME/.cache/torch_extensions appeared again, so I don't think this solves the problem.

@Youzebin

> Hello, when I removed $HOME/.cache/torch_extensions and re-ran the code, I still had the same issue. Do you know why?

Same here.

@woctezuma

I think you need to downgrade CUDA as well.

@Youzebin

> I think you need to downgrade CUDA as well.

I downgraded CUDA to 11.3 and re-ran the code, but the problem persists.

@Youzebin

> I think you need to downgrade CUDA as well.

Can you train the model now? If so, did you run into the same problem as I did?

@chae-won-kim

chae-won-kim commented Apr 20, 2022

@Youzebin
It would be really helpful if you shared your environment settings :/
Also, try referring to the stylegan3 troubleshooting page to set up your environment.
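
For sharing environment settings, a small script along these lines can collect the values asked about in this thread in one go. It is only a sketch: the function names are hypothetical, and it assumes `nvcc`, `gcc`, and `nvidia-smi` are on `PATH` when installed. Tools or packages that are missing are reported as `None`.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def tool_version(cmd):
    """Return a command's output, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    text = (out.stdout or out.stderr).strip()
    return text or None


def package_version(name):
    """Return the installed version of a Python package, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the settings that were requested in this thread."""
    return {
        "python": platform.python_version(),
        "torch": package_version("torch"),
        "nvcc": tool_version(["nvcc", "-V"]),
        "gcc": tool_version(["gcc", "--version"]),
        "nvidia-smi": tool_version(["nvidia-smi"]),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the same conda environment used for training matters, since the PyTorch version is read from the active environment.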

@Youzebin

nvidia-smi
Driver Version: 470.82.00 CUDA Version: 11.4
nvcc -V
release 11.3, V11.3.58
PyTorch 1.11.0 (cu113)
Python 3.9

Thank you.
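
A note on reading these values: the CUDA version printed by `nvidia-smi` is the highest version the driver supports, not the installed toolkit, so 11.4 there does not conflict with an 11.3 toolkit. The two values that would need to agree for compiling the custom ops are the `nvcc` release and the CUDA tag of the PyTorch build. A minimal sketch of that comparison, using hypothetical helper names and assuming the string formats shown in this thread:

```python
import re


def nvcc_release(nvcc_output):
    """Extract 'major.minor' from text such as 'release 11.3, V11.3.58'."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def torch_cuda_tag(tag):
    """Turn a build tag such as 'cu113' into '11.3'."""
    # Assumption: the last digit is the minor version, as in cu102 or cu113.
    match = re.fullmatch(r"cu(\d+)(\d)", tag.strip())
    return f"{match.group(1)}.{match.group(2)}" if match else None


def toolkit_matches_torch(nvcc_output, tag):
    """True if the nvcc release equals the CUDA version in the build tag."""
    release = nvcc_release(nvcc_output)
    return release is not None and release == torch_cuda_tag(tag)


if __name__ == "__main__":
    print(toolkit_matches_torch("release 11.3, V11.3.58", "cu113"))
```

The settings posted here pass this check, which suggests that a toolkit mismatch is probably not what causes the error in this particular environment.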

@chae-won-kim

chae-won-kim commented Apr 20, 2022

> nvidia-smi
> Driver Version: 470.82.00 CUDA Version: 11.4
> nvcc -V
> release 11.3, V11.3.58
> PyTorch 1.11.0 (cu113)
> Python 3.9

What about your gcc version?

@Youzebin

9.3.0

@ghost

ghost commented Apr 20, 2022

Hi xl-sr, does the StyleGAN-XL large model allow you to use your own pictures?

@Youzebin

> Hi xl-sr, does the StyleGAN-XL large model allow you to use your own pictures?

Yes, it is possible to use your own pictures. If your dataset is conditional, you need to add some functions to dataset_tool.py. If it is unconditional, you can prepare the data as described in the README: the Pokemon dataset there is a folder of pictures, and I assume your own pictures are in a folder as well.
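
For the conditional case, one way to attach labels without modifying the tool might be a label file next to the images. The sketch below is not taken from this repository. It assumes that dataset_tool.py accepts an optional `dataset.json` containing a `"labels"` list of `[relative_path, class_index]` pairs, which I believe is the convention of the StyleGAN3 dataset tool. Check the script before relying on this. It also assumes a hypothetical folder layout of one subfolder per class.

```python
import json
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_labels(root):
    """Map each image under root/<class_name>/ to an integer class index."""
    root = Path(root)
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    labels = []
    for index, class_dir in enumerate(class_dirs):
        for image in sorted(class_dir.rglob("*")):
            if image.is_file() and image.suffix.lower() in IMAGE_SUFFIXES:
                labels.append([image.relative_to(root).as_posix(), index])
    return labels


def write_dataset_json(root):
    """Write dataset.json (assumed label format) into the image folder."""
    path = Path(root) / "dataset.json"
    path.write_text(json.dumps({"labels": build_labels(root)}, indent=2))
    return path
```

Class indices follow the alphabetical order of the subfolder names, so the mapping stays stable as long as the folder names do not change.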

@Youzebin

Also, this issue is closed. If you have other questions, please open a new issue.
