
Auto move input to proper device for inference #1412

Closed · tcwalther opened this issue Apr 8, 2020 · 7 comments · Fixed by #1905

tcwalther commented Apr 8, 2020

Does PyTorch Lightning provide abstractions for inference? In particular, does it provide ways of automatically handling the transfer to/from GPU when I call model(x), or do I need to roll my own code for that?

Example Use Case

I have a use case where I train a model on slices of a sliding window of an audio spectrogram (i.e., let's say 1 second chunks). When training is finished, I'd like to see the performance of the model on an entire file. Pseudocode:

# generate training data
X, Y = [], []
for audio_file in audio_files:
    for x, y in sliding_window(audio_file):
        X.append(x); Y.append(y)
X, Y = shuffle(X, Y)  # shuffle the slices of all files

# Train model on slices
model = ExampleModel(X, Y)
trainer = Trainer(gpus=1)
trainer.fit(model)

# Plot the performance on a whole test file:
test_Y = []
for x, _ in sliding_window(test_file):
    test_Y.append(model(x))
plt.plot(test_Y)

Notice that during training, the notion of a file is entirely gone, but when I plot my test file, I reintroduce that. Of course, in my real code, my training data X, Y is split into training, validation and test, as usual. The plotting step is an additional verification; sort of like putting the pieces together.

Problem

When the model runs on the GPU, the last part of the code becomes:

# Plot the performance on a whole test file:
model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
test_Y = []
for x, _ in sliding_window(test_file):
    y = model(x.to(device)).cpu()
    test_Y.append(y)
plt.plot(test_Y)

This isn't the end of the world, but it's not as nice as the other code that PyTorch Lightning helped me refactor. I also can't call x.type_as(...), since in that loop I have no reference tensor on the CPU/GPU to refer to (or maybe I can, but I haven't figured it out).
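
For what it's worth, one reference that does live on the right device is the model's own parameters; a minimal sketch of that idea (it assumes the model has at least one parameter):

# The model's own weights can serve as the device reference
# (assumes the model has at least one parameter).
param = next(model.parameters())
y = model(x.to(param.device)).cpu()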

A workaround for this is to save the model and load it again on the CPU.

# Train model on slices
# ...
trainer.fit(model)
trainer.save_checkpoint("model.ckpt")
model = ExampleModel.load_from_checkpoint("model.ckpt")

# Plot the performance on a whole test file:
model.eval()
test_Y = []
for x, _ in sliding_window(test_file):
    test_Y.append(model(x))
plt.plot(test_Y)

While this removes the noise of the .to(device) and .cpu() calls, it adds the overhead of having to save the model every time. I also still have to manually call model.eval(). The use case of running my model on an entire audio file is not for metrics but for visual inspection; as such, I only ever sample a few audio files. Running the model on a CPU instead of a GPU for inference thus isn't a problem.
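
For completeness, a lighter-weight variant of that workaround (plain PyTorch, reusing the sliding_window helper from the pseudocode above) would be to move the trained model to the CPU in place instead of round-tripping through a checkpoint:

# Move the trained weights to the CPU in place; no checkpoint needed.
model = model.cpu().eval()
with torch.no_grad():
    test_Y = [model(x) for x, _ in sliding_window(test_file)]

This avoids the checkpoint file, but still requires the manual .eval() call.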

Question

Is there a more elegant way to achieve the above?

@tcwalther tcwalther added the question Further information is requested label Apr 8, 2020
Borda (Member) commented Apr 8, 2020

It may be a nice feature to have DDP/TPU support at inference time as well...
Any thoughts, @PyTorchLightning/core-contributors @williamFalcon?

@Borda Borda added feature Is an improvement or enhancement discussion In a discussion stage help wanted Open to be worked on labels Apr 8, 2020
williamFalcon (Contributor) commented:

We should definitely automate this!
Let's do it?

@williamFalcon williamFalcon added the let's do it! approved to implement label Apr 9, 2020
@Borda Borda self-assigned this Apr 9, 2020
Borda (Member) commented Apr 9, 2020

@tcwalther please let me shed some light on this... what is the goal: to always use the best device you have available (GPU/TPU)?

Thinking about inference, the case I see could be to have it as a parameter... because in my case (using a notebook) I have a GPU, but most "production" models won't fit on it, so it would crash.
Also, you may want to keep the output on the GPU; moving it back to the CPU during inference and then back to the GPU does not make sense.
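
One way to read the "parameter" idea is sketched below. This is purely hypothetical: run_inference, device, and keep_on_device are illustrative names, not PyTorch Lightning API.

import torch

def run_inference(model, x, device="cuda", keep_on_device=False):
    # Hypothetical helper: the caller picks the target device and
    # whether the output stays there or comes back to the CPU.
    model = model.to(device).eval()
    with torch.no_grad():
        y = model(x.to(device))
    return y if keep_on_device else y.cpu()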

williamFalcon (Contributor) commented Apr 9, 2020

Whenever a LightningModule is used as

model(x)

we put x on the proper device...

@tcwalther this is what you mean, no?
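
A minimal sketch of what such automatic moving could look like (the move_to_device helper below is illustrative only, not the implementation that landed in #1905):

import torch

def move_to_device(batch, device):
    # Recursively move tensors in (possibly nested) inputs to the target device.
    if isinstance(batch, torch.Tensor):
        return batch.to(device)
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_device(b, device) for b in batch)
    if isinstance(batch, dict):
        return {k: move_to_device(v, device) for k, v in batch.items()}
    return batch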

@williamFalcon williamFalcon added this to the 0.7.3 milestone Apr 9, 2020
Borda (Member) commented Apr 9, 2020

we put x on the proper device...

Does it also mean that we shall estimate whether the model fits on the available device?
Just thinking about the case of a notebook with a low-end GPU: the general request will report that a GPU is available, but that does not reflect that the model does not fit on it...

@tcwalther ^^
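
A rough sketch of such a fit check in plain PyTorch (an assumption-heavy estimate: it counts only parameter and buffer memory, ignores activations, and torch.cuda.mem_get_info may not exist in older PyTorch versions):

import torch

def model_fits_on_gpu(model, device=0):
    # Compare the parameter/buffer footprint against free GPU memory.
    # Activation memory is ignored, so this is only a lower-bound check.
    needed = sum(t.numel() * t.element_size()
                 for t in list(model.parameters()) + list(model.buffers()))
    free, _total = torch.cuda.mem_get_info(device)
    return needed < free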

@williamFalcon williamFalcon changed the title Does pytorch-lightning provide helpers for inference, particularly regarding CPU/GPU transfer of data? Auto move input to proper device for inference Apr 10, 2020
@Borda Borda modified the milestones: 0.7.4, 0.7.5 Apr 24, 2020
@Borda Borda modified the milestones: 0.7.6, 0.8.0, 0.7.7 May 12, 2020
@Borda Borda modified the milestones: 0.7.7, 0.8.0 May 26, 2020
@Borda Borda modified the milestones: 0.8.0, 0.9.0 Jun 9, 2020
Borda (Member) commented Jun 11, 2020

@tcwalther @JonathanSchmidt1 it is close to #1467, right?

JonathanSchmidt1 commented:
@tcwalther @JonathanSchmidt1 it is close to #1467, right?

Yes, I assume fixing this would also have fixed my problem, although a recent update of PyTorch Lightning already fixed it and I could remove my workaround code.

@Borda Borda modified the milestones: 0.9.0, 0.8.0 Jun 18, 2020
@Borda Borda removed the question Further information is requested label Dec 23, 2020