Describe the issue:
I'm performing (offline) Bayesian Optimization using a PyMC Marginal GP. Maximizing the (expected improvement) acquisition function requires making repeated predictions of the GP mean and variance at various points in the data domain, but I've noticed this gets slower and slower as it proceeds. For instance, the first 50 calls take ~30 seconds, while the first ~800 calls take ~10 minutes. I then re-fit a new, separate GP and maximize again: the next 800 calls take 19 minutes. Re-fit, re-maximize, 800 calls: 26 minutes. There are only 5 data points at the moment and two X dimensions (one y dimension), so it shouldn't be a data-scaling issue. Restarting the Python session doesn't fix the issue, but restarting the computer does.
I suspect this is a pytensor cache issue. At one point I got the warning WARNING (pytensor.link.c.cmodule): Deleting (Broken cache directory [EOF]): /home/john/.pytensor/compiledir_Linux-6.5--gcp-x86_64-with-glibc2.35-x86_64-3.11.4-64/tmpe71kqgp0. Curious, I looked at that ~/.pytensor/compiledir_... directory mid-run, and the ~1200 sub-directories comprise 380 MB and growing. There also seems to be an accumulation of RAM overhead, though that's harder to quantify (on the order of 20-30 GB of RAM in use while the VM is idle after a week or two of this kind of workflow). Is pytensor scanning this cache with every call to predict? Any ideas how to avoid this, or stop it from slowing things down?
Reproducible code example:
# Sorry, I know this isn't an actual reproducible example, I'll work on putting one together.
import pymc as pm
from tqdm.auto import trange

def define_GP_model(...):
    ...
    gp = pm.gp.Marginal(...)
    return model, gp

def fit_GP(model):
    ...
    with model:
        MAP = pm.find_MAP()

for _ in trange(1000):
    gp.predict(
        [16x2 array], point=MAP, diag=True, pred_noise=True
    )

for _ in trange(1000):
    gp.predict(
        [16x2 array], point=MAP, diag=True, pred_noise=True
    )

for _ in trange(1000):
    gp.predict(
        [16x2 array], point=MAP, diag=True, pred_noise=True
    )
Error message:
No response
PyMC version information:
pymc: 5.8.2
pymc-base: 5.12.0
pytensor: 2.16.3
pytensor-base: 2.19.0
Linux OS, pymc installed via mamba/conda
Context for the issue:
No response