some little problems with the tutorial DINCAE_tutorial.ipynb #9
Comments
Forcing using Pkg did not resolve the issue. Packages:
[6e4b80f9] BenchmarkTools v1.3.1 |
I previously had problems reading the NetCDF file from the OPeNDAP URL, while it worked on a local file. Does opening a local file work for you? |
Yep, opening a local file is OK (at least no crash) |
I added now these instructions:
Do you have an error message? For instance when running this line in the REPL. I cannot reproduce it on my system:
I rarely use jupyter notebook anymore because troubleshooting is sometimes quite challenging (e.g. crash without error message). |
If run in REPL, it kills the Julia session (no time to see the error message) |
What happens if you first open a Windows Cmd window, then type Does e.g. this URL work? NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1") Maybe it is related to |
Same crash with NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1") (and I already took out the #fillmismatch to be sure).
Running julia.exe from a command window does the trick and provides the following error message:
NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1")
Assertion failed: ocpanic(("state->auth.curlflags.cookiejar != NULL")), file ocinternal.c, line 566
signal (22): SIGABRT |
Windows seems to have its own curl.exe (in C:/Windows/System32, which is of course in the PATH). |
This fails also in NCDatasets' CI: (so it is not specific to a machine or to a julia version) |
Here is the line (in current NetCDF version): Maybe creating a temporary directory failed even after insisting: |
Hard-coded forward slash? |
Looks suspicious indeed. |
And for the record, "https://thredds.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/modis/terra/11um/4km/aggregate__MODIS_TERRA_L3_SST_THERMAL_DAILY_4KM_DAYTIME_V2019.0.ncml#fillmismatch" fails with the same error? |
julia> NCDataset("https://thredds.jpl.nasa.gov/thredds/dodsC/ncml_aggregation/OceanTemperature/modis/terra/11um/4km/aggregate__MODIS_TERRA_L3_SST_THERMAL_DAILY_4KM_DAYTIME_V2019.0.ncml#fillmismatch")
Assertion failed: ocpanic(("state->auth.curlflags.cookiejar != NULL")), file ocinternal.c, line 566
signal (22): SIGABRT |
I guess it is better to report it at https://github.com/Unidata/netcdf-c/issues |
Solution for this problem: |
The process now advances to the next problem:
ArgumentError: unable to check bounds for indices of type Missing
Stacktrace: 1 @info "number of missing obs
In Jupyter, the notebook continues with |
Suggestion: since people will try on CPU (as I do), add a progress indicator in the outer loop on epochs of DINCAE.reconstruct (and maybe flush stdout to make sure one can see progress). |
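A minimal sketch of what such a progress indicator could look like, assuming a plain loop over epochs. The names `train_one_epoch` and `run_epochs` are hypothetical placeholders and not part of DINCAE; only `println` and `flush` are standard Julia calls.

```julia
# Hypothetical stand-in for one epoch of training; returns a fake loss value.
train_one_epoch(epoch) = 1 / epoch

# Hypothetical outer loop: print one progress line per epoch and flush it,
# so the line is visible immediately even if the output is buffered.
function run_epochs(epochs; io = stdout)
    losses = Float64[]
    for epoch in 1:epochs
        loss = train_one_epoch(epoch)
        push!(losses, loss)
        println(io, "epoch: ", epoch, " loss: ", loss)
        flush(io)   # write the line out now rather than when the buffer fills
    end
    return losses
end

losses = run_epochs(3)
```

On a CPU, where a single epoch may take a long time, printing inside the epoch loop (or per batch) would show sooner whether anything is progressing at all.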
Strange, I don't see this issue on Julia 1.7.2:
sst_t = copy(sst);
sst_t[(qual .> 3) .& .!ismissing.(qual)] .= missing;
sst_t[.!ismissing.(sst) .& (sst_t .> 40)] .= missing;
@show size(sst_t)
# size(sst_t) = (149, 106, 7149)
Maybe a bug in Julia 1.7.1 fixed in 1.7.2? I have this:
Maybe you have missing?
However, running on the CPU will be really, really slow. Do you see "epoch: ??? loss: ???" when running the script from the REPL? Maybe Jupyter is buffering the output? Or maybe it is not even finishing the first epoch in a reasonable time? |
Indeed, |
It's definitely buffering: when I stop the kernel under Jupyter, one sees how "far" it has gone : I would force For the missing thing, I have false & missing being false, and the .jl file is still running under the REPL. Edit: the REPL run finished and the missing problem is missing ;-). So the .jl file works under REPL 1.7.1. I will try to run it again under Jupyter |
OK, I added flush in the main branch. I am wondering approximately how long it took for 11 epochs on CPU. Did Julia use multiple threads? The missing things get quite strange:
works for me in Julia 1.7.2 (in the REPL and in a Jupyter notebook). But this works in the REPL:
but fails in jupyter notebook:
with the same julia version. But this issue disappears when opening a new notebook. |
d = [missing, 1]; (d .> 3) .& .!ismissing.(d) works in both the REPL and the notebook on 1.7.1, whether loading no packages or all packages of the tutorial. |
Now the tutorial .jl file loaded into a notebook also works! |
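For reference, a small sketch of how `missing` behaves in this kind of mask, using only Base Julia. The vector `vals` is made-up illustration data, not the tutorial's SST array.

```julia
vals = [missing, 1, 5]   # made-up data for illustration

# Three-valued logic: `false & missing` is false, `true & missing` is missing.
a = false & missing
b = true & missing

# Combining the comparison with `.!ismissing.` yields a pure Bool mask,
# because every missing comparison is and-ed with `false`.
mask = (vals .> 3) .& .!ismissing.(vals)

# An alternative with the same effect: replace missing comparisons by false.
mask_alt = coalesce.(vals .> 3, false)

selected = vals[mask]

# Indexing with the raw comparison (which still contains `missing`) fails.
raw = vals .> 3
err = try
    vals[raw]
    nothing
catch e
    e
end
```

Here `err` holds the exception raised by indexing with a mask that still contains `missing`, which is the kind of error reported above.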
🤦 OK, I must have accidentally overwritten the
|
While waiting to see if the ipynb now works, |
OK it's good to have an idea of how long it takes, I tried yesterday for several hours without having a single epoch completed. Meanwhile with GPU I have memory problems. |
OK, that is very slow (but expected on CPU). I just tested on a GPU (NVIDIA GeForce RTX 3080), for example, and it takes 3 minutes for 11 epochs. To reduce GPU memory: dragon2 and hercules2 have (a few) nodes with GPUs. |
Now the notebook works fine. I do not know what happened with the missing thing; maybe I did not restart the Kernel and something inconsistent remained. |
Yesterday, I tried to recompile NetCDF 4.8.1 on Windows. I planned to debug with gdb to see why we have these crashes on Windows. But I could not get msys2 gdb installed (msys2/MINGW-packages#6196 (comment)). In any case, with NetCDF 4.8.1 this OPeNDAP issue is resolved, but we have the more serious issue of crashes when creating NetCDF4 files on Windows. I can document the .dodsrc work-around at NCDatasets and we link in the tutorial to this issue. |
I'm going crazy: I restarted the kernel and now I again get Stacktrace: |
In any case, maybe it would be better to replace |
ok done here: |
When reducing to 10 epochs to run it on a CPU, the calculations are done but plotting does not succeed:
NetCDF error: Variable 'lon' not found in file ~/Data/SST-AlboranSea-example\Results\data-avg.nc (NetCDF error code: -49)
Stacktrace:
Looking at the file content:
fnameavg = "~/Data/SST-AlboranSea-example\Results\data-avg.nc"
Group: /
Dimensions
Variables
This is probably related to save_epochs = 200:10:epochs, which I forgot to change as well when reducing epochs to 10... So maybe add a warning in the code when no epochs are requested to be saved (to help identify the problem)? |
Yes, By the way, Julia has this nice feature that the default value of a parameter (here save_epochs) can depend on the actual value of other parameters (here epochs). Python might have this in the future: https://peps.python.org/pep-0671/. |
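A minimal sketch of both points, assuming a made-up function `train_sketch` (hypothetical, not DINCAE's actual API): a keyword default that depends on an earlier keyword, and a warning when the resulting range of epochs to save is empty.

```julia
# Hypothetical stand-in for a training routine; only the keyword handling matters.
# The default of `save_epochs` is evaluated using the actual value of `epochs`.
function train_sketch(; epochs = 1000, save_epochs = 200:10:epochs)
    if isempty(save_epochs)
        # With epochs = 10, the default 200:10:10 is an empty range.
        @warn "save_epochs is empty; no epoch will be saved" epochs save_epochs
    end
    saved = Int[]
    for epoch in 1:epochs
        if epoch in save_epochs
            push!(saved, epoch)   # placeholder for writing this epoch's result
        end
    end
    return saved
end

saved_default = train_sketch()             # save_epochs defaults to 200:10:1000
saved_short = train_sketch(epochs = 10)    # empty range, so the warning is emitted
```

With such a check, reducing `epochs` below the first saved epoch would be flagged at the start of the run instead of surfacing later as a missing variable in the output file.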
using Pkg
Pkg.add(url="https://github.com/gher-ulg/DINCAE.jl", rev="main")
Pkg.add(url="https://github.com/gher-ulg/DINCAE_utils.jl", rev="main")
This worked without problems with IJulia on my Windows installation of Julia 1.7.1, but did NOT install CUDA or Knet.
That was easily corrected by installing the packages by hand.
Later, the kernel is killed when trying
ds = NCDataset(url)
So I guess it is again that NetCDF problem under Windows?