OSError: Unable to synchronously open file (file signature not found) #96

Open
1 task
ashbate opened this issue Aug 3, 2024 · 9 comments
Labels
bug Something isn't working

Comments

ashbate commented Aug 3, 2024

Bug Report

Description

Calling bm_extract() or bm_raster() fails to generate the data.

Reproducibility

  • The bug is reproducible.

Steps to Reproduce

Calling the methods in a Jupyter notebook produces the error below. I tried both on my computer and on Google Colab. It looks like an OSError raised by h5py.


OSError                                   Traceback (most recent call last)
Cell In[16], line 2
      1 # f.close()
----> 2 ntl_r = bm_raster(
      3     continental_us,
      4     product_id="VNP46A2",
      5     date_range="2023-01-01",
      6     bearer=bearer,
      7     variable="Gap_Filled_DNB_BRDF-Corrected_NTL",
      8 )

File /opt/anaconda3/lib/python3.11/site-packages/pydantic/validate_call_decorator.py:60, in validate_call.<locals>.validate.<locals>.wrapper_function(*args, **kwargs)
     58 @functools.wraps(function)
     59 def wrapper_function(*args, **kwargs):
---> 60     return validate_call_wrapper(*args, **kwargs)

File /opt/anaconda3/lib/python3.11/site-packages/pydantic/_internal/_validate_call.py:96, in ValidateCallWrapper.__call__(self, *args, **kwargs)
     95 def __call__(self, *args: Any, **kwargs: Any) -> Any:
---> 96     res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
     97     if self.__return_pydantic_validator__:
     98         return self.__return_pydantic_validator__(res)

File /opt/anaconda3/lib/python3.11/site-packages/blackmarble/raster.py:355, in bm_raster(gdf, product_id, date_range, bearer, variable, drop_values_by_quality_flag, check_all_tiles_exist, output_directory, output_skip_if_exists)
    351 filenames = _pivot_paths_by_date(pathnames).get(date)
    353 try:
    354     # Open each GeoTIFF file as a DataArray and store in a list
--> 355     da = [
    356         rioxarray.open_rasterio(
    357             h5_to_geotiff(
    358                 f,
    359                 variable=variable,
    360                 drop_values_by_quality_flag=drop_values_by_quality_flag,
    361                 output_directory=d,
    362             ),
    363         )
    364         for f in filenames
    365     ]
    366     ds = merge_arrays(da)
    367     clipped_dataset = ds.rio.clip(
    368         gdf.geometry.apply(mapping), gdf.crs, drop=True
    369     )

File /opt/anaconda3/lib/python3.11/site-packages/blackmarble/raster.py:357, in <listcomp>(.0)
    351 filenames = _pivot_paths_by_date(pathnames).get(date)
    353 try:
    354     # Open each GeoTIFF file as a DataArray and store in a list
    355     da = [
    356         rioxarray.open_rasterio(
--> 357             h5_to_geotiff(
    358                 f,
    359                 variable=variable,
    360                 drop_values_by_quality_flag=drop_values_by_quality_flag,
    361                 output_directory=d,
    362             ),
    363         )
    364         for f in filenames
    365     ]
    366     ds = merge_arrays(da)
    367     clipped_dataset = ds.rio.clip(
    368         gdf.geometry.apply(mapping), gdf.crs, drop=True
    369     )

File /opt/anaconda3/lib/python3.11/site-packages/blackmarble/raster.py:177, in h5_to_geotiff(f, variable, drop_values_by_quality_flag, output_directory)
    174 if variable is None:
    175     variable = VARIABLE_DEFAULT.get(product_id)
--> 177 with h5py.File(f, "r") as h5_data:
    178     attrs = h5_data.attrs
    179     data_field_key = "HDFEOS/GRIDS/VNP_Grid_DNB/Data Fields"

File /opt/anaconda3/lib/python3.11/site-packages/h5py/_hl/files.py:567, in File.__init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, fs_strategy, fs_persist, fs_threshold, fs_page_size, page_buf_size, min_meta_keep, min_raw_keep, locking, alignment_threshold, alignment_interval, meta_block_size, **kwds)
    558     fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0,
    559                      locking, page_buf_size, min_meta_keep, min_raw_keep,
    560                      alignment_threshold=alignment_threshold,
    561                      alignment_interval=alignment_interval,
    562                      meta_block_size=meta_block_size,
    563                      **kwds)
    564     fcpl = make_fcpl(track_order=track_order, fs_strategy=fs_strategy,
    565                      fs_persist=fs_persist, fs_threshold=fs_threshold,
    566                      fs_page_size=fs_page_size)
--> 567     fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
    569 if isinstance(libver, tuple):
    570     self._libver = libver

File /opt/anaconda3/lib/python3.11/site-packages/h5py/_hl/files.py:231, in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
    229     if swmr and swmr_support:
    230         flags |= h5f.ACC_SWMR_READ
--> 231     fid = h5f.open(name, flags, fapl=fapl)
    232 elif mode == 'r+':
    233     fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5f.pyx:106, in h5py.h5f.open()

OSError: Unable to open file (file signature not found)

Environment

  • Operating System: macOS, Google Colab
  • Browser: Google Chrome
  • Application Version/Commit: 2024.8.1

Additional Context

Possible Fix

Initially I moved the project folder to the desktop in case it was a read/write permission issue. It worked on the first run, but then the error came back.

@ashbate ashbate added the bug Something isn't working label Aug 3, 2024

ashbate commented Aug 3, 2024

It is partially solved. After numerous re-runs I realized that the error is not raised for the same .h5 file on each run. I tried extracting data for a small country covered by 6 tiles, which worked (I am getting this error for a 17-tile territory). I kept re-running my code for an hour or so and eventually it worked.

Maybe the problem is in the httpx timeout parameters; I am not really sure at this point.
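
Since the failure is intermittent and re-running eventually succeeds, the manual re-runs described above could be automated. The following is only a minimal sketch: `retry_on_oserror` is a hypothetical helper, not part of blackmarble, and it assumes each call starts a fresh attempt (for example, that no corrupt file from a previous attempt is reused).

```python
import time


def retry_on_oserror(func, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts are exhausted.

    Only OSError is retried; the wait doubles after each failed attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
```

It would wrap the failing call in a zero-argument callable, e.g. `retry_on_oserror(lambda: bm_raster(...))` with the same arguments as in the traceback.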

Skerre commented Aug 20, 2024

I have the same issue

ashbate commented Aug 22, 2024

> I have the same issue

Hi,

I had this issue when I was in Istanbul as well. I think it is somehow connected to internet speed, as if there is a sweet spot at which the files download correctly. I worked around it by using VPNs to control my download speed. Although I am now in Italy, I still get this error in about 8 out of 10 runs.

I am considering copying this library locally and adjusting the h5py read timeout values.

Skerre commented Aug 23, 2024

@ashbate Hi, thanks for your message. Indeed, I am in Istanbul, but I use different networks to test this tool. Some of them are genuinely high speed and go through UNDP (enhanced and unrestricted, so to say). I will do some more experimentation today and see if I can download another area of interest. The files do get downloaded, but they are 0 KB afterwards. Something in the subsequent step seems to go wrong.
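
A 0 KB file cannot contain the HDF5 signature, which would explain the "file signature not found" error when h5py opens it. One way to spot such files before re-running is sketched below. The helper names are hypothetical, and it assumes the downloads sit in a single directory as .h5 files. Per the HDF5 format, the signature may start at byte offset 0, 512, 1024, 2048 and so on.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path, max_offset=1 << 20):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, ..."""
    size = Path(path).stat().st_size
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size and offset <= max_offset:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # Candidate offsets double after the first 512-byte step
            offset = 512 if offset == 0 else offset * 2
    return False


def find_bad_h5_files(directory):
    """List .h5 files that are empty or lack an HDF5 signature."""
    return sorted(
        p for p in Path(directory).glob("*.h5") if not has_hdf5_signature(p)
    )
```

Deleting the files this returns should force a fresh download on the next run. If h5py is installed, `h5py.is_hdf5(path)` performs a comparable check.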

ashbate commented Sep 4, 2024

@Skerre Hi, I solved it. I downloaded the library's source and ran it locally, adding
timeout = httpx.Timeout(15, read=None)
to the top of download.py and updating this method:


    def _download_file(
        self,
        name: str,
        skip_if_exists: bool = True,
    ):
        """Download NASA Black Marble file

        Parameters
        ----------
        name: str
             NASA Black Marble filename

        Returns
        -------
        filename: pathlib.Path
            Filename of downloaded data file
        """
        url = f"{self.URL}{name}"
        name = name.split("/")[-1]

        # Download only if the file is missing or skipping is disabled
        if not (filename := Path(self.directory, name)).exists() or not skip_if_exists:
            with open(filename, "wb+") as f:
                with httpx.stream(
                    "GET",
                    url,
                    headers={"Authorization": f"Bearer {self.bearer}"},
                    timeout=timeout,  # module-level httpx.Timeout defined at the top of download.py
                ) as response:
                    total = int(response.headers["Content-Length"])
                    with tqdm(
                        total=total,
                        unit="B",
                        unit_scale=True,
                        leave=None,
                    ) as pbar:
                        pbar.set_description(f"Downloading {name}...")
                        # Stream the body to disk chunk by chunk
                        for chunk in response.iter_raw():
                            f.write(chunk)
                            pbar.update(len(chunk))

        return filename

Hope it works for you too. If not, try tweaking the timeout value.
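
One thing to note about the method above: it opens the destination file before the request is made, so a failed or interrupted request can leave an empty or partial file behind, and a later run with skip_if_exists=True would then treat that file as already downloaded. A possible way around this, sketched here with a hypothetical helper that is not part of the library, is to write to a temporary file and rename it only after every chunk has arrived.

```python
import os
from pathlib import Path


def write_atomically(chunks, destination):
    """Write chunks to a temporary '.part' file, then rename it into place.

    If iterating `chunks` raises, the partial file is deleted and
    `destination` is never created, so a later exists() check cannot
    mistake a failed download for a finished one.
    """
    destination = Path(destination)
    partial = destination.with_name(destination.name + ".part")
    try:
        with open(partial, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        os.replace(partial, destination)  # atomic on the same filesystem
    except BaseException:
        partial.unlink(missing_ok=True)
        raise
    return destination
```

In the method above, `chunks` would be `response.iter_raw()`, ideally after calling `response.raise_for_status()` so that an HTTP error response is not saved as if it were data.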

Skerre commented Sep 5, 2024

@ashbate Dear ashbate,

Thank you for the effort of resolving this. However, I tried your solution and it has not worked yet. I added the timeout to the beginning of the script (I also tried other places, inside the functions) and replaced my _download_file function with yours verbatim. It does not seem to time out and runs into the same error as before.

Skerre commented Sep 5, 2024

@ashbate I can share more details on what I am doing if you want

ashbate commented Sep 7, 2024

@Skerre Hi Skerre, can you check your LinkedIn please?

@koichisato-dev

@ashbate I had the same issue and partially solved it with your modification. In my case, I also passed download options to bm_extract, namely output_directory and output_skip_if_exists=False. This consumes some storage, but it works completely. I am sharing this for anyone who faces the same issue.
