Prognostic run: consolidate_metadata step failures #1942

Open
brianhenn opened this issue Jul 19, 2022 · 0 comments


brianhenn commented Jul 19, 2022

The postprocessing of the prognostic run output has failed several times for me recently when it tries to consolidate the metadata of an appended zarr from a segmented run. This leaves a recoverable zarr, but with an incorrect consolidated metadata file, and it stops the job between segments with only some zarrs updated. Very roughly, this error has recently appeared in about 5-10% of C384 segment appends. (It doesn't seem to happen as often for C48 runs.)
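For reference, in zarr's v2 format the consolidated metadata is a single `.zmetadata` JSON object that collects every `.zgroup`, `.zarray`, and `.zattrs` document in the store under a `"metadata"` key, next to `"zarr_consolidated_format": 1`. Below is a minimal stdlib-only sketch of that gathering step, assuming the store is a plain mapping from relative keys to raw bytes; `build_consolidated_metadata` is a hypothetical name, not the fv3net implementation. Because every metadata document must be read before `.zmetadata` can be rewritten, a single failed read leaves the previous `.zmetadata` in place, which presumably no longer matches the appended arrays.

```python
import json
from typing import Any, Dict, Mapping

# Basenames of the zarr v2 metadata documents gathered into .zmetadata.
METADATA_FILES = (".zgroup", ".zarray", ".zattrs")


def build_consolidated_metadata(store: Mapping[str, bytes]) -> Dict[str, Any]:
    """Hypothetical helper: gather all zarr v2 metadata documents in ``store``.

    ``store`` maps relative keys (e.g. "u/.zarray") to raw bytes; chunk keys
    such as "u/0.0" are skipped because only metadata basenames match.
    """
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.rsplit("/", 1)[-1] in METADATA_FILES
    }
    return {"zarr_consolidated_format": 1, "metadata": metadata}


store = {
    ".zgroup": b'{"zarr_format": 2}',
    "u/.zarray": b'{"shape": [4, 10]}',  # trimmed; a real .zarray has more fields
    "u/.zattrs": b'{"units": "m/s"}',
    "u/0.0": b"\x00\x01",  # chunk data, not metadata
}
consolidated = build_consolidated_metadata(store)
print(sorted(consolidated["metadata"]))  # ['.zgroup', 'u/.zarray', 'u/.zattrs']
# json.dumps(consolidated) is what would be written to the ".zmetadata" key.
```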

`fs.cat` appears to be failing with an odd message ("User project specified in the request is invalid."). I wonder if the C384 runs are making too many `gcsfs` API calls.

e.g.:

```
INFO:/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py:Consolidating metadata of vcm-ml-experiments/c384-ml/2022-07-18/nn-seed-0-c384-run/fv3gfs_run/physics_tendencies.zarr
Traceback (most recent call last):
  File "/usr/local/bin/runfv3", line 11, in <module>
    load_entry_point('prognostic-run', 'console_scripts', 'runfv3')()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/fv3net/workflows/prognostic_c48_run/runtime/segmented_run/cli.py", line 57, in append
    sys.exit(api.append(url))
  File "/fv3net/workflows/prognostic_c48_run/runtime/segmented_run/append.py", line 80, in append_segment_to_run_url
    append_segment(
  File "/fv3net/workflows/post_process_run/fv3post/append.py", line 282, in append_segment
    append_zarr_along_time(tmp_rundir_file, destination_file, fs)
  File "/fv3net/workflows/post_process_run/fv3post/append.py", line 249, in append_zarr_along_time
    consolidate_metadata(fs, absolute_target_paths[0])
  File "/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py", line 53, in consolidate_metadata
    meta = _get_metadata_fs(fs, root)
  File "/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py", line 45, in _get_metadata_fs
    metadata_with_nan = dict(zip(keys_to_get, values))
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/fv3net/workflows/post_process_run/fv3post/consolidate_metadata.py", line 38, in maybe_get
    return json_loads(fs.cat(url))
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 402, in _cat
    raise ex
  File "/usr/lib/python3.8/asyncio/tasks.py", line 455, in wait_for
    return await fut
  File "/usr/local/lib/python3.8/dist-packages/gcsfs/core.py", line 735, in _cat_file
    headers, out = await self._call("GET", u2, headers=head)
  File "/usr/local/lib/python3.8/dist-packages/gcsfs/core.py", line 386, in _call
    status, headers, info, contents = await self._request(
  File "<decorator-gen-2>", line 2, in _request
  File "/usr/local/lib/python3.8/dist-packages/gcsfs/retry.py", line 115, in retry_request
    return await func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gcsfs/core.py", line 378, in _request
    validate_response(status, contents, path, args)
  File "/usr/local/lib/python3.8/dist-packages/gcsfs/retry.py", line 100, in validate_response
    raise ValueError("Bad Request: %s\n%s" % (path, msg))
ValueError: Bad Request: https://storage.googleapis.com/download/storage/v1/b/vcm-ml-experiments/o/c384-ml%2F2022-07-18%2Fnn-seed-0-c384-run%2Ffv3gfs_run%2Fphysics_tendencies.zarr%2Ftendency_of_eastward_wind_due_to_fv3_physics%2F.zattrs?alt=media
User project specified in the request is invalid.
Error: exit status 1
```

Run on f59deea
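The traceback shows the per-key reads (`maybe_get`, which calls `json_loads(fs.cat(url))`) running on a `concurrent.futures` thread pool. The `ValueError` raised inside `gcsfs` for a single `.zattrs` key propagates out of gcsfs's own `retry_request`, is stored on that worker's future, and is re-raised in `_get_metadata_fs` when it iterates the results (`result_iterator` → `result` → `__get_result`), so one failed read aborts the whole consolidation. If the failures turn out to be transient or load-related (not established here), two possible mitigations are retrying each read with backoff and capping how many reads run at once. Below is a minimal stdlib-only sketch of both; `cat_with_retry`, `fetch_all`, and the defaults `retries=5`, `base_delay=0.5`, and `max_workers=8` are hypothetical and are not fv3net or gcsfs APIs.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


def cat_with_retry(
    cat: Callable[[str], bytes],
    url: str,
    retries: int = 5,
    base_delay: float = 0.5,
) -> bytes:
    """Hypothetical helper: call ``cat(url)``, retrying on ValueError.

    gcsfs surfaced the 400 response above as a ValueError ("Bad Request: ..."),
    so that is the only exception retried here. Between attempts it sleeps
    base_delay * 2**attempt seconds plus random jitter.
    """
    for attempt in range(retries):
        try:
            return cat(url)
        except ValueError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** attempt
            time.sleep(delay + random.uniform(0, delay))
    raise ValueError("retries must be at least 1")


def fetch_all(
    fetch: Callable[[str], bytes], keys: Iterable[str], max_workers: int = 8
) -> Tuple[Dict[str, bytes], Dict[str, Exception]]:
    """Hypothetical helper: fetch every key on a bounded thread pool.

    Iterating ``executor.map`` re-raises a worker's exception as soon as
    iteration reaches it; this instead returns (results, failures) so the
    failed keys can be reported or retried.
    """
    results: Dict[str, bytes] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(fetch, key) for key in keys}
        for key, future in futures.items():
            try:
                results[key] = future.result()
            except Exception as err:  # record the failure and keep going
                failures[key] = err
    return results, failures


if __name__ == "__main__":
    def flaky_fetch(key: str) -> bytes:
        # Stand-in for fs.cat: .zattrs reads always fail, as in the traceback.
        if key.endswith(".zattrs"):
            raise ValueError("Bad Request: " + key)
        return b"{}"

    keys = ["u/.zarray", "u/.zattrs"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        try:
            dict(zip(keys, pool.map(flaky_fetch, keys)))
        except ValueError as err:
            print("map re-raised:", err)  # map re-raised: Bad Request: u/.zattrs

    results, failures = fetch_all(
        lambda k: cat_with_retry(flaky_fetch, k, retries=2, base_delay=0.01), keys
    )
    print(sorted(results), sorted(failures))  # ['u/.zarray'] ['u/.zattrs']
```

Applied to fv3net, the idea would be to call something like `cat_with_retry(fs.cat, url)` inside `maybe_get` and to lower the pool size used by `_get_metadata_fs`; the pool size it currently uses is not visible from the traceback.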
