Wrap ChunkedEncodingError from UCObjectStore #3321

Merged: 4 commits into mosaicml:dev on May 24, 2024

Conversation

irenedea
Contributor

@irenedea irenedea commented May 24, 2024

What does this PR do?

Wraps ChunkedEncodingError in ObjectStoreTransientError so that callers can treat it as retryable.
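A minimal sketch of the wrapping idea, not the actual UCObjectStore code. The two exception classes are local stand-ins for requests.exceptions.ChunkedEncodingError and Composer's ObjectStoreTransientError so the snippet runs without either library; read_all_chunks and BrokenStream are hypothetical names introduced for illustration.

```python
import io


class ChunkedEncodingError(Exception):
    """Stand-in for requests.exceptions.ChunkedEncodingError (assumption)."""


class ObjectStoreTransientError(Exception):
    """Stand-in for Composer's ObjectStoreTransientError (assumption)."""


def read_all_chunks(resp, chunk_size=64 * 1024 * 1024):
    """Read resp until EOF, re-raising a broken connection as a transient error."""
    chunks = []
    try:
        # Same read pattern as in the traceback: call resp.read() until it returns b''.
        for chunk in iter(lambda: resp.read(chunk_size), b''):
            chunks.append(chunk)
    except ChunkedEncodingError as e:
        # Chain the original exception so the root cause stays visible.
        raise ObjectStoreTransientError(str(e)) from e
    return b''.join(chunks)


class BrokenStream:
    """Hypothetical response whose connection drops on the first read."""

    def read(self, n):
        raise ChunkedEncodingError('Connection broken: IncompleteRead')
```

With this shape, a healthy stream such as io.BytesIO(b'abc') is returned unchanged, while BrokenStream() surfaces as ObjectStoreTransientError with the original error attached as __cause__.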

Part of handling the following error from data preparation in foundry:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 210, in _process_chunk
    return [fn(*args) for args in chunk]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 210, in <listcomp>
    return [fn(*args) for args in chunk]
            ^^^^^^^^^
  File "/llm-foundry/scripts/data_prep/convert_text_to_mds.py", line 233, in download_and_convert_starargs
    return download_and_convert(*args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/llm-foundry/scripts/data_prep/convert_text_to_mds.py", line 296, in download_and_convert
    for sample in tqdm(dataset):
  File "/usr/lib/python3/dist-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/lib/python3/dist-packages/llmfoundry/data/data.py", line 120, in __iter__
    for sample in self.hf_dataset:
  File "/usr/lib/python3/dist-packages/llmfoundry/utils/data_prep_utils.py", line 113, in __iter__
    self.object_store.download_object(
  File "/usr/lib/python3/dist-packages/composer/utils/object_store/uc_object_store.py", line 189, in download_object
    for chunk in iter(lambda: resp.read(64 * 1024 * 1024), b''):
  File "/usr/lib/python3/dist-packages/composer/utils/object_store/uc_object_store.py", line 189, in <lambda>
    for chunk in iter(lambda: resp.read(64 * 1024 * 1024), b''):
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/databricks/sdk/core.py", line 405, in read
    self._buffer = next(self._content)
                   ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read, 23012 more expected)', IncompleteRead(0 bytes read, 23012 more expected))

Next step: add retry logic in foundry.
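One possible shape for that retry logic, as a hedged sketch only; it is not what foundry ships. ObjectStoreTransientError is again a local stand-in for Composer's class, and retry_transient, num_attempts, and initial_backoff are hypothetical names chosen for this example.

```python
import time


class ObjectStoreTransientError(Exception):
    """Stand-in for Composer's ObjectStoreTransientError (assumption)."""


def retry_transient(fn, num_attempts=3, initial_backoff=1.0):
    """Call fn(), retrying with exponential backoff only on transient errors."""
    backoff = initial_backoff
    for attempt in range(num_attempts):
        try:
            return fn()
        except ObjectStoreTransientError:
            # Out of attempts: let the last transient error propagate.
            if attempt == num_attempts - 1:
                raise
            time.sleep(backoff)
            backoff *= 2
```

A caller would wrap the download in a zero-argument callable, for example retry_transient(lambda: download(path)), where download is whatever function performs the object store read. Non-transient exceptions are deliberately not caught, so real failures still surface immediately.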

What issue(s) does this change relate to?

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@irenedea irenedea requested a review from dakinggg May 24, 2024 19:23
@irenedea irenedea enabled auto-merge (squash) May 24, 2024 21:25
@irenedea irenedea merged commit 4c56037 into mosaicml:dev May 24, 2024
15 checks passed