This is from google/orbax#523 (comment).
I have the same problem as the person who posted there.
They write:
Hi, I am having some trouble saving a checkpoint. My code is very simple:
and works fine on my local machine. However, when running on an HPC cluster, I get the following error:
ValueError: FAILED_PRECONDITION: Error writing local file "/home/oeberhard/laurel/dat/runs/test/working_directories/1/actor_params.orbax-checkpoint-tmp-1695894959581296/_sharding": Failed to acquire lock on file: /home/oeberhard/laurel/dat/runs/test/working_directories/1/actor_params.orbax-checkpoint-tmp-1695894959581296/_sharding.__lock [OS error: Function not implemented] [source locations='tensorstore/kvstore/file/file_key_value_store.cc:676\ntensorstore/kvstore/kvstore.cc:268']
There are no access problems i.e. checkpointing with flax.training.checkpoints works perfectly. Is there maybe a way to disable the locking? Thanks!
Update file locking:
On Linux, if ::fcntl(F_OFD_SETLKW) fails with ENOSYS/ENOTSUP, fall back to ::flock.
Replace FileLockTraits with the AcquireFdLock function, which returns a function pointer used to release the lock.
Better handling of errno when releasing a file lock, which was just wrong before.
Improve comments.
This may aid locking issues with some network filesystems.
#183
PiperOrigin-RevId: 658512121
Change-Id: Ie471e01b039ad108f9906813b86cd1b1d722be45
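To make the fallback described in that commit message easier to picture, here is a minimal Python sketch of the same idea. It is not the tensorstore implementation (which is C++ and uses ::fcntl(F_OFD_SETLKW)); as a stand-in for the first attempt it uses Python's fcntl.lockf, which takes a POSIX record lock. The names acquire_fd_lock and primary are hypothetical and exist only for illustration. Like the AcquireFdLock function mentioned above, the sketch returns a callable that releases whichever lock was actually taken.

```python
import errno
import fcntl
from typing import Callable

# errno values that mean "this filesystem does not implement this lock type".
_UNSUPPORTED = {errno.ENOSYS, errno.ENOTSUP, errno.EOPNOTSUPP}


def acquire_fd_lock(fd: int, primary=fcntl.lockf) -> Callable[[], None]:
    """Block until fd is exclusively locked; return a function that unlocks it.

    `primary` is a hypothetical hook (illustration only) so the first locking
    attempt can be swapped out, e.g. to simulate a filesystem without support.
    """
    try:
        # First attempt: POSIX record lock (stand-in for F_OFD_SETLKW).
        primary(fd, fcntl.LOCK_EX)
        return lambda: fcntl.lockf(fd, fcntl.LOCK_UN)
    except OSError as e:
        # Only fall back when the lock type itself is unsupported;
        # any other error (permissions, bad fd, ...) is re-raised.
        if e.errno not in _UNSUPPORTED:
            raise
    # Fallback: BSD-style whole-file lock.
    fcntl.flock(fd, fcntl.LOCK_EX)
    return lambda: fcntl.flock(fd, fcntl.LOCK_UN)
```

A caller would open the lock file for writing, call `release = acquire_fd_lock(fd)`, do its work, and then call `release()`. Whether the fallback helps on a given cluster depends on whether that filesystem implements flock at all, which this sketch cannot determine in advance.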