Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add change from signed int to unsigned int upon copying data #33

Merged
merged 13 commits into from
Apr 12, 2022
Merged

Add change from signed int to unsigned int upon copying data #33

merged 13 commits into from
Apr 12, 2022

Conversation

martinschorb
Copy link
Contributor

I don't know if you want to test for this in all potential output formats, or if we simply trust numpy on doing it right.

Copy link
Owner

@constantinpape constantinpape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int_to_uint parameter can lead to two corner cases, please add checks that throw a value error if these occur:

  • the input dtype is not a signed integer type
  • the dtype parameter is passed to CopyVolume, but it has a different value than the correct unsigned type

cluster_tools/copy_volume/copy_volume.py Outdated Show resolved Hide resolved
cluster_tools/copy_volume/copy_volume.py Outdated Show resolved Hide resolved
cluster_tools/copy_volume/copy_volume.py Outdated Show resolved Hide resolved
cluster_tools/downscaling/downscaling_workflow.py Outdated Show resolved Hide resolved
cluster_tools/downscaling/downscaling_workflow.py Outdated Show resolved Hide resolved
cluster_tools/downscaling/downscaling_workflow.py Outdated Show resolved Hide resolved
cluster_tools/downscaling/downscaling_workflow.py Outdated Show resolved Hide resolved
Copy link
Owner

@constantinpape constantinpape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_tools/copy_volume/copy_volume.py Outdated Show resolved Hide resolved
@martinschorb
Copy link
Contributor Author

We should also add a test for this, I think the best place is https://github.com/constantinpape/cluster_tools/blob/master/test/downscaling/test_downscaling.py.

Do you want to add signed test data to the archive, or would you prefer generating it?

My suggestion would be to use the hdf5 example, simply changing its dtype and shifting its values.

@constantinpape
Copy link
Owner

My suggestion would be to use the hdf5 example, simply changing its dtype and shifting its values.

Yes, I agree that sounds like the best solution.

@martinschorb
Copy link
Contributor Author

When I try to run an import it also fails for me.

import_data/utils.py:102, in downscale(in_path, in_key, out_path, resolution, scale_factors, chunks, tmp_folder, target, max_jobs, block_shape, library, library_kwargs, metadata_format, out_key, unit, source_name, roi_begin, roi_end, int_to_uint)
    100 ret = luigi.build([t], local_scheduler=True)
    101 if not ret:
--> 102     raise RuntimeError("Downscaling failed")

Is there a way to get a more detailed idea what exactly went wrong?

@constantinpape
Copy link
Owner

It's in the test log:

 ERROR: [pid 2798] Worker Worker(salt=763965705, workers=1, host=fv-az196-745, username=runner, pid=2798) failed    CopyVolumeLocal(tmp_folder=./tmp, max_jobs=2, config_dir=./tmp/config, input_path=/home/runner/work/cluster_tools/cluster_tools/test_data/sampleA.n5, input_key=volumes/raw/s0, output_path=./tmp/data.n5, output_key=data/s0, prefix=initial_scale, dtype=None, int_to_uint=False, fit_to_roi=False, effective_scale_factor=[], dimension_separator=None, dependency=DummyTask)
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/cluster_env/lib/python3.8/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/usr/share/miniconda/envs/cluster_env/lib/python3.8/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/runner/work/cluster_tools/cluster_tools/cluster_tools/cluster_tasks.py", line 95, in run
    raise e
  File "/home/runner/work/cluster_tools/cluster_tools/cluster_tools/cluster_tasks.py", line 81, in run
    self.run_impl()
  File "/home/runner/work/cluster_tools/cluster_tools/cluster_tools/copy_volume/copy_volume.py", line 119, in run_impl
    dtype = DTYPE_MAPPING.get(dtype, dtype)
UnboundLocalError: local variable 'dtype' referenced before assignment

Copy link
Owner

@constantinpape constantinpape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good now, but still missing:

  • a check that makes sure that the dtype argument (which is available through self.dtype) is compatible with the resulting unsigned dtype if self.dtype is not None and int_to_uint is True
  • a test for int_to_uint

@martinschorb
Copy link
Contributor Author

martinschorb commented Apr 12, 2022

OK. Runs now. Now to the tests...

  • a check that makes sure that the dtype argument (which is available through self.dtype) is compatible with the resulting unsigned dtype if self.dtype is not None and int_to_uint is True

As it is now, the dtype argument is simply ignored when int_to_uint is requested (dtype gets overwritten by the unsigned data dtype).

Do you like a warning issued if both arguments are given?

@martinschorb
Copy link
Contributor Author

If we assume that the dtype provided by elf.io.open_file is a numpy.dtype The check for this being signed int should be sufficent.

@constantinpape
Copy link
Owner

If we assume that the dtype provided by elf.io.open_file is a numpy.dtype The check for this being signed int should be sufficent.

No this is more complex than you think (and also than I initially thought). See line 119. If dtype is passed as an argument none of this will make sense. Please add a check in the beginning that raises an error if self.int_to_uint is True and self.dtype is not None

@martinschorb
Copy link
Contributor Author

Doesn't in the end https://github.com/martinschorb/cluster_tools/blob/c05642dba9ec1ebc7b534c7edb03034ba8917eda/cluster_tools/copy_volume/copy_volume.py#L125 take over everthing before, and hence only ds.dtype which is coming from elf is relevant for being checked?

@constantinpape
Copy link
Owner

Doesn't in the end https://github.com/martinschorb/cluster_tools/blob/c05642dba9ec1ebc7b534c7edb03034ba8917eda/cluster_tools/copy_volume/copy_volume.py#L125 take over everthing before, and hence only ds.dtype which is coming from elf is relevant for being checked?

I overlooked this, but what you did there was not quite correct. You need to use dtype instead of ds.dtype there (I updated this already). The reason is that there are no guarantees that the dtype will be string represented as e.g.int32, it could also be <i4. elf doesn't implement any special checks for this it just returns the .dtype of the underlying storage class, e.g. the h5py dataset, zarr array or z5py dataset. That's the whole reason for the dtype mapping happening here (which could even be more complete and there is hopefully a better way to do this I could adopt eventually, but that's out of scope now).

@constantinpape
Copy link
Owner

I went ahead and fixed the remaining issues (there were a couple. I can just encourage you again to use a linter: it helps tremendously to write correct python code because you will find many issues without needing to actually run the code) and added the test.

Also, I think that the code you used to translate between dtypes was wrong:
It should be

(data-np.iinfo(data.dtype).min).astype(dtype)

instead of

(data-np.iinfo(data.dtype).min-1).astype(dtype)

see the screenshots below:

int-to-uint-incorrect

int-to-uint-correct

@constantinpape
Copy link
Owner

I ll go ahead and merge this.

@constantinpape constantinpape merged commit d12ce64 into constantinpape:master Apr 12, 2022
@martinschorb
Copy link
Contributor Author

Yes, for some reason i assumed that the minimum of int8 was -127, hence I added the offset.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants