Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slicing read_* methods #584

Closed
zerothi opened this issue Jun 15, 2023 · 15 comments · Fixed by #586
Closed

Slicing read_* methods #584

zerothi opened this issue Jun 15, 2023 · 15 comments · Fixed by #586

Comments

@zerothi
Copy link
Owner

zerothi commented Jun 15, 2023

          In my opinion, `read_*` routines -where `*` is a word in singular- should not worry about skipping and they should just focus on reading the next thing available. Then we could get rid of all these extra arguments, which will be very hard to document in a way that doesn't confuse users, specially the different defaults for different methods/siles.

It seems clean and clear enough for me to specify the iterator externally (slicing, multi, slice,apply...), just as it is done with the BZ methods for example. If some method is very common to use with all=True, there could be an extra method with the plural word, i.e. read_energy -> read_energies as an alternative for read_energy[:].

Originally posted by @pfebrer in #578 (comment)

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

          I have a somewhat working solution, see in the `sile-slicer` branch, a test is here:
open("/tmp/test.file", 'w').write("""1

C   0.00000000  0.00000000  0.00000000
2

C   0.00000000  0.00000000  0.00000000
C   1.000000  0.00000000  0.00000000
3

C   0.00000000  0.00000000  0.00000000
C   1.000000  0.00000000  0.00000000
C   2.00000  0.00000000  0.00000000
""")

from sisl.io import xyzSile

sile = xyzSile("/tmp/test.file")
print(sile.read_geometry(), help(sile.read_geometry))
print(sile.read_geometry[0](), help(sile.read_geometry[0]))
print(sile.read_geometry[1]())
print(sile.read_geometry[1:]())

It isn't complete, but just to get an idea?
I quickly ran into problems when decoration the function, since that happens at class-creation, the function won't bind self and thus they won't be passed, hence the __get__ work-around... A bit ugly IMHO.
Other suggestions on the implementation side?

Originally posted by @zerothi in #578 (comment)

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

@tfrederiksen @pfebrer lets continue the discussion here.

Please see the updated branch 584-sile-slicer, it contains more functionality and patches the documentation.

@pfebrer did the changes correspond to your remarks?

@pfebrer

This comment was marked as outdated.

@zerothi

This comment was marked as outdated.

@pfebrer

This comment was marked as resolved.

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

Ok, it is there now! :) I think this should be fully functional! :)

Perhaps we should discuss how it should work. :)

  • slices will take care of everything,
  • should we allow double slicing? Currently this is not allowed
  • do we need performance stuff that omits certain aspects? I.e. quick-skips for non-returned indices?
  • does this cover all aspects?
  • should we have a .m method? And what should it do?
  • should we allow a dict in the slicing, e.g. read_geometry[<slice>, {...}] for additional functionality?

@pfebrer
Copy link
Contributor

pfebrer commented Jun 15, 2023

I was thinking that in the future you could have something like sanitize_selection and everything that is passed into __getitem__ should go through this sanitzer. Just like it is done for the atoms. For now I think it's ok if we just support slices, single integers or arrays of integers, mimicking the behavior of @tfrederiksen's branch. Then we can always add new functionality if it makes sense.

should we allow double slicing? Currently this is not allowed

I believe for now it's better to keep it simple and don't allow double slicing.

do we need performance stuff that omits certain aspects? I.e. quick-skips for non-returned indices?

If there are no negative indices (or you know the length of the data), I think it would be nice to stop reading when you reach the last requested item.

should we have a .m method? And what should it do?

I don't know what's the best name, but I think there should be a method, for if you need to pass more arguments than just the selection of indices. I can't tell right now what would I use this method for, but it's probably a good idea to have it. Maybe we can include it later when it is more clear what would it be useful for.

should we allow a dict in the slicing, e.g. read_geometry[, {...}] for additional functionality?

If the dict contains arguments for the function, I think it's better not to do it because it feels very hacky. Instead, one should go through the method discussed above, since it feels much more natural to pass arguments to a function.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 15, 2023

Oh by the way, I think it would be cool if the SileSlicer would have an __iter__ method that it's just like the call but yields the items read one by one. Although this sounds simple only if the indices are sorted from small to big.

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

I was thinking that in the future you could have something like sanitize_selection and everything that is passed into __getitem__ should go through this sanitzer. Just like it is done for the atoms. For now I think it's ok if we just support slices, single integers or arrays of integers, mimicking the behavior of @tfrederiksen's branch. Then we can always add new functionality if it makes sense.

Agreed!

should we allow double slicing? Currently this is not allowed

I believe for now it's better to keep it simple and don't allow double slicing.

Agreed, I really also think this is a corner case, but it could be useful for passing around objects that reads geometries

do we need performance stuff that omits certain aspects? I.e. quick-skips for non-returned indices?

If there are no negative indices (or you know the length of the data), I think it would be nice to stop reading when you reach the last requested item.

Agreed, lets try this as well.

should we have a .m method? And what should it do?

I don't know what's the best name, but I think there should be a method, for if you need to pass more arguments than just the selection of indices. I can't tell right now what would I use this method for, but it's probably a good idea to have it. Maybe we can include it later when it is more clear what would it be useful for.

Ok, we'll wait

should we allow a dict in the slicing, e.g. read_geometry[, {...}] for additional functionality?

If the dict contains arguments for the function, I think it's better not to do it because it feels very hacky. Instead, one should go through the method discussed above, since it feels much more natural to pass arguments to a function.

Agreed.

Oh by the way, I think it would be cool if the SileSlicer would have an __iter__ method that it's just like the call but yields the items read one by one. Although this sounds simple only if the indices are sorted from small to big.

Hmm. That might be a bit complicated. I think the interface would result in something like this:

for read in read_geometry[:10]:
    geom = read()

or similar? would this be expected?

@pfebrer
Copy link
Contributor

pfebrer commented Jun 15, 2023

For the iterator, I was thinking that it could already call the reader and yield the output. Although I noticed the problem that then you can't pass extra arguments. Is that why you suggested the interface? Perhaps the method .m could be useful for that.

Although now that you wrote it like that, I kind of like the functionality too. Hmm now I don' know what do I prefer 😅

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

Yeah, my problem was the arguments, hence that solution. Well, to my taste I don't particularly see the big need, doing:

for geom in read_geometry[:10]():
...

looks fine to me, albeit there may be some initial overhead.

@pfebrer
Copy link
Contributor

pfebrer commented Jun 15, 2023

Yes, but that is not the same, because it will read all the geometries. For example you can't break to stop the reading early. And you need to have enough memory to store all the geometries.

Another application of taking geometries on the fly could be to generate animations of trajectories.

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

Agreed. But the interface looks a bit complicated to me. Lets first start with the other things, and we can continue down the rabbit hole later ;)

@zerothi
Copy link
Owner Author

zerothi commented Jun 15, 2023

@tfrederiksen would you be ok with abandoning #578 and continue down this path outlined here?

I do believe that this might be more versatile in the long run, and a bit simpler for end-users, no weird arguments, just slices.
If that's agreed, I will work on moving all sile_read_multiple to this approach.

@tfrederiksen
Copy link
Contributor

Sorry, I didn't find time to study your proposal today. But from what I've seen in your exchange it looks good to me too.

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants