Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there an efficient way to break/span/findIndex from the right end? #172

Closed
leftaroundabout opened this issue Jun 5, 2017 · 11 comments · Fixed by #476
Closed

Is there an efficient way to break/span/findIndex from the right end? #172

leftaroundabout opened this issue Jun 5, 2017 · 11 comments · Fixed by #476

Comments

@leftaroundabout
Copy link
Contributor

Unlike with lists, for arrays these operations make quite as much sense from both ends of the data structure, yet vector only offers them from the left, i.e. searching over ascending indices.

Is it possible to write an efficient high-level implementation of the descending version, in terms of the already existing functions? Else I would like to request a right-starting version at least of findIndex, which would readily allow defining the others too.


By efficient I mean, comparable to a C implementation that runs an interrupted loop over a descending array-index. It should definitely be only O (n ­– k), not O (n) or O (k) where n is the number of elements and k the rightmost matching index.

@leftaroundabout
Copy link
Contributor Author

It definitely is possible with foldl:

findIndexR :: (a->Bool) -> V.Vector a -> Maybe Int
findIndexR pred v = (V.length v-)
  <$> foldl (\a x -> if pred x
                      then Just 1
                      else succ<$>a) Nothing v

but I reckon that's a lot slower than the Bundle based left-starting version.

@cartazio
Copy link
Contributor

cartazio commented Jun 5, 2017

if you look at the underlying implementation in https://github.com/haskell/vector/blob/master/Data/Vector/Fusion/Stream/Monadic.hs#L852

namely

-- | Yield 'Just' the index of the first element that satisfies the monadic
-- predicate or 'Nothing' if no such element exists.
findIndexM :: Monad m => (a -> m Bool) -> Stream m a -> m (Maybe Int)
{-# INLINE_FUSED findIndexM #-}
findIndexM f (Stream step t) = findIndex_loop SPEC t 0
  where
    findIndex_loop !_ s i
      = do
          r <- step s
          case r of
            Yield x s' -> do
                            b <- f x
                            if b then return $ Just i
                                 else findIndex_loop SPEC s' (i+1)
            Skip    s' -> findIndex_loop SPEC s' i
            Done       -> return Nothing

you'll see that the from the left version is tied deeply to the stream fusion representation.

a search from the right will not play well with the fusion framework as such, or at least i have trouble seeing how it would work this evening. perhaps @dolio has thoughts on this.

point being: its a valid combinator to want, but its also quite easy to quickly write in user space... and theres definitely the subtle issue of making sure it'd play nicely with the optimization / fusion expectations users have for vector.

basically: its a good idea in principal, but how do we make sure fusion fires nicely, or can it even with the current fusion framework?

@leftaroundabout
Copy link
Contributor Author

leftaroundabout commented Jun 6, 2017

Ok, once we're in the stream representation there's probably no way to get right-access efficiently. But what if we start out with a right-to-left stream? streamR seems to do just that. So, how about simply

findIndexR :: Vector v a => (a -> Bool) -> v a -> Maybe Int
findIndexR f v = fmap (length v-) . Bundle.findIndex f $ streamR v

Any caveats?

Edit the off-by-1 error pointed out: it should be fmap (length v - 1 -).

@dolio
Copy link
Contributor

dolio commented Jun 8, 2017

Three caveats.

  1. I suspect streamR causes some materialization in general. Certainly I can't think of a way to have a framework based on streams that has nice behavior for combinations of things involving streaming from both the left and right. That doesn't really matter in this case, though, because:

  2. v is used non-linearly, so you probably don't want this to fuse anyhow, because you'd be doing the work to compute v twice.

  3. It seems like there's probably an off-by-1 error. :)

leftaroundabout added a commit to leftaroundabout/vector that referenced this issue Jun 10, 2017
leftaroundabout added a commit to leftaroundabout/vector that referenced this issue Sep 26, 2017
leftaroundabout added a commit to leftaroundabout/vector that referenced this issue Jul 19, 2020
lehins pushed a commit that referenced this issue Jan 16, 2021
@toyboot4e
Copy link
Contributor

Hi 👋 Is the findIndexR implementation efficient enough as desired? Also, findIndexR is only in Data.Vector.Generic and not in for example Data.Vector.Unboxed. Is it intentional? Thank you.

@leftaroundabout
Copy link
Contributor Author

@toyboot4e the benchmark indicates that it is efficient enough.

I certainly didn't intentionally withhold this from e.g. Data.Vector.Unboxed. That would be only a matter of defining and exporting a more type-specific version, perhapt you want to make a PR for this.

@lehins
Copy link
Contributor

lehins commented Sep 18, 2023

I don't think it is possible to implement those operations without materializing the vector. In other words they do break fusion. That being said, I think it would be totally fine for those operations not be integrated into the fusion framework.

So, if someone would like to submit a PR with those operations from the right, I'd be happy to add them to the API. Naturally, documentation would have to clearly indicate that stream fusion is not supported for those operations.

@leftaroundabout I've just looked at your implementation in #174
https://github.com/leftaroundabout/vector/blob/9de90486603193cadd7048b34f3cac55d01d4f42/Data/Vector/Generic.hs#L1523

and it just the same breaks fusion. In general, whenever a vector that is supplied as an argument to the function is used twice within the function that vector will be materialized. Moreover manual iteration will be a little bit faster, it is just the implementation in the benchmark was a little to lazy that caused it to be slower, see this PR: #468

None of the functions in this PR can be implemented without also looking at the full length of the vector, therefore we can't have fusion. Of course, I might be wrong and maybe there is some way to tap into the fact that we need to reverse the vector and streamR already has the knowledge of the full length of the vector, but I am pretty skeptical about it and I don't think it is worth the trouble.

@Shimuuar
Copy link
Contributor

In fact we do have fusion support for operation that iterate vector from right. With rules like:

"streamR/unstreamR [Vector]" forall s.
  streamR (new (New.unstreamR s)) = s
...

However possible producers are rather limited. Only variants of scanr (pre/post/etc)

@lehins
Copy link
Contributor

lehins commented Sep 18, 2023

If anyone has a desire to tap into this fusion from the right for those functions, by all means feel free to do so 😁

I just don't think we should make this a requirement, at least for the initial implementation. I can certainly see how these functions can be useful.

@cartazio
Copy link
Contributor

yeah, agreed, its a useful operation to support, but even with a stream rep, will not fuse terribly often

@toyboot4e
Copy link
Contributor

I'm happy to see enthusiastic, kind vector devs!

I've submitted a findIndexR variants PR. I hope it's not terribly wrong 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants