Is there an efficient way to `break`/`span`/`findIndex` from the right end? #172

leftaroundabout · 2017-06-05T00:47:59Z

Unlike with lists, for arrays these operations make quite as much sense from both ends of the data structure, yet vector only offers them from the left, i.e. searching over ascending indices.

Is it possible to write an efficient^† high-level implementation of the descending version, in terms of the already existing functions? Else I would like to request a right-starting version at least of findIndex, which would readily allow defining the others too.

^†_{By efficient I mean, comparable to a C implementation that runs an interrupted loop over a descending array-index. It should definitely be only O (n – k), not O (n) or O (k) where n is the number of elements and k the rightmost matching index.}

The text was updated successfully, but these errors were encountered:

leftaroundabout · 2017-06-05T01:12:14Z

It definitely is possible with foldl:

findIndexR :: (a->Bool) -> V.Vector a -> Maybe Int
findIndexR pred v = (V.length v-)
  <$> foldl (\a x -> if pred x
                      then Just 1
                      else succ<$>a) Nothing v

but I reckon that's a lot slower than the Bundle based left-starting version.

cartazio · 2017-06-05T02:01:02Z

if you look at the underlying implementation in https://github.com/haskell/vector/blob/master/Data/Vector/Fusion/Stream/Monadic.hs#L852

namely

-- | Yield 'Just' the index of the first element that satisfies the monadic
-- predicate or 'Nothing' if no such element exists.
findIndexM :: Monad m => (a -> m Bool) -> Stream m a -> m (Maybe Int)
{-# INLINE_FUSED findIndexM #-}
findIndexM f (Stream step t) = findIndex_loop SPEC t 0
  where
    findIndex_loop !_ s i
      = do
          r <- step s
          case r of
            Yield x s' -> do
                            b <- f x
                            if b then return $ Just i
                                 else findIndex_loop SPEC s' (i+1)
            Skip    s' -> findIndex_loop SPEC s' i
            Done       -> return Nothing

you'll see that the from the left version is tied deeply to the stream fusion representation.

a search from the right will not play well with the fusion framework as such, or at least i have trouble seeing how it would work this evening. perhaps @dolio has thoughts on this.

point being: its a valid combinator to want, but its also quite easy to quickly write in user space... and theres definitely the subtle issue of making sure it'd play nicely with the optimization / fusion expectations users have for vector.

basically: its a good idea in principal, but how do we make sure fusion fires nicely, or can it even with the current fusion framework?

leftaroundabout · 2017-06-06T08:50:16Z

Ok, once we're in the stream representation there's probably no way to get right-access efficiently. But what if we start out with a right-to-left stream? streamR seems to do just that. So, how about simply

findIndexR :: Vector v a => (a -> Bool) -> v a -> Maybe Int
findIndexR f v = fmap (length v-) . Bundle.findIndex f $ streamR v

Any caveats?

Edit the off-by-1 error pointed out: it should be fmap (length v - 1 -).

dolio · 2017-06-08T00:40:55Z

Three caveats.

I suspect streamR causes some materialization in general. Certainly I can't think of a way to have a framework based on streams that has nice behavior for combinations of things involving streaming from both the left and right. That doesn't really matter in this case, though, because:
v is used non-linearly, so you probably don't want this to fuse anyhow, because you'd be doing the work to compute v twice.
It seems like there's probably an off-by-1 error. :)

Discussed in haskell#172

Discussed in #172

toyboot4e · 2023-09-17T01:05:43Z

Hi 👋 Is the findIndexR implementation efficient enough as desired? Also, findIndexR is only in Data.Vector.Generic and not in for example Data.Vector.Unboxed. Is it intentional? Thank you.

leftaroundabout · 2023-09-18T16:23:52Z

@toyboot4e the benchmark indicates that it is efficient enough.

I certainly didn't intentionally withhold this from e.g. Data.Vector.Unboxed. That would be only a matter of defining and exporting a more type-specific version, perhapt you want to make a PR for this.

lehins · 2023-09-18T19:02:05Z

I don't think it is possible to implement those operations without materializing the vector. In other words they do break fusion. That being said, I think it would be totally fine for those operations not be integrated into the fusion framework.

So, if someone would like to submit a PR with those operations from the right, I'd be happy to add them to the API. Naturally, documentation would have to clearly indicate that stream fusion is not supported for those operations.

@leftaroundabout I've just looked at your implementation in #174
https://github.com/leftaroundabout/vector/blob/9de90486603193cadd7048b34f3cac55d01d4f42/Data/Vector/Generic.hs#L1523

and it just the same breaks fusion. In general, whenever a vector that is supplied as an argument to the function is used twice within the function that vector will be materialized. Moreover manual iteration will be a little bit faster, it is just the implementation in the benchmark was a little to lazy that caused it to be slower, see this PR: #468

None of the functions in this PR can be implemented without also looking at the full length of the vector, therefore we can't have fusion. Of course, I might be wrong and maybe there is some way to tap into the fact that we need to reverse the vector and streamR already has the knowledge of the full length of the vector, but I am pretty skeptical about it and I don't think it is worth the trouble.

Shimuuar · 2023-09-18T19:21:26Z

In fact we do have fusion support for operation that iterate vector from right. With rules like:

"streamR/unstreamR [Vector]" forall s.
  streamR (new (New.unstreamR s)) = s
...

However possible producers are rather limited. Only variants of scanr (pre/post/etc)

lehins · 2023-09-18T19:31:17Z

If anyone has a desire to tap into this fusion from the right for those functions, by all means feel free to do so 😁

I just don't think we should make this a requirement, at least for the initial implementation. I can certainly see how these functions can be useful.

cartazio · 2023-09-18T19:55:08Z

yeah, agreed, its a useful operation to support, but even with a stream rep, will not fuse terribly often

toyboot4e · 2023-09-19T11:29:21Z

I'm happy to see enthusiastic, kind vector devs!

I've submitted a findIndexR variants PR. I hope it's not terribly wrong 🙏

leftaroundabout added a commit to leftaroundabout/vector that referenced this issue Jun 10, 2017

A version of findIndex for the last matching element.

4a5ae11

Discussed in haskell#172

leftaroundabout mentioned this issue Jun 10, 2017

A version of findIndex for the last matching element. #174

Merged

leftaroundabout added a commit to leftaroundabout/vector that referenced this issue Sep 26, 2017

A version of findIndex for the last matching element.

be4961b

Discussed in haskell#172

leftaroundabout added a commit to leftaroundabout/vector that referenced this issue Jul 19, 2020

A version of findIndex for the last matching element.

ddfe934

Discussed in haskell#172

lehins pushed a commit that referenced this issue Jan 16, 2021

A version of findIndex for the last matching element.

5698270

Discussed in #172

toyboot4e mentioned this issue Sep 19, 2023

Export findIndexR from the non-generic vector modules #469

Merged

Shimuuar mentioned this issue Jan 16, 2024

Implement spanR & breakR #476

Merged

Shimuuar closed this as completed in #476 Jan 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Is there an efficient way to `break`/`span`/`findIndex` from the right end? #172

Is there an efficient way to `break`/`span`/`findIndex` from the right end? #172

leftaroundabout commented Jun 5, 2017

leftaroundabout commented Jun 5, 2017

cartazio commented Jun 5, 2017

leftaroundabout commented Jun 6, 2017 •

edited

Loading

dolio commented Jun 8, 2017

toyboot4e commented Sep 17, 2023

leftaroundabout commented Sep 18, 2023

lehins commented Sep 18, 2023

Shimuuar commented Sep 18, 2023

lehins commented Sep 18, 2023

cartazio commented Sep 18, 2023

toyboot4e commented Sep 19, 2023

Is there an efficient way to break/span/findIndex from the right end? #172

Is there an efficient way to break/span/findIndex from the right end? #172

Comments

leftaroundabout commented Jun 5, 2017

leftaroundabout commented Jun 5, 2017

cartazio commented Jun 5, 2017

leftaroundabout commented Jun 6, 2017 • edited Loading

dolio commented Jun 8, 2017

toyboot4e commented Sep 17, 2023

leftaroundabout commented Sep 18, 2023

lehins commented Sep 18, 2023

Shimuuar commented Sep 18, 2023

lehins commented Sep 18, 2023

cartazio commented Sep 18, 2023

toyboot4e commented Sep 19, 2023

Is there an efficient way to `break`/`span`/`findIndex` from the right end? #172

Is there an efficient way to `break`/`span`/`findIndex` from the right end? #172

leftaroundabout commented Jun 6, 2017 •

edited

Loading