Add support for padding with sentinel value for roll* #24

bkamins · 2022-10-05T07:17:22Z

Example:

julia> rollmean(1:5, 3)
3-element Vector{Float64}:
 2.0
 3.0
 4.0

julia> runmean(1:5, 3)
5-element Vector{Float64}:
 1.0
 1.5
 2.0
 3.0
 4.0

julia> [fill(missing, 2); rollmean(1:5, 3)]
5-element Vector{Union{Missing, Float64}}:
  missing
  missing
 2.0
 3.0
 4.0

It would be nice to add a kwarg to roll* to allow such padding, e.g. by writing rollmean(1:5, 3; pad=missing).

The text was updated successfully, but these errors were encountered:

JeffreySarnoff · 2022-10-05T08:29:10Z

What would the sentinel value do, simply provide the value with which to pad?
Is this different from the idea put forth on Discourse, or is it a reminder (which would be fine)?
If it is different, please recap what the other functionality is -- how it ought behave, Thanks.

bkamins · 2022-10-05T08:32:12Z

This is a new idea. The point is that e.g. rollmean shortens a vector. While runmean keeps the vector length, but shortens the window on initial values. Often one needs rollmean (i.e. no window shortening), but also to keep the original length of the vector (e.g. if you store it in a data frame that has a fixed number of rows). In such cases you need to add some sentinel (usually missing to make the vector returned by rollmean have the original length)

JeffreySarnoff · 2022-10-05T08:44:37Z

I can see how missing may want to occupy the low indices (in the past) or may want to occupy the high indices (current data, perhaps unresolvable with fewer observations than the window length), does this filler ever have reason to be within the data sequence rather than at the start or at the end? Does this filler ever have reason to be interspersed with the observations -- if so, to what end?

bkamins · 2022-10-05T08:58:54Z

This is how pandas does it in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html:

(I did not request for this full functionality, but something simpler, but if you want to investigate it then this is I think a standard reference)

In short:

pandas allows to specify when values are filled with np.nan (I proposed to be more flexible and allow any filler) this decision is set in min_periods parameter (I proposed not to add it rollingmean assumes that min_periods is just window size), but we could add it if you wanted
pandas puts filler at the beginning or evenly in beginning and at the end; this is handled by center argument (we could have three values - fill at beginning, fill at end, fill evenly on both sides; I do not see a reason to have a filler in the middle)

JeffreySarnoff · 2022-10-05T09:01:11Z

Introducing missing suggests that each application of a rolled function must first check ismissing or dispatch differently on encountering x::Missing. I could keep a displacement count instead, that would have no slow-down. If the displacement were signed to indicate from which place to offset, the early indices or the recent indices, then perhaps your request would be met without disrupting processing. For situations where there may be missing entries within the body of the data -- no events to observe on a weekend, but all dates are tracked -- a different approach is required. Maybe a choice of LOCF or avg of most recent 2/3/? observations? Any thoughts.

JeffreySarnoff · 2022-10-05T09:05:01Z

I prefer keeping advances to that which can be stated simply, and having specific requests gel together.
Nonetheless, there will be resituating the underlying computational-ness required. So, best to have a clear view of the essentials.

Is the discourse item replaced by this refinement, or is there a distinction I may have forgotten?

bkamins · 2022-10-05T09:09:08Z

So my original proposal was an essential. Currently user needs to do padding manually:

julia> [fill(missing, 2); rollmean(1:5, 3)] # fill at front
5-element Vector{Union{Missing, Float64}}:
  missing
  missing
 2.0
 3.0
 4.0

or

julia> [missing; rollmean(1:5, 3); missing] # centered
5-element Vector{Union{Missing, Float64}}:
  missing
  missing
 2.0
 3.0
 4.0

This is quite inconvenient, especially for new users of Julia.

What I would think makes sense:

keep the "default" implementation as is and make it as fast as possible
accept that if user passes some "padding"-related keyword argument then things can be slower, but should do what user needs

This is not a restatement of https://discourse.julialang.org/t/rollingfunctions-with-a-variable-window-width/87601.
In that thread OP wanted to be able to specify different window width for each observation. So this was a different requirement (this issue is a requirement for padding not for variable width).

JeffreySarnoff · 2022-10-05T09:15:09Z

I don't know how to read "specify a different window for each observation." Windows span multiple observations by definition. Does this mean at each observation use however many earlier observations to determine the current function value and that however many may be unique to each observation [index]? If so, "no". Your request above is much more widely useful.

JeffreySarnoff · 2022-10-05T09:33:05Z

Is there a reason to actually pad rather than know where to begin/end? The data sequence belongs to the client. I would be more comfortable providing a very easy, perhaps nearly invisible way to copy with padding inserted rather than grab the information and alter its e.g. type. A keyword or two would allow this. Another function that actually modifies the source data sequence could be exported for advanced users.

roll(fn, window, dataseq; pad = :first, skip = 20)
roll(fn, window, dataseq; padding = (from = :first, count = 20))

then currying or structifying (fn, window) is possible

bkamins · 2022-10-05T09:39:54Z

Does this mean at each observation use however many earlier observations to determine the current function value and that however many may be unique to each observation [index]?

This is what OP on Discourse wanted.

bkamins · 2022-10-05T09:44:21Z

Is there a reason to actually pad rather than know where to begin/end?

This is also OK, but would require an additional step later. The point is often users want to do something like:

julia> df = DataFrame(a=1:5)
5×1 DataFrame
 Row │ a
     │ Int64
─────┼───────
   1 │     1
   2 │     2
   3 │     3
   4 │     4
   5 │     5

julia> df.m = rollmean(df.a, 2)
ERROR: ArgumentError: New columns must have the same length as old columns

and now, as you see, they get an error.

They could do:

julia> df.m = runmean(df.a, 2)
5-element Vector{Float64}:
 1.0
 1.5
 2.5
 3.5
 4.5

julia> df
5×2 DataFrame
 Row │ a      m
     │ Int64  Float64
─────┼────────────────
   1 │     1      1.0
   2 │     2      1.5
   3 │     3      2.5
   4 │     4      3.5
   5 │     5      4.5

but often they want the first observation that does not have a full window size to be some sentinel (like missing) rather than applied function just with a shorter window.

JeffreySarnoff · 2022-10-05T09:45:19Z

ahh. ic

kpa28-git · 2023-01-09T00:44:10Z

@bkamins
I'm developing a fork of RollingFunctions that has padded rolling function mappadroll: https://github.com/kevindirect/MapSlide.jl
It's a WIP, but it does work.

bkamins · 2023-01-09T08:30:41Z

Thank you for this work.

@JeffreySarnoff - what is the status of your efforts towards 1.0 release of RollingFunctions.jl?

JeffreySarnoff · 2023-01-10T16:52:02Z

reasonably good -- external constraints put the prerelease at Saturday Jan 21 and the release (or release candidate) at Sunday Feb 5th.

bkamins · 2023-01-18T11:30:54Z

@JeffreySarnoff - I saw a mention in this issue, but now I do not see it, so I am not sure what it was exactly. However, let me re-state the original request. When using rolling functions to get the result of the same length as input vector padded with some sentinel (missing is a reasonable default).

JeffreySarnoff · 2023-01-18T11:35:34Z

Got that.. What is bugging me is that I cannot process your views without including all of data frames. Would you consider a future release that makes separate a least necessary package `DataFramesCore.jl` that would do just enough so I could use it and define `expansive_view = Union{AbstractArray, AbstractDataFrame}`?

…

On Wed, Jan 18, 2023, 6:31 AM Bogumił Kamiński ***@***.***> wrote: @JeffreySarnoff <https://github.com/JeffreySarnoff> - I saw a mention in this issue, but now I do not see it, so I am not sure what it was exactly. However, let me re-state the original request. When using rolling functions to get the result of the same length as input vector padded with some sentinel (missing is a reasonable default). — Reply to this email directly, view it on GitHub <#24 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAM2VRXS7B4ODKSAQGI3HBLWS7H7TANCNFSM6AAAAAAQ5IWG7M> . You are receiving this because you were mentioned.Message ID: ***@***.***>

bkamins · 2023-01-18T13:22:39Z

What is bugging me is that I cannot process your views without including all of data frames.

Can you please explain the problem? I would assume that RollingFunctions.jl does not need to be aware of DataFrames.jl at all. It takes a vector as an input and should produce vector in output.

JeffreySarnoff · 2023-01-18T14:03:05Z

That is true. People have requested that it work with DataFrame column views, and just for general symmetry -- that we have two well developed sorts of view and there is no common superview type nor a good way to fake it with a Union is less than ideal. I will forget about it though.

bkamins · 2023-01-18T14:04:41Z

Indeed you can get a view as an input to your rolling function. But it should not matter I thought. I assumed that I would have a signature that restricts input type to AbstractVector without caring if it is a view or not.

JeffreySarnoff · 2023-01-18T14:08:31Z

Yes, I use AbstractVector{T} and also AbstractMatrix{T} (for running some function over multiple columns at one go).

bkamins · 2023-01-18T14:11:02Z

And why this is not enough?

JeffreySarnoff · 2023-01-18T14:12:29Z

these are vector examples (fyi)

padding examples start with examples that do not pad https://github.com/JeffreySarnoff/RollingFunctions.jl/blob/v1/docs/src/intro/rolling.md
and continue with examples that pad https://github.com/JeffreySarnoff/RollingFunctions.jl/blob/v1/docs/src/intro/padding.md

JeffreySarnoff · 2023-01-18T14:15:44Z

it is -- really it all goes well.
I have been musing about some shallow type hierarchies for Julia v2.

bkamins · 2023-01-18T14:17:28Z

Thank you, https://github.com/JeffreySarnoff/RollingFunctions.jl/blob/v1/docs/src/intro/padding.md seems exactly what is needed (in other frameworks there is also an option to pad in the middle so that half of padding is in front and half of padding is at end).

Conceptually this means that if we move the window is the observation to which we attach the result in:

the beginning of the window (pad last)
in the center of the window (pad both ends equally)
in the end of the window (pad at beggining)

Padding to the center can be added later if it is a problem (but it would be good to keep in mind that people sometimes want to say that Tuesday is a mean of Monday, Tuesday and Wednesday).

Regarding the types challenge - I discussed the issue with Jeff Bezanson yesterday - and it is a tough thing.

JeffreySarnoff · 2023-01-18T14:19:18Z

with padding in the middle and an odd number of paddings, which end gets more or is that another settable choice?

JeffreySarnoff · 2023-01-18T14:21:12Z

I had considered letting people specify n1 pads at the start and n2 pads at the end -- that seemed an opportunity for people to get it wrong.

bkamins · 2023-01-18T14:22:59Z

which end gets more or is that another settable choice?

😄. We seem to think alike. I was copying the functionality that people asked for, but when I did this I started thinking about exactly the same problem. Maybe a more general solution would be to add a numeric padding_offset (tentative name). Where when it is set to 0 (the default and minimum allowed size) we get padding first, and if you set it to window_size-1 (the maximum allowed size) you get padding last.

(now I see it is similar to what you said with n1 and n2, but I think it is clearer to use window_size and offset).

JeffreySarnoff · 2023-01-23T20:24:38Z

At the next-to-last minute, I decided to keep compatibility with current code.
Realigning to that -- still expect to unveil on Sunday Feb 5th.

JeffreySarnoff · 2023-02-12T04:46:41Z

Here be something (unregistered while I heal).

using Pkg
Pkg.add(url="https://github.com/JeffreySarnoff/RollingFunctions.jl",rev="main")

The entire edifice has been reworked.
You can pad at the start or at the end with almost any value.
You can copy padding values from a vector, in order.
This helps with connecting one roll to another.
This keeps the result clean when a function is not well defined until it has more than 1, 2, .. values.

The current docs reflect the tests and other working details in need of tests.

IncrementalAccumulators exists to be incorporated, and will replace the older implementations (rollmean, ...).

this would add substantive performance

I have not been able to figure out how to get @turbo to work with this. Experiments show @turbo would improve the performance of many internal functions by 2x-3x. In the example below, data is a view, and using @views on the result line adds some throughput to the more direct internal functions.

lo = 1
hi = window_span
@turbo for idx in 1:n
  result[idx] = window_fn(data[lo:hi])
  lo += 1
  hi += 1
end

fails with

ERROR: MethodError: no method matching 
    _gesp(::LayoutPointers.StridedPointer{Float64, 1, …}, 
              ::Static.StaticInt{1}, ::UnitRange{Int64}, ::Static.StaticInt{1})

bkamins · 2023-02-12T09:14:02Z

Thank you for working on this. I have checked the code and have the following comments:

Sentinel value in padding

You use Nothing as sentinel. It would be better to define an internal:

struct NoPadding end

structure and set NoPadding() as a default (and do not export NoPadding).

The reason is that Nothing is a valid value in Base Julia. In a case when someone wanted to pad with it it will not work as expected (I agree it is a corner case, but I usually prefer a clean design):

julia> rolling(sum, [1, 2, 3], 2, padding=Missing)
3-element Vector{Union{Int64, DataType}}:
  Missing
 3
 5

julia> rolling(sum, [1, 2, 3], 2, padding=Nothing) # unexpected
2-element Vector{Int64}:
 3
 5

In some cases `rolling` errors unexpectedly

Here is an example:

julia> rollmax([1, 2, 3], 2)
2-element Vector{Int64}:
 2
 3

julia> rolling(max, [1, 2, 3], 2)
ERROR: BoundsError: attempt to access 0-element Vector{Any} at index [1]

I would have expected these two operations to be equivalent (but I might be wrong).

padding is supported only for `rolling`

Example:

julia> rolling(mean, [1, 2, 3], 2, padding=missing)
3-element Vector{Union{Missing, Float64}}:
  missing
 1.5
 2.5

julia> rollmean([1, 2, 3], 2, padding=missing)
ERROR: MethodError: no method matching rollmean(::Vector{Int64}, ::Int64; padding::Missing)

I would expect padding and padlast work in both cases

In general functions lack docstrings

Example:

help?> rolling
search: rolling RollingFunctions rollmin rollmedian rollvariation rollkurtosis rollmad_normalized

  No documentation found.

  RollingFunctions.rolling is a Function.

I think it would be very useful to add them.

If you would like my help with any of these points please comment and I will gladly do it.

JeffreySarnoff · 2023-02-15T01:16:02Z

supplemental information attached (to be updated on occasion)

RollingFunctions.jl v1

notes for JeffreySarnoff, bkamins 2023-02-15

Looking Back
Looking Forward
- design goals
Technical Notes

Looking Back

tapering

When I researched rolling functions over windowed data initially (2017), there were packages and products that provided padding with a given value. I saw nothing about tapering to provide values for incomplete window spans. To the best of my knowledge (without claiming anything more) RunningFunctions.jl introduced efficient, well-structured, and functionally broad tapering to many. Now, this capability appears in many packages and in multiple languages.

genericity

The implementation followed a few stylized approaches, allowing reasonably concise and compact code to cover many sorts of windowed summarization.

omissions

most often noted

There was no internal support for padding.
There was no type-specific transfer of eltype(data) to eltype(result)
There were no function-specific accelerations
- incremental sum and other incrementable statistics

others of import

[add here]

Looking Forward

design goals

api consistancy

throughout
- style
- naming
- utilization
features
- organization
- syntax
- semantics

argument order

positional args
- datasequence must be positional to support dispatch properly.
- summarizer should be positional to support curried compilation.
- windowspan should be positional to support optimized library building.
compatible with v0.7

exportedfunction(summarizer, datasequence, windowspan)

preferred over v0.7 order

Many clients will use some preferred set of windowspans with some selected set of summarization functions, each specialized to the nature of the datasequence. This suggests that the datasequence appear after the windowspan and the summarizer args. These two orderings meet that constraint. Which is preferable in use with respect to currying and maybe some use of closures?

exportedfunction(summarizer, windowspan, datasequence)
exportedfunction(windowspan, summarizer, datasequence)

keyword args
The padding specifier could be a kwarg. As a positional arg with a default value, pads that not given and pads that are a given value and pads that are a vector all support distinct direct dispatch. The padding qualifier padlast::Bool=false seems best as a kwarg. Does that mean making padding=NoPadding a kwarg is better for consistancy? Or is the built-in dispatch as a presumably final positioned arg more beneficial?

performance

current best practice
- and better practices for situational advantage
SIMD almost everywhere
- If reasonable to do
  - @[t]turbo with function application over an indexed span
    - using views over the indexing
- Otherwise
  - Take advantage of SIMD.jl using an approach that is
    - robustly applicable
    - smooth / clear / natural
Multithreading wherever helpful

Technical Notes

taper

current approach
enhancements

pad

with a specified value
with an AbstractVector
with an NTuple?

pad then taper

with a specified value
with an AbstractVector
with an NTuple?

consistant relationships

- the unit is _tally of successive indices_

n = length(data[column])

s = window_span
u = window_prespan == s - 1

p = count of indices to pad (0 or initial indices in 1..u)
t = count of indices to taper (0 or u-p indices adjacent to the initial p)

u == p + t   ∧   p, t ∈ [0..u]

JeffreySarnoff · 2023-02-15T02:09:29Z

re: padding is supported only for rolling
This was intentional (not the lack of support) -- the prior versions of roll<op> (v0.7) are to be improved usiing implementation logic from IncrementalAccumulators. These should perform much better.

They all are of a consistent, flexible design and share a nice api (thank you).

JeffreySarnoff · 2023-02-16T03:56:15Z

The new, more capable api is call signature compatible with release v0.7.x -- that is, call signatures used with high level exports of v0.7,x (rolling and running) yield very nearly the same results when used with the new api as it is specified at present.

We could support greater generality with fewer lines of code by changing from distinguishing functions of arity1, 2, 3, and 4 by dispatch over 1, 2, 3, or 4 vectors to a Tuple form and capturing the number of args (as given by N, the number of vectors):

op( windowfunc:::Function, windowspan:::Integer, 
      data::Tuple{Vararg{<:AbstractVector,N}})  where {N}

We would code unary window functions explicitly: (a) providing well-grounded reference implementations code (b) providing evaluative optimizations and applicative speed-ups.

op( windowfunc:::Function, windowspan:::Integer, data::AbstractVector[T})  where {T}

And we could do the same with window functions of 2 args, as e.g. cov, cor are widely used
so some code optimization makes sense. While we may choose code 2 arg functions explicitly, that decision is not complexity driven. Using a generalized , shared signature implementation for 2+ arg functions may make better sense. Either way, we process the generalized input using looping with reductive computation in concert with gobbling observations to supply the args simultaneous-ish-ly.

Gaining performance (2x-3x in throughput generally, 5x-10x and in reappearing client situations) is available (sometimes not straightforward, sometimes requires additional dev work with third party packages). Work applying e.g. LoopVectorization, SIMD, ..? ?.. to our task is necessary.

JeffreySarnoff · 2023-02-16T06:17:00Z

@bkamins The previous note explains my remaining high-level API decision for v1.

(+) I prefer to support rolling/running a multi-arity functions that way.
(-) Clients now rolling cov or cor would need to wrap data as 2-tuples.

 - (old) `cor(window_span, datavec1, datavec2)
 - (new) `cor(window_span, (datavec1, datavec2))

What do see?

bkamins · 2023-02-16T08:00:04Z

I am not fully clear what the benefit of cor(window_span, (datavec1, datavec2)) over cor(window_span, datavec1, datavec2) is.

Two considerations:

With Tuple notation it seems not consistent if you should pass a single vector unwrapped or wrapped in a tuple like this (vector,).
Also you do not mention how you would want to pass weight vectors, which I think also should be thought of.
Finally, I think we should try to ensure that we can deprecate old syntax (but keep supporting it with depwarn + add a new syntax).

For example, looking at the signature of rolling I imagine that the way you could define new signature could be:

rolling(window_fn<:Function, data::AbstractVector...;
        window_span::Int, # kwarg, required
        weights::Union{AbstractVector, Nothing}=nothing, # kwarg, optional
        padding=NoPadding(): # NoPadding unexported sentinel type instance,
        padlast::Bool=false)

also with this approach the pre-1.0 release signatures could be kept as deprecated

JeffreySarnoff · 2023-02-16T10:03:23Z

Tuple wrapping the two data vectors provided for two-arg window_fns is less compelling than is tuple wrapping for 3..12 arg window_fns. Clients working with timestamped and time-aggregated (the reading at 5min intervals or the trimmed mean taken through 5min intervals) financial market data often obtain many field values for every stamped or marked time.

These 6 data items provide market information about the current trading day as it concludes: PriorClose, Open, High, Low, Close, Volume. Estimators, Indicators, and Accuracy/Loss Measures are developed through their Arithmetic and Algebraic interconnections. The more compelling use provides clients with a semantically consistent, syntactically uniform, generalized, and robust mechanism for working with the whole of it through any parts partitioning deemed worthwhile.

bkamins · 2023-02-16T10:43:51Z

Tuple wrapping the two data vectors provided for two-arg window_fns is less compelling than is tuple wrapping for 3..12 arg window_fns.

Can you please explain on an example what you mean exactly?

If we discuss convenience then I believe that not requiring to wrap values in a tuple is more convenient.
The reason is that if you have a tuple you can easily splat it. However, if you do not have a tuple and it is required then you would need to create it.

JeffreySarnoff · 2023-03-02T05:23:13Z

Why imo window_span is better as an early positional arg than as a kwarg:
window_fn and window_span in
rolling(window_fn<:Function, window_span::Int, args...; kwargs...)
are a natural subspecification pairing (roll or run some fn over some span for various data) it is convenient to simplify currying over the two when reusing an approach that has been developed for those specs (e.g. a 200-day exponential moving average).

There may be more than one data vector and there may be one weighting for each data vector [multiple weight vecs is a feature that seems would simplify some windowed summarizations]. Only one sort can be a Vararg. The data vecs are likely to change more often than the weight vecs. However weight vecs are not always used, while data vecs are. Having the weight vec[s] available as a positional arg simplifies dispatch somewhat -- so perhaps

rolling(window_fn<:Function, window_span::Int, data::AbstractVector...;
padding=NoPadding(): # NoPadding unexported sentinel type instance,
weights::Union{AbstractVector, NTuple{N,AbstractVector}, Nothing}=nothing, # kwarg, optional
)

bkamins · 2023-03-02T09:36:30Z

OK - I see your point. Let me ask on Slack on #data what users would think. Thank you!

bkamins · 2023-03-02T09:37:08Z

Ah - and also we discussed that the window_span could be a vector (different window span for different observations).

nilshg · 2023-03-02T12:03:13Z

Just to bring over what I said on Slack:

I voted for positional arg originally, as I thought footguns of the type optimize(f, lower, upper, initial) where one easily can confuse the three last positional args don't apply given that window_span is an integer. I didn't realize that both window_span and data can be vectors now, somewhat increasing the likelihood of footguns.

I then wondered whether there are any constraints about the arguments which could be used to error early and prevent users from silently getting wrong results, e.g. if we require length(data) > length(window_span) that could be used to prevent users getting it wrong.

My assumption is that the vast majority of use cases is still the simple rolling(f, 3, data) so it would be nice to keep the more succinct positional arg version.

jeremiedb · 2023-03-02T16:32:24Z

Would the integration of the equivalent of R's roll's min_obs be desirable?
I for example have use cases where I want to roll over a length of given width, but still prefer to have non missing returned if less data than the width is present:

> roll_sum(1:3, width = 2)
[1] NA  3  5

> roll_sum(1:3, width = 2, min_obs = 1)
[1] 1 3 5

> roll_sum(1:3, width = 4, min_obs = 2)
[1] NA  3  6

Note that in the last case, although it may seem wrong to specify a width longer than the actual data, such situation may legitimately happens in cases where such rolling function is passed to a grouped DataFrame whose length isn't guaranteed to be >= window_span.

JeffreySarnoff · 2023-03-02T17:13:02Z

To roll a function F over a data vector of length N using a window_span of W where W > N is the same as applying F to the (entire) data vector. That is reasonable iff N is large enough so F(data) is numeric (e.g. N > k for some k that depends on the nature of F .. which of F([]), F([x]), F([x1,x2]) is legit). The responsibility would be on the caller.

Note that there are analytic issues with doing this; consider situations where the result is compared or combined with other results that have been applied to data of length N >= W. This "feature" should not be used in such situations. I think it may be overkill to provide a keyword arg to permit this -- so it would be all on the user. For less experienced users, that is problematic. What do you think?

JeffreySarnoff · 2023-03-02T18:37:01Z

from @nilshg : Maybe my point wasn't clear - length(window_span) < length(data) to me means the vector of window spans is shorter than the data, i.e. it wouldn't make sense to supply 10 window lenghts if I only have 5 observations.

Functions will ignore any extra spans (spans in a vector of window spans beyond where cumsum(spans) exceeds length(data). If the last useful span extends beyond length(data), either the prior span could be extended or the that last span could be shrunk or that last span could be trimmed. Preferences?

jeremiedb · 2023-03-02T19:17:10Z

I'm not strongly opiniated on this, just thought that min_obs could be a nice to have flexibility. Since it's meant to be an optional kwarg that defaults to width, I wouldn't expect this to be exposed so much to foot shooting as the user needs to be intentional in its usage of this option.

JeffreySarnoff · 2023-03-07T23:00:58Z

There was no post giving an example for wanting to process distinct window_spans while traversing a vector (e.g. the first 12 obs, then the next 18 obs ..). My guess is that the request was to process a sequence of window_spans, one-at-a-time, without explicitly coding the loop.

bkamins · 2023-03-08T06:41:21Z

If it is not clear how winow_spans as a vector should be processed now I guess we can skip it, but I would propose to have a design that ensures it can be added (in general: whatever is available in R or pandas sooner or later will be asked for from my DataFrames.jl experience 😄).

JeffreySarnoff · 2023-03-08T06:45:40Z

One way to allow support for that: there would be a kwarg windows_spans or some such that would override the positional arg if it were set to a vector or tuple.

JeffreySarnoff · 2023-03-08T06:47:30Z

Otherwise, it becomes difficult to know the data vectors from the span vector (without always making the span a kwarg -- which we discussed above)

bkamins · 2023-03-08T06:51:16Z

Otherwise, it becomes difficult to know the data vectors from the span vector (without always making the span a kwarg -- which we discussed above)

This is the reason I preferred kwarg over positional arg as I believed it is more flexible for future development.

JeffreySarnoff · 2023-03-08T07:00:16Z

I will allow an Int and allow NTuple{N,Int} for window_span without implementing the NTuple version yet.

BeitianMa · 2023-03-11T08:15:02Z

In the future, will the keyword of padding be applied to rolling functions in general? As I do research in asset pricing, I often run rolling regressions of stock returns on certain other variables, and it would be handy if the following procedure worked

beta = rolling(get_beta, df.return, df.char, rolling_window, padding=missing)

But despite now needing to manually add padding values, your package has already given me a 20x+ speedup over Python's rolling.apply() solution, so thank you very much for your work!

JeffreySarnoff · 2023-04-09T09:34:33Z

@BeitianMa yes and it is working in the dev v1 Pkg.add(url="https://github.com/JeffreySarnoff/RollingFunctions.jl",rev="v1")
maybe like this beta = rolling(get_beta, rolling_window, df.return, df.char; padding=missing) JUST FOR TESTING
I assume your get_beta works this way get_beta(df.return[..], df.char[..])

JeffreySarnoff · 2023-04-09T09:34:47Z

@bkamins I have settled on an implementation pattern

rolling(func, span, datavec1; padding<=nopadding>, padlast<=false>)
and, where weights<:StatsBase.AbstractWeights
rolling(func, span, datavec1, weights1; padding<=nopadding>, padlast<=false>)
are passing their tests, however more comprehensive tests are indicated

JeffreySarnoff · 2023-06-19T04:12:15Z

On development version 0.9.75 v1 adjacent
Test coverage is decent and all tests pass.

the image of v1

Next is to comb the docs into relevance. meanwhile:

# rolling advances one step at a time

rolling(func, window_span, data_vec [, data_vec2[, data_vec3]] 
           [, StatsBase.weights [,weights2[, weights3]] ];
           padding=nopadding, atend=false)

rolling(func, window_span, data_mat
            [, StatsBase.weights [...]]; 
            padding=nopadding, atend=false)
#          func is applied to each [[individually] weighted] column of the data_mat

# tiling gulps span elements and then advanced over those same elements,
#    to gulp the successor span elements ..

tiling(func, window_span_is_tile_span, data_vec [, data_vec2[, data_vec3]] 
           [, StatsBase.weights [,weights2[, weights3]] ];
           padding=nopadding, atend=false)

tiling(func, window_span_is_tile_span, data_mat 
            [, StatsBase.weights [...]]; 
           padding=nopadding, atend=false)
#         func is applied to each [[individually] weighted] column of the data_mat

# running tapers the function at the start or at the end of the data

running(func, window_span, data_vec [, data_vec2[, data_vec3]] 
              [, StatsBase.weights [,weights2[, weights3]] ];
             atend=false)

running(func, window_span, data_mat \
             [, StatsBase.weights [...]];
             atend = false)
#           func is applied to each [[individually] weighted] column of the data_mat

[roll|tile|run]_
where _ is {min, max, extrema, mean, sum, var, std, cov, cor}

JeffreySarnoff · 2023-07-14T00:20:02Z

Early access to RollingFunctions.jl v1 is given as WindowedFunctions.jl.
This version does more, has padding, and provides tiling. (see the docs).

add the package
`Pkg.add(url="https://github.com/JeffreySarnoff/WindowedFunctions.jl")`

This version is radically redesigned. The new approach offers the most requested capabilities,
provides much flexibility, and performs well. All of the widely used call signatures will remain
supported for many months following the official release of v1, identified as deprecated calls
with detailed information about how to revise them.

It would be a great help if you would take this prerelease for a walk.
Any issues or suggestions should be posted here.

Add support for padding with sentinel value for roll* #24

Add support for padding with sentinel value for roll* #24

Comments

bkamins commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022

bkamins commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022

bkamins commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022 • edited Loading

bkamins commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022 • edited Loading

bkamins commented Oct 5, 2022 • edited Loading

bkamins commented Oct 5, 2022

JeffreySarnoff commented Oct 5, 2022

kpa28-git commented Jan 9, 2023

bkamins commented Jan 9, 2023

JeffreySarnoff commented Jan 10, 2023 • edited Loading

bkamins commented Jan 18, 2023

JeffreySarnoff commented Jan 18, 2023 via email • edited Loading

bkamins commented Jan 18, 2023

JeffreySarnoff commented Jan 18, 2023

bkamins commented Jan 18, 2023

JeffreySarnoff commented Jan 18, 2023

bkamins commented Jan 18, 2023

JeffreySarnoff commented Jan 18, 2023

JeffreySarnoff commented Jan 18, 2023

bkamins commented Jan 18, 2023 • edited Loading

JeffreySarnoff commented Jan 18, 2023

JeffreySarnoff commented Jan 18, 2023

bkamins commented Jan 18, 2023

JeffreySarnoff commented Jan 23, 2023

JeffreySarnoff commented Feb 12, 2023

bkamins commented Feb 12, 2023

Sentinel value in padding

In some cases rolling errors unexpectedly

padding is supported only for rolling

In general functions lack docstrings

JeffreySarnoff commented Feb 15, 2023 • edited Loading

RollingFunctions.jl v1

notes for JeffreySarnoff, bkamins 2023-02-15

Looking Back

tapering

genericity

omissions

most often noted

others of import

Looking Forward

design goals

api consistancy

argument order

performance

Technical Notes

taper

pad

pad then taper

consistant relationships

JeffreySarnoff commented Feb 15, 2023

JeffreySarnoff commented Feb 16, 2023 • edited Loading

JeffreySarnoff commented Feb 16, 2023

bkamins commented Feb 16, 2023 • edited Loading

JeffreySarnoff commented Feb 16, 2023

bkamins commented Feb 16, 2023

JeffreySarnoff commented Mar 2, 2023

bkamins commented Mar 2, 2023

bkamins commented Mar 2, 2023

nilshg commented Mar 2, 2023

jeremiedb commented Mar 2, 2023

JeffreySarnoff commented Mar 2, 2023 • edited Loading

JeffreySarnoff commented Mar 2, 2023

jeremiedb commented Mar 2, 2023

JeffreySarnoff commented Mar 7, 2023

bkamins commented Mar 8, 2023

JeffreySarnoff commented Mar 8, 2023

JeffreySarnoff commented Mar 8, 2023

bkamins commented Mar 8, 2023

JeffreySarnoff commented Mar 8, 2023

BeitianMa commented Mar 11, 2023 • edited Loading

JeffreySarnoff commented Apr 9, 2023

JeffreySarnoff commented Oct 5, 2022 •

edited

Loading

JeffreySarnoff commented Oct 5, 2022 •

edited

Loading

bkamins commented Oct 5, 2022 •

edited

Loading

JeffreySarnoff commented Jan 10, 2023 •

edited

Loading

JeffreySarnoff commented Jan 18, 2023 via email •

edited

Loading

bkamins commented Jan 18, 2023 •

edited

Loading

In some cases `rolling` errors unexpectedly

padding is supported only for `rolling`

JeffreySarnoff commented Feb 15, 2023 •

edited

Loading

JeffreySarnoff commented Feb 16, 2023 •

edited

Loading

bkamins commented Feb 16, 2023 •

edited

Loading

JeffreySarnoff commented Mar 2, 2023 •

edited

Loading

BeitianMa commented Mar 11, 2023 •

edited

Loading

JeffreySarnoff commented Apr 9, 2023 •

edited

Loading

JeffreySarnoff commented Jun 19, 2023 •

edited

Loading