Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [CompilerPerf] Seq - the next generation. #2745

Closed
wants to merge 122 commits into from

Conversation

manofstick
Copy link
Contributor

@manofstick manofstick commented Mar 31, 2017

A rewrite of Seq. This will be a branch which is maintained from the head of #2587, which is the underlying tech.

Things to consider about to exist [WIP] status (unedited from comments below):

  • The main one is should ISeq be exposed beyond it's bare minimum rrequirements. I think it has value, especially as I have made Seq a bit more of the swiss-army knife, using the ISeq, List or Array modules are most appropriate.

    [dsyme] Resolution: No. I think the focus for now should be on having ISeq be as private as possible while still getting performance gains

  • Should an FSharpFunc be part of the public ISeq interface? It's relatively easy to convert that signature of Fold to take an interface for the Folder creation.

    [dsyme] Resolution: No. I think the focus for now should be on having ISeq be as private as possible while still getting performance gains

  • I have for the most part really tried to minimize object construction. This could be further reduced to collapse combined ITransformFactory objects into the same objects that implement ISeq. It comes at the expense of an extra generic parameter to those objects though, so I think it would just make more runtime code bloat. I tried it and it also slowed code down, so I didn't go ahead.

    [dsyme] ok, marking as resolved, we won't do this

  • On the flipside of runtime code bloat I could move the main iteration code out of the fold methods so that that code doesn't rely on the Result generic parameter (and adding another layer in between Activity and Folder which doesn't take that parameter). I didn't take time to measure runtime size, but it did slow things down a tad, so I didn't go ahead.

    [dsyme] ok, marking as resolved, we won't do this

  • I'm not really happy with the Thin and Enumerable monikers I have used. So maybe a alternative consistent naming should be applied.

    [dsyme] ok, I'll make comments on naming as I do the code review

  • Decide if helper classes should be public. I'm thinking the IdentityFactory and ComposeFactory (at the moment, even internally, I have them duplicated - they are trivial implementations, but maybe should be exposed externally)

    [dsyme] What are the pros here?
    [manofstick] Well as ISeq will be private, this is moot.

  • Possibly add a register function which would add a hook into ofSeq. This would allow alternative collections types to then expose the ISeq functionality. i.e. at the moment there is ISeq.ofResizeArrayUnchecked to utilize the ISeq functionality for ResizeArray, but ISeq wouldn't get used otherwise. Register might take something like seq<'a> -> option<ISeq<'a>> and just iterate through.

    [dsyme] Resolution: No. We don't want this kind of type-registry state in FSharp.Core

  • Possibly add a force function. This is kind of the equivalent of toList, toArray etc. but more flexible and efficient. i.e. if it already is an appropriate Thin ISeq then that type is just returned directly, if it a Delayed type then it is executed and that is returned, or otherwise it would populate a ResizeArray and the ISeq.ofResizeArrayUnchecked would be run.

    [dsyme] Resolution: I think we can do without this for now

  • Decide if the functionality in EnumerableBase (bad name!) should be exposed. This could allow Length to be optimized for alternative collections

    [dsyme] Resolution: I think we can do without this for now

  • Decide if ISeq should carry all the API of List/Array/Seq. The previous release just got all those APIs in sync, but maybe we should only have the functions that actually make sense at that level... I'm thinking findBack etc...

    [dsyme] Resolution: No. I think the focus for now should be on having ISeq be as private as possible while still getting performance gains

  • ISeq was approached as the inner implementation of Seq, as as such makes extensive use of inline. Maybe wind back in some cases if considered excessive.

    [dsyme] Yes, this is a concern, need to look at this and FSHarp.Core size

@manofstick manofstick force-pushed the manofstick-iseq-part2 branch 4 times, most recently from 6afd2a3 to fb00c6a Compare April 1, 2017 07:30
@manofstick
Copy link
Contributor Author

manofstick commented Apr 1, 2017

@dsyme

I'm starting to create a "real world" collection of tests, I'm not sure if these are what you were thinking of previous comment for adding to the build?

So far I'm mining @theburningmonk Euler Problem set efforts. I'm basically coping them verbatum, except where they exclusively don't use Seq, in which case I am modifying them (such as problem 1, which originally was on List, but I changed to Seq) where they contain a combination of Seq as well as List or Array items I have left them in the original state.

The collated tests are temporarily here, until I find a better home for them. I'll do a follow up post with current timings, which I will update.

Note: the site of tests are not particularly optimised. Performance was obviously not a consideration when this code was written. I might provide slightly modified versions that do run faster at some stage, especially in cases where we are seeing some negative performance impacts.

@manofstick
Copy link
Contributor Author

manofstick commented Apr 2, 2017

TheBurningMonk - Euler - 5

This is worse under both 32-bit and 64-bit. The reason is that the inner loop is the function isEvenlyDividedByAll which calls Seq.forall on numbers, which is an array, and the forall condition is broken regularly very early in the array. In the new Seq this requires a wrapper class for the Array, and creation of a Fold object. It can run via executeThin which makes it a little less heavy, but still requires some initialization and cleanup.

Some alternative coding are available here.

5a - This modifcation calls Seq.ofArray up front so the type checks don't need to be performed at each step.
5b - This modification just changes the Seq.forall to an Array.forall

32-bit

Test Old time (ms) New time (ms) Percent
TheBurningMonk - Euler - 5 3557 4930 139%
TheBurningMonk - Euler - 5a 5302 3292 62%
TheBurningMonk - Euler - 5b 1485 974 66%

64-bit

Test Old time (ms) New time (ms) Percent
TheBurningMonk - Euler - 5 3415 3541 104%
TheBurningMonk - Euler - 5a 4530 1962 43%
TheBurningMonk - Euler - 5b 1533 918 60%

So after either change the new version is significantly faster again.

@manofstick
Copy link
Contributor Author

@dsyme

Just so I'm not wasting my time; is this the sort of information that you are after? Obviously still need to do the full matrix of all function, but as an adjunct...

@dsyme
Copy link
Contributor

dsyme commented Apr 3, 2017

@manofstick It's fascinating - and yes, very helpful!

@manofstick manofstick force-pushed the manofstick-iseq-part2 branch 3 times, most recently from d987910 to 0527641 Compare April 6, 2017 06:33
@manofstick
Copy link
Contributor Author

manofstick commented Apr 7, 2017

Latest timings (link to my data collation function so I can create this more easily next time)

... deleted ... more results below ...

@manofstick manofstick force-pushed the manofstick-iseq-part2 branch 6 times, most recently from 3773ea8 to 882ee69 Compare April 12, 2017 08:54
@manofstick
Copy link
Contributor Author

OK; had a bit of a speedbump with nested concat, but think back on track. Anyway, gives an opportunity to boast about performance. In a particularly nasty case for the old Seq I present:

for i = 1 to 10 do
    let sw = System.Diagnostics.Stopwatch.StartNew ()

    let count = 
        Seq.initInfinite Seq.singleton
        |> Seq.scan Seq.append Seq.empty
        |> Seq.concat
        |> Seq.take 1000000
        |> Seq.sum

    if count <> 470700300 then failwith ":-("

    printfn "%d (%d)" sw.ElapsedMilliseconds count

Then the time difference between the old and the new is (after warm up) 60ms vs 16000s! So over 250x faster...

@manofstick manofstick force-pushed the manofstick-iseq-part2 branch 2 times, most recently from 882ee69 to 5726129 Compare April 13, 2017 08:25
@dsyme
Copy link
Contributor

dsyme commented Apr 13, 2017

An initial test indicates that this doesn't hurt the overall perf of the compiler

https://github.com/Microsoft/visualfsharp/blob/master/tests/scripts/compiler-perf-results.txt#L74

startup-to-parseonly parseonly-to-check   check-to-nooptimize  check-to-optimize    nooptimize-to-debug
10.27                27.58                44.91                59.12                51.85               

v. some recent times on master:

10.12                30.98                45.96                61.91                51.16               
10.31                30.51                46.54                62.74                51.05               

That may even indicate an improvement, e.g. the parseonly-to-check appears to improve - but note that the times are pretty flaky - regular +/- 10% - so it's within the margin of error.

I'll do more test runs but they above figures are vaguely indicative and would show up a big regression (note they are measuring wall-time for process completion)

@manofstick
Copy link
Contributor Author

@dsyme well that's promising!

I have made a couple of architectural changes. I have pushed the "bare bones" types up in to prim_types, which has then allowed me to derive seq computational expressions and will the range operators from ISeq.

There is still room for improvement in regards to the computational expressions as really the ISeq root shouldn't need to call any virtual functions to iterate the state machine, but I would need to do some slight modifications to the state machine generation logic, which I'm unwilling to do at this current juncture.

I was keen to add ISeq to list though, to save the need for the wrapper, but I couldn't even get the proto-compiler to compile even with just empty functions (I ponder if this is related to generic function on generic interface in prim types? Anyway I don't really know, nor did I attempt any sort to find out. I thought someone more knowledgeable in that area (ummm, you!) might come to a quicker diagnosis.)

Anyway, should be enjoying my Easter in the mountains, damn modern telecommunications.

@dsyme
Copy link
Contributor

dsyme commented Apr 14, 2017

Anyway, should be enjoying my Easter in the mountains, damn modern telecommunications.

Yes you should. Paste a picture of the mountains in here please. Then turn your phone and computer off. :)

@manofstick
Copy link
Contributor Author

Argh! The mountains are riddled with little monsters!

tmp_4859-img_20170414_144523-1582246922

See you on the other side! If I survive!

@manofstick
Copy link
Contributor Author

Back on deck; I've obviously broken something because the build isn't finishing... Will investigate when I get an opportunity.

@dsyme
Copy link
Contributor

dsyme commented May 8, 2018

Also, looking at the perf results, we have 13 cases of 💔, where regressions are > 30%. All of those are a bit "scary"....

Looking at those again, I think nearly all are explicit uses of GetEnumerator.

@manofstick
Copy link
Contributor Author

@dsyme

Hmmm. Look, its not going to kill me if you don't slurp this PR in. I think it was a neat re-imagining of how Seq could be implemented, but everything is a matter of trade offs. Personally those trade offs have meant that pretty much all Seq code in our (my works) code base reflexively gets written as seq comprehensions, rather than using Seq composition, but I might be a little more performance conscious than most (but obviously not too much, or I would ditch the whole stuff and write my code in Rust or something!)

One of the most egregious cases - Seq.init(Infinite)? - has been lessened a bit by cleaning up of lazy in .net core (which reminds me that I should submit a PR to add "inline" to LazyExtensions.Create to avoid extra creation of superfluous object.) (Don't get me wrong, even after both theses are done this is still a wildly dubious Seq.init implementation that really should never have existed and basically still renders it useless if you care about performance at all...)

This really is a case of the whole or none - mixing and matching between "normal" pull seqs and these push seqs comes at a performance penalty.

A "smaller" improvement could just make filter/map special cases - like Linq does.

Or just accept status quo.

Hmmm.

@manofstick
Copy link
Contributor Author

Oh, and with the optimization changes that have occurred within the JIT it's probably worth rerunning the test suite, as they may have had effects for good or for ill...

@dsyme
Copy link
Contributor

dsyme commented May 9, 2018

@manofstick Possible minimal cuts:

  1. Do nothing, close this PR

  2. Add just enough so that the "built in" ways of making sequences (ranges, GeneratedSequenceBase) support a push-sequence interpretation, i.e. add a minimal IPushEnumerable interface and have GeneratedSequenceBase implement it. Then put the rest in a library FSharp.Collections.FastSeq where open FSharp.Collections.FastSeq replaces Seq.* via out-scoping.

  3. Do (1) plus a push-sequence implementation for map/filter

  4. Do everything you have but hide the new ISeq module

  5. Do everything you have

@dsyme
Copy link
Contributor

dsyme commented May 9, 2018

@manofstick If it's easy for you to re-run the performance tests that would be great.

@abelbraaksma
Copy link
Contributor

FWIW, I'd like at least some of these to be merged, as building this by hand is rather tricky to get right and the performance gains are significant in quite a few scenarios. I'd suggest to start with a minimal set and see what feedback that brings.

@manofstick
Copy link
Contributor Author

manofstick commented May 9, 2018

@dsyme

(0) easiest, but defeatist and a little luddite.
(1) sounds good...
(2) is bad - you need to have all the fold operations as well to make it worth while (that said the System.Linq implementation collapses Select/Where together - so an implementation along those lines might still be appropriate - and much, much simpler and easier than the whole effort here)
(3) bloat for no benefit
(4) The needs of the many out weight the needs of the few. And if the few can get it with an install-package ... and an open ... then I think that's satisfying pretty much everyone.

So if you're happy with (1) then so am I.

The final question would be do you think this has any relevance beyond Seq? i.e. should it exist in System.Linq (i.e. would it ever natively be implemented on ResizeArray etc.)?

I'm think probably not, because it is a little bit more than just the interface (i.e. you have to support the Activity object, which isn't a role model of encapsulation...)

I'm just asking because once it's embedded in FSharp.Core then this possibility is off the table.

@zpodlovics
Copy link

zpodlovics commented May 10, 2018

@manofstick FYI:

"We present the first approach that represents the full general-
ity of stream processing and eliminates overheads, via the use of
staging. It is based on an unusually rich semantic model of stream
interaction. We support any combination of zipping, nesting (or
flat-mapping), sub-ranging, filtering, mapping—of finite or infi-
nite streams. Our model captures idiosyncrasies that a program-
mer uses in optimizing stream pipelines, such as rate differences
and the choice of a “for” vs. “while” loops. Our approach delivers
hand-written–like code, but automatically. It explicitly avoids the
reliance on black-box optimizers and sufficiently-smart compilers,
offering highest, guaranteed and portable performance.
Our approach relies on high-level concepts that are then readily
mapped into an implementation. Accordingly, we have two distinct
implementations: an OCaml stream library, staged via MetaOCaml,
and a Scala library for the JVM, staged via LMS. In both cases, we
derive libraries richer and simultaneously many tens of times faster
than past work.
We greatly exceed in performance the standard
stream libraries available in Java, Scala and OCaml, including the
well-optimized Java 8 streams." [1] [2] [3]

[1] https://strymonas.github.io/
[2] https://github.com/raw/strymonas/strymonas.github.io/master/strymonas.pdf
[3] https://arxiv.org/abs/1612.06668
[4] https://palladin.github.io/StreamsPresentation/index.html

@manofstick
Copy link
Contributor Author

@zpodlovics this implementation is closer to Nessos's Stream library but attempting to retain full compatibility with the f# Seq implementation. It uses no runtime compilation.

@zpodlovics
Copy link

@manofstick True, and sorry for the noise. I just wanted to share some additional ideas that may worth to explore.

@dsyme
Copy link
Contributor

dsyme commented May 11, 2018

@manofstick

... because it is a little bit more than just the interface (i.e. you have to support the Activity object, which isn't a role model of encapsulation...

I think this is my biggest issue with what you've done.

@manofstick
Copy link
Contributor Author

@dsyme

... because it is a little bit more than just the interface (i.e. you have to support the Activity object, which isn't a role model of encapsulation...

I think this is my biggest issue with what you've done.

Hahaha, it did make me feel dirty!

Although this could be better if it was defined in C#, because they i could of used the protected modified. (obviously the decision has been made, based on lambda grounds, that the protected keyword would not exist in f#, but possibly we could cheat and add a compiler attribute to make it happen? I mean the compiler is happy dealing with deriving from c# classes with protected members....)

Hmm. And really the whole State thing could be thrown away - that was just done to make up for the fact that fsharp object expressions when they captured mutable values they wrapped them in FSharpRef even when they are fully scoped within the object expression - so this was done to minimize garbage (the same issue occurs with general lambdas, seq expression, etc.) A modification to the compiler could eradicate the need for this - making the code simpler, and possibly providing some wider benefits as well. Hmm...

Thoughts on any of this?

@manofstick
Copy link
Contributor Author

@dsyme

The conversation here dried up, so just trying to revive it...

Not that I'm finding much time (hmm... nothing seems to change!) but I am keen for the external nuget FastSeq package as previously discussed.

So I think you were in agreement there? So maybe I'll close this PR, and create a new one with just the changes in prim-types.fs (i.e. additional interface, supporting types and list/GeneratedSequenceBase & ranges implemenations) and then go ahead and create a new FastSeq project.

In the FastSeq I would expose both the Seq namespace overlay as well as the inline ISeq namespace (i.e. so you could modify your code to make use of the inlining versions)

Any thoughts on the root namespace of FastSeq? namespace FSharp.Collections.FastSeq? which contains the Seq (overlay) and ISeq (underlying inlines)?

@dsyme
Copy link
Contributor

dsyme commented Jun 4, 2018

So I think you were in agreement there?

I'm still not sure. What I'd love is if there was some simple truly abstract interface that could admit multiple future implementations. The types in prim-types.fsi are still very concrete.

This sort of thing feels acceptable:

    type ISeqConsumer<'Idx,'T,'Result> = 
        interface ISeqConsumer
        member Result : 'Result with get, set
        member HaltedIdx : 'Idx with get
        override ChainComplete : 'Idx -> unit
        override ChainDispose : unit -> unit

    type ISeqTransform<'T,'U> = interface ... end

    type IConsumableSeq<'T> =
        inherit System.Collections.Generic.IEnumerable<'T>
        abstract Transform<'U> : ... ISeqTransform<'T,'U> -> IConsumableSeq<'U>
        abstract Consume<'Idx, 'Result> : getConsumer:('Idx -> ISeqConsumer<'Idx,'T,'Result>) -> 'Result

If we could have something as small as that and there was basically no public implementation code that would be great (it would have an internalized implementation on list and ranges) . But I can't see how to make two implementations (one internal and one external) cooperate together.

What happens if the whole thing goes in a "FastSeq" package? How problematic is that?

@manofstick
Copy link
Contributor Author

@dsyme

OK, when I get a chance I'll have a little play to see if I can bash it into a more acceptable shape.

@manofstick
Copy link
Contributor Author

@KevinRansom

Hmmm. This was a good piece of work, but once again if it had been there in the early days I think it would have been good, but now... Hmmm. A bit too invasive.

Maybe I'll try again at some stage to do another effort that is less grandiose...

@manofstick manofstick closed this Sep 17, 2018
@ntwilson
Copy link

@manofstick, sorry to hear this project is being scrapped :(. It's a little hard to follow all the twists and turns in this set of issues. Can you point me to the best way to start using this work if it's not going to be included in FSharp.Core? Is there a Nuget package that has everything needed?

@KevinRansom
Copy link
Member

@manofstick,

I think this is important work, and for sure I have enjoyed seeing it evolve. seq is so fundamental a part of our programming model that any attempt to modernize it was going to be a tough row to hoe. Do you think it is better to close it, and start again from ground up or just leave it around for us to mull over a while.

This is my way of saying, it is not a PR I regarded as orphaned, merely 'bloody hard'.

Kevin

@manofstick
Copy link
Contributor Author

For anyone who was interested in this, this has been revised in a somewhat morphed form into an (almost there!) complete re-implementaion of System.Linq - called Cistern.Linq - it's part of the plumbing! (Currently failing only a handful of tests taken from corefx. From an implementation perspective, the main part to do is fashion an OrderBy solution, currently has the corefx one which, although it interacts correctly, does not do so efficiently as I haven't had time to attack it yet).

It's got a partially complete f# wrapper, and an exceedingly rudimentary example. Hopefully over the next few weeks/months these will become more fleshed out.

I haven't current fashioned a nuget package suite, but will do so when I get time. The constant bane of my life :-)

Anyway, any calls for assistance have usually fallen on deaf ears, but hey, worth another call out if anyone wants help, in whatever fashion!

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Oct 11, 2019

@manofstick, I'd love to help, where can I start, any particular area? It would be great to move this forward.

Maybe this should be reopened?

@manofstick
Copy link
Contributor Author

manofstick commented Oct 11, 2019

@abelbraaksma

I don't think this should be reopened (at this stage at least). It's too big a change to be sucked into the main repository without the code being battled tested for a while... (and people at MS-HQ have never really embraced my big PRs, I have always wanted them to be joint efforts, because personally I find them quite important, but alas, other than some code reviews, I have yet to gain any traction... Really just to the point where I have given up)

Anyway, come over to Cistern.Linq and help out over there if you wish. There the core is written in C#, but I have F# wrapper with the goal of being highly API compatible with Seq (I currently has stubs to Seq filling in missing parts). Here is an of usage . Anyway, I am still pondering how highly API compatible, as there are some different decisions, such as, for example, I don't throw exceptions on IEnumerator<>.Current, pre/pos .MoveNext, which is consistent with API documentation (it just says its undefined behaviour), but not consistent with the current Seq implementation

So if you want to help, from least knowledge to most, you could:

  • Add some benchmarks (C# or F#)
  • Add some test code (C# or F#). Currently these are just (basically) just copies of the fx-core System.Linq and fsharp Seq tests, but need more fleshing out.
  • Continuing fleshing out the Seq replacement
  • Work on the rest of the stuff (Here I probably need to stop and document the architecture and the break it down into what I have left. I would only be willing to do this if there was going to be a big commitment of time, and this would probably take me a while).

Any assistance would be appreciated. Thanks,

@manofstick
Copy link
Contributor Author

...and just for jiggles I modified one of my csharp benchmarks into f#... (base.Words is from this github location, and the test uses a subset of the words, sorted and unsorted)

[<Benchmark(Baseline = true)>]
member __.FSharpSeq () =
    base.Words
    |> Seq.filter (fun w -> w.Length >= 3)
    |> Seq.groupBy (fun w -> struct (w.[0], w.[1], w.[2]))
    |> dict

[<Benchmark>]
member __.CisternLinq() =
    base.Words
    |> Linq.filter (fun w -> w.Length >= 3)
    |> Linq.groupBy (fun w -> struct (w.[0], w.[1], w.[2]))
    |> Linq.dict
Method Sorted WordsCount Mean Ratio
FSharpSeq False 0 371.5 ns 1.00
CisternLinq False 0 463.4 ns 1.25
FSharpSeq False 1 739.5 ns 1.00
CisternLinq False 1 639.7 ns 0.86
FSharpSeq False 10 4,708.2 ns 1.00
CisternLinq False 10 1,558.1 ns 0.33
FSharpSeq False 1000 467,011.5 ns 1.00
CisternLinq False 1000 124,777.0 ns 0.27
FSharpSeq False 466544 217,686,731.1 ns 1.00
CisternLinq False 466544 56,379,337.8 ns 0.26
FSharpSeq True 0 373.0 ns 1.00
CisternLinq True 0 444.9 ns 1.19
FSharpSeq True 1 400.3 ns 1.00
CisternLinq True 1 531.2 ns 1.33
FSharpSeq True 10 3,176.3 ns 1.00
CisternLinq True 10 1,210.9 ns 0.38
FSharpSeq True 1000 319,389.3 ns 1.00
CisternLinq True 1000 76,071.3 ns 0.24
FSharpSeq True 466544 138,408,123.3 ns 1.00
CisternLinq True 466544 30,112,740.8 ns 0.22

And so you can see why I think Seq should be replaced - in this not particularly contrived example we run about 4 times faster...

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Oct 17, 2019

And so you can see why I think Seq should be replaced - in this not particularly contrived example we run about 4 times faster...

@manofstick Wow, impressive!

I'm currently out of the country, but should be behind my desk next week and I'll have a look at the tasks you mentioned. I'll probably download, compile and run it first to get a feeling for it. Awesome work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.