Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ListEnumerable, Choose, and some cleanup #1

Merged
merged 2 commits into from
Oct 5, 2016

Conversation

liboz
Copy link
Collaborator

@liboz liboz commented Oct 2, 2016

Here's some stuff. Not super convinced about all the name choices especially things like ComposeWithMapThenFilter could be misinterpreted on reading.

I implemented the common interfaces for the enumerable in an abstract class, since it seems like all those things are just copy/paste. I made an abstract class for the enumerator for the interfaces, which again seem all the same.

@manofstick
Copy link
Owner

I had tried a common based type, but it had decreased performance a bit (not much!), and as that was what I was trying to maximize I decided the duplication wasn't too bad. But I'll suck this in and see how it goes. Did you do any measuring around using list directly rather than just using is enumerable interface?

(Have to wait for tomorrow, as it's 2am for me and there is a merge conflict...)

@liboz
Copy link
Collaborator Author

liboz commented Oct 2, 2016

Alright I did some measurements for using the interface or not. It seems to make no difference effectively (except for the list one which is new), probably because of JIT optimizations. I included tests using Linq instead and it seems that we lose out to Linq (which composes Where/Select) for smaller collection sizes and on 32bit always. However, we seem to be better on larger collections on 64bit.

Here's my test code, which basically does a map, a filter, another map, then a skip. The "Test" implementation were made copying the code before my PR and renaming it.

Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4710HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435779 ticks, Resolution=410.5463 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=32-bit RELEASE
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1586.0

Type=map Mode=Throughput LaunchCount=5
TargetCount=50

Method Platform Jit count Median StdDev Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
interfaceListTest X64 RyuJit 10 1,008.2709 ns 54.0688 ns 3.29 - - 305.47
normalListTest X64 RyuJit 10 1,419.6018 ns 30.3701 ns 3.13 - - 288.59
linqList X64 RyuJit 10 581.7170 ns 11.3070 ns 2.74 - - 255.61
interfaceListTest X86 LegacyJit 10 1,547.1038 ns 27.0424 ns 1.74 - - 162.34
normalListTest X86 LegacyJit 10 2,297.6952 ns 37.2671 ns 1.67 - - 163.93
linqList X86 LegacyJit 10 571.4733 ns 11.0320 ns 1.51 - - 140.43
interfaceListTest X64 RyuJit 100 2,792.3886 ns 165.8630 ns 10.61 - - 997.61
normalListTest X64 RyuJit 100 5,356.4023 ns 208.2418 ns 10.43 - - 962.43
linqList X64 RyuJit 100 3,453.7481 ns 82.1612 ns 9.96 - - 930.08
interfaceListTest X86 LegacyJit 100 4,855.0264 ns 81.7235 ns 5.29 - - 489.19
normalListTest X86 LegacyJit 100 10,200.9718 ns 182.4908 ns 4.93 - - 498.48
linqList X86 LegacyJit 100 3,335.3972 ns 70.6497 ns 5.13 - - 476.62
interfaceListTest X64 RyuJit 10000 196,591.3276 ns 6,387.8765 ns 799.63 - - 74,983.68
normalListTest X64 RyuJit 10000 420,774.2517 ns 7,049.9768 ns 788.41 - - 76,813.82
linqList X64 RyuJit 10000 307,575.5375 ns 4,181.0057 ns 797.72 - - 75,709.82
interfaceListTest X86 LegacyJit 10000 356,910.2501 ns 3,079.4752 ns 407.02 - - 37,902.15
normalListTest X86 LegacyJit 10000 852,072.7609 ns 9,892.3626 ns 370.00 - - 38,177.49

@manofstick
Copy link
Owner

manofstick commented Oct 2, 2016

Not sure if the gist is correct?

interfaceArrayTest/normalArrayTest and linterfaceSeqTest/normalSeqTest both look identical. in normalListTest you are using Seq.xxxTest functions though?

(I'll merge the stuff in when I get home tonight)

@liboz
Copy link
Collaborator Author

liboz commented Oct 2, 2016

Well that's unfortunate. Time to run it again! At least this confirms the list thing works well haha.

@liboz
Copy link
Collaborator Author

liboz commented Oct 3, 2016

Attempt #2. It seems the Interface performance about the same as normal, maybe even slightly faster on 64bit. But on the LegacyJit, it's completely horrible. I'll investigate a bit more into it tomorrow.

Method Platform Jit count Median StdDev Gen 0 Gen 1 Gen 2 Bytes Allocated/Op
interfaceArrayTest X64 RyuJit 10 859.6220 ns 18.8630 ns 6.60 - - 308.76
normalArrayTest X64 RyuJit 10 851.4250 ns 22.6248 ns 6.68 - - 314.80
linqArray X64 RyuJit 10 332.7774 ns 18.2214 ns 5.17 - - 243.52
interfaceSeqTest X64 RyuJit 10 906.0117 ns 48.4145 ns 7.29 - - 341.35
normalSeqTest X64 RyuJit 10 894.1038 ns 56.6851 ns 7.17 - - 335.56
linqSeq X64 RyuJit 10 446.0308 ns 18.8094 ns 5.76 - - 272.07
interfaceArrayTest X86 LegacyJit 10 1,441.2098 ns 58.9495 ns 3.32 - - 158.75
normalArrayTest X86 LegacyJit 10 1,223.2680 ns 26.5222 ns 3.37 - - 161.48
linqArray X86 LegacyJit 10 322.9686 ns 6.3181 ns 2.75 - - 129.79
interfaceSeqTest X86 LegacyJit 10 1,456.7698 ns 29.5325 ns 3.67 - - 171.66
normalSeqTest X86 LegacyJit 10 1,292.0409 ns 68.4122 ns 3.66 - - 171.04
linqSeq X86 LegacyJit 10 441.5263 ns 17.5869 ns 3.10 - - 146.11
interfaceArrayTest X64 RyuJit 100 2,485.0482 ns 80.5840 ns 20.97 - - 991.93
normalArrayTest X64 RyuJit 100 2,536.0454 ns 48.3978 ns 21.53 - - 1,015.08
linqArray X64 RyuJit 100 2,220.7585 ns 169.9264 ns 19.56 - - 915.75
interfaceSeqTest X64 RyuJit 100 3,053.3722 ns 55.2690 ns 21.67 - - 1,015.13
normalSeqTest X64 RyuJit 100 3,085.0530 ns 65.6371 ns 21.33 - - 999.03
linqSeq X64 RyuJit 100 3,127.4820 ns 57.9461 ns 19.62 - - 937.55
interfaceArrayTest X86 LegacyJit 100 4,613.7045 ns 74.6421 ns 10.78 - - 508.02
normalArrayTest X86 LegacyJit 100 2,949.8946 ns 60.9587 ns 10.43 - - 491.01
linqArray X86 LegacyJit 100 2,065.6032 ns 31.6969 ns 10.05 - - 475.55
interfaceSeqTest X86 LegacyJit 100 5,075.2700 ns 67.6481 ns 10.87 - - 525.84
normalSeqTest X86 LegacyJit 100 3,237.8004 ns 68.9151 ns 10.78 - - 521.33
linqSeq X86 LegacyJit 100 2,946.5525 ns 105.4030 ns 10.23 - - 487.39
interfaceArrayTest X64 RyuJit 10000 183,551.0708 ns 5,416.8893 ns 1,577.43 - - 75,042.83
normalArrayTest X64 RyuJit 10000 192,904.0788 ns 3,723.8423 ns 1,605.67 - - 76,416.23
linqArray X64 RyuJit 10000 209,303.6285 ns 10,735.2405 ns 1,479.54 - - 71,279.82
interfaceSeqTest X64 RyuJit 10000 247,437.6243 ns 11,786.4968 ns 1,586.07 - - 76,512.07
normalSeqTest X64 RyuJit 10000 251,534.3670 ns 14,243.1859 ns 1,561.09 - - 75,315.55
linqSeq X64 RyuJit 10000 316,217.2560 ns 13,175.3323 ns 1,362.88 - - 65,726.57
interfaceArrayTest X86 LegacyJit 10000 362,314.9075 ns 11,109.7373 ns 794.62 - - 37,571.77
normalArrayTest X86 LegacyJit 10000 198,320.3128 ns 5,114.5473 ns 816.97 - - 38,466.14
linqArray X86 LegacyJit 10000 196,597.0408 ns 3,467.4121 ns 741.25 - - 35,032.02
interfaceSeqTest X86 LegacyJit 10000 407,330.8657 ns 9,801.1707 ns 751.00 - - 35,838.62
normalSeqTest X86 LegacyJit 10000 228,376.3897 ns 5,432.6813 ns 764.05 - - 36,142.17
linqSeq X86 LegacyJit 10000 278,931.9162 ns 3,762.1045 ns 779.66 - - 36,845.62

@manofstick
Copy link
Owner

I ponder if your test is running correctly.

I get similar performance to the Linq libraries, vastly better than existing Seq.

@manofstick
Copy link
Owner

Not saying that it is, but maybe you should try before your changes.

@manofstick
Copy link
Owner

I'm cloning your repository, and having a bit of a play around to see if that is what is negatively affecting performance (from your test suite). So sorry for any delays in merging this PR.

@manofstick
Copy link
Owner

manofstick commented Oct 3, 2016

OK; I'm not going to pull this PR in because it negatively affects performance; although not to the scale that you were seeing in your test suite; so I'm not sure what is going on there.

So if I ran the following code, linking to an original FSharp.Core, your PR's FSharp.Core and my PR's FSharp.Core

open System.Diagnostics

[<EntryPoint>]
let main argv = 
    let data = seq {
        let mutable x = 0L
        while x < 10000000L do
            yield x
            x <- x + 1L }

    for i = 1 to 10 do
        let sw = Stopwatch.StartNew ()

        let total =
            data
            |> Seq.map (fun n -> n * 5L)
            |> Seq.filter (fun n -> n % 7L <> 0L)
            |> Seq.map (fun n -> n / 11L)
            |> Seq.sum

        printfn "%d (%d)" sw.ElapsedMilliseconds total

    0

And I also ran the following for Linq comparison:

open System.Diagnostics
open System.Linq

[<EntryPoint>]
let main argv = 
    let data = seq {
        let mutable x = 0L
        while x < 10000000L do
            yield x
            x <- x + 1L }

    for i = 1 to 10 do
        let sw = Stopwatch.StartNew ()

        let total =
            data
             .Select(fun n -> n * 5L)
             .Where(fun n -> n % 7L <> 0L)
             .Select(fun n -> n / 11L)
             .Sum()

        printfn "%d (%d)" sw.ElapsedMilliseconds total

    0
version 32-bit 64-bit
original 1420 648
liboz 901 332
manofstick 739 326
linq 745 373

Post Factory changes

version 32-bit 64-bit
manofstick 721 308

These numbers do jump around a little from run to run, but not much.

You can see that your changes had a negative affect on the 32-bit version.

And you can see that the PR give basically the same performance as the linq version.

OK, this is for very large amounts of data, lets try with the very small:

open System.Diagnostics
open System.Linq

[<EntryPoint>]
let main argv = 
    let data = seq {
        let mutable x = 0L
        while x < 10L do
            yield x
            x <- x + 1L }

    for i = 1 to 10 do
        let sw = Stopwatch.StartNew ()
        let mutable total = 0L

        for j = 1 to 1000000 do
            total <-
                total + (data
                          |> Seq.map (fun n -> n * 5L)
                          |> Seq.filter (fun n -> n % 7L <> 0L)
                          |> Seq.map (fun n -> n / 11L)
                          |> Seq.sum)

        printfn "%d (%d)" sw.ElapsedMilliseconds total

    0
version 32-bit 64-bit
original 2181 1076
liboz 1949 1020
manofstick 1734 1108
linq 986 326

Linq clearly wins the day here. We could modify the code to minimize object creation, which is what the linq stuff does (i.e. shares enumerable/enumerator for first instance; i create more intermediate state combinations...) But really lots of small ones are not really my interest; times are pretty fast in those cases anyway...

Post unbox.any changes:

version 32-bit 64-bit
manofstick 1199 662

Post factory changes:

version 32-bit 64-bit
manofstick 1190 682

@liboz
Copy link
Collaborator Author

liboz commented Oct 3, 2016

Yea, running your test gets me similar results, though your test suite should probably do a JIT warmup, so the first iteration isn't a decent amount slower than the other ones. It appears the interface makes no difference on 64bit, but is about 30% slower on 32bit. I wonder what crazy thing happened with BenchmarkDotNet on 32bit though...

I will rebase this PR and make it so that it only has the ListEnumerator and Choose instead of the interface stuff.

Also on another note, I think one major reason we do worse than Linq for small collections is due to the explicit upcast in F# resulting in extra unboxing and that effect dominating the performance for small collections. For small collections, that's probably a 30% cost. See here: dotnet#1469

@manofstick
Copy link
Owner

@liboz

I put in a fix to remove the unbox.any (sorry it was before I saw you had updated this, so back to conflict, if you could fix that would be great)

So now the performance on short seqs I'd better; still not as good as Linq library though.

…o manofstick

# Conflicts:
#	src/fsharp/FSharp.Core/seq.fs
@liboz
Copy link
Collaborator Author

liboz commented Oct 4, 2016

@manofstick did the merge. I think the remaining performance difference is from the reusing the same object that Linq does, which I'm not sure we want to deal with.

@manofstick manofstick mentioned this pull request Oct 4, 2016
99 tasks
@liboz
Copy link
Collaborator Author

liboz commented Oct 4, 2016

Looking a bit closer, since I didn't see your test suite, I imagine there is also a hit in performance for upcasting the SeqComponents

@manofstick
Copy link
Owner

@liboz

I was just running the test from the other day (I.e the one where the Linq libraries smashed us for performance)

The upcasting of the SeqComponents is OK as it is class based inheritance. The f# compiler only seems to insert unbox.any if you are upcasting to an interface. I assume this is to correctly handle the car where an interface is implemented on a value type, but when I asked about why it did it previously all I heard was the crickets...

All being well, when I get home tonight I'll merge this PR in.

In the parent PR I have made a list of all the Seq functions that exist, so if you want to hack away at any more, go for your life, just put a comment on whatever ones your taking. And maybe create individual PRs for each?

@liboz
Copy link
Collaborator Author

liboz commented Oct 4, 2016

Sounds good, I'll try to get a few of them soon.

@manofstick manofstick merged commit 8c5ad63 into manofstick:manofstick-seq-composer Oct 5, 2016
@liboz liboz mentioned this pull request Oct 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants