Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lift restrictions for matching of binaries and maps #6415

Merged

Conversation

bjorng
Copy link
Contributor

@bjorng bjorng commented Nov 2, 2022

Lift restrictions for matching of binaries and maps

There has always been an implementation limitation for matching
of binaries (for technical reasons). For example:

foo(Bin) ->
    <<A:8>> = <<X:4,Y:4>> = Bin,
    {A,X,Y}.

This would fail to compile with the following message:

t.erl:5:5: binary patterns cannot be matched in parallel using '='
%    5|     <<A:8>> = <<X:4,Y:4>> = Bin,
%     |     ^

This commit lifts this restriction, making the example legal.

A restriction for map matching is also lifted, but before we can
describe that, we'll need a digression to talk about the = operator.

The = operator can be used for two similar but slightly differently
purposes.

When used in a pattern in a clause, for example in a function head,
both the left-hand and right-hand side operands must be patterns:

Pattern1 = Pattern2

For example:

bar(#{a := A} = #{b := B}) -> {A, B}.

The following example will not compile because the right-hand side
is not a pattern but an expression:

wrong(#{a := A} = #{b => B}) -> {A, B}.

t.erl:4:23: illegal pattern
%    4| wrong(#{a := A} = #{b => B}) -> {A, B}.
%     |                       ^

Used in this context, the = operator does not imply that the two
patterns are matched in any particular order. Attempting to use a
variable matched out on the left-hand side on the right-hand side, or
vice versa, will fail:

also_wrong1(#{B := A} = #{b := B}) -> {A,B}.
also_wrong2(#{a := A} = #{A := B}) -> {A,B}.

t.erl:6:15: variable 'B' is unbound
%    6| also_wrong1(#{B := A} = #{b := B}) -> {A,B}.
%     |               ^

t.erl:7:27: variable 'A' is unbound
%    7| also_wrong2(#{a := A} = #{A := B}) -> {A,B}.
%     |                           ^

The other way to use = is in a function body. Used in this way,
the right-hand side must be an expression:

Pattern = Expression

For example:

foobar(Value) ->
    #{a := A} = #{a => Value},
    A.

Used in this context, the right-hand side of = must not be a pattern:

illegal_foobar(Value) ->
    #{a := A} = #{a := Value},
    A.

t.erl:18:21: only association operators '=>' are allowed in map construction
%   18|     #{a := A} = #{a := Value},
%     |                     ^

When used in a body context, the value of the = operator is the
value of its right-hand side operand. When multiple = operators are
combined, they are evaluted from right to left. That means that any
number of patterns can be matched at once:

Pattern1 = Pattern2 = ... = PatternN = Expr

which is equivalent to:

Var = Expr
PatternN = Var
   .
   .
   .
Pattern2 = Var
Pattern1 = Var

Given that there is a well-defined evaluation order from right to
left, one would expect that the following example would be legal:

baz(M) ->
    #{K := V} = #{k := K} = M,
    V.

It is not. In Erlang/OTP 25 or earlier, the compilation fails with the
following message:

t.erl:28:7: variable 'K' is unbound
%   28|     #{K := V} = #{k := K} = M,
%     |       ^

That restriction is now lifted, making the example legal.

Closes #6348
Closes #6444
Closes #6467

@bjorng bjorng added team:VM Assigned to OTP team VM enhancement labels Nov 2, 2022
@bjorng bjorng added this to the OTP-26.0 milestone Nov 2, 2022
@bjorng bjorng self-assigned this Nov 2, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 2, 2022

CT Test Results

       3 files     361 suites   44m 36s ⏱️
2 467 tests 2 420 ✔️ 47 💤 0
6 645 runs  6 595 ✔️ 50 💤 0

Results for commit fe049b3.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Nov 2, 2022
@RobinMorisset
Copy link
Contributor

Thank you for this change! It makes the language a bit simpler, and I like that it clarifies the semantics of assignment.

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch 2 times, most recently from bcf9221 to 2983a25 Compare November 7, 2022 04:55
@RobinMorisset
Copy link
Contributor

This PR seems to have introduced #6445

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from 2983a25 to 517da04 Compare November 8, 2022 05:35
@bjorng
Copy link
Contributor Author

bjorng commented Nov 8, 2022

@RobinMorisset Thanks for catching this problem. Fixed.

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch 2 times, most recently from 1435879 to 0dcf977 Compare November 8, 2022 07:19
Copy link
Contributor

@kikofernandez kikofernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I only have a minor suggestion to simplify a function implementation.

lib/compiler/src/v3_core.erl Outdated Show resolved Hide resolved
@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch 2 times, most recently from 908223c to 705a255 Compare November 10, 2022 11:55
@RobinMorisset
Copy link
Contributor

This PR introduces an erlc crash on the following code:

-module(validator).
-compile([export_all]).

f(#{[] := <<_>>, [] := <<_>>}) -> 
    ok.

It results in the following error message:

Function: f/1
minimized/validator.erl: internal error in pass beam_validator_strong:
exception error: no case clause matching {test_tail,8}
  in function  beam_validator:validate_bs_match/4 (beam_validator.erl, line 1743)
  in call from beam_validator:branch/4 (beam_validator.erl, line 2889)
  in call from beam_validator:validate_instrs/4 (beam_validator.erl, line 333)
  in call from beam_validator:validate_1/5 (beam_validator.erl, line 276)
  in call from beam_validator:validate_0/4 (beam_validator.erl, line 114)
  in call from beam_validator:validate/2 (beam_validator.erl, line 56)
  in call from compile:beam_validator_1/3 (compile.erl, line 1663)
  in call from compile:fold_comp/4 (compile.erl, line 408)

I could not reproduce this bug on master, only with this PR.

@bjorng I'm not sure whether such PR-specific bug reports should go here or in their own issue, please tell me if you have any preference (or more generally if I should report the bugs I find in a different way).

@bjorng
Copy link
Contributor Author

bjorng commented Nov 13, 2022

As long as a PR is not merged, I prefer bug reports to go into that PR as comments.

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from 6e2be21 to f138b60 Compare November 13, 2022 09:48
@bjorng
Copy link
Contributor Author

bjorng commented Nov 13, 2022

I have pushed a bug fix. Thanks for noticing the problem.

@RobinMorisset
Copy link
Contributor

Here is another testcase that only crashes erlc with this PR:

-module(kernel_to_ssa5).
-compile([export_all]).

f1() ->
    ok.

f2() ->
    #{Y := _} = (Y = ((_ = X) = f1())).

error message:

Function: f2/0
minimized/kernel_to_ssa5.erl: internal error in pass beam_kernel_to_ssa:
exception error: bad key: 'Y'
  in function  map_get/2
     called as map_get('Y',
 #{1 => {b_var,1},
   '@ssa_bool' => {b_var,'@ssa_bool'},
   '@ssa_ret' => {b_var,'@ssa_ret'}})
     *** argument 1: not present in map
  in call from beam_kernel_to_ssa:ssa_arg/2 (beam_kernel_to_ssa.erl, line 1127)
  in call from beam_kernel_to_ssa:select_extract_map/4 (beam_kernel_to_ssa.erl, line 544)
  in call from beam_kernel_to_ssa:select_map_val/5 (beam_kernel_to_ssa.erl, line 537)
  in call from beam_kernel_to_ssa:select_map/5 (beam_kernel_to_ssa.erl, line 529)
  in call from beam_kernel_to_ssa:match_cg/3 (beam_kernel_to_ssa.erl, line 177)
  in call from beam_kernel_to_ssa:do_match_cg/3 (beam_kernel_to_ssa.erl, line 167)
  in call from beam_kernel_to_ssa:cg/2 (beam_kernel_to_ssa.erl, line 142)

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from f138b60 to 5e3769e Compare November 15, 2022 05:29
@bjorng
Copy link
Contributor Author

bjorng commented Nov 15, 2022

Thanks! Fixed.

@RobinMorisset
Copy link
Contributor

RobinMorisset commented Nov 15, 2022

Here is a testcase that breaks with the latest version of this PR and not on master:

f(X, <<Y>>) when ((Y > X) xor X); (not X); X ->
    ok.

Error message:

Sub pass sanitize
Function: f/2
minimized/precodegen.erl: internal error in pass beam_ssa_pre_codegen:
exception error: no match of right hand side value [{b_literal,integer},
 {b_var,13},
 {b_literal,[unsigned,big]},
 {b_literal,8},
 {b_literal,1}]
  in function  beam_ssa_pre_codegen:sanitize_instr/3 (beam_ssa_pre_codegen.erl, line 900)
  in call from beam_ssa_pre_codegen:do_sanitize_is/9 (beam_ssa_pre_codegen.erl, line 805)
  in call from beam_ssa_pre_codegen:sanitize/5 (beam_ssa_pre_codegen.erl, line 667)
  in call from beam_ssa_pre_codegen:sanitize/1 (beam_ssa_pre_codegen.erl, line 657)
  in call from compile:run_sub_passes_1/3 (compile.erl, line 422)
  in call from beam_ssa_pre_codegen:function/2 (beam_ssa_pre_codegen.erl, line 160)
  in call from beam_ssa_pre_codegen:functions/2 (beam_ssa_pre_codegen.erl, line 89)
  in call from beam_ssa_pre_codegen:module/2 (beam_ssa_pre_codegen.erl, line 84)

@RobinMorisset
Copy link
Contributor

And here is another with sensibly the same error message, I'm putting it here in case it helps with debugging:

f(<<X>>, Y) when {(X / 1), (Y andalso true)}; Y ->
    ok.

@bjorng
Copy link
Contributor Author

bjorng commented Nov 24, 2022

Thanks! Fixed.

@RobinMorisset
Copy link
Contributor

It seems to still be possible to trigger the same bug (or at least one with the same error message) with different testcases, such as:

f(#{0 := <<_>>, 0 := <<X>>})  -> 
	X.

or

f(#{0 := <<_>>, 0 := <<_, 0>>}) ->
    ok.

or

f(#{0 := <<_>>, 0 := <<_, _:(0 div 0)>>}) ->
    0.

or

f(#{0 := <<X:0>>, 0 := <<X>>}) ->
    ok.

or

f() ->
    case (catch <<0 || true>>) of
        #{[] := <<_>>, [] := <<X>>} ->
            X
    end.

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from 39eb24b to 7f0bd6d Compare November 24, 2022 13:55
@bjorng
Copy link
Contributor Author

bjorng commented Nov 24, 2022

Yes, it is the same bug. I have now pushed a fix that cares of it in a different way than my previous attempts.

@RobinMorisset
Copy link
Contributor

Thank you for your reactivity. The fuzzer can no longer trigger that particular bug, but it now found some very similar testcases that trigger a different error message:

f(#{0 := <<_, _:(true andalso 0)>>, 0 := <<_>>}) ->
    ok.

and

f() ->
    case catch ({}+{}) of
        #{[] := <<X:0, _:X>>, [] := <<_>>} ->
            ok
    end.

Both cause a crash with the following error message (that is for the second case, the first one is the same except for some numbers):

Sub pass ssa_opt_bsm_shortcut
Function: f/0
minimized/bug2.erl: internal error in pass beam_ssa_opt:
exception error: bad key: {b_var,14}
  in function  map_get/2
     called as map_get({b_var,14},#{39 => {0,8},{b_var,13} => 0})
     *** argument 1: not present in map
  in call from beam_ssa_opt:bsm_positions/2 (beam_ssa_opt.erl, line 1959)
  in call from beam_ssa_opt:ssa_opt_bsm_shortcut/1 (beam_ssa_opt.erl, line 1944)
  in call from compile:run_sub_passes_1/3 (compile.erl, line 422)
  in call from beam_ssa_opt:phase/4 (beam_ssa_opt.erl, line 114)
  in call from beam_ssa_opt:run_phases/3 (beam_ssa_opt.erl, line 77)
  in call from beam_ssa_opt:module/2 (beam_ssa_opt.erl, line 69)
  in call from compile:'-select_passes/2-anonymous-0-'/3 (compile.erl, line 681)

@bjorng
Copy link
Contributor Author

bjorng commented Nov 25, 2022

Your latest example triggered a bug in the ssa_opt_bsm sub pass. I fixed it... I thought. The new test cases based on your examples fail when some optimizations are disabled. It seems to be an unrelated bug. I can't push my fix to this PR until I have fixed that bug, too, because there would be failed test cases in our daily builds. Meanwhile, if your want to find more bugs (with all compiler optimizations enabled), here is another branch which includes my latest fix:

https://github.com/bjorng/otp/tree/bjorn/compiler/parallel-match-WIP

@bjorng bjorng removed the testing currently being tested, tag is used by OTP internal CI label Nov 25, 2022
@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from 1b57a7f to b5fee1a Compare November 28, 2022 05:25
@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Nov 28, 2022
@bjorng
Copy link
Contributor Author

bjorng commented Nov 28, 2022

I have now fixed the other bugs and updated this pull request.

@bjorng bjorng requested a review from jhogberg December 1, 2022 12:47
@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from c75d19f to 1a0b7c7 Compare December 5, 2022 09:10
@bjorng bjorng linked an issue Dec 5, 2022 that may be closed by this pull request
@bjorng bjorng mentioned this pull request Dec 5, 2022
@bjorng
Copy link
Contributor Author

bjorng commented Dec 5, 2022

@rvirding Do you have any opinions about this pull request and the = operator?

In particular, what do you think about that in bodies, multiple = operators will be evaluated from right to left? As an example, if the matching is defined to be right to left, that will make the following example legal:

baz(M) ->
    #{K := V} = #{k := K} = M,
    V.

I am asking because we have had some discussions in the OTP team as a preparation for an OTP Technical Board meeting we will have later this week.

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch 2 times, most recently from 2b450c8 to 4b1b796 Compare December 9, 2022 08:25
@bjorng
Copy link
Contributor Author

bjorng commented Dec 9, 2022

The OTP Technical Board has now approved this pull request.

I have updated the documentation to clarify the semantics of the match operator and the compound pattern operator (a new name invented by @RaimoNiskanen in order to more easily distinguish the two flavors of the = operator).

After the updates of the documentation have been reviewed, I hope to merge this pull request next week.

Copy link
Contributor

@RobinMorisset RobinMorisset left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for your work on this!

2> <input>F = fun({A, B} = E) -> {E, A + B} end, F({1,2}).</input>
{{1,2},3}
3> <input>G = fun(&lt;&lt;A:8,B:8>> = &lt;&lt;C:16>>) -> {A, B, C} end, G(&lt;&lt;42,43>>).</input>
{42,42,10795}</pre>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the second element be 43 here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should. I changed example to make it clearer and changed it manually instead of copy-pasting the whole thing from the output of the shell.

<p>If the matching succeeds, any unbound variable in the pattern
becomes bound and the value of <c>Expr2</c> is returned.</p>
becomes bound and the value of <c>Expr</c> is returned.</p>
<p>If multiple match operators are applied in sequence, they will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd maybe add that even if there is a single match operator, its right hand side will be executed before the pattern on its left hand side is checked. So for example the following is valid:

<<X:Y>> = begin Y = 8, <<42:8>> end

And it correctly binds X to 42, as Y is bound to 8 before the validity of the pattern on the left hand side is checked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add that clarification and an example.

@RaimoNiskanen RaimoNiskanen self-requested a review December 9, 2022 11:12
Copy link
Contributor

@RaimoNiskanen RaimoNiskanen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation changes looks very good to me

@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from 4b1b796 to fe049b3 Compare December 9, 2022 13:58
Copy link
Contributor

@rickard-green rickard-green left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc changes looks good to me too 👍

There has always been an implementation limitation for matching
of binaries (for technical reasons). For example:

    foo(Bin) ->
        <<A:8>> = <<X:4,Y:4>> = Bin,
        {A,X,Y}.

This would fail to compile with the following message:

    t.erl:5:5: binary patterns cannot be matched in parallel using '='
    %    5|     <<A:8>> = <<X:4,Y:4>> = Bin,
    %     |     ^

This commit lifts this restriction, making the example legal.

A restriction for map matching is also lifted, but before we can
describe that, we'll need a digression to talk about the `=` operator.

The `=` operator can be used for two similar but slightly differently
purposes.

When used in a pattern in a clause, for example in a function head,
both the left-hand and right-hand side operands must be patterns:

    Pattern1 = Pattern2

For example:

    bar(#{a := A} = #{b := B}) -> {A, B}.

The following example will not compile because the right-hand side
is not a pattern but an expression:

    wrong(#{a := A} = #{b => B}) -> {A, B}.

    t.erl:4:23: illegal pattern
    %    4| wrong(#{a := A} = #{b => B}) -> {A, B}.
    %     |                       ^

Used in this context, the `=` operator does not imply that the two
patterns are matched in any particular order. Attempting to use a
variable matched out on the left-hand side on the right-hand side, or
vice versa, will fail:

    also_wrong1(#{B := A} = #{b := B}) -> {A,B}.
    also_wrong2(#{a := A} = #{A := B}) -> {A,B}.

    t.erl:6:15: variable 'B' is unbound
    %    6| also_wrong1(#{B := A} = #{b := B}) -> {A,B}.
    %     |               ^

    t.erl:7:27: variable 'A' is unbound
    %    7| also_wrong2(#{a := A} = #{A := B}) -> {A,B}.
    %     |                           ^

The other way to use `=` is in a function body. Used in this way,
the right-hand side must be an expression:

    Pattern = Expression

For example:

    foobar(Value) ->
        #{a := A} = #{a => Value},
        A.

Used in this context, the right-hand side of `=` must **not** be a pattern:

    illegal_foobar(Value) ->
        #{a := A} = #{a := Value},
        A.

    t.erl:18:21: only association operators '=>' are allowed in map construction
    %   18|     #{a := A} = #{a := Value},
    %     |                     ^

When used in a body context, the value of the `=` operator is the
value of its right-hand side operand. When multiple `=` operators are
combined, they are evaluted from right to left. That means that any
number of patterns can be matched at once:

    Pattern1 = Pattern2 = ... = PatternN = Expr

which is equivalent to:

    Var = Expr
    PatternN = Var
       .
       .
       .
    Pattern2 = Var
    Pattern1 = Var

Given that there is a well-defined evaluation order from right to
left, one would expect that the following example would be legal:

    baz(M) ->
        #{K := V} = #{k := K} = M,
        V.

It is not. In Erlang/OTP 25 or earlier, the compilation fails with the
following message:

    t.erl:28:7: variable 'K' is unbound
    %   28|     #{K := V} = #{k := K} = M,
    %     |       ^

That restriction is now lifted, making the example legal.

Closes erlang#6348
Closes erlang#6444
Closes erlang#6467
@bjorng bjorng force-pushed the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch from fe049b3 to 6a9ebe6 Compare December 13, 2022 11:36
@bjorng bjorng merged commit 02384cf into erlang:master Dec 13, 2022
@bjorng bjorng deleted the bjorn/compiler/parallel-match/GH-6348/OTP-18297 branch December 13, 2022 11:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
6 participants