Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance documentation for kron() #28697

Merged
merged 4 commits into from
Aug 5, 2019
Merged

Conversation

amilsted
Copy link
Contributor

Due to column-major ordering, using kron() together with reshape() can produce counter-intuitive results. Document this to avoid unnecessary pain.

Due to column-major ordering, using kron() together with  reshape() can produce counter-intuitive results. Document this to avoid unnecessary pain.
@amilsted
Copy link
Contributor Author

This behavior (which I realize is the same as Matlab's) has bitten myself and a number of colleagues who use Julia to do quantum-mechanics computations.

@fredrikekre
Copy link
Member

fredrikekre commented Aug 16, 2018

How is this related to kron? kron just creates another matrix. If the resulting order of reshape is unclear I suggest we clarify that documentation instead.

@ViralBShah
Copy link
Member

Unrelated, but it would be cool to have a compact representation of kronecker products in a type that just shares the inputs and overloads the indexing.

@amilsted
Copy link
Contributor Author

amilsted commented Aug 16, 2018

How is this related to kron?

It's related to kron insofar as it is an issue with the combined use of kron and reshape.

If the resulting order of reshape is unclear I suggest we clarify that documentation instead.

Perhaps that would be better. The reason I chose to put this in the kron documentation is that I do not know of any stdlib function, except kron, that conflicts with reshape in this way. I expect many people who use reshape would not naturally encounter kron, whereas it is natural (at least in some contexts) to reshape an array after a kron to retain separate access to the indices of the two factors (as in the example I added).

Furthermore, it is made pretty clear in the docs that arrays are stored column-major, so for me the behavior of reshape is expected. It was the behavior of kron that I found counter intuitive, particularly coming from python/numpy, where the definition of kron matches the row-major reshape.

...Actually, I secretly want to change the definition of kron so that the argument ordering matches index ordering under reshapes, but I imagine people would get upset about this not being the usual definition of the Kronecker product (although I think it is an equally valid one), particularly if moving data between row-major and column-major languages, so I won't suggest that! Perhaps a new function ckron(A,B) = kron(B,A) (column-major kron)?!

@fredrikekre
Copy link
Member

How is this related to kron?

It's related to kron insofar as it is an issue with the combined use of kron and reshape.

But... Should we add this to all functions then? If we combine + and reshape its the same thing, or * and reshape?

@amilsted
Copy link
Contributor Author

amilsted commented Aug 16, 2018

But... Should we add this to all functions then? If we combine + and reshape its the same thing, or * and reshape?

Could you explain how combining + or * and reshape results in the same ordering issue? I don't see it. I think the issue is quite specific to kron, as I tried to illustrate in the doc example:

Given

A = rand(2,2); B = rand(2,2);
AB = reshape(kron(A, B), (2,2,2,2));

you'll find the following is true

all(AB[b1,a1,b2,a2] == A[a1,a2] * B[b1,b2] for a1=1:2, a2=1:2, b1=1:2, b2=1:2)

and hence in general

AB[a1,b1,a2,b2] != A[a1,a2] * B[b1,b2]

It's all about the ordering of the indices for AB. Perhaps a vector example is easier:

A = rand(3); B = rand(3);
AB = reshape(kron(A, B), (3,3));
all(AB[b,a] == A[a] * B[b] for a=1:3, b=1:3)

evaluates to true, whereas I would have naively expected AB[a,b] == A[a] * B[b] (as it is with numpy).

@garrison
Copy link
Sponsor Member

Pinging @Jutho, whose thoughts could be valuable here.

@Jutho
Copy link
Contributor

Jutho commented Aug 16, 2018

Despite sharing @amilsted regrets that this does not combine well with how we think of it in the context of quantum mechanics, I think the current definition is just the one and only standard definition of the Kronecker product: https://en.wikipedia.org/wiki/Kronecker_product , i.e. the indices of the B matrix move the fastest. I don't think this has anything to do with row or column major, so I don't know how Python's definition could be different.

In fact, even in quantum information this definition in its current form is useful (as was told to me by Frank, @amilsted knows who I mean), the reason being that it is natural to have a standard basis which is ordered such that the rightmost bit moves the fastest:
|000>, |001>, |010>, |011>, |100>, |101>, |110>, |111>.

In the context of tensor networks, I really never use kron and just make a dedicated tensorproduct function that also just keeps the indices separate, so that an extra reshape is not necessary. For that, I am happy that julia does not immediately bind the tensor product symbol to kron.

@amilsted
Copy link
Contributor Author

amilsted commented Aug 16, 2018

Hi @Jutho, thanks for the comments. I suppose it wasn't clear that I am not requesting a change to the definition of kron here (and indeed the numpy definition is that same as Julia's). I was just voicing a "secret" desire 😄. For the record, if it weren't for the potential confusion regarding the standard definition, I still think the column-major version of kron is perfectly valid, although it should perhaps be called tensor_product(). My argument is that I shouldn't have to care about memory ordering to separate the tensor product factors sensibly. To use your example, right now if I do kron(psi1, psi2, psi3) and reshape the result to (2,2,2) I get psi[i3,i2,i1] == psi1[i1]*psi2[i2]*psi3[i3], which jives with the standard QM notation |i1,i2,i2> = |i1>⊗|i2>⊗|i3>. The alternative to tensor_product(A,B,C) = kron(C,B,A) would of course be to define a row-major reshape, but that would require permutation!

Anyway, what I actually want right now is to get something like the documentation change of this PR into Julia in order to warn people of the potential confusion. Do you have any thoughts on that? Do you think this belongs in the docstring for kron or for reshape, or somewhere else, or not at all?

@garrison
Copy link
Sponsor Member

the one and only standard definition of the Kronecker product

Interesting -- this is actually news to me. I personally had not considered this definition to be the one and only standard, as I seem to recall noticing both possible conventions bring used in the literature. A quick search brings me to this StackExchange post which justifies the use of either convention. (Incidentially, it was Tony Zee who taught my quantum field theory class, so maybe this is where I first began to believe there is no standard convention...)

@amilsted
Copy link
Contributor Author

the one and only standard definition of the Kronecker product

Interesting -- this is actually news to me.

I think we need to distinguish between the definition of the tensor product and the definition of the Kronecker product (considered as a particular implementation of the former for matrices). The Kronecker product seems to be pretty much standardized.

@Jutho
Copy link
Contributor

Jutho commented Aug 16, 2018

Probably I was too quick there. Wikipedia is my go to. I prefer to call this tensor product anyway, and then indeed there is no convention. Kronecker product is (in my mind) just some specific matrix operation, that happens to coincide with tensor product, if the index order is indeed correctly chosen etc. A bit similar to how I would not call elementswise multiplication of arbitrary multidimensional arrays, i.e. .* in julia, a Hadamard product, even though the latter is indeed a specific version of the former, specifically for matrices. But that's just my take.

@Jutho
Copy link
Contributor

Jutho commented Aug 16, 2018

@amilsted , you beat me to it. Anyway, for the docs, I agree that it is reasonable to expect AB[a1,b1,a2,b2] == A[a1,a2] * B[b1,b2], but I don't know how many people will think about it in this way. Probably its only a small minority (e.g. tensor network people), and they already know about this.

Thus, I have no strong feelings either way.

@amilsted
Copy link
Contributor Author

@Jutho Yes, I expect you're right that it's a small minority, but I know from experience that they do not all already know about this. Probably coming from Matlab it is clear, but coming from python/numpy it may be unexpected, and I know people, including myself, who have occasionally wasted time tracking down mistakes due to this. 🙄

Given that the harm is practically nonexistent (a slightly longer kron docstring) and assuming there are some benefits for some scientific users (perhaps not even such a small proportion of those using kron?), I believe that a docstring addition is a good option.

Perhaps the simplified, vector version of the example is better than the matrix version? (see above)

PS: If anyone ever adds a column-wise "kronecker" product, can you call it nork(A,B) = kron(B,A)?

@Jutho
Copy link
Contributor

Jutho commented Aug 17, 2018

Indeed a vector example could be useful, in that it also connects to fields like continuum mechanics etc, where they often use the notation v ⊗ w to mean outer product v * w'. This too, leads toreshape(kron(v,w),...) != v*w', whereas it would match with nork :).

@fredrikekre
Copy link
Member

I am still confused about this, there is nothing magical about the combination of kron and reshape. If you caught of guard about how julias kron operates we should clarify that, and if you caught of guard by the way reshape orders things we should clarify that.

@Jutho
Copy link
Contributor

Jutho commented Aug 17, 2018

I think it boils down to this (same example as previously):

For two vectors v of length m and w of length n, in some fields, v ⊗ w yields a matrix of size (m,n) with entries (i,j)-> v[i]*w[j], i.e. the outer product or, more generally, tensor product.

In other fields, v ⊗ w yields a big vector of length m*n with entries v[div(k-1,n)+1]*w[mod1(k,n)], known as the kronecker product (although it is more often used for the matrices/linear operators acting on those vectors). These two concepts are often considered equivalent, and clearly they are isomorphic.

Hence, you would expect that reshape is the natural isomorphism between these two concepts in a programming language. Unfortunately, that doesn't work with the default action of reshape in a column major world.

This shows that it is really between the non-expected interplay between kron(v,w), v * w' and reshape. Each operation in itself brings no surprise.

Use two vectors instead of two matrices to illustrate the reshape(kron()) behavior. This also has the benefit of providing a vector example for kron().
@amilsted
Copy link
Contributor Author

Thanks @Jutho for the clarification! I simplified the example in the docstring.

To restate my position, I believe that mentioning this subtlety in the docs is useful for certain groups of people, especially if they have a python/numpy background. I also think it is easy for everybody else to disregard the sort of paragraph I propose adding.

Regarding where in the docs to mention the issue, I still think the kron docstring is the best place, since I think the proportion of kron users who will care about this is probably far higher than the proportion of reshape users.

PS: I am of course open to suggestions for improving the wording of the docstring!

@ViralBShah
Copy link
Member

I have to say that as someone who was not familiar with this issue before, that docstring would be extremely confusing. It is also unclear what the example is demonstrating. To me, the kron followed by reshape do exactly what I expect.

If it helps certain people, I am ok with including this - but then the example needs to include the detail of the kind @Jutho mentions and perhaps even pointers to wikipedia or other such sources.

@ViralBShah
Copy link
Member

Maybe one way to explain this might be to skip the text about the confusion, and just state what the ordering is and what identity holds (with that example).

@KristofferC
Copy link
Sponsor Member

Using "counter-intuitive" here is subjective and in my opinion not true. These functions have quite simple behavior so I don't see how composing them could have "counter-intuitive" behavior.

If this is about peoples expectations coming from another language we could add it to the "language differences" section or something, but in my opinion, something like this shouldn't be in the docstring for the function.

@amilsted
Copy link
Contributor Author

Ok, I concede that the "counter-intuitiveness" is very background dependent! I like @ViralBShah's idea of removing that part of the text. Perhaps just a reshape(kron()) example in the kron docstring to illustrate the identity in the case of vectors, without much further comment? Something like:

julia> A = [1 2 3]; B = [4 5 6];

julia> AB = reshape(kron(A,B), (3,3))
3×3 Array{Int64,2}:
 4   8  12
 5  10  15
 6  12  18

julia> all(AB[b,a] == A[a] * B[b] for a=1:3, b=1:3)
true

Printing the reshape result also illustrates the structure of the kron operation on vectors nicely, I think.

@ViralBShah
Copy link
Member

ViralBShah commented Aug 19, 2018

Perhaps this should be a more general example with matrices?

@Jutho
Copy link
Contributor

Jutho commented Aug 20, 2018

One could also add a sentence:

For vectors v and w, the Kronecker product is related to the outer product by kron(v,w) = vec(w*transpose(v)) or w*transpose(v) = reshape(kron(v,w), (length(w), length(v))).

I think this is a purely objective statement. I think it is useful, as it provides a further definition for the more esoteric Kronecker product in terms of the more familiar outer product, and explains how this well-known equivalence is established in Julia.

The relationship between the outer product, and the Kronecker product, of two vectors illustrates the argument-ordering differences due to vec/reshape and column-major storage.
@amilsted
Copy link
Contributor Author

I like this. Have updated the PR... I also sneakily added a sentence pointing out the ordering differences:

Note how the ordering of v and w differs on the left and right
of these expressions (due to column-major storage).

Although the reader can obviously discover this themselves, I think it's nice to point it out (without judging this time!).

@amilsted
Copy link
Contributor Author

Is this an acceptable solution for everyone? The complete docstring now reads:

Kronecker tensor product of two vectors or two matrices.

For vectors v and w, the Kronecker product is related to the outer product by
kron(v,w) == vec(w*transpose(v))
or
w*transpose(v) == reshape(kron(v,w), (length(w), length(v))).
Note how the ordering of v and w differs on the left and right
of these expressions (due to column-major storage).

Examples

julia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
 1  2
 3  4

julia> B = [im 1; 1 -im]
2×2 Array{Complex{Int64},2}:
 0+1im  1+0im
 1+0im  0-1im

julia> kron(A, B)
4×4 Array{Complex{Int64},2}:
 0+1im  1+0im  0+2im  2+0im
 1+0im  0-1im  2+0im  0-2im
 0+3im  3+0im  0+4im  4+0im
 3+0im  0-3im  4+0im  0-4im

julia> v = [1, 2]; w = [3, 4, 5];

julia> w*transpose(v)
3×2 Array{Int64,2}:
 3   6
 4   8
 5  10

julia> reshape(kron(v,w), (length(w), length(v)))
3×2 Array{Int64,2}:
 3   6
 4   8
 5  10

@amilsted
Copy link
Contributor Author

amilsted commented Sep 3, 2018

Since there have been no further comments, can I assume this is ok?

@CNOT
Copy link

CNOT commented Sep 8, 2018

After almost a week of head-scratching and confusion I'm glad that an answer to my question on Discourse lead me to this discussion.

@jebej
Copy link
Contributor

jebej commented Sep 8, 2018

To chime in in support of ordering tensor products rightmost "first", here is a short arxiv note I found recently about this: https://arxiv.org/pdf/1711.02086.pdf

@Jutho
Copy link
Contributor

Jutho commented Sep 8, 2018

Since there have been no further comments, can I assume this is ok?

I would say yes, and the fact that this lead to another question on Discourse shows that such an addendum to the help string is useful. In fact, maybe the matrix example was also illustrative and could be added as well. The more examples and doctests the better?

@ranocha
Copy link
Member

ranocha commented Aug 5, 2019

I stumbled upon the (for me unexpected behaviour) of the combination of kron and reshape for Vectors a few minutes ago and would like to see this PR merged. For me, the example using Vectors is absolutely sufficient and nice.
Are there any remaining reasons against this PR?

@ViralBShah
Copy link
Member

This is a strict improvement and I am merging it.

@ViralBShah ViralBShah merged commit f928246 into JuliaLang:master Aug 5, 2019
@ViralBShah ViralBShah added domain:docs This change adds or pertains to documentation domain:linear algebra Linear algebra labels Aug 5, 2019
mbauman added a commit that referenced this pull request Aug 16, 2019
but not its methods as you should typically use `view` to construct it.  Small part of #28697.
mbauman added a commit that referenced this pull request Nov 1, 2019
* Document the SubArray type

but not its methods as you should typically use `view` to construct it.  Small part of #28697.

* Update base/subarray.jl

Co-Authored-By: Tim Holy <tim.holy@gmail.com>

* Update base/subarray.jl

Co-Authored-By: Tim Holy <tim.holy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:docs This change adds or pertains to documentation domain:linear algebra Linear algebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants