-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance documentation for kron() #28697
Conversation
Due to column-major ordering, using kron() together with reshape() can produce counter-intuitive results. Document this to avoid unnecessary pain.
This behavior (which I realize is the same as Matlab's) has bitten myself and a number of colleagues who use Julia to do quantum-mechanics computations. |
How is this related to |
Unrelated, but it would be cool to have a compact representation of kronecker products in a type that just shares the inputs and overloads the indexing. |
It's related to kron insofar as it is an issue with the combined use of kron and reshape.
Perhaps that would be better. The reason I chose to put this in the kron documentation is that I do not know of any stdlib function, except kron, that conflicts with reshape in this way. I expect many people who use reshape would not naturally encounter kron, whereas it is natural (at least in some contexts) to reshape an array after a kron to retain separate access to the indices of the two factors (as in the example I added). Furthermore, it is made pretty clear in the docs that arrays are stored column-major, so for me the behavior of reshape is expected. It was the behavior of kron that I found counter intuitive, particularly coming from python/numpy, where the definition of kron matches the row-major reshape. ...Actually, I secretly want to change the definition of kron so that the argument ordering matches index ordering under reshapes, but I imagine people would get upset about this not being the usual definition of the Kronecker product (although I think it is an equally valid one), particularly if moving data between row-major and column-major languages, so I won't suggest that! Perhaps a new function |
But... Should we add this to all functions then? If we combine |
Could you explain how combining + or * and reshape results in the same ordering issue? I don't see it. I think the issue is quite specific to kron, as I tried to illustrate in the doc example: Given
you'll find the following is true
and hence in general
It's all about the ordering of the indices for AB. Perhaps a vector example is easier:
evaluates to |
Pinging @Jutho, whose thoughts could be valuable here. |
Despite sharing @amilsted regrets that this does not combine well with how we think of it in the context of quantum mechanics, I think the current definition is just the one and only standard definition of the Kronecker product: https://en.wikipedia.org/wiki/Kronecker_product , i.e. the indices of the B matrix move the fastest. I don't think this has anything to do with row or column major, so I don't know how Python's definition could be different. In fact, even in quantum information this definition in its current form is useful (as was told to me by Frank, @amilsted knows who I mean), the reason being that it is natural to have a standard basis which is ordered such that the rightmost bit moves the fastest: In the context of tensor networks, I really never use |
Hi @Jutho, thanks for the comments. I suppose it wasn't clear that I am not requesting a change to the definition of kron here (and indeed the numpy definition is that same as Julia's). I was just voicing a "secret" desire 😄. For the record, if it weren't for the potential confusion regarding the standard definition, I still think the column-major version of kron is perfectly valid, although it should perhaps be called Anyway, what I actually want right now is to get something like the documentation change of this PR into Julia in order to warn people of the potential confusion. Do you have any thoughts on that? Do you think this belongs in the docstring for |
Interesting -- this is actually news to me. I personally had not considered this definition to be the one and only standard, as I seem to recall noticing both possible conventions bring used in the literature. A quick search brings me to this StackExchange post which justifies the use of either convention. (Incidentially, it was Tony Zee who taught my quantum field theory class, so maybe this is where I first began to believe there is no standard convention...) |
I think we need to distinguish between the definition of the tensor product and the definition of the Kronecker product (considered as a particular implementation of the former for matrices). The Kronecker product seems to be pretty much standardized. |
Probably I was too quick there. Wikipedia is my go to. I prefer to call this tensor product anyway, and then indeed there is no convention. Kronecker product is (in my mind) just some specific matrix operation, that happens to coincide with tensor product, if the index order is indeed correctly chosen etc. A bit similar to how I would not call elementswise multiplication of arbitrary multidimensional arrays, i.e. |
@amilsted , you beat me to it. Anyway, for the docs, I agree that it is reasonable to expect Thus, I have no strong feelings either way. |
@Jutho Yes, I expect you're right that it's a small minority, but I know from experience that they do not all already know about this. Probably coming from Matlab it is clear, but coming from python/numpy it may be unexpected, and I know people, including myself, who have occasionally wasted time tracking down mistakes due to this. 🙄 Given that the harm is practically nonexistent (a slightly longer kron docstring) and assuming there are some benefits for some scientific users (perhaps not even such a small proportion of those using kron?), I believe that a docstring addition is a good option. Perhaps the simplified, vector version of the example is better than the matrix version? (see above) PS: If anyone ever adds a column-wise "kronecker" product, can you call it |
Indeed a vector example could be useful, in that it also connects to fields like continuum mechanics etc, where they often use the notation |
I am still confused about this, there is nothing magical about the combination of |
I think it boils down to this (same example as previously): For two vectors In other fields, Hence, you would expect that This shows that it is really between the non-expected interplay between |
Use two vectors instead of two matrices to illustrate the reshape(kron()) behavior. This also has the benefit of providing a vector example for kron().
Thanks @Jutho for the clarification! I simplified the example in the docstring. To restate my position, I believe that mentioning this subtlety in the docs is useful for certain groups of people, especially if they have a python/numpy background. I also think it is easy for everybody else to disregard the sort of paragraph I propose adding. Regarding where in the docs to mention the issue, I still think the kron docstring is the best place, since I think the proportion of kron users who will care about this is probably far higher than the proportion of reshape users. PS: I am of course open to suggestions for improving the wording of the docstring! |
I have to say that as someone who was not familiar with this issue before, that docstring would be extremely confusing. It is also unclear what the example is demonstrating. To me, the kron followed by reshape do exactly what I expect. If it helps certain people, I am ok with including this - but then the example needs to include the detail of the kind @Jutho mentions and perhaps even pointers to wikipedia or other such sources. |
Maybe one way to explain this might be to skip the text about the confusion, and just state what the ordering is and what identity holds (with that example). |
Using "counter-intuitive" here is subjective and in my opinion not true. These functions have quite simple behavior so I don't see how composing them could have "counter-intuitive" behavior. If this is about peoples expectations coming from another language we could add it to the "language differences" section or something, but in my opinion, something like this shouldn't be in the docstring for the function. |
Ok, I concede that the "counter-intuitiveness" is very background dependent! I like @ViralBShah's idea of removing that part of the text. Perhaps just a reshape(kron()) example in the kron docstring to illustrate the identity in the case of vectors, without much further comment? Something like: julia> A = [1 2 3]; B = [4 5 6];
julia> AB = reshape(kron(A,B), (3,3))
3×3 Array{Int64,2}:
4 8 12
5 10 15
6 12 18
julia> all(AB[b,a] == A[a] * B[b] for a=1:3, b=1:3)
true
Printing the reshape result also illustrates the structure of the kron operation on vectors nicely, I think. |
Perhaps this should be a more general example with matrices? |
One could also add a sentence: For vectors I think this is a purely objective statement. I think it is useful, as it provides a further definition for the more esoteric Kronecker product in terms of the more familiar outer product, and explains how this well-known equivalence is established in Julia. |
The relationship between the outer product, and the Kronecker product, of two vectors illustrates the argument-ordering differences due to vec/reshape and column-major storage.
I like this. Have updated the PR... I also sneakily added a sentence pointing out the ordering differences:
Although the reader can obviously discover this themselves, I think it's nice to point it out (without judging this time!). |
Is this an acceptable solution for everyone? The complete docstring now reads: Kronecker tensor product of two vectors or two matrices. For vectors v and w, the Kronecker product is related to the outer product by Examplesjulia> A = [1 2; 3 4]
2×2 Array{Int64,2}:
1 2
3 4
julia> B = [im 1; 1 -im]
2×2 Array{Complex{Int64},2}:
0+1im 1+0im
1+0im 0-1im
julia> kron(A, B)
4×4 Array{Complex{Int64},2}:
0+1im 1+0im 0+2im 2+0im
1+0im 0-1im 2+0im 0-2im
0+3im 3+0im 0+4im 4+0im
3+0im 0-3im 4+0im 0-4im
julia> v = [1, 2]; w = [3, 4, 5];
julia> w*transpose(v)
3×2 Array{Int64,2}:
3 6
4 8
5 10
julia> reshape(kron(v,w), (length(w), length(v)))
3×2 Array{Int64,2}:
3 6
4 8
5 10
|
Since there have been no further comments, can I assume this is ok? |
After almost a week of head-scratching and confusion I'm glad that an answer to my question on Discourse lead me to this discussion. |
To chime in in support of ordering tensor products rightmost "first", here is a short arxiv note I found recently about this: https://arxiv.org/pdf/1711.02086.pdf |
I would say yes, and the fact that this lead to another question on Discourse shows that such an addendum to the help string is useful. In fact, maybe the matrix example was also illustrative and could be added as well. The more examples and doctests the better? |
I stumbled upon the (for me unexpected behaviour) of the combination of |
This is a strict improvement and I am merging it. |
but not its methods as you should typically use `view` to construct it. Small part of #28697.
* Document the SubArray type but not its methods as you should typically use `view` to construct it. Small part of #28697. * Update base/subarray.jl Co-Authored-By: Tim Holy <tim.holy@gmail.com> * Update base/subarray.jl Co-Authored-By: Tim Holy <tim.holy@gmail.com>
Due to column-major ordering, using kron() together with reshape() can produce counter-intuitive results. Document this to avoid unnecessary pain.