Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Indexing, Slicing, Joining, Mutating Ops #35

Merged
merged 6 commits into from
Jul 2, 2023

Conversation

davoclavo
Copy link
Contributor

@davoclavo davoclavo commented Jun 28, 2023

Implement more Indexing, Slicing, Joining, Mutating Ops from https://pytorch.org/docs/stable/torch.html#indexing-slicing-joining-mutating-ops

TODO

  • Add implementations
  • Add tests
  • Add documentation using the guide™️
  • BONUS: I think testing this will require a lot of nested tensors, so it would be good to attempt to handle tensor creation for nested arrays
  • Handle Int | Seq(Int) for some operations

For later work on a separate PR:

  • Handle string inputs for indexReduce and scatterReduce - perhaps as an Enum?
  • Implement missing scatter ops

Apparently the complement-property testing was broken, which is useful to test
if the tensor dtypes in tensor operations are narrowed just enough, but not too
much.

With this change the generator can generate all the DTypes (which is useful for
complement-property testing) or filter them if you want to do just property testing
@davoclavo
Copy link
Contributor Author

davoclavo commented Jul 1, 2023

Quick update!

  1. I was able to implement a way to create 2D and 3D tensors using nested Seq + shapeless. It is a bit hacky, but it does the job for now (using a combo of TypeCase + Tensor.view + Seq.flatten + asInstanceOf[Seq[U]]) (source here) - I hope this is fine for now, until we find a more robust solution for N-dimension arrays.

    • On this topic: I was thinking it would be cool to try to model a lot of Seq-related inputs as Tuples, that way we could type check nested things (such as the N-dimensional tuple creation), and another cool thing is that we could pass arguments more similar to python lists/tuples []/(), such as: Tensor(((1,2,3), (4,5,6))
  2. A lot of documentation was added, however I am not quite sure to test if it renders properly. Could I get some pointers on could I do that?

    • Update: I think I found it looking at .github/workflows/ci.yml. it seems like doing sbt doc for api docs, and sbt docs/tlSite for the website
  3. Some changes that I will defer to later PRs, already marked them as TODOs in the file

    • Most of the scatter operations were not implemented as they are a bit more complex to document and implement tests for.
    • Handle Int | Seq[Int] for a lot of operations that could handle it.
  4. Some examples used some not-yet-implemented functions, so I added them:

    • Tensor class (gt|>, gte|>=, lt|<, lte|<=, isConj, adjoint, and also made == an alias to eq)
    • Added torch.argsort
    • Tried to add torch.complex to CreationOps, but it seems javacpp does not yet support torch.complex creation - I found a workaround for the example in the docs that used it, so we are good for now

@davoclavo davoclavo marked this pull request as ready for review July 1, 2023 01:34
@davoclavo
Copy link
Contributor Author

Oh wow the examples in the docs are being compiled, that's awesome! will work on fixing them later today

- Add documentation
- Add tests
- Improve some implementations to handle more scenarios
@sbrunk
Copy link
Owner

sbrunk commented Jul 2, 2023

Looks great, thank you!

  1. I was able to implement a way to create 2D and 3D tensors using nested Seq + shapeless. It is a bit hacky, but it does the job for now (using a combo of TypeCase + Tensor.view + Seq.flatten + asInstanceOf[Seq[U]]) - I hope this is fine for now, until we find a more robust solution for N-dimension arrays.

I think more deeply nested seqs are quite rare, so I agree this is fine. If we want to support this in a more generic way at some point, perhaps https://maximekjaer.github.io/tf-dotty/ is worth a look.

  • On this topic: I was thinking it would be cool to try to model a lot of Seq-related inputs as Tuples, that way we could type check nested things (such as the N-dimensional tuple creation), and another cool thing is that we could pass arguments more similar to python lists/tuples []/(), such as: Tensor(((1,2,3), (4,5,6))

Good point. I actually had the same thought a while ago. We need to ensure that the tuple only contains ints, though. One idea is to either have something like type IntTuple = Int | (Int, Int) | (Int, Int, Int, Int) ... up to Tuple22, or we use the new Tuple type and constrain that generically to ints. It is possible, I've discussed it on the Scala Discord a while ago, but needs an implicit as far as I remember. I'll try to find it...

  1. A lot of documentation was added, however I am not quite sure to test if it renders properly. Could I get some pointers on could I do that?
  • Update: I think I found it looking at .github/workflows/ci.yml. it seems like doing sbt doc for api docs, and sbt docs/tlSite for the website

Yeah apologies, I need to include that in the contributor docs. You can also run sbt docs/tlSitePreview to run a preview server. See https://typelevel.org/sbt-typelevel/site.html for details.

@sbrunk
Copy link
Owner

sbrunk commented Jul 2, 2023

Oh wow the examples in the docs are being compiled, that's awesome! will work on fixing them later today

I was hoping to have mdoc like output rendering in Scaladoc as well, but that's not supported yet. But having compilation errors checked is already helpful.

@sbrunk sbrunk merged commit 19abf3e into sbrunk:main Jul 2, 2023
@davoclavo
Copy link
Contributor Author

Awesome, thanks for reviewing and merging it!

I think more deeply nested seqs are quite rare, so I agree this is fine. If we want to support this in a more generic way at some point, perhaps https://maximekjaer.github.io/tf-dotty/ is worth a look.

Good idea, will do! After building some more complex models, I realized the pain for named tensors/shape checking, so will take a look at that repo, and also some other resources I've found. Here is a compiled list of them:

Good point. I actually had the same thought a while ago. We need to ensure that the tuple only contains ints, though. One idea is to either have something like type IntTuple = Int | (Int, Int) | (Int, Int, Int, Int) ... up to Tuple22, or we use the new Tuple type and constrain that generically to ints. It is possible, I've discussed it on the Scala Discord a while ago, but needs an implicit as far as I remember. I'll try to find it...

Would love to see that discussion. After some exploration, chatgpt prompts, and bouncing this with some friends, here are two solutions that are usable to guarantee a Tuple only has Ints (a bit more work needs to be done to get it to work with generic types, such as a union type like ScalaType)

  1. Using Tuple.Union to reduce all the types into one, and then doing an equivalence/subtype check:
@ def createTensor[T <: Tuple](tup: T)(using Tuple.Union[T] <:< Int) = tup.toList
defined function createTensor

@ createTensor((1,2,3))
res97: List[Int] = List(1, 2, 3)

@ createTensor((1,2,"3"))
-- [E172] Type Error: cmd98.sc:1:33 --------------------------------------------
1 |val res99 = createTensor((1,2,"3"))
  |                                 ^
  |                Cannot prove that Tuple.Union[(Int, Int, String)] <:< Int.
Compilation Failed
  1. Using type matches with recursion and DummyImplicit:
@ type IntTuple[T <: Tuple] = T match
        case Int *: t => IntTuple[t]
        case EmptyTuple => DummyImplicit

defined type IntTuple

@ def createTensor[T <: Tuple](tup: T)(using IntTuple[T]) = tup.toList
defined function createTensor

@ createTensor(1,2,3)
res101: List[Int] = List(1, 2, 3)

@ createTensor(1,2,"3")
-- Error: cmd102.sc:1:25 -------------------------------------------------------
1 |val res102 = createTensor(1,2,"3")
  |                         ^
  |     Match type reduction failed since selector  String *: EmptyTuple.type
  |     matches none of the cases
  |
  |         case Int *: t => ammonite.$sess.cmd99.IntTuple[t]
  |         case EmptyTuple => DummyImplicit
Compilation Failed

@davoclavo davoclavo deleted the indexing-ops branch July 2, 2023 18:57
@sbrunk
Copy link
Owner

sbrunk commented Jul 2, 2023

Interesting. I did not know the Tuple.Union solution.

I found the discussion. After some ideas, I found a working solution, similar to yours above:

type HomogeneousTuple[T <: Tuple, U] = T match
  case EmptyTuple =>  EmptyTuple
  case _ *: t => U *: IntTuple[t]
type IntTuple[T <: Tuple] = HomogeneousTuple[T, Int]
def f[T <: Tuple](x: T)(using T =:= IntTuple[T]) = x
f(1,2,3) // Tuple3[Int, Int, Int] = (1,2,3)
f(1,"a") // Cannot prove that (Int, String) =:= IntTuple[(Int, String)]

And then people improved upon it:

type ContainsOnly[U] = [T] =>> T <:< HomogeneousTuple[T, U]
def f[T: ContainsOnly[Int]](x: T) = x

Someone proposed a very consice version via Tuple.Map

type ContainsOnly[U] = [T <: Tuple] =>> T <:< Tuple.Map[T, [X] =>> Int]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants