Dataset and DataLoader classes not found in the JavaCPP PyTorch presets? #1215

Closed
mullerhai opened this issue Aug 23, 2022 · 20 comments

@mullerhai

Hi,
I want to read training data from local disk or Hadoop HDFS, but I cannot find Dataset or DataLoader classes, only the MNIST-related dataset and data loader classes. Which data-reading class should I inherit from?

@saudet
Member

saudet commented Aug 23, 2022

That's something that still needs to be mapped. Contributions are welcome!

@lzmchina

Hi, I've also been looking for this API recently. When will it be added to the project?

@mullerhai
Author

> Hi, I've also been looking for this API recently. When will it be added to the project?

+1, I eagerly need this too.

@saudet
Member

saudet commented Oct 29, 2022

If you would like to work on this yourself, I can provide assistance, so please let me know if you encounter any problems. Thanks!

@mullerhai
Author

> If you would like to work on this yourself, I can provide assistance, so please let me know if you encounter any problems. Thanks!

Wow, I do not know how to code these modules; I have no experience with this. I think you are the one who could handle this work.

@mullerhai
Author

@saudet, please implement the standard PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset classes in JavaCPP. Our algorithm team uses Java and Scala to develop with Torch, but there are no DataLoader and Dataset classes we can use. We eagerly need your help.

@mullerhai
Author

Please make implementing them a top priority. Otherwise the JavaCPP PyTorch presets cannot be used in a real business online deployment environment, and they will only be a toy for ML beginners or labs. We eagerly need the PyTorch DataLoader and Dataset API in JavaCPP.

@saudet
Member

saudet commented Nov 15, 2022

When you say "real business", do you mean "money"? If that's your situation, then let's have a meeting to discuss this.

@mullerhai
Author

mullerhai commented Nov 15, 2022

> When you say "real business", do you mean "money"? If that's your situation, then let's have a meeting to discuss this.

Sorry for the misunderstanding. By "real business environment" I just mean real algorithm work at a technology company, building prediction models.

@mullerhai
Author

Hi, could you bring the four packages torchvision, torchaudio, torchtext, and torchfm to JavaCPP? Thanks.

@saudet
Member

saudet commented Nov 27, 2022

The C++ APIs of libraries like these are typically deprecated, for example pytorch/vision@c359d8d, so you'll need to use them in Python anyway. Please let me know if you find any that are supported though.

@saudet
Member

saudet commented Dec 11, 2022

I've added support for that with ChunkDataReader in commit fa4dfdc, which works something like this:
https://github.com/bytedeco/javacpp-presets/blob/ci/pytorch/samples/TestChunkData.java

Please give it a try with the snapshots: http://bytedeco.org/builds/

Also please let me know if there is anything missing!
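
For readers who have not opened the linked sample, the chunk-reader contract it relies on (`read_chunk`, `chunk_count`, `reset`) can be illustrated without the native library. The following is only a sketch in plain Java: `ChunkSource` and `ChunkBatcher` are hypothetical names made up for this illustration, not classes from the JavaCPP presets, and the shuffling and batching policy is an assumption rather than a description of what libtorch does internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Library-free sketch of chunked loading. ChunkSource and ChunkBatcher are
// hypothetical names that only mirror the read_chunk / chunk_count / reset
// contract; they are not JavaCPP classes.
interface ChunkSource<T> {
    List<T> readChunk(int chunkIndex); // load one chunk, e.g. one file or HDFS block
    int chunkCount();                  // how many chunks exist
    void reset();                      // called once at the start of every epoch
}

final class ChunkBatcher {
    // Visits chunks in a shuffled order, shuffles the examples inside each
    // chunk, and cuts them into batches of at most batchSize examples.
    static <T> List<List<T>> epoch(ChunkSource<T> source, int batchSize, long seed) {
        source.reset();
        Random rng = new Random(seed);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < source.chunkCount(); i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);

        List<List<T>> batches = new ArrayList<>();
        for (int chunkIndex : order) {
            // Copy so that shuffling never mutates the list owned by the source.
            List<T> examples = new ArrayList<>(source.readChunk(chunkIndex));
            Collections.shuffle(examples, rng);
            for (int start = 0; start < examples.size(); start += batchSize) {
                int end = Math.min(start + batchSize, examples.size());
                batches.add(new ArrayList<>(examples.subList(start, end)));
            }
        }
        return batches;
    }
}
```

With two chunks of three examples each and a batch size of 2, one call to `ChunkBatcher.epoch` yields four batches (sizes 2, 1, 2, 1), because this sketch never mixes examples from different chunks.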

@mullerhai
Author

> I've added support for that with ChunkDataReader in commit fa4dfdc, which works something like this: https://github.com/bytedeco/javacpp-presets/blob/ci/pytorch/samples/TestChunkData.java
>
> Please give it a try with the snapshots: http://bytedeco.org/builds/
>
> Also please let me know if there is anything missing!

Oh my god, thank you very much! I will try it out over the next few days.

@mullerhai
Author

With the batch size set to 1, I think it works:

epoch 1 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 1 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 2 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 2 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 3 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 3 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 4 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 4 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 5 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 5 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 6 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 6 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 7 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 7 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 8 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 8 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 9 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 9 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 10 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 10 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]

Process finished with exit code 0

@mullerhai
Author

> I've added support for that with ChunkDataReader in commit fa4dfdc, which works something like this: https://github.com/bytedeco/javacpp-presets/blob/ci/pytorch/samples/TestChunkData.java
>
> Please give it a try with the snapshots: http://bytedeco.org/builds/
>
> Also please let me know if there is anything missing!

Hi, I am very happy: ChunkDataReader has made a big difference for our PyTorch Scala ML environment. By the way, since you ask which classes are missing, I think they are SequentialSampler, StreamSampler, StatefulDataLoader, StatelessDataLoader, and DistributedSampler.

@saudet
Member

saudet commented Dec 12, 2022

> Hi, I am very happy: ChunkDataReader has made a big difference for our PyTorch Scala ML environment. By the way, since you ask which classes are missing, I think they are SequentialSampler, StreamSampler, StatefulDataLoader, StatelessDataLoader, and DistributedSampler.

Good to hear that it works well!

Right, there are a few things still missing, but I guess what I am asking is whether there is anything out of that that is important.

@mullerhai
Author

> > Hi, I am very happy: ChunkDataReader has made a big difference for our PyTorch Scala ML environment. By the way, since you ask which classes are missing, I think they are SequentialSampler, StreamSampler, StatefulDataLoader, StatelessDataLoader, and DistributedSampler.
>
> Good to hear that it works well!
>
> Right, there are a few things still missing, but I guess what I am asking is whether there is anything out of that that is important.

One other thing: I found that the loss function raises a data type error. I have converted the data types of the prediction and the target, but it has no effect. I am confused about that.

package org.rec.pytorch

//import au.com.bytecode.opencsv.CSVReader
import com.github.tototoshi.csv.CSVReader
import org.bytedeco.javacpp._
import org.bytedeco.pytorch._
import org.bytedeco.pytorch.global.torch.{DeviceType, ScalarType, cross_entropy_loss, nll_loss, shiftLeft}
import org.bytedeco.pytorch.Module
import org.bytedeco.pytorch.global.torch
import org.bytedeco.pytorch.presets.torch.cout
import spire.random.rng.Device

import scala.collection.mutable.ListBuffer

class Net() extends Module { // Construct and register three Linear submodules.
  var fc1 = register_module("fc1", new LinearImpl(784, 64))
  var fc2 = register_module("fc2", new LinearImpl(64, 32))
  var fc3 = register_module("fc3", new LinearImpl(32, 10))

  // Implement the Net's algorithm.
  def forward(xs: Tensor): Tensor = { // Use one of many tensor manipulation functions.
    var x = xs
    x = torch.relu(fc1.forward(x.reshape(x.size(0), 784)))
    x = torch.dropout(x, /*p=*/ 0.5, /*train=*/ is_training, false)
    x = torch.relu(fc2.forward(x))
    x = torch.log_softmax(fc3.forward(x), new LogSoftmaxFuncOptions(/*dim=*/ 1))
    //    torch.view()
    x
  }
}
  object TestChunk {
    @throws[Exception]
    def main(args: Array[String]): Unit = {
      try {
        val scope = new PointerScope
        System.setProperty("org.bytedeco.openblas.load", "mkl")
        try {
          val batch_size = 64
          val net = new Net
          val prefetch_count = 1
          val testPath = "/Users/muller/Downloads/lamp/lamp-core/src/test/resources/mnist_test.csv"

          val mnistData = CSVReader.open(testPath)
          val dataBuffer = new ListBuffer[( Seq[Float],Float)]()
          val dataExample = new ListBuffer[Example]()
          var index = 0
          mnistData.foreach(ele => {
            if (index > 0) {
              // Each CSV row holds the label first, followed by the pixel values.
              val row = ele.map(_.toFloat)
              val label = row.head
              val features = row.tail
              val labelFeature = (features, label)
              val example = new Example(AbstractTensor.create(features: _*), AbstractTensor.create(label))
              dataBuffer.append(labelFeature)
              dataExample.append(example)
            }
            index += 1
          })
//          println(dataBuffer(0))
          val optimizer = new SGD(net.parameters, new SGDOptions(/*lr=*/ 0.01))


          // val criterion = cross_entropy_loss()
          //       val mapHeader =  mnistData.iteratorWithHeaders
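          val data_reader = new ChunkDataReader() {
            // The whole CSV is already in memory, so expose it as a single chunk.
            override def read_chunk(chunk_index: Long): ExampleVector = {
              new ExampleVector(dataExample: _*)
            }

            // read_chunk ignores chunk_index and returns every example, so there is
            // exactly one chunk. Reporting dataExample.length here would make the
            // loader read the full dataset once per example in every epoch.
            override def chunk_count: Long = 1L

            override def reset(): Unit = {
            }
          }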
          val sampler = new RandomSampler(0)
          val data_set = new ChunkSharedBatchDataset(new ChunkDataset(data_reader, sampler, sampler, new ChunkDatasetOptions(prefetch_count, batch_size))).map(new ExampleStack)
          val data_loader = new ChunkRandomDataLoader(data_set, new DataLoaderOptions(batch_size))
          for (epoch <- 1 to 10) {
            var it = data_loader.begin
            var batch_index =0
            while ( {
              !it.equals(data_loader.end)
            }) {
              val batch = it.access
              optimizer.zero_grad()
              // Execute the model on the input data.
              //prediction 64|10 ,batch.target :64|1
              //Exception in thread "main" java.lang.RuntimeException: 0D or 1D target tensor expected, multi-target not supported
              val prediction = net.forward(batch.data)

              //  "main" java.lang.RuntimeException: "nll_loss_out_frame" not implemented for 'Long'
              val pred = torch.argmax(prediction,new LongOptional(1),true).squeeze(1)
              // Compute a loss value to judge the prediction of our model.

              //Exception in thread "main" java.lang.RuntimeException: 0D or 1D target tensor expected, multi-target not supported
              val target =batch.target
              import org.bytedeco.pytorch.{Device => TorchDevice}

              val device :TorchDevice =new TorchDevice(DeviceType.CPU)

              //Exception in thread "main" java.lang.RuntimeException: "nll_loss_out_frame" not implemented for 'Long'   ScalarType.Long
              //  thread "main" java.lang.RuntimeException: expected scalar type Long but found Float ScalarType.Float
              // Exception in thread "main" java.lang.RuntimeException: expected scalar type Long but found Double  ScalarType.Double
              val squeezeTarget = target.squeeze(1).to(device,ScalarType.BFloat16)
              println(s"prediction ${prediction.shape.mkString("|")} prde ${pred.shape.mkString("|")},batch.target :${batch.target.shape.mkString("|")} squeeze batch ${batch.target.squeeze(1).shape.mkString("|")}")
              shiftLeft(cout, prediction)
              shiftLeft(cout, pred)
              shiftLeft(cout, target)
//              val loss = nll_loss( batch.target.squeeze(1),batch.target.squeeze(1))
              val loss = nll_loss( pred,squeezeTarget)
              loss.backward
              optimizer.step
              if ( {
                batch_index += 1;
                batch_index
              } % 100 == 0) {
                System.out.println("Epoch: " + epoch + " | Batch: " + batch_index + " | Loss: " + loss.item_float)
                // Serialize your model periodically as a checkpoint.
                val archive = new OutputArchive
                net.save(archive)
                archive.save_to("net.pt")
              }
              //            println(s"batch.data.createIndexer  ${batch.data.createIndexer}  batch.target.createIndexer ${batch.target.createIndexer}")

              it = it.increment
            }
          }
        } finally if (scope != null) scope.close()
      }
    }
  }


//            new ExampleVector(
//            new Example(AbstractTensor.create(100.0), AbstractTensor.create(200.0)),
//            new Example(AbstractTensor.create(300.0), AbstractTensor.create(400.0)))


@saudet
Member

saudet commented Dec 12, 2022

> One other thing: I found that the loss function raises a data type error. I have converted the data types of the prediction and the target, but it has no effect. I am confused about that.

If you still have problems with that, try with a smaller example, it should help you figure out what the problem is.
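
A smaller example of that kind can even be written without the PyTorch bindings. The sketch below is plain Java with hypothetical names (`NllLossSketch`, `logSoftmax`, `nllLoss`). It only illustrates the contract that the error messages quoted in the posted code point to: the first argument of `nll_loss` holds floating-point log-probabilities of shape [N, C] (the `log_softmax` output, not the result of `argmax`), and the target holds one integer class index per row, with shape [N] rather than [N, 1]. The thread does not record which change actually resolved the issue, so treat this as an assumption about the likely cause.

```java
import java.util.Arrays;

// Library-free sketch of log-softmax followed by negative log-likelihood (NLL) loss.
// All names here are hypothetical and are not part of the JavaCPP PyTorch presets.
final class NllLossSketch {

    // Row-wise log-softmax: logits[i][c] - log(sum_k exp(logits[i][k])).
    // The row maximum is subtracted first for numerical stability.
    static double[][] logSoftmax(double[][] logits) {
        double[][] out = new double[logits.length][];
        for (int i = 0; i < logits.length; i++) {
            double max = Arrays.stream(logits[i]).max().getAsDouble();
            double sum = 0.0;
            for (double v : logits[i]) {
                sum += Math.exp(v - max);
            }
            double logSum = max + Math.log(sum);
            out[i] = new double[logits[i].length];
            for (int c = 0; c < logits[i].length; c++) {
                out[i][c] = logits[i][c] - logSum;
            }
        }
        return out;
    }

    // Mean NLL: logProbs is [N][C] floating-point log-probabilities,
    // target is [N] integer class indices (one index per row, not [N][1]).
    static double nllLoss(double[][] logProbs, long[] target) {
        if (logProbs.length != target.length) {
            throw new IllegalArgumentException("need exactly one class index per row");
        }
        double total = 0.0;
        for (int i = 0; i < target.length; i++) {
            total -= logProbs[i][(int) target[i]];
        }
        return total / target.length;
    }

    public static void main(String[] args) {
        double[][] logits = {{2.0, 0.5, 0.1}, {0.2, 0.1, 3.0}};
        long[] target = {0, 2};
        System.out.println("loss = " + nllLoss(logSoftmax(logits), target));
    }
}
```

Under that reading, the call in the posted code would take `prediction` directly, together with `batch.target.squeeze(1)` converted to `ScalarType.Long`. This has not been tested against the presets.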

@mullerhai
Author

> > One other thing: I found that the loss function raises a data type error. I have converted the data types of the prediction and the target, but it has no effect. I am confused about that.
>
> If you still have problems with that, try with a smaller example, it should help you figure out what the problem is.

I have solved the problem. The MNIST model can now really be trained by loading the local MNIST dataset in chunks, and the ChunkDataReader, ChunkDataset, and ChunkDataLoader classes work perfectly!
The SequentialSampler class is necessary for training models on time-stamped data, and I think it may be easy to implement in JavaCPP. Please add it in the 1.5.9 release of the JavaCPP PyTorch presets.
Thanks
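
To make the request concrete: a sequential sampler hands out indices in their stored order, which matters when row order encodes time, while a random sampler shuffles them. A minimal sketch in plain Java, using the hypothetical class `IndexSamplerSketch` (it does not call the JavaCPP `SequentialSampler` or `RandomSampler`, and the batching shown is an assumption made for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical index samplers; not the JavaCPP SequentialSampler or RandomSampler.
final class IndexSamplerSketch {

    // Batches of the indices 0..size-1 in their original order.
    static List<List<Integer>> sequential(int size, int batchSize) {
        return toBatches(range(size), batchSize);
    }

    // The same indices, shuffled once with the given seed before batching.
    static List<List<Integer>> random(int size, int batchSize, long seed) {
        List<Integer> indices = range(size);
        Collections.shuffle(indices, new Random(seed));
        return toBatches(indices, batchSize);
    }

    private static List<Integer> range(int size) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            indices.add(i);
        }
        return indices;
    }

    private static List<List<Integer>> toBatches(List<Integer> indices, int batchSize) {
        List<List<Integer>> batches = new ArrayList<>();
        for (int start = 0; start < indices.size(); start += batchSize) {
            int end = Math.min(start + batchSize, indices.size());
            batches.add(new ArrayList<>(indices.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        System.out.println(sequential(5, 2));  // prints [[0, 1], [2, 3], [4]]
        System.out.println(random(5, 2, 42L)); // same indices, order depends on the seed
    }
}
```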

@saudet
Member

saudet commented Jun 6, 2023

Those classes are now available in version 1.5.9. Enjoy!

@saudet saudet closed this as completed Jun 6, 2023