Excuse me, the Dataset and DataLoader classes are not found in JavaCPP PyTorch? #1215

mullerhai · 2022-08-23T09:48:01Z

Hi,
I want to read training data from local disk or Hadoop HDFS, but I cannot find the Dataset and DataLoader classes, only the MNIST-related dataset and data loader classes. Which data-reading class should I inherit from?

saudet · 2022-08-23T13:22:08Z

That's something that still needs to be mapped. Contributions are welcome!

lzmchina · 2022-10-16T05:30:09Z

Hi, I'm also looking for this API. When will it be added to the project?

mullerhai · 2022-10-18T06:44:21Z

> Hi, I'm also looking for this API. When will it be added to the project?

+1, I eagerly need this too.

saudet · 2022-10-29T02:07:38Z

If you would like to work on this yourself, I can provide assistance, so please let me know if you encounter any problems. Thanks!

mullerhai · 2022-10-31T08:51:18Z

> If you would like to work on this yourself, I can provide assistance, so please let me know if you encounter any problems. Thanks!

Wow, I do not know how to write these modules; I have no ability in this area. I think you are better placed to handle this work.

mullerhai · 2022-11-15T05:56:25Z

@saudet, please implement the standard PyTorch torch.utils.data.DataLoader and torch.utils.data.Dataset classes in JavaCPP. Our algorithm team uses Java and Scala to develop with Torch, but there are no DataLoader and Dataset classes we can use. We eagerly need your help.

mullerhai · 2022-11-15T06:04:31Z

Please make implementing them the first priority. Otherwise the JavaCPP PyTorch presets cannot be used in a real business online deployment environment, and they will only become a toy for ML beginners or labs. We eagerly need the PyTorch DataLoader and Dataset APIs in JavaCPP.

saudet · 2022-11-15T06:06:22Z

When you say "real business", do you mean "money"? If that's your situation, then let's have a meeting to discuss this.

mullerhai · 2022-11-15T10:36:45Z

> When you say "real business", do you mean "money"? If that's your situation, then let's have a meeting to discuss this.

Sorry for the misunderstanding. By "real business environment" I just mean real algorithm work at a technology company, building prediction models.

mullerhai · 2022-11-27T03:40:25Z

Hi, could you bring the four packages torchvision, torchaudio, torchtext, and torchfm to JavaCPP? Thanks.

saudet · 2022-11-27T04:50:35Z

The C++ APIs of libraries like these are typically deprecated, for example pytorch/vision@c359d8d, so you'll need to use them in Python anyway. Please let me know if you find any that are supported though.

saudet · 2022-12-11T14:03:41Z

I've added support for that with ChunkDataReader in commit fa4dfdc, which works something like this:
https://github.com/bytedeco/javacpp-presets/blob/ci/pytorch/samples/TestChunkData.java

Please give it a try with the snapshots: http://bytedeco.org/builds/

Also please let me know if there is anything missing!
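
As a rough mental model of what a chunk reader provides, here is a plain-Java sketch. It does not use the JavaCPP classes. The interface and its method names are hypothetical and only mirror the `read_chunk`, `chunk_count`, and `reset` methods that are overridden in the Scala code later in this thread. It also assumes the simplest possible behaviour: chunks are read in order, with no shuffling and no prefetching, which the real ChunkDataset may do differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkLoadingSketch {
    // Hypothetical stand-in for a chunk reader; NOT the JavaCPP ChunkDataReader.
    interface ChunkReader<T> {
        List<T> readChunk(long chunkIndex); // load one chunk of examples
        long chunkCount();                  // number of chunks available
        void reset();                       // called once before each epoch
    }

    // Reads every chunk in order and regroups the examples into batches.
    static <T> List<List<T>> epoch(ChunkReader<T> reader, int batchSize) {
        reader.reset();
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (long c = 0; c < reader.chunkCount(); c++) {
            for (T example : reader.readChunk(c)) {
                current.add(example);
                if (current.size() == batchSize) {
                    batches.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // final, possibly smaller, batch
        }
        return batches;
    }

    // In-memory demo reader: two chunks holding three integers each.
    static ChunkReader<Integer> demoReader() {
        return new ChunkReader<Integer>() {
            public List<Integer> readChunk(long chunkIndex) {
                int base = (int) chunkIndex * 3;
                return Arrays.asList(base, base + 1, base + 2);
            }
            public long chunkCount() { return 2; }
            public void reset() { }
        };
    }

    public static void main(String[] args) {
        // Six examples in two chunks, regrouped into batches of at most four.
        System.out.println(epoch(demoReader(), 4));
    }
}
```

Note that the chunk count and the content of each chunk have to agree: if `readChunk` returned the whole dataset for every index, each epoch would see the data `chunkCount()` times.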

mullerhai · 2022-12-11T15:10:27Z

> I've added support for that with ChunkDataReader in commit fa4dfdc, which works something like this: https://github.com/bytedeco/javacpp-presets/blob/ci/pytorch/samples/TestChunkData.java
>
> Please give it a try with the snapshots: http://bytedeco.org/builds/
>
> Also please let me know if there is anything missing!

Oh my god, thank you very much! I will try it out over the next few days.

mullerhai · 2022-12-11T16:20:25Z

With the batch size set to 1, I think it works:

epoch 1 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 1 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 2 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 2 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 3 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 3 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 4 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 4 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 5 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 5 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 6 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 6 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 7 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 7 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 8 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 8 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 9 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 9 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]
epoch 10 batch.data.createIndexer  [ 100.0, 200.0, 412.0 ]  batch.target.createIndexer [ 200.0 ]
epoch 10 batch.data.createIndexer  [ 300.0, 1234.0, 322.0 ]  batch.target.createIndexer [ 400.0 ]

Process finished with exit code 0

mullerhai · 2022-12-12T03:08:56Z

> I've added support for that with ChunkDataReader in commit fa4dfdc, which works something like this: https://github.com/bytedeco/javacpp-presets/blob/ci/pytorch/samples/TestChunkData.java
>
> Please give it a try with the snapshots: http://bytedeco.org/builds/
>
> Also please let me know if there is anything missing!

Hi, I am very happy: ChunkDataReader has a big impact on our PyTorch Scala ML environment. By the way, since you ask about missing classes, I think these are missing: SequentialSampler, StreamSampler, StatefulDataLoader, StatelessDataLoader, and DistributedSampler.

saudet · 2022-12-12T03:40:59Z

> Hi, I am very happy: ChunkDataReader has a big impact on our PyTorch Scala ML environment. By the way, since you ask about missing classes, I think these are missing: SequentialSampler, StreamSampler, StatefulDataLoader, StatelessDataLoader, and DistributedSampler.

Good to hear that it works well!

Right, there are a few things still missing, but I guess what I am asking is whether there is anything out of that that is important.

mullerhai · 2022-12-12T06:51:11Z

> Good to hear that it works well!
>
> Right, there are a few things still missing, but I guess what I am asking is whether there is anything out of that that is important.

Separately, I found that the loss function raises datatype errors. I have tried converting the datatypes of the prediction and the target, but it has no effect, and I am confused about that.

package org.rec.pytorch

//import au.com.bytecode.opencsv.CSVReader
import com.github.tototoshi.csv.CSVReader
import org.bytedeco.javacpp._
import org.bytedeco.pytorch._
import org.bytedeco.pytorch.global.torch.{DeviceType, ScalarType, cross_entropy_loss, nll_loss, shiftLeft}
import org.bytedeco.pytorch.Module
import org.bytedeco.pytorch.global.torch
import org.bytedeco.pytorch.presets.torch.cout
import spire.random.rng.Device

import scala.collection.mutable.ListBuffer

class Net () extends Module { // Construct and register two Linear submodules.
  var fc1 = register_module("fc1", new LinearImpl(784, 64))
  var fc2 = register_module("fc2", new LinearImpl(64, 32))
  var fc3 = register_module("fc3", new LinearImpl(32, 10))

  // Implement the Net's algorithm.
  def forward(xs: Tensor): Tensor = { // Use one of many tensor manipulation functions.
    var x = xs
    x = torch.relu(fc1.forward(x.reshape(x.size(0), 784)))
    x = torch.dropout(x, /*p=*/ 0.5, /*train=*/ is_training, false)
    x = torch.relu(fc2.forward(x))
    x = torch.log_softmax(fc3.forward(x), new LogSoftmaxFuncOptions(/*dim=*/ 1))
    //    torch.view()
    x
  }
}
  object TestChunk {
    @throws[Exception]
    def main(args: Array[String]): Unit = {
      try {
        val scope = new PointerScope
        System.setProperty("org.bytedeco.openblas.load", "mkl")
        try {
          val batch_size = 64
          val net = new Net
          val prefetch_count = 1
          val testPath = "/Users/muller/Downloads/lamp/lamp-core/src/test/resources/mnist_test.csv"

          val mnistData = CSVReader.open(testPath)
          val dataBuffer = new ListBuffer[( Seq[Float],Float)]()
          val dataExample = new ListBuffer[Example]()
          var index = 0
          mnistData.foreach(ele => {
            if (index > 0) {
              val label_feateure = ele.map(_.toFloat)
              val label = label_feateure.take(1).head.toFloat
              val labelFeature = (label_feateure.drop(1),label)
              val example = new Example( AbstractTensor.create(label_feateure.drop(1): _*),AbstractTensor.create(label))
              dataBuffer.append(labelFeature)
              dataExample.append(example)
            }
            index += 1
          })
//          println(dataBuffer(0))
          val optimizer = new SGD(net.parameters, new SGDOptions(/*lr=*/ 0.01))


          // val criterion = cross_entropy_loss()
          //       val mapHeader =  mnistData.iteratorWithHeaders
          val data_reader = new ChunkDataReader() {
            override def read_chunk(chunk_index: Long) = {
              new ExampleVector(dataExample: _*)
            }

            override def chunk_count = dataExample.length

            override def reset(): Unit = {
            }
          }
          val sampler = new RandomSampler(0)
          val data_set = new ChunkSharedBatchDataset(new ChunkDataset(data_reader, sampler, sampler, new ChunkDatasetOptions(prefetch_count, batch_size))).map(new ExampleStack)
          val data_loader = new ChunkRandomDataLoader(data_set, new DataLoaderOptions(batch_size))
          for (epoch <- 1 to 10) {
            var it = data_loader.begin
            var batch_index =0
            while ( {
              !it.equals(data_loader.end)
            }) {
              val batch = it.access
              optimizer.zero_grad()
              // Execute the model on the input data.
              //prediction 64|10 ,batch.target :64|1
              //Exception in thread "main" java.lang.RuntimeException: 0D or 1D target tensor expected, multi-target not supported
              val prediction = net.forward(batch.data)

              //  "main" java.lang.RuntimeException: "nll_loss_out_frame" not implemented for 'Long'
              val pred = torch.argmax(prediction,new LongOptional(1),true).squeeze(1)
              // Compute a loss value to judge the prediction of our model.

              //Exception in thread "main" java.lang.RuntimeException: 0D or 1D target tensor expected, multi-target not supported
              val target =batch.target
              import org.bytedeco.pytorch.{Device => TorchDevice}

              val device :TorchDevice =new TorchDevice(DeviceType.CPU)

              //Exception in thread "main" java.lang.RuntimeException: "nll_loss_out_frame" not implemented for 'Long'   ScalarType.Long
              //  thread "main" java.lang.RuntimeException: expected scalar type Long but found Float ScalarType.Float
              // Exception in thread "main" java.lang.RuntimeException: expected scalar type Long but found Double  ScalarType.Double
              val squeezeTarget = target.squeeze(1).to(device,ScalarType.BFloat16)
              println(s"prediction ${prediction.shape.mkString("|")} prde ${pred.shape.mkString("|")},batch.target :${batch.target.shape.mkString("|")} squeeze batch ${batch.target.squeeze(1).shape.mkString("|")}")
              shiftLeft(cout, prediction)
              shiftLeft(cout, pred)
              shiftLeft(cout, target)
//              val loss = nll_loss( batch.target.squeeze(1),batch.target.squeeze(1))
              val loss = nll_loss( pred,squeezeTarget)
              loss.backward
              optimizer.step
              if ( {
                batch_index += 1;
                batch_index
              } % 100 == 0) {
                System.out.println("Epoch: " + epoch + " | Batch: " + batch_index + " | Loss: " + loss.item_float)
                // Serialize your model periodically as a checkpoint.
                val archive = new OutputArchive
                net.save(archive)
                archive.save_to("net.pt")
              }
              //            println(s"batch.data.createIndexer  ${batch.data.createIndexer}  batch.target.createIndexer ${batch.target.createIndexer}")

              it = it.increment
            }
          }
        } finally if (scope != null) scope.close()
      }
    }
  }


//            new ExampleVector(
//            new Example(AbstractTensor.create(100.0), AbstractTensor.create(200.0)),
//            new Example(AbstractTensor.create(300.0), AbstractTensor.create(400.0)))

saudet · 2022-12-12T23:21:37Z

> Separately, I found that the loss function raises datatype errors. I have tried converting the datatypes of the prediction and the target, but it has no effect, and I am confused about that.

If you still have problems with that, try with a smaller example, it should help you figure out what the problem is.
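
One smaller example along those lines is sketched below in plain Java, without any JavaCPP calls. It computes a mean negative log-likelihood by hand to show the shapes and types such a loss works with: an N x C array of floating-point log-probabilities and a 1-D array of N integer class indices. The error messages quoted in the Scala code above are consistent with that contract: `pred` is the `argmax` output (integer indices, not log-probabilities), the target is 64|1 rather than 1-D, and it is converted to `BFloat16` rather than `ScalarType.Long`. Passing `prediction` together with the squeezed target converted to `ScalarType.Long` would match the contract, but that is an assumption; the thread does not record the fix that was eventually used.

```java
class NllLossSketch {
    // Mean negative log-likelihood over a batch (hand-written illustration).
    // logProbs: N x C log-probabilities, e.g. the output of a log-softmax layer.
    // target:   N class indices, one integer per example.
    static double nllLoss(double[][] logProbs, long[] target) {
        if (logProbs.length != target.length) {
            throw new IllegalArgumentException("input and target batch sizes differ");
        }
        double sum = 0.0;
        for (int i = 0; i < target.length; i++) {
            // Pick the log-probability assigned to the true class of example i.
            sum -= logProbs[i][(int) target[i]];
        }
        return sum / target.length;
    }

    public static void main(String[] args) {
        double[][] logProbs = {
            {Math.log(0.7), Math.log(0.2), Math.log(0.1)},
            {Math.log(0.1), Math.log(0.8), Math.log(0.1)}
        };
        long[] target = {0, 1}; // true classes of the two examples
        System.out.println(nllLoss(logProbs, target));
    }
}
```

Because the loss needs the full row of log-probabilities to be differentiable, taking `argmax` first discards exactly the information that training depends on.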

mullerhai · 2022-12-13T08:28:19Z

> If you still have problems with that, try with a smaller example, it should help you figure out what the problem is.

I have solved the problem. The MNIST model now really trains by loading the local chunked MNIST dataset, and the ChunkDataReader, ChunkDataset, and ChunkDataLoader classes work perfectly!
The SequentialSampler class is necessary for training models on time-stamped data, and I think it may be easy to implement in JavaCPP. Please add it in the pytorch-javacpp 1.5.9 release.
Thanks
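
To illustrate why the sampler matters for time-ordered data, here is a plain-Java sketch of the two index orders. It does not use the JavaCPP sampler classes; the class and method names are hypothetical. The epoch log earlier in this thread, where the same two examples arrive in varying order, is what one would expect from the `RandomSampler` used there.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SamplerOrderSketch {
    // Indices 0..n-1 in their original order, as a sequential sampler would yield them.
    static List<Integer> sequential(int n) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        return indices;
    }

    // The same indices in shuffled order, as a random sampler would yield them.
    static List<Integer> random(int n, long seed) {
        List<Integer> indices = sequential(n);
        Collections.shuffle(indices, new Random(seed));
        return indices;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sequential(5));
        System.out.println("random:     " + random(5, 42L));
    }
}
```

With sequential order, example i is always seen before example i + 1, which preserves the time order of the records; shuffling breaks that guarantee.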

saudet · 2023-06-06T08:45:03Z

Those classes are now available in version 1.5.9. Enjoy!

saudet added enhancement help wanted labels Aug 23, 2022

saudet assigned HGuillemet Oct 15, 2022

saudet mentioned this issue Nov 14, 2022

Need help ：make the normal pytorch torch.utils.data.DataLoader & torch.utils.data.dataset class implement in javacpp please #1263

Closed

saudet mentioned this issue Nov 29, 2022

I want to reconstruct implement the pytorch dataset in scala but meet error #1272

Closed

saudet added a commit that referenced this issue Dec 11, 2022

* Map torch::data::datasets::ChunkDataReader and related data loading classes from PyTorch (issue #1215)

fa4dfdc

saudet removed the help wanted label Dec 11, 2022

saudet added a commit that referenced this issue Dec 13, 2022

* Map torch::data::datasets::DistributedSampler and `StreamSampler` from PyTorch (issue #1215)

c0d15db

saudet mentioned this issue Dec 19, 2022

[Pytorch] How to create the Example ExampleStack ExampleIterator from tensor? #1273

Closed

saudet closed this as completed Jun 6, 2023
