This repository has been archived by the owner on Sep 14, 2020. It is now read-only.

Trying to use LSTM for sequence classification, having some issues with usage #2

Open
mrbullseye opened this issue Oct 25, 2018 · 4 comments

Comments

@mrbullseye

Hello Peter.

This is a long post so I apologize beforehand.

First and foremost, a BIG thank you for making this code available. I searched the web for a lightweight neural network framework to use for a thesis on "sequence classification for the Internet of Things" and found your work. Your code seemed very appropriate for the IoT paradigm because of its lightweight nature, seeing as IoT is concerned with low-resource devices.

To my questions. I have been trying to figure out exactly how to use tiny-rnn for my purposes, and some questions have popped up.

1: To try to understand your code, I tried to implement a simple XOR solver but have been unable to make it work. I have tried the code in the trainingtests.cpp file but cannot get the XOR function to train correctly, even though I have tried different network sizes (hidden layers of 2, 5 and 10 units), all of the activation functions, different learning rates and different iteration counts. I even tried a do..while loop, and it got to about a million iterations before I cancelled the training. I can make the code available if you would like to see it.

The network always gives output close to 0 for input {0,0}, but {1,0}, {1,1} and {0,1} stay close to 0.5. They are actually moving in the right direction, but very, very slowly, and the learning rate does not affect the outcome. Might this have to do with the starting weights?

How do I change the seed for the random starting weights? Can they be changed? Manually?
Do you have working code for an XOR solver that I can look at?

2: For my primary needs, I need to implement an LSTM that steps through a couple of different time sequences of sensor data. I have a few questions about this.
I have created dummy sensor data to see if it works, containing values such as outside and inside temperature, a passage counter (the scenario is a subway station), and others such as a simple boolean for the open/closed status of the station.

I thought about having one input, a few hidden layers and 5 outputs (5 sensor sequences in total to classify), and inputting one sequence at a time to train the network on the different types of sequences, one after the other. The idea is that the LSTM will then give probabilities for what type of sequence is being tested.

How do I step through a time sequence with your implementation? What function/s do I use?
How do I clear the memory cells before inputting another sequence, while keeping the trained weights, so as to classify another sequence?
How do I save the weights for a properly trained network?

I apologize for the loooooooong post, but I think you might be able to help me with my long-overdue thesis. A big, big thank you once again for making the code available and for any help you might provide.

Peace! /Johan B

@peterrudenko
Owner

Hello Johan.

> To try and understand your code I tried to implement a simple XOR

From what I remember, the thing is that it's sensitive to hyperparameters, like the learning rate and the unit types (sigmoid, ReLU, etc.), so tinkering with those might help (at least the training tests are passing).

> Do you have working code
> How do I save the weights for a properly trained network?

Here's my playground where I trained my networks: https://github.com/peterrudenko/go-deeper

But in general, as you have noticed, the library is not so hot when it comes to learning speed and accuracy (nor was the one it was inspired by, synaptic.js - I don't remember if they even had a nice sequence-learning example).

I faced these limitations when I tried to make an LSTM generate some text (like in that epic article by Andrej Karpathy), but progress kept slowing down once the results were about halfway there.

So I gave up once I realized I'd need to implement and tune all the state-of-the-art techniques, like dropout, batch normalization and SGD with momentum (which is just must-have basic stuff today, as people keep coming up with more black magic fuckery like that to tweak learning accuracy).

To sum it up, the main goal of this project is self-education, so I'm not sure it actually fits your needs in production.

A couple of months ago I stumbled upon a project with a similar name, tiny-dnn, which is also a header-only C++ library with no dependencies, but it's much more mature, so you might want to check it out - hopefully it helps you more :)

@mrbullseye
Author

Thanks for the quick reply, Peter. I understand that the project is smaller-scale and for self-education. Since I am writing my thesis for my bachelor's degree and would like to get a thorough understanding of these concepts, and since this project is written in C++ and is lightweight, I figured it all came together nicely. Also, I had spent some time with your code and found it quite easy to understand just by reading it, so I thought I would give it a shot.

I have also stumbled upon tiny-dnn, but I had some issues getting it to work, since I mostly code in Eclipse with MinGW on Windows and got a lot of compilation errors - I guess it doesn't fully support C++14. I could try MinGW-w64, which is a newer fork of the compiler, or I could move to Linux in VirtualBox, I guess.

To be clear, I don't quite need a production-ready environment. Rather, I need to prove the concept of classification using my own data set and then try it on more resource-bound hardware like a Raspberry Pi. The idea is to later combine my findings (if any =) ) with other theses into a sort of edge gateway for IoT that can autonomously classify data and take action using Deep Coder, but that is outside my scope. I really only need to test classification of the data set.

Do you think it would be possible, or am I barking up the wrong tree, so to speak? Any help would be appreciated, and if you have any more answers (like about time steps with the LSTM) I would love to hear them. I am dipping my toes into this field, and like you said, a lot of it seems to be black magic fuckery. ;)

Thank you again.
/Johan B

@peterrudenko
Owner

> How do I step through a time sequence with your implementation? What function/s do I use?
> How do I clear the memory cells before inputting another sequence, while keeping the trained weights, so as to classify another sequence?
> How do I save the weights for a properly trained network?

One training step basically consists of:

  • feed() function for filling the inputs and passing them through the network (aka forward-propagation)
  • learn() function for correcting the weights based on the error levels we've got after step 1 (aka back-propagation)
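The sequence-stepping loop falls out of those two calls. A rough sketch (only feed() and learn() are actual tiny-rnn names from the steps above; the rest is assumed for illustration):

```
// one pass over a labeled sequence
for each timestep t in the sequence:
    output = network.feed(input[t])   // forward pass for this step
    network.learn(target[t])          // back-propagate and update weights
// then reset the recurrent state (memory cells) before feeding the
// next sequence; the learned weights are kept
```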

Please check the code from my playground to see how to save, load and reset weights (and, optionally, the topology):
https://github.com/peterrudenko/go-deeper/blob/65f6b58571a9be4844acd2d25342c7c370970bfa/Source/Playground/GoDeeper.cpp#L188-L192

Hopefully it explains the training steps as well:
https://github.com/peterrudenko/go-deeper/blob/65f6b58571a9be4844acd2d25342c7c370970bfa/Source/Playground/Models/TextTrainIteration.cpp#L208-L225

> Rather, I need to prove the concept of classification using my own data set and then try it on more resource-bound hardware like a Raspberry Pi
> Do you think it would be possible

Yup, theoretically it should be possible. But until then you'll face the same issues that I did - I mean the need to implement more up-to-date techniques.

In particular, one of those missing things is a softmax layer, which is vital for classification tasks like yours. It is often used as the final layer, letting you take advantage of the fact that the sum of all outputs equals 1, i.e. 100%.
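For what it's worth, the computation such a layer performs is tiny. A stand-alone sketch (this is not something the library provides) of a numerically stable softmax:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Numerically stable softmax: subtracting the max logit before
// exponentiating avoids overflow for large raw outputs. The results
// are non-negative and sum to 1, so they read as class probabilities.
std::vector<double> softmax(const std::vector<double>& logits) {
    const double maxLogit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - maxLogit);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}
```

Applied to the 5 raw network outputs, the index with the largest probability would be the predicted sequence class.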

Since I spend all my spare time on other pet projects, I won't be able to offer much help with that (of course, you could take over the project and make it a proof-of-concept RNN library for resource-bound hardware, if you have the time and interest).

@mrbullseye
Author

Thank you for the thorough explanation Peter. I really appreciate it. I will see what I can make of everything and will post my findings if and when I get them. Once again, thank you for your helpful attitude.
