- Real-time data collected by the Govt. of Gujarat, Diu and Daman District.
- This dataset is one of the most interesting NLP datasets I've ever come across. It gives the REASONS why people in rural areas are extremely reluctant to go to school, and the cherry on top is that both the children's and the parents' responses are recorded. That is where the stark difference lies, which you will notice if you go through the EDA with a keen eye.
- The 'ML' part is done in a peculiar way for two reasons. First, the data is extremely dirty: I have never seen so many '\n' characters in an Excel file. Second, I wanted a simple way to measure how similar sentences are and to visualize that similarity. I have never seen anybody else do it this way; it's mostly instinctive thinking.
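
To give a rough idea of the two steps mentioned above (cleaning out the stray '\n' characters and scoring how similar sentences are), here is a minimal sketch. It is not the method used in this project: it assumes a plain bag-of-words cosine similarity, and the helper names (`clean_text`, `tokenize`, `cosine_similarity`, `similarity_matrix`) and the sample responses are made up for illustration.

```python
import math
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # Empty sentences have no direction, so treat them as dissimilar.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences: list[str]) -> list[list[float]]:
    """Clean every sentence, then score each pair of sentences."""
    cleaned = [clean_text(s) for s in sentences]
    return [[cosine_similarity(a, b) for b in cleaned] for a in cleaned]


# Hypothetical responses, NOT taken from the dataset.
responses = [
    "School is\ntoo far\n from home",
    "The school is too far",
    "Need to help\nwith farm work",
]

matrix = similarity_matrix(responses)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

The resulting matrix can be passed to any heatmap plotting function to visualize which responses cluster together. Word counts are only one possible choice here; TF-IDF weights or sentence embeddings would slot into the same structure.
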
An explanation is in the video below:
https://drive.google.com/file/d/1YD5IrBhP5hpGB2ZoqCPSEHOzO2hVHN_R/view?usp=sharing