Wongnai-corpus

This project is a collection of Wongnai's datasets, most of which are in Thai. We hope that these datasets will advance research in natural language processing (NLP), especially for the Thai language.

1. Search query dataset

There are 500,000 unique words extracted from search queries. These words were labeled by algorithms and judges for a word segmentation task. Our segmentation criterion is to keep the longest possible food word as a single segment, in order to achieve the highest precision in the search system.
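To make the "longest food word" criterion concrete, here is a minimal sketch of greedy longest matching against a dictionary. It is only an illustration of the criterion, not the labeling algorithm described in the blog post; the function name `longest_match_segment` and the toy dictionary are hypothetical and are not taken from this repository.

```python
def longest_match_segment(text, dictionary):
    """Greedy forward longest matching.

    At each position, take the longest dictionary entry that starts there.
    If no entry matches, emit a single character and move on.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so that longer food words win.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: fall back to one character
        tokens.append(match)
        i += len(match)
    return tokens


# Toy dictionary invented for illustration only (not food_dictionary.txt).
toy_dictionary = {"ข้าว", "ข้าวผัด", "ข้าวผัดกุ้ง", "กุ้ง"}
print(longest_match_segment("ข้าวผัดกุ้ง", toy_dictionary))
```

With this toy dictionary the whole query is kept as one token, because the full string is itself a dictionary entry and the longest candidate is tried first.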

1.1 Files

search/labeled_queries_by_algo.txt : List of 500K words labeled by algorithms, which are described in detail in a blog post.
search/labeled_queries_by_judges.txt : List of 10K words labeled by judges following Wongnai's search criteria.
search/food_dictionary.txt : List of 400K food words used for labelling labeled_queries_by_algo.txt.

Please note that these words were collected from user-generated content (UGC), which might include some off-topic words.

1.2 Usage

You may use labeled_queries_by_algo.txt to train your own word segmentation model by splitting it into training and validation sets, and then evaluate your model with labeled_queries_by_judges.txt.
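The split described above could look like the following sketch. It assumes one labeled example per line, which this README does not state, so check the files before relying on it. The helper names, the 10% validation fraction, and the seed are all hypothetical choices.

```python
import random


def read_lines(path):
    """Read non-empty lines from a UTF-8 text file (assumed: one example per line)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


def split_train_validation(lines, validation_fraction=0.1, seed=42):
    """Shuffle deterministically, then return (train, validation) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Demo on placeholder strings so the sketch runs without the dataset files.
examples = [f"example-{i}" for i in range(20)]
train, validation = split_train_validation(examples, validation_fraction=0.2)
print(len(train), len(validation))
```

On the real data you would pass `read_lines("search/labeled_queries_by_algo.txt")` to `split_train_validation`, and keep `search/labeled_queries_by_judges.txt` aside as the held-out evaluation set.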

2. Review dataset

The review dataset contains restaurant reviews and their ratings. There are five rating classes, ranging from 1 to 5 stars.

2.1 Files

The dataset is hosted in a Kaggle competition created by Dr. Ekapol Chuangsuwanich.
If you cannot download the files from there, they are also available at review/review_dataset.zip.
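This README does not list the files inside the archive, so a cautious first step is to inspect it before extracting anything. The sketch below uses only Python's standard `zipfile` module; the helper name `list_archive_members` is hypothetical, and the path assumes you run it from the repository root.

```python
import os
import zipfile


def list_archive_members(zip_path):
    """Return the names of all files stored in a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()


# Path from section 2.1, assumed to be relative to the repository root.
archive_path = "review/review_dataset.zip"
if os.path.exists(archive_path):
    for name in list_archive_members(archive_path):
        print(name)
```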

2.2 Usage

The dataset was originally used for a review rating prediction task. You can find an example of how to import the data here (by Khun Korakot Chaovavanich).
It was also used to create a text classification benchmark, which is described in detail here (by Khun Charin Polpanumas).

Wongnai data services

If you are interested in Wongnai data such as photos, reviews, or the restaurant database, Wongnai also provides data services, including an API and files. For more details, see https://business.wongnai.com/restaurants-data-service/en/
