Dependencies

    sbt >= 1.4.7

sbt should be available by default on the IC Cluster. Otherwise, refer to each project's installation instructions.

Dataset

Download the ml-100k.zip dataset into the data/ folder:

> mkdir -p data
> cd data
> wget http://files.grouplens.org/datasets/movielens/ml-100k.zip

Check the integrity of the file. The command below (macOS/BSD; on Linux, use md5sum ml-100k.zip instead) should print the checksum shown:

> md5 -q ml-100k.zip
0e33842e24a9c977be4e0107933c0723
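
Where neither md5 nor md5sum is available, the same integrity check can be sketched in Python with the standard hashlib module. The function name file_md5 and the chunk size are illustrative choices, not part of the project template; the script assumes it is run from the project root.

```python
import hashlib

# Checksum quoted in this README for ml-100k.zip.
EXPECTED_MD5 = "0e33842e24a9c977be4e0107933c0723"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes the archive was downloaded to data/ as described above.
    actual = file_md5("data/ml-100k.zip")
    print("OK" if actual == EXPECTED_MD5 else f"MISMATCH: {actual}")
```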

Unzip:

> unzip ml-100k.zip

Personal Ratings

Add your ratings to the data/personal.csv file by providing a numerical rating in the range [1,5] for at least 20 movies. For example, to rate the 'Toy Story' movie with '5', modify this line:

1,Toy Story (1995),

to this:

1,Toy Story (1995),5

Do include your own ratings in your final submission so we can check your answers against those provided in your report.

Important: Edit the data/personal.csv file using your IDE/editor. Do not use applications such as Microsoft Excel, which may change the commas (,) to semicolons (;).
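
Before submitting, the ratings file can be sanity-checked with a short script. The sketch below is not part of the template. It assumes each line has the form id,title,rating with the rating as the last field, treats lines whose first field is not a number as a header, and flags lines that appear to use semicolons as separators. The function name check_personal_ratings is hypothetical.

```python
import csv


def check_personal_ratings(path, minimum=20):
    """Return (rated_count, problems) for a ratings file.

    Assumed layout (not confirmed by the template): id,title,rating
    with the rating as the last comma-separated field.
    """
    problems = []
    rated = 0
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        for line_no, row in enumerate(csv.reader(f), start=1):
            if not row:
                continue  # blank line
            if len(row) == 1 and ";" in row[0]:
                problems.append(f"line {line_no}: semicolons instead of commas")
                continue
            if not row[0].strip().isdigit():
                continue  # assumed to be a header line
            rating = row[-1].strip()
            if rating == "":
                continue  # movie left unrated
            try:
                value = float(rating)
            except ValueError:
                problems.append(f"line {line_no}: invalid rating {rating!r}")
                continue
            if not 1 <= value <= 5:
                problems.append(f"line {line_no}: rating {rating} outside [1,5]")
                continue
            rated += 1
    if rated < minimum:
        problems.append(f"only {rated} rated movies, need at least {minimum}")
    return rated, problems


if __name__ == "__main__":
    count, issues = check_personal_ratings("data/personal.csv")
    print(f"{count} rated movies")
    for issue in issues:
        print("problem:", issue)
```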

Usage

Compute similarity predictions

> sbt "runMain similarity.Predictor --train data/ml-100k/u1.base --test data/ml-100k/u1.test --json similarity.json"

Compute k-NN predictions

> sbt "runMain knn.Predictor --train data/ml-100k/u1.base --test data/ml-100k/u1.test --json knn.json"

Compute recommendations

> sbt "runMain recommend.Recommender --data data/ml-100k/u.data --personal data/personal.csv --json recommendations.json"
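
After running the three commands above, it can be useful to confirm that each output file exists and parses as JSON. The following is a minimal sketch, assuming the file names passed to --json above and a run from the project root. It does not inspect the contents, whose structure is defined by the template and the Milestone description. The function name check_json_outputs is hypothetical.

```python
import json
import os

# Output files named by the --json arguments in the commands above.
OUTPUTS = ["similarity.json", "knn.json", "recommendations.json"]


def check_json_outputs(paths):
    """Map each path to 'ok', 'missing', or 'invalid JSON'."""
    status = {}
    for path in paths:
        if not os.path.isfile(path):
            status[path] = "missing"
            continue
        try:
            with open(path, encoding="utf-8") as f:
                json.load(f)
            status[path] = "ok"
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError both derive from ValueError.
            status[path] = "invalid JSON"
    return status


if __name__ == "__main__":
    for path, state in check_json_outputs(OUTPUTS).items():
        print(f"{path}: {state}")
```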

Package for submission

Steps:

1. Ensure you used only the dependencies listed in `build.sbt` in this template and did not add any others.
2. Remove `project/project`, `project/target`, and `target/`.
3. Test that all previous commands for generating statistics, predictions, and recommendations correctly produce a JSON file (after downloading/reinstalling dependencies).
4. Remove the ml-100k dataset (`data/ml-100k.zip` and `data/ml-100k`), as well as `project/project`, `project/target`, and `target/`, which the previous step regenerates.
5. Add your report and any other necessary files listed in the Milestone description (see `Deliverables`).
6. Zip the archive.
7. Submit to the TA for grading.
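
The removals requested in steps 2 and 4 above are easy to forget, so a quick check before zipping may help. The sketch below only reports which of the listed paths still exist and deletes nothing. The function name leftovers and the root parameter are illustrative, not part of the template.

```python
import os

# Paths that steps 2 and 4 above ask to remove before zipping.
MUST_BE_ABSENT = [
    "data/ml-100k.zip",
    "data/ml-100k",
    "project/project",
    "project/target",
    "target",
]


def leftovers(root="."):
    """Return the listed paths that still exist under `root`."""
    return [p for p in MUST_BE_ABSENT if os.path.exists(os.path.join(root, p))]


if __name__ == "__main__":
    remaining = leftovers()
    if remaining:
        print("Remove before zipping:", ", ".join(remaining))
    else:
        print("Nothing left to remove.")
```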

References

Essential sbt: https://www.scalawilliam.com/essential-sbt/

Explore Spark Interactively (supports autocompletion with tabs!): https://spark.apache.org/docs/latest/quick-start.html

Scallop Argument Parsing: https://github.com/scallop/scallop/wiki

Spark Resilient Distributed Dataset (RDD): https://spark.apache.org/docs/3.0.1/api/scala/org/apache/spark/rdd/RDD.html

JSON Serialization: https://github.com/json4s/json4s#serialization

Credits

Erick Lavoie (Design, Implementation, Tests)

Athanasios Xygkis (Requirements, Tests)
