
ClayRS 0.5.1 - ClayRS can see!

@Silleellie Silleellie released this 04 Jul 17:27
fa77370

ClayRS can see

This release adds image support to the Content Analyzer and RecSys modules!

  • This release was co-developed with @m-elio

NOTE: The minimum supported Python version has been raised from Python 3.7 to Python 3.8 in order to use the functools.cached_property decorator
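For readers unfamiliar with the decorator mentioned in the note, here is a minimal sketch of what functools.cached_property does. The ImageField class and its attributes are hypothetical and are not part of ClayRS; they only illustrate that the decorated method runs once per instance.

```python
from functools import cached_property  # available from Python 3.8


class ImageField:
    """Hypothetical class used only for illustration, not a ClayRS class."""

    def __init__(self, pixels):
        self.pixels = pixels
        self.calls = 0  # counts how many times the expensive part runs

    @cached_property
    def mean_intensity(self):
        # Runs on first access only; the result is then stored on the instance.
        self.calls += 1
        return sum(self.pixels) / len(self.pixels)


field = ImageField([0, 128, 255])
first = field.mean_intensity   # computed here
second = field.mean_intensity  # served from the cache
```

After both accesses, field.calls is still 1, which is the behaviour that makes the decorator useful for expensive derived attributes.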


Added

Content Analyzer

  • Implemented visual preprocessors using the torchvision library
    • Torch augmenters were also implemented
    • All of them are listed in the docs
  • Implemented postprocessor techniques, which also work with textual techniques
    • Visual bag of words (with count and tfidf weighting schema)
    • Scipy vector quantization
    • Dimensionality reduction techniques from sklearn (PCA, Gaussian random projections, Feature agglomeration)
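To make the visual bag of words idea concrete, the sketch below quantizes local descriptors against a codebook and builds count and tf-idf weighted histograms in plain Python. It is not the ClayRS implementation: the function names, the hand-written codebook (normally obtained by clustering descriptors) and the tf * log(N / df) weighting variant are all assumptions made for illustration.

```python
import math
from collections import Counter


def nearest_codeword(descriptor, codebook):
    """Index of the codebook vector closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def bag_of_visual_words(images, codebook):
    """One count histogram per image over the codebook entries."""
    histograms = []
    for descriptors in images:
        counts = Counter(nearest_codeword(d, codebook) for d in descriptors)
        histograms.append([counts.get(k, 0) for k in range(len(codebook))])
    return histograms


def tfidf_weight(histograms):
    """Reweight counts as tf * log(N / df); one of several possible tf-idf variants."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: in how many images each visual word appears.
    df = [sum(1 for h in histograms if h[k] > 0) for k in range(n_words)]
    return [
        [h[k] * math.log(n_images / df[k]) if df[k] else 0.0 for k in range(n_words)]
        for h in histograms
    ]


# Hand-written 2-D codebook with three visual words (assumption for the example).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
images = [
    [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8]],  # image A: three local descriptors
    [[0.0, 0.2], [4.8, 5.1]],              # image B: two local descriptors
]
counts = bag_of_visual_words(images, codebook)
weighted = tfidf_weight(counts)
```

With the count schema each image is described by how often each visual word occurs; the tf-idf schema additionally down-weights words that occur in every image, such as the first codeword here.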
  • Image paths specified in the raw source can be absolute paths, relative paths, or online URLs
  • Implemented several content techniques which extract embedding features from images
    • Pre-trained models from timm
    • Pre-trained caffe models using opencv.dnn
    • HOG descriptor, Canny edge detector, LBP, SIFT from skimage
    • Color histogram
    • Custom filter convolution
  • Implemented FromNPY technique, which imports features from a numpy serialized matrix
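Of the image techniques listed above, the color histogram is the simplest to illustrate. The following is a minimal pure-Python sketch, assuming 8-bit RGB pixels given as tuples; the function name, the bins_per_channel parameter and the per-channel normalization are illustrative choices and do not describe the ClayRS API.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Concatenate one normalized histogram per RGB channel.

    pixels: iterable of (r, g, b) tuples with values in 0..255.
    Returns a flat list of 3 * bins_per_channel floats.
    """
    pixels = list(pixels)
    bin_width = 256 / bins_per_channel
    hist = [[0] * bins_per_channel for _ in range(3)]
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel][int(value // bin_width)] += 1
    n = len(pixels)
    # Each channel histogram sums to 1 after dividing by the pixel count.
    return [count / n for channel_hist in hist for count in channel_hist]


pixels = [(255, 0, 0), (250, 10, 0), (0, 0, 255), (0, 200, 30)]
features = color_histogram(pixels)
```

The resulting fixed-length vector can be used like any other embedding, regardless of the size of the original image.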

RecSys

  • Implemented VBPR technique following the corresponding paper
    • The implementation has been tested thoroughly by experimental comparison with cornac (experiment repository can be found here)
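As a rough illustration of what VBPR adds on top of plain matrix factorization, the sketch below computes the preference score in the form usually given for the model: global and per-user/per-item biases, a latent-factor term, a visual term obtained by projecting the item's image features through an embedding matrix E, and a visual bias. This is a plain-Python sketch with made-up numbers, not the ClayRS or cornac code, and all names are assumptions.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def vbpr_score(alpha, beta_u, beta_i, gamma_u, gamma_i, theta_u, E, beta_prime, f_i):
    """Preference of user u for item i.

    gamma_u, gamma_i: latent (non-visual) factors
    theta_u:          visual factors of the user
    E:                embedding matrix (rows project f_i to the visual space)
    beta_prime:       visual bias vector
    f_i:              pre-extracted image features of the item
    """
    projected = [dot(row, f_i) for row in E]  # E f_i
    return (
        alpha + beta_u + beta_i
        + dot(gamma_u, gamma_i)
        + dot(theta_u, projected)
        + dot(beta_prime, f_i)
    )


# Toy values chosen by hand for the example.
score = vbpr_score(
    alpha=0.0, beta_u=0.0, beta_i=0.5,
    gamma_u=[1.0, 0.0], gamma_i=[0.5, 2.0],
    theta_u=[1.0, 1.0],
    E=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    beta_prime=[0.0, 0.0, 0.25],
    f_i=[2.0, 3.0, 4.0],
)
```

In pairwise ranking only the difference between two item scores for the same user matters, so the global and user bias terms cancel during training.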

Changed

Content Analyzer

  • Changed Ratings class to use numpy arrays and integer mappings instead of relying on python dictionaries and strings
  • Adapted FieldContentProductionTechnique to consider the distinction between textual and visual techniques
  • Added the possibility to serialize produced contents using multithreading

RecSys

  • Vectorized computation of CentroidVector algorithm
  • Adapted content based algorithm abstraction to make room for neural algorithms
  • Added the Bootstrap partitioning technique, which was missing from the online documentation
  • AllItemsMethodology now uses the union of the train and test sets as the items catalog by default
  • HoldOutPartitioningTechnique can now accept an integer value representing the number of instances to hold out, rather than only a percentage
  • Changed the logging of users skipped during partitioning/algorithm fitting: a single message with the total number of skipped users is now printed, instead of one message per skipped user
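The hold-out change above (an integer number of instances as an alternative to a percentage) can be sketched as follows. This is a hypothetical helper, not HoldOutPartitioningTechnique itself: the function name, the choice to hold out the last instances without shuffling, and rounding the percentage up are all assumptions.

```python
import math


def hold_out_split(interactions, test_size):
    """Split one user's interactions into (train, test).

    test_size: int   -> absolute number of instances to hold out
               float -> fraction of instances to hold out (rounded up, by assumption)
    """
    n = len(interactions)
    if isinstance(test_size, int):
        n_test = test_size
    else:
        n_test = math.ceil(n * test_size)
    if not 0 < n_test < n:
        raise ValueError("test_size must leave at least one instance on each side")
    # Assumption: the last n_test instances form the test set.
    return interactions[: n - n_test], interactions[n - n_test:]


ratings = list(range(10))
train_abs, test_abs = hold_out_split(ratings, 3)    # hold out 3 instances
train_pct, test_pct = hold_out_split(ratings, 0.2)  # hold out 20% of instances
```

Accepting both types in one parameter mirrors the convention used by other libraries, where an int is read as a count and a float as a proportion.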

EvalModel

  • Changed NDCG implementation to allow the choice of the gain weights (linear or exponential) and the definition of a discount function
  • Improved visualization of statistical test results
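To show what the NDCG options mean in practice, here is a minimal sketch with a selectable gain (linear or exponential) and a pluggable discount function. It does not reproduce the ClayRS metric or its parameter names; in particular, the ideal ranking is computed from the same relevance list, which is a simplifying assumption.

```python
import math


def log2_discount(rank):
    """Default discount for a 1-based rank: log2(rank + 1)."""
    return math.log2(rank + 1)


def dcg(relevances, gain="linear", discount=log2_discount):
    """Discounted cumulative gain of relevances listed in ranked order."""
    if gain == "linear":
        gain_fn = lambda r: r
    elif gain == "exponential":
        gain_fn = lambda r: 2 ** r - 1
    else:
        raise ValueError("gain must be 'linear' or 'exponential'")
    return sum(gain_fn(r) / discount(rank) for rank, r in enumerate(relevances, start=1))


def ndcg(relevances, gain="linear", discount=log2_discount):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), gain, discount)
    return dcg(relevances, gain, discount) / ideal if ideal > 0 else 0.0


ranking = [1, 2, 3]  # the most relevant item was ranked last
linear_score = ndcg(ranking, gain="linear")
exp_score = ndcg(ranking, gain="exponential")
flat_score = ndcg(ranking, discount=lambda rank: 1.0)  # no positional discount
```

The exponential gain penalizes this ranking more than the linear one because misplacing a highly relevant item costs more, while a constant discount makes the order irrelevant and yields a score of 1.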