Post Processors

Giacomo Stelluti Scala edited this page Jan 25, 2020 · 20 revisions

Reason

A post processor service is a type that, once configured in a SearchContext, processes a sequence of ResultInfo and produces a new one. It must be a subtype of PostProcessor and must override the following abstract method:

IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)

It's also mandatory to define a constructor that accepts a single object parameter, used for the settings of the specific post processor.
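The two requirements above (overriding Process and exposing a constructor that takes a single object) can be sketched as follows. This is only an illustration: ResultInfo and PostProcessor are declared here as simplified stand-ins so the snippet compiles on its own, and MinimumLength and MinimumLengthSettings are hypothetical names that are not part of PickAll.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for PickAll's types, declared only so this sketch
// compiles on its own; the real definitions live in the library.
public class ResultInfo
{
    public string Url { get; set; } = string.Empty;
    public string Description { get; set; } = string.Empty;
}

public abstract class PostProcessor
{
    protected PostProcessor(object settings) { Settings = settings; }
    protected object Settings { get; }
    public abstract IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results);
}

// Hypothetical settings type for the example post processor.
public class MinimumLengthSettings
{
    public int MinimumLength { get; set; }
}

// Hypothetical post processor: drops results whose description is too short.
public class MinimumLength : PostProcessor
{
    private readonly MinimumLengthSettings _settings;

    // The single object parameter carries the settings for this post processor.
    public MinimumLength(object settings) : base(settings)
    {
        _settings = settings as MinimumLengthSettings
            ?? throw new ArgumentException("Expected MinimumLengthSettings.", nameof(settings));
    }

    // Consumes one sequence of results and produces a new, filtered one.
    public override IEnumerable<ResultInfo> Process(IEnumerable<ResultInfo> results)
    {
        return results.Where(r => (r.Description ?? string.Empty).Length >= _settings.MinimumLength);
    }
}
```

Casting the settings inside the constructor keeps the rest of the class strongly typed and fails early when the wrong settings object is supplied.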

Built-In

PickAll comes with the following built-in post processors:

  • Uniqueness: removes duplicate results by URL.
  • Order: orders results, placing those with the same index number next to each other.
  • FuzzyMatch: compares a string against result descriptions.
  • Improve: improves results by computing word frequency to perform a subsequent search.
  • Textify: extracts all text from the documents at the result URLs.

FuzzyMatch

The FuzzyMatch post processor computes the Levenshtein distance between a given string and each result description. If the distance falls outside the specified range, the result is excluded. It is configured as follows:

var context = SearchContext.Default
                    .With<FuzzyMatch>(new FuzzyMatchSettings {
                        Text = options.FuzzyMatch,
                        MaximumDistance = 10 }); // MinimumDistance default is 0
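For readers unfamiliar with the metric, here is a minimal sketch of a Levenshtein distance computation and of a range check in the spirit of MinimumDistance and MaximumDistance. It is not PickAll's implementation: the Levenshtein class and the IsInRange helper are hypothetical, and treating both bounds as inclusive is an assumption.

```csharp
using System;

public static class Levenshtein
{
    // Classic dynamic-programming edit distance, keeping only two rows in memory.
    public static int Distance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException(nameof(a));
        if (b == null) throw new ArgumentNullException(nameof(b));

        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(current[j - 1] + 1, previous[j] + 1), // insertion, deletion
                    previous[j - 1] + cost);                       // substitution
            }
            // The row just computed becomes the previous row of the next iteration.
            var swap = previous; previous = current; current = swap;
        }
        return previous[b.Length];
    }

    // Hypothetical helper: true when the distance lies within [minimum, maximum].
    public static bool IsInRange(string text, string description, int minimum, int maximum)
    {
        int distance = Distance(text, description);
        return distance >= minimum && distance <= maximum;
    }
}
```

Under these assumptions, a result would be kept when IsInRange returns true for its description and dropped otherwise.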

Improve

The Improve post processor reduces result descriptions to words, then computes the most frequent ones to be used in the query of a subsequent search. It's configured as follows:

var context = SearchContext.Default
                    .With<Improve>(
                        new ImproveSettings {
                            WordCount = 2,
                            NoiseLength = 3});

In this case it considers only the two most frequent words. All words with a length of 3 characters or less are excluded from the computation.
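The selection described above can be illustrated with a short sketch. It is not PickAll's code: the WordFrequency class, the separator list, the lowercasing, and the alphabetical tie-break are all assumptions made for the example; only the roles of WordCount and NoiseLength come from this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequency
{
    // Returns the wordCount most frequent words, ignoring words whose
    // length is noiseLength characters or less.
    public static IReadOnlyList<string> MostFrequent(
        IEnumerable<string> descriptions, int wordCount, int noiseLength)
    {
        var separators = new[] { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' };
        return descriptions
            .SelectMany(d => d.Split(separators, StringSplitOptions.RemoveEmptyEntries))
            .Select(w => w.ToLowerInvariant())
            .Where(w => w.Length > noiseLength)         // drop noise words
            .GroupBy(w => w)
            .OrderByDescending(g => g.Count())          // most frequent first
            .ThenBy(g => g.Key, StringComparer.Ordinal) // deterministic tie-break
            .Take(wordCount)
            .Select(g => g.Key)
            .ToList();
    }
}
```

With WordCount = 2 and NoiseLength = 3, descriptions such as "search engine results", "search engine api" and "the search box" would yield the words search and engine.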

Textify

Textify extracts all text from the document at each result URL. A sample configuration follows:

var context = SearchContext.Default
                    .With<Textify>(
                        new TextifySettings {
                            IncludeTitle = true }); // Textify doesn't support NoiseLength

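The extracted data is exposed through ResultInfo.Data, which must be cast to the type that matches the post processor:

ResultInfo result = results.First();
// Wordify
IEnumerable<string> words = ((WordifyData)result.Data).Words;
// Textify (data type assumed to be TextifyData, by analogy with WordifyData)
string text = ((TextifyData)result.Data).Text;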

By default, Textify doesn't sanitize text. Setting TextifySettings.SanitizeText to true guarantees that the text contains only alphanumeric characters.
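As a rough idea of what such sanitization could look like, the sketch below keeps letters and digits and turns every other run of characters into a single space so that words stay separated. Both the TextSanitizer class and the decision to preserve word boundaries are assumptions; this page does not state how PickAll treats whitespace.

```csharp
using System.Text;

public static class TextSanitizer
{
    // Keeps letters and digits; any run of other characters becomes one space.
    public static string KeepAlphanumeric(string text)
    {
        var builder = new StringBuilder(text.Length);
        bool pendingSpace = false;
        foreach (char c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                // Emit a separator only between words, never at the start.
                if (pendingSpace && builder.Length > 0) builder.Append(' ');
                builder.Append(c);
                pendingSpace = false;
            }
            else
            {
                pendingSpace = true;
            }
        }
        return builder.ToString();
    }
}
```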