optimization and simplification suggestions #31

petri · 2017-02-07T09:53:20Z

switch to function-based API

It makes no sense to instantiate a class for each cleaned name; it's overly complex and does unnecessary extra work, especially now that most of the setup code lives outside the class.
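A minimal sketch of what a function-based API could look like, assuming a tiny illustrative suffix list. LEGAL_SUFFIXES, prepare_terms and clean_name here are hypothetical names, not cleanco's actual API or data. The terms are normalized once at import time and passed in, so cleaning a name is a plain function call with no per-name object setup:

```python
# Hypothetical sketch: names and term data are illustrative, not cleanco's.
LEGAL_SUFFIXES = ["Ltd", "Inc", "Oy", "GmbH", "ltd"]


def prepare_terms(raw_terms):
    """Normalize and deduplicate the terms once, outside any per-name call."""
    return frozenset(term.lower() for term in raw_terms)


# Module-level setup runs once, not once per cleaned name.
TERMS = prepare_terms(LEGAL_SUFFIXES)


def clean_name(name, terms=TERMS):
    """Collapse extra whitespace and remove one trailing single-word legal suffix."""
    normalized = " ".join(name.split())
    lowered = normalized.lower()
    for term in terms:
        # Same single-space-padded suffix check the class does today.
        if lowered.endswith(" " + term):
            return normalized[: -(len(term) + 1)]
    return normalized
```

For example, clean_name("Acme   Widgets Ltd") returns "Acme Widgets". Because all terms here are single words, at most one of them can match the last word, so the unordered frozenset iteration does not affect the result.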

switch to working on whitespace-separated name parts rather than full strings

In effect, for a suffix we would check business_name.split()[-1] == term rather than business_name.endswith(' ' + term). The splitting would, of course, be done just once at the beginning.

At the moment, the class already splits and rejoins the name to get rid of extra whitespace.
At the moment, the code already looks for a prefix/suffix padded by a single space, so in effect the split-based check is equivalent.
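The two checks can be compared side by side. This is a sketch assuming single-part terms that are already lowercased; suffix_by_endswith and suffix_by_split are hypothetical helpers. Because the name is normalized to single spaces, matching ' ' + term at the end is the same as comparing the last whitespace-separated part, and the split version needs one set lookup instead of a loop over all terms:

```python
def suffix_by_endswith(name, terms):
    """Current approach: scan every term against the whitespace-normalized string."""
    normalized = " ".join(name.split()).lower()
    return any(normalized.endswith(" " + term) for term in terms)


def suffix_by_split(name, terms):
    """Proposed approach: split once, then look only at the last part."""
    parts = name.lower().split()
    # Require at least two parts, mirroring the ' ' + term padding above.
    return len(parts) > 1 and parts[-1] in terms


TERMS = {"ltd", "inc", "oy"}

# Both checks agree on names with extra whitespace, bare terms and near-misses.
for sample in ["Acme Ltd", "  Acme   Inc ", "Ltd", "Boy Toys", "Joy Oy"]:
    assert suffix_by_endswith(sample, TERMS) == suffix_by_split(sample, TERMS)
```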

If we can handle the fact that some legal terms are "multi-part" (whitespace-separated), this would simplify the code and make it run faster since, for example, we'd only have to look at the last whitespace-separated name part for a suffix, and just the first for a prefix. There are other cases, too.

We would not have to presort the data, either.
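One way to handle multi-part terms on split names, sketched under the assumption that each term is stored as a tuple of lowercase parts in a set; strip_suffix, strip_prefix and the sample terms are hypothetical. Only the last (or first) few name parts are examined, with one set lookup per possible term length, longest first, so the term list itself never needs presorting:

```python
# Hypothetical sketch: terms stored as tuples of whitespace-separated parts.
RAW_TERMS = ["ltd", "co ltd", "inc", "oy", "pty ltd"]
TERMS = frozenset(tuple(term.split()) for term in RAW_TERMS)
MAX_PARTS = max(len(term) for term in TERMS)


def strip_suffix(parts, terms=TERMS, max_parts=MAX_PARTS):
    """Remove the longest legal term matching the end of the name parts."""
    lowered = [p.lower() for p in parts]
    # Leave at least one part so a bare term is not stripped to nothing.
    for n in range(min(max_parts, len(parts) - 1), 0, -1):
        if tuple(lowered[-n:]) in terms:
            return parts[:-n]
    return parts


def strip_prefix(parts, terms=TERMS, max_parts=MAX_PARTS):
    """Remove the longest legal term matching the start of the name parts."""
    lowered = [p.lower() for p in parts]
    for n in range(min(max_parts, len(parts) - 1), 0, -1):
        if tuple(lowered[:n]) in terms:
            return parts[n:]
    return parts


name_parts = "Acme Trading Co Ltd".split()  # split once, up front
print(" ".join(strip_suffix(name_parts)))  # Acme Trading
```

Trying the longest length first means "Co Ltd" wins over plain "Ltd" without any ordering of the terms themselves.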

don't use both legal and countrywise suffixes in clean_name

There are a lot of duplicates; it should be enough to use just one of them (preferably the countrywise data, since that would make it easy to drop countries).
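A sketch of building the suffix set from countrywise data alone, assuming a hypothetical COUNTRY_TERMS mapping; the names and entries are illustrative, not cleanco's data files. Collecting the terms into a set removes the duplicates shared between countries, and excluding a country is just a filter on the mapping:

```python
# Hypothetical countrywise data; entries are illustrative only.
COUNTRY_TERMS = {
    "Finland": ["oy", "oyj", "ltd"],
    "United Kingdom": ["ltd", "plc", "llp"],
    "Ireland": ["ltd", "plc", "dac"],
}


def suffixes_for(country_terms, exclude=()):
    """Union of the suffixes of all countries not in exclude, without duplicates."""
    return {
        term.lower()
        for country, terms in country_terms.items()
        if country not in exclude
        for term in terms
    }


all_terms = suffixes_for(COUNTRY_TERMS)
no_finland = suffixes_for(COUNTRY_TERMS, exclude={"Finland"})
```

Note that "ltd" survives dropping Finland, since other countries still list it; only terms unique to the excluded country disappear.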


petri · 2020-04-26T14:42:34Z

Since 2.0, the following optimizations are in place:

function-based API added
term search works on split names and terms rather than directly on strings; see the optimization2 branch for code comparing the effect of this (3x speedup)
the term preparation code generates unique terms
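To compare the two term-search strategies on your own data, a timing harness along these lines could be used. It is a sketch, not the code on the optimization2 branch: the sample names, terms and helper functions are hypothetical, and it only checks that both strategies agree and prints timings rather than assuming any particular speedup.

```python
import timeit

# Illustrative data only; replace with real names and prepared terms.
TERMS = ["ltd", "inc", "oy", "oyj", "plc", "gmbh", "co ltd", "pty ltd"]
TERM_TUPLES = frozenset(tuple(t.split()) for t in TERMS)
MAX_PARTS = max(len(t) for t in TERM_TUPLES)
NAMES = ["Acme Trading Co Ltd", "Example Oy", "Widgets Inc", "Plain Name"] * 250


def has_suffix_string(name):
    """String-based search: test every padded term against the whole name."""
    lowered = " ".join(name.split()).lower()
    return any(lowered.endswith(" " + t) for t in TERMS)


def has_suffix_split(name):
    """Split-based search: a few set lookups on the trailing name parts."""
    parts = name.lower().split()
    upper = min(MAX_PARTS, len(parts) - 1)
    return any(tuple(parts[-n:]) in TERM_TUPLES for n in range(1, upper + 1))


# Both strategies must give identical answers before timing means anything.
assert [has_suffix_string(n) for n in NAMES] == [has_suffix_split(n) for n in NAMES]

for fn in (has_suffix_string, has_suffix_split):
    seconds = timeit.timeit(lambda: [fn(n) for n in NAMES], number=20)
    print(f"{fn.__name__}: {seconds:.3f}s")
```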

These are pretty much what this request was asking for, so closing.

petri added enhancement question labels Feb 8, 2017

petri mentioned this issue Feb 7, 2019 in "Could we please have 1.4 or 1.3.1 (or any release) in PyPI With the recent patches? #30" (Closed)

psolin modified the milestone: Version 2.0 Apr 19, 2020

petri closed this as completed Apr 26, 2020
