Performance optimization #52

iharshulhan · 2021-03-05T13:57:30Z

Pandas querying is very slow and can easily be replaced with traditional indexing.
Here is the code that causes the bottleneck:

def _eval_rule_perf(self, rule, X, y):
    detected_index = list(X.query(rule).index)

Profiling results:

1141.451 _eval_rule_perf  skrules/skope_rules.py:614
         └─ 1140.967 query  pandas/core/frame.py:3316

An example of an improved version:

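tmp = X
# Split on ' and ' (with surrounding spaces) so column names containing "and" are not cut
for part_rule in rule.split(' and '):
    column = part_rule.split()[0]
    # Binary case: '>' keeps rows where the column is 1, anything else keeps rows where it is not 1
    mask = tmp[column] == 1 if '>' in part_rule else tmp[column] != 1
    tmp = tmp[mask]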

Note that this code only handles the binary case; it should be changed to a more generic version.
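One possible shape for a more generic version is sketched below. It is not the library's implementation. It assumes each rule is a chain of "feature operator threshold" conditions joined by " and ", with numeric thresholds and numeric columns. The helper names rule_mask and detected_index_for, and the operator table _OPS, are hypothetical.

```python
import operator

import numpy as np
import pandas as pd

# Comparison operators a rule may contain (an assumed set, not taken from the library).
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def rule_mask(rule, X):
    """Return a boolean array marking the rows of X that satisfy the rule."""
    mask = np.ones(len(X), dtype=bool)
    for condition in rule.split(" and "):
        # Split from the right so feature names containing spaces stay intact.
        feature, op, threshold = condition.strip().rsplit(None, 2)
        # Compare on the underlying NumPy array and fold into the running mask.
        mask &= _OPS[op](X[feature].to_numpy(), float(threshold))
    return mask


def detected_index_for(rule, X):
    """Hypothetical replacement for list(X.query(rule).index)."""
    return list(X.index[rule_mask(rule, X)])


X = pd.DataFrame({"a": [0, 1, 1, 0], "b": [1, 1, 0, 0]}, index=[10, 11, 12, 13])
print(detected_index_for("a > 0.5 and b <= 0.5", X))
```

For rules of this form the result should match list(X.query(rule).index). Rules that use "or", parentheses, or non-numeric values are outside what this sketch handles.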

Profiling results after the change:

8.658 <listcomp>  skrules/skope_rules.py:357
         └─ 8.609 _eval_rule_perf  skrules/skope_rules.py:614
            └─ 6.739 __getitem__  pandas/core/frame.py:2987
