Performance optimization #52

Open
iharshulhan opened this issue Mar 5, 2021 · 0 comments
iharshulhan commented Mar 5, 2021

Pandas querying (DataFrame.query) is very slow here and can easily be replaced with plain boolean indexing.
Here is the code that causes the bottleneck:

def _eval_rule_perf(self, rule, X, y):
    detected_index = list(X.query(rule).index)

Profiling results:

1141.451 _eval_rule_perf  skrules/skope_rules.py:614
         └─ 1140.967 query  pandas/core/frame.py:3316

An example of improved version:

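tmp = X
# Split on ' and ' (with surrounding spaces) so that column names containing "and" stay intact.
for part_rule in rule.split(' and '):
    column = part_rule.split()[0]
    # Binary features only: a '>' condition keeps rows where the column is 1, anything else keeps the rest.
    if '>' in part_rule:
        tmp = tmp[tmp[column] == 1]
    else:
        tmp = tmp[tmp[column] != 1]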

Note that this code only handles the binary case (every feature is 0 or 1, and the thresholds themselves are ignored); it would need to be changed to a more generic version.
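
A more generic version could look roughly like the sketch below. It assumes each rule is a chain of "column operator number" comparisons joined by ' and ', that column names do not themselves contain ' and ' or comparison operators, and that all thresholds are numeric. The helper name eval_rule_mask is hypothetical and not part of skope-rules.

```python
import operator
import re

import pandas as pd

# Supported comparison operators (assumption: rules use only these).
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}
# Two-character operators are listed first so "<=" is not read as "<".
_PART_RE = re.compile(r"^\s*(.+?)\s*(<=|>=|==|!=|<|>)\s*(\S+)\s*$")


def eval_rule_mask(rule, X):
    """Return a boolean Series marking the rows of X that satisfy `rule`.

    Hypothetical helper: builds the mask with plain comparisons
    instead of calling X.query(rule).
    """
    mask = pd.Series(True, index=X.index)
    for part in rule.split(" and "):
        match = _PART_RE.match(part)
        if match is None:
            raise ValueError(f"Cannot parse rule part: {part!r}")
        column, op, threshold = match.groups()
        # Combine this condition with the ones seen so far.
        mask = mask & _OPS[op](X[column], float(threshold))
    return mask
```

With this helper, the line in _eval_rule_perf would become something like `detected_index = list(X.index[eval_rule_mask(rule, X).to_numpy()])`. I have not profiled this variant, so the numbers in this issue apply only to the binary version above.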

Profiling results for the improved version:

 8.658 <listcomp>  skrules/skope_rules.py:357
         └─ 8.609 _eval_rule_perf  skrules/skope_rules.py:614
            └─ 6.739 __getitem__  pandas/core/frame.py:2987