SyntaxError: Python keyword not valid identifier in numexpr query #21

Open
saurabhdaalia opened this issue Feb 11, 2019 · 10 comments

@saurabhdaalia

When I add feature names to the SkopeRules model, I encounter this error.

Some of the feature names are:

data__blocked_bugs_number
data__ever_affected=False
data__ever_affected=True
data__has_crash_signature=False
data__has_crash_signature=True
data__has_github_url=False
data__has_github_url=True
data__has_str=irrelevant
data__has_str=no

Traceback (most recent call last):
  File "run.py", line 55, in <module>
    model.train()
  File "C:\Users\Saurabh Daalia\Desktop\bugbug\bugbug\model.py", line 101, in train
    self.skope_clf.fit(X_train, y_train)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 350, in fit
    for r in set(rules_from_tree)]
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 350, in <listcomp>
    for r in set(rules_from_tree)]
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\skrules\skope_rules.py", line 600, in _eval_rule_perf
    detected_index = list(X.query(rule).index)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3088, in query
    res = self.eval(expr, **kwargs)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3203, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\eval.py", line 294, in eval
    truediv=truediv)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 749, in __init__
    self.terms = self.parse()
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 766, in parse
    return self._visitor.visit(self.expr)
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 327, in visit
    raise e
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\site-packages\pandas\core\computation\expr.py", line 321, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "C:\Users\Saurabh Daalia\Anaconda3\lib\ast.py", line 35, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
SyntaxError: Python keyword not valid identifier in numexpr query

@ngoix
Member

ngoix commented Mar 2, 2019

Is it because you put = in your feature names?

@saurabhdaalia
Author

I see, that might be the issue.
But why does it cause this error? Is there a workaround?

@ngoix
Member

ngoix commented Mar 10, 2019

The feature names are inserted into rule strings, and those strings are then parsed (see the X.query(rule) call in the traceback), which causes your bug.
I don't see an easy workaround. You really shouldn't put = in your feature names...
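
One possible workaround, sketched here under assumptions rather than taken from skope-rules itself, is to rename the features to identifier-safe aliases before training and to keep a mapping back to the original names. The helper names `make_safe_names` and `parses_as_expression` are hypothetical, the rule strings are only illustrative, and `ast.parse(..., mode="eval")` is a rough stand-in for the parse step, not the parser pandas actually uses.

```python
import ast
import keyword
import re


def make_safe_names(feature_names):
    """Map identifier-safe aliases back to the original feature names."""
    safe_to_original = {}
    for name in feature_names:
        # Replace anything that is not an ASCII letter, digit, or underscore.
        safe = re.sub(r"[^0-9A-Za-z_]", "_", name)
        # Identifiers may not be empty, start with a digit, or be a keyword.
        if not safe or safe[0].isdigit() or keyword.iskeyword(safe):
            safe = "_" + safe
        # Add a numeric suffix if two names collapse to the same alias.
        candidate, n = safe, 1
        while candidate in safe_to_original:
            n += 1
            candidate = f"{safe}_{n}"
        safe_to_original[candidate] = name
    return safe_to_original


def parses_as_expression(rule):
    """Rough stand-in for the parse step (assumption: not pandas' own parser)."""
    try:
        ast.parse(rule, mode="eval")
        return True
    except SyntaxError:
        return False


names = ["data__ever_affected=True", "data__has_str=no"]
aliases = make_safe_names(names)
print(aliases)

# Illustrative rule strings: the first uses the raw names, the second the aliases.
print(parses_as_expression("data__ever_affected=True > 0.5 and data__has_str=no <= 0.5"))
print(parses_as_expression("data__ever_affected_True > 0.5 and data__has_str_no <= 0.5"))
```

The aliases (the dictionary keys) would be passed to the model as feature names, and the dictionary can be used afterwards to translate the names in the extracted rules back to the originals.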

@marco-c

marco-c commented Mar 13, 2019

> You really shouldn't put = in your feature names...

Feature names are strings, so restricting what they can contain seems like an unnecessary limitation (nothing else in the scikit-learn world cares about it). Maybe it should be allowed, or at least documented somewhere?

@ngoix
Member

ngoix commented Mar 13, 2019

You are right, this should be documented. Feel free to open a PR for that or for fixing the syntax error :)

@ghost

ghost commented Aug 25, 2019

Guys, I get a similar error too when I run the command below. If I remove the pipe, it works with only one condition.

SyntaxError: Python keyword not valid identifier in numexpr query

The failing line is: train_outliers = train.query('age_z > 3 | age_z < ‐3')

@vedal

vedal commented Mar 30, 2020

> Guys, I get a similar error too when I run the command below. If I remove the pipe, it works with only one condition.
>
> SyntaxError: Python keyword not valid identifier in numexpr query
>
> The failing line is: train_outliers = train.query('age_z > 3 | age_z < ‐3')

This happened to me as well. The problem was that I was still holding down the Alt key when typing the character after the pipe symbol. I run into this frequently, since typing a pipe requires me to hold Alt.
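
A stray modifier key can insert a character that looks normal but is not plain ASCII. Which character it is depends on the keyboard layout; a non-breaking space (U+00A0) is assumed below purely for illustration. A minimal diagnostic sketch that lists every non-ASCII character in a query string, using a hypothetical helper `find_non_ascii`:

```python
import unicodedata


def find_non_ascii(text):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical query with a non-breaking space right after the pipe.
query = "age_z > 3 |\u00a0age_z < -3"
for pos, ch, name in find_non_ascii(query):
    print(pos, repr(ch), name)
```

An empty result means the string contains only ASCII characters, so the cause of the error lies elsewhere.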

@osdiego

osdiego commented Jun 2, 2020

> > Guys, I get a similar error too when I run the command below. If I remove the pipe, it works with only one condition.
> > SyntaxError: Python keyword not valid identifier in numexpr query
> > The failing line is: train_outliers = train.query('age_z > 3 | age_z < ‐3')
>
> This happened to me as well. The problem was that I was still holding down the Alt key when typing the character after the pipe symbol. I run into this frequently, since typing a pipe requires me to hold Alt.

Happened to me too. Does anyone know how to fix it? Thanks xD

@CCNOAI

CCNOAI commented Jun 3, 2020

@osdiego Did you copy and paste from another document? The "-3" is not being read correctly by the query function. Try deleting the minus sign and retyping it. Let me know if this works.
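
If retyping by hand is impractical, lookalike characters can also be replaced programmatically before the string reaches query. The sketch below is one way to do that; the table of lookalikes is an assumption and is not exhaustive, and `to_ascii_operators` is a hypothetical helper name.

```python
# Characters that resemble ASCII ones but are rejected by the Python parser.
LOOKALIKES = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
    "\u00a0": " ",  # NO-BREAK SPACE
}


def to_ascii_operators(query):
    """Replace known lookalike characters with their ASCII equivalents."""
    return query.translate(str.maketrans(LOOKALIKES))


# The query from the comments above, with U+2010 in place of the minus sign.
pasted = "age_z > 3 | age_z < \u20103"
print(to_ascii_operators(pasted))
```

Note that this replaces the characters everywhere in the string, including inside quoted string literals, so it is only safe when the query is not expected to contain such characters as data.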

@osdiego

osdiego commented Jun 3, 2020

@CCNOAI I'm doing something like: (importance >= 0 | importance = -7).
My question is: I need to filter like that. Is there no way to do it?
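
In the expression above, a single = is an assignment rather than a comparison, which is a syntax error inside an expression; an equality test needs ==. The sketch below checks both spellings with the standard-library parser as a stand-in (an assumption: pandas preprocesses the query string before parsing it, so this only approximates what query does). The helper name `is_valid_expression` is hypothetical.

```python
import ast


def is_valid_expression(expr):
    """Return True if expr parses as a single Python expression."""
    try:
        ast.parse(expr, mode="eval")
        return True
    except SyntaxError:
        return False


# Single "=" is an assignment and cannot appear inside an expression.
print(is_valid_expression("importance >= 0 | importance = -7"))

# "==" is the equality comparison.
print(is_valid_expression("importance >= 0 | importance == -7"))
```

With the comparison fixed, the query string would read importance >= 0 | importance == -7. Writing it as (importance >= 0) | (importance == -7) makes the intended grouping explicit.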
