New tokenizer and maybe evaluator #226

Closed
hgrecco opened this issue Jan 24, 2015 · 4 comments

Comments

@hgrecco
Owner

hgrecco commented Jan 24, 2015

This issue is for discussing the new tokenizer and evaluator. The main goal is to accept unicode identifiers in Python 2. In 8756ed7 I copied the Python 3.4 tokenize module into pint and tweaked it to work in Python 2.7. In b32c2ec I changed the isidentifier function to allow some unicode characters in Python 2.

Pending:
1.- Accepting identifiers is more restrictive than accepting probable unit names. For example, ", ' and % are not valid identifiers, but they are valid unit names. We might need to tweak the tokenizer even further. What should we do?
2.- Removing some redundancy. Because of the way we copied the code from Python 3.4, we now convert the string to bytes, detect the encoding, and then convert it back to a unicode string. This needs to be discussed.
3.- Do we need a custom evaluator?
4.- Performance.
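
To make points 1 and 2 concrete, here is a minimal sketch using the standard library tokenize module on Python 3 (not pint's copy of it). The helper name `tokenize_text` and the list of candidate unit names are assumptions made for illustration only. It shows the string to bytes to string round trip that the stdlib tokenizer forces, and which candidate names pass the identifier check.

```python
import io
import tokenize


def tokenize_text(text):
    """Tokenize a unicode string with the stdlib tokenizer (Python 3).

    Hypothetical helper for illustration; not part of pint.
    """
    # tokenize.tokenize() wants a readline callable that yields bytes and
    # detects the encoding itself, hence the str -> bytes -> str round trip.
    readline = io.BytesIO(text.encode("utf-8")).readline
    wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.tokenize(readline)
        if tok.type in wanted
    ]


tokens = tokenize_text("3 * µm / s")

# Identifier check for a few candidate unit names: the quote characters and
# the percent sign are rejected even though they are plausible unit names.
candidates = ["meter", "µm", "%", "'", '"']
identifier_flags = {name: name.isidentifier() for name in candidates}
```

Any name that fails the identifier check would need special handling before or during tokenization, which is the open question in point 1.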

@cheezman34

Regarding a custom evaluator & security:

As far as I can tell, the current use of eval only happens after the input has been modified during tokenization. This seems like it should make eval safe to use on unverified input, but I'm not entirely convinced that a sufficiently clever person couldn't figure out a way around it. A custom eval would, if nothing else, provide some peace of mind for some of us.

I might be willing to take a crack at it if the requirements are concise.

What all would a custom evaluator have to do?

  1. Parse quantities & units
  2. Perform math operations on units
  3. Compare units?
  4. Anything else?

@hgrecco
Owner Author

hgrecco commented Mar 4, 2015

Evaluating an expression can be divided into two different aspects:
1.- Evaluating a multiplicative expression: this only needs to understand products, divisions, powers, numbers, and units as strings, and returns a ParserHelper-like instance. Note that this is registry independent.
2.- Evaluating an additive expression: this is more complex, as it needs to know how to convert one unit to another. This requires the registry.

For most internal operations only the first is required. To allow a generic unit calculator we also need the second.
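
As a rough illustration of the first aspect, the sketch below walks a Python AST and accepts only numbers, names, products, divisions, and powers, without calling eval. It is not pint's parser: the function name `eval_multiplicative` is hypothetical, and a plain `(scale, {unit: exponent})` tuple stands in for the ParserHelper-like result. Additive expressions are rejected because handling them would need a registry.

```python
import ast


def eval_multiplicative(expression):
    """Return (scale, {unit_name: exponent}) for a multiplicative expression.

    Hypothetical sketch; registry independent, no eval() involved.
    """

    def visit(node):
        # Plain numbers contribute only to the scale factor.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value, {}
        # Bare names are treated as unit strings with exponent 1.
        if isinstance(node, ast.Name):
            return 1, {node.id: 1}
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            scale, units = visit(node.operand)
            return -scale, units
        if isinstance(node, ast.BinOp):
            if isinstance(node.op, (ast.Mult, ast.Div)):
                left_scale, left_units = visit(node.left)
                right_scale, right_units = visit(node.right)
                is_mult = isinstance(node.op, ast.Mult)
                sign = 1 if is_mult else -1
                units = dict(left_units)
                for name, exponent in right_units.items():
                    units[name] = units.get(name, 0) + sign * exponent
                units = {n: e for n, e in units.items() if e != 0}
                scale = left_scale * right_scale if is_mult else left_scale / right_scale
                return scale, units
            if isinstance(node.op, ast.Pow):
                base_scale, base_units = visit(node.left)
                exp_scale, exp_units = visit(node.right)
                if exp_units:
                    raise ValueError("exponent must be a plain number")
                units = {n: e * exp_scale for n, e in base_units.items() if e * exp_scale != 0}
                return base_scale ** exp_scale, units
        # Anything else (addition, calls, attributes, ...) is refused.
        raise ValueError("unsupported expression element: " + type(node).__name__)

    return visit(ast.parse(expression, mode="eval").body)


result = eval_multiplicative("3 * meter / second ** 2")
```

Because every node type is checked against an explicit allow list, input such as function calls or attribute access raises an error instead of being executed.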

@hgrecco
Owner Author

hgrecco commented Jun 1, 2015

We have a new parser!

@nfearnley

I'm still trying to parse something like "5'" or "6"" or "5'6"" as per #192. Pint doesn't seem to parse the 5'2" units properly, and when I attempt to add it as a custom unit, it just throws an error. Any luck handling this?
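
One possible workaround, sketched here under assumptions and not a pint feature, is to rewrite the feet and inches shorthand into an expression that spells the units out by name before handing it to any parser. The helper name `rewrite_feet_inches` and the unit names `foot` and `inch` are chosen for illustration; whether a given parser accepts the rewritten string is a separate question.

```python
import re

# Optional "<number>'" followed by optional '<number>"', nothing else allowed.
_FEET_INCHES = re.compile(
    r"^\s*(?:(\d+(?:\.\d+)?)\s*')?\s*(?:(\d+(?:\.\d+)?)\s*\")?\s*$"
)


def rewrite_feet_inches(text):
    """Rewrite 5'6" style input as '5 * foot + 6 * inch'.

    Hypothetical pre-processing helper; returns None when the input does
    not look like feet/inches shorthand.
    """
    match = _FEET_INCHES.match(text)
    if match is None:
        return None
    feet, inches = match.groups()
    parts = []
    if feet is not None:
        parts.append(feet + " * foot")
    if inches is not None:
        parts.append(inches + " * inch")
    # An empty or whitespace-only string matches the pattern but has no parts.
    return " + ".join(parts) if parts else None


rewritten = rewrite_feet_inches("5'6\"")
```

Note that the combined form produces an additive expression, so evaluating it needs the registry-aware path rather than the purely multiplicative one.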


3 participants