New tokenizer and maybe evaluator #226

hgrecco · 2015-01-24T20:29:55Z

This is an issue to discuss the new tokenizer and evaluator. The main goal is to make Unicode identifiers accepted in Python 2. In 8756ed7 I copied the Python 3.4 tokenize module into pint and tweaked it to work in Python 2.7. In b32c2ec I changed the isidentifier function to allow some Unicode characters in Python 2.

Pending:
1.- Accepting identifiers is more restrictive than accepting probable unit names. For example, ", ' and % are not valid identifiers but they are valid unit names. We might need to tweak the tokenizer even more. What should we do?
2.- Removing some redundancy. Because of the way we copied from Python 3.4, we now change the string to bytes, detect the encoding, and then change it back to a unicode string. This needs to be discussed.
3.- Do we need a custom evaluator?
4.- Performance
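
To make pending items 1 and 2 concrete, here is a minimal sketch that assumes Python 3 and uses only the standard library. It is not the module copied into pint. It shows that the built-in `str.isidentifier()` rejects `"`, `'` and `%`, and that `tokenize.generate_tokens` can read a text string directly, which avoids the bytes round trip and the encoding detection. The helper name `token_strings` and the sample names `meter` and `Å` are chosen only for illustration.

```python
import io
import tokenize

# Candidate unit names from item 1; "meter" and "Å" are added for contrast.
candidates = ["meter", "Å", '"', "'", "%"]

# str.isidentifier() is the Python 3 built-in check for valid identifiers.
for name in candidates:
    print(repr(name), name.isidentifier())


def token_strings(text):
    """Tokenize a str directly, with no bytes conversion or encoding detection."""
    readline = io.StringIO(text).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        # Keep only names, numbers and operators; drop NEWLINE, ENDMARKER, etc.
        if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    ]


print(token_strings("3 * meter / second ** 2"))
```

The bytes-based entry point `tokenize.tokenize` is the one that performs encoding detection, so a text-only entry point is one possible way to remove the redundancy described in item 2.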


cheezman34 · 2015-03-04T02:17:21Z

Regarding a custom evaluator & security:

As far as I can tell, the current use of eval only happens after the input has been modified during tokenization. This seems like it should make eval safe to use on unverified input, but I'm not entirely convinced a sufficiently clever person couldn't figure out a way around it. A custom eval would, if nothing else, provide some peace of mind for some of us.

I might be willing to take a crack at it if the requirements are concise.

What all would a custom evaluator have to do?

Parse quantities & units
Perform math operations on units
Compare units?
Anything else?
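
One way to get that peace of mind without a full evaluator is to reject anything that is not plain arithmetic before it reaches eval. The following is a rough sketch using the standard `ast` module on Python 3.8 or later. The function name `is_plain_arithmetic` and the set of allowed node types are assumptions made for illustration and are not taken from pint.

```python
import ast

# Node types assumed sufficient for unit arithmetic; this list is an
# illustrative choice, not something taken from pint.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant, ast.Name, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub, ast.UAdd,
)


def is_plain_arithmetic(source):
    """Return True if source is arithmetic on numbers and bare names only."""
    try:
        tree = ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        # Calls, attribute access, subscripts, lambdas, etc. are all rejected.
        if not isinstance(node, ALLOWED_NODES):
            return False
        # Only int and float literals are accepted (no strings, bytes, bools).
        if isinstance(node, ast.Constant) and type(node.value) not in (int, float):
            return False
    return True
```

With this check, `"3 * meter / second ** 2"` passes, while `"__import__('os').system('ls')"` and `"meter.__class__"` are rejected because they contain call and attribute nodes.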

hgrecco · 2015-03-04T02:46:03Z

Evaluating an expression can be divided into two different aspects:
1.- Evaluating a multiplicative expression: This just needs to understand product, division, power, numbers and units as strings, returning a ParserHelper-like instance. Note that this is registry independent.
2.- Evaluating an additive expression: This is more complex as it needs to know how to convert one unit to another. This requires the registry.

For most internal operations only 1 is required. To allow a generic unit calculator we also need the second.
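
As a rough illustration of aspect 1, the sketch below evaluates a multiplicative expression without any registry and returns a scale factor plus a mapping from unit name to exponent. It assumes Python 3.8 or later and uses the standard `ast` module rather than pint's tokenizer. The names `parse_multiplicative`, `_visit` and `_number` are hypothetical, and the plain tuple it returns only stands in for a ParserHelper-like instance.

```python
import ast


def parse_multiplicative(source):
    """Return (scale, {unit_name: exponent}) for products, quotients and powers."""
    tree = ast.parse(source, mode="eval")
    return _visit(tree.body)


def _number(node):
    """Extract a numeric literal, allowing a leading minus sign."""
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_number(node.operand)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    raise ValueError("expected a number")


def _visit(node):
    if isinstance(node, ast.Name):
        # A bare name is treated as a unit string with exponent 1.
        return 1, {node.id: 1}
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Pow):
        scale, units = _visit(node.left)
        exponent = _number(node.right)
        return scale ** exponent, {u: e * exponent for u, e in units.items()}
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Mult, ast.Div)):
        sign = 1 if isinstance(node.op, ast.Mult) else -1
        left_scale, left_units = _visit(node.left)
        right_scale, right_units = _visit(node.right)
        units = dict(left_units)
        for unit, exponent in right_units.items():
            units[unit] = units.get(unit, 0) + sign * exponent
        # Drop units whose exponents cancelled out.
        units = {u: e for u, e in units.items() if e != 0}
        return left_scale * right_scale ** sign, units
    if isinstance(node, (ast.Constant, ast.UnaryOp)):
        return _number(node), {}
    # Addition and subtraction need unit conversion, hence the registry.
    raise ValueError("not a multiplicative expression")


print(parse_multiplicative("3 * meter / second ** 2"))
```

Additive input such as `"meter + second"` raises `ValueError` here, which mirrors the point that aspect 2 cannot be handled without knowing how to convert one unit to another.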

hgrecco · 2015-06-01T03:11:17Z

We have a new parser!

nfearnley · 2019-07-21T01:31:19Z

I'm still trying to parse something like "5'" or "6"" or "5'6"" as per #192. Pint doesn't seem to parse the 5'2" units properly, and when I attempt to add it as a custom unit, it just throws an error. Any luck handling this?
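
One possible workaround, sketched here under the assumption that the quote characters never reach the parser, is to rewrite feet-and-inches notation into identifier-only text before parsing. The helper `expand_feet_inches` is hypothetical and not part of pint, and `foot` and `inch` are assumed to be unit names known to the registry in use.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"

# Optional <number>' followed by optional <number>", and nothing else.
_FEET_INCHES = re.compile(
    r"^\s*(?:(?P<feet>" + _NUMBER + r")\s*')?"
    + r'\s*(?:(?P<inches>' + _NUMBER + r')\s*")?\s*$'
)


def expand_feet_inches(text):
    """Rewrite 5'6" style input using unit names; return other input unchanged."""
    match = _FEET_INCHES.match(text)
    if match is None:
        return text
    feet, inches = match.group("feet"), match.group("inches")
    parts = []
    if feet is not None:
        parts.append(feet + " * foot")
    if inches is not None:
        parts.append(inches + " * inch")
    # If neither part matched (e.g. an empty string), leave the input alone.
    return " + ".join(parts) if parts else text


print(expand_feet_inches("5'6\""))
```

The rewritten string is an additive expression when both parts are present, so it still has to go through a parser that can add quantities.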

This was referenced Jan 24, 2015

Angstrom abbreviated pretty formats to Å #223

Closed

Unicode support #25

Closed

SyntaxError when using " and ' for inches and feet #192

Closed

hgrecco mentioned this issue Feb 6, 2015

Introduce 'u' to format only a Quantity's units #231

Closed

hgrecco closed this as completed Jun 1, 2015

zaccrites mentioned this issue Jan 13, 2016

Parser Safety for Untrusted Input #325

Closed

jameshiebert mentioned this issue Sep 21, 2017

How should one parse unit names that are not valid identifiers? #554

Closed

cpascual mentioned this issue Jan 31, 2018

(doc) document how TaurusValueLineEdit parses quantities taurus-org/taurus#679

Closed
