Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strptime(.., '%c') fails to parse output of strftime('%c', ..) in some locales #53203

Open
abalkin opened this issue Jun 9, 2010 · 21 comments
Open
Labels
3.7 (EOL) end of life extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@abalkin
Copy link
Member

abalkin commented Jun 9, 2010

BPO 8957
Nosy @abalkin, @vstinner, @ezio-melotti
Dependencies
  • bpo-8915: Use locale.nl_langinfo in _strptime
  • Files
  • strptime-locale-bug.c: Working C code
  • strptime-locale-bug.py: Failing python code
  • cfmt.py
  • issue8957.py3k.1.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-06-09.20:24:38.889>
    labels = ['extension-modules', 'type-bug', '3.7']
    title = "strptime(.., '%c') fails to parse output of strftime('%c', ..) in some locales"
    updated_at = <Date 2016-09-26.21:29:29.654>
    user = 'https://github.com/abalkin'

    bugs.python.org fields:

    activity = <Date 2016-09-26.21:29:29.654>
    actor = 'belopolsky'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Extension Modules']
    creation = <Date 2010-06-09.20:24:38.889>
    creator = 'belopolsky'
    dependencies = ['8915']
    files = ['17601', '17602', '20358', '20412']
    hgrepos = []
    issue_num = 8957
    keywords = ['patch']
    message_count = 18.0
    messages = ['107413', '125966', '125968', '126043', '126045', '126055', '126057', '126059', '126084', '126235', '126313', '126316', '126339', '126340', '126345', '126347', '126358', '221924']
    nosy_count = 4.0
    nosy_names = ['belopolsky', 'vstinner', 'ezio.melotti', 'rpetrov']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue8957'
    versions = ['Python 3.7']

    Linked PRs

    @abalkin
    Copy link
    Member Author

    abalkin commented Jun 9, 2010

    The following code:

    import locale, time
    locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
    t = time.localtime()
    s = time.strftime('%c', t)
    time.strptime('%c', s)

    Raises

    ValueError: time data '%c' does not match format 'Mer 9 jui 16:14:46 2010'

    in any locale where month follows day in '%c' format. Note that attached C code works as expected on my OSX laptop.

    I wonder it it would make sense to call platform strptime where available? I wonder if platform support for strptime has improved since 2002 when _strptime.py was introduced.

    @abalkin abalkin self-assigned this Jun 9, 2010
    @abalkin abalkin added extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Jun 9, 2010
    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 11, 2011

    Adding bpo-8915 as a dependency because deducing D_T_FMT locale setting from strftime output seems impossible:

    >>> locale.nl_langinfo(locale.D_T_FMT)
    '%a %b %e %H:%M:%S %Y'

    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 11, 2011

    Victor,

    You may be interested because your native language is implicated. :-)

    @rpetrov
    Copy link
    Mannequin

    rpetrov mannequin commented Jan 11, 2011

    time.strptime(s, '%c' ) ?

    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 11, 2011

    time.strptime(s, '%c' ) ?

    Oh my. It certainly took a long time to recognize a silly mistake!

    Thanks.

    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 12, 2011

    My tests were wrong but the problem does exist. I am attaching a script that tests strptime(.., '%c') for all locales installed on my system (an unmodified US Mac OS X 10.6.6).

    The only failing locale that I recognize is Hebrew (he_IL). Eli, what do you think about this?

     
    $ ./python.exe cfmt.py 
    am_ET [ማክሰ ጃንዩ 11 18:56:18 2011] %A %B %d %H:%M:%S %Y != %a %b %e %H:%M:%S %Y
    et_EE [T, 11. jaan  2011. 18:56:18] %a, %d. %B %Y. %H:%M:%S != %a, %d. %b %Y. %T
    he_IL [EST 18:56:18 2011 ינו 11 ג'] %Z %H:%M:%S %Y %B %d %a != %Z %H:%M:%S %Y %b %d %a

    @abalkin abalkin reopened this Jan 12, 2011
    @abalkin abalkin removed the invalid label Jan 12, 2011
    @abalkin abalkin changed the title strptime('%c', ..) fails to parse output of strftime('%c', ..) in non-English locale strptime(.., '%c') fails to parse output of strftime('%c', ..) in some locales Jan 12, 2011
    @rpetrov
    Copy link
    Mannequin

    rpetrov mannequin commented Jan 12, 2011

    • %T is equal for %H:%M:%S
    • locales with %A and %B are broken on this platform as %c is "Appropriate date and time representation (%c) with abbreviations"

    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 12, 2011

    On Tue, Jan 11, 2011 at 7:26 PM, Roumen Petrov <report@bugs.python.org> wrote:
    ..

    • locales with %A and %B are broken on this platform as %c is "Appropriate date and time representation (%c) with abbreviations"

    According to what standard? POSIX defines it as

    %c Replaced by the locale's appropriate date and time representation.

    http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html

    and the manual page on my system agrees:

    %c is replaced by national representation of time and date.

    @vstinner
    Copy link
    Member

    On Linux, cfmt.py fails on fr_FR locale (the only valid locale in the list of tested locales):
    ---
    fr_FR [mer. 12 janv. 2011 11:30:35 CET] %a %d %B %Y %H:%M:%S %Z != %a %d %b %Y %T %Z
    ---

    The problem is the month format: locale.nl_langinfo(locale.D_T_FMT) returns '%a %d %b %Y %T %Z', but _strptime (LocaleTime().LC_date_time) uses '%a %d %B %Y %H:%M:%S %Z' => '%b' vs '%B'.

    _strptime.LocalTime.__calc_date_time() uses strftime('%c') and then parse the output to get the complete format. But it uses strftime('%c') with the march month, and in french, march is formatted 'mars' for both month formats (%b *and* %B).

    _strptime.LocalTime.__calc_date_time() should detect that the month has the same format with %b and %B, and try other timestamps (other months).

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 14, 2011

    Alexander, I get the same error for the he_IL locale. Will look into this

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 15, 2011

    The problem for Hebrew appears to be the same as the one Victor stated for French. March in Hebrew is also a 3-letter word which means it's equal to its abbreviation.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 15, 2011

    I'm attaching a patch for Lib/_strptime.py that handles the month differently in __calc_date_time. It cycles all months, trying to find one where the full and abbrev names are different and matches it against the timestamp created by strftime.

    This solution is a hack, but so is the whole __calc_date_time function :-) [IMHO]

    All tests pass and I also tried it manually with all the problematic locales reported by Alexander - seems to work correctly.

    If this looks OK to you guys I can commit and backport.

    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 15, 2011

    On Sat, Jan 15, 2011 at 2:20 AM, Eli Bendersky <report@bugs.python.org> wrote:
    ..

    This solution is a hack, but so is the whole __calc_date_time function :-) [IMHO]

    I am not sure how to proceed. On one hand, I opened this issue to
    demonstrate that the current implementation is flawed, on the other
    hand, Eli has succeeded in improving the hack so that we can live with
    it a bit longer. Note that I did not have any real life application
    that would misbehave because of this bug and I don't think developers
    expect %c format to be parseable in the first place.

    I made this issue depend on bpo-8915 because I think strptime should
    query the locale for format information directly rather than reverse
    engineer what strftime does.

    I don't think this fix solves all the problems. For example, in most
    locales (including plain C locale), day of the month in %c format uses
    %e format, but current implementation guesses it as %d:

    '%a %b %e %H:%M:%S %Y'
    >>> LocaleTime().LC_date_time
    '%a %b %d %H:%M:%S %Y'

    This does not seem to be an issue because strptime with %d seems to be
    able to parse space-filled as well as zero-filled numbers. However,
    there may be platforms that are less forgiving.

    On the patch itself:

    1. Unit tests are needed.

    2. Please don't use datetime as a local variable.

    3. I am not sure what the purpose of .lower() is. Are a_month and
      f_month lowercased?

    4. Please keep lines under 79 characters long.

    5. "for m in range(1, 13)" loop is better written as "for am, fm in
      zip(self.a_month, self.f_month)"

    Eli, what do you think yourself: should we try to perfect the hack or
    is it better to reimplement strptime using locale? Note that the
    latter may be a stepping stone to implementing strftime as well.

    @abalkin
    Copy link
    Member Author

    abalkin commented Jan 15, 2011

    1. datetime.find(self.f_month[m]) >= 0 -> self.f_month[m] in datetime

    Python is not C!

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 15, 2011

    Alexander,

    1. Patch comments - thanks for those. Will have them fixed.
    2. General strategy for implementing strptime. I must confess I don't fully understand the reason for doing what the _strptime module does. Standard C AFAIK has nothing of the sort - it only has strftime and strptime, both using a given format string. Neither tries to guess it from an actual formatted time! Does it exist just to circumvent platforms where strptime isn't implemented in C or is buggy? Can you please shed some light on this (or point me somewhere)?

    With understanding of (2) I will be able to also logically reason about the next steps :-)

    @AlexanderBelopolsky
    Copy link
    Mannequin

    AlexanderBelopolsky mannequin commented Jan 15, 2011

    You pretty much hit the nail on the head. Some platforms don't have strptime or did not have it at the time this code was written. The locale module is probably more recent than this code as well.

    @elibendersky
    Copy link
    Mannequin

    elibendersky mannequin commented Jan 16, 2011

    Alexander, but still - this isn't just an implementation of strptime. strptime, AFAIU strptime gets the format string as a parameter and uses it to parse a date string into a "tm" struct. So why do we need to parse a date string *without* a format string in Python, resorting to heuristics and pseudo-AI instead?

    @abalkin
    Copy link
    Member Author

    abalkin commented Jun 30, 2014

    Eli,

    Given your last comment, are you still proposing your patch for inclusion or should we take the bpo-8915 approach?

    @abalkin abalkin removed their assignment Jun 30, 2014
    @abalkin abalkin added the 3.7 (EOL) end of life label Sep 26, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 3, 2024
    In some locales (for example French and Hebrew), the default month
    used in __calc_date_time has the same name in full and abbreviated
    form. So the code failed to correctly distinguish formats %b and %B.
    
    Co-authored-by: Eli Bendersky <eliben@gmail.com>
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 3, 2024
    In some locales (for example French and Hebrew), the default month
    used in __calc_date_time has the same name in full and abbreviated
    form. So the code failed to correctly distinguish formats %b and %B.
    
    Co-authored-by: Eli Bendersky <eliben@gmail.com>
    @serhiy-storchaka
    Copy link
    Member

    I agree that using the OS strftime() or at least nl_langinfo() is more preferable, but these issues are already 14 years old. There may be issues:

    • We cannot guarantee that strftime() exists and works on all platforms, so we still need a fallback.
    • Due to a bug in nl_langinfo() (and a flaw of the locale module), it does not work on some locales with non-UTF-8 default encoding. Even if it works, it can returns not currently supported format codes like %OY or %R. So we still need a pseudo-AI as a fallback.

    So I took the Eli's patch and added numerous tests for strftime() on different locales (they were written on another occasion).

    @serhiy-storchaka
    Copy link
    Member

    I tested with many locales and found several cases in which this patch did not work:

    • In Basque, the month name is only used in %x. %c uses month number. So the month format should be searched separately for different formats.
    • In Arabic, the day of the week name and the AM/PM indicator can be found as the part of the month name. This was a regression. Thus, the month name should be replaced before most of other items. But this breaks Japanese, because the full month name "3月" is a substring of "03月" which was created as "%m月" in %c. Thus, we should check the month name is found at the same position for all months. But this breaks Morisyen, because the month name "me" is a substring of the day of the week abbreviated name "mer". Thus, we should check all positions for all months.

    There are similar bugs related to ambiguity of the day of the week name. For example, in Kashubian, the month name matches the day of the week name for the sample datetime. In Yoruba, an abbreviated day names are used with suffix "ọjọ ́", and this matches the full day name, but not all full day name have such suffix.

    Many locales fail due to space-padded numbers or non-ASCII digits.

    I am not sure what of these bugs I'll fix here and what will leave for other issues.

    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the testsas skipped if no
    appropriate locale is available.
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the test as skipped if no
    appropriate locale is available.
    serhiy-storchaka added a commit that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the test as skipped if no
    appropriate locale is available.
    miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the test as skipped if no
    appropriate locale is available.
    (cherry picked from commit 19984fe)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    serhiy-storchaka added a commit that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the test as skipped if no
    appropriate locale is available.
    (cherry picked from commit 19984fe)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the test as skipped if no
    appropriate locale is available.
    (cherry picked from commit 19984fe)
    
    Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
    serhiy-storchaka added a commit that referenced this issue Oct 8, 2024
    Run them with different locales and different date and time.
    
    Add the @run_with_locales() decorator to run the test with multiple
    locales.
    
    Improve the run_with_locale() context manager/decorator -- it now
    catches only expected exceptions and reports the test as skipped if no
    appropriate locale is available.
    (cherry picked from commit 19984fe)
    @serhiy-storchaka
    Copy link
    Member

    For now, this PR fixes strptime() for %c and %x formats in multiple locales for the following languages: Arabic, Bislama, Chuvash, Estonian, French, Irish, Gurajati, Manx Gaelic, Hebrew, Hindi, Chhattisgarhi, Haitian Kreyol, Japanese, Kannada, Korean, Marathi, Malay, Norwegian, Nynorsk, Punjabi, Rajasthani, Tok Pisin, Yue Chinese, Yau/Nungon and Chinese.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    Status: No status
    Development

    No branches or pull requests

    3 participants