
Make encoding lists public #47

Closed
wants to merge 1 commit into from

getreu

@getreu commented Jan 13, 2020

I recently migrated [Stringsext, a GNU Strings Alternative with Multi-Byte-Encoding Support](https://github.com/getreu/stringsext) from [rust-encoding](https://github.com/lifthrasiir/rust-encoding) to [encoding_rs](https://github.com/hsivonen/encoding_rs/).

Stringsext prints a list of the supported encoding names. Because the lists in `encoding_rs` are not public, I had to copy them into my own source code, an error-prone duplication I would like to avoid.
@hsivonen
Owner

I don't want to commit to making the internal representation of these tables public. If there are compelling use cases, maybe encoding_rs could provide an iterator over the known labels. However, I haven't provided that kind of API so far, because I haven't been aware of a proper use case.

Dumping the list as a matter of documentation as opposed to something that an application actually operates on is somewhat of a different case. I'm not particularly keen on doing that either, in order not to suggest non-preferred labels to users.

I'll think about this a bit.
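The alternative mentioned above, an iterator over the known labels that keeps the table's layout private, could look roughly like the following. This is a hypothetical sketch, not encoding_rs API: `known_labels`, `canonical_names`, and `LabelEntry` are invented names, and the table holds only a handful of real Encoding Standard labels for illustration. The second function addresses the concern about non-preferred labels by yielding each canonical name only once.

```rust
// Hypothetical sketch, not encoding_rs API: the label table stays private
// and callers only see an iterator over it.

// Private row type: its layout can change without affecting callers.
struct LabelEntry {
    label: &'static str,
    canonical_name: &'static str,
}

// Small illustrative subset of Encoding Standard labels, not the full table.
static LABELS: [LabelEntry; 5] = [
    LabelEntry { label: "unicode-1-1-utf-8", canonical_name: "UTF-8" },
    LabelEntry { label: "utf-8", canonical_name: "UTF-8" },
    LabelEntry { label: "utf8", canonical_name: "UTF-8" },
    LabelEntry { label: "latin1", canonical_name: "windows-1252" },
    LabelEntry { label: "windows-1252", canonical_name: "windows-1252" },
];

/// Yields every (label, canonical name) pair; callers should not rely on order.
pub fn known_labels() -> impl Iterator<Item = (&'static str, &'static str)> {
    LABELS.iter().map(|e| (e.label, e.canonical_name))
}

/// Returns each canonical name once, in first-seen order, so a user-facing
/// list does not offer non-preferred aliases such as "latin1".
pub fn canonical_names() -> Vec<&'static str> {
    let mut names: Vec<&'static str> = Vec::new();
    for (_, name) in known_labels() {
        if !names.contains(&name) {
            names.push(name);
        }
    }
    names
}
```

Exposing only an iterator keeps the option of re-sorting or packing the internal table later, which is the commitment the owner wants to avoid.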

@Mingun

Mingun commented Aug 20, 2022

Maybe just expose a constant listing all implemented encodings?

```rust
// Every encoding defined by the WHATWG Encoding Standard (40 in total).
pub static ALL_ENCODINGS: [&'static Encoding; 40] = [
    BIG5,
    EUC_JP,
    EUC_KR,
    GB18030,
    GBK,
    IBM866,
    ISO_2022_JP,
    ISO_8859_2,
    ISO_8859_3,
    ISO_8859_4,
    ISO_8859_5,
    ISO_8859_6,
    ISO_8859_7,
    ISO_8859_8,
    ISO_8859_8_I,
    ISO_8859_10,
    ISO_8859_13,
    ISO_8859_14,
    ISO_8859_15,
    ISO_8859_16,
    KOI8_R,
    KOI8_U,
    MACINTOSH,
    REPLACEMENT,
    SHIFT_JIS,
    UTF_8,
    UTF_16BE,
    UTF_16LE,
    WINDOWS_874,
    WINDOWS_1250,
    WINDOWS_1251,
    WINDOWS_1252,
    WINDOWS_1253,
    WINDOWS_1254,
    WINDOWS_1255,
    WINDOWS_1256,
    WINDOWS_1257,
    WINDOWS_1258,
    X_MAC_CYRILLIC,
    X_USER_DEFINED,
];
```
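One trade-off in this proposal: a fixed-size array type such as `[&'static Encoding; 40]` makes the count part of the public type, so adding an encoding later would change that type. A slice (`&'static [&'static Encoding]`) avoids this. The sketch below uses a hypothetical stand-in `Encoding` struct (not the real encoding_rs type) with three entries, and shows how a tool like Stringsext might turn such a list into sorted display names.

```rust
// Hypothetical stand-in for encoding_rs::Encoding, for illustration only.
pub struct Encoding {
    name: &'static str,
}

impl Encoding {
    pub fn name(&self) -> &'static str {
        self.name
    }
}

static UTF_8: Encoding = Encoding { name: "UTF-8" };
static WINDOWS_1252: Encoding = Encoding { name: "windows-1252" };
static SHIFT_JIS: Encoding = Encoding { name: "Shift_JIS" };

// The slice type carries no length, so appending an entry later
// leaves the public type unchanged.
pub static ALL_ENCODINGS: &[&Encoding] = &[&UTF_8, &WINDOWS_1252, &SHIFT_JIS];

/// Names of all listed encodings, sorted case-insensitively for display.
pub fn sorted_names() -> Vec<&'static str> {
    let mut names: Vec<&'static str> = ALL_ENCODINGS.iter().map(|e| e.name()).collect();
    names.sort_by_key(|n| n.to_ascii_lowercase());
    names
}
```

With these three entries, `sorted_names()` yields `Shift_JIS`, `UTF-8`, `windows-1252`.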

@ArcticLampyrid

> However, I haven't provided that kind of API so far, because I haven't been aware of a proper use case.

A use case:
Legacy ZIP files store file names in an OEM code page without recording which one. To handle such files, we need to let users choose an encoding, and for that we need all labels to populate a pop-up list.
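Whether a ZIP entry needs such a choice can be read from general-purpose bit 11 (the "language encoding flag"): when it is set, the name is UTF-8; when it is clear, the archive does not record the charset. A minimal sketch, assuming one complete local file header in memory, laid out as in PKWARE's APPNOTE (signature at offset 0, flags at 6, name length at 26, name at 30). `classify_name` and `NameEncoding` are hypothetical names, and decoding the bytes with the user's chosen encoding is left out.

```rust
/// Local file header signature ("PK\x03\x04") read as little-endian u32.
const LOCAL_FILE_HEADER_SIG: u32 = 0x0403_4b50;
/// General-purpose bit 11: file name and comment are UTF-8.
const FLAG_UTF8: u16 = 1 << 11;

#[derive(Debug, PartialEq)]
pub enum NameEncoding<'a> {
    /// Bit 11 set: the name bytes are UTF-8.
    Utf8(&'a [u8]),
    /// Bit 11 clear: legacy name with no recorded charset; ask the user.
    Unknown(&'a [u8]),
}

fn read_u16(b: &[u8], at: usize) -> Option<u16> {
    let s = b.get(at..at + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn read_u32(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Returns the raw file-name bytes of one local file header, tagged by
/// whether an encoding choice is needed. Returns None if the buffer is
/// too short or does not start with the local file header signature.
pub fn classify_name(header: &[u8]) -> Option<NameEncoding<'_>> {
    if read_u32(header, 0)? != LOCAL_FILE_HEADER_SIG {
        return None;
    }
    let flags = read_u16(header, 6)?;
    let name_len = read_u16(header, 26)? as usize;
    let name = header.get(30..30 + name_len)?;
    Some(if flags & FLAG_UTF8 != 0 {
        NameEncoding::Utf8(name)
    } else {
        NameEncoding::Unknown(name)
    })
}
```

Only entries classified as `Unknown` would need the encoding picker; their raw bytes can then be decoded with whichever encoding the user selects.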

@hsivonen deleted the branch hsivonen:master September 19, 2024 05:14
@hsivonen closed this Sep 19, 2024
4 participants