Doc: Update references and examples of old, unsupported OSes and uarches #92791

Merged: 4 commits, merged on Jun 9, 2022.

Changes from 2 commits
5 changes: 2 additions & 3 deletions Doc/c-api/init_config.rst
@@ -735,9 +735,8 @@ PyConfig

* ``"utf-8"`` if :c:member:`PyPreConfig.utf8_mode` is non-zero.
* ``"ascii"`` if Python detects that ``nl_langinfo(CODESET)`` announces
- the ASCII encoding (or Roman8 encoding on HP-UX), whereas the
- ``mbstowcs()`` function decodes from a different encoding (usually
- Latin1).
+ the ASCII encoding, whereas the ``mbstowcs()`` function
+ decodes from a different encoding (usually Latin1).
* ``"utf-8"`` if ``nl_langinfo(CODESET)`` returns an empty string.
* Otherwise, use the :term:`locale encoding`:
``nl_langinfo(CODESET)`` result.
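Assuming this list describes the default filesystem encoding on Unix-like platforms (the value Python code sees as `sys.getfilesystemencoding()`), the sketch below is one way to inspect what the rules produced on a given machine. It relies only on `locale.nl_langinfo()` (Unix only, hence the guard), `sys.getfilesystemencoding()` and `sys.flags.utf8_mode`; the helper name `describe_encodings` is hypothetical.

```python
import locale
import sys
from typing import Optional


def describe_encodings() -> dict:
    """Report the locale codeset and the encoding Python chose at startup."""
    codeset: Optional[str] = None
    # nl_langinfo() and CODESET exist only on Unix-like platforms.
    if hasattr(locale, "nl_langinfo"):
        codeset = locale.nl_langinfo(locale.CODESET)
    return {
        "codeset": codeset,
        "filesystem_encoding": sys.getfilesystemencoding(),
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

Running it under different `LC_ALL`/`LANG` settings, or with `python -X utf8`, shows how the selected encoding follows the locale and UTF-8 Mode.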
3 changes: 1 addition & 2 deletions Doc/faq/library.rst
@@ -483,8 +483,7 @@ including :func:`~shutil.copyfile`, :func:`~shutil.copytree`, and
How do I copy a file?
---------------------

- The :mod:`shutil` module contains a :func:`~shutil.copyfile` function. Note
- that on MacOS 9 it doesn't copy the resource fork and Finder info.
+ The :mod:`shutil` module contains a :func:`~shutil.copyfile` function.
serhiy-storchaka marked this conversation as resolved.


How do I read (or write) binary data?
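A minimal, self-contained sketch of copying a file with `shutil.copyfile()`, using a temporary directory so it leaves nothing behind. `copyfile()` copies only the file's contents; when permission bits and timestamps should be copied too, `shutil.copy2()` is the usual alternative. The file names and the helper name `copy_demo` are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def copy_demo(text: str = "hello\n") -> str:
    """Copy a small file with shutil.copyfile() and return the copy's contents."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source.txt"  # placeholder file names
        dst = Path(tmp) / "copy.txt"
        src.write_text(text)
        # copyfile() copies data only; shutil.copy2() would also copy metadata.
        shutil.copyfile(src, dst)
        return dst.read_text()


if __name__ == "__main__":
    print(copy_demo(), end="")
```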
17 changes: 10 additions & 7 deletions Doc/howto/sockets.rst
@@ -252,20 +252,23 @@ Binary Data
-----------

It is perfectly possible to send binary data over a socket. The major problem is
- that not all machines use the same formats for binary data. For example, a
- Motorola chip will represent a 16 bit integer with the value 1 as the two hex
- bytes 00 01. Intel and DEC, however, are byte-reversed - that same 1 is 01 00.
+ that not all machines use the same formats for binary data. For example,
+ network byte order is big-endian, with the most significant byte first,
serhiy-storchaka marked this conversation as resolved.
+ so a 16 bit integer with the value ``1`` would be the two hex bytes ``00 01``.
+ However, most common processors (x86/AMD64, ARM, RISC-V), are little-endian,
+ with the least significant byte first - that same ``1`` would be ``01 00``.
Socket libraries have calls for converting 16 and 32 bit integers - ``ntohl,
htonl, ntohs, htons`` where "n" means *network* and "h" means *host*, "s" means
*short* and "l" means *long*. Where network order is host order, these do
nothing, but where the machine is byte-reversed, these swap the bytes around
appropriately.

- In these days of 32 bit machines, the ascii representation of binary data is
+ In these days of 64-bit machines, the ASCII representation of binary data is
frequently smaller than the binary representation. That's because a surprising
- amount of the time, all those longs have the value 0, or maybe 1. The string "0"
- would be two bytes, while binary is four. Of course, this doesn't fit well with
- fixed-length messages. Decisions, decisions.
+ amount of the time, all those integers have the value 0, or maybe 1.
+ The string ``"0"`` would be two bytes, while a full 64-bit word would be 8.
Member:

It does not fit with the functions ntohl, htonl, ntohs and htons described in the previous paragraph, which only work with 16- and 32-bit integers.

Member Author:

Yes, but I'm not sure it has to, since those two paragraphs describe separate topics, and each integer width is appropriate to the point being made. The standard library functions are nominally for smaller 16- and 32-bit integers, while modern machines and Python's own native integer type are 64-bit. That is where encoding small integers as ASCII/UTF-8 has the greatest potential size advantage over binary, while also avoiding the need to convert endianness, particularly since the aforementioned functions are not available for 64-bit ints (though perhaps one could be added).

Is there something specific you'd like me to add/clarify here?

Member:

It is obvious to me that this paragraph refers to the previous one. Its mention of "all those longs" is a reference to "long" in ntohl and htonl, which is always 32-bit.

Also, on most modern 64-bit platforms the standard integer type int in C is 32-bit. On Windows even long is 32-bit.

Member Author:

> It is obvious to me that this paragraph refers to the previous one. Its mention of "all those longs" is a reference to "long" in ntohl and htonl, which is always 32-bit.

Okay, thanks for providing something specific. I replaced that wording with "most integers" and clarified the terminology in the following sentence as well. If there's something else specific you would like me to change, let me know.

> Also, on most modern 64-bit platforms the standard integer type int in C is 32-bit. On Windows even long is 32-bit.

Yes, if by "standard" you mean the C type that has the name int. Of course, this is less relevant for Python users, given Python only has one native integer type, nominally 64 bits, and this how-to does not focus on the C API. In any case, the use of a native 64-bit integer is appropriate to making the point of the section, that with wider binary types, representing small numbers as text may actually be more efficient.

Member:

Python 3 does not have a native fixed-width integer type. There was one in Python 2, but it was 32-bit on Windows.

Well, I think it is not important.

Member Author:

> Python 3 does not have a native fixed-width integer type. There was one in Python 2, but it was 32-bit on Windows.

Sorry, I should have used clearer terminology: what I meant to imply was "native" to the language vs. third party (e.g. NumPy dtypes), but in this context "native" more conventionally means platform-native, as you say above. I also somewhat inaccurately simplified Python's int type to be a fixed-width 64-bit integer, when that's not truly the case.

+ Of course, this doesn't fit well with fixed-length messages.
+ Decisions, decisions.


Disconnecting
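Both points in the Binary Data section can be checked with the standard `socket` and `struct` modules. The sketch below packs the 16-bit value 1 in network (big-endian) and little-endian order, shows that `socket.htons()` only changes the value on a little-endian host, and compares a text encoding of integers with a fixed 8-byte binary field. The newline-terminated text format is an assumption made for illustration (one way the string "0" ends up as two bytes), not something the HOWTO prescribes; the helper names are hypothetical.

```python
import socket
import struct
import sys


def byte_order_demo() -> dict:
    """Show how the 16-bit value 1 is laid out in different byte orders."""
    return {
        # "!" selects network byte order, which is big-endian.
        "network": struct.pack("!H", 1),
        # "<" selects little-endian, as used by x86/AMD64 and most ARM systems.
        "little": struct.pack("<H", 1),
        # htons() converts host order to network order; on a big-endian host
        # it returns its argument unchanged.
        "htons": socket.htons(1),
    }


def text_size(n: int) -> int:
    """Size of n as ASCII digits plus a newline terminator (hypothetical format)."""
    return len(f"{n}\n".encode("ascii"))


def binary_size(n: int) -> int:
    """Size of n as a fixed-width, signed 64-bit field in network order."""
    return len(struct.pack("!q", n))


if __name__ == "__main__":
    print(sys.byteorder, byte_order_demo())
    for n in (0, 1, 2**40):
        print(n, text_size(n), binary_size(n))
```

Small values are cheaper as text, but the text size grows with the number of digits, which is why it does not fit fixed-length messages.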
2 changes: 1 addition & 1 deletion Doc/library/mmap.rst
@@ -102,7 +102,7 @@ To map anonymous memory, -1 should be passed as the fileno along with the length

To ensure validity of the created memory mapping the file specified
by the descriptor *fileno* is internally automatically synchronized
- with physical backing store on macOS and OpenVMS.
+ with the physical backing store on macOS.

This example shows a simple way of using :class:`~mmap.mmap`::

2 changes: 1 addition & 1 deletion Doc/library/platform.rst
@@ -53,7 +53,7 @@ Cross Platform

.. function:: machine()

- Returns the machine type, e.g. ``'i386'``. An empty string is returned if the
+ Returns the machine type, e.g. ``'AMD64'``. An empty string is returned if the
Member:

Is that what is returned on AMD processors? On my computer it is 'x86_64'.

Also update the docstring of platform.machine().

Member Author:

The original 'proper' name of what is also called the "x86-64" architecture is AMD64, as it was created by AMD and later adopted by Intel when Intel's IA-64 (Itanium) architecture failed in the market. What is returned depends (AFAIK) on the OS, not the CPU; Windows and many (most?) Linux distros call it AMD64 internally, while Apple and some others call it x86-64. With a Python freshly built from main, as well as the 3.9 and 3.10 release builds, platform.machine() returns AMD64 on my Windows system with a stock Intel i7-3730.

> Also update the docstring of platform.machine().

I can, but as this PR currently only modifies the docs, I'd rather do that separately; there are other places in the codebase that should be updated too, for consistency.

Member:

Docstrings are a part of the docs. If we do not update them together with the rst files, they are left desynchronized.

Member Author (CAM-Gerlach, May 16, 2022):

This only affects the choice of one specific example; both options are equally accurate and valid, and they will be synchronized if and when I do a similar pass through the codebase itself. Several of the other platform functions, e.g. system(), have differing examples in each place. And given the rest of this PR scrupulously avoids touching the code, adding this one trivial change has a non-trivial cost: it triggers and requires a whole suite of extra builds/CI runs, and it increases the risk for backporting.

Given the change was minor and not strictly required in the first place, and I almost didn't make it, if this is going to be a big issue I'll just drop this change instead, since it's not at all worth the cost.

Member:

Either way the docstrings and the documentation should be consistent.

value cannot be determined.


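As the review thread shows, `platform.machine()` can report the same x86-64 architecture under different names depending on the OS (for example `'AMD64'` on Windows, as reported above, and `'x86_64'` on typical Linux systems). Code that needs to recognise the architecture can normalise the value itself. The alias set and the helper name `is_x86_64` below are hypothetical and cover only the two spellings mentioned in this discussion.

```python
import platform
from typing import Optional

# Hypothetical alias set for illustration; not part of the platform module.
X86_64_ALIASES = frozenset({"amd64", "x86_64"})


def is_x86_64(machine: Optional[str] = None) -> bool:
    """Return True if the machine string names the x86-64 (AMD64) architecture."""
    if machine is None:
        # platform.machine() may return '' if the value cannot be determined.
        machine = platform.machine()
    return machine.lower() in X86_64_ALIASES


if __name__ == "__main__":
    print(repr(platform.machine()), is_x86_64())
```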
2 changes: 1 addition & 1 deletion Doc/library/posix.rst
@@ -37,7 +37,7 @@ Large File Support

.. sectionauthor:: Steve Clift <clift@mail.anacapa.net>

- Several operating systems (including AIX, HP-UX and Solaris) provide
+ Several operating systems (including AIX and Solaris) provide
support for files that are larger than 2 GiB from a C programming model where
:c:type:`int` and :c:type:`long` are 32-bit values. This is typically accomplished
by defining the relevant size and offset types as 64-bit values. Such files are
7 changes: 4 additions & 3 deletions Doc/library/struct.rst
@@ -146,9 +146,10 @@ If the first character is not one of these, ``'@'`` is assumed.

Native byte order is big-endian or little-endian, depending on the host
system. For example, Intel x86 and AMD64 (x86-64) are little-endian;
- Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature
- switchable endianness (bi-endian). Use ``sys.byteorder`` to check the
- endianness of your system.
+ IBM z and most legacy architectures are big-endian;
+ and ARM, RISC-V and IBM Power feature switchable endianness
+ (bi-endian, though the former two are nearly always little-endian in practice).
+ Use ``sys.byteorder`` to check the endianness of your system.

Native size and alignment are determined using the C compiler's
``sizeof`` expression. This is always combined with native byte order.
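A short sketch of native versus explicit byte order in `struct`, tied to the `sys.byteorder` check mentioned in this hunk. The `'='` prefix uses native byte order with standard sizes, while `'<'` and `'>'` force little- and big-endian layouts, so only the native result depends on the host. The helper name `pack_both_ways` is hypothetical.

```python
import struct
import sys


def pack_both_ways(value: int) -> dict:
    """Pack an unsigned 32-bit value with native and explicit byte orders."""
    return {
        "native": struct.pack("=I", value),  # native order, standard 4-byte size
        "little": struct.pack("<I", value),  # always little-endian
        "big": struct.pack(">I", value),     # always big-endian
    }


if __name__ == "__main__":
    print(sys.byteorder, pack_both_ways(0x01020304))
```

For data that crosses machine boundaries, an explicit prefix such as `'<'`, `'>'` or `'!'` keeps the layout independent of the host.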