Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyInterpreterState.config.int_max_str_digits Should Not Be Modified #98417

Open
ericsnowcurrently opened this issue Oct 18, 2022 · 11 comments
Open
Assignees
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@ericsnowcurrently
Copy link
Member

PyConfig is for configuring the runtime during initialization, not for storing runtime state. Currently in sys.set_int_max_str_digits() we are modifying the interpreter's PyConfig directly.

That field should never be modified outside runtime init. Instead, the config value should be copied to a field on PyInterpreterState and that is what should be modified by sys (and used in longobject.c).

In fact, this is mostly what we were doing until a couple weeks ago (and what we still are doing in 3.11 and earlier). This was changed on main (3.12) in gh-96944. The fix is relatively trivial.

CC @gpshead

@ericsnowcurrently ericsnowcurrently added type-bug An unexpected behavior, bug, or error 3.12 bugs and security fixes labels Oct 18, 2022
@gpshead
Copy link
Member

gpshead commented Oct 18, 2022

I'm pretty sure I had a version of the code doing exactly this originally and was guided towards making it simpler by not duplicating the field between interp->config and interp itself. The resolved conversation #96944 (comment) alludes to that at least.

Do we document the semantics of PyConfig being read-only anywhere?

@gpshead gpshead self-assigned this Oct 18, 2022
@ericsnowcurrently
Copy link
Member Author

@gpshead

Do we document the semantics of PyConfig being read-only anywhere?

I agree about documenting the read-only-after-init constraint somewhere. I was thinking of a comment right above the PyConfig declaration (in initconfig.h). However, you just ran into the mutability situation. What would have been the best way to alert you to the constraint?

@ericsnowcurrently
Copy link
Member Author

To be clear, I'm operating under the assumption that the cached PyConfig copy should be treated as read-only after init. That is what PEP 432 implies, it's what makes sense to me (config is config and state is state), and @zooba agreed in a private conversation. From #96944 (comment), it sounds like that might not be a universal opinion though.

FWIW, PEP 587 doesn't say anything about read-only-after-init, but PEP 432 (from which 587 was partially derived) says:

These are snapshots of the initial configuration settings. They are not modified by the interpreter during runtime

The C-API docs don't say anything about config mutability-after-init. That may be fine though, since the fact that we cache a copy of the original config on each interpreter state is an internal implementation detail.

@ericsnowcurrently
Copy link
Member Author

The resolved conversation #96944 (comment) alludes to that at least.

That comment is more about preserving the original config value vs. updating during init, isn't it? The question of the source of the information during runtime (and where mutable state lives) seems like a separate one.

@gpshead
Copy link
Member

gpshead commented Oct 19, 2022

Updating the PyConfig struct definition comment to say it should be treated read-only after init makes sense to me.

I'd have seen that when making changes. I'm much less likely to go back and read PEPs as they're more often initial design ideas rather than policies and actual implementation.

@vstinner
Copy link
Member

What are advantages of having a read-only PyConfig? For me, it's fine to modify it. But if its value is copied somewhere (ex: sys module), it's good to attempt to keep both consistent (whenever possible). Currently, _testinternalcapi.set_config() always copies PyConfig.module_search_paths to sys.path, erasing sys.path change done after early Python init, which makes the function annoying in practice and that's why I don't want to expose it in public.

My main motivation to design PyConfig was to put all inputs at the same place. My ideal would be that PyConfig would be the only input to initialize Python.

But once Python is initialized, for me, it's convenient to reuse this structure for a few variables. The only case I'm aware where the "initial" value matters is sys.orig_argv that I added recently (Python 3.10), since Python makes changes to build sys.argv. Duplicating values between PyConfig and PyInterpreterState can lead to inconsistencies and wasting memory, no?

@zooba
Copy link
Member

zooba commented Oct 20, 2022

We should've (and still can) allow PyConfig to be fully provided by the caller, rather than making them go via our memory management functions. This would be very convenient for some situations, but relies on us not modifying the structure at all.

Honestly, we shouldn't even copy it around the way we do. It should exist once, and if we need to refer back to it we can get to it through a pointer (bearing in mind that the host app owns it, not us, and so they might have modified it themselves). But really, it just looks like a set of parameters to Py_Initialize (and friends), rather than a runtime structure. If it doesn't survive beyond the initialize call, we should be able to handle that.

carljm added a commit to carljm/cpython that referenced this issue Oct 20, 2022
* main: (40 commits)
  pythongh-98461: Fix source location in comprehensions bytecode (pythonGH-98464)
  pythongh-98421: Clean Up PyObject_Print (pythonGH-98422)
  pythongh-98360: multiprocessing now spawns children on Windows with correct argv[0] in virtual environments (pythonGH-98462)
  CODEOWNERS: Become a typing code owner (python#98480)
  [doc] Improve logging cookbook example. (pythonGH-98481)
  Add more tkinter.Canvas tests (pythonGH-98475)
  pythongh-95023: Added os.setns and os.unshare functions (python#95046)
  pythonGH-98363: Presize the list for batched() (pythonGH-98419)
  pythongh-98374: Suppress ImportError for invalid query for help() command. (pythongh-98450)
  typing tests: `_overload_dummy` raises `NotImplementedError`, not `RuntimeError` (python#98351)
  pythongh-98354: Add unicode check for 'name' attribute in _imp_create_builtin (pythonGH-98412)
  pythongh-98257: Make _PyEval_SetTrace() reentrant (python#98258)
  pythongh-98414: py.exe launcher does not use defaults for -V:company/ option (pythonGH-98460)
  pythongh-98417: Store int_max_str_digits on the Interpreter State (pythonGH-98418)
  Doc: Remove title text from internal links (python#98409)
  [doc] Refresh the venv introduction documentation, and correct the statement about VIRTUAL_ENV (pythonGH-98350)
  Docs: Bump sphinx-lint and fix unbalanced inline literal markup (python#98441)
  pythongh-92886: Replace assertion statements in `handlers.BaseHandler` to support running with optimizations (`-O`) (pythonGH-93231)
  pythongh-92886: Fix tests that fail when running with optimizations (`-O`) in `_test_multiprocessing.py` (pythonGH-93233)
  pythongh-92886: Fix tests that fail when running with optimizations (`-O`) in `test_py_compile.py` (pythonGH-93235)
  ...
@vstinner
Copy link
Member

vstinner commented Nov 3, 2022

Currently, PyConfig sits between two chairs: it's used to initialized Python (as expected), but it also stays during the whole Python lifecycle.

I hesitated to build a separated structure to store this configuration in PyInterpreterState: it would avoid storing things which become useless once Python is initialized. The problem is that it caused a lot of code to be duplicated, since we would have two similar but different structures. I chose to put PyConfig simply to keep the code as simple as possible. So yeah, we waste a few bytes in memory, but a single structure has to be handled in the C code.

Keeping PyConfig around also makes Py_NewInterpreter() simpler: it just copies the PyConfig from the current interpreter. By the way, _Py_NewInterpreterFromConfig() allows to run an interpreter with a different configuration.

Honestly, we shouldn't even copy it around the way we do.

Py_InitializeFromConfig() is a wild beast: it changes the memory allocator with implicit Python pre-initialization. If we do everything in a single PyConfig, the ownership of memory allocated on the stack is not well defined, and it's also unclear which memory allocator should be used to release the memory.

Moreover, PyConfig design is to collect all data to initialize Python without modifying Python, and then "write" the configuration: _PyConfig_Write(). This design also allows to override/replace PyConfig at runtime, currently implemented as _testinternalcapi.set_config(). This feature is also used by Modules/getpath.py if I followed correctly.

Yeah, PyConfig API is weird and surprising, but they are reasons for its special design :-)

@zooba
Copy link
Member

zooba commented Nov 7, 2022

Keeping it around is great! I love it! Modifying it after we've initialised is the issue. It needs to be treated as read-only, because it's not our memory.

And really, getpath shouldn't modify it either, but should go directly to setting up sys itself. We still keep the config around, but would need to re-run getpath if we're going to base new interpreters off the original config rather than the current setup of the interpreter creating them. However, we're now committed to modifying the original config for users (well, for our own tests), and I didn't want to block the getpath rewrite on a deprecation cycle.

@vstinner
Copy link
Member

vstinner commented Nov 7, 2022

It needs to be treated as read-only, because it's not our memory.

I don't get it: PyInterpreterState.config is a PyConfig structure, not a pointer to somewhere else. What do you call "our memory"?

And really, getpath shouldn't modify it either, but should go directly to setting up sys itself.

Well. Python has a long history and the initialization is complicated. You need many things to compute the path configuration, and the sys module is created "in the middle". There is always room for enhancement :-) On my side, I gave up at some point since I already sent like 1 or 2 years to deeply reorganize the code to make it "better" ;-)


I'm not strongly against duplicating variables to keep some PyConfig members as read-only, or even try to make the whole PyConfig structure read-only. I'm just trying to get the advantages compared to the current state where some members are copied (ex: to the sys module), and some PyConfig members are read and modified in-place.

Anyway, I'm glad that getpath is now implemented in pure Python. In that it was the main motivation for @ncoghlan and @ericsnowcurrently, at least at the beginning, to reorganize the Python initialization. My goal was just to fix my implementation of the -X utf8 command line option which has to parse the command line twice if the encoding change... but then I fall into a giant trap, and I rewrote most of the initialization code to try to make it more consistent :-)

Maybe PyConfig should really and only belong to Python initialization.

@zooba
Copy link
Member

zooba commented Nov 7, 2022

I see, it gets deep copied into every interpreter.

Even so, I don't think we should blur the line between interpreter state and interpreter config. Right now, that's what the issue is, and the fix is to move int_max_str_digits into interpreter state.

Whether we keep a copy of the config around is a separate discussion, probably to be driven by perf or memory needs rather than anything else.

@iritkatriel iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants