This will work as long as you give it an initial empty Counter to start with (otherwise it starts with 0 and complains)

1 Like

Ah thanks. Anyhow, the idea is to minimize the work done in the single-threaded “gather” phase at the end, by having each thread individually count in a lock-free way.

1 Like

I don’t think that is true. If free threading is possible, the cat will be out of the bag: even developers who only care about single-threaded work will still be affected by threading issues. If a library starts a thread in the background for whatever reason, it can cause threading issues in my code even though I never signed up for threading problems.

Many libraries with async-to-sync bridges spawn threads to simulate async tasks. Django, FastAPI, and SQLAlchemy are just a few off the top of my head. And then there are tools like IPython that start a couple of background threads for who knows what reasons.

Multithreading has a reputation for being hard. But really, I think it is considered hard because of the existence of free threading. Languages like Rust that don’t have free threading (or, to be more precise, have almost-free threading with some severe restrictions) have actually fared better at making multithreading much easier to use.

The arena-based threading with subinterpreters I mentioned earlier would be a similar sort of almost-free threading with forced discipline.

One way to think of arena-based threading is that it’s basically a dynamic/runtime borrow checker, enforcing acquisition of the arena locks before working with any objects owned by the arena. I think it could even be flexible enough to allow future experimentation with non-standard arenas that have different borrowing rules.
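To make that concrete, here is a hypothetical sketch of what such an API might look like. The `Arena` class and its `borrow` method are invented here purely for illustration; nothing like this exists in CPython today, and a real implementation would live in the runtime rather than on top of a plain lock:

```python
import threading
from contextlib import contextmanager

class Arena:
    """Hypothetical arena: objects it owns may only be touched while
    the arena lock is held (a runtime analogue of borrow checking)."""

    def __init__(self):
        self._lock = threading.RLock()
        self._objects = {}

    @contextmanager
    def borrow(self):
        # Access is only legal inside this block, while the lock is held.
        with self._lock:
            yield self._objects

arena = Arena()
with arena.borrow() as objs:
    objs["counter"] = 0

def worker():
    for _ in range(1000):
        with arena.borrow() as objs:
            objs["counter"] += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with arena.borrow() as objs:
    print(objs["counter"])  # 4000, regardless of interleaving
```

The point of the discipline is that the unsafe interleaving is simply not expressible: you cannot reach the objects without going through `borrow`.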


There are language-level tools like golang’s race condition detector, thread sanitizer, etc, which take the common mistakes and test for them. It’s also possible someone could implement something like a borrow checker or thread safety heuristics on top of python’s type system, e.g. with passthrough types along the lines of Mutable / Immutable / Shared / Local, and auditing nonlocal variable access or object types passed into threads.
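As a rough illustration of the passthrough-type idea, here is a sketch using `typing.Annotated`. The `Shared` and `Local` aliases are zero-cost at runtime; the hypothetical type-checker plugin that would audit what crosses thread boundaries does not exist, and the names are taken from the suggestion above:

```python
from typing import Annotated, TypeVar

T = TypeVar("T")

# Passthrough markers: no runtime effect, but a hypothetical checker
# plugin could flag a Local[...] value captured by a thread target.
Shared = Annotated[T, "shared"]   # safe to hand to another thread
Local = Annotated[T, "local"]     # must stay on the owning thread

def worker(queue: "Shared[list[int]]") -> None:
    queue.append(1)

q: "Shared[list[int]]" = []
worker(q)
print(q)  # [1]
```

At runtime these annotations are inert, which is what makes them cheap to experiment with on top of the existing type system.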

This wouldn’t be the case with my proposal to make threads take a voluntary lock by default. In a sense, you could leave something like the GIL in place, but make it safe to release for specific threads while accessing python code/objects.

I don’t think anyone has demonstrated how this would happen, and I’d view it as a fundamental flaw in the implementation if it could.

I’m not saying it’s impossible, but I think it would be useful to have specific examples–even if only theoretical–before it’s considered a significant problem.

1 Like

I guess this is a bit too open-ended. I think the thread in question can only interact with your code unintentionally if you happen to share a resource with that thread in an unsafe way; furthermore, to be scary, it would need to be an unsafe way that isn’t possible today. Even with the GIL, another thread can already do a lot of things, like mess with your file descriptors, stdout, or signals. And threads sharing access to any variable is already inconsistent for non-atomic ops at the Python level.


Can you provide a (hypothetical?) example?

Yep, CPython can switch threads in the middle of these two operations. So, there’s a problem.
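The non-atomicity is visible in the bytecode: an augmented assignment on an attribute compiles to separate load and store instructions, and a thread switch can happen between them. A quick way to see this (the function name here is arbitrary):

```python
import dis

def bump(obj):
    # One line of Python, but several bytecode instructions:
    # the attribute is loaded, incremented, then stored back.
    obj.count += 1

dis.dis(bump)
```

The exact opcodes vary between CPython versions, but there is always a load and a store with a window in between.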


Again, of course. But I understood that @pf_moore made the very fine point that due to the specialisations we are discussing here (e.g. BINARY_SUBSCR_DICT), and hence the GIL, things which are nominally not thread-safe are effectively so in current CPython, because they are specialised to a single native instruction. And this, I think, only needs an explicit specification of whether or not such operations are to be considered effectively “atomic”. Otherwise, yes, these are just undiscovered bugs, currently protected by a CPython implementation detail.

… of a library starting a background thread. Not exactly a library, but has

    sockthread = threading.Thread(target=manage_socket,
                                  args=((LOCALHOST, port),))

as a result of which threading.active_count() and threading.enumerate() return greater values when running on IDLE. Someone once asked on Stack Overflow why the difference. (I have no idea whether no-gil will require any change to manage_socket, or the chance of user code having a problem.)

import threading

THREAD_COUNT = 3
BY_HOW_MUCH = 1_000_000

class Incrementor:
    def __init__(self):
        self.c = 0

def incr(incrementor, by_how_much):
    for i in range(by_how_much):
        incrementor.c += 1

incrementor = Incrementor()

threads = [
    threading.Thread(target=incr, args=(incrementor, BY_HOW_MUCH))
    for i in range(THREAD_COUNT)
]

for t in threads:
    t.start()

for t in threads:
    t.join()

print(incrementor.c)

prints 3 million when run on 3.10. Does it mean you can rely on += being atomic when writing Python code? No! If you run it on 3.9 it prints between 1.5 and 2 million. Soon a Faster CPython team member can swoop in and change (not break!) it again.

BTW if Java and .net developers can have free threads, then so can we.


The word “can” here translates to (potentially) decades of work, which was the case for Java:

Yes we “can” (and likely should), but it requires serious commitment, and off-hand “others do it too” is not helpful here.

In the context of Java and threading, it’s worth noting how threads commonly need quite a lot of developer-facing infrastructure (e.g. thread pools) that’s probably very hard to make beginner-friendly / “Pythonic”, and that they’re on a similarly large multi-{year,person} effort to move from free threads to virtual threads under Project “Loom” (where – arguably – the boundaries to async programming start getting blurred), and encapsulating a lot of that in simpler interfaces through “structured concurrency”, which we have already (at least through trio).

All that to say: if we argue “Java can”, then we should also look at where those choices have led them, and what they consider as “moving forward” from there. But realistically, we’re very far from an apples-to-apples comparison in any case, and it’s better to leave that rhetorical tool hanging in the shed.


I meant it from the user perspective, referring to “maybe PEP-684 is better, because it’s safer” part of the discussion. For years Python was “parallel, but…” and now adding subinterpreters will help, but it won’t solve the entire problem.

PEP-703, on the other hand, goes pretty much all the way (though stop-the-world GC may still be a limitation) for those willing to learn how to use it and for those who already have experience with threads from other languages. Python “popularity” will increase as projects choose it for multicore programs once that becomes an option.

Will some users hurt themselves with free threading? I’ve been tracking nogil for a long while now and from what I’ve seen, for someone who writes threadsafe code already (but not native extensions) it will be really hard to run into trouble.

1 Like

There are some multithreading traps inside glibc (relevant for Linux) that are unfortunate and sort of perennial issues (not just in Python!). For example getenv and setenv; glibc maintains that multithreaded programs must not use setenv, it’s not thread safe.

A library could start a thread, and the library wants to and would use C getenv in this thread (getenv is allowed according to glibc in a multithreaded program, following the usual logic).
The user’s program then has a threading issue: they must not use setenv, which could cause segfaults (Python has setenv interfaces through os.environ and os.putenv).

(Does this issue exist in Python already today? Is there some mitigating factor that I don’t know about? How do subinterpreters deal with this? It would be great if C getenv/setenv had a major revision to be somewhat compatible with threading.)
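As a reduced, hedged sketch of the hazard described above (not the actual reproducer from the thread): one thread calls the C getenv through ctypes while the main thread mutates os.environ, which calls setenv under the hood. Under the GIL this rarely misbehaves in practice (and glibc deliberately leaks replaced values, which softens the same-key case); the point is only to show where the C-level getenv/setenv pair sneaks into a Python program. The name DEMO_VAR and the iteration counts are arbitrary:

```python
import ctypes
import os
import threading

# Resolve the C getenv from the process itself (Linux/macOS).
libc = ctypes.CDLL(None)
libc.getenv.restype = ctypes.c_char_p
libc.getenv.argtypes = [ctypes.c_char_p]

stop = threading.Event()
reads = 0

def reader():
    # Plays the role of "a library thread that uses C getenv".
    global reads
    while not stop.is_set():
        libc.getenv(b"DEMO_VAR")
        reads += 1

t = threading.Thread(target=reader)
t.start()
for i in range(1000):
    os.environ["DEMO_VAR"] = str(i)  # calls setenv under the hood
stop.set()
t.join()
print("reader iterations:", reads)
```

The ctypes call releases the GIL while getenv runs, so even today the two C calls can genuinely overlap; free threading just widens the window dramatically.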

1 Like

I just want to throw in a use case which has not yet been discussed here and in the discussions of PEP 703: GUI toolkits.

GUI toolkits naturally use threads, and therefore an approach where free threading is replaced by a different concept like subinterpreters or multiprocessing is problematic for the design & architecture of GUI applications written in Python (because the toolkits being exposed in Python are not aware of such concepts and most likely offer plain threading for offloading computational workload from the GUI). I’m a user of the PySide (Qt for Python) project, and the PySide devs did struggle with the GIL as explained here Qt for Python 5.15.0 is out! (and also the links inside the document).

The problem of when it’s better to release or not to release the GIL in the C extensions of GUI toolkits is not straightforward, sometimes counter-intuitive (at least to me) and often there is a compromise involved depending on most common use cases but with drawbacks for other use cases.

Moreover, when creating GUI applications in python, you start struggling with the GIL when you have huge workloads happening in the background. My use case is a computer vision GUI which acts as a monitor and development environment for remote embedded systems. It is similar to what @lunixbochs reported for the realtime audio use case – keeping latencies and stutters on an acceptable level is unnecessarily difficult when you have to fight against the GIL mechanism.


Wouldn’t this be a potential problem today? “Not thread-safe” doesn’t mean “not thread-safe only when used in a free-threaded environment.”

I’m not trying to be difficult, maybe a bit pedantic. The presence of the GIL can obscure threading bugs or make them rear their ugly heads less often, but it doesn’t make code thread-safe. @colesbury’s work to remove the GIL has done a lot to remove places in the interpreter, stdlib, and some third-party libraries that relied on the GIL (knowingly or not). Most (all? almost all?) of that work will have been in C code, not Python code.


I think it could be a problem today, don’t know, I have the same question. I’m here to be curious.

Again, as an interesting anecdotal data point, here is a port of that troublesome C code to Python:
Here’s how it runs:

  • Using Python 3.11.3 this program loops for a long time without problem
  • I compiled and used nogil-3.12 commit 4526c07caee8f2e (current tip of the repo)
    and it runs 1-2 seconds before it segfaults in getenv just like the C code from 2017.

This is a reduced example. It doesn’t look like a normal Python program; it has a strange shape so that it can reproduce the crash easily. But the fundamental elements can occur in normal Python programs: various C calls that libraries use that call getenv (say mktime, to use the example from that blog post), and for setenv we have a plain interface to it in os (os.getenv is not a plain interface to C getenv).

Yes, it is a problem today, without free threading. getenv + setenv thread safety is a problem for Python applications I run at work. We had to do a bunch of whack-a-mole to work around segfaults resulting from extension libraries using getenv + setenv (for a while we gave up and used a terrible LD_PRELOAD hack).



At the high level this is the kind of thing I’d love someone to try creating for per-subinterpreter-GIL use! This is also quite hard, but I assume there are interested folks out there.

Intuitively I expect this winds up being the same problem that needs to be solved for free threading (which PEP-703 appears to do): our pure reference counting model is the most significant reason we have a GIL – in order to share objects between multiple threads you need to make the reference counts work without that single lock.

Someone really needs to try creating an explicitly-shared-objects implementation for CPython and subinterpreters to prove or disprove its actual utility. In the absence of that, I wouldn’t point to it and suggest it is a better solution. I consider it an open avenue of future work. (Even if we get free threading, performant explicit sharing would be something I expect many would appreciate having.)
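Nothing like that exists for subinterpreters today; the closest stdlib analogue is multiprocessing.shared_memory, which shares a raw buffer rather than Python objects, but it illustrates the “explicitly shared, everything else isolated” shape such a feature might take:

```python
from multiprocessing import shared_memory

# Create a named shared buffer; a second handle attaches to it by name,
# standing in here for "the other interpreter/process".
shm = shared_memory.SharedMemory(create=True, size=16)
try:
    shm.buf[0] = 42                                   # writer side
    view = shared_memory.SharedMemory(name=shm.name)  # reader side attaches
    val = view.buf[0]                                 # reads through the mapping
    view.close()
finally:
    shm.close()
    shm.unlink()

print(val)  # 42
```

An object-level version of this is exactly the hard part: the buffer has no reference counts, while shared Python objects would, which is where the GIL question comes back in.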

1 Like

We’ve had a chance to discuss this internally with the right people. Our team believes in the value that nogil will provide, and we are committed to working collaboratively to improve Python for everyone.

If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.


Read More