GCD Internals
The undocumented side of the Grand Central Dispatcher
Jonathan Levin, http://newosxbook.com/ - 02/15/14
About
The book touches very little on Apple's Grand Central Dispatcher, which is becoming the de facto standard for multi-threaded applications in OS X and iOS, as it pushes aside the venerable (and standard) pthread APIs. While I do discuss the kernel support for GCD (Chapter 14, pg. 550, "Work Queues"), the implementation has changed considerably as Apple has added a new SPI in Mountain Lion (XNU 2050)/iOS 6, and has completely externalized pthread functionality to the pthread kernel extension in Mavericks and iOS 7. The pthread support in user mode has also been moved (as of OS X 10.9) out of LibC and into a library of its own.
On (yet another) flight to PVG, where I have to deliver a presentation on (among other things) GCD internals, I figured I might as well make public the information, in my attempt to keep the book as updated as possible for readers such as yourself. This article covers libdispatch versions 339.1.9 (OS X 10.9, for which the source is available), 354.3.1 (iOS 7, no source), and XNU 2050 (OS X 10.8) and 2423 (~ OS X 10.9/iOS 7).
Why should you care? (Target Audience)
Arguably, most developers couldn't care less about the implementation of GCD, as it "magically" provides concurrency and scheduling support. I'm more of the view that all "magic" has a logical explanation, and this is what I aim to provide here. Unlike other articles I've posted thus far, which can come in quite handy when you develop apps, this article is more of a deep dive into the esoteric. So maybe you do care, or maybe you don't. That's for you to decide. Me, I still have 11 hours to kill.
I: User Mode (libdispatch)
The Grand Central Dispatcher is implemented in libdispatch, and its API is documented in the headers (<dispatch/dispatch.h> and friends) and the man pages (q.v. dispatch(3)). The rest of this article builds on those references as a foundation, though in a nutshell, the process for using GCD can be summarized as follows:
- GCD offers the application several global dispatch queues, of different priorities: DISPATCH_QUEUE_PRIORITY_HIGH (2), _DEFAULT (0), _LOW (-2) and _BACKGROUND (-32768). The queues are scheduled in decreasing priority. The _BACKGROUND queue is also run on a background thread (i.e. priority of about 4), with I/O throttling. These queues are obtained by dispatch_get_global_queue(priority, flags), with the only supported flag being DISPATCH_QUEUE_OVERCOMMIT.
- An application also has a main thread queue, which can be obtained by a call to dispatch_get_main_queue. This is the queue served by the well known CF/NSRunLoop constructs.
- An application can create additional queues using dispatch_queue_create(label, attr). The label is an optional name (which can be obtained by dispatch_queue_get_label and debugging tools), and attr is either DISPATCH_QUEUE_SERIAL (1-by-1, FIFO) or DISPATCH_QUEUE_CONCURRENT (parallelized execution), controlling the execution of blocks. What Apple doesn't mention here is that (as of 10.9/7) there is also a dispatch_queue_create_with_target, which takes a third argument of an already existing queue, to serve as the target queue.
- To schedule work, an application can call one of the following functions:
  - dispatch_async[_f]: Sends a block or function (_f) to the queue specified. Execution is asynchronous, "as soon as possible".
  - dispatch_sync[_f]: Sends a block or function (_f) to the queue specified, and blocks until execution completes. Note that this doesn't necessarily mean the block or function will be executed in the current thread context - only that the current thread will block (that is, hang) so as to synchronize execution with the block or function.
- GCD also supports dispatch sources. These can be created with dispatch_source_create, which takes four arguments: a source type, a (type-dependent) handle, a (type-dependent) mask of events to handle, and a queue on which the handler will run. The handler itself is set with dispatch_source_set_event_handler[_f], after which the source may be started with a call to dispatch_resume.
The root and predefined queues
What the Apple documentation refers to as "global" queues (in the sense of being global to the application, requiring no initialization), libdispatch calls "root" queues. The queues are hard-coded in the _dispatch_root_queues array:
Index | Serial # | Queue name |
---|---|---|
0 | 4 | com.apple.root.low-priority |
1 | 5 | com.apple.root.low-overcommit-priority |
2 | 6 | com.apple.root.default-priority |
3 | 7 | com.apple.root.default-overcommit-priority |
4 | 8 | com.apple.root.high-priority |
5 | 9 | com.apple.root.high-overcommit-priority |
6 | 10 | com.apple.root.background-priority |
7 | 11 | com.apple.root.background-overcommit-priority |
The implementation of dispatch_get_global_queue calls the internal _dispatch_get_root_queue with the same arguments, which returns the appropriate queue from the _dispatch_root_queues array, mapping the priority code to an index of 0 (LOW), 2 (DEFAULT), 4 (HIGH) or 6 (BACKGROUND), or to the odd index immediately following (1, 3, 5 or 7) if OVERCOMMIT was specified. Application-created queues (i.e. dispatch_queue_create) are, by default, targeted at the default priority queue (priority code 0, i.e. index 2), with the serial queues created with overcommit (index 3).
Looking at the above table you might wonder why the queues' serial numbers start at 4. This is because libdispatch also creates a queue for the application's main thread - com.apple.main-thread (Serial #1, from init.c), and uses internal queues for its own management: com.apple.root.libdispatch-manager (Serial #2), and com.apple.libdispatch-manager (Serial #3). Serial #0 is unused.
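The priority-to-index mapping and the serial numbering described above can be sketched in a few lines of C. This is a minimal model built only from the table and the values quoted in this article, not libdispatch's own code: every sketch_ name is hypothetical, and the real _dispatch_get_root_queue returns a queue object rather than an index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Priority codes, with the values quoted earlier in this article. */
enum {
    SKETCH_PRIORITY_HIGH       = 2,
    SKETCH_PRIORITY_DEFAULT    = 0,
    SKETCH_PRIORITY_LOW        = -2,
    SKETCH_PRIORITY_BACKGROUND = -32768
};

#define SKETCH_ROOT_QUEUE_COUNT 8

/* Queue names, in the index order of the table above. */
static const char *const sketch_root_queue_names[SKETCH_ROOT_QUEUE_COUNT] = {
    "com.apple.root.low-priority",
    "com.apple.root.low-overcommit-priority",
    "com.apple.root.default-priority",
    "com.apple.root.default-overcommit-priority",
    "com.apple.root.high-priority",
    "com.apple.root.high-overcommit-priority",
    "com.apple.root.background-priority",
    "com.apple.root.background-overcommit-priority"
};

/* Map (priority, overcommit) to an index in 0..7, or -1 if unknown. */
static int sketch_root_queue_index(long priority, bool overcommit)
{
    int base;

    switch (priority) {
    case SKETCH_PRIORITY_LOW:        base = 0; break;
    case SKETCH_PRIORITY_DEFAULT:    base = 2; break;
    case SKETCH_PRIORITY_HIGH:       base = 4; break;
    case SKETCH_PRIORITY_BACKGROUND: base = 6; break;
    default:                         return -1;
    }
    /* The overcommit variant sits at the odd index right after the base. */
    return base + (overcommit ? 1 : 0);
}

/* Serials 1..3 belong to the main thread queue and the two manager
 * queues, so the root queue at index i carries serial i + 4. */
static int sketch_root_queue_serial(int index)
{
    assert(index >= 0 && index < SKETCH_ROOT_QUEUE_COUNT);
    return index + 4;
}

/* Convenience lookup: name of the matching root queue, or NULL. */
static const char *sketch_root_queue_name(long priority, bool overcommit)
{
    int index = sketch_root_queue_index(priority, overcommit);
    return (index < 0) ? NULL : sketch_root_queue_names[index];
}
```

For instance, sketch_root_queue_index(SKETCH_PRIORITY_HIGH, true) yields index 5, which the table lists as com.apple.root.high-overcommit-priority with serial 9.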
Dispatch Queue implementation
The dispatch queue is defined in the libdispatch sources.
The dispatch queue starts by including the DISPATCH_STRUCT_HEADER - as all dispatch objects do. This common header consists of an OS_OBJECT_HEADER (which provides the object operations table (vtable) and reference count), and several more fields, including the target queue (settable by dispatch_set_target_queue). The target queue is one of the root queues (usually the default one). Custom queues as well as dispatch sources thus eventually get coalesced into the root queues.
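To make the common-header idea concrete, here is a minimal sketch of how a header macro lets different object types share a prefix, and how following the target queue pointer leads to a root queue. All names and fields are hypothetical stand-ins, not libdispatch's actual layout, and the sketch assumes a root queue is one whose target pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_queue;

/* Hypothetical common prefix, standing in for DISPATCH_STRUCT_HEADER:
 * an operations table, a reference count and a target queue. */
#define SKETCH_OBJECT_HEADER             \
    const void *vtable;                  \
    int ref_count;                       \
    struct sketch_queue *target

/* A "queue" is the common header plus queue-specific fields. */
struct sketch_queue {
    SKETCH_OBJECT_HEADER;
    const char *label;
};

/* A "source" shares the same prefix, so it has a target queue as well. */
struct sketch_source {
    SKETCH_OBJECT_HEADER;
    int handle;
};

/* Follow the target chain until a queue with no target is reached.
 * Assumption of this sketch: target == NULL marks a root queue. */
static struct sketch_queue *sketch_resolve_root(struct sketch_queue *q)
{
    assert(q != NULL);
    while (q->target != NULL)
        q = q->target;
    return q;
}
```

A custom queue targeting another custom queue, which in turn targets a root queue, resolves to that root queue; a source resolves the same way, starting from its own target field.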
The dispatch queue then follows with its subclass fields: DISPATCH_QUEUE_HEADER, and the DISPATCH_QUEUE_CACHELINE_PADDING. The latter is used to ensure that the structure can fit optimally within the CPU's cache lines. The former (DISPATCH_QUEUE_HEADER) is used to maintain the queue metadata, including the "width" (# of threads in pool), label (for debugging), serial #, and work item list. The annotated header is shown below:
Note that queues are not threads! A single queue may be served by multiple worker threads, and vice versa. You can easily see the internals of GCD by using lldb on a sample program, say something as crude as:
By placing a breakpoint inside a block, you'll see something similar to:
When I tried this code in my 10.9 VM, the same breakpoint caught the main thread in the act of dispatching - before the dispatch_group_wait:
This isn't due to 10.9's GCD being different - rather, it demonstrates the true asynchronous nature of GCD: The main thread has yet to return from requesting the worker (which it does by pthread_workqueue_addthreads_np, as I'll describe later), and already the worker thread has spawned and is mid execution, possibly on another CPU core. The exact state of the main thread with respect to the worker is largely unpredictable.
Another cool feature of GCD is that the queue name in thread #2 has been set to that of the custom queue. GCD renames the root queues when they are working on behalf of custom queues (like in this example), in a way that is visible to lldb. I'm working on adding this functionality to Process Explorer. In case you're wondering why "dispatch_worker_thread2" is used - that's because libdispatch defines three worker thread functions: the first, for use when compiled with DISPATCH_USE_PTHREAD_POOL; the second (this one), for use with HAVE_PTHREAD_WORKQUEUE_SETDISPATCH_NP; and the third for HAVE_PTHREAD_WORKQUEUES. The second also falls through to the third.
Dispatch Sources
A key function of dispatch queues is connecting them to dispatch sources. These enable an application to multiplex multiple event-listeners, much as would traditionally be provided by select(2), but with far wider support of event sources - from file descriptors, through sockets, Mach ports, signals, process events and timers, to even custom sources.
All of the myriad sources are built on top of the kernel's kqueue mechanism. The type argument to dispatch_source_create is, in fact, a struct dispatch_source_type_s pointer, defined in the libdispatch sources.
A dispatch source can be thought of as a special case of a queue. The two are closely related, and the former is a "subclass" of the latter, as can be seen by the definition:
The operation of dispatch_source_create is straightforward: following validation of the type argument, it allocates and initializes a dispatch_source_s structure, in particular populating its ds_dkev with the kevent() parameters passed to the function.
Internally, most (if not all) sources eventually get triggered by kevent(). I cover this important syscall in both chapter 2 (page 57) and 14 (500 pages later..). This means that most sources use the same kqueue. Most, with the exception of Mach sources, which use Mach's request_notification mechanism.
You can see this for yourself by using lldb on a program or daemon which uses dispatch sources. One example to debug is diskarbitration:
When a source does fire, the libdispatch-manager triggers the callback on another thread (via dispatch_worker_thread2, as usual, though it goes on to call dispatch_source_invoke, resulting in a slightly different stack). This way, the manager thread remains available to process events from other sources.
II: Still in User Mode (pthread)
GCD, contrary to the impression one might get, does not replace threads - it builds on them. The underlying support for libdispatch is still the venerable POSIX threads library (pthread), though most of the support comes from non-POSIX compliant Apple extensions (which are easily identifiable by the _np suffix in function names). Most of those functions were silently introduced in Leopard (10.5), with others added in 10.6, as GCD was formally introduced. The API, however, has undergone significant changes, making it a moving target.
To exacerbate matters, though the Apple pthread implementation was formerly a part of LibC (and thus open source), this has changed as of OS X 10.9 (somewhere between APPLE_PRIVATE APIs) I guess they figure developers were forewarned.
The last open source implementation of pthreads, therefore, is that of 10.8 (_np
calls. 10.9 changes the API further, and it seems like it might take a while before the dust settles. This is also evident in the code of libdispatch, in the sections marked DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK, though as of 10.8 the legacy interface has effectively been removed: Both libdispatch and pthreads check if the kernel supports the new interface (referred to as the "New SPIs"), and return an error if that is not the case.
The non-standard pthread extensions provided by Apple were, surprisingly enough, documented - not by Apple, but by FreeBSD man pages, since GCD has been ported to it. Apple, however, effectively drops almost all of those extensions in favor of new ones, as shown in the following figure:
Since virtually the entire "legacy" API has been eradicated, let's focus on those functions which did make the cut:
Function | Notes |
---|---|
pthread_workqueue_addthreads_np | Add numthreads to workqueue of priority queue_priority, according to options. The only option supported is WORKQ_ADDTHREADS_OPTION_OVERCOMMIT. As you could see in Output 2, this call will asynchronously spawn the worker threads. |
pthread_workqueue_setdispatch_np | Sets the dispatch worker function (always worker_thread2); makes sure the new SPI is supported; calls workq_open() |
pthread_workqueue_setdispatchoffset_np | A new addition to the API (10.9). Used by libdispatch when setting up the root queues, to pass the offset of the dq_serialnum member relative to the dispatch_queue_s struct. |
As you can see, there is no longer a way to manipulate most aspects of work queues via pthreads. Whereas before pthread exported an _additem_np (which would enable scheduling of a work item), this has been removed in favor of _addthreads_np, and the work function itself is set by _setdispatch_np, normally once per process instance, during libdispatch's root_queue_init(). This means that the actual work queue thread pool management is handled by the kernel.
Work queue diagnostics
Apple's fantabulous yet undocumented proc_info syscall (#336), which I laud so much in the book, also has a PROC_PIDWORKQUEUEINFO code (#12). It provides a very high level view of the workqueue, as shown here:
The latest version of my Process Explorer (v0.2.9 and later) automatically displays associated work queue information, if work queues are detected in the process whose information you are querying.
III: Kernel support (workqueues)
System call interface
As stated in the book, work queues rely on two system calls: workq_open (#367) and workq_kernreturn (#368). Though the system calls remain constant, their implementation has changed with 10.8/6 and the introduction of the "new SPI". Beginning with 10.9/7, the implementation of the system calls has moved to the pthread kernel extension, as has that of bsdthread_register. You can find the definitions in the XNU sources. There's a reason why all three have NO_SYSCALL_STUB: Like other (crazy useful) syscalls in XNU, Apple doesn't want you to use them. If XNU weren't open source, nobody but Apple would likely know how to use them, either.
workq_open works in essentially the same way it has before. workq_kernreturn, however, has been completely modified: Rather than offering the WQOPS discussed in the book as options, the new SPI deprecates them all but WQOPS_THREAD_RETURN, and instead offers two new ones:
- WQOPS_QUEUE_NEWSPISUPP (0x10), which is used to check for SPI support - and merely returns 0 if supported.
- WQOPS_QUEUE_REQTHREADS (0x20). This code requests the kernel to run n more (possibly overcommitted) requests of a given priority. The value of "n" is passed in the "affinity" argument, and the item argument (formerly used to pass the user mode address to execute for WQOPS_QUEUE_ADD) is ignored.
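The argument reuse described above is easier to follow in code. What follows is a toy user-mode model of the option handling, not the XNU implementation: the two option values come from this article, while the sketch_ names, the pending_requests counter and the choice of EINVAL for unsupported options are assumptions made for illustration. WQOPS_THREAD_RETURN, which the real call still accepts, is left out because its value is not given here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Option codes, with the values quoted in this article. */
#define SKETCH_WQOPS_QUEUE_NEWSPISUPP 0x10
#define SKETCH_WQOPS_QUEUE_REQTHREADS 0x20

/* Hypothetical stand-in for the per-process work queue state. */
struct sketch_workq {
    int pending_requests;
};

/* Toy model of the new-SPI option handling. Returns 0 on success, or
 * an errno-style code (EINVAL is this sketch's choice, not XNU's). */
static int sketch_workq_kernreturn(struct sketch_workq *wq, int options,
                                   void *item, int affinity, int prio)
{
    assert(wq != NULL);
    (void)item;   /* ignored: only the legacy queue-add op used it     */
    (void)prio;   /* the priority of the request; not modeled further  */

    switch (options) {
    case SKETCH_WQOPS_QUEUE_NEWSPISUPP:
        /* Support probe: simply report success. */
        return 0;
    case SKETCH_WQOPS_QUEUE_REQTHREADS:
        /* "n" travels in the affinity argument. */
        if (affinity <= 0)
            return EINVAL;
        wq->pending_requests += affinity;
        return 0;
    default:
        /* The legacy WQOPS are rejected in this model. */
        return EINVAL;
    }
}
```

A caller in this model would first probe with SKETCH_WQOPS_QUEUE_NEWSPISUPP, then issue SKETCH_WQOPS_QUEUE_REQTHREADS with the thread count in affinity, which mirrors the order in which the pthread wrappers described earlier are used.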
The kernel workqueue implementation
Kernel workqueue support was in the kernel proper before 10.9/7, and has since moved to the pthread kernel extension.
@TODO: detail more about work queue implementation..
sysctl variables
The kernel exports several variables to control work queues. These are basically the same as those of FreeBSD, and are exported by the kernel proper (pre 10.9/7) or by the pthread kernel extension.
sysctl variable | controls |
---|---|
kern.wq_yielded_threshold | Maximum # of threads that may be yielded |
kern.wq_yielded_window_usecs | Yielded window size |
kern.wq_stalled_window_usecs | Maximum # of usecs a thread may fail to respond before it is deemed stalled |
kern.wq_reduce_pool_window_usecs | Maximum # of usecs thread can idle before the thread pool will be reduced |
kern.wq_max_timer_interval_usecs | Maximum # of usecs between thread checks |
kern.wq_max_threads | Maximum # of threads in the work queue |
kdebug codes
As with all kernel operations, the workqueue mechanism is laced with KERNEL_DEBUG macro calls, to mark function calls and arguments. Unlike other calls, however, the macros often define the debug codes as hex constants, rather than meaningful names. Unsurprisingly, the codes aren't listed in CoreProfile, either. I'm working on adding these to my kdebugView tool. I still need to delve into the "how" of kernel mode - so updates will follow. Me, I need to get off this flight already.
- Usage of sysctl vars inside pthread_synch
- Flow of workqueue_run_nextreq, wq_runreq and setup_wqthread
- Kdebug constants..
References
- Concurrency Programming Guide:
- GCD Reference:
- My book