GCD Internals
The undocumented side of the Grand Central Dispatcher
Jonathan Levin, http://newosxbook.com/ - 02/15/14
About
The book touches very little on Apple's Grand Central Dispatcher, which is becoming the de-facto standard for multi-threaded applications in OS X and iOS, as it pushes aside the venerable (and standard) pthread APIs. While I do discuss the kernel support for GCD (Chapter 14, pg. 550, "Work Queues"), the implementation has changed considerably as Apple has added a new SPI in Mountain Lion (XNU 2050)/iOS 6, and has completely externalized pthread functionality to the pthread kernel extension in Mavericks and iOS 7. The pthread support in user mode has also been moved (as of OS X 10.9) out of LibC, into its own library, libsystem_pthread.dylib.
On (yet another) flight to PVG, where I have to deliver a presentation on (among other things) GCD internals, I figured I might as well make public the information, in my attempt to keep the book as updated as possible for readers such as yourself. This article covers libdispatch versions 339.1.9 (OS X 10.9, for which the source is available) and 354.3.1 (iOS 7, no source), as well as XNU 2050 (OS X 10.8) and 2423 (~ OS X 10.9/iOS 7).
Why should you care? (Target Audience)
Arguably, most developers couldn't care less about the implementation of GCD, as it "magically" provides concurrency and scheduling support. I'm more of the view that all "magic" has a logical explanation, and this is what I aim to provide here. Unlike other articles I've posted thus far, which can come in quite handy when you develop apps, this article is more of a deep dive into the esoteric. So maybe you do care, or maybe you don't. That's for you to decide. Me, I still have 11 hours to kill.
I: User Mode (libdispatch)
The Grand Central Dispatcher is implemented in libdispatch, and its APIs are documented in the headers (<dispatch/dispatch.h> and friends) and the man pages (q.v. dispatch(3)). The rest of this article builds on those references as a foundation, though in a nutshell, the process for using GCD can be summarized as follows:
- GCD offers the application several global dispatch queues, of different priorities: DISPATCH_QUEUE_PRIORITY_HIGH (2), _DEFAULT (0), _LOW (-2) and _BACKGROUND (-32768). The queues are scheduled in decreasing priority. The _BACKGROUND queue is also run on a background thread (i.e. priority of about 4), with I/O throttling. These queues are obtained by dispatch_get_global_queue(priority, flags), with the only supported flag being DISPATCH_QUEUE_OVERCOMMIT.
- An application also has a main thread queue, which can be obtained by a call to dispatch_get_main_queue. This is the queue served by the well known CF/NSRunLoop constructs.
- An application can create additional queues using dispatch_queue_create(label, attr). The label is an optional name (which can be obtained by dispatch_queue_get_label and debugging tools), and attr is either DISPATCH_QUEUE_SERIAL (1-by-1, FIFO) or DISPATCH_QUEUE_CONCURRENT (parallelized execution), controlling the execution of blocks. What Apple doesn't mention here is that (as of 10.9/7) there is also a dispatch_queue_create_with_target, which takes a third argument of an already existing queue, to serve as the target queue.
- To schedule work, an application can call one of the following functions:
  - dispatch_async[_f]: Sends a block or function (_f) to the queue specified. Execution is asynchronous, "as soon as possible".
  - dispatch_sync[_f]: Sends a block or function (_f) to the queue specified, and blocks until execution completes. Note that this doesn't necessarily mean the block or function will be executed in the current thread context - only that the current thread will block (that is, hang) so as to synchronize execution with the block or function.
- GCD also supports dispatch sources. These can be created with dispatch_source_create, which takes four arguments: a source type, a (type-dependent) handle, a (type-dependent) mask of events to handle, and a queue on which the handler will run. The handler itself is set with dispatch_source_set_event_handler[_f], after which the source may be started with a call to dispatch_resume.
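To make the dispatch_sync[_f] semantics concrete, here is a minimal pthread-based sketch of a serial queue in which the submitting thread blocks while a separate worker thread runs the item. This is an illustration only - the miniq_* names are invented, and libdispatch's real implementation is lock-free and far more sophisticated:

```c
#include <pthread.h>
#include <stddef.h>

typedef struct item {
    void (*func)(void *);        /* the work function (cf. the _f variants) */
    void *ctxt;
    int done;                    /* set by the worker when func returns     */
    struct item *next;
    pthread_cond_t done_cv;      /* signaled when done flips to 1           */
} item_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t work_cv;      /* signaled when an item is enqueued       */
    item_t *head, *tail;
    pthread_t worker;
} miniq_t;

static void *miniq_worker(void *arg)
{
    miniq_t *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (!q->head)
            pthread_cond_wait(&q->work_cv, &q->lock);
        item_t *it = q->head;    /* FIFO dequeue, one item at a time        */
        q->head = it->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        it->func(it->ctxt);      /* runs on the WORKER thread               */

        pthread_mutex_lock(&q->lock);
        it->done = 1;            /* wake the (hung) submitter               */
        pthread_cond_signal(&it->done_cv);
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

static void miniq_init(miniq_t *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->work_cv, NULL);
    q->head = q->tail = NULL;
    pthread_create(&q->worker, NULL, miniq_worker, q);
}

/* Analogous to dispatch_sync_f: the caller hangs until the item completes */
static void miniq_sync_f(miniq_t *q, void *ctxt, void (*func)(void *))
{
    item_t it = { func, ctxt, 0, NULL, PTHREAD_COND_INITIALIZER };
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = &it;
    else         q->head = &it;
    q->tail = &it;
    pthread_cond_signal(&q->work_cv);
    while (!it.done)
        pthread_cond_wait(&it.done_cv, &q->lock);
    pthread_mutex_unlock(&q->lock);
}
```

Calling miniq_sync_f hangs the caller exactly as dispatch_sync does, yet the function executes on the worker thread - the point made above about thread context.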
The root and predefined queues
What the Apple documentation refers to as "global" queues (in the sense of being global to the application, requiring no initialization), libdispatch calls "root" queues. The queues are hard-coded in an array, _dispatch_root_queues, as follows:

Index | serial # | queue name |
---|---|---|
0 | 4 | com.apple.root.low-priority |
1 | 5 | com.apple.root.low-overcommit-priority |
2 | 6 | com.apple.root.default-priority |
3 | 7 | com.apple.root.default-overcommit-priority |
4 | 8 | com.apple.root.high-priority |
5 | 9 | com.apple.root.high-overcommit-priority |
6 | 10 | com.apple.root.background-priority |
7 | 11 | com.apple.root.background-overcommit-priority |
The implementation of dispatch_get_global_queue calls the internal _dispatch_get_root_queue with the same arguments, which returns the appropriate queue from the _dispatch_root_queues array, mapping the priority code to an index of 0 (LOW), 2 (DEFAULT), 4 (HIGH) or 6 (BACKGROUND), or their off-by-one odd numbers if OVERCOMMIT was specified. Application-created queues (i.e. dispatch_queue_create) are always mapped to the low-priority queue (index 0), with serial queues created with overcommit (index 1).
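The mapping can be paraphrased in plain C. This is an illustrative reimplementation, not Apple's code; only the priority constants come from the public API:

```c
#include <stdbool.h>

#define DISPATCH_QUEUE_PRIORITY_HIGH        2
#define DISPATCH_QUEUE_PRIORITY_DEFAULT     0
#define DISPATCH_QUEUE_PRIORITY_LOW        (-2)
#define DISPATCH_QUEUE_PRIORITY_BACKGROUND (-32768)

/* Returns an index into the 8-entry root queue array, or -1 on error.
 * Even indices are the plain queues; odd ones the overcommit variants. */
static int root_queue_index(long priority, bool overcommit)
{
    int base;
    switch (priority) {
    case DISPATCH_QUEUE_PRIORITY_LOW:        base = 0; break;
    case DISPATCH_QUEUE_PRIORITY_DEFAULT:    base = 2; break;
    case DISPATCH_QUEUE_PRIORITY_HIGH:       base = 4; break;
    case DISPATCH_QUEUE_PRIORITY_BACKGROUND: base = 6; break;
    default: return -1;
    }
    return base + (overcommit ? 1 : 0);
}
```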
Looking at the above table you might wonder why the queues' serial numbers start at 4. This is because libdispatch also creates a queue for the application's main thread - com.apple.main-thread (Serial #1, from init.c), and uses internal queues for its own management: com.apple.root.libdispatch-manager (Serial #2), and com.apple.libdispatch-manager (Serial #3). Serial #0 is unused.
Dispatch Queue implementation
The dispatch queue is implemented as a structure, dispatch_queue_s, defined in libdispatch's queue_internal.h. The structure starts by including the DISPATCH_STRUCT_HEADER - as all dispatch objects do. This common header consists of an OS_OBJECT_HEADER (which provides the object operations table (vtable) and reference count), and several more fields, including the target queue (settable by dispatch_set_target_queue). The target queue is one of the root queues (usually the default one). Custom queues as well as dispatch sources thus eventually get coalesced into the root queues.
The dispatch queue then follows with its subclass fields: DISPATCH_QUEUE_HEADER, and the DISPATCH_QUEUE_CACHELINE_PADDING. The latter is used to ensure that the structure fits optimally within the CPU's cache lines. The former (DISPATCH_QUEUE_HEADER) is used to maintain the queue metadata, including the "width" (# of threads in pool), label (for debugging), serial #, and work item list. The annotated structure is shown below:
struct dispatch_queue_s {
/* DISPATCH_STRUCT_HEADER(queue) - from queue_internal.h */
/* _OS_OBJECT_HEADER(const struct dispatch_queue_vtable_s *do_vtable, do_ref_cnt, do_xref_cnt); */
/* from os/object_private.h */
const struct dispatch_queue_vtable_s *do_vtable; // object operations table
int volatile do_ref_cnt; // reference count
int volatile do_xref_cnt; // cross reference count
struct dispatch_queue_s *volatile do_next; // pointer to next object (i.e. linked list)
struct dispatch_queue_s *do_targetq; // Actual target of object (one of the root queues)
void *do_ctxt; // context
void *do_finalizer; // Set with dispatch_set_finalizer[_f]
unsigned int do_suspend_cnt; // increment/decrement with dispatch_queue_suspend/resume
/* DISPATCH_QUEUE_HEADER */
uint32_t volatile dq_running; // How many dispatch objects are currently running
struct dispatch_object_s *volatile dq_items_head; // pointer to first item on dispatch queue (for remove)
/* LP64 global queue cacheline boundary */
struct dispatch_object_s *volatile dq_items_tail; // pointer to last item on dispatch queue (for insert)
dispatch_queue_t dq_specific_q; // Used for dispatch_queue_set/get_specific
uint32_t dq_width; // Concurrency "width" (how many objects run in parallel)
unsigned int dq_is_thread_bound:1; // true for main thread
unsigned long dq_serialnum; // Serial # (1-12)
const char *dq_label; // User-defined; obtain with get_label
/* DISPATCH_INTROSPECTION_QUEUE_LIST */
TAILQ_ENTRY(dispatch_queue_s) diq_list; // introspection builds (-DDISPATCH_INTROSPECTION) only
/* DISPATCH_QUEUE_CACHELINE_PADDING */
char _dq_pad[DISPATCH_QUEUE_CACHELINE_PAD]; // pads to 64-byte boundary
};
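The effect of the padding can be demonstrated with a toy struct (a stand-in, not the real dispatch_queue_s): compute the number of pad bytes needed to round the hot fields up to a 64-byte cache line, so that adjacent queues laid out in an array never share a line:

```c
#include <stddef.h>

#define CACHELINE 64
/* Pad needed to round sz up to the next CACHELINE multiple (0 if aligned) */
#define CACHELINE_PAD(sz) ((CACHELINE - ((sz) % CACHELINE)) % CACHELINE)

/* Stand-in for the frequently-accessed queue fields (invented names) */
struct toy_queue_fields {
    void *vtable;
    int   ref_cnt;
    int   xref_cnt;
    void *items_head;
    void *items_tail;
    unsigned int width;
};

/* Hot fields followed by padding out to a 64-byte boundary.
 * (A real implementation must also handle the zero-pad, already-aligned
 * case, since C forbids zero-length arrays.) */
struct toy_queue {
    struct toy_queue_fields f;
    char pad[CACHELINE_PAD(sizeof(struct toy_queue_fields))];
};
```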
Note that queues are not threads! A single queue may be served by multiple worker threads, and vice versa. You can easily see the internals of GCD by using lldb on a sample program, say something as crude as:
#include <stdio.h>
#include <dispatch/dispatch.h>
#include <pthread.h>
int main (int argc, char **argv)
{
// Using pthread_self() inside a block will show you the thread it
// is being run in. The interested reader might want to dispatch
// this block several times, and note that the # of threads can
// change according to GCD's internal decisions..
void (^myblock1) (void) = ^ { printf("%d Blocks are cool - 1 \n",
(int) pthread_self()); };
dispatch_queue_t q =
dispatch_queue_create("com.technologeeks.demoq", // Our name
DISPATCH_QUEUE_CONCURRENT); // DISPATCH_QUEUE_SERIAL or CONCURRENT
dispatch_group_t g = dispatch_group_create();
dispatch_group_async(g, q, myblock1);
int rc= dispatch_group_wait(g, DISPATCH_TIME_FOREVER);
}
By placing a breakpoint inside a block, you'll see something similar to:
morpheus@Zephyr (~)$ cc /tmp/a.c -o /tmp/a
morpheus@Zephyr (~)$ lldb /tmp/a
Current executable set to '/tmp/a' (x86_64).
(lldb) b printf
Breakpoint 1: where = libsystem_c.dylib`printf, address = 0x0000000000080784
(lldb)
Process 9454 launched: '/tmp/a' (x86_64)
Process 9454 stopped
* thread #2: tid = 0xee5c1, 0x00007fff83232784 libsystem_c.dylib`printf,
queue = 'com.technologeeks.demoq, stop reason = breakpoint 1.1
frame #0: 0x00007fff83232784 libsystem_c.dylib`printf
libsystem_c.dylib`printf:
-> 0x7fff83232784: pushq %rbp
0x7fff83232785: movq %rsp, %rbp
0x7fff83232788: pushq %r15
0x7fff8323278a: pushq %r14
(lldb) bt all
#
# Main thread is blocking in dispatch_group_wait, which is basically like pthread_join
#
thread #1: tid = 0xee5b0, 0x00007fff86ff76c2 libsystem_kernel.dylib`semaphore_wait_trap + 10,
queue = 'com.apple.main-thread
frame #0: 0x00007fff86ff76c2 libsystem_kernel.dylib`semaphore_wait_trap + 10
frame #1: 0x00007fff893d983b libdispatch.dylib`_dispatch_group_wait_slow + 154
frame #2: 0x0000000100000e54 a`main + 100
frame #3: 0x00007fff8621e7e1 libdyld.dylib`start + 1
#
# Block is executing asynchronously on a worker thread, handled as a custom queue by libdispatch
# Offsets on Mavericks/iOS7 are (naturally) different, and worker_thread2 calls root_queue_drain
#
* thread #2: tid = 0xee5c1, 0x00007fff83232784 libsystem_c.dylib`printf,
queue = 'com.technologeeks.demoq, stop reason = breakpoint 1.1
frame #0: 0x00007fff83232784 libsystem_c.dylib`printf
frame #1: 0x0000000100000e97 a`__main_block_invoke + 39
frame #2: 0x00007fff893d7f01 libdispatch.dylib`_dispatch_call_block_and_release + 15
frame #3: 0x00007fff893d40b6 libdispatch.dylib`_dispatch_client_callout + 8
frame #4: 0x00007fff893d9317 libdispatch.dylib`_dispatch_async_f_redirect_invoke + 117
frame #5: 0x00007fff893d40b6 libdispatch.dylib`_dispatch_client_callout + 8
frame #6: 0x00007fff893d51fa libdispatch.dylib`_dispatch_worker_thread2 + 304
frame #7: 0x00007fff831c8cdb libsystem_c.dylib`_pthread_wqthread + 404
frame #8: 0x00007fff831b3191 libsystem_c.dylib`start_wqthread + 13
(lldb)
Running the same experiment on OS X 10.9, you may well catch the main thread before it has even blocked in dispatch_group_wait:
thread #1: tid = 0x6231, 0x00007fff86bace6a libsystem_kernel.dylib`__workq_kernreturn + 10,
queue = 'com.apple.main-thread
frame #0: 0x00007fff86bace6a libsystem_kernel.dylib`__workq_kernreturn + 10
frame #1: 0x00007fff8e96afa7 libsystem_pthread.dylib`pthread_workqueue_addthreads_np + 47
frame #2: 0x00007fff9432dba1 libdispatch.dylib`_dispatch_queue_wakeup_global_slow + 64
frame #3: 0x0000000100000e41 a`main + 81
frame #4: 0x00007fff911795fd libdyld.dylib`start + 1
frame #5: 0x00007fff911795fd libdyld.dylib`start + 1
This isn't due to 10.9's GCD being different - rather, it demonstrates the true asynchronous nature of GCD: The main thread has yet to return from requesting the worker (which it does by pthread_workqueue_addthreads_np, as I'll describe later), and already the worker thread has spawned and is mid-execution, possibly on another CPU core. The exact state of the main thread with respect to the worker is largely unpredictable.
Another cool feature of GCD is that the queue name in thread #2 has been set to the custom queue: GCD renames the root queues when they are working on behalf of custom queues (as in this example), in a way that is visible to lldb. I'm working on adding this functionality to process explorer. In case you're wondering why "_dispatch_worker_thread2" is used - that's because libdispatch defines three worker thread functions: the first, for use when compiled with DISPATCH_USE_PTHREAD_POOL; the second (this one), for use with HAVE_PTHREAD_WORKQUEUE_SETDISPATCH_NP; and the third for HAVE_PTHREAD_WORKQUEUES. The second also falls through to the third.
Dispatch Sources
A key function of dispatch queues is connecting them to dispatch sources. These enable an application to multiplex multiple event listeners, much as would traditionally be provided by select(2), but with far wider support of event sources - from file descriptors, through sockets, Mach ports, signals, process events and timers, to even custom sources.
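For contrast, here is the classic select(2) pattern that dispatch sources generalize - waiting for readiness on a file descriptor, with the event then handled inline by the caller rather than by a handler block on a queue:

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Block up to timeout_sec waiting for fd to become readable.
 * Returns 1 if readable, 0 on timeout or error. A GCD "read" source
 * wraps this kind of readiness logic (via kqueue, not select) and
 * invokes its handler on a queue instead of returning to a poller. */
static int wait_readable(int fd, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv = { timeout_sec, 0 };
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv) == 1;
}
```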
All of the myriad sources are built on top of the kernel's kqueue mechanism. The type argument to dispatch_source_create is, in fact, a struct dispatch_source_type_s pointer:
struct dispatch_source_type_s {
struct kevent64_s ke;
uint64_t mask;
void (*init)(dispatch_source_t ds, dispatch_source_type_t type,
uintptr_t handle, unsigned long mask, dispatch_queue_t q);
};
A dispatch source can be thought of as a special case of a queue. The two are closely related, and the former is a "subclass" of the latter, as can be seen by the definition:
struct dispatch_source_s {
/* DISPATCH_STRUCT_HEADER(source); */ // As per all other dispatch objects...
/* DISPATCH_QUEUE_HEADER; */ // as per dispatch_queue definition
/* DISPATCH_SOURCE_HEADER(source);
dispatch_kevent_t ds_dkev; \ // linked list of events and source refs
dispatch_source_refs_t ds_refs;
unsigned int ds_atomic_flags; \
unsigned int \
ds_is_level:1, \
ds_is_adder:1, \ // true for DISPATCH_SOURCE_ADD
ds_is_installed:1, \ // true if source is installed on manager queue
ds_needs_rearm:1, \ // true if needs rearming on manager queue
ds_is_timer:1, \ // true for timer sources only
ds_cancel_is_block:1, \ // true if data source cancel_handler is a block
ds_handler_is_block:1, \ // true if data source event_handler is a block
ds_registration_is_block:1, \ // true if data source registration handler is a block
dm_connect_handler_called:1, \ // used by mach sources only
dm_cancel_handler_called:1; \ // true if in the process of calling cancel block
unsigned long ds_pending_data_mask; // returned by dispatch_source_get_data_mask()
unsigned long ds_ident_hack; // returned by dispatch_source_get_handle()
unsigned long ds_data; // returned by dispatch_source_get_data()
unsigned long ds_pending_data;
};
The operation of the dispatch_source_create function is straightforward: following validation of the type argument, it allocates and initializes a dispatch_source_s structure, in particular populating its ds_dkev with the kevent() parameters passed to the function.
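The role of the per-type init callback can be illustrated with a toy analogue. All toy_* names are invented, the stub fields merely echo ds_ident_hack/ds_pending_data_mask, and the real function also wires up the kevent and target queue:

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a dispatch source (the real one embeds a queue too) */
typedef struct toy_source {
    uintptr_t ident;             /* cf. ds_ident_hack          */
    unsigned long pending_mask;  /* cf. ds_pending_data_mask   */
} toy_source_t;

/* Toy stand-in for dispatch_source_type_s: an allowed-events mask
 * plus a type-specific init callback */
typedef struct toy_source_type {
    unsigned long mask;
    void (*init)(toy_source_t *s, uintptr_t handle, unsigned long mask);
} toy_source_type_t;

static void toy_generic_init(toy_source_t *s, uintptr_t handle,
                             unsigned long mask)
{
    s->ident = handle;
    s->pending_mask = mask;
}

/* cf. dispatch_source_create: validate the type and the mask it allows,
 * allocate the source, and let the type initialize it */
static toy_source_t *toy_source_create(const toy_source_type_t *type,
                                       uintptr_t handle, unsigned long mask)
{
    if (!type || (mask & ~type->mask))
        return NULL;                      /* unsupported event bits */
    toy_source_t *s = calloc(1, sizeof(*s));
    if (s)
        type->init(s, handle, mask);
    return s;
}
```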
Internally, most (if not all) sources eventually get triggered by kevent(). I cover this important syscall in both Chapter 2 (page 57) and Chapter 14 (500 pages later..). This means that most sources use the same kqueue - most, that is, with the exception of Mach sources, which use Mach's request_notification mechanism.
You can see this for yourself by using lldb on a program or daemon which uses dispatch sources. One example to debug is diskarbitration:
bash-3.2# ps -ef | grep diskarb
0 16 1 0 Sun10AM ?? 0:02.40 /usr/sbin/diskarbitrationd
bash-3.2# lldb -p 16
Attaching to process with:
process attach -p 16
Process 16 stopped
Executable module set to "/usr/sbin/diskarbitrationd".
Architecture set to: x86_64-apple-macosx.
(lldb) thread backtrace all
#
# The CFRunLoop construct (which is also responsible for the main thread queue)
# blocks on mach_msg_trap, which will return when a message is received
#
* thread #1: tid = 0x0140, 0x00007fff86ff7686 libsystem_kernel.dylib`mach_msg_trap + 10,
queue = 'com.apple.main-thread, stop reason = signal SIGSTOP
frame #0: 0x00007fff86ff7686 libsystem_kernel.dylib`mach_msg_trap + 10
frame #1: 0x00007fff86ff6c42 libsystem_kernel.dylib`mach_msg + 70
frame #2: 0x00007fff8be77233 CoreFoundation`__CFRunLoopServiceMachPort + 195
frame #3: 0x00007fff8be7c916 CoreFoundation`__CFRunLoopRun + 1078
frame #4: 0x00007fff8be7c0e2 CoreFoundation`CFRunLoopRunSpecific + 290
frame #5: 0x00007fff8be8add1 CoreFoundation`CFRunLoopRun + 97
frame #6: 0x00000001069d83e6 diskarbitrationd`___lldb_unnamed_function176$$diskarbitrationd + 2377
frame #7: 0x00007fff8621e7e1 libdyld.dylib`start + 1
#
# The manager queue (holds a kqueue() and blocks on kevent until a source "fires")
#
thread #2: tid = 0x0146, 0x00007fff86ff9d16 libsystem_kernel.dylib`kevent + 10,
queue = 'com.apple.libdispatch-manager
frame #0: 0x00007fff86ff9d16 libsystem_kernel.dylib`kevent + 10
frame #1: 0x00007fff893d6dea libdispatch.dylib`_dispatch_mgr_invoke + 883
frame #2: 0x00007fff893d69ee libdispatch.dylib`_dispatch_mgr_thread + 54
(lldb) detach
Detaching from process 16
Process 16 detached
When a source does fire, the libdispatch manager triggers the callback on another thread (via _dispatch_worker_thread2, as usual, though it goes on to call _dispatch_source_invoke, resulting in a slightly different stack). This way, the manager thread remains available to process events from other sources.
II: Still in User Mode (pthread)
GCD, contrary to the impression one might get, does not replace threads - it builds on them. The underlying support for libdispatch is still the venerable POSIX threads library (pthread), though most of the support comes from non-POSIX-compliant Apple extensions (which are easily identifiable by the _np suffix in function names). Most of those functions were silently introduced in Leopard (10.5), with others added in 10.6, as GCD was formally introduced. The API, however, has undergone significant changes, making it a moving target.
To exacerbate matters, though the Apple pthread implementation was formerly a part of LibC (and thus open source), this has changed as of OS X 10.9: pthread has moved into its own library, libsystem_pthread.dylib, for which no source has been released. The workqueue calls were APPLE_PRIVATE APIs to begin with, so I guess they figure developers were forewarned.
The last open source implementation of pthreads, therefore, is that of 10.8's LibC, which still exports the _np calls. 10.9 changes the API further, and it seems like it might take a while before the dust settles. This is also evident in the code of libdispatch, in the sections defined DISPATCH_USE_LEGACY_WORKQUEUE_FALLBACK
, though as of 10.8 the legacy interface has effectively been removed: Both libdispatch and pthreads check if the kernel supports the new interface (referred to as the "New SPIs"), and return an error if that is not the case.
The non-standard pthread extensions provided by Apple were, surprisingly enough, documented - not by Apple, but by FreeBSD man pages, since GCD has been ported to it. Apple, however, effectively drops almost all of those extensions in favor of new ones, as shown in the following output:
# OS X 10.8 output:
morpheus@Zephyr$ jtool -S -v /usr/lib/system/libsystem_c.dylib | grep pthread_workqueue
00000000000cfd80 d ___pthread_workqueue_pool_head
0000000000015b39 T ___pthread_workqueue_setkill
0000000000017230 T _pthread_workqueue_additem_np
0000000000016fb7 T _pthread_workqueue_addthreads_np
0000000000016aad T _pthread_workqueue_atfork_child
0000000000016aa3 T _pthread_workqueue_atfork_parent
0000000000016a99 T _pthread_workqueue_atfork_prepare
00000000000167bb T _pthread_workqueue_attr_destroy_np
0000000000016808 T _pthread_workqueue_attr_getovercommit_np
00000000000167d1 T _pthread_workqueue_attr_getqueuepriority_np
000000000001679f T _pthread_workqueue_attr_init_np
0000000000016822 T _pthread_workqueue_attr_setovercommit_np
00000000000167eb T _pthread_workqueue_attr_setqueuepriority_np
0000000000016ff8 T _pthread_workqueue_create_np
0000000000017848 T _pthread_workqueue_getovercommit_np
000000000001683a T _pthread_workqueue_init_np
0000000000016a56 T _pthread_workqueue_requestconcurrency
0000000000016f26 T _pthread_workqueue_setdispatch_np
# OS X 10.9 output:
morpheus@simulacrum$ jtool -S -v /usr/lib/system/libsystem_pthread.dylib | grep pthread_workqueue
0000000000002c0d t _pthread_workqueue_atfork_child # survived, but made private
0000000000002371 T ___pthread_workqueue_setkill # make thread killable by pthread_kill
0000000000002f78 T _pthread_workqueue_addthreads_np #
0000000000002f19 T _pthread_workqueue_setdispatch_np # q.v. below
0000000000002f12 T _pthread_workqueue_setdispatchoffset_np #
Since virtually the entire "legacy" API has been eradicated, let's focus on those functions which did make the cut:
Function | Notes |
---|---|
pthread_workqueue_addthreads_np | Adds numthreads to a workqueue of priority queue_priority, according to options. The only option supported is WORKQ_ADDTHREADS_OPTION_OVERCOMMIT. As you could see in the lldb output above, this call asynchronously spawns the worker threads. |
pthread_workqueue_setdispatch_np | - Sets the dispatch worker function (always worker_thread2) - Makes sure new SPI is supported - Calls workq_open() |
pthread_workqueue_setdispatchoffset_np | A new addition to the API (10.9) Used by libdispatch when setting up the root queues, and passes the offset of the dq_serialnum member relative to the dispatch_queue_s struct. |
As you can see, there is no longer a way to manipulate most aspects of work queues via pthreads. Whereas before pthread exported an _additem_np (which would enable scheduling of a work item), this has been removed in favor of _addthreads_np, and the work function itself is set by _setdispatch_np, normally once per process instance, during libdispatch's root_queue_init(). This means that the actual work queue thread pool management is handled by the kernel.
Work queue diagnostics
Apple's fantabulous yet undocumented proc_info syscall (#336), which I laud so much in the book, also has a PROC_PIDWORKQUEUEINFO code (#12). It provides a very high-level view of the workqueue, as shown here:
struct proc_workqueueinfo {
uint32_t pwq_nthreads; /* total number of workqueue threads */
uint32_t pwq_runthreads; /* total number of running workqueue threads */
uint32_t pwq_blockedthreads; /* total number of blocked workqueue threads */
uint32_t pwq_state;
};
The latest version of my Process Explorer (v0.2.9 and later) automatically displays associated work queue information, if work queues are detected in the process whose information you are querying.
III: Kernel support (workqueues)
System call interface
As stated in the book, kernel support for work queues consists of two system calls: workq_open (#367) and workq_kernreturn (#368). Though the system calls remain constant, their implementation has changed with 10.8/6 and the introduction of the "new SPI". Beginning with 10.9/7, the implementation of the system calls has moved to the pthread kernel extension, alongside bsdthread_register (#366). You can find the definitions in XNU's bsd/kern/syscalls.master:
366 AUE_NULL ALL { int bsdthread_register(user_addr_t threadstart, user_addr_t wqthread, int pthsize,
user_addr_t dummy_value, user_addr_t targetconc_ptr, uint64_t dispatchqueue_offset)
NO_SYSCALL_STUB; }
367 AUE_WORKQOPEN ALL { int workq_open(void) NO_SYSCALL_STUB; }
368 AUE_WORKQOPS ALL { int workq_kernreturn(int options, user_addr_t item, int affinity, int prio)
NO_SYSCALL_STUB; }
There's a reason why all three have NO_SYSCALL_STUB: Like other (crazy useful) syscalls in XNU, Apple doesn't want you to use them. If XNU weren't open source, nobody but Apple would likely know how to use them, either.
workq_open works in essentially the same way it has before. workq_kernreturn, however, has been completely modified: rather than offering the WQOPS discussed in the book as options, the new SPI deprecates them all but WQOPS_THREAD_RETURN, and instead offers two others:
- WQOPS_QUEUE_NEWSPISUPP (0x10), which is used to check for SPI support - it merely returns 0 if supported.
- WQOPS_QUEUE_REQTHREADS (0x20), which requests that the kernel run n more (possibly overcommitted) requests of a given priority. The value of "n" is passed in the "affinity" argument, while the item argument (formerly used to pass the user mode address to execute for WQOPS_QUEUE_ADD) is ignored.
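Schematically, the new option handling looks like this. This is a hypothetical sketch - the constants match the SPI as described above, but the handler is reduced to control flow; the real kernel code validates the caller, takes locks, and actually provisions threads:

```c
#include <errno.h>

#define WQOPS_THREAD_RETURN     0x04
#define WQOPS_QUEUE_NEWSPISUPP  0x10
#define WQOPS_QUEUE_REQTHREADS  0x20

static int pending_requests[4];   /* toy per-priority request counts    */

static int toy_workq_kernreturn(int options, void *item, int affinity,
                                int prio)
{
    (void)item;                   /* ignored by the new SPI             */
    switch (options) {
    case WQOPS_QUEUE_NEWSPISUPP:  /* merely signals "new SPI supported" */
        return 0;
    case WQOPS_QUEUE_REQTHREADS:  /* "n" rides in the affinity argument */
        if (prio < 0 || prio > 3 || affinity <= 0)
            return EINVAL;
        pending_requests[prio] += affinity;
        return 0;
    case WQOPS_THREAD_RETURN:     /* worker done; park it (not shown)   */
        return 0;
    default:
        return EINVAL;            /* the legacy WQOPS are gone          */
    }
}
```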
The kernel workqueue implementation
Kernel workqueue support was implemented in bsd/kern/pthread_synch.c (and, as of 10.9/7, has moved to the pthread kernel extension). The main data structure is the workqueue:
struct workqueue {
proc_t wq_proc; // Owning process
vm_map_t wq_map; // VM Map for work thread stacks
task_t wq_task; // The owning process's task port (used to create thread)
thread_call_t wq_atimer_call;
int wq_flags; // WQ_EXITING, WQ_ATIMER_RUNNING, WQ_LIST_INITED,
int wq_lflags; // WQL_ATIMER_BUSY, _WAITING
uint64_t wq_thread_yielded_timestamp; // set by workqueue_thread_yielded()
uint32_t wq_thread_yielded_count; // count of yielded threads, used with threshold
uint32_t wq_timer_interval;
uint32_t wq_affinity_max;
uint32_t wq_threads_scheduled;
uint32_t wq_constrained_threads_scheduled;
uint32_t wq_nthreads; // # of threads in this workqueue
uint32_t wq_thidlecount; // .. of which how many are idle
uint32_t wq_reqcount; // # of current requests (incremented by WQOPS_QUEUE_REQTHREADS)
TAILQ_HEAD(, threadlist) wq_thrunlist; // List of active threads
TAILQ_HEAD(, threadlist) wq_thidlelist; // List of idle ("parked") threads
uint16_t wq_requests[WORKQUEUE_NUMPRIOS]; // # of current requests, by priority
uint16_t wq_ocrequests[WORKQUEUE_NUMPRIOS];// # of overcommitted requests, by priority
uint16_t wq_reqconc[WORKQUEUE_NUMPRIOS]; /* requested concurrency for each priority level */
uint16_t *wq_thscheduled_count[WORKQUEUE_NUMPRIOS];
uint32_t *wq_thactive_count[WORKQUEUE_NUMPRIOS]; /* must be uint32_t since we OSAddAtomic on these */
uint64_t *wq_lastblocked_ts[WORKQUEUE_NUMPRIOS];
};
@TODO: detail more about work queue implementation..
sysctl variables
The kernel exports several variables to control work queues. These are basically the same as those of FreeBSD, and are exported by the kernel proper (pre 10.9/7) or by the pthread kernel extension (10.9/7 and later):
sysctl variable | controls |
---|---|
kern.wq_yielded_threshold | Maximum # of threads that may be yielded |
kern.wq_yielded_window_usecs | Yielded window size |
kern.wq_stalled_window_usecs | Maximum # of usecs thread can not respond before it is deemed stalled |
kern.wq_reduce_pool_window_usecs | Maximum # of usecs thread can idle before the thread pool will be reduced |
kern.wq_max_timer_interval_usecs | Maximum # of usecs between thread checks |
kern.wq_max_threads | Maximum # of threads in the work queue |
kdebug codes
As with all kernel operations, the workqueue mechanism is laced with KERNEL_DEBUG macro calls, to mark function calls and arguments. Unlike other calls, however, the macros often define the debug codes as hex constants, rather than meaningful names. Unsurprisingly, the codes aren't listed in CoreProfile, either. I'm working on adding these to my kdebugView tool. I still need to delve into the "how" of kernel mode, so updates will follow, covering:
- Usage of sysctl vars inside pthread_synch
- The flow of workqueue_run_nextreq, wq_runreq and setup_wqthread
- Kdebug constants..
Me, I need to get off this flight already.
References
- Apple's Concurrency Programming Guide
- Apple's Grand Central Dispatch (GCD) Reference
- My book