Нет-Work: Darwin Networking

Networking has forever redefined computing. With the advent of the Internet, a system without network connectivity finds little use, as more applications rely on remote servers to perform some or all of their functions. This is especially important with the move to the Cloud, which in some aspects goes back full circle to the "dumb terminals" of the mainframe age.

Like other operating systems, Darwin places a considerable emphasis on its networking stack. The stack was originally inherited from the BSD layer, and Apple has continuously refined and extended it with support for new protocols and features improving functionality, efficiency, and speed. This chapter discusses Darwin's networking features from a user mode perspective.

The chapter begins with a review of socket families, specifically the ones idiosyncratic to Darwin. These are the PF_NDRV sockets, which enable (to a certain extent) raw packet manipulation, and the PF_SYSTEM sockets for user/kernel mode communication. The latter is especially important, since it contains quite a few proprietary, undocumented but powerful interfaces.

We next briefly explain MultiPath TCP, an emerging Internet standard which Apple was quick to adopt. This required the addition of new system calls in Darwin 13, as well as some sysctl MIBs. MIBs are also the focus of the next two sections, which deal with network configuration from user mode, and the gathering of statistics. Statistics, however, are where Darwin excels through undocumented APIs. The PF_SYSTEM control sockets introduced earlier are especially useful to provide live network statistics which other OSes can only struggle with.

Following that, we turn our attention to firewalling and packet filtering. Darwin provides not one, but two built-in network-layer firewalls - BSD's ipfw(8) (which has been deprecated as of around Darwin 15) and pf(8). MacOS further provides an application layer firewall, through ALF.kext. For packet filtering, another legacy of BSD - the Berkeley Packet Filter (BPF) - is used, and though it is best described elsewhere, it is also briefly explained here.

Last, but not least, we turn a spotlight towards two entirely undocumented but powerful APIs. The first is that of Network Extension Control Policies, which enable QoS, flow control and more through policy objects and a proprietary file descriptor. The second is the mysterious Skywalk, and its nexus and channel objects. This is an entire subsystem which is not only undocumented, but intentionally left out of XNU's public sources. The pages of this book provide the only public documentation of this important mechanism to date.

This is the complete 16th chapter from *OS Internals, Volume I (in its v1.2 update). It's free, but please respect the copyright and the immense amount of research devoted to creating it. If any of this is useful, please cite using the original link. You might also want to consider getting the book, or checking out Tg's training.

Darwin Extensions of the BSD Socket APIs

Since its inception in the 1980s, the BSD socket model has proven time and again its superb design and extensibility to new protocols. The set of system calls used in manipulating sockets is (for the most part) entirely implementation agnostic. Where protocol specific settings are required, they may be supplied through the sockaddr_.. variant used when bind(2)ing the socket, or through [get/set]sockopt(2), if family options are supported.

The <sys/socket.h> header file lists over three dozen address families (as [AF/PF]_* constants), but in practice only a subset of them are supported in Darwin. These are shown in Table 16-1:

**Table 16-1:** The Protocol families supported on Darwin
#	Protocol Family	Transport
1	`PF_[LOCAL/UNIX]`	UNIX domain sockets
2	`PF_INET`	IPv4
14	`PF_ROUTE`	Internal Routing Protocol
27	`PF_NDRV`	Network Drivers: Raw device access
29	`PF_KEY`	IPSec Key Management (RFC2367)
30	`PF_INET6`	IPv6 (And IPv4 mapped)
32	`PF_SYSTEM`	System/kernel local communication (Proprietary)

Most of these are standard, and should be well known to the reader from other UN*X variants. PF_NDRV and PF_SYSTEM, however, are Darwin proprietary, and deserve special discussion.

`PF_NDRV`

The PF_NDRV protocol family is a somewhat misnamed one - the documentation describes it as used by "Network Drivers", though drivers are generally kernel mode beasts. A better name would have been "PF_RAW", as the family allows raw access to network interfaces, or perhaps (in keeping with Linux) "PF_PACKET". Raw interface access is quite similar to AF_INET[6]'s SOCK_RAW, or to setting the IP_HDRINCL option with setsockopt(2). Unlike either, however, PF_NDRV allows control over all layers - down to the link layer header.

The PF_NDRV sockets are created as usual, but require a different socket address structure - struct sockaddr_ndrv. As a sockaddr_* compatible structure, its first fields are the byte-size snd_len and snd_family (set to sizeof(sockaddr_ndrv) and PF_NDRV, respectively). The only other field in the structure is the snd_name character array (of IFNAMSIZ bytes), which holds the name of the underlying interface the socket is to be bound to.

Though seldom used, PF_NDRV allows a user-mode client to register its own EtherType, so that the kernel will dispatch packets to it. In that sense, it allows for "user mode drivers" to register their custom protocol implementations, using a setsockopt(2) with the SOL_NDRVPROTO level and NDRV_SETDMXSPEC option name. This, however, will only work if there is no a priori registered protocol (otherwise returning EADDRINUSE), and is therefore not useful for general packet sniffing.

A much more useful aspect of PF_NDRV is to create custom packets. A socket is created the same way, but specifying SOCK_RAW for the socket type. Following the binding to an interface, packets can be fabricated by directly writing to a buffer, and sending it on the bound interface using sendto(2).
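
The frame written to such a socket starts at the link layer, so the buffer has to carry the full header before any payload. A minimal, portable sketch of that layout follows; the helper name build_frame and the FRAME_HDR_LEN constant are hypothetical and not part of any Darwin header, and an Ethernet link layer is assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HDR_LEN 14  /* 6 (destination) + 6 (source) + 2 (EtherType) */

/* Hypothetical helper: writes an Ethernet frame into buf.
 * Returns the frame length, or 0 if the frame does not fit in buflen. */
size_t build_frame(uint8_t *buf, size_t buflen,
                   const uint8_t dst[6], const uint8_t src[6],
                   uint16_t ethertype,
                   const void *payload, size_t payload_len)
{
    assert(buf != NULL && dst != NULL && src != NULL);
    if (payload_len > buflen || buflen - payload_len < FRAME_HDR_LEN)
        return 0;

    memcpy(buf, dst, 6);                    /* bytes 0-5:  destination MAC */
    memcpy(buf + 6, src, 6);                /* bytes 6-11: source MAC */
    buf[12] = (uint8_t)(ethertype >> 8);    /* EtherType, network byte order */
    buf[13] = (uint8_t)(ethertype & 0xff);
    if (payload_len)
        memcpy(buf + FRAME_HDR_LEN, payload, payload_len);
    return FRAME_HDR_LEN + payload_len;
}
```

The returned length is what would be passed to sendto(2) on the bound PF_NDRV socket. Writing the EtherType byte by byte sidesteps host byte order altogether.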

Experiment: Using PF_NDRV to implement a custom network protocol

The following program can be used to demonstrate the capabilities of PF_NDRV for both custom protocol packet reception and sending. Because a lot of code is common, the receiving functionality is compiled only when LISTENER is defined (#ifdef LISTENER); otherwise the program sends a packet. You can try this program with any ethertype (specified as a decimal argument), so long as it doesn't collide with an already existing one (e.g. IP's 0x0800 or IPv6's 0x86dd).

Listing 16-2: A sample PF_NDRV client/listener program

#include <sys/socket.h>
#include <arpa/inet.h>      // for htons()
#include <net/if.h>
#include <net/ndrv.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <net/ethernet.h>

int main (int argc, char **argv) {

   if (geteuid()) { fprintf(stderr,"No root, no service\n"); exit(1); }
   int s = socket(PF_NDRV,SOCK_RAW,0);
   if (s < 0) { perror ("socket"); exit(2); }

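   if (argc < 2) { fprintf(stderr,"Usage: %s ethertype\n", argv[0]); exit(1); }
   uint16_t   etherType = htons(atoi(argv[1]));   // host to network byte order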
   struct sockaddr_ndrv    sa_ndrv;

   strlcpy((char *)sa_ndrv.snd_name, "en0", sizeof (sa_ndrv.snd_name));
   sa_ndrv.snd_family = PF_NDRV;
   sa_ndrv.snd_len = sizeof (sa_ndrv);
   
   int rc = bind(s, (struct sockaddr *) &sa_ndrv, sizeof(sa_ndrv));
   
   if (rc < 0) { perror ("bind"); exit (3);}

   char packetBuffer[2048];

#ifdef LISTENER
   struct ndrv_protocol_desc desc;
   struct ndrv_demux_desc demux_desc[1];
   memset(&desc, '\0', sizeof(desc));
   memset(&demux_desc, '\0', sizeof(demux_desc));

   /* Request kernel for demuxing of one chosen ethertype */
   desc.version = NDRV_PROTOCOL_DESC_VERS;
   desc.protocol_family = atoi(argv[1]);
   desc.demux_count = 1;
   desc.demux_list = (struct ndrv_demux_desc*)&demux_desc;
   demux_desc[0].type = NDRV_DEMUXTYPE_ETHERTYPE;
   demux_desc[0].length = sizeof(unsigned short);
   demux_desc[0].data.ether_type = htons(atoi(argv[1]));

   if (setsockopt(s, 
        SOL_NDRVPROTO, 
        NDRV_SETDMXSPEC, 
	 (caddr_t)&desc, sizeof(desc))) {
      perror("setsockopt"); exit(4);
   }
   /* Socket will now receive chosen ethertype packets */
   while ((rc = recv (s, packetBuffer, 2048, 0) ) > 0 ) {
   	printf("Got packet\n"); // remember, this is a PoC..
   }
#else
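   /* Link layer header: 6 bytes dst + 6 bytes src (all 0xff here), then EtherType */
   memset(packetBuffer, '\xff', 12);
   memcpy(packetBuffer + 12, &etherType, 2);
   /* Payload goes after the 14-byte header (offset 0 would overwrite it) */
   strcpy(packetBuffer + 14, "NDRV is fun!");
   rc = sendto (s, packetBuffer, 14 + strlen("NDRV is fun!"), 0, 
		(struct sockaddr *)&sa_ndrv, sizeof(sa_ndrv));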
   if (rc < 0) { perror("sendto"); }
#endif
}

`PF_SYSTEM`

The PF_SYSTEM protocol family is a proprietary Darwin mechanism which provides communication between kernel mode providers and user mode requesters. PF_SYSTEM sockets are always SOCK_RAW, with two protocols implemented: SYSPROTO_EVENT (1) and SYSPROTO_CONTROL (2).

`SYSPROTO_EVENT`

The SYSPROTO_EVENT protocol is used by the kernel to multicast events to interested parties. In that sense, it is very similar to Linux's AF_NETLINK. No binding is required for the socket - but an event filter must be set using a SIOCSKEVFILT ioctl(2) request. The ioctl(2) takes a struct kev_request, defined in sys/kern_event.h (along with the ioctl(2) codes) to consist of three uint32_t fields: vendor_code, kev_class and kev_subclass. Zero values may be specified as wildcards (KEV_ANY_...) for any of the three values. The only vendor supported out of the box is KEV_VENDOR_APPLE, and Table 16-3 shows the classes which exist in Darwin 18:

**Table 16-3:** `SYSPROTO_EVENT` `KEV_...` classes in Darwin 18
`KEV_..._CLASS`		`KEV_..._SUBCLASS`		Event types
1	`NETWORK`	1	`INET`	IPv4 (codes in <net/net_kev.h>)
		2	`DL`	Data Link subclass (codes in <net/net_kev.h>)
		3	`NETPOLICY`	Network policy subclass
		4	`SOCKET`	Sockets
		5	`ATALK`	AppleTalk (no longer used)
		6	`INET6`	IPv6 (codes in <net/net_kev.h>)
		7	`ND6`	IPv6 Neighbor Discovery Protocol
		8	`NECP`	NECP subclass
		9	`NETAGENT`	Net-Agent subclass
		10	`LOG`	Log subclass
		11	`NETEVENT`	Generic Net events subclass
		12	`MPTCP`	Global MPTCP events subclass
2	`IOKIT`	?	?	IOKit drivers
3	`SYSTEM`	2	`CTL`	Control notifications
3	`SYSTEM`	3	`MEMORYSTATUS`	Jetsam/memorystatus subclass
4	`APPLESHARE`			AppleShare events (no longer used)
5	`FIREWALL`	1	`IPFW`	`ipfw` - IPv4 firewalling
5	`FIREWALL`	2	`IP6FW`	`ipfw` - IPv6 firewalling
6	`IEEE80211`	1	?	Wireless Ethernet (`IO8211Family` drivers)

Following the setting of the ioctl(2), events can be read from the socket as a stream of kern_event_msg structures. As further explained in <sys/kern_event.h>, each event structure is of variable total_size, specifying a vendor_code, kev_class and kev_subclass (which are guaranteed to match the filter), as well as a monotonically increasing id, an event_code, and any number of event_data words (up to the total_size specified). Additional ioctl(2) codes are SIOCGKEVID (to get the current event ID), SIOCGKEVFILT (to get the filter set on the socket) and SIOCGKEVVENDOR (to look up the vendor code for a vendor name string). Apple's built-in mechanisms naturally use the APPLE vendor code, although vendor code 1000 has also been seen on MacOS (for the socketfilterfw, discussed later).
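
Because total_size is variable, a reader should validate it against the number of bytes actually returned before touching event_data. The sketch below shows one way to do that. It is portable on purpose: struct kev_header is a local, hypothetical mirror of the fixed part of kern_event_msg (assumed here to be the six 32-bit fields named above, in that order), and kev_parse is not a system API. It also assumes each read(2) returns a single message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of the fixed part of struct kern_event_msg.
 * Assumed layout: six uint32_t fields, then the event_data words. */
struct kev_header {
    uint32_t total_size;
    uint32_t vendor_code;
    uint32_t kev_class;
    uint32_t kev_subclass;
    uint32_t id;
    uint32_t event_code;
};

/* Copies the header out of buf and returns the number of 32-bit
 * event_data words that follow it, or -1 if the message is truncated
 * or its total_size is inconsistent with the bytes read. */
int kev_parse(const void *buf, size_t nread, struct kev_header *out)
{
    assert(buf != NULL && out != NULL);
    if (nread < sizeof(*out))
        return -1;
    memcpy(out, buf, sizeof(*out));   /* memcpy avoids alignment issues */
    if (out->total_size < sizeof(*out) || out->total_size > nread)
        return -1;
    return (int)((out->total_size - sizeof(*out)) / sizeof(uint32_t));
}
```

A listener would call kev_parse(buf, rc, &hdr) right after a successful read(2) and skip any message for which it returns -1.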

Experiment: Creating a SYSPROTO_EVENT listener

The programming model of SYSPROTO_EVENT is so simple an event listener can be coded in but a few lines:

Listing 16-4: Sample code for a SYSPROTO_EVENT listener

#include <sys/socket.h>
#include <sys/kern_event.h>
#include <sys/ioctl.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int main (int argc, char **argv)
{
  struct kev_request req;
  int s = socket(PF_SYSTEM, SOCK_RAW, SYSPROTO_EVENT);

  req.vendor_code = KEV_VENDOR_APPLE;
  req.kev_class = KEV_ANY_CLASS;
  req.kev_subclass = KEV_ANY_SUBCLASS;

  if (ioctl(s, SIOCSKEVFILT, &req)) { perror("Unable to set filter"); exit(1); }
  char buf[1024];

  while (1) {
     int rc;
     struct kern_event_msg *kev;

     // can use if (ioctl(s, SIOCGKEVID, &id)) to get next ID
     // or simply read and block until an event occurs..
     rc = read (s, buf, 1024);
     if (rc <= 0) { perror("read"); break; }
     kev = (struct kern_event_msg *)buf;
     printf ("%d: (%d bytes). Vendor/Class/Subclass: %d/%d/%d Code: %d\n",
              kev->id, kev->total_size, kev->vendor_code, 
	      kev->kev_class, kev->kev_subclass, 
	      kev->event_code);
  } // end while
  return 0;
} // end main

Compiling the above program and running it will block, and occasionally spit out event notifications. Most common on MacOS are those of IEEE80211 (1/6/1), which are emitted on WiFi scans and state changes. Toggling the WiFi interface will also generate NETWORK/DL messages (1/1/2) as the interface reconfigures, and NETWORK/INET6 (1/1/6) as it gets a dynamic IP address.
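
The numeric class/subclass pairs are easier to follow when mapped back to the names of Table 16-3. The following sketch does just that; the tables are transcribed from Table 16-3 (only the NETWORK subclasses are named), and the kev_describe helper is hypothetical rather than anything the system provides:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Names transcribed from Table 16-3; index 0 is the wildcard and is left unnamed */
static const char *const kev_class_names[] = {
    NULL, "NETWORK", "IOKIT", "SYSTEM", "APPLESHARE", "FIREWALL", "IEEE80211"
};
static const char *const kev_net_subclass_names[] = {
    NULL, "INET", "DL", "NETPOLICY", "SOCKET", "ATALK", "INET6",
    "ND6", "NECP", "NETAGENT", "LOG", "NETEVENT", "MPTCP"
};

static const char *lookup(const char *const *tab, size_t n, uint32_t idx)
{
    return (idx < n) ? tab[idx] : NULL;
}

/* Hypothetical helper: formats "CLASS/SUBCLASS", falling back to numbers
 * for anything not in the tables. Returns 0, or -1 if out is too small. */
int kev_describe(uint32_t kev_class, uint32_t kev_subclass,
                 char *out, size_t outlen)
{
    assert(out != NULL);
    const char *c = lookup(kev_class_names,
        sizeof kev_class_names / sizeof kev_class_names[0], kev_class);
    const char *s = (kev_class == 1) ? lookup(kev_net_subclass_names,
        sizeof kev_net_subclass_names / sizeof kev_net_subclass_names[0],
        kev_subclass) : NULL;
    int n;

    if (c && s)  n = snprintf(out, outlen, "%s/%s", c, s);
    else if (c)  n = snprintf(out, outlen, "%s/%u", c, (unsigned)kev_subclass);
    else         n = snprintf(out, outlen, "%u/%u",
                              (unsigned)kev_class, (unsigned)kev_subclass);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this in place, the listener's printf could print the symbolic form alongside the raw kev_class and kev_subclass values.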

Being a Darwin proprietary mechanism, the event notifications are used by Apple's own daemons. Using procexp(j) you can see which sockets are used by which daemons - including the above program ('kev'), when it runs:

Output 16-5: Viewing SYSPROTO_EVENT socket usage with procexp(j)

root@Chimera (~)# sudo procexp all fds | grep Event:
kev	         12172 FD  3u  socket system Event:   APPLE:ANY:ANY
socketfilterfw    1723 FD  4u  socket system Event:   1000:5:11
socketfilterfw    1723 FD  7u  socket system Event:   APPLE:NETWORK:LOG
sharingd           289 FD  5u  socket system Event:   APPLE:IEEE80211:1
UserEventAgent     244 FD  4u  socket system Event:   APPLE:IEEE80211:1
airportd           143 FD  7u  socket system Event:   APPLE:NETWORK:DL
airportd           143 FD 22u  socket system Event:   APPLE:IEEE80211:1
CommCenter         248 FD  6u  socket system Event:   APPLE:NETWORK:NETEVENT
symptomsd          177 FD 15u  socket system Event:   APPLE:NETWORK:INET
symptomsd          177 FD 16u  socket system Event:   APPLE:NETWORK:ND6
AirPlayXPCHelpe     98 FD  3u  socket system Event:   APPLE:NETWORK:DL
AirPlayXPCHelpe     98 FD  6u  socket system Event:   APPLE:IEEE80211:1
UserEventAgent      43 FD  5u  socket system Event:   APPLE:SYSTEM:MEMORYSTATUS
bluetoothd          95 FD  4u  socket system Event:   APPLE:IEEE80211:1
locationd           84 FD 11u  socket system Event:   APPLE:IEEE80211:1
configd             54 FD  4u  socket system Event:   APPLE:NETWORK:ANY
configd             54 FD 19u  socket system Event:   APPLE:IEEE80211:1
configd             54 FD 21u  socket system Event:   APPLE:IEEE80211:1

`SYSPROTO_CONTROL`

The second protocol of the PF_SYSTEM family is SYSPROTO_CONTROL. This merely provides a control channel from user space onto a given provider, which may be a kernel subsystem or some kernel extension, calling on the ctl_register KPI. Such SYSPROTO_CONTROL sockets are associated with control names, which Apple maintains in a reverse DNS notation. Apple keeps adding more and more providers in between Darwin versions and in new kernel extensions - needless to say all undocumented. Using netstat(1), you can see both which providers are registered (under "Registered kernel control modules"), and which are actively in use (through "Active kernel control sockets"), although to see which processes are actually holding control sockets one needs to use lsof(1) or procexp(j). Table 16-6 shows the providers found in Darwin 18.

**Table 16-6:** The known `SYSPROTO_CONTROL` IDs in Darwin 18
`com.apple.` Control Name	Provides
`network.statistics`	Live socket statistics and notifications
`content-filter`	XNU-2782: User space packet data filtering. Used by network-cmds `cfilutil`
`fileutil.kext.state[less/ful].ctl`	MacOS 14: AppleFileUtil.kext
`flow-divert`	XNU-2422: MPTCP flow diversions
`mcx.kernctl.alr`	MacOS mcxalr.kext: Managed Client eXtensions control
`net.ipsec_control`	XNU-2422: User-mode IPSEC controls
`net.necp_control`	XNU-2782: Network Extension Control Policies
`net.netagent`	XNU-3248: Network Agents (discussed later)
`net.rvi_control`	RemoteVirtualInterface.kext: control socket
`net.utun_control`	User mode tunneling (VPNs)
`netsrc`	Network/route policies and statistics
`network.advisory`	XNU-3248: Report `SYMPTOMS_ADVISORY_[CELL/WIFI]_[BAD/OK]` to kernel
`network.tcp_ccdebug`	XNU-2782: Collect flow control algorithm debug data
`nke.sockwall`	MacOS: The Application Layer Firewall (ALF.kext), discussed later
`nke.webcontentfilter`	webcontentfilter.kext: "HolyInquisition" socket filtering via user-mode proxy
`packet-mangler`	XNU-2782: Tracks flows and handles TCP options (Used by network-cmds `pktmnglr`)
`uart.[sk].*`	`AppleOnBoardSerial.kext` and other UART devices (MacOS: `BLTH`, `MALS`, `SOC`, iOS: `oscar`, `gas-gauge`, `wlan-debug` and `iap`)
`userspace_ethernet`	`IOUserEthernet.kext`: User mode tunneling (Layer II Ethernet)

Once a socket is created, a CTLIOCGINFO ioctl(2) must be issued with a struct ctl_info argument, whose ctl_name field is initialized with the requested control name. If the ioctl(2) is successful, the socket may then be connect(2)ed through a struct sockaddr_ctl, initialized with the ctl_id returned from the previous ioctl(2).

A connected socket, however, is as far as SYSPROTO_CONTROL goes. From that point on, every socket behaves differently, depending on the underlying provider. The general flow usually entails send(2)ing and recv(2)ing, and in some cases using [get/set]sockopt(2). The system calls result in kernel mode callbacks (registered by the implementing party, usually a kernel extension) being invoked. The kernel-mode implementation is discussed in Volume II.

Control sockets are commonly used for administering other networking facilities, so a few examples of their usage will be discussed in this chapter.

Experiment: User mode tunneling with SYSPROTO_CONTROL

User mode tunneling, a feature commonly tapped by VPN applications, is a great example of a SYSPROTO_CONTROL socket. Such applications, rather than installing some kernel filtering mechanism, instead draw on a kernel facility to request the creation of a new interface, which appears to other processes as another link-layer interface, complete with its own IPv4 or IPv6 address. When such processes bind to the interface, the IP-layer packets are redirected to the utun controller, which can then do with them as it pleases, commonly encapsulating them in an additional IP layer (with or without encryption), and sending them elsewhere. The process works both ways, in that the utun controller can also inject packets onto the tunneling interface, which the kernel will then route to their bound sockets, as it would have with any other interface.

The compact code in Listing 16-7 sets up com.apple.net.utun_control:

Listing 16-7: The code to set up a user mode tunneling interface

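#include <sys/socket.h>
#include <sys/ioctl.h>
#include <sys/sys_domain.h>    // for SYSPROTO_CONTROL, AF_SYS_CONTROL
#include <sys/kern_control.h>  // for struct ctl_info, struct sockaddr_ctl
#include <net/if_utun.h>       // for UTUN_CONTROL_NAME
#include <stdio.h>
#include <string.h>
#include <unistd.h>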

int tun(void)
{
  struct sockaddr_ctl sc;
  struct ctl_info ctlInfo;

  memset(&ctlInfo, 0, sizeof(ctlInfo));
  strlcpy(ctlInfo.ctl_name, UTUN_CONTROL_NAME, sizeof(ctlInfo.ctl_name));

  int fd = socket(PF_SYSTEM, SOCK_DGRAM, SYSPROTO_CONTROL);
  if (fd == -1) { /* perror .. */ return -1;}
  if (ioctl(fd, CTLIOCGINFO, &ctlInfo) < 0) { /* perror.. */ close(fd); return -1; }

  memset(&sc, 0, sizeof(sc));
  sc.sc_id = ctlInfo.ctl_id; sc.sc_len = sizeof(sc);
  sc.sc_family = AF_SYSTEM;  sc.ss_sysaddr = AF_SYS_CONTROL;
  sc.sc_unit = 2; /* To create utun1, just in case utun0 is in use */

  /* utun%d device will be created, where "%d" is unit number -1 */
  if (connect(fd, (struct sockaddr *)&sc, sizeof(sc)) == -1) {
           perror ("connect(AF_SYSCONTROL)"); close(fd); return -1; }

  return fd;
}

Once the descriptor is set up, a trivial read(2) loop implementation is left as an exercise for the avid reader. When completed, an interface will appear (in the example we use utun1, since utun0 is occasionally used by the IDS.framework's identityservicesd). After configuring an IP address on the interface, generating any traffic (e.g. with ping(8)) will send those IP packets to the process:
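
The off-by-one between sc_unit and the interface number is easy to trip over, so a tiny helper can make it explicit. This is only a sketch: utun_ifname is a hypothetical name, and it assumes (as is commonly reported, not shown in the listing) that a unit of 0 asks the kernel to pick a free unit, in which case the name cannot be predicted up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: derives the interface name that a connect(2) with
 * the given sc_unit is expected to create ("utun" + (unit - 1)).
 * Returns 0 on success, -1 if the unit is 0 (assumed to mean "kernel
 * chooses") or if out is too small. */
int utun_ifname(uint32_t sc_unit, char *out, size_t outlen)
{
    assert(out != NULL);
    if (sc_unit == 0)
        return -1;
    int n = snprintf(out, outlen, "utun%u", (unsigned)(sc_unit - 1));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For the listing's sc_unit of 2, this yields the utun1 name used in the output that follows.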

Output 16-8 (a/b): The output of a sample program built with the previous listing

In the first terminal, set up tunnel:

morpheus@chimera(~)# ifconfig utun1
ifconfig: interface utun1 does not exist
morpheus@chimera(~)# /tmp/tun
#
# IP Packets come out here (in this example, 
#   20 bytes (IP Header)
# +  8 bytes (ICMP Header)
# + 56 byte payload = 84 bytes, below)
#
45 00 00 54 97 90 00 00 # 45 - IPv4, 20 bytes
40 01 db 0c 01 02 03 04 # 01 - ICMP, src: 1.2.3.4 
01 02 03 05 08 00 a4 0e # dst: 1.2.3.5, 08 - Echo
87 30 00 00 5b f4 ba 97 
00 07 cb 2a 08 09 0a 0b 
0c 0d 0e 0f 10 11 12 13 
14 15 16 17 18 19 1a 1b 
1c 1d 1e 1f 20 21 22 23 
24 25 26 27 28 29 2a 2b
2c 2d 2e 2f 30 31 32 33 
34 35 36 37

In another terminal, once tunnel is up:

#
# Interface exists - initially no address
#
$ ifconfig utun1
utun1: flags=8051<UP,POINTOPOINT,.. > mtu 1500
#
# Configure with some address
#
$ sudo ifconfig utun1 1.2.3.4 1.2.3.5
#
# Make sure configuration worked:
#
$ ifconfig utun1
utun1: flags=8051 ..  mtu 1500
 inet 1.2.3.4 --> 1.2.3.5 netmask 0xff000000 
#
# Generate traffic
#
$ ping 1.2.3.5
PING 1.2.3.5 (1.2.3.5): 56 data bytes
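
The annotations in the hex dump can be reproduced programmatically. The sketch below decodes the fixed IPv4 header fields and verifies the header checksum; ip4_decode and struct ip4_summary are hypothetical names, and the buffer is assumed to start at the IP header, as it does in the dump (utun reads are commonly reported to carry a short address family prefix, which is assumed to have been stripped already).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ip4_summary {
    unsigned version, header_len, total_len, protocol;
    uint8_t  src[4], dst[4];
    int      checksum_ok;
};

/* One's-complement sum of 16-bit big-endian words, carries folded back in.
 * Over a valid IPv4 header (checksum field included) this yields 0xffff. */
static uint16_t fold_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical helper: returns 0 and fills out, or -1 if pkt does not
 * look like an IPv4 packet. */
int ip4_decode(const uint8_t *pkt, size_t len, struct ip4_summary *out)
{
    assert(pkt != NULL && out != NULL);
    if (len < 20)
        return -1;
    out->version    = pkt[0] >> 4;
    out->header_len = (pkt[0] & 0x0f) * 4u;   /* IHL is in 32-bit words */
    if (out->version != 4 || out->header_len < 20 || out->header_len > len)
        return -1;
    out->total_len = (unsigned)((pkt[2] << 8) | pkt[3]);
    out->protocol  = pkt[9];                  /* 1 = ICMP */
    memcpy(out->src, pkt + 12, 4);
    memcpy(out->dst, pkt + 16, 4);
    out->checksum_ok = (fold_sum(pkt, out->header_len) == 0xffff);
    return 0;
}
```

Fed the first 20 bytes of the dump, the decoder reports an 84-byte ICMP packet from 1.2.3.4 to 1.2.3.5.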

Proprietary socket system calls

In addition to the _nocancel extensions found in XNU for other I/O related calls, Apple has extended the BSD socket API with several proprietary system calls - which are, as usual, undocumented.

`pid_shutdown_sockets` (#436)

The pid_shutdown_sockets system call enables the caller to forcefully shut down all presently open sockets of a given process. This is only used on *OS, wherein the only caller of this system call appears to be /usr/libexec/assertiond.

`socket_delegate` (#450)

The socket_delegate system call works like socket(2) - only it receives an additional (fourth) argument, specifying the target PID in which the socket is to be created. There appears to be little use for this system call.

`[dis]connectx` and `[send/recv]msg_x` (#447-8, #480-1)

Apple introduced several non-standard system calls which both extend BSD sockets and provide support for MultiPath TCP (and other protocols) as far back as Darwin 13. The first pair, [dis]connectx, provides for quick bind(2)/connect(2), deferred connection setup (CONNECT_RESUME_ON_READ_WRITE), and multiple address associations. These only became an official API as of Darwin 15, and got a fairly detailed manual page. The second pair, [send/recv]msg_x (supporting array based [send/recv]msg for protocols handling simultaneous multiple datagrams), is still not officially provided to this day. The prototypes and documentation in bsd/sys/socket.h are PRIVATE, so as not to appear in <sys/socket.h>.

Listing 16-9: The extended socket system calls (from <sys/socket.h>)

__API_AVAILABLE(macosx(10.11), ios(9.0), tvos(9.0), watchos(2.0))       // #447
int connectx(int, const sa_endpoints_t *, sae_associd_t, unsigned int,
             const struct iovec *, unsigned int, size_t *, sae_connid_t *);

__API_AVAILABLE(macosx(10.11), ios(9.0), tvos(9.0), watchos(2.0))       // #448
int disconnectx(int, sae_associd_t, sae_connid_t);

__API_STILL_NOT_AVAILABLE_FOR_SOME_REASON(macosx(10.14), ios(12.0), etc)
ssize_t recvmsg_x(int s, struct msghdr_x *msgp, u_int cnt, int flags);  // #480
ssize_t sendmsg_x(int s, struct msghdr_x *msgp, u_int cnt, int flags);  // #481

Apple expects developers to use the (very) high level abstraction of NSURLSession objects, setting the MultipathServiceType property of NSURLSessionConfiguration as documented in a developer article^[1]. For pure C programmers, the system calls remain the most effective mechanism to tap into MPTCP's powerful functionality, and other address family specific yet non-standard features.

`peeloff` (#449)

peeloff was a short lived system call meant to extract an association from a socket. It was added in XNU-2422 but apparently replaced with a null implementation (returning 0) in XNU-4570.

Interfaces

As with other UN*X systems, Darwin provides user mode with network access through the notion of interfaces. These are devices which, unlike the standard character or block devices, have no filesystem presence and can only be accessed through sockets, and controlled through ioctl(2) on the bound sockets. The command line ifconfig(8) utility comes in very handy to view devices, with -l for a short list or -a for full information. Trying this on any Darwin system details the interfaces, which follow the naming conventions shown in Table 16-10:

**Table 16-10:** The interfaces found on Darwin systems
Link	Interface	Provided by	Used for
Loopback	lo0	XNU	The loopback ("localhost") interface
	gif#		Generic IPv[4/6]-in-IPv[4/6] (RFC2893) tunneling
	stf#		6-to-4 (RFC3056) tunneling
	utun#		User mode tunneling
	ipsec#		IPSec tunneling
Ethernet	en#	`IONetworkingFamily`	Ethernet (wired, wireless, and over other media)
	awdl0	`IOgPTPPlugin.kext`	Apple Wireless Device Link
	p2p#	`AppleBCMWLANCore`	Wi-Fi peer to peer
	ppp#	`PPP.kext`	Point-to-Point Protocol (/usr/sbin/pppd)
	bridge#	XNU	MacOS: Interface bridging
	fw#	`IOFireWireIP`	MacOS:IP over FireWire
	rvi#	`RemoteVirtualInterface`	MacOS: captures packets from attached *OS devices
	ap#		Access Point (personal hotspot)
Cell	pdp_ip#	AppleBasebandPCI[ICE/MAV]PDP	iOS/WatchOS: Cellular connection (if applicable)
USB	XHC#	`AppleUSBHostPacketFilter`	MacOS: USB Packet capture
Capture	[pk/ip]tap#	XNU	Packet or IP Layer capture from multiple interfaces

The en# interfaces are the ones most commonly used. Not only do the local wired Ethernet (on Mac Minis and iMacs, through ......) and wireless interfaces (used by the Airport Broadcom or other NIC kext) appear as en#, but so do Bluetooth, the Mac ↔ BridgeOS interface (usually as en5, started by the com.apple.driver.usb.cdc.ncm driver), and the tethering interface of the iPhone when Mobile Hotspot is activated over USB. The system maintains a property list of all ethernet interfaces and their mappings to the GUI visible strings ("UserDefinedName") in /L*/Pref*ces/SystemConfiguration/NetworkInterfaces.plist (which can be displayed with networksetup -listallhardwareports).
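
Since the naming convention is always a driver-specific prefix followed by a unit number, classifying an interface boils down to splitting its name at the trailing digits. A minimal sketch follows; split_ifname is a hypothetical helper, not a system API:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits e.g. "pdp_ip0" into prefix "pdp_ip" and
 * unit 0. Returns 0 on success, or -1 if the name has no trailing unit
 * number, consists of digits only, or the prefix does not fit. */
int split_ifname(const char *name, char *prefix, size_t prefix_len,
                 unsigned *unit)
{
    assert(name != NULL && prefix != NULL && unit != NULL);
    size_t len = strlen(name);
    size_t i = len;

    /* Walk backwards over the trailing digits (so "p2p0" keeps its '2') */
    while (i > 0 && isdigit((unsigned char)name[i - 1]))
        i--;
    if (i == 0 || i == len || i >= prefix_len)
        return -1;

    memcpy(prefix, name, i);
    prefix[i] = '\0';
    *unit = (unsigned)strtoul(name + i, NULL, 10);
    return 0;
}
```

The prefix can then be matched against the Interface column of Table 16-10. The names themselves could be obtained with the POSIX if_nameindex(3) call, or simply from ifconfig -l.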

Interface Configuration

As with other UN*X systems, the ifconfig(8) utility can be used to obtain information on interfaces and perform various operations, such as plumb, add/remove IPv(4/6) addresses or aliases, and bond. Darwin also extends the interface object internally with numerous proprietary ioctl(2) codes, all marked PRIVATE or BSD_KERNEL_PRIVATE so as to not be visible in user mode. Whereas <sys/sockio.h> defines about 19 SIOCSIF* codes, XNU's bsd/sys/sockio.h nearly doubles that number with additional codes to handle numerous proprietary functions. The undocumented ioctl(2) codes are shown in Table 16-11. The codes are shown in their macro form, which also defines their third (void *) argument. Note that some of the structures are also undocumented, and are found elsewhere in the XNU sources. The Book's companion XXR can come in handy to help you locate the structures and copy them to a user mode header.

**Table 16-11:** Unexported `SIOC*` codes (from XNU-4903's bsd/sys/sockio.h)
`ioctl(2)` code	Value (`_IOWR()`)	Description
`SIOCGIFCONF[32/64]`	('i', 36, struct ifconf[32/64])	get ifnet list
`SIOCGIFMEDIA[32/64]`	('i', 56, struct ifmediareq[32/64])	get net media
`SIOCGIFGETRTREFCNT`	('i', 137, struct ifreq)	get interface route refcnt
`SIOCGIFLINKQUALITYMETRIC`	('i', 138, struct ifreq)	get LQM
`SIOCGIFEFLAGS`	('i', 142, struct ifreq)	get extended ifnet flags
`SIOC[S/G]IFDESC`	('i', 143/144, struct if_descreq)	Set/Get interface description
`SIOC[S/G]IFLINKPARAMS`	('i', 145/146, struct if_linkparamsreq)	Set/Get output TBR rate/percent
`SIOCGIFQUEUESTATS`	('i', 147, struct if_qstatsreq)	Get interface queue statistics
`SIOC[S/G]IFTHROTTLE`	('i', 148/149, struct if_throttlereq)	Set/Get throttling for interface
`SIOCGASSOCIDS[/32/64]`	('s', 150, struct so_aidreq[/32/64])	get associds
`SIOCGCONNIDS[/32/64]`	('s', 151, struct so_cidreq[/32/64])	get connids
`SIOCGCONNINFO[/32/64]`	('s', 152, struct so_cinforeq[/32/64])	get conninfo
`SIOC[S/G]CONNORDER`	('s', 153/154, struct so_cordreq)	set conn order
`SIOC[S/G]IFLOG`	('i', 155/156, struct ifreq)	Set/Get interface log level
`SIOCGIFDELEGATE`	('i', 157, struct ifreq)	Get delegated interface index
`SIOCGIFLLADDR`	('i', 158, struct ifreq)	get link level addr
`SIOCGIFTYPE`	('i', 159, struct ifreq)	get interface type
`SIOC[G/S]IFEXPENSIVE`	('i', 160/161, struct ifreq)	get/mark interface expensive flag
`SIOC[S/G]IF2KCL`	('i', 162/163, struct ifreq)	interface prefers 2 KB clusters
`SIOCGSTARTDELAY`	('i', 164, struct ifreq)	Add artificial delay
`SIOCAIFAGENTID`	('i', 165, struct if_agentidreq)	Add netagent id
`SIOCDIFAGENTID`	('i', 166, struct if_agentidreq)	Delete netagent id
`SIOCGIFAGENTIDS[/32/64]`	('i', 167, struct if_agentidsreq[/32/64])	Get netagent ids
`SIOCGIFAGENTDATA[/32/64]`	('i', 168, struct netagent_req[/32/64])	Get netagent data
`SIOC[S/G]IFINTERFACESTATE`	('i', 169/170, struct ifreq)	set/get interface state
`SIOC[S/G]IFPROBECONNECTIVITY`	('i', 171/172, struct ifreq)	Start/Stop or check connectivity probes
`SIOCGIFFUNCTIONALTYPE`	('i', 173, struct ifreq)	get interface functional type
`SIOC[S/G]IFNETSIGNATURE`	('i', 174/175, struct if_nsreq)	Set/Get network signature
`SIOC[G/S]ECNMODE`	('i', 176/177, struct ifreq)	Explicit Congestion Notification mode (`IFRTYPE_ECN_[[EN/DIS]ABLE/DEFAULT]`)
`SIOCSIFORDER`	('i', 178, struct if_order)	Set interface ordering
`SIOC[S/G]QOSMARKING[MODE/ENABLED]`	('i', 180-183, struct ifreq)	Set/Get QoS marking mode
`SIOCSIFTIMESTAMP[EN/DIS]ABLE`	('i', 184-185, struct ifreq)	Enable/Disable interface timestamp
`SIOCGIFTIMESTAMPENABLED`	('i', 186, struct ifreq)	Get interface timestamp enabled status
`SIOCSIFDISABLEOUTPUT`	('i', 187, struct ifreq)	Disable output (`DEVELOPMENT/DEBUG`)
`SIOCGIFAGENTLIST[/32/64]`	('i', 190, struct netagentlist_req[/32/64])	Get netagent dump
`SIOC[S/G]IFLOWINTERNET`	('i', 191/192, struct ifreq)	Set/Get low internet download/upload
`SIOC[G/S]IFNAT64PREFIX`	('i', 193/194, struct if_nat64req)	Get/set Interface NAT64 prefixes
`SIOCGIFNEXUS`	('i', 195, struct if_nexusreq)	get nexus details
`SIOCGIFPROTOLIST[/32/64]`	('i', 196, struct if_protolistreq[/32/64])	get list of attached protocols
`SIOC[G/S]IFLOWPOWER`	('i', 199/200, struct ifreq)	Low Power Mode
`SIOCGIFCLAT46ADDR`	('i', 201, struct if_clat46req)	Get CLAT (IPv4-in-IPv6) addresses
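
Because these codes are absent from the user mode headers, using one means reconstructing its numeric value from the macro form in the table. The sketch below shows how an _IOWR() triplet is assumed to be packed, following the classic BSD <sys/ioccom.h> layout (direction bits on top, 13 bits of argument size, then group and number). The bsd_iowr function and the X_ prefixed constants are hypothetical stand-ins for the real macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed BSD ioctl encoding (mirrors <sys/ioccom.h>; names are local) */
#define X_IOCPARM_MASK 0x1fffu       /* argument size: 13 bits */
#define X_IOC_OUT      0x40000000u   /* kernel copies data out */
#define X_IOC_IN       0x80000000u   /* kernel copies data in  */

/* Hypothetical stand-in for _IOWR(group, num, type), taking sizeof(type) */
uint32_t bsd_iowr(char group, unsigned num, size_t arg_size)
{
    assert(num <= 0xff && arg_size <= X_IOCPARM_MASK);
    return (X_IOC_IN | X_IOC_OUT)
         | ((uint32_t)(arg_size & X_IOCPARM_MASK) << 16)
         | ((uint32_t)(uint8_t)group << 8)
         | (uint32_t)num;
}
```

As a sanity check, bsd_iowr('N', 3, 100) gives 0xC0644E03, which corresponds to the CTLIOCGINFO request used earlier if struct ctl_info is 100 bytes (a 32-bit ID plus a 96-byte name). For the 'i' group codes of the table, the size argument would be the sizeof of the structure named in the second column, copied from the XNU sources.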

Case Study: `rvi`

The Remote Virtual Interface was introduced in iOS 5.0 and documented by Apple in QA1176^[2]. The feature allows the iOS interfaces to appear on a Mac host, so that packet tracing tools (notably, tcpdump(1)) can be used through the host.

RVI requires the cooperation of several components, both on the host and the device, all working together as shown in Figure 16-13. The binaries of the remote virtual interface package all belong to the same (unnamed and closed source) project, are among the few in MacOS which still target 10.7 (i.e. use LC_UNIXTHREAD), and have apparently been unmaintained since then.

/usr/bin/rvictl is a simple command line utility which can list active device UUIDs (-l/L), then start (-s/S) or stop (-x/X) the remote virtual interface on the devices by their specified UUID. It does so by linking with MobileDevice.framework (for the list functionality), and with the private RemotePacketCapture.framework. The latter exports APIs which hide the IPC connection to /usr/libexec/rpmuxd.

The /usr/libexec/rpmuxd daemon is started by launchd on demand, when rvictl requests the com.apple.rpmuxd service. The daemon handles the local end of the packet capture operations, and also provides a notification mechanism for interested clients over MIG subsystem 117731, with five messages:

**Table 16-12:** /usr/libexec/rpmuxd's MIG subsystem:
#	Routine Name
117731	`rpmuxd_start_packet_capture`
117732	`rpmuxd_stop_packet_capture`
117733	`rpmuxd_get_current_devices`
117734	`rpmuxd_register_notification_port`
117735	`rpmuxd_deregister_notification_port`
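
When tracing Mach messages to and from the daemon, the msgh_id field is what identifies the routine. The sketch below maps an ID back to the names of Table 16-12. The +100 offset for replies is the general MIG convention rather than something taken from rpmuxd itself, and rpmuxd_routine_name is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RPMUXD_SUBSYSTEM_BASE 117731   /* from Table 16-12 */
#define MIG_REPLY_OFFSET      100      /* general MIG convention (assumed) */

static const char *const rpmuxd_routines[] = {
    "rpmuxd_start_packet_capture",
    "rpmuxd_stop_packet_capture",
    "rpmuxd_get_current_devices",
    "rpmuxd_register_notification_port",
    "rpmuxd_deregister_notification_port",
};
#define RPMUXD_NROUTINES \
    ((int)(sizeof rpmuxd_routines / sizeof rpmuxd_routines[0]))

/* Hypothetical helper: returns the routine name for a msgh_id, or NULL if
 * the ID is outside the subsystem. *is_reply is set for reply messages. */
const char *rpmuxd_routine_name(int msgh_id, int *is_reply)
{
    assert(is_reply != NULL);
    int idx = msgh_id - RPMUXD_SUBSYSTEM_BASE;

    *is_reply = 0;
    if (idx >= MIG_REPLY_OFFSET && idx < MIG_REPLY_OFFSET + RPMUXD_NROUTINES) {
        *is_reply = 1;
        idx -= MIG_REPLY_OFFSET;
    }
    if (idx < 0 || idx >= RPMUXD_NROUTINES)
        return NULL;
    return rpmuxd_routines[idx];
}
```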

The daemon controls the RemoteVirtualInterface.kext (visible in kextstat(1) by its CFBundleIdentifier of com.apple.nke.rvi). The kext isn't normally loaded, so rpmuxd can load it (by posix_spawn(2)ing kextload). The kext sets up a PF_SYSTEM control socket with the name com.apple.net.rvi_control, which the daemon connect(2)s to, and through which it requests the kext to create the rvi# interface. At the same time, it handles the connection to the iDevice, by calling AMDeviceSecureStartService() to request the launch of com.apple.pcapd on it.

On the iDevice, when lockdownd receives the request to start com.apple.pcapd, it consults the __TEXT.__services plist embedded in the Mach-O, and resolves the name to /usr/libexec/pcapd. The pcapd is a small daemon which uses libpcap.A.dylib, calling pcap_setup_pktap_interface() and then using the Berkeley Packet Filter (explained later in this chapter) to capture all packets. These packets are relayed over the lockdown connection to the host's rpmuxd, which then injects them (through the control socket) to the kext, which in turn pushes them through the rvi# interface it has created.

The end result of this is that the rvi# interface is entirely indistinguishable from other ethernet interfaces for packet capture tools, and so tcpdump(1) can be run on the host (with the -i rvi# switch) to obtain the packets which were actually captured on the iDevice. The connection, however, is read-only, so packets cannot be sent through the rvi# interface back to the device.

Figure 16-13: The architecture of the Remote Virtual Interface facility (addresses from MacOS 14 binaries)

Networking Configuration

The network stack exposes a plethora of configuration settings (and statistics, described later) via sysctl MIBs. These are all conveniently in the net namespace. As with other MIBs, they are mostly undocumented save for a description in the kernel's __DATA.__sysctl_set (through the SYSCTL_OID's description field), which can be displayed with joker -S.
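
As a minimal sketch of reading one of these MIBs programmatically, the helper below wraps sysctlbyname(3). The names read_int_mib and print_int_mib are illustrative only (they are not part of any Apple API), and the code assumes the MIB holds a plain int. The actual call is compiled only on Darwin; on other systems the stub simply fails with ENOTSUP.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Read an integer-valued MIB by name (hypothetical helper).
 * Returns 0 and fills *value on success, or -1 with errno set. */
int read_int_mib(const char *name, int *value) {
#ifdef __APPLE__
    size_t len = sizeof(*value);
    if (sysctlbyname(name, value, &len, NULL, 0) != 0)
        return -1;
    if (len != sizeof(*value)) {   /* not an int-sized MIB */
        errno = EINVAL;
        return -1;
    }
    return 0;
#else
    (void)name;
    (void)value;
    errno = ENOTSUP;               /* sysctlbyname(3) is assumed Darwin-only here */
    return -1;
#endif
}

/* Print one MIB in hexadecimal and decimal, or the error that occurred. */
void print_int_mib(const char *name) {
    int value = 0;
    if (read_int_mib(name, &value) == 0)
        printf("%s: 0x%08x (%d)\n", name, (unsigned int)value, value);
    else
        perror(name);
}
```

Calling print_int_mib("net.inet.ip.ttl") on a Darwin system should print the default TTL listed in Table 16-15. Writing a value works through the same call, by passing the new value in the last two arguments, and requires root.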

IPv4 configuration

The MIBs for controlling IPv4 and IPv6 are broken into two separate namespaces net.inet.ip and net.inet6.ip6. Owing to the similarities of both protocols, however, some MIBs are found in both namespaces, as shown in Table 16-14:

**Table 16-14:** Configuration MIBs common to IPv4 and IPv6
`net.inet[6].ip[6]` MIB	Default	Purpose
`mcast.loop`	0x00000001	Loopback multicast datagrams by default
`rtexpire`	0x0000013b	Default expiration time on dynamically learned routes
`rtminexpire`	0x0000000a	Minimum time to hold onto dynamically learned routes
`rtmaxcache`	0x00000080	Upper limit on dynamically learned routes
`forwarding`	0000000000	Enable IP forwarding between interfaces
`redirect`	0x00000001	Enable sending IP redirects
`maxfragpackets`	1536	Maximum number of fragment reassembly queue entries
`maxfragsperpacket`	128	Maximum number of fragments allowed per packet
`adj_clear_hwcksum`	0000000000	Invalidate hwcksum info when adjusting length
`adj_partial_sum`	0x00000001	Perform partial sum adjustment of trailing bytes at IP layer
`mcast.maxgrpsrc`	0x000200	Max source filters per group
`mcast.maxsocksrc`	0x000080	Max source filters per socket
`mcast.loop`	0x0000080	Multicast on loopback interface

Additional, IPv4 specific parameters, found in net.inet.ip, are shown in Table 16-15.

**Table 16-15:** Configuration MIBs specific to IPv4
`net.inet.ip` MIB	Default	Purpose
`portrange.low[first/last]`	600-1023	Low reserved port range
`portrange.[hi][first/last]`	49152-65535	High reserved port range
`ttl`	0x00000040	Default TTL value on outgoing packets
`[accept_]sourceroute`	0000000000	Enable [accepting/forwarding] source routed IP packets
`gifttl`	0x0000001e	Time-to-Live (max hop count) on GIF (IP-in-IP) interfaces
`subnets_are_local`	0000000000	Subnets of local interfaces also considered local
`random_id_statistics`	0000000000	Enable IP ID statistics
`sendsourcequench`	0000000000	Enable the transmission of source quench packets
`check_interface`	0000000000	Verify packet arrives on correct interface
`rx_chaining`	0x00000001	Do receive side ip address based chaining
`rx_chainsz`	0x00000006	IP receive side max chaining
`linklocal.in.allowbadttl`	0x00000001	Allow incoming link local packets with TTL < 255
`random_id`	0x00000001	Randomize IP packets IDs
`maxchainsent`	0x00000016	use dlil_output_list
`select_srcif_debug`	0000000000	Debug (dmesg) source address selection
`output_perf`	0000000000	Do time measurement
`rfc6864`	0x00000001	Updated Specification of the IPv4 ID Field

IPv6 configuration

IPv6 has plenty of other specific parameters which affect its behavior, particularly for its sub-protocols, like Neighbor Discovery (ND).

**Table 16-16:** The IPv6 related `sysctl` control MIBs
`net.inet6` sysctl MIB	Default	Purpose
`ip6.hlim`	64	IPv6 hop limit
`ip6.accept_rtadv`	1	Accept ICMPv6 Router Advertisements
`ip6.keepfaith`	0	Unused. Apparently IPv6 has grown disillusioned.
`ip6.log_interval`	5	Throttle kernel log output to once in `log_interval`
`ip6.hdrnestlimit`	15	IP header nesting limit
`ip6.dad_count`	1	Duplicate Address Detection count (read only)
`ip6.auto_flowlabel`	1	Assign IPv6 flow labels (in header)
`ip6.defmcasthlim`	1	Default multicast hop limit
`ip6.gifhlim`	0	Hop limit on GIF (IP-in-IP) interfaces
`ip6.use_deprecated`	1	Continue use of deprecated temporary address
`ip6.rr_prune`	5	Router renumbering prefix
`ip6.v6only`	0	If 0, enable IPv6 mapped addresses. Else, native IPv6 only
`ip6.use_tempaddr`	1	RFC3041 temporary interface addresses
`ip6.temppltime`	86400	Temporary address preferred lifetime (sec)
`ip6.tempvltime`	604800	Temporary address maximum lifetime (sec)
`ip6.auto_linklocal`	1	Automatically use link local (fe80::) addresses
`ip6.prefer_tempaddr`	1	Prefer the temporary address over the assigned one
`ip6.use_defaultzone`	0	Embed default scope ID
`ip6.maxfrags`	3072	Maximum number of IPv6 fragments allowed
`ip6.mcast_pmtu`	0	Enable Multicast Path MTU discovery
`ip6.neighborgcthresh`	1024	Neighbor cache garbage collection threshold
`ip6.maxifprefixes`	16	Maximum interface prefixes adopted from router advertisements
`ip6.maxifdefrouters`	16	Maximum default routers adopted from router advertisements
`ip6.maxdynroutes`	1024	Maximum number of dynamic (via redirect) routes allowed
`ip6.input_perf_bins`	0	bins for chaining performance data histogram
`ip6.select_srcif_debug`	0	Debug (log) selection process of source interface
`ip6.select_srcaddr_debug`	0	Debug (log) selection process of source address
`ip6.select_src_expensive_secondary_if`	0	Allow source address selection to use interfaces w/high metric
`ip6.select_src_strong_end`	1	limit source address selection to outgoing interface
`ip6.only_allow_rfc4193_prefixes`	0	Use RFC4193 as baseline for network prefixes
`ip6.maxchainsent`	1	use dlil_output_list
`ip6.dad_enhanced`	1	Adds a random nonce to NS messages for DAD.

IPSec (6) Configuration

IPSec is deeply integrated into IPv6, and therefore more likely to be used with it than in the IPv4 case. Many of the ipsec6 MIB values also apply to IPv4 (i.e. exist in net.inet.ipsec as well), and values which apply to both are under net.ipsec (not shown below).

**Table 16-17:** The `net.inet6.ipsec6` related `sysctl` control MIBs
`net.inet6.ipsec6` MIB	Default	Purpose
`def_policy`	1	Default Policy
`esp_trans_deflev`	1	Encapsulating Security Payload in Transport Mode
`esp_net_deflev`	1	Encapsulating Security Payload in Network Mode
`ah_trans_deflev`	1	Authentication Header in Transport Mode
`ah_net_deflev`	1	Authentication Header in Network mode
`ecn`	0	Toggle Explicit Congestion Notifications
`debug`	0	Toggle logging and Debugging
`esp_randpad`	-1	Pad Encapsulating Security Payload with random bytes

ICMPv6 Configuration

ICMPv6 (RFC4443) is also tightly knit into IPv6, and includes the sub protocols of Neighbor Discovery (ND) and SEcure Neighbor Discovery (SEND). Another sub protocol, Multicast Listener Discovery (MLD) has subtleties between version 1 (RFC2710) and version 2 (RFC3810), both of which are supported by the Darwin network stack.

**Table 16-18:** The `net.inet6` ICMPv6, ND and SEND related `sysctl` control MIBs
`net.inet6` MIB	Default	Purpose
`icmp6.rediraccept`	1	Accept and process redirects
`icmp6.redirtimeout`	600	Expire ICMP redirected route entries after n seconds
`icmp6.rappslimit`	10	Router Advertisement Packets per second limit
`icmp6.errppslimit`	500	packet-per-second error limit
`icmp6.nodeinfo`	3	enable/disable NI response
`icmp6.nd6_prune`	1	Walk list every n seconds
`icmp6.nd6_prune_lazy`	5	Lazily walk list every n seconds
`icmp6.nd6_delay`	5	Delay first probe in seconds
`icmp6.nd6_[u/m]maxtries`	3	Maximum [unicast/multicast] ND query attempts
`icmp6.nd6_useloopback`	1	Allow ND6 to operate on loopback interface
`icmp6.nd6_debug`	0	Output ND debug messages to kernel log
`icmp6.nd6_accept_6to4`	1	Accept neighbors from 6-to-4 links
`icmp6.nd6_optimistic_dad`	63	Assume Duplicate Address Detection won't ever collide
`icmp6.nd6_onlink_ns_rfc4861`	0	Accept 'on-link' nd6 NS in compliance with RFC 4861
`icmp6.nd6_llreach_base`	30	default ND6 link-layer reachability max lifetime (in seconds)
`icmp6.nd6_maxsolstgt`	8	maximum number of outstanding solicited targets per prefix
`icmp6.nd6_maxproxiedsol`	4	maximum number of outstanding solicitations per target
`send.opmode`	1	Configured SEND operating mode
Multicast Listener Discovery
`mld.gsrdelay`	10	Rate limit for IGMPv3 Group-and-Source queries in seconds
`mld.v1enable`	1	Support MLDv1 (RFC2710)
`mld.v2enable`	1	Support MLDv2 (RFC3810)
`mld.use_allow`	1	Use ALLOW/BLOCK for RFC 4604 SSM joins/leaves
`mld.debug`	0	Output MLD debug messages to kernel log

TCP configuration

Darwin's TCP implementation has a huge number of settings, which toggle support for various RFCs and best practices. They are all under net.inet.tcp, and apply the same way for both IPv4 and IPv6, save for [v6]mssdflt. Table 16-19 lists them all:

**Table 16-19:** The TCP related `sysctl` control MIBs
`net.inet.tcp` MIB	Default	Purpose
`[v6]mssdflt`	[0x400]/0x200	Default TCP Maximum Segment Size
`keepidle`	0x006ddd00	Keepalive timeout for idle connections
`keepintvl`	0x000124f8	Keepalive interval
`sendspace`	0x00020000	Maximum outgoing TCP datagram size
`recvspace`	0x00020000	Maximum incoming TCP datagram size
`randomize_ports`	0000000000	Randomize TCP source ports
`log_in_vain`	0000000000	Log all incoming TCP packets
`blackhole`	0000000000	Do not send RST when dropping refused connections
`keepinit`	0x000124f8	TCP connect idle keep alive time
`disable_tcp_heuristics`	0000000000	Set to 1, to disable all TCP heuristics (TFO, ECN, MPTCP)
`delayed_ack`	0x00000003	Delay ACK to try and piggyback it onto a data packet
`tcp_lq_overflow`	0x00000001	Listen Queue Overflow
`recvbg`	0000000000	Receive background
`drop_synfin`	0x00000001	Drop TCP packets with SYN+FIN set
`slowlink_wsize`	0x00002000	Maximum advertised window size for slowlink
`rfc1644`	0000000000	T/TCP support
`rfc3390`	0x00000001	Increased Initial Window
`rfc3465`	0x00000001	Congestion Control with Appropriate Byte Counting (ABC)
`rfc3465_lim2`	0x00000001	Appropriate bytes counting w/ L=2*SMSS
`doautorcvbuf`	0x00000001	Enable automatic socket buffer tuning
`autorcvbufmax`	0x00100000	Maximum receive socket buffer size
`disable_access_to_stats`	0x00000001	Disable access to tcpstat
`rcvsspktcnt`	0x00000200	packets to be seen before receiver stretches acks
`rexmt_thresh`	0x00000003	Duplicate ACK Threshold for Fast Retransmit
`slowstart_flightsize`	0x00000001	Slow start flight size
`local_slowstart_flightsize`	0x00000008	Slow start flight size (local networks)
`tso`	0x00000001	TCP Segmentation offload
`ecn_initiate_out`	0x00000002	Initiate ECN for outbound
`ecn_negotiate_in`	0x00000002	Initiate ECN for inbound
`ecn_setup_percentage`	0x00000064	Max ECN setup percentage
`ecn_timeout`	0x0000003c	Initial minutes to wait before re-trying ECN
`packetchain`	0x00000032	Enable TCP output packet chaining
`socket_unlocked_on_output`	0x00000001	Unlock TCP when sending packets down to IP
`recv_allowed_iaj`	0x00000005	Allowed inter-packet arrival jitter
`min_iaj_win`	0x00000010	Minimum recv win based on inter-packet arrival jitter
`acc_iaj_react_limit`	0x000000c8	Accumulated IAJ when receiver starts to react
`doautosndbuf`	0x00000001	Enable send socket buffer auto-tuning
`autosndbufinc`	0x00002000	Increment in send buffer size
`autosndbufmax`	0x00100000	Maximum send buffer size
`ack_prioritize`	0x00000001	Prioritize pure ACKs
`rtt_recvbg`	0x00000001	Use RTT for bg recv algorithm
`recv_throttle_minwin`	0x00004000	Minimum recv win for throttling
`enable_tlp`	0x00000001	Enable Tail loss probe
`sack`	0x00000001	TCP Selective ACK
`sack_maxholes`	0x00000080	Maximum # of TCP SACK holes allowed per connection
`sack_globalmaxholes`	0x00010000	Global maximum TCP SACK holes (across all connections)
`fastopen`	0x00000003	Enable TCP Fast Open (rfc7413)
`fastopen_backlog`	0x0000000a	Backlog queue for half-open TCP Fast Open connections
`fastopen_key`		TCP Fast Open key
`backoff_maximum`	0x00010000	Maximum time for which we won't try TCP Fast Open
`clear_tfocache`	0000000000	Toggle to clear the TFO destination based heuristic cache
`now_init`	0x2d88b850	Initial tcp now value
`microuptime_init`	0x000daa2d	Initial tcp uptime value in micro seconds
`minmss`	0x000000d8	Minimum TCP Maximum Segment Size
`do_tcpdrain`	0000000000	Enable tcp_drain routine for extra help when low on mbufs
`icmp_may_rst`	0x00000001	ICMP unreachable may abort connections in `SYN_SENT`
`rtt_min`	0x00000064	Minimum Round Trip Time value allowed
`rexmt_slop`	0x000000c8	Slop added to retransmit timeout
`win_scale_factor`	0x00000003	Sliding window scaling factor
`tcbhashsize`	0x00001000	Size of TCP control-block hashtable
`keepcnt`	0x00000008	number of times to repeat keepalive
`msl`	0x00003a98	Maximum segment lifetime
`max_persist_timeout`	0000000000	Maximum persistence timeout for ZWP
`always_keepalive`	0000000000	Assume SO_KEEPALIVE on all TCP connections
`timer_fastmode_idlemax`	0x0000000a	Maximum idle generations in fast mode
`broken_peer_syn_rexmit_thres`	0x0000000a	# rexmitted SYNs to disable RFC1323 on local connections
`path_mtu_discovery`	0x00000001	Enable Path MTU Discovery
`pmtud_blackhole_detection`	0x00000001	Path MTU Discovery Black Hole Detection
`pmtud_blackhole_mss`	0x000004b0	Path MTU Discovery Black Hole Detection lowered MSS
`cc_debug`	0000000000	Enable debug data collection
`use_newreno`	0000000000	Use TCP NewReno algorithm by default
`cubic_tcp_friendliness`	0000000000	Enable TCP friendliness
`cubic_fast_convergence`	0000000000	Enable fast convergence
`cubic_use_minrtt`	0000000000	use a min of 5 sec rtt
`lro`	0000000000	Used to coalesce TCP packets
`lro_startcnt`	0x00000004	Segments for starting LRO computed as power of 2
`lrodbg`	0000000000	Used to debug SW LRO
`lro_sz`	0x00000008	Maximum coalescing size
`lro_time`	0x0000000a	Maximum coalescing time
`bg_target_qdelay`	0x00000064	Target queueing delay
`bg_allowed_increase`	0x00000008	Modifier for calculation of max allowed congestion window
`bg_tether_shift`	0x00000001	Tether shift for max allowed congestion window
`bg_ss_fltsz`	0x00000002	Initial congestion window for background transport
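
Since the defaults above are printed in hexadecimal, the timer values are easier to judge once converted. The sketch below does the arithmetic for a few of them. It assumes these particular MIBs (keepidle, keepintvl, msl) are expressed in milliseconds, which is an assumption about units rather than something the table states, and the helper names are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>

/* Convert a hexadecimal default, as printed in Table 16-19, to an integer. */
unsigned long hex_default(const char *text) {
    return strtoul(text, NULL, 16);   /* base 16 accepts the "0x" prefix */
}

/* Print a timer default in milliseconds and whole seconds.
 * ASSUMPTION: the MIB value is in milliseconds. */
void show_timer(const char *name, const char *hex) {
    unsigned long ms = hex_default(hex);
    printf("%-10s %s = %lu ms (%lu s)\n", name, hex, ms, ms / 1000);
}
```

Under that assumption, show_timer("keepidle", "0x006ddd00") reports 7200000 ms, or 7200 seconds: the familiar two hour TCP keepalive idle time. Likewise, keepintvl (0x000124f8) works out to 75 seconds and msl (0x00003a98) to 15 seconds.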

MPTCP configuration

**Table 16-20:** The `sysctl(8)` MIBs of the `net.inet.mptcp` namespace
`net.inet.mptcp` MIB	Default	Purpose
`enable`	1	Global on/off switch
`mptcp_cap_retr`	2	Number of MP Capable SYN retries
`dss_csum`	0	Enable DSS checksum
`fail`	1	Failover threshold
`keepalive`	840	Keepalive (sec)
`rtthist_thresh`	600	Rtt threshold
`userto`	1	Disable RTO for subflow selection
`probeto`	1000	Disable probing by setting to 0
`dbg_area`	31	MPTCP debug area
`dbg_level`	1	MPTCP debug level
`allow_aggregate`	0	Allow the Multipath aggregation mode
`alternate_port`	0	Darwin 18: Set alternate port for MPTCP connections
`rto`	3	MPTCP retransmission Timeout
`rto_thresh`	1500	RTO threshold
`.tw`	60	MPTCP Timewait period

UDP configuration

UDP is a simple and stateless protocol, and therefore offers very few configuration options.

**Table 16-21:** UDP configuration parameters
`net.inet.udp` MIB	Default	Purpose
`checksum`	1	Enable UDP checksumming
`maxdgram`	9216	Maximum outgoing UDP datagram size
`recvspace`	196724	Maximum incoming UDP datagram size
`log_in_vain`	0	Log all incoming packets
`blackhole`	0	Do not send port unreachables for refused connects
`randomize_ports`	1	Randomize port numbers

ICMP configuration

ICMP behavior is similarly governed by various sysctl MIBs, whose defaults are mostly set to mitigate known protocol weaknesses, such as spoofed ICMP redirection and broadcast echo requests ("ping storms").

**Table 16-22:** ICMP(v4) configuration parameters
`net.inet.icmp` MIB	Default	Purpose
`maskrepl`	0000000000	Reply to Address Mask requests
`icmplim`	0x000000fa	ICMP limit
`timestamp`	0000000000	Respond to ICMP timestamp requests
`drop_redirect`	0x00000001	Ignore ICMP redirect messages
`log_redirect`	0000000000	Log ICMP redirect messages (to `dmesg(1)`)
`bmcastecho`	0x00000001	Broadcast/Multicast ICMP echo requests

Networking Statistics

It's important for any system to keep detailed usage statistics, and to provide them to the administrator in the clearest way possible. Darwin systems contain quite a few statistics mechanisms of varying detail level and purpose. The chief tool is, of course, the aptly named netstat(1), which presents statistics about interfaces (-i/-I), routes (-r), multicast groups (-g), memory consumption (-m), general protocol statistics (-s), and - naturally - active sockets (with no other arguments or with -a). As useful as it is, though, netstat(1) is little more than a parser of the raw statistics, which it obtains from sysctl MIBs.

`sysctl` MIBs

Along with configuration settings, the network stack exports a plethora of statistics through sysctl MIBs. Per-family statistics are exported through net.family.socktype.stats, with the family being local, inet, link and systm, and the socktype being (respectively) stream/dgram, tcp/udp/igmp/icmp/ipset, generic/ether/bridge, and kevt/kctl. These end up in the much more readable form of the netstat -s output.

The live connection statistics are maintained in net.family.socktype.pcblist*, with the family and socktype being almost the same as with the stats: There are no link pcbs (as there are no connections at the link layer level), and the inet socktypes are only tcp/udp/raw/mptcp. The three pcblist* variants are pcblist, pcblist64 and pcblist_n, offering different concatenated structures for the statistics, all defined in various locations throughout the kernel headers. Listing 16-23 shows a breakdown of a TCP PCB structure:

Listing 16-23: Parsing a kernel provided TCP PCB

// bsd/netinet/in_pcb.h
struct  xinpgen { ... }    // xig_ fields

// bsd/sys/socketvar.h
struct xsocket_n { ... }   // xso_ and so_ fields

// bsd/netinet/in_pcb.h
#define   XSO_INPCB       0x010  // structure is an xinpcb_n 
struct  xinpcb_n { ... } // xi_ and inp_ fields 

#
# Raw data from sysctl -A -X: 
#
net.inet.tcp.pcblist_n: Format:S,xtcpcb_n Length:22544 Dump:

  xig_len  | xig_count | xig_gen         |  xig_sogen       |  xgn_len | xgen_kind
0x18000000   26000000  |a17d1200 00000000| da01ce00 00000000| 68000000 | 10000000  

 		         443     59305
       xi_inpp      |inp_fport|inp_lport| inp_ppcb (permuted)|    inp_gencnt        
  d7feb971 bfe44d8e |   01bb  |  e7a9   |  8f01ba71 bfe44d8e | 9b7d1200 00000000  

                 IN_IPV4   inp_ip_p(rotocol)                   104.244.42.2
inp_flags inp_flow   ↓ ttl ↓                                  inp46_foreign
40088000  00000000  01|40|00|00   00000000  00000000  00000000  68f42a02 

			192.168.0.108
	                  inp46_local
00000000 00000000 00000000 c0a8006c.... 

#
# netstat  output
#
tcp4       0      0  192.168.0.108.59305    104.244.42.2.443      ESTABLISHED

A good example of parsing the PCBs can be found in the open sources of netstat(1) (in the network_cmds project).
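
The annotations in Listing 16-23 can be reproduced with a few lines of C. The sketch below decodes only the raw port and IPv4 address bytes shown in the dump. It does not use the kernel structures themselves, and the function names are illustrative.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Ports are stored in network byte order: bytes 01 bb mean 0x01bb. */
uint16_t decode_port(const uint8_t raw[2]) {
    return (uint16_t)((raw[0] << 8) | raw[1]);
}

/* Render four raw address bytes as dotted-quad text.
 * Returns out on success, or NULL on failure. */
const char *decode_ipv4(const uint8_t raw[4], char *out, size_t outlen) {
    struct in_addr addr;
    memcpy(&addr, raw, sizeof(addr));   /* bytes are already in network order */
    return inet_ntop(AF_INET, &addr, out, (socklen_t)outlen);
}

/* Print one connection in a netstat(1)-like form. */
void print_tcp4(const uint8_t lport[2], const uint8_t laddr[4],
                const uint8_t fport[2], const uint8_t faddr[4]) {
    char l[INET_ADDRSTRLEN], f[INET_ADDRSTRLEN];
    if (decode_ipv4(laddr, l, sizeof(l)) && decode_ipv4(faddr, f, sizeof(f)))
        printf("tcp4  %s.%u  %s.%u\n", l, (unsigned)decode_port(lport),
               f, (unsigned)decode_port(fport));
}
```

Feeding it the bytes from the listing (inp_lport e7 a9, inp46_local c0 a8 00 6c, inp_fport 01 bb, inp46_foreign 68 f4 2a 02) yields the local endpoint 192.168.0.108.59305 and the foreign endpoint 104.244.42.2.443, matching the netstat line.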

`com.apple.network.statistics`

Using the pcblist* MIBs has major drawbacks. Not only is collecting the statistics a lengthy operation, but the statistics themselves are just a snapshot - and with the dynamic nature of network connections, likely to be stale within minutes, if not far less. Another problem is that it is very difficult to associate the connections to their respective owners.

The PF_SYSTEM/SYSPROTO_CONTROL socket of com.apple.network.statistics provides a far better mechanism - one which not only provides a constant stream of network statistics through the control socket, but also provides a way to match connections to the originating process.
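
Setting up such a control socket follows the usual PF_SYSTEM pattern: resolve the control name to its kernel-assigned ID, then connect(2). The following is a minimal sketch using the public <sys/kern_control.h> definitions. The function name open_kernel_control is illustrative, and the body is compiled only on Darwin (other systems get a stub which fails with ENOTSUP).

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#ifdef __APPLE__
#include <sys/ioctl.h>
#include <sys/sys_domain.h>
#include <sys/kern_control.h>
#endif

/* Connect to a named kernel control (hypothetical helper).
 * Returns a connected socket descriptor, or -1 with errno set. */
int open_kernel_control(const char *name) {
#ifdef __APPLE__
    int saved;
    int fd = socket(PF_SYSTEM, SOCK_DGRAM, SYSPROTO_CONTROL);
    if (fd < 0)
        return -1;

    /* Resolve the control name to the dynamic ID the kernel assigned it */
    struct ctl_info info;
    memset(&info, 0, sizeof(info));
    strlcpy(info.ctl_name, name, sizeof(info.ctl_name));
    if (ioctl(fd, CTLIOCGINFO, &info) < 0)
        goto fail;

    struct sockaddr_ctl addr;
    memset(&addr, 0, sizeof(addr));
    addr.sc_len     = sizeof(addr);
    addr.sc_family  = AF_SYSTEM;
    addr.ss_sysaddr = AF_SYS_CONTROL;
    addr.sc_id      = info.ctl_id;
    addr.sc_unit    = 0;               /* 0: let the kernel pick a unit */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    return fd;

fail:
    saved = errno;                     /* close(2) may overwrite errno */
    close(fd);
    errno = saved;
    return -1;
#else
    (void)name;
    errno = ENOTSUP;
    return -1;
#endif
}
```

A call such as open_kernel_control("com.apple.network.statistics") returns a descriptor which can then be used with send(2) and recv(2). The connect(2) may fail if the caller lacks the privileges the control requires.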

Once a control socket is set up, command messages (in the 1xxx range) may be sent to the kernel, which answers with reply messages (in the 1xxxx range). One command may generate quite a few replies, as is common when adding sources. Commands are defined (along with the rest of the interface) in bsd/net/ntstat.h. The header is marked as PRIVATE, so it does not make it into the user mode headers.

Listing 16-24: The NSTAT_MSG_TYPEs, from bsd/net/ntstat.h

#pragma mark -- Network Statistics User Client --
#define NET_STAT_CONTROL_NAME   "com.apple.network.statistics"
enum {
        // generic response messages
        NSTAT_MSG_TYPE_SUCCESS = 0
        ,NSTAT_MSG_TYPE_ERROR = 1

        ,NSTAT_MSG_TYPE_ADD_SRC = 1001
        ,NSTAT_MSG_TYPE_ADD_ALL_SRCS = 1002
        ,NSTAT_MSG_TYPE_REM_SRC = 1003
        ,NSTAT_MSG_TYPE_QUERY_SRC = 1004
        ,NSTAT_MSG_TYPE_GET_SRC_DESC = 1005
        ,NSTAT_MSG_TYPE_SET_FILTER = 1006          // 2422
        ,NSTAT_MSG_TYPE_GET_UPDATE = 1007          // 3248
        ,NSTAT_MSG_TYPE_SUBSCRIBE_SYSINFO = 1008   // 3248

        // Responses/Notifications
        ,NSTAT_MSG_TYPE_SRC_ADDED = 10001
        ,NSTAT_MSG_TYPE_SRC_REMOVED = 10002
        ,NSTAT_MSG_TYPE_SRC_DESC = 10003
        ,NSTAT_MSG_TYPE_SRC_COUNTS = 10004
        ,NSTAT_MSG_TYPE_SYSINFO_COUNTS = 10005
        ,NSTAT_MSG_TYPE_SRC_UPDATE = 10006       };
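
A client reading from the control socket has to tell kernel replies apart from its own command codes. As a small sketch, the helper below classifies a message type purely by the numeric ranges visible in Listing 16-24. The msg_kind type and function names are hypothetical, and the range bounds reflect only the values listed there, so they would need updating if other XNU versions add message types.

```c
/* Hypothetical helper type: who is expected to send a given message. */
typedef enum {
    MSG_GENERIC,    /* 0 and 1: generic success and error responses */
    MSG_COMMAND,    /* 1001..1008: sent by the user mode client     */
    MSG_REPLY,      /* 10001..10006: sent by the kernel             */
    MSG_UNKNOWN     /* anything not in Listing 16-24                */
} msg_kind;

/* Classify an NSTAT_MSG_TYPE_* value by its numeric range. */
msg_kind classify_nstat_msg(unsigned int type) {
    if (type <= 1)
        return MSG_GENERIC;
    if (type >= 1001 && type <= 1008)
        return MSG_COMMAND;
    if (type >= 10001 && type <= 10006)
        return MSG_REPLY;
    return MSG_UNKNOWN;
}

/* Human readable name for a msg_kind, e.g. for logging. */
const char *msg_kind_name(msg_kind kind) {
    switch (kind) {
    case MSG_GENERIC: return "generic response";
    case MSG_COMMAND: return "command";
    case MSG_REPLY:   return "kernel reply";
    default:          return "unknown";
    }
}
```

For example, 1001 (NSTAT_MSG_TYPE_ADD_SRC) classifies as a command, while 10004 (NSTAT_MSG_TYPE_SRC_COUNTS) classifies as a kernel reply.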

Each source addition creates an associated descriptor, which may be queried by using the ...GET_SRC_DESC message. Descriptors are nstat_[tcp/udp/route]_descriptor structures. Apple continuously modifies these data structures, breaking the direct API between XNU versions and making it really difficult to work directly through the control socket.

The nettop(1) utility (part of the closed source NetworkStatistics package) provides an example of com.apple.network.statistics capabilities. The utility is a "live" (but crude) netstat(1), though it doesn't use the low level sockets directly, opting instead for the higher level NStatManager* wrapper APIs of the private NetworkStatistics framework. This is not without benefits, since the API is a block driven, CF* object aware interface, which serves as an adapter layer and thus decouples clients from the low level socket structures.

An NStatManager is instantiated with a call to NStatManagerCreate, with a CFAllocator, a dispatch queue and a callback block. Sources can be added with any of the NStatManagerAddAll[TCP/UDP][/With[Filter/Options]], or (for route sources) NStatManagerAddAllRoutes[WithFilter]. Adding sources triggers the callback block, which gets the NStatSource as an argument. The source objects can be manipulated through blocks with NStatSourceSet[Counts/Events/Description/Removed]Block, which are called with their respective objects as arguments.

The description object is a particularly detailed CFDictionary, providing the properties (resolving enums to human readable form where necessary) from the nstat_[tcp/udp/route]_descriptor, combining them with nstat_counts in a convenient CFDictionary form, as shown in Table 16-25:

**Table 16-25:** The keys of the `NStat` descriptor object
Property	Descriptor field
epid	`epid`
processID	`pid`
uniqueProcessID	`eupid`
processName	`pname`
euuid	`euuid`
startAbsoluteTime	`start_timestamp`
durationAbsoluteTime	`timestamp - start_timestamp`
interface	`ifindex`
[local/remote]Address (CFDATA)	`[local/remote].[v4/v6]`
provider	N/A (descriptor type)
[rx/tx]Bytes	`nstat_[rx/tx]bytes`
[rx/tx][/Cellular/WiFi/Wired]Bytes	`nstat_[cell/wifi/wired]_[rx/tx]bytes`
trafficClass	`traffic_class`
uuid	`uuid`
receiveBuffer[Size/Used]	`rcvbuf[size/used]`
TCP sources
rx[Duplicate/OutOfOrder]Bytes	`nstat_rx[duplicate/outoforder]bytes`
congestionAlgorithm	`cc_algo`
rtt[Average/Minimum/Variation]	`nstat_[avg/min/var]_rtt`
connect[Attempts/Successes]	`nstat_connect[attempt/successes]`
TCPState	`state`
txRetransmittedBytes	`nstat_txretransmit`
txUnacked	`txunacked`
TCP[Congestion]Window	`tx[c]window`
trafficManagementFlags	`traffic_mgt_flags`

The lsock(j) companion tool matches and exceeds the functionality of nettop(1) - and is available in open source. Note, that because it uses the APIs directly, it might very well be outdated by the time you try it: The example had to be updated multiple times in the past to catch up with the changing structures, and there is no guarantee Darwin 18 won't break its function.

Another hurdle is an entitlement - com.apple.private.network.statistics - which may be required for using the com.apple.network.statistics control socket. "May", because at the moment this requirement can be toggled (by the root user) using the net.statistics_privcheck sysctl MIB. This value is already set to '1' on *OS variants, but still '0' (for the moment) on MacOS. In *OS this isn't much of an issue, since running arbitrary code implies a jailbreak, root access and arbitrary entitlements. Should the MacOS sysctl be set to '1' and possibly locked, however, administrators will need to disable SIP or only use Apple's "approved" (but painful) nettop(1).

The following experiment shows a quick and very dirty program to mimic nettop(1)'s usage of the NetworkStatistics.framework.

Experiment: Exploring the private NetworkStatistics.framework APIs

The main client for NetworkStatistics.framework is Darwin's nettop(1), which displays a live netstat(1) like output. Unfortunately, the tool is closed source, crude and hard to work over its curses interface, and unavailable for *OS variants. Fortunately, it's fairly straightforward to disassemble, and build a functional (albeit more limited) clone, shown in the following Listing.

Listing 16-26: A simple nettop(1) clone

#include <fcntl.h>      // open(2) and the O_* flags
#include <dispatch/dispatch.h>
#include <CoreFoundation/CoreFoundation.h>

// gcc-arm64 netbottom.c -o /tmp/netbottom 
//           -framework CoreFoundation -framework NetworkStatistics

// The missing NetworkStatistics.h...
typedef void    *NStatManagerRef;
typedef void    *NStatSourceRef;

extern CFStringRef kNStatSrcKeyProvider;

NStatManagerRef NStatManagerCreate (const struct __CFAllocator *,
                             dispatch_queue_t,
                             void (^)(void *, void *));

int NStatManagerSetInterfaceTraceFD(NStatManagerRef, int fd);
int NStatManagerSetFlags(NStatManagerRef, int Flags);
int NStatManagerAddAllTCPWithFilter(NStatManagerRef, int, int);
int NStatManagerAddAllUDPWithFilter(NStatManagerRef, int, int);
void *NStatSourceQueryDescription(NStatSourceRef);
CFStringRef NStatSourceCopyProperty (NStatSourceRef, CFStringRef);
void NStatSourceSetDescriptionBlock (NStatSourceRef,  void (^)(void *));

// Called with the description of a source (a CFDictionary)
void (^description_callback_block) (void *) = ^(void *Desc) {
   // Simple example - just dump the Description dictionary to stderr
  CFShow(Desc);
};

// Called for every source added. The block signature has to match the
// prototype given to NStatManagerCreate; the second argument is unused here
void (^callback_block) (void *, void *)  = ^(NStatSourceRef arg, void *unused){

  // Arg is NWS[TCP/UDP]Source. We can tell which by property:
  const CFStringRef prop  = NStatSourceCopyProperty (arg, kNStatSrcKeyProvider);

  NStatSourceSetDescriptionBlock (arg, description_callback_block);
  void *desc = NStatSourceQueryDescription(arg); // Continued in callback
};

int main (int argc, char **argv) {

   NStatManagerRef      nm = NStatManagerCreate (kCFAllocatorDefault,
                                  dispatch_get_main_queue(),
                                  callback_block);

   int rc = NStatManagerSetFlags(nm, 0);
  
   // A trace file will show the raw nstat messages
   // (O_CREAT requires the third, mode, argument)
   int fd = open ("/tmp/netbottom.trace", O_RDWR| O_CREAT | O_TRUNC, 0644);
   rc = NStatManagerSetInterfaceTraceFD(nm, fd);
   
   rc = NStatManagerAddAllTCPWithFilter (nm, 0 , 0);
   rc = NStatManagerAddAllUDPWithFilter (nm, 0 , 0);

   dispatch_main();
}

As barebones as this listing is, it will nonetheless compile cleanly for both MacOS and the *OS variants. Note, that in the *OS case Apple has removed the private framework ".tbd" files, which are required for linkage. Those are easy enough to recreate using jtool2's --tbd option. You can find the listing online on the book's companion website^[3].

/var/networkd/netusage.sqlite

All Darwin flavors offer aggregate statistics at the process level, summing up bandwidth usage for every process on the system by its binary name. The database used is /var/networkd/netusage.sqlite, though the role of networkd is actually filled by /usr/libexec/symptomsd.

As the database name implies, it is a SQLite3 file, which makes it very easy to inspect - assuming root or _networkd (uid 24) credentials. Binaries are given unique identifiers in the ZPROCESS table, which persist across multiple executions. The unique id (Z_PK) is then used to track the binary across other tables, the most useful of which is ZLIVEUSAGE, which keeps the aggregate statistics. Using sqlite3 on the database would show something similar to Output 16-27:

Output 16-27: The netusage.sqlite database

root@Chimera(~)# sqlite3 /var/networkd/netusage.sqlite
sqlite> .headers yes
sqlite> select * from ZPROCESS;
Z_PK|Z_ENT|Z_OPT|ZFIRSTTIMESTAMP |ZTIMESTAMP      |ZBUNDLENAME             |ZPROCNAME
1   |    9| 1399|502587590.334238|565020796.563935|com.apple.configd       |com.apple.configd
2   |    9|  218|502587591.149515|563217197.06509 |com.apple.captiveagent  |com.apple.captiveagent
3   |    9|    7|502587591.625211|560016270.022554|com.apple.SetupAssistant|com.apple.SetupAssistant
..
#
# Retrieve the unique identifier of a particular process name
#
sqlite> select Z_PK from ZPROCESS where ZPROCNAME='com.apple.Safari';
Z_PK
39
#
# Retrieve the statistics for said process by joining tables
#
sqlite> select ZLIVEUSAGE.*  from ZLIVEUSAGE JOIN ZPROCESS 
           ON ZPROCESS.Z_PK = ZLIVEUSAGE.Z_PK WHERE ZPROCESS.ZPROCNAME ='com.apple.Safari';

Z_PK|Z_ENT|Z_OPT|ZKIND|ZMETADATA|ZTAG|ZHASPROCESS|Z8_HASPROCESS|ZALLFLOWS|ZBILLCYCLEEND|ZJUMBOFLOWS|
39  |6    | 1722 |   0|        0|   0|         39|            9|1737372.0|             |      240.0|

ZTIMESTAMP      |ZWIFIIN       |ZWIFIOUT    |ZWIREDIN|ZWIREDOUT|ZWWANIN    |ZWWANOUT  |ZXIN|ZXOUT
502587891.997509|261417369768.0|6955273130.0|     0.0|      0.0|402017694.0|17913084.0| 0.0|  0.0
...
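
The timestamp columns in this output are not Unix times. The Z_PK/Z_ENT/Z_OPT column naming suggests a Core Data store, and the sketch below therefore assumes the values are seconds since 2001-01-01 00:00:00 UTC (the CFAbsoluteTime reference date). That is an assumption and not something the database itself states. The helper names are illustrative.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Seconds between 1970-01-01 and 2001-01-01 (both 00:00:00 UTC). */
#define REFERENCE_DATE_OFFSET 978307200L

/* ASSUMPTION: stamp counts seconds since 2001-01-01 00:00:00 UTC. */
time_t netusage_to_unix(double stamp) {
    return (time_t)stamp + REFERENCE_DATE_OFFSET;   /* fraction is dropped */
}

/* Format a netusage timestamp as UTC text. Returns 0 on success, -1 on error. */
int netusage_format_utc(double stamp, char *out, size_t outlen) {
    time_t t = netusage_to_unix(stamp);
    struct tm *tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S UTC", tm) ? 0 : -1;
}
```

Under that assumption, the ZFIRSTTIMESTAMP of com.apple.configd (502587590.334238) corresponds to Unix time 1480894790, which formats as 2016-12-04 23:39:50 UTC.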

Firewalling

Network connectivity extends the system's reach to the four corners of the Internet, but also vice versa. A firewall has thus become an integral part of any system's defense, and MacOS has not one, but several. This section covers the mechanisms which are accessible through user mode - the Application Layer Firewall, ipfw (briefly, as it is deprecated), and pf. Kernel-accessible mechanisms (socket, IP and interface filters) are left for Volume II.

Figure 16-28: The MacOS Firewall settings pane

MacOS: The Application Layer Firewall

The Application Layer Firewall, commonly referred to by the fuzzy nickname ALF, is a MacOS proprietary mechanism introduced as far back as MacOS 10.5.1 to provide Application-aware firewall capabilities. Apple provides a brief article^[4] to explain its usage, which is quite straightforward. A nifty feature not found elsewhere is integration with Darwin's built-in code signing mechanism, which enables the identification of trusted (built-in and/or downloaded signed) software to receive incoming connections. Another option is to activate "stealth mode", which blocks replies to ICMP messages, making the system unresponsive to scanning.

The Application Layer Firewall is comprised of a kernel extension (ALF.kext, which identifies by its CFBundleIdentifier of com.apple.nke.applicationfirewall), and several binaries, all in /usr/libexec/ApplicationFirewall. The extension is loaded by default across versions of MacOS, even if the Firewall is turned off. Of the user mode binaries, the main one is socketfilterfw(8), which manages the kext. This daemon loads its defaults from the com.apple.alf.plist in the directory, and claims the com.apple.alf Mach service. When it needs UI interaction, it calls on CFUserNotificationCreate (q.v. Chapter 5) to create a pop-up dialog with the resources from ApplicationFirewall.bundle (in CoreServices). The daemon may also start Firewall(8) for user authorizations through com.apple.alf.useragent, with a protocol consisting of a single MIG message (#9999)^*.

When the firewall settings are modified through System Preferences.app, the preference pane posts a CFNotificationCenter notification (q.v. Chapter 5), with a name of "com.apple.alf". The notification's object field designates it as "firewalloptions", "app[added/removed]", "[app/service]statechanged", etc. The userinfo field contains the firewall set request, as an XML property list. Listing 16-29 shows two messages (in SimPLISTic format):

Listing 16-29: Tokens generated when modifying ALF behavior from the GUI, or using socketfilterfw(8)

# Generated when the GUI turns off the firewall
# Payload is plist with 'globalstate' integer, values 0, 1 or 2
 object: firewalloptions
 userinfo: [DATA, 234] .....<key>globalstate</key>\n\t<integer>0</integer>...
 name: com.apple.alf
 token: 1000001
 method: post_token
 version: 1
# Enabling stealth mode
# Stealth mode: Payload is plist with 'allowdownloadsignedenabled', 
# 'allowsignedenabled', and 'stealthenabled' integer (boolean) keys
 object: stealthmodechanged
 userinfo: [DATA, 351] ..<key>allowdownloadsignedenabled</key>....
   ...<key>stealthenabled</key>\n\t<integer>1</integer>\n</dict>\n</plist>"
 name: com.apple.alf
 token: 1000001
 method: post_token
 version: 1

* - A lesser daemon, appfwloggerd, was previously used to listen on an event socket for messages from ipfw.

The /usr/libexec/ApplicationFirewall/socketfilterfw daemon accepts the notifications, and proceeds to act on the contents of the userinfo data, translating the XML property list into the socket filtering rules it needs to apply. The main rules are in the payloads of appadded and/or appstatechanged, under the alias key, which is (again) a base64 encoded plist (cfdata), whose contents are a binary blob with details about the application for which a rule is added:

Listing 16-30: The contents of an alias entry, twice base-64 decoded

00000000  00 00 00 00 01 2e 00 02  00 01 0c 4d 61 63 69 6e  |...........Macin|
00000010  74 6f 73 68 20 48 44 00  00 00 00 00 00 00 00 00  |tosh HD.........|
00000020  00 00 00 00 00 00 00 00  00 00 42 44 00 01 ff ff  |..........BD....|
00000030  ff ff 0d 41 70 70 20 53  74 6f 72 65 2e 61 70 70  |...App Store.app|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000070  00 00 ff ff ff ff 00 00  00 00 00 00 00 00 00 00  |................|
00000080  00 00 ff ff ff ff 00 00  0a 20 63 75 00 00 00 00  |......... cu....|
00000090  00 00 00 00 00 00 00 00  00 0c 41 70 70 6c 69 63  |..........Applic|
000000a0  61 74 69 6f 6e 73 00 02  00 1d 2f 3a 41 70 70 6c  |ations..../:Appl|
000000b0  69 63 61 74 69 6f 6e 73  3a 41 70 70 20 53 74 6f  |ications:App Sto|
000000c0  72 65 2e 61 70 70 2f 00  00 0e 00 1c 00 0d 00 41  |re.app/........A|
000000d0  00 70 00 70 00 20 00 53  00 74 00 6f 00 72 00 65  |.p.p. .S.t.o.r.e|
000000e0  00 2e 00 61 00 70 00 70  00 0f 00 1a 00 0c 00 4d  |...a.p.p.......M|
000000f0  00 61 00 63 00 69 00 6e  00 74 00 6f 00 73 00 68  |.a.c.i.n.t.o.s.h|
00000100  00 20 00 48 00 44 00 12  00 1a 41 70 70 6c 69 63  |. .H.D....Applic|
00000110  61 74 69 6f 6e 73 2f 41  70 70 20 53 74 6f 72 65  |ations/App Store|
00000120  2e 61 70 70 00 13 00 01  2f 00 ff ff 00 00        |.app..../.....  |

Rules are enforced by ALF.kext through applying kernel socket filters (sflt_* KPIs, as explained in Volume II). The socketfilterfw daemon communicates with the kernel extension over a com.apple.nke.sockwall PF_SYSTEM/SYSPROTO_CONTROL socket. The protocol is a simple TLV (type-length-value), with the types shown in Table 16-31.

**Table 16-31:** The `com.apple.nke.sockwall` protocol command types
#	Command	Purpose
0	result	Inserts a new rule for a process
1	proc_rules	Inserts a new rule for a process
3	ask	Kext requests a user prompt
5	dumpinfo	Useful to dump the kext process list into `dmesg`
6	verify	Kext requests process rule verification
7	setpath	Add a process path
8	updaterules	Called when rulebase changes
9	releasepcachedpath	Kext requests invalidation of a PID (by path) from cache
10	unloadkext	Unload the kernel extension, if possible
11	addapptolist	Add an application
12	changelogmode	Change kext logging mode
13	changetrustmode	Change trust mode
14	askmsgrelease	Dismiss pending ask
15	changelogopt	Change logging options
16	changeapptrustmode	Change app trust mode
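The type-length-value framing mentioned above can be sketched generically. The sketch below is not the ALF wire format: the field widths (a 32-bit type and a 32-bit length, both in host byte order) and the names `tlv_put`, `tlv_next` and `struct tlv_view` are assumptions made purely for illustration, since the actual layout used by the kext is not given here.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record layout (NOT taken from ALF.kext):
 *   uint32_t type | uint32_t length | length bytes of value  */
struct tlv_view {
    uint32_t       type;
    uint32_t       length;   /* number of bytes in value */
    const uint8_t *value;    /* points into the caller's buffer */
};

/* Append one record at offset off. Returns the new offset,
 * or 0 if the record does not fit in the buffer. */
size_t tlv_put(uint8_t *buf, size_t cap, size_t off,
               uint32_t type, const void *val, uint32_t len)
{
    if (off > cap || cap - off < 8 || cap - off - 8 < len) return 0;
    memcpy(buf + off, &type, 4);
    memcpy(buf + off + 4, &len, 4);
    if (len) memcpy(buf + off + 8, val, len);
    return off + 8 + len;
}

/* Read the record at *off and advance *off past it.
 * Returns 1 on success, 0 at end of buffer or on a truncated record. */
int tlv_next(const uint8_t *buf, size_t size, size_t *off,
             struct tlv_view *out)
{
    if (*off > size || size - *off < 8) return 0;
    memcpy(&out->type, buf + *off, 4);
    memcpy(&out->length, buf + *off + 4, 4);
    if (size - *off - 8 < out->length) return 0;  /* value would overrun */
    out->value = buf + *off + 8;
    *off += 8 + (size_t)out->length;
    return 1;
}
```

A receiver would call `tlv_next` in a loop and dispatch on `type`, using the command numbers of Table 16-31. The length check before touching the value is the important part of any such parser.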

Some of the message types are no longer implemented in Darwin 18. socketfilterfw also contains a few references to ipfw sysctls, which are likewise no longer implemented (as explained next). The daemon may be configured to log verbosely by changing its LaunchDaemon property list's Program string to a ProgramArguments array and adding -d and/or -l. The daemon normally relays PF_SYSTEM/SYSPROTO_EVENT messages it receives from the kernel extension for the APPLE:NETWORK:LOG provider, and another unnamed provider at 1000:5:11. The ALF kext also registers the net.alf MIB namespace, with a loglevel bitmask, permission check, defaultaction and (read-only) mqcount.

`ipfw` (Deprecated)

Darwin used BSD's ipfw mechanism for many years - until it was removed in Darwin 16. The code for implementing the mechanism - in bsd/netinet/ip_fw2[_compat].[ch] and bsd/netinet6/ip6_fw.[ch] - is very much intact, but it is contingent on IPFW2 and other #defines which are no longer enabled. The user mode controller, ipfw(8), has been removed. Some discussion of this facility can be found in the respective BSD manual pages, as well as the first edition of this work (at which time it was still deemed relevant in Darwin).

pf

The pf facility, another relic of BSD but still in wide use, provides an alternative network layer firewalling mechanism. The facility appears in user mode as two character devices - /dev/pf and /dev/pfm. The /dev/pf character device can be used to create and apply firewalling rulesets, using ioctl(2) codes. This functionality is not unlike Linux's netfilter (a.k.a. iptables). The /dev/pfm device, used only in Darwin, serves a similar function.

The pf facility makes use of a configuration file, /etc/pf.conf, which is well documented in the pf.conf(5) manual page. An additional file, /etc/pf.os, is used as an operating system fingerprint database. There is also an /etc/pf.anchors directory, which is used to load the com.apple anchors for AirDrop and ALF (from a load anchor statement in /etc/pf.conf).

System administrators wishing to configure pf often use the pfctl(8) command line. The tool is well documented in its man page, which is left for the interested reader to peruse. Comprehensive documentation for the set of ioctl(2) codes can be found in pf(4), but this manual page has somehow been removed from Darwin releases. The OpenBSD man page^[5] thus serves in its place, although there are some differences in the set of codes. Table 16-32 shows a summary of the ioctl(2) codes defined in Darwin, though some are not actively supported.

`PacketFilter.framework`

Using the ioctl(2) codes directly on /dev/pf is not only cumbersome, but requires root privileges. An alternative is provided by the private PacketFilter.framework, which offers a richer API of exported PF* functions, and PFUser/PFManager high level calls. First, a call to PFUserCreate starts a session. Then, PFUserBeginRules declares a rule set, in which rules can be manipulated using PFUser[Add/Insert/Delete]Rule. The set can be committed using a call to PFUserCommitRules. Similar APIs are PFManager[Get/Copy/Delete]Rules.

Rule transactions are submitted to the PFManager object, which uses PFXPC abstractions to communicate with pfd(8) through the com.apple.pfd service. The daemon (running as root) translates the XPC messages into the corresponding ioctl(2) codes, and returns any replies in XPC formatted dictionaries. The protocol can be reversed easily by using XPoCe on a running instance of pfd(8).

**Table 16-32:** The set of `/dev/pf` `ioctl(2)` codes
`DIOC` `ioctl(2)` code	Argument	Purpose
`DIOC[START/STOP]`	_IO ('D', 1/2)	Start/stop the packet filter facility
`DIOCADDRULE`	_IOWR('D', 4, struct pfioc_rule)	Add a `pfioc_rule` to (inactive) ruleset
`DIOCGETSTARTERS`	_IOWR('D', 5, struct pfioc_tokens)	Get starter tokens
`DIOCGETRULE[S]`	_IOWR('D', 6/7, struct pfioc_rule)	Obtain a ticket + num rules, or specific rule
`DIOCSTARTREF`	_IOR ('D', 8, u_int64_t)	Increment ref count, get token
`DIOCSTOPREF`	_IOWR('D', 9, struct pfioc_remove_token)	Decrement ref count with provided token
`DIOCCLRSTATES`	_IOWR('D', 18, struct pfioc_state_kill)	Clear packet filter state table
`DIOCGETSTATE`	_IOWR('D', 19, struct pfioc_state)	Retrieve specific state entry
`DIOCSETSTATUSIF`	_IOWR('D', 20, struct pfioc_if)	Toggle statistics on interface
`DIOCGETSTATUS`	_IOWR('D', 21, struct pf_status)	Get `pf_status` counters and data
`DIOCCLRSTATUS`	_IO ('D', 22)	Clear all `pf_status` counters
`DIOCNATLOOK`	_IOWR('D', 23, struct pfioc_natlook)	Look up a NAT state table entry
`DIOCSETDEBUG`	_IOWR('D', 24, u_int32_t)	Toggle debug
`DIOCGETSTATES`	_IOWR('D', 25, struct pfioc_states)	Retrieve all state entries
`DIOC[CHANGE/INSERT/DELETE]RULE`	_IOWR('D', 26/27/28, struct pfioc_rule)	Various rule manipulation actions
`DIOC[SET/GET]TIMEOUT`	_IOWR('D', 29/30, struct pfioc_tm)	Set/get state timeouts
`DIOCADDSTATE`	_IOWR('D', 37, struct pfioc_state)	Add a state entry
`DIOCCLRRULECTRS`	_IO ('D', 38)	Clear rule counters
`DIOC[GET/SET]LIMIT`	_IOWR('D', 39/40, struct pfioc_limit)	Get/set the hard limits on the memory pools
`DIOCKILLSTATES`	_IOWR('D', 41, struct pfioc_state_kill)	Remove matching entries from the state table
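`DIOC[START/STOP]ALTQ`	_IO ('D', 42/43)	Start/stop ALTQ (requires ALTQ support, which Darwin does not provide)
`DIOC[ADD/GET]ALTQ[/S]`	_IOWR('D', 45/47, struct pfioc_altq)	Add/get ALTQ disciplines (requires ALTQ support)
`DIOC[GET/CHANGE]ALTQ`	_IOWR('D', 48/49, struct pfioc_altq)	Get/change a specific ALTQ discipline (requires ALTQ support)
`DIOCGETQSTATS`	_IOWR('D', 50, struct pfioc_qstats)	Get queue statistics
`DIOC[BEGIN/GET]ADDRS`	_IOWR('D', 51/53, struct pfioc_pooladdr)	Clear the pool address buffer/get a ticket for pool address retrieval
`DIOC[ADD/GET/CHANGE]ADDR`	_IOWR('D', 52/54/55, struct pfioc_pooladdr)	Add/get/change a pool address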
`DIOCGETRULESETS`	_IOWR('D', 58, struct pfioc_ruleset)	Get number of rulesets (anchors)
`DIOCGETRULESET`	_IOWR('D', 59, struct pfioc_ruleset)	Get anchor by number
`DIOCR[CLR/ADD/DEL]TABLES`	_IOWR('D', 60/61/62, struct pfioc_table)	Clear/add/delete tables
`DIOCRGETTABLES`	_IOWR('D', 63, struct pfioc_table)	Get table list
`DIOCR[GET/CLR]TSTATS`	_IOWR('D', 64/65, struct pfioc_table)	Get/clear table statistics
`DIOCRTSTADDRS`	_IOWR('D', 73, struct pfioc_table)	Test if the given addresses match a table
`DIOCR[CLR/ADD/DEL]ADDRS`	_IOWR('D', 66-68, struct pfioc_table)	Clear/Add/Delete addresses in table
`DIOCR[SET/GET]ADDRS`	_IOWR('D', 69/70, struct pfioc_table)	Set/get addresses in table
`DIOCR[GET/CLR]ASTATS`	_IOWR('D', 71/72, struct pfioc_table)	Get/Clear address statistics
`DIOCRSETTFLAGS`	_IOWR('D', 74, struct pfioc_table)	Change const/persist flags of table
`DIOCRINADEFINE`	_IOWR('D', 77, struct pfioc_table)	Defines a table in the inactive set
`DIOCOSFPFLUSH`	_IO('D', 78)	Flush the passive OS fingerprint table.
`DIOCOSFP[ADD/GET]`	_IOWR('D', 79/80, struct pf_osfp_ioctl)	Add/retrieve passive OS fingerprint entry
`DIOCX[BEGIN/COMMIT/ROLLBACK]`	_IOWR('D', 81/82/83, struct pfioc_trans)	Clear/commit/undo inactive rulesets
`DIOCGETSRCNODES`	_IOWR('D', 84, struct pfioc_src_nodes)	Get source nodes
`DIOCCLRSRCNODES`	_IO('D', 85)	Clear list of source nodes
`DIOCSETHOSTID`	_IOWR('D', 86, u_int32_t)	Set host ID (for `pfsync(4)`)
`DIOCIGETIFACES`	_IOWR('D', 87, struct pfioc_iface)	Get list of interfaces
`DIOC[SET/CLR]IFFLAG`	_IOWR('D', 89/90, struct pfioc_iface)	Set/clear user flags
`DIOCKILLSRCNODES`	_IOWR('D', 91, struct pfioc_src_node_kill)	Explicitly remove source tracking nodes
`DIOCGIFSPEED`	_IOWR('D', 92, struct pf_ifspeed)	Get interface speed
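The Argument column of the table uses the BSD `_IO`/`_IOR`/`_IOWR` macros, which pack a direction, a group character ('D' for pf), a command number and the argument size into a single 32-bit code. The sketch below reproduces that packing with locally defined constants (the `BSD_*` names and both helper functions are illustrative, chosen so the example compiles on any system rather than relying on `<sys/ioccom.h>`); it may help when recognizing these codes in a trace or a disassembly.

```c
#include <stdint.h>

/* Values mirror BSD's <sys/ioccom.h>; renamed to avoid clashing with it */
#define BSD_IOCPARM_MASK 0x1fffu        /* argument length: 13 bits      */
#define BSD_IOC_VOID     0x20000000u    /* _IO:   no argument            */
#define BSD_IOC_OUT      0x40000000u    /* _IOR:  kernel copies out      */
#define BSD_IOC_IN       0x80000000u    /* _IOW:  kernel copies in       */
#define BSD_IOC_INOUT    (BSD_IOC_IN | BSD_IOC_OUT)   /* _IOWR           */

struct bsd_ioc_fields {
    uint32_t dir;     /* one of the BSD_IOC_* direction values */
    char     group;   /* 'D' for the pf codes                  */
    uint8_t  num;     /* command number within the group       */
    uint32_t len;     /* sizeof() the argument structure       */
};

/* Build an ioctl code the way _IOC(inout, group, num, len) does */
uint32_t bsd_ioc(uint32_t dir, char group, uint8_t num, uint32_t len)
{
    return dir | ((len & BSD_IOCPARM_MASK) << 16)
               | ((uint32_t)(uint8_t)group << 8) | num;
}

/* Split an ioctl code back into its four fields */
struct bsd_ioc_fields bsd_ioc_decode(uint32_t code)
{
    struct bsd_ioc_fields f;
    f.dir   = code & 0xe0000000u;
    f.group = (char)((code >> 8) & 0xff);
    f.num   = (uint8_t)(code & 0xff);
    f.len   = (code >> 16) & BSD_IOCPARM_MASK;
    return f;
}
```

Under this encoding, DIOCSTART (`_IO('D', 1)`) works out to 0x20004401, and DIOCSETHOSTID (`_IOWR('D', 86, u_int32_t)`) to 0xC0044456.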

Packet Capture

There are times when user mode needs packet capture capabilities on a given interface. The most common example of that is when using a sniffer (or "network analyzer") such as tcpdump(1) and its ilk. Being user mode tools, they must make use of some kernel facility to enable such features as promiscuous mode (in which the interface accepts all frames, not just broadcast/multicast and its own unicast), and getting packets normally destined for other applications. Apple's proprietary PF_NDRV is inadequate for general packet capture (as it can only intercept unregistered ethertype protocols), and BSD's PF firewalls packets, but doesn't actually relay the filtered ones, so another mechanism is required.

BPF

Darwin follows the BSD model in implementing the Berkeley Packet Filter, commonly referred to as BPF. BPF is the brainchild of McCanne and Jacobson (of PPP compression and traceroute(1) fame), who presented the mechanism in a USENIX 1993 paper^[6]. BPF was quite revolutionary, as it provided a full language, with which dynamic filter programs could be created in user space, and loaded directly into the kernel subsystem. It has since become a standard adopted by quite a few operating systems and the ubiquitous libpcap which powers tcpdump(1), Ethereal (now Wireshark) and many other tools (/usr/libexec/airportd is a recurring client). The BPF mechanism has also been ported to non-BSD based systems (notably, Linux and Android) and the filter programs presented in this section are portable to those operating systems. The language of BPF has even been extended well past packets - and supports Linux's SECCOMP-BPF model for system call filtering, which is an instrumental part of Android's security.

BPF appears in user mode as a number of character devices, /dev/bpf##, with numbers usually in the 0 to 5 range. Any of these devices (unless already in use) may be open(2)ed, attached to an underlying interface using a BIOCSETIF ioctl(2), configured with a few other ioctl(2) codes, and then loaded with a BPF "program" through a BIOCSETF ioctl(2). Once the filter program is installed, the device's file descriptor lends itself to read(2) operations, which will provide any packets matching the filter loaded onto it. Opening a device also marks the corresponding device node as in use, which means that the number of simultaneous captures is bounded by however many devices are configured. The general flow of a BPF client is shown in Listing 16-33. The full list of ioctl(2)s can be found in <net/bpf.h>, along with a staggering list of DLT_ constants for Data Link types, though the only ones of actual use are DLT_EN10MB (used for all modern Ethernet, not just 10MB), and DLT_USB_DARWIN, which is an Apple extension for the XHC* interfaces provided by IOUSBHostFamily.kext's AppleUSBHostPacketFilter.kext PlugIn.
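Since any given /dev/bpf## node may already be in use, clients commonly probe the nodes in order until one opens. A minimal sketch of that probing follows; the function name `open_first_free` and its parameters are illustrative, and the prefix is passed in (rather than hardcoded to "/dev/bpf") only to keep the helper easy to exercise.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Try prefix0, prefix1, ... prefix(max_units-1) in turn.
 * Returns an open descriptor, or -1 (with errno set by open(2))
 * if no device could be opened. */
int open_first_free(const char *prefix, int max_units)
{
    char path[64];

    for (int i = 0; i < max_units; i++) {
        int n = snprintf(path, sizeof(path), "%s%d", prefix, i);
        if (n < 0 || (size_t)n >= sizeof(path)) return -1;

        int fd = open(path, O_RDWR);
        if (fd >= 0) return fd;

        /* EBUSY: this node is taken, so move on to the next one.
         * Anything else (ENOENT, EACCES..) will not improve by retrying. */
        if (errno != EBUSY) break;
    }
    return -1;
}
```

A caller would use `open_first_free("/dev/bpf", 16)` in place of a fixed device name, where 16 is an arbitrary upper bound chosen for the example.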

The idea behind BPF is as simple as it is elegant: Consider an automaton with a single register, which may be directed to load a value from any offset in an input frame (i.e. including the layer II header), and perform a logical test on its value. The automaton would branch on the results of that test, and the process would continue until a decision could be made as to whether to accept or reject the packet in question. The accepted packets appear on the input device, and the rejected packets are merely rejected by the filter - that is, they do not get captured, but they are not firewalled (as they would be by the PF facility, which was described earlier).
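To make the automaton concrete, here is a toy user-space interpreter for a handful of BPF opcodes. It is a sketch for illustration only, not the kernel's implementation: it models just the instructions needed for a simple TCP port filter, and `struct vm_insn`, the `OP_*` names and `run_filter` are made up for this example (the numeric opcode values are the standard composed values, e.g. 0x28 for BPF_LD+BPF_H+BPF_ABS). Besides the accumulator (A), it also keeps the index register (X) used for indirect loads.

```c
#include <stdint.h>
#include <stddef.h>

/* Same field order as struct bpf_insn: code, jt, jf, k */
struct vm_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

enum {
    OP_LD_H_ABS  = 0x28,  /* A = 16 bits at pkt[k] (network order)   */
    OP_LD_B_ABS  = 0x30,  /* A = byte at pkt[k]                      */
    OP_LD_H_IND  = 0x48,  /* A = 16 bits at pkt[X + k]               */
    OP_LDX_B_MSH = 0xb1,  /* X = 4 * (pkt[k] & 0x0f)                 */
    OP_JEQ_K     = 0x15,  /* branch on A == k                        */
    OP_JSET_K    = 0x45,  /* branch on (A & k) != 0                  */
    OP_RET_K     = 0x06   /* verdict: 0 rejects, else bytes to keep  */
};

/* Run prog over one frame. Out-of-bounds loads and unmodeled
 * opcodes reject the frame, as does falling off the end. */
uint32_t run_filter(const struct vm_insn *prog, size_t n,
                    const uint8_t *pkt, size_t len)
{
    uint32_t A = 0, X = 0;

    for (size_t pc = 0; pc < n; ) {
        const struct vm_insn *i = &prog[pc++];  /* pc now points at next */
        size_t off;

        switch (i->code) {
        case OP_LD_H_ABS:
            if (len < 2 || i->k > len - 2) return 0;
            A = ((uint32_t)pkt[i->k] << 8) | pkt[i->k + 1];
            break;
        case OP_LD_B_ABS:
            if (i->k >= len) return 0;
            A = pkt[i->k];
            break;
        case OP_LD_H_IND:
            off = (size_t)X + i->k;
            if (len < 2 || off > len - 2) return 0;
            A = ((uint32_t)pkt[off] << 8) | pkt[off + 1];
            break;
        case OP_LDX_B_MSH:
            if (i->k >= len) return 0;
            X = (uint32_t)(pkt[i->k] & 0x0f) << 2;
            break;
        case OP_JEQ_K:  pc += (A == i->k) ? i->jt : i->jf; break;
        case OP_JSET_K: pc += (A & i->k)  ? i->jt : i->jf; break;
        case OP_RET_K:  return i->k;
        default:        return 0;   /* opcode not modeled in this sketch */
        }
    }
    return 0;
}
```

Note how jump offsets are added to a program counter that has already advanced, which is why an offset of 0 simply continues with the following instruction.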

BPF Programs

Listing 16-33 can be used for just about any generic sniffer/packet analyzer, but notice it's missing the actual BPF filter, which needs to be installed for the BPF mechanism to actually sift out frames. The BPF filter program needs to be specified as an array of BPF automaton struct bpf_insn instructions. The instruction structure consists (not in this order) of a 16-bit code, a uint32 constant k (used as an argument to the code), and two unsigned 8-bit offsets, jt and jf, which represent a jump offset to branch to in case the code is a logical BPF_J* test. Most BPF filters usually consist of a mix of BPF_LD statements (to read data from various offsets in an incoming frame) and BPF_JMP, to perform logical tests and branch accordingly. Note, however, that there are quite a few other opcodes - including destructive ones (e.g. BPF_ST[X], which will alter scratch memory, allowing the filter to maintain state).

Listing 16-33: The general framework of a BPF filter client

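#include <stdio.h>      /* fprintf */
#include <stdlib.h>     /* atoi, malloc, free */
#include <string.h>     /* memset, strlcpy */
#include <fcntl.h>      /* open */
#include <unistd.h>     /* read, close */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>     /* struct ifreq */
#include <net/bpf.h>    /* BIOC* codes, DLT_* constants */
#include <netinet/in.h> /* IPPROTO_TCP */

int installFilter(int fd, unsigned char Protocol, unsigned short Port);

int main(int argc, char *argv[])
{
    int fd = -1;
    const char *iface = NULL;
    int port, rc = 0;
    u_int enable = 1, dlt = 0, blen = 0;

    if (argc < 2 || argc > 3) { return 1; }
    iface = (argc < 3 ? "en0" : argv[2]);
    port  = atoi(argv[1]);

    fd = open("/dev/bpf1", O_RDWR);
    if (fd < 0) { return 2; /* device in use - could try another */ }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));

    /* Associate the bpf device with an interface */
    (void)strlcpy(ifr.ifr_name, iface, sizeof(ifr.ifr_name));

    if (ioctl(fd, BIOCSETIF, &ifr) < 0) return 3;

    /* Monitor outgoing packets from interface as well */
    if (ioctl(fd, BIOCSSEESENT, &enable) < 0) return 4;

    /* Return immediately when a packet received */
    if (ioctl(fd, BIOCIMMEDIATE, &enable) < 0) return 5;

    /* Ensure we are dumping the datalink we expect */
    if (ioctl(fd, BIOCGDLT, &dlt) < 0) return 6;
    if (dlt != DLT_EN10MB) return 7;

    /* Prepare program -- see next listing */
    installFilter(fd, IPPROTO_TCP, port);

    /* Get receive buffer length - read(2) must be called with this size */
    if (ioctl(fd, BIOCGBLEN, &blen) < 0) return 8;
    char *buf = malloc(blen);
    if (buf == NULL) return 9;

    /* A single read may return several frames, each preceded by a struct bpf_hdr */
    while ((rc = read(fd, buf, blen)) > 0) {

        fprintf(stderr, "Got %d bytes of captured data!\n", rc);
        /* Do something with frame.. e.g. overlay ip_hdr, tcp_hdr.. */
    }

    free(buf);
    close(fd);
    return 0;
}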

Rather than initializing the structure for every single instruction, two macros are commonly used. BPF_STMT takes the code and k values, for instructions which aren't logical tests. BPF_JUMP is used for tests, whose codes are of the BPF_JMP class, with whatever BPF_J* variant. This makes the BPF "assembly" (barely) manageable for human readers.

As an example, consider Listing 16-34, which presents a sample BPF filter program in installFilter. The listing demonstrates how to traverse an IPv4 packet: In the beginning of the program, the automaton's read stream is at the first byte of the frame - i.e. the Ethernet header. Since the EtherType is always at offset 12 (past 6 bytes of destination MAC Address and 6 more of source), the value (16-bits) is loaded as a halfword with BPF_LD + BPF_H. It is then compared to ETHERTYPE_IP (0x0800). If there is no match, the processing jumps 10 instructions forward, to the rejection (= return 0). If it is an IPv4 packet, processing continues (jumping 0 instructions forward, which means the next, since the program counter always points to the next instruction). As tests continue, the rejection offset grows closer and closer still - two instructions later, it is 8, two more make it 6, etc. If the flow gets as far as asserting that the frame is an IPv4 unfragmented, TCP packet with either the source or the destination port matching the one requested, the filter returns -1 (accept), and the frame makes it back to Listing 16-33's file descriptor, where it can be read and processed in user space.

Listing 16-34: A sample BPF program

#include <sys/types.h>
#include <sys/time.h>
#include <sys/ioctl.h>    /* ioctl(2) */
#include <net/bpf.h>      /* struct bpf_insn, struct bpf_program, BPF_* */
#include <net/ethernet.h> /* ETHERTYPE_IP */

int installFilter(int   fd, 
         unsigned char  Protocol, 
             unsigned short Port)
{
    struct bpf_program bpfProgram = {0};

    /* Dump IPv4 packets matching Protocol and (for TCP) Port only */

    /* @param: fd - Open /dev/bpfX handle.               */
    
    const int IPHeaderOffset = 6 + 6 + 2; /* 14 */
    
    /* Assuming Ethernet (DLT_EN10MB) frames, we have: 
     *  
     * Ethernet header = 14 = 6 (dest) + 6 (src) + 2 (ethertype)
     * Ethertype is 16 bits (BPF_H) at offset 12
     * IP header len is at offset 14 of frame (lower 4 bits). 
     * We use BPF_MSH to isolate field and multiply by 4
     * IP fragment data is 16 bits (BPF_H) at offset 6 of IP header, 20 from frame
     * IP protocol field is 8 bits (BPF_B) at offset 9 of IP header, 23 from frame 
     * TCP source port is right after IP header (HLEN*4 bytes from IP header)
     * TCP destination port is two bytes later
     *
     * Note Port offset assumes that this Protocol == IPPROTO_TCP!
     * If it isn't, adapting this to UDP port is left as an exercise to the reader,
     * as is extending this to support IPv6, as well..
     */

 struct bpf_insn insns[] = {

 /* Uncomment this line to accept all packets (skip all checks) */
 // BPF_STMT(BPF_RET + BPF_K, (u_int)-1),                   // Return -1 (packet accepted)

 BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, 6+6),             // Load ethertype 16-bits from 12 (6+6)
 BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETHERTYPE_IP, 0, 10), // Test Ethertype or jump(10) to reject
 BPF_STMT(BPF_LD  + BPF_B   + BPF_ABS, 23),              // Load protocol (= IP Header + 9 bytes) 
 BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K  , Protocol, 0, 8),  // Test Protocol or jump(8) to reject 
 BPF_STMT(BPF_LD  + BPF_H   + BPF_ABS, IPHeaderOffset+6),// Load fragment offset field 
 BPF_JUMP(BPF_JMP + BPF_JSET+ BPF_K  , 0x1fff, 6, 0),    // Reject (jump 6) if fragment offset != 0
 BPF_STMT(BPF_LDX + BPF_B   + BPF_MSH, IPHeaderOffset),  // Load IP Header Len (x4), into X (for BPF_IND)
 BPF_STMT(BPF_LD  + BPF_H   + BPF_IND, IPHeaderOffset),  // Skip hdrlen bytes, load TCP src
 BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K  , Port, 2, 0),      // Test src port, jump to "ok" if true

 /* If we're still here, we know it's an IPv4, unfragmented, TCP packet, but source port
  * doesn't match - maybe destination port does? 
  */

 BPF_STMT(BPF_LD  + BPF_H   + BPF_IND, IPHeaderOffset+2), // Skip two more bytes, to load TCP dest port
 BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K  , Port, 0, 1),       // If port matches, ok. Else reject
/* ok: */
 BPF_STMT(BPF_RET + BPF_K, (u_int)-1),                    // Return -1 (packet accepted)
/* reject: */
 BPF_STMT(BPF_RET + BPF_K, 0)                             // Return 0  (packet rejected)
    };

    /* Install the program on the device: returns 0 on success, -1 on error */
    bpfProgram.bf_len   = sizeof(insns) / sizeof(insns[0]);
    bpfProgram.bf_insns = insns;
    return ioctl(fd, BIOCSETF, &bpfProgram);
}
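Hand-computed jump offsets, like those in the program above, are easy to get wrong when an instruction is added or removed. The following sketch checks that every conditional or unconditional jump in a program lands inside it, and that the program ends in a return. It is an illustrative helper, not the kernel's validator: `struct chk_insn` and `jumps_in_bounds` are names invented here, and only the jump arithmetic (target = index of the next instruction + offset) is verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as struct bpf_insn: code, jt, jf, k */
struct chk_insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

#define CHK_CLASS(code) ((code) & 0x07)   /* instruction class bits   */
#define CHK_OP(code)    ((code) & 0xf0)   /* operation within a class */
#define CHK_JMP 0x05                      /* value of BPF_JMP         */
#define CHK_RET 0x06                      /* value of BPF_RET         */
#define CHK_JA  0x00                      /* value of BPF_JA          */

/* Returns 1 if the program ends with a return and no jump
 * can leave the instruction array, 0 otherwise. */
int jumps_in_bounds(const struct chk_insn *prog, size_t n)
{
    if (n == 0 || CHK_CLASS(prog[n - 1].code) != CHK_RET) return 0;

    for (size_t pc = 0; pc < n; pc++) {
        if (CHK_CLASS(prog[pc].code) != CHK_JMP) continue;

        size_t next = pc + 1;   /* offsets are relative to the next insn */

        if (CHK_OP(prog[pc].code) == CHK_JA) {
            /* unconditional jump: the offset lives in k */
            if (prog[pc].k >= n - next) return 0;
        } else if (next + prog[pc].jt >= n || next + prog[pc].jf >= n) {
            return 0;
        }
    }
    return 1;
}
```

Running such a check over the insns array before handing it to BIOCSETF catches off-by-one offsets early, with a clearer failure than the ioctl(2) error would give.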

BPF's sheer power may prove excessive at times! BPF has been the source of quite a few vulnerabilities, thanks to the automaton implementation living in kernel space. This allows potential integer overflows to be used outside the packet scope to read arbitrary kernel memory. Coupled with BPF_ST[X] instructions, which allow storing (= writing memory), this could be conducive to full kernel compromise. Additionally, the code around the filters (i.e. the ioctl(2) implementations) has been buggy in the past - as recently as Darwin 16 for BIOCSBLEN (CVE-2017-2482).

Pseudo-Interfaces

There are times when frames or packets need to be captured simultaneously from multiple interfaces. One way of doing so is to run multiple BPF filters at the same time (over several /dev/bpf# devices). Doing so, however, not only risks depleting the available BPF devices, but also makes it difficult to correctly sync the capture streams. Another is to use one of the pseudo interfaces supported by XNU, iptap or pktap.

The *tap interfaces are pseudo-interfaces, and normally do not appear when interfaces are listed with ifconfig. They are created on-demand when a packet capture program (notably, tcpdump(1)) is used with pktap or iptap as the name of the interface, followed by a comma-delimited list of actual interfaces. The difference between the two *tap interfaces is the encapsulation exposed - pktap provides the full packet, whereas iptap provides the network layer (IPv6 or IPv4) and upwards. Both interfaces can be used with BPF, and appear with a DLT_PKTAP (also DLT_USER2, with a value of 149).

The tap interfaces are created programmatically using an SIOCIFCREATE ioctl(2), and marked to be removed when the creating process exits (Apple's libpcap project's libcap/libpcap-darwin.c provides a clear example of doing so). Taps also allow their own in-kernel filtering rules (by interface name or type), which are independent of BPF. These can be set with the SIOCSDRVSPEC ioctl(2) code. The pktapctl utility from the network-cmds project (not provided in Darwin releases) shows an example of getting and setting filters.

Using DLT_PKTAP also provides a significant benefit in allowing more metadata to be included for every packet captured. XNU's bsd/net/pktap.h defines the header which is artificially prepended to every packet returned by the interface. As shown in Listing 16-35, this provides plentiful (and useful) information, including the origin interface and actual DLT_* of the packet, as well as the owning process pid and command name:

Listing 16-35: The DLT_PKTAP header, from XNU-4570's bsd/net/pktap.h

/*
 * Header for DLT_PKTAP
 *
 * In theory, there could be several types of blocks in a chain 
 *  before the actual packet
 */
struct pktap_header {
    uint32_t    pth_length;                    /* length of this header */
    uint32_t    pth_type_next;                 /* type of data following */
    uint32_t    pth_dlt;                       /* DLT of packet */
    char        pth_ifname[PKTAP_IFXNAMESIZE]; /* interface name */
    uint32_t    pth_flags;                     /* flags */
    uint32_t    pth_protocol_family;
    uint32_t    pth_frame_pre_length;
    uint32_t    pth_frame_post_length;
    pid_t       pth_pid;                       /* process ID */
    char        pth_comm[MAXCOMLEN+1];         /* process name */
    uint32_t    pth_svc;                       /* service class */
    uint16_t    pth_iftype;
    uint16_t    pth_ifunit;
    pid_t       pth_epid;                      /* effective process ID */
    char        pth_ecomm[MAXCOMLEN+1];        /* effective command name */
    uint32_t    pth_flowid;
    uint32_t    pth_ipproto;
    struct timeval32    pth_tstamp;
    uuid_t      pth_uuid;
    uuid_t      pth_euuid;  
};

Darwin's tcpdump implementation contains a non-standard -k switch, which will parse some of that metadata (specifically, pth_ifname, pth_[e]comm, pth_[e]pid and pth_svc) to show the details of the process (or processes, if both are on the same host) to whose session each packet belongs.
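A capture client which does not care for the metadata still has to step over the prepended header to reach the actual packet. Because the header begins with its own length, this can be done without knowing the full structure. The sketch below relies only on the first three fields of struct pktap_header as shown in Listing 16-35, and assumes they are stored in host byte order; `struct pktap_prefix` and `pktap_payload` are names introduced for this example.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* The leading fields of struct pktap_header, per Listing 16-35 */
struct pktap_prefix {
    uint32_t pth_length;      /* length of the whole pktap header */
    uint32_t pth_type_next;   /* type of data following           */
    uint32_t pth_dlt;         /* DLT of the encapsulated packet   */
};

/* Given one captured record of caplen bytes, return a pointer to the
 * encapsulated packet (or NULL if the record is malformed). The real
 * DLT and the remaining length are returned through the optional
 * dlt and payload_len pointers. */
const uint8_t *pktap_payload(const uint8_t *rec, size_t caplen,
                             uint32_t *dlt, size_t *payload_len)
{
    struct pktap_prefix p;

    if (caplen < sizeof(p)) return NULL;
    memcpy(&p, rec, sizeof(p));          /* avoid unaligned access */

    /* The header must cover at least the prefix, and fit in the record */
    if (p.pth_length < sizeof(p) || p.pth_length > caplen) return NULL;

    if (dlt) *dlt = p.pth_dlt;
    if (payload_len) *payload_len = caplen - p.pth_length;
    return rec + p.pth_length;
}
```

The returned DLT then tells the caller how to interpret the payload, e.g. as an Ethernet frame for DLT_EN10MB.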

Quality of Service

We've already discussed process and thread level Quality of Service, and with such formidable capabilities it's easy to forget the Quality of Service concept was originally "born" at the network layer. Due to Net Neutrality and other considerations, QoS isn't deployed on the global Internet, but it is nonetheless applicable on internal networks, up to the egress router and sometimes beyond.

QoS recognizes two modes - Integrated Services, and Differentiated Services. The former mode is handled by RSVP (the reservation protocol), and is not supported by XNU - nor does it need to be, since the implementation can reside in user mode. The latter mode (DiffServ) requires packet-level labeling, and is fully supported. The IPv4 "type of service" byte (the second byte of the header, right after the version/header-length "45") has been repurposed by RFC2474 and RFC3168 to provide a six-bit "Differentiated Services Code Point" (DSCP) and two bits of Explicit Congestion Notification (ECN). XNU supports later revisions of DiffServ, including RFC2597 (Assured Forwarding Per-Hop-Behavior) and RFC5865 (Capacity-Admitted Traffic).
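The split of the former "type of service" byte into a DSCP and an ECN field amounts to simple bit manipulation, sketched below. The helper names are illustrative; `set_socket_dscp` uses the standard IP_TOS socket option, and how the ECN bits of a value set this way are treated is left to the stack.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Upper six bits: Differentiated Services Code Point */
uint8_t tos_dscp(uint8_t tos) { return (uint8_t)(tos >> 2); }

/* Lower two bits: Explicit Congestion Notification */
uint8_t tos_ecn(uint8_t tos)  { return (uint8_t)(tos & 0x03); }

/* Combine a DSCP and an ECN value back into the header byte */
uint8_t tos_make(uint8_t dscp, uint8_t ecn)
{
    return (uint8_t)(((dscp & 0x3f) << 2) | (ecn & 0x03));
}

/* Label all IPv4 packets sent on fd with the given DSCP.
 * Returns 0 on success, -1 on error (as setsockopt(2) does). */
int set_socket_dscp(int fd, uint8_t dscp)
{
    int tos = tos_make(dscp, 0);
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

For example, the Expedited Forwarding code point (46) corresponds to a header byte of 0xB8, and AF11 (10) to 0x28.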

Darwin 17 adds a new (and, as usual, undocumented) system call - net_qos_guideline (#525). The system call takes a net_qos_param structure specifying a bandwidth requirement (upload or download), along with the structure's (fixed) length. It returns a hint to user mode specifying whether this requirement would be subject to the default QoS policy, or should be marked as a background (BK) service type, which will prefer delay based flow algorithms.

Network Link Conditioning

Xcode's "Additional Tools" disk image contains, among its many fabulous "Hardware Tools", the "Network Link Conditioner" Preference Pane. This plug-in to System Preferences.app provides a simple but effective GUI to Network Link Conditioning, which is the art of imposing artificial delay and packet loss based on whimsical parameters. This is commonly used to simulate low to miserable bandwidth conditions, and test their effects on applications.

The preference pane is merely a front-end: The actual work is performed by nlcd(8), which communicates with the GUI by means of MIG subsystem 40268. But it turns out that nlcd, too, doesn't want to get its hands dirty, and instead sends XPC messages to pfd(8) with the help of the private PacketFilter.framework. Although we've discussed pfd in the context of the PF facility earlier, this time the daemon interfaces with another kernel facility, called dummynet(4), which is responsible for the dirty work.

The dummynet mechanism, a facility to provide traffic shaping, bandwidth management and delay emulation, was devised by Luigi Rizzo in 1997, and extended in 2010^[7]. It was brought into BSD and its ipfw mechanism, and migrated to Darwin. Although ipfw is defunct in modern systems, dummynet is still fully operational. Its implementation is mostly contained in XNU's bsd/netinet/ip_dummynet.[ch], with several other modifications throughout the stack - mostly in the IPv4/6 input and output paths. All these are in #ifdef DUMMYNET blocks, meaning that XNU can be built without it, though that is seldom the case.

Dummynet works by defining flows, and funneling them into one or more "pipes", which emulate links with given bandwidth/delay/loss parameters. Pipes are managed with the help of "queues", which implement Worst-case Fair Weighted Fair Queueing (WF²Q+) and Random Early Detection (RED). The pipes are entirely virtual, and packets are passed through them before or after they flow through the physical interface, which is how the connection parameters can be enforced.
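How a pipe enforces bandwidth and delay can be illustrated with a toy model. This is not dummynet's implementation (there is no queueing discipline, queue limit or loss here), and `struct toy_pipe` and `toy_pipe_send` are invented for the example: each packet occupies the emulated link for its serialization time, packets behind it wait their turn, and the propagation delay is added on top.

```c
#include <stdint.h>

struct toy_pipe {
    uint64_t bandwidth_bps;   /* emulated link rate, bits/sec (0 = unlimited) */
    uint64_t delay_us;        /* emulated propagation delay, microseconds     */
    uint64_t busy_until_us;   /* when the link finishes its current packet    */
};

/* Feed a packet of len bytes into the pipe at time now_us.
 * Returns the time (in microseconds) at which it emerges. */
uint64_t toy_pipe_send(struct toy_pipe *p, uint64_t now_us, uint32_t len)
{
    /* The packet can start transmitting only once the link is free */
    uint64_t start = now_us > p->busy_until_us ? now_us : p->busy_until_us;

    /* Serialization time: bits / rate, rounded up to a whole microsecond */
    uint64_t tx_us = 0;
    if (p->bandwidth_bps != 0)
        tx_us = ((uint64_t)len * 8 * 1000000 + p->bandwidth_bps - 1)
                / p->bandwidth_bps;

    p->busy_until_us = start + tx_us;
    return p->busy_until_us + p->delay_us;
}
```

With a 1 Mbit/s pipe and 50 ms of delay, a 1250-byte packet (10,000 bits) sent at time 0 emerges after 60 ms, and a second one sent at the same instant after 70 ms, since it must wait for the first to clear the link.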

Pipes can be configured by creating a raw socket, and then issuing setsockopt(2) calls. Four options are defined: IP_DUMMYNET_CONFIGURE (60) creates or modifies a dummynet pipe. The pipe may be removed with IP_DUMMYNET_DEL (61). The list of pipes can be retrieved with IP_DUMMYNET_GET (64), and pipes can be flushed with IP_DUMMYNET_FLUSH (62). The dnctl(8) utility offers a far easier way to configure pipes, by providing an extensive command line with a well documented manual page, complete with examples. This manual page also documents the sysctl(8) MIBs.

Network Extension Control Policies (Darwin 14+)

A major addition to Darwin's network stack is Network Extension Control Policies (NECPs), added in Darwin 14. NECPs are described in bsd/net/necp.c as "..high-level policy sessions, which are ingested into low-level kernel policies that control and tag traffic at the application, socket, and IP layers". In other words, NECPs enable user mode programs to control the kernel's network routing and scheduling decisions. Naturally, these also allow QoS.

The original interface provided for NECP is through a PF_SYSTEM/SYSPROTO_CONTROL socket. Using com.apple.net.necp_control as the control name, a socket can be created, and then read from and written to through a specialized packet protocol:

Listing 16-36: The NECP control socket interface (as of Darwin 14)

struct necp_packet_header {
    u_int8_t            packet_type;
    u_int8_t            flags;
    u_int32_t           message_id;
};

/*
 * Control message commands
 */

#define NECP_PACKET_TYPE_POLICY_ADD                    1
#define NECP_PACKET_TYPE_POLICY_GET                    2
#define NECP_PACKET_TYPE_POLICY_DELETE                 3
#define NECP_PACKET_TYPE_POLICY_APPLY_ALL              4
#define NECP_PACKET_TYPE_POLICY_LIST_ALL               5
#define NECP_PACKET_TYPE_POLICY_DELETE_ALL             6
#define NECP_PACKET_TYPE_SET_SESSION_PRIORITY          7
// Lock session so that only the originator can perform actions.
#define NECP_PACKET_TYPE_LOCK_SESSION_TO_PROC          8
#define NECP_PACKET_TYPE_REGISTER_SERVICE              9
#define NECP_PACKET_TYPE_UNREGISTER_SERVICE            10
#define NECP_PACKET_TYPE_POLICY_DUMP_ALL               11

In addition to the root privileges needed to open the control socket, some actions are deemed privileged, and require the PRIV_NET_PRIVILEGED_NECP_[POLICIES/MATCH] privileges. These are tied to com.apple.private.necp.[policies/match] entitlements, and are presently granted only to a select few daemons, as you can view on the book's entitlement database.

The actual policies which may be defined are ridiculously rich and complex. A set of NECP_POLICY_CONDITION_* constants allows matching a policy to a particular DNS domain, local or remote address, specific IP protocol, PID, UID, entitlement-holder, interface, and more. Policies can also be ordered, so as to prioritize their application. Once applied, a policy result can be as simple as NECP_POLICY_RESULT_[PASS/DROP], but can also be any of several other NECP_POLICY_RESULT_* constants, to divert, filter or tunnel the flow, change a route rule, or trigger or use a particular netagent (discussed later).

NECP descriptors and clients

Starting with Darwin 16, just about every network-enabled process in the system uses NECPs, oftentimes without the developer even knowing what they are. This is because libnetwork.dylib calls necp_open() (#501) as part of its initialization (specifically, from nw_endpoint_handler_start). This creates a necp client, a file descriptor of type NPOLICY, which is readily visible in the output of lsof(1) or procexp ..fds. The descriptor does not offer the traditional operations (read(2)/write(2)/ioctl(2)), and only supports select(2), or use in a kqueue. The necp_client_action system call (#502) can be used to specify client actions, as shown in Listing 16-37:

Listing 16-37: The NECP client interface (from XNU-4570's bsd/net/necp.h)

// Following are all #define NECP_CLIENT_ACTION_... (omitted for brevity)
.._ADD                1 // Register a new client. Input: parameters in buffer; Output: client_id
.._REMOVE             2 // Unregister a client. Input: client_id, optional struct ifnet_stats_per_flow
.._COPY_PARAMETERS    3 // Copy client parameters. Input: client_id; Output: parameters in buffer
.._COPY_RESULT        4 // Copy client result. Input: client_id; Output: result in buffer
.._COPY_LIST          5 // Copy all client IDs. Output: struct necp_client_list in buffer
.._REQUEST_NEXUS_INSTANCE    6 // Request a nexus instance from a nexus provider, optional struct necp_stats_bufreq
.._AGENT              7 // Interact with agent. Input: client_id, agent parameters
.._COPY_AGENT         8 // Copy agent content. Input: agent UUID; Output: struct netagent
.._COPY_INTERFACE     9 // Copy interface details. Input: ifindex cast to UUID; Output: struct necp_interface_details
.._SET_STATISTICS     10 // Deprecated
.._COPY_ROUTE_STATISTICS   11 // Get route statistics. Input: client_id; Output: struct necp_stat_counts
.._AGENT_USE          12 // Return the use count and increment the use count. Input/Output: struct necp_agent_use_parameters
.._MAP_SYSCTLS        13 // Get the read-only sysctls memory location. Output: mach_vm_address_t
.._UPDATE_CACHE       14 // Update heuristics and cache
.._CLIENT_UPDATE 15 // Fetch an updated client for push-mode observer. Output: Client id, struct necp_client_observer_update in buffer
.._COPY_UPDATED_RESULT 16 // Copy client result only if changed. Input: client_id; Output: result in buffer

necp_open() is just one of several undocumented system calls, which Apple has added over time as the facility evolves. The system calls are also unexported to user mode, but Listing 16-38 reconstructs the missing header file:

Listing 16-38: The NECP related system calls (Darwin 14 through Darwin 17)

int necp_match_policy(uint8_t *parameters, size_t parameters_size, 
                    struct necp_aggregate_result *returned_result); // #460

// Darwin 16
int necp_open(int flags);  // 501

int necp_client_action(int necp_fd, 
		uint32_t action, 
		uuid_t client_id, 
		size_t client_id_len, 
		uint8_t *buffer, 
		size_t buffer_size); // 502

// Darwin 17
/**
  * requires PRIV_NET_PRIVILEGED_NECP_POLICIES 
  */
int necp_session_open(__unused int flags);  // 522

int necp_session_action(int necp_fd, 
		uint32_t action, 
		uint8_t *in_buffer, 
		size_t in_buffer_length, 
		uint8_t *out_buffer, 
		size_t out_buffer_length);  // 523
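Since the calls above have no public prototypes, one way to experiment with them is through syscall(2) and the raw numbers. The following is a minimal sketch under that assumption: necp_open_raw() and necp_client_action_raw() are hypothetical wrapper names, the numbers (#501, #502) and the NECP_CLIENT_ACTION_COPY_LIST value are taken from the listings above, and syscall(2) itself is deprecated on recent MacOS versions, so this may stop working. On non-Darwin systems the wrappers fail with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#ifdef __APPLE__
#include <unistd.h>
#endif

#define NECP_OPEN_SYSCALL            501   /* from Listing 16-38 */
#define NECP_CLIENT_ACTION_SYSCALL   502   /* from Listing 16-38 */
#define NECP_CLIENT_ACTION_COPY_LIST 5     /* from Listing 16-37 */

/* Hypothetical wrapper: returns an NPOLICY descriptor, or -1 (errno set). */
int necp_open_raw(int flags)
{
#ifdef __APPLE__
    return syscall(NECP_OPEN_SYSCALL, flags);
#else
    (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/* Hypothetical wrapper: client_id points to a 16-byte UUID (or is NULL for
 * actions which do not take one). Returns -1 on error, with errno set. */
int necp_client_action_raw(int necp_fd, uint32_t action,
                           uint8_t *client_id, size_t client_id_len,
                           uint8_t *buffer, size_t buffer_size)
{
    assert(client_id != NULL || client_id_len == 0);
#ifdef __APPLE__
    return syscall(NECP_CLIENT_ACTION_SYSCALL, necp_fd, action,
                   client_id, client_id_len, buffer, buffer_size);
#else
    (void)necp_fd; (void)action; (void)buffer; (void)buffer_size;
    errno = ENOSYS;
    return -1;
#endif
}
```

A typical experiment would call necp_open_raw(0), confirm the new NPOLICY descriptor appears in lsof(1), and then try NECP_CLIENT_ACTION_COPY_LIST with a buffer large enough for a struct necp_client_list.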

Darwin 17 extends the idea of NECP client descriptors, and adds the NECP session (also an NPOLICY file descriptor^*). These descriptors are created with necp_session_open (#522), and support just the close(2) operation (which deletes the associated session). NECP session descriptors are meant to be handled with the proprietary necp_session_action() system call (#523). Using NECP_SESSION_ACTION_* constants passed through the action parameter, which map to the NECP_PACKET_TYPE_POLICY* codes of the control socket, the various actions can be performed, subject to the privilege check.

The public NetworkExtension.framework is a user of NECP sessions, which it abstracts using the undocumented NEPolicySession Objective-C object.

* - It's worth mentioning that both NECP file descriptor types are marked in the kernel as the same type (DTYPE_NETPOLICY). The potential type confusion was exploited by CVE-2018-4425, before being fixed by Apple in MacOS 10.14.1.

Network Agents (Darwin 15+)

Darwin 15 introduces a novel networking concept: network agents. These are user-mode clients to which network flow or other event handling is relayed via triggers. Agents can then handle the triggers and act upon them, for example by making network policy decisions.

Network agents create a PF_SYSTEM/SYSPROTO_CONTROL socket with the com.apple.net.netagent control name. The control is created in a manner identical to Listing 16-7, setting sc_unit to 0 and (of course) changing the control name. Once the control socket is connect(2)ed, agents may send and receive messages formatted with a netagent_message_header, defined in bsd/net/network_agent.h along with one of several codes, as shown in Listing 16-39. Note that this header is not exported to user mode, as Apple keeps the API private:

Listing 16-39: The netagent_message_header and types (from XNU 4570's bsd/net/network_agent.h)

struct netagent_message_header {
        u_int8_t                message_type;
        u_int8_t                message_flags;
        u_int32_t               message_id;
        u_int32_t               message_error;
        u_int32_t               message_payload_length;
};

#define NETAGENT_MESSAGE_TYPE_REGISTER         1   // Pass netagent to set, no return value
#define NETAGENT_MESSAGE_TYPE_UNREGISTER       2   // No value, no return value
#define NETAGENT_MESSAGE_TYPE_UPDATE           3   // Pass netagent to update, no return value
#define NETAGENT_MESSAGE_TYPE_GET              4   // No value, return netagent
#define NETAGENT_MESSAGE_TYPE_TRIGGER          5   // Kernel init, no reply expected
#define NETAGENT_MESSAGE_TYPE_ASSERT           6   // Deprecated
#define NETAGENT_MESSAGE_TYPE_UNASSERT         7   // Deprecated
#define NETAGENT_MESSAGE_TYPE_TRIGGER_ASSERT   8   // Kernel init, no reply expected
#define NETAGENT_MESSAGE_TYPE_TRIGGER_UNASSERT 9   // Kernel init, no reply expected
// Added in XNU-3789 to support Nexus
#define NETAGENT_MESSAGE_TYPE_REQUEST_NEXUS    10  // Kernel init, struct netagent_client_message
#define NETAGENT_MESSAGE_TYPE_ASSIGN_NEXUS     11  // Pass struct netagent_assign_nexus_message
#define NETAGENT_MESSAGE_TYPE_CLOSE_NEXUS      12  // Kernel init, struct netagent_client_message
#define NETAGENT_MESSAGE_TYPE_CLIENT_TRIGGER   13  // Kernel init, struct netagent_client_message
#define NETAGENT_MESSAGE_TYPE_CLIENT_ASSERT    14  // Kernel init, struct netagent_client_message
#define NETAGENT_MESSAGE_TYPE_CLIENT_UNASSERT  15  // Kernel init, struct netagent_client_message
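An agent reading from its control socket has to validate each message before trusting the header's payload length. The following is a minimal sketch of such a check, not code from configd or XNU: netagent_parse_message() is a hypothetical helper, and it assumes the header arrives in the compiler's natural layout for the structure in Listing 16-39, immediately followed by message_payload_length bytes of payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copied from Listing 16-39; not available in user mode headers. */
struct netagent_message_header {
    uint8_t  message_type;
    uint8_t  message_flags;
    uint32_t message_id;
    uint32_t message_error;
    uint32_t message_payload_length;
};
#define NETAGENT_MESSAGE_TYPE_TRIGGER 5

/* Hypothetical helper: copy the header out of buf and locate the payload.
 * Returns 0 if buf holds a complete message, -1 if it is truncated.
 * *payload (if requested) is set to NULL when there is no payload. */
int netagent_parse_message(const void *buf, size_t len,
                           struct netagent_message_header *hdr,
                           const uint8_t **payload)
{
    assert(hdr != NULL);
    if (buf == NULL || len < sizeof(*hdr))
        return -1;                          /* not even a full header */

    memcpy(hdr, buf, sizeof(*hdr));         /* avoids alignment issues */
    if (hdr->message_payload_length > len - sizeof(*hdr))
        return -1;                          /* payload claims too much */

    if (payload != NULL)
        *payload = hdr->message_payload_length
                 ? (const uint8_t *)buf + sizeof(*hdr)
                 : NULL;
    return 0;
}
```

After a successful parse, the agent would switch on hdr.message_type, for example treating NETAGENT_MESSAGE_TYPE_TRIGGER as a kernel-initiated wake-up to which no reply is expected.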

XNU-3248 adds the netagent_trigger system call (#490), which enables selective wake up of a registered netagent by the caller. The system call takes the agent_uuid, which should match the one the target agent registered with, and the agent_uuidlen (which is fixed at sizeof(uuid_t), i.e. 16). If the target agent allows triggers (registered with NETAGENT_FLAG_USER_ACTIVATED) and is not already active, a NETAGENT_MESSAGE_TYPE_TRIGGER (#5) will be sent to it.

A process may create and register more than one agent (with different UUIDs), and agents may be assigned to different domains (e.g. "WirelessRadioManager", "NetworkExtension") or types (e.g. VPN, Persistent, DNSAgent..). Darwin's configd does so (with several DNSAgents), as do CommCenter, networkserviceproxy, and iOS's nesessionmanager. Other daemons are fine with one agent, e.g. identityservicesd, wifid and apsd. Using procexp all fds and filtering for Control Sockets (in a manner similar to Output 16-5) will show all the agents. The sysctl MIBs of net.netagent.[active/registered]_count track the number of agents, and net.netagent.debug may be adjusted to produce verbose logging.

An open source example of creating an agent and handling notifications may be found in configd's open sources - specifically, the files in Plugins/IPMonitor show the creation of both the DNSAgent and the ProxyAgent. The following experiment demonstrates displaying agent details using specialized ioctl(2) codes.

Experiment: Displaying netagents using specialized ioctl(2) codes

The netagent facility provides ioctl(2) codes which can be used to enumerate existing agents (i.e. processes with com.apple.net.netagent control sockets). The ioctl(2) codes are SIOCGIFAGENT[LIST/DATA]64, which operate similarly: On the first pass, their respective data size arguments must be 0, and the call fills them in with the required data size. The caller is expected to allocate a sufficiently large buffer, and then call again. The call pattern is shown in Listing 16-40:

Listing 16-40: Displaying netagents with the SIOCGIFAGENT[LIST/DATA]64 ioctl(2)s

   int s = socket (AF_INET, SOCK_STREAM,0);
   struct netagentlist_req64 nalr64;
   memset (&nalr64, 0, sizeof(nalr64)); // data_size = 0 for first pass

   int rc = ioctl (s, SIOCGIFAGENTLIST64, &nalr64);

   if (rc < 0) { /* could fail because of entitlements.. */
      perror ("SIOCGIFAGENTLIST64"); return (rc); }

   // nalr64.data_size will be set by previous call
   nalr64.data = malloc(nalr64.data_size);

   rc = ioctl (s, SIOCGIFAGENTLIST64, &nalr64);
   if (rc < 0){ perror ("ioctl"); return (rc);}

   int i = 0;  char  uuid[64];

   // The returned list is an array of 16-byte UUIDs
   for  (i =0 ; i < nalr64.data_size; i+= 16) {
     uuid_unparse(nalr64.data + i, uuid);

     // Get data for this UUID (pass 1: size only)
     struct netagent_req64 nadrq;
     memset (&nadrq, 0, sizeof(nadrq)); // netagent_data_size = 0
     memcpy(nadrq.netagent_uuid, nalr64.data+i, 16);
     rc = ioctl (s, SIOCGIFAGENTDATA64, &nadrq);
     if (rc  < 0 ) { perror("SIOCGIFAGENTDATA64"); continue; }

     // Get data for this UUID (pass 2: with a buffer of the reported size)
     nadrq.netagent_data = malloc (nadrq.netagent_data_size);
     rc = ioctl (s, SIOCGIFAGENTDATA64, &nadrq);
     if (rc  < 0 ) { perror("SIOCGIFAGENTDATA64");
                     free (nadrq.netagent_data); continue; }
     printf ("%s: %s (%s/%s) %s\n", uuid, 
      nadrq.netagent_domain, nadrq.netagent_type, nadrq.netagent_desc,
      netagentFlagsToText(nadrq.netagent_flags));
     // print agent-specific data, if nadrq.netagent_data_size > 0 ..
     free (nadrq.netagent_data);
   }
   free (nalr64.data);

Neither the codes nor the structures (nor the flags, in netagentFlagsToText, above) are provided in user space headers, but it is a simple matter to copy them (from bsd/sys/sockio.h and bsd/net/network_agent.h). Note that the ioctl(2) codes require NECP entitlements (for system privilege 10004, a.k.a. PRIV_NET_PRIVILEGED_NECP_POLICIES). This means they're easier to use on jailbroken *OS (where code signing is faked and any entitlement can be bestowed) than on MacOS (even with SIP disabled, since self-signed code is disallowed). Output 16-41 shows the output of a completed program on iOS (with UUIDs truncated since they're random anyway):

Output 16-41: Output from previous listing, on iOS

..EE9: ids501, (clientchannel/IDSNexusAgent ids501 : clientchannel) reg,active,networkprov,nexusprov
..120: Skywalk (FlowSwitch/MultiStack)) reg,active,nexusprov
..BD7: SystemConfig (DNSAgent/DNSAgent(m)-b.e.f.ip6.arpa) reg,active,user activated,
..064: SystemConfig (DNSAgent/DNSAgent(m)-a.e.f.ip6.arpa) reg,active,user activated,
..33F: SystemConfig (DNSAgent/DNSAgent(m)-9.e.f.ip6.arpa) reg,active,user activated,
..FAF: SystemConfig (DNSAgent/DNSAgent(m)-8.e.f.ip6.arpa) reg,active,user activated,
..32E: SystemConfig (DNSAgent/DNSAgent(m)-254.169.in-addr.arpa) reg,active,user activated,
..D11: SystemConfig (DNSAgent/DNSAgent(m)-local) reg,active,user activated,
..F53: NetworkExtension (PathController/PathController: (null)) reg,active,
..9F2: NetworkExtension (PathController/PathController: (null)) reg,active,
..071: NetworkExtension (PathController/PathController: (null)) reg,active,
..A7D: Cellular (Internet/CommCenter: Internet) reg,voluntary,
..2DB: WiFiManager (CallInProgress/WiFi) reg,kernel activated,user activated,voluntary,specific use

SkyWalk

The SkyWalk subsystem is an entirely undocumented networking subsystem in XNU. It provides the interconnection between other networking subsystems, such as Bluetooth and user-mode tunnels. Although built into XNU, its source remains closed, with only error and debug strings indicating it is implemented in bsd/skywalk, and a couple of in-kernel client side implementations (namely, UTun and IPSec), which were not wiped clean by the preprocessor because of a different #ifdef block. A third implementation exists (bridge) but its source code is redacted. Skywalk's memory subsystem is also largely self-managed: There are about three dozen skywalk related kernel zones, and the subsystem has its own arena based allocator (similar in concept to the Nanov2 allocator) with caching, which is used for in-kernel, non-blocking packet allocation and other uses.

Please note, that SkyWalk is intentionally redacted out of XNU's sources by Apple, and is still rarely used. Reversing the object structures and APIs paints an incomplete and quite possibly inaccurate picture of its possible use, whether internal to Apple or in some future release of Darwin. The author's understanding and explanation of SkyWalk may therefore differ from Apple's design - but even a partial view of this subsystem is better than none.

Nexuses & Channels

Skywalk makes use of two special object types. A nexus is an endpoint, identified by a UUID, through which data packets can flow, prior to actually getting to an underlying network interface. Nexuses may be created in kernel or user mode, and when used in the latter appear as file descriptors (of DTYPE_NEXUS).

Nexuses are created through the use of Nexus Providers. There are currently four known provider types:

User pipes: are pipes whose provider is in userspace, created directly through the nexus_create system call (#506) or libnetwork.dylib's nw_nexus_create, which internally calls os_nexus_controller_create and os_nexus_controller_register_provider. Examples are identityservicesd's IDSChannelClientNexus[OS] and bluetoothd's com.apple.bluetooth.scalablePipe.
Kernel pipes: are pipes whose provider is in the kernel, usually some kernel extension. An example of that is IOSkywalkBSDClient, calling kern_nexus_controller_create.
Network interfaces: provided by interfaces, such as com.apple.netif.utun*, or com.apple.netif.ipsec*.
Flow Switches: to direct network flows. Can be of subtype bridge (layer II) or multi-stack (layer III). Here, too, examples are com.apple.multistack.utun* or com.apple.multistack.ipsec*.

Registering a Nexus is a privileged operation. A set of sandbox-enforced entitlements - com.apple.private.skywalk.register-[flow-switch/net-if/user-pipe] - protects registration for each of the corresponding types. Nexuses have one or more channels to provide data flows. Each channel commonly has two rings, one for transmission (tx) and one for reception (rx), each with 128 slots.
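To make the channel and ring relationship concrete, here is a purely conceptual model in C. It is not SkyWalk's actual layout, which is undocumented: the structure names, the use of packet pointers as slot contents, and the head/tail bookkeeping are all assumptions for illustration. Only the pairing of one tx and one rx ring per channel, and the 128 slot count, come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 128   /* slot count cited in the text */

/* Hypothetical ring: free-running head/tail counters, reduced modulo
 * RING_SLOTS on access. Since 128 divides 2^32, counter wrap is harmless. */
struct ring {
    uint32_t head;               /* next slot to consume */
    uint32_t tail;               /* next slot to fill    */
    void    *slot[RING_SLOTS];
};

/* Hypothetical channel: one ring per direction. */
struct channel {
    struct ring tx;
    struct ring rx;
};

void ring_init(struct ring *r)
{
    assert(r != NULL);
    r->head = r->tail = 0;
}

uint32_t ring_used(const struct ring *r)
{
    return r->tail - r->head;    /* unsigned arithmetic handles wrap */
}

/* Returns 0 on success, -1 if all slots are occupied. */
int ring_push(struct ring *r, void *pkt)
{
    if (ring_used(r) == RING_SLOTS)
        return -1;
    r->slot[r->tail % RING_SLOTS] = pkt;
    r->tail++;
    return 0;
}

/* Returns the oldest queued packet, or NULL if the ring is empty. */
void *ring_pop(struct ring *r)
{
    if (ring_used(r) == 0)
        return NULL;
    void *pkt = r->slot[r->head % RING_SLOTS];
    r->head++;
    return pkt;
}
```

In this model a sender pushes packets onto ch.tx and the provider pops them, with the roles reversed on ch.rx. A full ring makes the push fail rather than overwrite unconsumed slots, which is the behavior one would expect from a fixed slot count.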

Nexuses can interoperate with network agents. The NETAGENT_MESSAGE_TYPE_[REQUEST/ASSIGN/CLOSE]_NEXUS messages (from Listing 16-39) allow the interoperation, by letting a network agent control nexus creation on demand. You can see both nexuses and network agents in action when using VPN applications: Setting up a VPN connection commonly creates both a net-if (usually, com.apple.netif.utun2) and a multistack flow-switch (com.apple.multistack.utun1) provider.

The ifconfig(8) utility (as of network-cmds 520+, provided in the *OS binpack) can display network agent and nexus details. Output 16-42 demonstrates the nexus enabled (user-mode tunneling) interfaces when a VPN connection is active:

Output 16-42: Using ifconfig(8) to view interface netagent and nexus details

utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 2000 rtref 0 index 7
	eflags=5002080<TXSTART,NOAUTOIPV6LL,ECN_ENABLE,CHANNEL_DRV>
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	inet6 fe80::f686:bb0:335f:2ffc%utun0 prefixlen 64 scopeid 0x7 
        netif: BB5FC293-96DF-4130-80B6-0D45560B199B
	multistack: 52B79C5E-C085-4DE4-8E68-F11609C2B6D1
	nd6 options=201<PERFORMNUD,DAD>
	agent domain:ids501 type:clientchannel flags:0xc3 desc:"IDSNexusAgent ids501 : clientchannel"
	state availability: 0 (true)
	scheduler: FQ_CODEL 
	qosmarking enabled: no mode: none
utun1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1400 rtref 8 index 8
	eflags=5002080<TXSTART,NOAUTOIPV6LL,ECN_ENABLE,CHANNEL_DRV>
	options=6403<RXCSUM,TXCSUM,CHANNEL_IO,PARTIAL_CSUM,ZEROINVERT_CSUM>
	inet 10.47.19.145 --> 10.47.19.145 netmask 0xffffff00 
	netif: B5207F98-A1D5-485B-A197-162EC1F1AFC9
	multistack: 57F3E47D-8A75-4B9A-889A-20D102587FD3
	agent domain:NetworkExtension type:VPN flags:0x3 desc:"VPN: Free VPN"
	agent domain:Persistent type:Persistent flags:0x3 desc:"Persistent interface guidance"
	state availability: 0 (true)
	scheduler: FQ_CODEL 
	effective interface: en0
	qosmarking enabled: no mode: none

System calls and APIs

As with the other skywalk components, the system calls used to handle nexuses and channels are purposely left out of XNU's public sources - including even syscalls.master (thus, not even a prototype), which is interesting since other #if blocks in it are still present. Fortunately, the names of the system calls can be gleaned from the user mode header <sys/syscall.h>:

Listing 16-43: The undocumented headers for nexus and channel calls

/**
  *  Nexus calls
  */
int __nexus_open(void);		// 503
int __nexus_register(int NexusFD,..., int Flags) ;     // 504
int __nexus_deregister(int NexusFD, ..., int Flags);    // 505
int __nexus_create(....,int Flags);    // 506
int __nexus_destroy (int nexusFD, void *, int Flags);	// 507
int __nexus_get_opt(int NexusFD, int Type, void *OptBuf, size_t *Size); // 508
int __nexus_set_opt(int NexusFD, int Type, void *OptBuf, size_t Size);  // 509

/*
 * Channel calls:
 */
int __channel_open  (int NexusFD, int Flags);   // 510
int __channel_get_info (int ChannelFD, void *info, int Flags);  // 511
int __channel_sync (....); // 512
int __channel_get_opt(int ChannelFD, int Type, void *OptBuf, size_t *Size); // 513
int __channel_set_opt(int ChannelFD, int Type, void *OptBuf, size_t Size);  // 514

Handling nexuses

So that callers need not use the system calls directly, libsystem_kernel.dylib provides higher level _os_nexus and _os_channel objects. This API provides metadata about the underlying file descriptors (for example, the guard value needed to guarded_close_np a channel, through _os_channel_destroy). An even higher level API can be found in libsystem_network, with its nw_nexus and nw_channel objects (with OS_ prefixed Objective-C objects).

Three objects - os_nexus, os_nexus_attr and os_nexus_controller - manage nexuses, and four more - os_channel, os_channel_slot, .._attr and .._packet - are used for channels. A nexus can be created directly, through a call to __nexus_open, but the preferred way is to use os_nexus_controller_create, which also ensures the descriptor is guarded. Once created, a Nexus can be registered directly with a system call (__nexus_register) by using its file descriptor, or through the higher level os_nexus_controller_register_provider. Other calls offered by os_nexus_* APIs are os_nexus_[dis]connect, os_nexus_if[attach/detach], and os_nexus_ns[un]bind, all of which wrap the __nexus_set_opt system call. Unsurprisingly, the os_nexus_* APIs are not documented either, but Listing 16-44 reconstructs the missing header file:

Listing 16-44: The header file for os_nexus_* APIs

typedef struct os_nexus_controller	*os_nexus_t;
typedef enum { tx_rings  = 0, rx_rings, tx_slots, rx_slots, slot_buf_size, 
  slot_meta_size, anonymous, mhints,  pipes, extensions = 9 } nexus_attr_t;
      
os_nexus_t os_nexus_controller_create(void *attrs);
int os_nexus_controller_get_fd(os_nexus_t);
int os_nexus_controller_register_provider
    (os_nexus_t, char *name, int type, void *, out uuid_t provUUID);

int os_nexus_controller_alloc_provider_instance
     (os_nexus_t, in uuid_t provUUID, out uuid_t provInstance);

int os_nexus_controller_free_provider_instance(os_nexus_t, in uuid_t provInstance);

int os_nexus_attr_set(os_nexus_t, nexus_attr_t, int);

skywalkctl(8)

A vital piece of the SkyWalk puzzle is the skywalkctl(8) utility, apparently a debugging tool left in /usr/sbin, which nonetheless comes with a (partial) manual page. The utility is actively maintained by Apple, as can be seen by the increasing number of subcommands it offers in Darwin 18. Of particular interest is the "tree" command, which provides a JSON output of all providers (by reading information from the SkyWalk sysctl(8) interface, described next).

sysctl MIBs

The SkyWalk subsystem outputs its statistics through several MIBs, of which kern.skywalk.nexus_provider_list and kern.skywalk.nexus_channel_list are the most interesting, as they provide detailed information about Nexus providers and channels (as nexus_provider_info_t and nexus_channel_entry_t structures). Accessing these MIBs requires the com.apple.private.skywalk.observe-all entitlement, enforced by a mac_priv_check_hook (from Sandbox.kext) for the undocumented 12010 privilege. Even the other, more basic Nexus statistics have an entitlement associated with them, com.apple.private.skywalk.observe-stats (the undocumented 12011). There are additional privileges (12000-12003), all nexus related but undocumented (and, for lack of source, nameless), all of which depend on the skywalk entitlements. The *OS binutils' sysctl(8) is properly entitled, as is the aforementioned skywalkctl(8), which can decipher the opaque MIBs into a human readable form.
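Reading one of these MIBs follows the usual two pass sysctlbyname(3) pattern: query the size, allocate, then fetch. The sketch below assumes nothing about the record layout (the nexus_provider_info_t structure is not public) and only retrieves the raw bytes. skywalk_fetch_mib() is a hypothetical helper name. Without the observe-all entitlement, or on a non-Darwin system, it returns NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: returns a malloc(3)ed copy of the named MIB and sets
 * *len to its size, or returns NULL (with *len == 0) on any failure.
 * The caller owns the buffer and must free(3) it. */
void *skywalk_fetch_mib(const char *name, size_t *len)
{
    assert(name != NULL && len != NULL);
    *len = 0;
#ifdef __APPLE__
    size_t needed = 0;

    /* Pass 1: a NULL buffer asks the kernel for the required size. */
    if (sysctlbyname(name, NULL, &needed, NULL, 0) < 0 || needed == 0)
        return NULL;

    void *buf = malloc(needed);
    if (buf == NULL)
        return NULL;

    /* Pass 2: fetch the data. The list may grow between the two calls,
     * in which case this fails and the caller can simply retry. */
    if (sysctlbyname(name, buf, &needed, NULL, 0) < 0) {
        free(buf);
        return NULL;
    }
    *len = needed;
    return buf;
#else
    errno = ENOSYS;          /* sysctlbyname(3) is not available here */
    return NULL;
#endif
}
```

A properly entitled caller would invoke skywalk_fetch_mib("kern.skywalk.nexus_provider_list", &len) and then walk the buffer using structure definitions recovered by reversing, which is essentially what skywalkctl(8) does.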

Obtaining information about a particular channel file descriptor can be achieved through proc_pidfdinfo with the undocumented PROC_PIDFDCHANNELINFO (10). This returns a channel_fdinfo containing the channel type, UUID, port and flags.

Review Questions

What is the difference in operation between PF_NDRV packet capture capabilities and those of BPF?

What is the advantage of using User Mode Tunneling (the utun## facility) for VPN?

What is a specific advantage of using MPTCP, in particular for bandwidth intensive applications like FaceTime?

By peeking into Apple's NetworkExtension.framework, how do NECP and Nexuses provide underlying support for the public exposed classes?

What is the benefit of using a DNS agent?

What is a good reason to protect skywalk's considerable amount of code with entitlements, even for non-sensitive operations such as statistics?

How do network agents and nexuses interoperate?

What do utun and ipsec both have in common, which merits them using a nexus?


This was the complete 16th chapter from *OS Internals, Volume I (in its v1.2 update). It's free, but please respect the copyright and the immense amount of research devoted to creating it. If any of this is useful, please cite using the original link. You might also want to consider getting the book, or checking out Tg's training.
