DYLD Detailed
Jonathan Levin, http://newosxbook.com/ - 8/12/13
1. About
While maintaining and adding functionality to JTool, I found myself deeply bogged down in implementing support for Mach-O's LINKEDIT contents, LC_SYMTAB, and other arcane and largely undocumented corners of DYLD. Add to that the fact that DYLD got only a relatively brief treatment in my book*, and not much more in my predecessor's. Scouring the Internet with Google turns up only one decent reference [1], though it's woefully incomplete and basically just rehashes material from the book. Needless to say, Apple makes no effort to provide documentation beyond its "Mach-O Programming Topics" [2] document, which is by now very dated. What better way, then, to right a wrong and shed some light on DYLD than an article?
Why should you care? (Target Audience)
I said so in the book, and I'll state it again: there is no knowledge that is not power, and in the case of linking, we're talking about a lot of power. Virtually every binary run in OS X or iOS is dynamically linked, and being able to intervene in the linking process bestows significant capabilities, the most important being function interception, auditing and hooking. Reverse engineers, security-oriented developers (e.g. anti-malware) and hackers will hopefully find this information very useful. It should be noted that dyld allows for hooking and interception via environment variables, most notably DYLD_INSERT_LIBRARIES (akin to ld.so's LD_PRELOAD) and DYLD_LIBRARY_PATH (like ld.so's LD_LIBRARY_PATH), as well as through its function interposing mechanism. These are covered in the book (somewhere in Chapter 4, with a demo on this website [3]), and are therefore not discussed in this document.
Prerequisite: About Linking
Nearly all binaries, in UN*X and Windows systems alike, are dynamically linked. The benefits of dynamic linking are many, and include:
- Code reuse: commonly used code can be extracted to a library, which is then shared by many clients
- Easy updating: code residing in a library can easily be updated, and the library replaced, so long as the symbols are by and large the same. A classic example of this can be seen in Windows' "CreateWindow", which creates totally different-looking windows for the same application throughout Windows versions (think Win95 vs. XP vs. 7-8). The developer merely says "CreateWindow", not knowing how the window gets created. The OS does the rest, and different versions of the OS may do so differently.
- Reducing disk usage: commonly used code now has only one copy, as opposed to being included in every single binary which uses it.
- Reducing RAM usage: by far the most important advantage. A single copy of the library may be mmap(2)-ed into all processes, so the library's RAM usage is incurred only once. The library code is usually marked r-x (read-only, executable), so the same physical copy is implicitly shared by many consumers. This is crucial and saves immense amounts of memory, especially in RAM-challenged systems like Android.
Nomenclature
Throughout this article, the following terms are used:
- dylib: A dynamic library, akin to a UN*X shared object. A Mach-O object of type MH_DYLIB (0x6), loaded into other executables by the LC_LOAD_DYLIB (0xc) Mach-O command or the dlopen(3) API. For the record, it's worth noting that OS X also supports the concept of a fixed library (a Mach-O object of type MH_FVMLIB (0x3), loaded into other executables by the LC_LOADFVMLIB (0x6) command). Fixed libraries, however, are virtually extinct.
- symbol: A variable or function in a Mach-O file which may or may not be visible outside that file.
- binding: Connecting a symbol reference to its address in memory. Binding may be load-time, lazy (deferred) or weak (missing/overridable). These can be controlled at build time: ld's -bind_at_load specifies load-time binding, and __attribute__((weak_import)) marks weak symbols. There is also an option to prebind libraries to fixed addresses (the -prebind switch of ld).
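To make the dylib definition concrete, here is a minimal C sketch (not taken from dyld or jtool) that checks whether an in-memory 64-bit Mach-O image is of type MH_DYLIB, and counts its LC_LOAD_DYLIB commands. It assumes a thin (non-fat), little-endian image already read into memory. The structures are re-declared to mirror <mach-o/loader.h> so the sketch also compiles on Linux, and the function names is_dylib and count_load_dylibs are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Re-declared to mirror <mach-o/loader.h>, so this also builds on Linux.
   On OS X, include <mach-o/loader.h> instead. */
#define MH_MAGIC_64   0xfeedfacfu
#define MH_DYLIB      0x6
#define LC_LOAD_DYLIB 0xc

struct mach_header_64 {
    uint32_t magic, cputype, cpusubtype, filetype;
    uint32_t ncmds, sizeofcmds, flags, reserved;
};

struct load_command {
    uint32_t cmd;      /* LC_* value */
    uint32_t cmdsize;  /* total size of this command, in bytes */
};

/* Returns 1 if buf holds a 64-bit Mach-O of type MH_DYLIB. */
int is_dylib(const uint8_t *buf, size_t len) {
    struct mach_header_64 mh;
    if (len < sizeof mh) return 0;
    memcpy(&mh, buf, sizeof mh);   /* memcpy avoids alignment assumptions */
    return mh.magic == MH_MAGIC_64 && mh.filetype == MH_DYLIB;
}

/* Counts LC_LOAD_DYLIB commands. Load commands immediately follow the
   header, and each one's cmdsize says how far to skip to the next.
   Returns -1 on a truncated or malformed command list. */
int count_load_dylibs(const uint8_t *buf, size_t len) {
    struct mach_header_64 mh;
    if (len < sizeof mh) return -1;
    memcpy(&mh, buf, sizeof mh);
    size_t off = sizeof mh;
    int count = 0;
    for (uint32_t i = 0; i < mh.ncmds; i++) {
        struct load_command lc;
        if (len - off < sizeof lc) return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > len - off) return -1;
        if (lc.cmd == LC_LOAD_DYLIB) count++;
        off += lc.cmdsize;
    }
    return count;
}
```

Real LC_LOAD_DYLIB commands are larger (they carry a struct dylib and the library path), but the walk only relies on cmd and cmdsize, which every load command starts with.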
Tools:
Apple provides otool(1), dyldinfo(1) and pagestuff(1), if you have Xcode. If you don't, or if you want to analyze Mach-O binaries on Linux, you are welcome to use JTool instead (http://www.newosxbook.com/files/jtool.tar). This is an all-in-one replacement for the above tools, with far more capable features, including an experimental disassembler. The tar file contains an OS X and iOS version bundled into one universal binary, as well as an ELF version (for Linux 64-bit). It's free to download and use, and will remain so.
In the outputs shown, I've color coded: white is what you should type, yellow is for my own annotations, and everything else is the verbatim output of the commands.
Calling external functions
If you disassemble any Mach-O dynamically linked binary, you will no doubt see, sooner or later, a call to an external function, supplied by some library (commonly, libSystem.B.dylib). These calls are implemented as calls to the Mach-O's symbol stub section. Consider the following example, from OS X's /bin/ls:
Following on the experiment from page 116**, if you have gdb or lldb (as of Xcode 5), you can use either to examine the contents of this "stub" section:
The book goes on (until page 121) to explain how DYLD manages the stubs and populates them with the actual addresses of the functions, using dyld_stub_binder. It does not, however, explain HOW that's done. This is what we'll discuss here. But before we do, a bit about LINKEDIT:
DYLD_INFO and LINKEDIT
Starting with OS X 10.5 or 10.6, Apple decided to implement a special segment in Mach-O files for DYLD's usage. This segment, traditionally called __LINKEDIT, consists of information used by DYLD in the process of linking and binding symbols. This segment is (for the most part) meaningful only to DYLD; the kernel is completely oblivious to its presence. DYLD relies on a special load command, LC_DYLD_INFO (or LC_DYLD_INFO_ONLY), to serve as a "table of contents" for the segment. This can be seen with otool(1) or jtool:
Jtool contains a useful option, --pages, which presents a mapping of the Mach-O regions (segments, sections, and load command data), somewhat similar to (but more detailed than) pagestuff(1). This can be used, among other things, to dump the contents of __LINKEDIT:
As can be seen from the above output, the general layout of __LINKEDIT is as follows:
Located via | Region | Contents |
LC_DYLD_INFO | Rebase Info | Image rebase info - contains rebasing opcodes |
LC_DYLD_INFO | Bind Info | Image symbol binding info for required import symbols |
LC_DYLD_INFO | Lazy Bind Info | Image symbol binding info for lazy import symbols. This will be 0 for binaries linked with ld's -bind_at_load |
LC_DYLD_INFO | Weak Bind Info | Image symbol binding info for weak import symbols |
LC_DYLD_INFO | Export Info | Image symbol binding info for symbols exported by this image |
LC_SEGMENT_SPLIT_INFO | Segment Split, if any | Segment split information |
LC_FUNCTION_STARTS | Function start information | Function start point information (ULEB128) |
LC_DATA_IN_CODE | Data regions in code | Data region information (an array of data_in_code_entry structures) |
LC_CODE_SIGN_DRS | Code Signing DRs | Code signing DRs of dependent dylibs |
LC_SYMTAB | Symbol Table | Table of symbols, in nlist format |
LC_DYSYMTAB | Indirect Symbol Table | Table of indirect symbols |
LC_SYMTAB | String Table | Array of symbol names |
LC_CODE_SIGNATURE | Code Signature | Code Signing blob (discussed in a future article) |
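Because LC_DYLD_INFO is nothing more than five (offset, size) pairs into __LINKEDIT, locating its regions takes little more than reading the command. The following is a sketch under stated assumptions: struct dyld_info_command mirrors the definition in <mach-o/loader.h> (re-declared so it compiles anywhere), the offsets are file offsets, and dyld_info_region is a hypothetical helper that names the region containing a given offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors <mach-o/loader.h>; re-declared so the sketch builds off OS X. */
#define LC_REQ_DYLD       0x80000000u
#define LC_DYLD_INFO      0x22u
#define LC_DYLD_INFO_ONLY (0x22u | LC_REQ_DYLD)

struct dyld_info_command {
    uint32_t cmd;                           /* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
    uint32_t cmdsize;                       /* sizeof(struct dyld_info_command) */
    uint32_t rebase_off, rebase_size;       /* rebase opcodes */
    uint32_t bind_off, bind_size;           /* non-lazy bind opcodes */
    uint32_t weak_bind_off, weak_bind_size; /* weak bind opcodes */
    uint32_t lazy_bind_off, lazy_bind_size; /* lazy bind opcodes */
    uint32_t export_off, export_size;       /* exported symbols (a trie) */
};

/* Hypothetical helper: names the LC_DYLD_INFO region containing file
   offset off, or returns NULL if none does. Empty regions never match. */
const char *dyld_info_region(const struct dyld_info_command *di, uint32_t off) {
    const struct { const char *name; uint32_t start, size; } regions[] = {
        { "Rebase Info",    di->rebase_off,    di->rebase_size },
        { "Bind Info",      di->bind_off,      di->bind_size },
        { "Weak Bind Info", di->weak_bind_off, di->weak_bind_size },
        { "Lazy Bind Info", di->lazy_bind_off, di->lazy_bind_size },
        { "Export Info",    di->export_off,    di->export_size },
    };
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++)
        if (off >= regions[i].start && off - regions[i].start < regions[i].size)
            return regions[i].name;
    return NULL;
}
```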
DYLD makes extensive use of the ULEB128 encoding, which is (in the author's humble opinion) a crude and stingy encoding method. Low-level implementors would be wise to familiarize themselves with the encoding, which is also used in DWARF and other binary-related formats.
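For those who haven't met it yet: ULEB128 stores an unsigned integer in groups of 7 bits, least significant group first, with the high bit (0x80) of each byte set when more bytes follow. SLEB128 is the same, except that the result is sign-extended from bit 6 of the last byte. A minimal C sketch of both decoders follows; the names read_uleb128 and read_sleb128 are hypothetical, not dyld's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one ULEB128 value at *p (never reading at or past end) and
   advances *p past it. Each byte contributes its low 7 bits. */
uint64_t read_uleb128(const uint8_t **p, const uint8_t *end) {
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80))   /* high bit clear: this was the last byte */
            break;
    }
    return result;
}

/* Same as above, but sign-extends from bit 6 of the final byte. */
int64_t read_sleb128(const uint8_t **p, const uint8_t *end) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    while (*p < end) {
        byte = *(*p)++;
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80))
            break;
    }
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;   /* fill the upper bits with 1s */
    return (int64_t)result;
}
```

For example, the three bytes E5 8E 26 decode as 0x65 + (0x0E << 7) + (0x26 << 14) = 624485, and the SLEB128 bytes 80 7F decode as -128.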
DYLD OpCodes
DYLD uses a special encoding, consisting of various "opcodes", to store and load symbol binding information. These opcodes are used to populate the rebase information and binding tables pointed to by the LC_DYLD_INFO command. There are two types of opcodes: rebasing opcodes and binding opcodes.
Binding opcodes
Binding opcodes (used for both lazy and non-lazy symbols) are defined in <mach-o/loader.h>, each name prefixed with BIND_OPCODE_:
Opcode | Value | Meaning |
DONE | 0x00 | End of opcode list |
SET_DYLIB_ORDINAL_IMM | 0x10 | Set dylib ordinal to immediate (lower 4 bits). Used for ordinal numbers from 0-15 |
SET_DYLIB_ORDINAL_ULEB | 0x20 | Set dylib ordinal to the following ULEB128 value. Used for ordinal numbers from 16 up |
SET_DYLIB_SPECIAL_IMM | 0x30 | Set dylib ordinal, with 0 or a negative number as immediate. The value is sign extended. Currently known values are: BIND_SPECIAL_DYLIB_SELF (0), BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE (-1) and BIND_SPECIAL_DYLIB_FLAT_LOOKUP (-2) |
SET_SYMBOL_TRAILING_FLAGS_IMM | 0x40 | Set the symbol name to the following NULL-terminated string (char[]). The flags (in the immediate value) can be BIND_SYMBOL_FLAGS_WEAK_IMPORT (0x1) and/or BIND_SYMBOL_FLAGS_NON_WEAK_DEFINITION (0x8) |
SET_TYPE_IMM | 0x50 | Set the type to immediate (lower 4 bits). Known types are: BIND_TYPE_POINTER (1), BIND_TYPE_TEXT_ABSOLUTE32 (2) and BIND_TYPE_TEXT_PCREL32 (3) |
SET_ADDEND_SLEB | 0x60 | Set the addend field to the following SLEB128 value |
SET_SEGMENT_AND_OFFSET_ULEB | 0x70 | Set segment to immediate value (the segment index), and address to the following ULEB128 offset within that segment |
ADD_ADDR_ULEB | 0x80 | Add the following ULEB128 value to the address field |
DO_BIND | 0x90 | Perform binding of current table row, then advance the address by the pointer size |
DO_BIND_ADD_ADDR_ULEB | 0xA0 | Perform binding, then also add the following ULEB128 to the address |
DO_BIND_ADD_ADDR_IMM_SCALED | 0xB0 | Perform binding, then also add the immediate (lower 4 bits), scaled by the pointer size, to the address |
DO_BIND_ADD_ADDR_ULEB_TIMES_SKIPPING_ULEB | 0xC0 | Perform binding for several symbols (count given by the following ULEB128), skipping several bytes (given by the ULEB128 which follows next) after each. Rare. |
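Each opcode is specified in the topmost 4 bits of a byte (masked by BIND_OPCODE_MASK (0xF0) in <mach-o/loader.h>), while the lower 4 bits (BIND_IMMEDIATE_MASK, 0x0F) carry the immediate value used by the *_IMM opcodes.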
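To illustrate how the opcodes build up table rows, here is a sketch of an interpreter for a non-lazy bind opcode stream. It follows the semantics in the table above (every DO_BIND* opcode emits the current row, then advances the address by the pointer size plus any extra amount), but it is not dyld's code: it records the segment index and offset rather than resolving real addresses, stops at the first DONE or unknown opcode (which suits non-lazy and weak bind info, not the DONE-separated lazy records), and run_bind_opcodes and struct bind_row are hypothetical names. The opcode values mirror <mach-o/loader.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values mirror <mach-o/loader.h>. */
#define BIND_OPCODE_MASK                                      0xF0
#define BIND_IMMEDIATE_MASK                                   0x0F
#define BIND_OPCODE_DONE                                      0x00
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM                     0x10
#define BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB                    0x20
#define BIND_OPCODE_SET_DYLIB_SPECIAL_IMM                     0x30
#define BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM             0x40
#define BIND_OPCODE_SET_TYPE_IMM                              0x50
#define BIND_OPCODE_SET_ADDEND_SLEB                           0x60
#define BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB               0x70
#define BIND_OPCODE_ADD_ADDR_ULEB                             0x80
#define BIND_OPCODE_DO_BIND                                   0x90
#define BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB                     0xA0
#define BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED               0xB0
#define BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB_TIMES_SKIPPING_ULEB 0xC0

/* One row of the table the opcodes describe (hypothetical type). */
struct bind_row {
    int ordinal;         /* dylib ordinal, or a special value <= 0 */
    const char *symbol;  /* points into the opcode stream itself */
    unsigned type;       /* BIND_TYPE_* */
    int64_t addend;
    unsigned segment;    /* segment index */
    uint64_t offset;     /* offset within that segment */
};

static uint64_t uleb(const uint8_t **p, const uint8_t *end) {
    uint64_t v = 0; unsigned shift = 0; uint8_t b = 0;
    do {
        if (*p >= end) break;
        b = *(*p)++;
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return v;
}

static int64_t sleb(const uint8_t **p, const uint8_t *end) {
    uint64_t v = 0; unsigned shift = 0; uint8_t b = 0;
    do {
        if (*p >= end) break;
        b = *(*p)++;
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    if (shift < 64 && (b & 0x40)) v |= ~(uint64_t)0 << shift;  /* sign-extend */
    return (int64_t)v;
}

static void emit(struct bind_row *out, size_t max, size_t *n, const struct bind_row *row) {
    if (*n < max) out[*n] = *row;
    (*n)++;
}

/* Runs the stream [p, end). Rows are copied to out (up to max); returns
   the number of binds performed. ptr_size is 8 (64-bit) or 4 (32-bit). */
size_t run_bind_opcodes(const uint8_t *p, const uint8_t *end, unsigned ptr_size,
                        struct bind_row *out, size_t max) {
    struct bind_row cur = { 0, NULL, 0, 0, 0, 0 };
    size_t n = 0;
    while (p < end) {
        uint8_t op = *p & BIND_OPCODE_MASK, imm = *p & BIND_IMMEDIATE_MASK;
        p++;
        switch (op) {
        case BIND_OPCODE_DONE: return n;
        case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:  cur.ordinal = imm; break;
        case BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB: cur.ordinal = (int)uleb(&p, end); break;
        case BIND_OPCODE_SET_DYLIB_SPECIAL_IMM:  /* sign-extend: 0xF -> -1, 0xE -> -2 */
            cur.ordinal = imm ? (int8_t)(BIND_OPCODE_MASK | imm) : 0; break;
        case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:  /* imm = flags, unused here */
            cur.symbol = (const char *)p;
            while (p < end && *p) p++;
            if (p < end) p++;                   /* skip the terminating NUL */
            break;
        case BIND_OPCODE_SET_TYPE_IMM:    cur.type = imm; break;
        case BIND_OPCODE_SET_ADDEND_SLEB: cur.addend = sleb(&p, end); break;
        case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            cur.segment = imm; cur.offset = uleb(&p, end); break;
        case BIND_OPCODE_ADD_ADDR_ULEB:   cur.offset += uleb(&p, end); break;
        case BIND_OPCODE_DO_BIND:
            emit(out, max, &n, &cur); cur.offset += ptr_size; break;
        case BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB:
            emit(out, max, &n, &cur); cur.offset += uleb(&p, end) + ptr_size; break;
        case BIND_OPCODE_DO_BIND_ADD_ADDR_IMM_SCALED:
            emit(out, max, &n, &cur); cur.offset += (uint64_t)imm * ptr_size + ptr_size; break;
        case BIND_OPCODE_DO_BIND_ADD_ADDR_ULEB_TIMES_SKIPPING_ULEB: {
            uint64_t count = uleb(&p, end), skip = uleb(&p, end);
            for (uint64_t i = 0; i < count; i++) {
                emit(out, max, &n, &cur);
                cur.offset += skip + ptr_size;
            }
            break;
        }
        default: return n;   /* unknown opcode: give up */
        }
    }
    return n;
}
```

Note that the "current row" is simply state that persists between opcodes: a run of imports from the same dylib only needs to change the symbol name and address between DO_BINDs.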
The opcodes effectively describe the rows of a table; the column headings for the non-lazy and lazy binding information read:
bind information:
segment section address type addend dylib symbol
lazy binding information (from lazy_bind part of dyld info):
segment section address index dylib
For example, consider the following output from jtool (or dyldinfo) -opcodes, annotated:
The opcodes are used by our special friend, dyld_stub_binder, as we discuss later. But before we can get to it, we have to take another detour to explain the two types of symbol tables in Mach-O.
Symbol Tables
The Symbol Table (LC_SYMTAB)
The Symbol Table in a Mach-O file is described by an LC_SYMTAB command. This command is defined in <mach-o/loader.h>:
struct symtab_command {
uint32_t cmd; /* LC_SYMTAB */
uint32_t cmdsize; /* sizeof(struct symtab_command) */
uint32_t symoff; /* symbol table offset */
uint32_t nsyms; /* number of symbol table entries */
uint32_t stroff; /* string table offset */
uint32_t strsize; /* string table size in bytes */
};
This can be seen with jtool, using -l:
LC 05: LC_SYMTAB Symbol table is at offset 0x66c4, with 83 entries
String table is at offset 6e68, 968 bytes
The symbol table itself is an array of nsyms entries, each a struct nlist or struct nlist_64, depending on whether the file is 32-bit or 64-bit (MH_MAGIC or MH_MAGIC_64, respectively). The nlist structures follow the BSD format, with some minor modifications. The String Table is nothing more than an array of NULL-terminated strings, which follow one another; each nlist entry refers to its symbol's name by an index (n_un.n_strx) into this table.
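Putting LC_SYMTAB's fields to use, the sketch below resolves a symbol's name through n_un.n_strx and performs a linear search by name. It is a minimal illustration, assuming the symbol and string tables of a 64-bit image have already been mapped (so the nlist_64 array is suitably aligned) in host byte order, and that the string table ends with a NUL. struct nlist_64 is re-declared to mirror <mach-o/nlist.h>, and symbol_name and find_symbol are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors struct nlist_64 from <mach-o/nlist.h> (16 bytes). */
struct nlist_64 {
    union { uint32_t n_strx; } n_un;  /* index into the string table */
    uint8_t  n_type;                  /* N_EXT, N_UNDF, N_SECT, ... bits */
    uint8_t  n_sect;                  /* section number, or NO_SECT (0) */
    uint16_t n_desc;                  /* for undefined symbols, library ordinal in high byte */
    uint64_t n_value;                 /* address, for defined symbols */
};

/* Returns the name of symbol idx, or NULL if idx or its n_strx is out of
   range. syms/nsyms come from symoff/nsyms; strtab/strsize from stroff/strsize. */
const char *symbol_name(const struct nlist_64 *syms, uint32_t nsyms,
                        const char *strtab, uint32_t strsize, uint32_t idx) {
    if (idx >= nsyms) return NULL;
    uint32_t strx = syms[idx].n_un.n_strx;
    if (strx >= strsize) return NULL;
    return strtab + strx;
}

/* Linear search by name; returns the symbol index, or -1 if absent. */
long find_symbol(const struct nlist_64 *syms, uint32_t nsyms,
                 const char *strtab, uint32_t strsize, const char *name) {
    for (uint32_t i = 0; i < nsyms; i++) {
        const char *s = symbol_name(syms, nsyms, strtab, strsize, i);
        if (s && strcmp(s, name) == 0) return (long)i;
    }
    return -1;
}
```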
The Indirect Symbol Table (LC_DYSYMTAB)
The Indirect Symbol Table in a Mach-O file is described in an LC_DYSYMTAB command. This command details (among other things) the offset of this table, and the number of symbols it contains. This can be seen with otool (or jtool) -l, as follows:
..
LC 06: LC_DYSYMTAB
1 local symbols at index 0
1 external symbols at index 1
81 undefined symbols at index 2
No TOC
No modtab
157 Indirect symbols at offset 0x6bf4
..
The indirect symbol table is, in fact, nothing more than an array of indices into the main symbol table (the one pointed to by LC_SYMTAB). Dumping the indirect symbol table is straightforward with jtool, by specifying an offset (or address) inside the table:
The indirect symbol table is used with two specific Mach-O sections: __DATA.__nl_symbol_ptr and __DATA.__la_symbol_ptr. We discuss these next.
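Following an indirect entry back to a symbol is therefore a single array lookup, with one wrinkle: <mach-o/loader.h> reserves INDIRECT_SYMBOL_LOCAL (0x80000000) and INDIRECT_SYMBOL_ABS (0x40000000) for slots that do not refer to a symbol table entry at all. A minimal sketch, assuming the array holds the nindirectsyms 32-bit entries found at LC_DYSYMTAB's indirectsymoff; indirect_to_symbol is a hypothetical name.

```c
#include <stdint.h>

/* Special indirect-table values, mirroring <mach-o/loader.h>. */
#define INDIRECT_SYMBOL_LOCAL 0x80000000u
#define INDIRECT_SYMBOL_ABS   0x40000000u

/* Resolves indirect-table entry i to an index into the LC_SYMTAB array.
   Returns -1 if i is out of range, or if the entry is marked LOCAL/ABS
   (i.e. the slot refers to a local or absolute value, not a symbol). */
long indirect_to_symbol(const uint32_t *indirect, uint32_t nindirect, uint32_t i) {
    if (i >= nindirect) return -1;
    uint32_t v = indirect[i];
    if (v & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) return -1;
    return (long)v;
}
```

The returned index can then be fed to the symbol table (and, through n_strx, to the string table) to obtain the name.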
__DATA.__nl_symbol_ptr and __DATA.__la_symbol_ptr
The __DATA.__nl_symbol_ptr section contains the "non-lazy" symbol pointers. Recall that binding of symbols can be performed either at load time, or on first use. The "non-lazy" pointers are those which must be bound at load time (that is, if binding is unsuccessful, the binary will fail to load). The name of the section is somewhat of a convention, but it is the section type (0x06 - S_NON_LAZY_SYMBOL_POINTERS) which defines its contents. As for the section contents, they are detailed in <mach-o/loader.h> as follows:
/*
* For the two types of symbol pointers sections and the symbol stubs section
* they have indirect symbol table entries. For each of the entries in the
* section the indirect symbol table entries, in corresponding order in the
* indirect symbol table, start at the index stored in the reserved1 field
* of the section structure. Since the indirect symbol table entries
* correspond to the entries in the section the number of indirect symbol table
* entries is inferred from the size of the section divided by the size of the
* entries in the section. For symbol pointers sections the size of the entries
* in the section is 4 bytes and for symbol stubs sections the byte size of the
* stubs is stored in the reserved2 field of the section structure.
*/
#define S_NON_LAZY_SYMBOL_POINTERS 0x6 /* section with only non-lazy
symbol pointers */
#define S_LAZY_SYMBOL_POINTERS 0x7 /* section with only lazy symbol
pointers */
#define S_SYMBOL_STUBS 0x8 /* section with only symbol
stubs, byte size of stub in
the reserved2 field */
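The rule in the header comment above fits in a few lines of C. This sketch maps an address inside a symbol pointer or stub section to its indirect symbol table index, reserved1 + (address - section start) / entry size. Note that the comment predates 64-bit support: in 64-bit images, pointer entries are 8 bytes rather than 4. struct sect_info is a hypothetical subset of struct section_64 holding only the fields used here, and the addresses in any example are made up.

```c
#include <stdint.h>

/* Values mirror <mach-o/loader.h>. */
#define SECTION_TYPE               0x000000ffu
#define S_NON_LAZY_SYMBOL_POINTERS 0x6
#define S_LAZY_SYMBOL_POINTERS     0x7
#define S_SYMBOL_STUBS             0x8

/* Hypothetical subset of struct section_64 (not the real layout). */
struct sect_info {
    uint64_t addr, size;
    uint32_t flags, reserved1, reserved2;
};

/* Size of one entry in a pointer or stub section, or 0 if not applicable. */
uint32_t entry_size(const struct sect_info *s, int is64) {
    switch (s->flags & SECTION_TYPE) {
    case S_NON_LAZY_SYMBOL_POINTERS:
    case S_LAZY_SYMBOL_POINTERS: return is64 ? 8 : 4;
    case S_SYMBOL_STUBS:         return s->reserved2;   /* stub size in bytes */
    default:                     return 0;
    }
}

/* Indirect symbol table index for the entry containing addr, or -1. */
long indirect_index_for(const struct sect_info *s, int is64, uint64_t addr) {
    uint32_t esz = entry_size(s, is64);
    if (esz == 0 || addr < s->addr || addr - s->addr >= s->size) return -1;
    return (long)(s->reserved1 + (addr - s->addr) / esz);
}
```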
It is worth mentioning that __nl_symbol_ptr is not the only "non-lazy" section: the binary's Global Offset Table (GOT) is in its own section, __DATA.__got, similarly marked with S_NON_LAZY_SYMBOL_POINTERS. It's also noteworthy that only one of these values is held in the section's flags field, in its low 8 bits (masked by SECTION_TYPE, 0x000000ff). The field's name erroneously implies these are bit-flags; they are not, though the upper 24 bits (SECTION_ATTRIBUTES) hold attribute flags which may be or'ed with these values.
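In code, this means the section type must be extracted with the SECTION_TYPE mask and compared for equality, rather than tested bit by bit. A short sketch; the constants mirror <mach-o/loader.h>, and is_non_lazy_pointer_section and section_attributes are hypothetical helpers.

```c
#include <stdint.h>

/* Values mirror <mach-o/loader.h>. */
#define SECTION_TYPE               0x000000ffu  /* low 8 bits: one S_* value */
#define SECTION_ATTRIBUTES         0xffffff00u  /* high 24 bits: S_ATTR_* flags */
#define S_NON_LAZY_SYMBOL_POINTERS 0x6u
#define S_LAZY_SYMBOL_POINTERS     0x7u
#define S_ATTR_PURE_INSTRUCTIONS   0x80000000u

/* The type is compared for equality after masking. A bit test such as
   (flags & S_NON_LAZY_SYMBOL_POINTERS) would be wrong, since
   S_LAZY_SYMBOL_POINTERS (0x7) also has those bits set. */
int is_non_lazy_pointer_section(uint32_t flags) {
    return (flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS;
}

/* Returns only the attribute bits, with the type stripped off. */
uint32_t section_attributes(uint32_t flags) {
    return flags & SECTION_ATTRIBUTES;
}
```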
The __DATA.__la_symbol_ptr section contains the lazy symbol pointers. These are for symbols which will be bound on first use. The code to do so is in an additional section, referred to as the symbol stubs. The "stubs" consist of boilerplate code, which is naturally architecture dependent. Apple Developer's "OS X Assembler Reference" [4] details this well, but unfortunately only for the deprecated PowerPC architecture. JTool's disassembler is almost fully functional for ARM (but still very partial for x86_64). We therefore show the ARMv7 (iOS) case next.
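Before looking at the real stubs, the idea behind them can be modeled in plain C. This is a toy model, not how dyld or the toolchain actually implements it: a function pointer plays the role of a lazy symbol pointer slot and initially points at a helper; the first call goes through the helper, which "resolves" the target, patches the slot and forwards the call, so that subsequent calls jump straight to the target. All names here are hypothetical.

```c
/* Toy model of lazy binding; all names are hypothetical. */
typedef int (*fn_t)(int);

static int real_square(int x) { return x * x; }  /* the "library" function */
static int resolve_count = 0;                     /* times the helper ran */

static int square_helper(int x);

/* Plays the lazy pointer slot: initially points at the helper. */
static fn_t square_lazy_ptr = square_helper;

/* Plays the stub helper plus dyld_stub_binder: find the target (dyld
   would consult the lazy binding info here), patch the slot, forward. */
static int square_helper(int x) {
    resolve_count++;
    square_lazy_ptr = real_square;
    return square_lazy_ptr(x);
}

/* Plays the stub: always an indirect jump through the slot. */
int square_stub(int x) { return square_lazy_ptr(x); }
```

The first call to square_stub runs the helper; every later call goes directly to real_square, which is exactly why lazy binding costs something only once per symbol.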
dyld_stub_binder and _helper (in iOS)
Stub resolution in iOS and OS X is practically the same. The __TEXT.__stub_helper section starts with a single common function, which sets up a call to dyld_stub_binder according to the value held in R12, a.k.a. IP, the intra-procedure-call scratch register***. The other entries in __stub_helper are trampolines to this function, each setting up R12 to hold an offset into the lazy binding info (the opcodes pointed to by LC_DYLD_INFO, discussed above) for the function to be bound. This is shown in the annotated jtool disassembly of ScreenShotr (the screen capture utility used by Xcode, from iOS's DeveloperDiskImage.dmg), below:
dyld_stub_binder is exported by libSystem.B.dylib, though in actuality it is a re-export from /usr/lib/system/libdyld.dylib. Using jtool again, we can see:
Jtool's disassembly is corroborated by DYLD's source [5], which, surprisingly enough, contains an #if __arm__ statement for iOS 5 that Apple has not removed. If you're following along with x86_64 (e.g. with /bin/ls), the 0x100004040 from the lldb example is the trampoline to dyld_stub_binder. In other words, the code will look something like this when you break on 0x100004040:
Hopefully, this fills in the missing pieces, showing you not just what symbols are bound, but HOW they are bound. I hope to provide more information about LINKEDIT (specifically, the juicy parts of code signing). You are always welcome to go online at the Book Forum and comment, ask questions, etc.
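In dyld's implementation, the value a stub helper entry passes along to dyld_stub_binder is an offset into the lazy binding info, and that is all dyld needs, because each lazy symbol's record there is a short, self-contained opcode sequence (typically set segment and offset, set dylib ordinal, set symbol, DO_BIND), with records separated by DONE. The sketch below decodes one such record at a given offset. It is an illustration, not dyld's code: the offset is assumed to be relative to the start of the lazy binding info (lazy_bind_off in LC_DYLD_INFO), only the opcodes usually found in lazy records are handled, and decode_lazy_record and struct lazy_bind are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of BIND_OPCODE_* values (mirroring <mach-o/loader.h>). */
#define BIND_OPCODE_DONE                          0x00
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM         0x10
#define BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB        0x20
#define BIND_OPCODE_SET_DYLIB_SPECIAL_IMM         0x30
#define BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM 0x40
#define BIND_OPCODE_SET_TYPE_IMM                  0x50
#define BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB   0x70
#define BIND_OPCODE_DO_BIND                       0x90

struct lazy_bind {          /* hypothetical result type */
    int ordinal;            /* which dylib to search */
    const char *symbol;     /* symbol name, inside the info buffer */
    unsigned segment;       /* segment index of the lazy pointer */
    uint64_t offset;        /* offset of the lazy pointer in that segment */
};

static uint64_t uleb(const uint8_t **p, const uint8_t *end) {
    uint64_t v = 0; unsigned shift = 0; uint8_t b = 0;
    do {
        if (*p >= end) break;
        b = *(*p)++;
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    return v;
}

/* Decodes the record starting at off. Returns 0 and fills *out when a
   DO_BIND is reached; -1 on DONE, end of data or an unexpected opcode. */
int decode_lazy_record(const uint8_t *info, size_t size, size_t off,
                       struct lazy_bind *out) {
    if (off >= size) return -1;
    const uint8_t *p = info + off, *end = info + size;
    struct lazy_bind cur = { 0, NULL, 0, 0 };
    while (p < end) {
        uint8_t op = *p & 0xF0, imm = *p & 0x0F;
        p++;
        switch (op) {
        case BIND_OPCODE_DONE: return -1;   /* record ended without a bind */
        case BIND_OPCODE_SET_DYLIB_ORDINAL_IMM:  cur.ordinal = imm; break;
        case BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB: cur.ordinal = (int)uleb(&p, end); break;
        case BIND_OPCODE_SET_DYLIB_SPECIAL_IMM:
            cur.ordinal = imm ? (int8_t)(0xF0 | imm) : 0; break;
        case BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM:
            cur.symbol = (const char *)p;
            while (p < end && *p) p++;
            if (p < end) p++;               /* skip the terminating NUL */
            break;
        case BIND_OPCODE_SET_TYPE_IMM: break;   /* tolerated and ignored here */
        case BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            cur.segment = imm; cur.offset = uleb(&p, end); break;
        case BIND_OPCODE_DO_BIND:
            if (cur.symbol == NULL) return -1;
            *out = cur;
            return 0;
        default: return -1;
        }
    }
    return -1;
}
```

Having found the symbol and the pointer's location, the binder's remaining work is to look the symbol up in the given dylib, write its address into the lazy pointer, and jump to it.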
References:
[1] MikeAsh.com, article on DYLD by G. Raskind
[2] Apple Developer - Mach-O Programming Topics
[3] Source code of DYLD Interpose example from the book
[4] Apple Developer - OS X Assembler Reference
[5] http://opensource.apple.com/source/dyld/dyld-210.2.3/src/dyld_stub_binder.s - Source of DYLD's stub_binder, for both x86_64 and ARM
Footnotes
* Something I have heard several times by now as a criticism is a "lack of detail". Considering that Wiley originally restricted the book to 500 pages, I'm very lucky to have been able to extend it to the 800 pages it is, but some things just had to be left out, folks, which is why I'm providing lots of extra content on the website.
** - While we're on the subject, there's a typo on page 116 (it should read "using Xcode's dyldinfo(1) or nm(1)"): one of the all too many omissions and editorial mistakes inserted, ironically, by the copy editor. Incidentally, nm(1) only shows the symbols, not where they are located. You might want to try jtool's -S feature (cloning nm(1)) with -v.
*** - This is a register which the ARM ABI allows for use in between functions/procedures.