Demystifying the DMG File Format
Jonathan Levin, http://newosxbook.com/ - 6/12/13
1. About
As part of writing HFSleuth, a "bonus" tool for my book, I decided to implement DMG (disk image) support. I realized, however, that the DMG file format (being Apple proprietary) was woefully undocumented. I briefly mention DMGs (pages 589-590), but due to the page constraints of an already large book, I had failed to delve into their format sufficiently. This article, therefore, is an attempt to rectify that shortcoming. The DMG file format has been painstakingly reverse-engineered by several people[1,2], and this article/addendum aims to consolidate their hard work into a single document. HFSleuth can operate fully on all known DMG types (to date), and can serve as a complementary tool to Apple's hdiutil(1), or - as it is POSIX portable - even as a replacement for it, on non-OS X systems. When set to verbose mode, HFSleuth also provides step-by-step information as it processes DMGs, and is used in the examples below.
2. The Disk Image file format
The first notable fact about the DMG file format is that there is no DMG file format. DMGs come in a variety of sub-formats, corresponding to the different tools which create them, and their compression schemes. The common denominator of most of these is the existence of a 512-byte trailer at the end of the file. This trailer is identifiable by a magic 32-bit value, 0x6B6F6C79, which is "koly" in ASCII. As other references to this trailer call it the "koly" block, we can do the same. Note that "most" is not "all": images created with hdiutil(1), for example, can simply be raw dd(1)-like images of the disk layout, with no metadata. In those cases, however, there is nothing special or noteworthy about the file, which can be read as any disk would, by its partition table (commonly APM, or GPT). Images created with the DiscRecording.Framework contain the koly block. The koly block, when present, is formatted according to the following:
Note: All fields in the koly block (and, in fact, elsewhere in the DMG format) are in big-endian ordering. This is to preserve compatibility with older generations of OS X, which were PPC-based. DMG implementations on little-endian hosts therefore need conversion macros such as be##_to_cpu (where ## is 16, 32, or 64).
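To make the trailer check and the byte-order note concrete, here is a minimal Python sketch. It relies only on the facts stated above (a 512-byte trailer, the 0x6B6F6C79 magic, big-endian fields); the function name and the synthetic test images are illustrative and not part of HFSleuth or any Apple tool.

```python
import struct

KOLY_MAGIC = 0x6B6F6C79  # "koly" in ASCII
KOLY_SIZE = 512          # the trailer occupies the last 512 bytes of the file


def read_koly_trailer(image: bytes):
    """Return the 512-byte koly trailer, or None if the image has none."""
    if len(image) < KOLY_SIZE:
        return None
    trailer = image[-KOLY_SIZE:]
    # ">I" unpacks a big-endian unsigned 32-bit value, which plays the
    # role of be32_to_cpu on a little-endian host.
    (magic,) = struct.unpack_from(">I", trailer, 0)
    return trailer if magic == KOLY_MAGIC else None


# Synthetic stand-ins for real files (these are not actual DMGs).
with_trailer = bytes(1024) + b"koly" + bytes(KOLY_SIZE - 4)
raw_image = bytes(2048)  # a raw dd(1)-like image with no metadata
```

Calling `read_koly_trailer(with_trailer)` returns the trailer bytes, while `read_koly_trailer(raw_image)` returns `None`, which is the cue to fall back to reading the partition table directly.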
The most important elements in the koly block are the fields pointing to the XML plist: This property list, embedded elsewhere in the DMG, contains the DMG block map table. Commonly, the plist is placed in the blocks leading up to the koly block, which fits the simple algorithm to create a DMG: First compress the image blocks, then place the XML plist, and finalize with the koly block. This is shown in figure 1:
The Data Fork: Disk blocks, compressed in various ways ... |
XML property list (variable) |
koly trailer (512 bytes) |
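The layout in figure 1 can be illustrated with a toy image that is assembled in the same order and then walked backwards. This is only a sketch under simplifying assumptions: it assumes the plist runs right up to the trailer, and it finds the plist by scanning for the last XML declaration, whereas a real implementation would follow the offset and length fields in the koly block. Both function names are hypothetical.

```python
KOLY_SIZE = 512  # size of the koly trailer


def build_toy_image(data_fork: bytes, plist: bytes) -> bytes:
    """Assemble data fork, then plist, then koly trailer, in that order."""
    koly = b"koly" + bytes(KOLY_SIZE - 4)  # magic followed by zero padding
    return data_fork + plist + koly


def split_image(image: bytes):
    """Walk a toy image backwards: trailer first, then plist, then data fork."""
    if len(image) < KOLY_SIZE or not image[-KOLY_SIZE:].startswith(b"koly"):
        raise ValueError("no koly trailer found")
    body, koly = image[:-KOLY_SIZE], image[-KOLY_SIZE:]
    # Assumption: the last "<?xml" before the trailer starts the plist.
    start = body.rfind(b"<?xml")
    if start < 0:
        raise ValueError("no XML property list found")
    return body[:start], body[start:], koly
```

Scanning is fragile, since compressed data could contain the same byte sequence by coincidence, which is one reason the trailer carries explicit pointers to the plist.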
This method of creating DMGs also explains why commands such as "file" have a hard time identifying the DMG file type: In the absence of a fixed header, a DMG can start with any type of data (disk or partition headers), which can be further compressed by myriad means. DMG files compressed with BZlib, for example, start with a BZ2 header. They cannot be opened with bunzip2, however, since compression methods are intermingled, and bunzip2 will discard blocks which do not start with a bz2 header.
DMGs compressed with zlib often incorrectly appear as "VAX COFF", due to the zlib header.
The XML Property list (which is uncompressed and easily viewable by seeking to the DOCTYPE declaration using more(1) or using tail(1)) is technically the resource fork of the DMG. The property list file contains, at a minimum, a "blkx" key, though it may contain other key/values, most commonly "plst", and sometimes a service level agreement (SLA) which will be displayed by the OS (specifically, /System/Library/PrivateFrameworks/DiskImages.framework/Versions/A/Resources/DiskImages UI Agent.app/Contents/MacOS/DiskImages UI Agent) as a prerequisite to attaching the DMG*. Due to XML parser restrictions, data in the property list is 7-bit. This forces all binary (8-bit) data to be encoded using Base-64 encoding (a wiser choice would have been using CDATA blocks). The output of such a property list is shown below:
A detailed discussion of both APM and GPT can be found in chapter 15 of the book[3], as well as Apple's notes on APM[4] and GPT[5]. What makes the blkx data useful, however, is that it allows an implementation to skip past the partition table data, and isolate the partition of interest directly from the DMG.
The "data" in the blkx header is a structure, which (like its sibling, koly) is also identifiable by a fixed signature - in this case "mish". In Base-64 this encodes as "bWlza", which is readily evident in the previous listing. The mish block is formatted like this:
In other words, for each entry, the chunk of SectorCount sectors starting at SectorNumber is stored in CompressedLength bytes, at offset CompressedOffset in the data fork. When expanded, each such chunk will take SectorCount * SECTOR_SIZE bytes. Each chunk of blocks in a given entry is stored using the same compression, but different entries can contain different compression methods.
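The Base-64 detail is easy to verify, and Python's plistlib decodes binary plist data automatically, so the mish signature can be checked directly. The sketch below is illustrative only: it searches for the "blkx" key wherever it sits instead of assuming a particular nesting, and every key name in the toy plist other than "blkx" is a placeholder, not a statement about what Apple's tools emit.

```python
import base64
import plistlib


def find_key(node, key):
    """Depth-first search for `key` in nested dicts and lists."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def mish_blocks(plist_bytes: bytes):
    """Return the binary blobs under "blkx" that begin with the mish signature."""
    root = plistlib.loads(plist_bytes)
    blobs = []
    for entry in find_key(root, "blkx") or []:
        values = entry.values() if isinstance(entry, dict) else [entry]
        for value in values:
            # plistlib has already Base-64 decoded the binary data into bytes.
            if isinstance(value, bytes) and value.startswith(b"mish"):
                blobs.append(value)
    return blobs


# Toy plist; "wrapper", "Name" and "Data" are placeholder key names.
toy_plist = plistlib.dumps(
    {"wrapper": {"blkx": [{"Name": "toy partition", "Data": b"mish" + bytes(8)}]}}
)
encoded_signature = base64.b64encode(b"mish")  # begins with b"bWlza"
```

Here `mish_blocks(toy_plist)` returns a single blob, and the raw XML in `toy_plist` contains the telltale `bWlza` prefix.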
Question:
What are two advantages of breaking the image into block chunks, as described above? (Answer at end of document)
The various block chunk entry types are shown below:
Type | Scheme | Meaning |
---|---|---|
0x00000000 | --- | Zero-Fill |
0x00000001 | UDRW/UDRO | RAW or NULL compression (uncompressed) |
0x00000002 | --- | Ignored/unknown |
0x80000004 | UDCO | Apple Data Compression (ADC) |
0x80000005 | UDZO | zlib data compression |
0x80000006 | UDBZ | bz2lib data compression |
0x7ffffffe | --- | No blocks - Comment: +beg and +end |
0xffffffff | --- | No blocks - Identifies last blkx entry |
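As a rough illustration of how the entry types in the table drive decoding, the following Python sketch expands a single chunk. It assumes a SECTOR_SIZE of 512 bytes, takes the entry fields as plain arguments instead of parsing the mish structure, and deliberately leaves ADC and the "ignored/unknown" type unimplemented. The function and constant names are made up for this example.

```python
import bz2
import zlib

SECTOR_SIZE = 512  # assumption: 512-byte sectors

ZERO_FILL, RAW = 0x00000000, 0x00000001
ADC, ZLIB, BZ2 = 0x80000004, 0x80000005, 0x80000006
COMMENT, LAST = 0x7FFFFFFE, 0xFFFFFFFF


def expand_chunk(data_fork, entry_type, sector_count, offset, length):
    """Expand one block chunk entry to sector_count * SECTOR_SIZE bytes."""
    if entry_type in (COMMENT, LAST):
        return b""  # these entries carry no blocks
    size = sector_count * SECTOR_SIZE
    stored = data_fork[offset:offset + length]
    if entry_type == ZERO_FILL:
        out = bytes(size)  # nothing is stored; synthesize the zeros
    elif entry_type == RAW:
        out = stored
    elif entry_type == ZLIB:
        out = zlib.decompress(stored)
    elif entry_type == BZ2:
        out = bz2.decompress(stored)
    else:
        # ADC and the ignored/unknown type are out of scope for this sketch.
        raise NotImplementedError(f"entry type {entry_type:#010x}")
    if len(out) != size:
        raise ValueError("expanded chunk does not match SectorCount")
    return out
```

Because the type is carried per entry, a decoder can dispatch chunk by chunk and never needs to know the compression scheme of the image as a whole.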
3. Mounting DMGs
DMGs can be mounted, just like any other file system, though technically this is what is known as a "loopback" mount (i.e. a mount backed by a local file, rather than a device file). To mount a DMG, the system uses the DiskImages kernel extension (KExt), also known as the IOHDIXController.kext. This is clearly visible in both OS X and iOS, using kextstat (or jkextstat, in the latter):
The kext is provided with a number of "PlugIn" kexts, namely:
- AppleDiskImagesCryptoEncoding.kext
- AppleDiskImagesKernelBacked.kext
- AppleDiskImagesReadWriteDiskImage.kext - for UDRO/UDRW
- AppleDiskImagesFileBackingStore.kext
- AppleDiskImagesPartitionBackingStore.kext - Uses the Apple GUID 444D4700-0000-11AA-AA11-00306543ECAC
- AppleDiskImagesSparseDiskImage.kext - for UDSP
- AppleDiskImagesHTTPBackingStore.kext - Allows DMGs to reside on a remote HTTP server. Uses a "KDISocket" with HTTP/1.1 partial GETs (206) to get the chunks it needs from a DMG
- AppleDiskImagesRAMBackingStore.kext
- AppleDiskImagesUDIFDiskImage.kext
- hdik-unique-identifier - A UUID created by the caller (e.g. CFUUIDCreate())
- image-path - the path to the DMG in question
#0 0x00007fff8cfd4c0d in mach_msg () # Actual message passer
#1 0x00007fff887e3fbc in io_connect_method () # I/O Kit internal connect
#2 0x00007fff887978ea in IOConnectCallMethod () # I/O kit connector, generic argument
#3 0x00007fff88797ae8 in IOConnectCallStructMethod () # I/O Kit connector, with structure argument
#4 0x00007fff86e5b79f in DI_kextDriveGetRequest () # DiskImages framework function
Commands and support
Apple provides extensive support for DMGs, which is only natural given their role in everything from aspects of OS installation to software distribution. The DMG support is provided by the DiskImages project, which contains both the user mode (hdid, hdiutil) and kernel mode (kexts) components required for operation. Lamentably, Apple keeps this as one of the non-open source projects in Darwin.
- hdid
- hdiutil
- DiskImages.framework - The private framework lending support to both the above tools, communicating with the KExts (below), as well as the user mode helper processes for mounting images (diskimages-helper and hdiejectd)
- IOHDIXController.kext
Answer:
- Optimize compression for the type of data: for example, discard blocks of zeros rather than compressing them, or even leave data uncompressed
- Allow an implementation to selectively decompress chunks, rather than the whole image, which may take a lot of filesystem space and/or memory (especially in kernel-mode).
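The second point can be sketched as a random-access read that inflates only the chunk holding the requested sector. This is a simplified illustration, not how any particular tool does it: the chunk table is a hand-built list of tuples, every chunk is assumed to be zlib-compressed, and SECTOR_SIZE is assumed to be 512.

```python
import bisect
import zlib

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def read_sector(data_fork, chunks, sector):
    """Return one sector, decompressing only the chunk that contains it.

    `chunks` holds (sector_number, sector_count, offset, length) tuples,
    sorted by sector_number; all chunks are assumed zlib-compressed.
    """
    starts = [chunk[0] for chunk in chunks]
    index = bisect.bisect_right(starts, sector) - 1
    if index < 0:
        raise IndexError("sector precedes the first chunk")
    first, count, offset, length = chunks[index]
    if sector >= first + count:
        raise IndexError("sector is not covered by any chunk")
    expanded = zlib.decompress(data_fork[offset:offset + length])
    begin = (sector - first) * SECTOR_SIZE
    return expanded[begin:begin + SECTOR_SIZE]


# Toy data fork: sectors 0-1 in one chunk, sectors 2-4 in another.
part_a = b"A" * (2 * SECTOR_SIZE)
part_b = bytes(range(256)) * 6
blob_a, blob_b = zlib.compress(part_a), zlib.compress(part_b)
data_fork = blob_a + blob_b
chunks = [(0, 2, 0, len(blob_a)), (2, 3, len(blob_a), len(blob_b))]
```

Reading sector 3 with `read_sector(data_fork, chunks, 3)` touches only the second chunk, so memory use is bounded by the largest chunk, not by the size of the image.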