Mach-O ARM64e pointers on disk

Posted: Mon Sep 02, 2019 6:34 pm
by LOLgrep
Hey J & community:

For masochistic fun, I'm implementing my own nm tool that doubles as an Objective-C and Swift class dump tool (using Swift's reflection APIs). Everything is going quite well but it's clear that I don't fully understand the ARM64e pointers on disk when introspecting a cpusubtype CPU_SUBTYPE_ARM64E executable.

When dumping on-disk pointers in an ARM64e cpusubtype binary, it's immediately clear there are pointers for data and for code:

I'll often see DATA pointers look like the following:
Code:
0x0008000100458020 // ex1 data

Note the (1UL << 51) in the above example, OR'd with a "normal looking" MH_EXECUTE virtual address.
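To make that bit arithmetic concrete, here is a minimal Python sketch that splits the data example into the (1UL << 51) bit and the remaining value. The names `TAG_BIT` and `split_data_pointer` are made up for illustration, and this only restates the observation above; it is not a claim about what the bit means.

```python
DATA_PTR = 0x0008000100458020  # ex1 data from above

TAG_BIT = 1 << 51  # the bit observed in the example (name is hypothetical)


def split_data_pointer(raw):
    """Return (whether bit 51 is set, the value with bit 51 cleared)."""
    return bool(raw & TAG_BIT), raw & ~TAG_BIT


tagged, address = split_data_pointer(DATA_PTR)
print(tagged, hex(address))  # True 0x100458020
```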

Code pointers are a bit more funky. Here are some examples I've pulled from disk:

Code:
0x802100000000272c // ex1 code
0x80090000003a6bc0 // ex2 code
0x80110000001e95bc // ex3 code

So my questions are:
    * Can you give me a procedural way to determine whether an ARM64e on-disk pointer is a DATA pointer or a CODE pointer from certain bit(s)?

    * How do you resolve the code pointer to a virtual address? I have two theories on how you did it in jtool2, J. For example, you resolve 0x802100000000272c to 0x10000272c. Theory 1 is that you are sliding a bit down (unlikely). Theory 2 is that you apply a mask to keep the lower 32 bits, get the file offset 0x272c, and determine the virtual address of that file offset from the Mach-O load commands.

    * Where did you find this information about ARM64e disk pointers? I can find a bunch of info on ARM64e memory pointers, but I can't find any resources for ARM64e disk pointers.

    * Do you have any other insights about the differences between CODE and DATA pointers that you think I could have overlooked?
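For reference, theory 2 can be sketched in Python as below. This is only an illustration of the guess, not a confirmed description of what jtool2 does. The `Segment` tuple and the `SEGMENTS` values are hypothetical stand-ins for what would be read from the LC_SEGMENT_64 load commands of a real binary.

```python
from typing import NamedTuple, Optional, Sequence


class Segment(NamedTuple):
    name: str
    vmaddr: int    # virtual address where the segment is mapped
    fileoff: int   # offset of the segment in the file
    filesize: int  # number of bytes the segment occupies in the file


# Hypothetical layout, standing in for parsed LC_SEGMENT_64 commands.
SEGMENTS = [
    Segment("__TEXT", 0x100000000, 0x0, 0x400000),
    Segment("__DATA", 0x100400000, 0x400000, 0x100000),
]


def code_pointer_to_vmaddr(raw: int, segments: Sequence[Segment]) -> Optional[int]:
    """Theory 2: keep the low 32 bits as a file offset, then map it to a VM address."""
    offset = raw & 0xFFFFFFFF
    for seg in segments:
        if seg.fileoff <= offset < seg.fileoff + seg.filesize:
            return seg.vmaddr + (offset - seg.fileoff)
    return None  # the offset is not backed by any segment


print(hex(code_pointer_to_vmaddr(0x802100000000272c, SEGMENTS)))  # 0x10000272c
```

With this made-up layout the first code example lands on 0x10000272c, but that only shows the theory is consistent with one data point.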

Thank you J and community. Y'all are amazing

Re: Mach-O ARM64e pointers on disk

Posted: Wed Sep 04, 2019 6:14 pm
by morpheus
So, I ran into this with jtool2. ARM64e pointers in __DATA.__got, __auth_got, etc. are indeed tagged in this way. The tagging is always in the 16 most significant bits, and you can get the bit layout from the latest dyld sources. The mask I use to detect it changes between kernel and user, but in user mode it's generally (topmost bits != 0 && bits 48-31 all 0)
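As a rough illustration of pulling the bits from the dyld sources, here is a Python sketch that decodes the example values from the first post. It assumes they follow the `dyld_chained_ptr_arm64e_rebase` and `dyld_chained_ptr_arm64e_auth_rebase` bitfields in dyld's `mach-o/fixup-chains.h`. Whether the binaries in this thread use exactly that layout is an assumption, and the helper names `bits` and `decode_arm64e_rebase` are made up for the example. It does not reproduce the user-mode mask described above.

```python
def bits(value, shift, width):
    """Extract `width` bits of `value`, starting at bit `shift`."""
    return (value >> shift) & ((1 << width) - 1)


def decode_arm64e_rebase(raw):
    """Decode a 64-bit on-disk value as an arm64e rebase (assumed dyld layout)."""
    auth = bits(raw, 63, 1)  # 1 = authenticated (signed) pointer
    bind = bits(raw, 62, 1)  # 1 = bind entry, which has a different layout
    if bind:
        raise ValueError("bind entries are not handled in this sketch")
    fields = {"auth": auth, "next": bits(raw, 51, 11)}
    if auth:
        fields.update(
            target=bits(raw, 0, 32),      # offset from the image base
            diversity=bits(raw, 32, 16),
            addr_div=bits(raw, 48, 1),
            key=bits(raw, 49, 2),
        )
    else:
        fields.update(
            target=bits(raw, 0, 43),
            high8=bits(raw, 43, 8),
        )
    return fields


EXAMPLES = (
    0x0008000100458020,  # ex1 data
    0x802100000000272c,  # ex1 code
    0x80090000003a6bc0,  # ex2 code
    0x80110000001e95bc,  # ex3 code
)
for raw in EXAMPLES:
    decoded = decode_arm64e_rebase(raw)
    print(hex(raw), "auth:", decoded["auth"], "target:", hex(decoded["target"]))
```

Under this reading, bit 63 separates the data example (auth 0, target 0x100458020) from the code examples (auth 1), and the first code example decodes to target 0x272c. Adding that to an image base of 0x100000000 (assumed here) gives the 0x10000272c quoted in the first post.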