[Translation] Anti-disassembly & obfuscation #1: Apple doesn't follow their own Mach-O specifications?

Date: June 11, 2015

Original article: http://reverse.put.as/2012/02/02/anti-disassembly-obfuscation-1-apple-doesnt-follow-their-own-mach-o-specifications/

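I was delighted when I came up with this trick! I like pushing past boundaries, so I wrote a CrackMe to demonstrate it.

The root of the problem: Apple does not follow its own Mach-O documentation and standards, while reverse engineering tools do.

When you analyze a Mach-O file whose section information has been modified, IDA may crash or produce incorrect disassembly and garbled strings, LLDB produces incorrect disassembly (GDB does not), class-dump fails, and the reverse engineer is left staring at a Mach-O header that makes no sense.

All in all, it is a fun obfuscation technique. ^_^

When you load the CrackMe in IDA, it reports an error along the lines of: negative section size or offset.

otool also produces incorrect output when a section's offset or size points beyond the end of the file.

The trick itself is simple: modify the section information in the Mach-O file. On 32-bit, the section structure looks like this: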

struct section { /* for 32-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint32_t	addr;		/* memory address of this section */
	uint32_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
};

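Let's start with the offset field, the one most likely to cause trouble. The specification defines it as the file offset of this section.

My reading is that this field tells you where the section's code or data lives in the file. That seems reasonable, right?

Since it is just an offset, in theory sections do not have to be laid out sequentially or in any particular order (in particular, their order in the file does not have to match their order within the segment). This is what opens the door to errors.

What if we point offset somewhere else? IDA, for example, relies on offset to locate and read the section's data.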
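To make that dependency concrete, here is a minimal sketch of how a naive tool might locate a section's bytes. It is not IDA's or otool's actual code; `section_bytes` is a hypothetical helper, and the file is assumed to be already read into memory. The only input it trusts is the header's offset and size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the 32-bit struct section shown above. */
struct section {
    char     sectname[16];
    char     segname[16];
    uint32_t addr;
    uint32_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
};

/*
 * Hypothetical helper: return a pointer to the section's bytes inside an
 * in-memory copy of the file, trusting the header's offset field the way a
 * naive tool would. Returns NULL when offset/size fall outside the file.
 */
static const uint8_t *section_bytes(const uint8_t *file, size_t file_len,
                                    const struct section *s)
{
    assert(file != NULL && s != NULL);
    if (s->offset > file_len || s->size > file_len - s->offset)
        return NULL;             /* range runs past the end of the file */
    return file + s->offset;     /* whatever lives here is what gets shown */
}
```

If offset is changed to another in-range value, this function happily returns the wrong bytes, which is all the obfuscation needs. If it is changed to an out-of-range value, a tool without the bounds check reads garbage or crashes.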

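Let's run a test: modify the offset of the __cstring section, then load the modified file in IDA.

Whoa, the program's strings are now "obfuscated", because IDA loaded the wrong data.

Interesting, isn't it? If you modify the section information (set offset to a bogus value) and then run the program, it still behaves exactly as before!

Likewise, after modifying the offset of the __text section you would expect every instruction to be wrong, yet the program still runs normally.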
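The modification itself can be sketched in a few lines of C. This is an illustration under stated assumptions, not the tool used for the CrackMe: it handles only a thin (non-fat) 32-bit Mach-O already loaded into a buffer, assumes the host byte order matches the file, and `patch_section_offset` is a hypothetical name. The struct layouts follow the 32-bit definitions in Apple's loader.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC   0xfeedfaceu   /* 32-bit Mach-O, host byte order */
#define LC_SEGMENT 0x1u

struct mach_header {
    uint32_t magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags;
};
struct segment_command {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint32_t vmaddr, vmsize, fileoff, filesize;
    uint32_t maxprot, initprot, nsects, flags;
};
struct section {
    char     sectname[16], segname[16];
    uint32_t addr, size, offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2;
};

/*
 * Hypothetical helper: overwrite the offset field of the first section
 * named `sectname`. Returns 0 on success, -1 if the image is malformed
 * or contains no such section.
 */
static int patch_section_offset(uint8_t *buf, size_t len,
                                const char *sectname, uint32_t new_offset)
{
    struct mach_header mh;
    assert(buf != NULL && sectname != NULL);
    if (len < sizeof mh) return -1;
    memcpy(&mh, buf, sizeof mh);
    if (mh.magic != MH_MAGIC || mh.sizeofcmds > len - sizeof mh) return -1;

    size_t pos = sizeof mh, end = sizeof mh + mh.sizeofcmds;
    for (uint32_t i = 0; i < mh.ncmds; i++) {
        uint32_t cmd, cmdsize;
        if (end - pos < 8) return -1;
        memcpy(&cmd, buf + pos, 4);
        memcpy(&cmdsize, buf + pos + 4, 4);
        if (cmdsize < 8 || cmdsize > end - pos) return -1;
        if (cmd == LC_SEGMENT) {
            struct segment_command sc;
            if (cmdsize < sizeof sc) return -1;
            memcpy(&sc, buf + pos, sizeof sc);
            /* Section headers follow the segment command directly. */
            if (sc.nsects > (cmdsize - sizeof sc) / sizeof(struct section))
                return -1;
            for (uint32_t j = 0; j < sc.nsects; j++) {
                size_t spos = pos + sizeof sc + j * sizeof(struct section);
                struct section s;
                memcpy(&s, buf + spos, sizeof s);
                if (strncmp(s.sectname, sectname, sizeof s.sectname) == 0) {
                    s.offset = new_offset;          /* the whole trick */
                    memcpy(buf + spos, &s, sizeof s);
                    return 0;
                }
            }
        }
        pos += cmdsize;
    }
    return -1;
}
```

On a real binary you would read the file into `buf`, call something like `patch_section_offset(buf, len, "__cstring", bogus)`, and write the buffer back. No code or data bytes move; only one header field changes.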

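Why does the program still run correctly? This is the interesting part. I believe the main reason is that the kernel simply maps the file into memory linearly and ignores offset.

The description of the execve() system call on page 812 of Mac OS X Internals helps explain why.

exec_mach_imgact() (bsd/kern/kern_exec.c) calls load_machfile(),

which is responsible for loading the executable, processing the Mach-O load commands, and so on. The relevant snippet: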

@bsd/kern/kern_exec.c

        /*
         * Actually load the image file we previously decided to load.
         */
        lret = load_machfile(imgp, mach_header, thread, map, &load_result);

Inside load_machfile(), parse_machfile() is called to parse the file:

@bsd/kern/mach_loader.c

        lret = parse_machfile(vp, map, thread, header, file_offset, macho_size,
                              0, result);

Here we find an interesting comment:

/*
 * The file size of a mach-o file is limited to 32 bits; this is because
 * this is the limit on the kalloc() of enough bytes for a mach_header and
 * the contents of its sizeofcmds, which is currently constrained to 32
 * bits in the file format itself.  We read into the kernel buffer the
 * commands section, and then parse it in order to parse the mach-o file
 * format load_command segment(s).  We are only interested in a subset of
 * the total set of possible commands.
 */

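Further down in the implementation is the loop that processes every load command. Section headers are handled as part of the segment command (LC_SEGMENT / LC_SEGMENT_64).

So we need to look at the implementation of load_segment().

Inside load_segment(), we find that validation of the executable only goes down to the segment level; sections are not validated at all.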

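This also means we cannot play the same trick on segments :-)).

By the time parse_machfile() returns, all parsing is done; the linked libraries get loaded and the program's entry point is called.

The program is laid out in memory exactly as it is in the file (this is what I meant by linear earlier), and the section information is never used.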

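There is an implicit assumption at work here: that the executable is well formed.

Is this behavior (the way the kernel loads executables) correct? I think not, because the kernel is not following the Mach-O specification. Or am I misreading the specification?

This is yet another case of trusting untrusted data; input should be validated explicitly.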

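It is worth digging further into the whole loading process; the CrackMe contains another interesting trick ;-).

We can also change other fields of the section structure: flags, size, the section and segment names, and the order of the sections.

This confuses both the tools and the reverse engineer. The thing to remember is to follow the same implicit assumption as the kernel and ignore the fields listed above.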