利用符号执行破解Kryptonite混淆：静态分析方法详解

引言

Kryptonite是一个我在LLVM中间表示层构建的代码混淆概念验证工具。其核心思想是使用语义保持的转换，以确保不破坏原始程序。例如，主要思路之一是构建一个自制的32位加法器来替代LLVM的add指令。这样，在流水线末端生成的将不是单个汇编指令，而是一大堆仅执行加法操作的汇编代码。如果你从未读过我的文章，但对此感兴趣，可以查看：钢铁混淆：遇见我的Kryptonite。

在这篇文章中，我想展示如何通过符号执行来破解这种混淆。我们将使用IDAPython编写一个非常小的符号执行引擎，并利用Z3Py来简化所有方程。请注意，我的朋友@elvanderb使用了类似的方法（虽然更通用）来简化破解挑战的某些部分；但他不想公开发布，所以这就是我的博客文章！

目标

在这篇博客文章中，我们首先将处理由llvm-cpp-frontend-home-made-32bits-adder.cpp生成的LLVM代码。简而言之，该代码使用LLVM前端API在LLVM中间语言中发出一个自制的32位加法器。然后，你可以直接将输出提供给clang，为你的平台生成一个可执行二进制文件；这里我选择仅针对x86平台工作。我也将二进制文件上传到了这里：adder。

如果你在IDA中打开生成的二进制文件，你会看到一个看似无穷无尽的例程，仅执行加法操作。乍一看，这确实有点吓人：

每条指令似乎都很重要，没有垃圾代码
似乎只使用了二进制操作：加法、左移、右移、异或、与等
这也是一个包含两千条指令的例程

这篇文章的想法是编写一个非常基础的符号执行引擎，以查看在例程结束时EAX寄存器将保存什么样的结果。希望我们能获得高度简化且更易读的内容，而不是这一大堆汇编代码！

符号执行引擎方法

但实际上，这段代码让我们很容易编写符号执行引擎。以下是主要原因：

没有分支，没有循环，完美
指令不操作EFLAGS寄存器
指令仅使用32位寄存器（不是16位或8位）
唯一指令的数量非常少：mov、shr、shl、xor、and、xor、add
使用的指令易于模拟

请理解，这里我们处于一个特定情况，引擎不会那么容易实现以覆盖最常用的x86指令；但幸运的是，我们不需要那样！

引擎实际上是一个伪模拟器，传播汇编指令执行的不同操作。以下是我们的引擎工作原理：

每次找到符号变量时，实例化一个Z3 BitVector并将其保存在某处。符号变量基本上是攻击者可以控制的变量。例如，在我们的案例中，我们将有两个符号变量：传递给函数的两个参数。稍后我们将看到一个简单的启发式方法，以“自动”找到符号变量。
当有一条指令时，模拟它并更新引擎的CPU状态。如果涉及方程，则更新方程集。
重复直到例程结束。

当然，当引擎成功执行后，你可能想问它一些问题，比如“在例程结束时EAX寄存器保存什么？”。你希望得到计算EAX所需的所有操作。在我们的案例中，我们希望获得“symbolic_variable1 + symbolic_variable2”。

以下是一个小例子来总结我们刚才所说的：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


mov eax, [arg1]  ; 此时我们有了第一个符号变量，将其推入方程列表
mov edx, [arg2]  ; 同样这里也是

shr eax, 2   ; EAX=sym1 >> 2
add eax, 1   ; EAX=(sym1 >> 2) + 1
shl eax, 3   ; EAX=((sym1 >> 2) + 1) << 1
and eax, 2   ; EAX=(((sym1 >> 2) + 1) << 1) & 2
inc edx      ; EDX=sym2 + 1
xor edx, eax ; EDX=(sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2)
mov eax, edx ; EAX=(sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2)

因此，最后你可以要求引擎给出EAX的最终状态，例如，它应该返回类似这样的内容：

1

EAX=(sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2)

有了这个方程，你可以自由使用Z3Py来简化它，或者尝试找出如何通过仅控制符号变量来使EAX具有特定值：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18


In [1]: from z3 import *
In [2]: sym1 = BitVec('sym1', 32)
In [3]: sym2 = BitVec('sym2', 32)

In [4]: simplify((sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2))
Out[4]: 1 + sym2 ^ Concat(0, 1 + Extract(0, 0, sym1 >> 2), 0)

In [5]: solve((sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2) == 0xdeadbeef)
[sym1 = 0, sym2 = 3735928556]

In [6]: solve((sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2) == 0xdeadbeef, sym1 !=  0)
[sym1 = 1073741824, sym2 = 3735928556]

In [7]: sym1 = 1073741824
In [8]: sym2 = 3735928556

In [9]: hex((sym2 + 1) ^ ((((sym1 >> 2) + 1) << 1) & 2) & 0xffffffff)
Out[9]: '0xdeadbeefL'

正如你可以想象的，这种工具在逆向工程任务或漏洞挖掘中非常宝贵/方便。不幸的是，我们的概念验证不够准确/通用/完整，无法在“正常”情况下使用，但没关系。

开始编码

为了实现我们的小型概念验证，我们将仅使用IDAPython和Z3Py。

反汇编器

我们必须做的第一件事是使用IDA的API来获取有关汇编指令的检查信息。想法只是轻松地获得助记符、源操作数和目标操作数；以下是我为此目的设计的类：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26


class Disassembler(object):
    '''一个简单的类，用于在IDA中轻松解码指令'''
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.eip = start

    def _decode_instr(self):
        '''返回助记符、目标、源'''
        mnem = GetMnem(self.eip)
        x = []
        for i in range(2):
            ty = GetOpType(self.eip, i)
            # 常量
            if 5 <= ty <= 7:
                x.append(GetOperandValue(self.eip, i))
            else:
                x.append(GetOpnd(self.eip, i))

        return [mnem] + x

    def get_next_instruction(self):
        '''这是一个方便的生成器，你可以轻松地遍历每条指令'''
        while self.eip != self.end:
            yield self._decode_instr()
            self.eip += ItemSize(self.eip)

符号执行引擎

我们的引擎有几个重要部分：

模拟汇编指令的部分
存储例程中使用的不同方程的部分。它是一个简单的Python字典：键是唯一标识符，值是方程
CPU状态。我们也使用字典：键是寄存器名称，值是该特定时刻寄存器保存的内容。注意，我们只存储方程的唯一标识符。实际上，我们的设计与Jonathan在“Binary analysis: Concolic execution with Pin and z3”中的设计非常相似，所以如果不清楚，请参考他的酷图：P
内存状态；在这个字典中，我们存储内存引用。记住，如果我们发现对内存区域的未初始化访问，我们实例化一个符号变量。这是我们自动找到符号变量的启发式方法

以下是概念验证代码：

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169


def prove(f):
    '''取自 http://rise4fun.com/Z3Py/tutorialcontent/guide#h26'''
    s = Solver()
    s.add(Not(f))
    if s.check() == unsat:
        return True
    return False

class SymbolicExecutionEngine(object):
    '''符号执行引擎是处理符号执行的类。它将跟踪遇到的不同方程以及程序每个点的CPU上下文。

    符号变量必须由用户找到（或使用数据污染）。这不是本类的目的。

    我们很幸运，只需要处理这些操作和编码：
        . mov:
            . mov reg32, reg32
            . mov reg32, [mem]
            . mov [mem], reg32
        . shr:
            . shr reg32, cst
        . shl:
            . shl reg32, cst
        . and:
            . and reg32, cst
            . and reg32, reg32
        . xor:
            . xor reg32, cst
        . or:
            . or reg32, reg32
        . add:
            . add reg32, reg32

    我们也不关心：
        . EFLAGS
        . 分支
        . 较小的寄存器（16/8位）
    长话短说：完美；这种环境使得玩符号执行非常容易。'''
    def __init__(self, start, end):
        # 这是每个时刻的CPU上下文
        # 寄存器的值是方程字典中的索引
        self.ctx = {
            'eax' : None,
            'ebx' : None,
            'ecx' : None,
            'edx' : None,
            'esi' : None,
            'edi' : None,
            'ebp' : None,
            'esp' : None,
            'eip' : None
        }

        # 符号执行开始的地址
        self.start = start

        # 符号执行停止的地址
        self.end = end

        # 我们的反汇编器
        self.disass = Disassembler(start, end)

        # 这是指令可以用来保存临时值/结果的内存
        self.mem = {}

        # 每个方程必须有唯一id
        self.idx = 0

        # 符号变量将存储在那里
        self.sym_variables = []

        # 每个方程将存储在这里
        self.equations = {}

    def _check_if_reg32(self, r):
        '''XXX: 做一个装饰器？'''
        return r.lower() in self.ctx

    def _push_equation(self, e):
        self.equations[self.idx] = e
        self.idx += 1
        return (self.idx - 1)

    def set_reg_with_equation(self, r, e):
        if self._check_if_reg32(r) == False:
            return

        self.ctx[r] = self._push_equation(e)

    def get_reg_equation(self, r):
        if self._check_if_reg32(r) == False:
            return

        return self.equations[self.ctx[r]]

    def run(self):
        '''从开始地址到结束地址运行引擎'''
        for mnemonic, dst, src in self.disass.get_next_instruction():
            if mnemonic == 'mov':
                # mov reg32, reg32
                if src in self.ctx and dst in self.ctx:
                    self.ctx[dst] = self.ctx[src]
                # mov reg32, [mem]
                elif (src.find('var_') != -1 or src.find('arg') != -1) and dst in self.ctx:
                    if src not in self.mem:
                        # 尝试读取未初始化的位置，我们得到了一个符号变量！
                        sym = BitVec('arg%d' % len(self.sym_variables), 32)
                        self.sym_variables.append(sym)
                        print 'Trying to read a non-initialized area, we got a new symbolic variable: %s' % sym
                        self.mem[src] = self._push_equation(sym)

                    self.ctx[dst] = self.mem[src]
                # mov [mem], reg32
                elif dst.find('var_') != -1 and src in self.ctx:
                    if dst not in self.mem:
                        self.mem[dst] = None

                    self.mem[dst] = self.ctx[src]
                else:
                    raise Exception('This encoding of "mov" is not handled.')
            elif mnemonic == 'shr':
                # shr reg32, cst
                if dst in self.ctx and type(src) == int:
                    self.set_reg_with_equation(dst, LShR(self.get_reg_equation(dst), src))
                else:
                    raise Exception('This encoding of "shr" is not handled.')
            elif mnemonic == 'shl':
                # shl reg32, cst
                if dst in self.ctx and type(src) == int:
                    self.set_reg_with_equation(dst, self.get_reg_equation(dst) << src)
                else:
                    raise Exception('This encoding of "shl" is not handled.')
            elif mnemonic == 'and':
                x = None
                # and reg32, cst
                if type(src) == int:
                    x = src
                # and reg32, reg32
                elif src in self.ctx:
                    x = self.get_reg_equation(src)
                else:
                    raise Exception('This encoding of "and" is not handled.')

                self.set_reg_with_equation(dst, self.get_reg_equation(dst) & x)
            elif mnemonic == 'xor':
                # xor reg32, cst
                if dst in self.ctx and type(src) == int:
                    self.set_reg_with_equation(dst, self.get_reg_equation(dst) ^ src)
                else:
                    raise Exception('This encoding of "xor" is not handled.')
            elif mnemonic == 'or':
                # or reg32, reg32
                if dst in self.ctx and src in self.ctx:
                    self.set_reg_with_equation(dst, self.get_reg_equation(dst) | self.get_reg_equation(src))
                else:
                    raise Exception('This encoding of "or" is not handled.')
            elif mnemonic == 'add':
                # add reg32, reg32
                if dst in self.ctx and src in self.ctx:
                    self.set_reg_with_equation(dst, self.get_reg_equation(dst) + self.get_reg_equation(src))
                else:
                    raise Exception('This encoding of "add" is not handled.')
            else:
                print mnemonic, dst, src
                raise Exception('This instruction is not handled.')

    def get_reg_equation_simplified(self, reg):
        eq = self.get_reg_equation(reg)
        eq = simplify(eq)
        return eq

测试

好的，我们只需要实例化引擎，给他例程的开始/结束地址，并要求他给出EAX中保存的最终方程。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14


def main():
    '''这里我们将尝试攻击我在“钢铁混淆：遇见我的Kryptonite”中讨论的语义保持混淆：http://0vercl0k.tuxfamily.org/bl0g/?p=260。

    想法是使用一个小的符号执行引擎来击败这些混淆。'''
    sym = SymbolicExecutionEngine(0x804845A, 0x0804A17C)
    print 'Launching the engine..'
    sym.run()
    print 'Done, retrieving the equation in EAX, and simplifying..'
    eax = sym.get_reg_equation_simplified('eax')
    print 'EAX=%r' % eax
    return 1

if __name__ == '__main__':
    main()

以下是我看到的：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


Launching the engine..
Trying to read a non-initialized area, we got a new symbolic variable: arg0
Trying to read a non-initialized area, we got a new symbolic variable: arg1
Done, retrieving the equation in EAX, and simplifying..
EAX=(~(Concat(2147483647, Extract(0, 0, arg1)) |
    Concat(2147483647, ~Extract(0, 0, arg0)) |
    4294967294) |
    ~(Concat(2147483647, ~Extract(0, 0, arg1)) |
    Concat(2147483647, Extract(0, 0, arg0)) |
    4294967294)) +
Concat(~(Concat(1073741823, Extract(1, 1, arg1)) |
            Concat(1073741823, ~Extract(1, 1, arg0)) |
            Concat(1073741823,
                ~(~Extract(0, 0, arg1) |
                    ~Extract(0, 0, arg0)))) |
        ~(Concat(1073741823, ~Extract(1, 1, arg1)) |
            Concat(1073741823, Extract(1, 1, arg0)) |
            Concat(1073741823,
                ~(~Extract(0, 0, arg1) |
                    ~Extract(0, 0, arg0)))) |
        ~(Concat(1073741823, Extract(1, 1, arg1)) |
            Concat(1073741823, Extract(1, 1, arg0)) |
            Concat(1073741823, ~Extract(0, 0, arg1)) |
            Concat(1073741823, ~Extract(