CodeQL zero to hero part 5: Debugging queries

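When you're first getting started with CodeQL, you may find yourself with a query that doesn't return the results you expect. Debugging such queries can be tricky, because CodeQL is a Prolog-like language whose evaluation model is very different from mainstream languages like Python. This means you can't "step through" the code, and techniques such as attaching gdb or adding print statements don't apply. Fortunately, CodeQL offers a variety of built-in features to help you diagnose and fix problems in your queries.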

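Below, we'll dig into these features, from the abstract syntax tree (AST) to partial path graphs, using questions from CodeQL users as examples. If you have a question of your own, you can ask it in the GitHub Security Lab's public Slack instance, which is monitored by CodeQL engineers.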

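This blog post can be read on its own. However, if you're new to CodeQL or want to learn more about static analysis and CodeQL, you may want to check out the other parts of my CodeQL zero to hero blog series. Each part covers a different topic: static analysis fundamentals, writing CodeQL, using CodeQL for security research, and modeling a new framework (Gradio) in CodeQL.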

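CodeQL zero to hero part 1: The fundamentals of static analysis for vulnerability research
CodeQL zero to hero part 2: Getting started with CodeQL
CodeQL zero to hero part 3: Security research with CodeQL
CodeQL zero to hero part 4: Gradio framework case study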

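Each part, including this one, comes with accompanying CodeQL queries and exercises, available in the blog posts and in the CodeQL zero to hero repository.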

Minimal code example

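We'll use a question asked by the user NgocKhanhC311, and a similar one asked later by zhou noel. Both had trouble writing a CodeQL query to detect vulnerabilities in projects that use the Gradio framework. I personally added Gradio support to CodeQL and wrote a blog post about the process (CodeQL zero to hero part 4: Gradio framework case study), which includes an introduction to Gradio and its attack surface, so I jumped in to answer.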

zhou noel wanted to detect variants of the unsafe deserialization vulnerability found in browser-use/web-ui v1.6. The simplified code is shown below.

import pickle
import gradio as gr

def load_config_from_file(config_file):
    """Load settings from a UUID.pkl file."""
    try:
        with open(config_file.name, 'rb') as f:
            settings = pickle.load(f)
        return settings
    except Exception as e:
        return f"Error loading configuration: {str(e)}"

with gr.Blocks(title="Configuration Loader") as demo:
    config_file_input = gr.File(label="Load Config File")

    load_config_button = gr.Button("Load Existing Config From File", variant="primary")

    config_status = gr.Textbox(label="Status")

    load_config_button.click(
        fn=load_config_from_file,
        inputs=[config_file_input],
        outputs=[config_status]
    )

demo.launch()

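Through the load_config_button.click event handler (from gr.Button), the user-supplied file config_file_input (of type gr.File) is passed to the load_config_from_file function. That function opens the file with open(config_file.name, 'rb') and loads its contents with pickle.load.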

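This is better described as a "second-order" vulnerability. First, an attacker uploads a malicious file. Then the application loads that file with pickle. In this example, our source is gr.File. When gr.File is used, the uploaded file is stored locally and its path is available in the name attribute, config_file.name. The application then opens the file with "with open(config_file.name, 'rb') as f:" and loads it with pickle.load(f), which results in unsafe deserialization.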
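The following minimal, self-contained Python sketch simulates both steps without Gradio. UploadedFile and simulate_upload are hypothetical stand-ins. The sketch assumes only what the text states: the gr.File value passed to the handler exposes the stored upload's path as name. The payload is a harmless eval("6 * 7") rather than a real command. It is there to show that pickle.load calls whatever callable the file specifies.

```python
import os
import pickle
import tempfile


class Payload:
    """Attacker-controlled object: pickle rebuilds it by calling whatever
    callable __reduce__ returns."""

    def __reduce__(self):
        # Harmless stand-in for a real payload such as os.system("...").
        return (eval, ("6 * 7",))


class UploadedFile:
    """Hypothetical stand-in for the value a gr.File input hands to the
    handler; all that matters here is `name`, the stored upload's path."""

    def __init__(self, name):
        self.name = name


def simulate_upload(data):
    """Step 1 (hypothetical helper): the attacker's bytes are saved locally."""
    fd, path = tempfile.mkstemp(suffix=".pkl")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return UploadedFile(path)


def load_config_from_file(config_file):
    """Step 2: same pattern as the minimal example, minus the try/except."""
    with open(config_file.name, "rb") as f:
        return pickle.load(f)


uploaded = simulate_upload(pickle.dumps(Payload()))
result = load_config_from_file(uploaded)
print(result)  # 42: eval ran inside pickle.load; no Payload object comes back
os.remove(uploaded.name)
```

The attacker never controls the path. Only the bytes stored at that path are attacker-controlled, and that is enough once the application unpickles them.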

Creating a CodeQL database

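From our minimal code example, we'll create a CodeQL database as we did in CodeQL ZtH part 4. Run the following command in a directory that contains only the minimal code example.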

codeql database create codeql-zth5 --language=python

This command creates a new directory, codeql-zth5, that contains the CodeQL database. Add it to your CodeQL workspace and we're ready to go.

Simplifying the query and quick evaluation

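The query is already broken up into predicates and classes, so we can evaluate each part quickly. Use the "Quick Evaluation" button, or right-click a predicate name and select CodeQL: Quick Evaluation.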

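Running Quick Evaluation on the isSource and isSink predicates shows one result for each, which means both the source and the sink are found correctly. Note, however, that the isSink result highlights the entire pickle.load(f) call, not just the call's first argument. We usually prefer to set the sink to an argument of the call rather than to the call itself.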

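In this case, the Decoding abstract sink has a getAnInput predicate, which gives the input (the argument) of the decoding call. To distinguish ordinary Decoding sinks (for example, json.loads) from sinks that may execute code (such as pickle.load), we can use the mayExecuteInput predicate.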

predicate isSink(DataFlow::Node sink) { 
    exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput()) }

Quick evaluation of the updated isSink predicate gives us one result.

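With that, we've verified that both the source and the sink are reported correctly. The problem must therefore lie somewhere between the source and the sink, at a point CodeQL can't propagate through.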

Abstract syntax tree (AST) viewer

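We had no trouble identifying the source and sink nodes. If we had, it would help to inspect the code's abstract syntax tree (AST) to determine the type of a particular code element.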

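After running Quick Evaluation on isSink, you'll see the file in which CodeQL identified the sink. To view that file's abstract syntax tree, right-click the code element you're interested in and select CodeQL: View AST.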

The file's AST then appears in the AST Viewer section of the CodeQL tab in VS Code.

Once the AST tells you the type of a code element, it's much easier to write a query that targets it.
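CodeQL's AST viewer shows CodeQL's own AST classes, which are not reproduced here. As a rough analogy only, Python's standard ast module makes the same distinction the isSink discussion relies on. The expression pickle.load(f) is a Call node, and f is a separate node inside that call's args. The sketch below uses the Python standard library, not CodeQL's API.

```python
import ast

# Parse the sink statement from the minimal example with Python's own parser.
tree = ast.parse("settings = pickle.load(f)")
assign = tree.body[0]
call = assign.value  # the whole pickle.load(f) call

# The call and its first argument are distinct nodes, just as the sink can be
# either the call itself or its argument.
print(type(call).__name__)        # Call
print(ast.unparse(call.func))     # pickle.load
print(ast.unparse(call.args[0]))  # f

# List every node type in the tree, loosely like browsing an AST viewer.
node_types = sorted({type(node).__name__ for node in ast.walk(tree)})
print(node_types)
```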

The getAQlClass predicate

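Another good way to find out the type of a code element is the getAQlClass predicate. It's usually best to do this in a separate query so you don't clutter the original one.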

For example, we can write a query that reports the types of the parameters of the function fn passed to gradio.Button.click:

/**
 * @name getAQlClass on Gradio Button input source
 * @description This query reports on a code element's types.
 * @id 5/2
 * @severity error
 * @kind problem
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources

from DataFlow::Node node
where node = API::moduleImport("gradio").getMember("Button").getReturn()
        .getMember("click").getACall().getParameter(0, "fn").getParameter(_).asSource()
select node, node.getAQlClass()

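Running the query gives five results, one per type of the parameter: FutureTypeTrackingNode, ExprNode, LocalSourceNodeNotModuleVariableNode, ParameterNode, and LocalSourceParameterNode. Of these, ExprNode and ParameterNode are the most interesting and useful types for writing queries.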

Partial path graphs: forward

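Now that we've established that the problem lies in connecting the source to the sink, we should find out where the taint flow stops. We can do that with a partial path graph. It shows all the nodes the source flows to and where those flows stop. This is also why a minimal code example is so important: without one, we'd get far too many results.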

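If you end up working on a large codebase, try to limit the sources you start from, for example to a specific file, with a condition like the following: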

predicate isSource(DataFlow::Node source) { source instanceof GradioButton 
    and source.getLocation().getFile().getBaseName() = "example.py" }

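Partial graphs come in two forms:
forward (FlowExplorationFwd), which tracks flow from a given source to any node it reaches;
backward/reverse (FlowExplorationRev), which tracks flow from a given sink back to any node that reaches it.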
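As a mental model only, a partial path graph behaves like a bounded breadth-first search over the data flow graph. This is not how CodeQL implements FlowExplorationFwd or FlowExplorationRev. Forward exploration lists every node reachable from a source within the limit. Reverse exploration lists every node that can reach a sink. The node names and edges below are a hypothetical, hand-written model of the minimal example before any extra taint steps are added. The limit parameter stands in for explorationLimit().

```python
from collections import deque

SOURCE = "config_file (parameter)"
SINK = "f (argument to pickle.load)"

# Hypothetical flow edges with only default steps: nothing leads into
# config_file.name, and nothing leads from open()'s argument to its result.
EDGES = {
    SOURCE: ["config_file (object of .name read)"],
    "config_file.name": [],
    "open(config_file.name, 'rb')": [SINK],
}


def explore(edges, start, limit):
    """Bounded BFS: map each node reachable from `start` to its distance,
    never expanding a node that is already `limit` steps away."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] >= limit:
            continue
        for succ in edges.get(node, []):
            if succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)
    return dist


def reverse_edges(edges):
    """Flip every edge so that exploring from a sink walks flow backwards."""
    flipped = {}
    for src, succs in edges.items():
        for succ in succs:
            flipped.setdefault(succ, []).append(src)
    return flipped


def frontier(edges, explored):
    """Explored nodes with no outgoing edges: where the flow stops."""
    return sorted(node for node in explored if not edges.get(node))


forward = explore(EDGES, SOURCE, limit=10)
backward = explore(reverse_edges(EDGES), SINK, limit=10)
print(frontier(EDGES, forward))  # ['config_file (object of .name read)']
print(sorted(backward))
```

The forward frontier ends at config_file, which mirrors the result reported below. The reverse search stops at open()'s result, so neither direction bridges the gap on its own.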

The CodeQL Community Packs include public templates for partial path graph queries for most languages. See, for example, the template for Python.

Here's how we can write a forward partial path graph query for the current problem:

/**
 * @name Gradio Button partial path graph
 * @description This query tracks data flow from inputs passed to a Gradio's Button component to any sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/3
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking

// import MyFlow::PathGraph
import PartialFlow::PartialPathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
        n = API::moduleImport("gradio").getMember("Button").getReturn()
        .getMember("click").getACall() |
        this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) { exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput()) }

}

module MyFlow = TaintTracking::Global<MyConfig>;
int explorationLimit() { result = 10 }
module PartialFlow = MyFlow::FlowExplorationFwd<explorationLimit/0>;

from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode sink
where PartialFlow::partialFlow(source, sink, _)
select sink.getNode(), source, sink, "Partial Graph $@.", source.getNode(), "user-provided value."

What changed:

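We commented out import MyFlow::PathGraph and imported PartialFlow::PartialPathGraph instead.
We set explorationLimit() to 10, which bounds how far the exploration goes. This is especially useful in large codebases with complex flows.
We used FlowExplorationFwd to create a PartialFlow module, which means we track flow from the specified sources to any node. To start from a sink and track back to any source, we would use FlowExplorationRev and make small changes to the query itself; see the template for FlowExplorationRev.
Finally, we changed the from-where-select part of the query to use PartialFlow::PartialPathNode and the PartialFlow::partialFlow predicate.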

Running the query gives us one result. It ends at config_file on the line "with open(config_file.name, 'rb') as f:". This means CodeQL did not propagate to the name attribute in config_file.name.

Here, config_file is an instance of gr.File, which has a name attribute that stores the path to the uploaded file.

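In general, when an object is tainted, we can't tell whether all of its attributes are tainted too, so by default CodeQL doesn't propagate taint to an object's attributes. We need to help taint propagate from the object to its name attribute by writing a taint step.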

Taint steps

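The quickest approach, though not the prettiest, is to write a taint step that propagates from any object to that object's name attribute. Naturally, we wouldn't want this in a production CodeQL query, because it can lead to false positives. For our use case it's fine, since we're writing the query for security research.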

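We add the taint step to our taint tracking configuration with the isAdditionalFlowStep predicate. This step lets CodeQL propagate to any read of a name attribute. We specify the two nodes we want to connect, nodeFrom and nodeTo, and how they are connected: nodeFrom is the object whose name attribute is read, and nodeTo is the node representing the attribute read itself.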

predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(DataFlow::AttrRead attr |
        attr.accesses(nodeFrom, "name")
        and nodeTo = attr
    )
}
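Continuing with a toy model, an additional flow step is just an extra edge that the reachability search may follow. The model uses hypothetical node names and plain Python, not CodeQL semantics. Adding an edge from the object to its name read lets taint reach config_file.name. The sink stays unreachable, because nothing yet connects open()'s argument to its result.

```python
from collections import deque

SOURCE = "config_file (parameter)"
SINK = "f (argument to pickle.load)"

# Hypothetical default flow edges for the minimal example.
BASE_EDGES = {
    SOURCE: ["config_file (object of .name read)"],
    "open(config_file.name, 'rb')": [SINK],
}

# Toy counterpart of the step above: object -> read of its `name` attribute.
NAME_ATTR_READ = [("config_file (object of .name read)", "config_file.name")]


def reachable(source, extra_steps=()):
    """All nodes reachable from `source` through the default edges plus any
    extra (nodeFrom, nodeTo) pairs, the way isAdditionalFlowStep adds edges."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        extra = [dst for src, dst in extra_steps if src == node]
        for succ in BASE_EDGES.get(node, []) + extra:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


before = reachable(SOURCE)
after = reachable(SOURCE, NAME_ATTR_READ)
print("config_file.name" in before, "config_file.name" in after)  # False True
print(SINK in after)  # False
```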

Let's move it into a separate predicate for easier testing and plug it into our partial path graph query.

/**
 * @name Gradio Button partial path graph
 * @description This query tracks data flow from Gradio's Button component to any sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/4
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking

// import MyFlow::PathGraph
import PartialFlow::PartialPathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
        n = API::moduleImport("gradio").getMember("Button").getReturn()
        .getMember("click").getACall() |
        this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of its `name` attribute
    exists(DataFlow::AttrRead attr |
      attr.accesses(nodeFrom, "name")
      and nodeTo = attr
    )
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) { exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput()) }

    predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    nameAttrRead(nodeFrom, nodeTo)
    }
}

module MyFlow = TaintTracking::Global<MyConfig>;
int explorationLimit() { result = 10 }
module PartialFlow = MyFlow::FlowExplorationFwd<explorationLimit/0>;

from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode sink
where PartialFlow::partialFlow(source, sink, _)
select sink.getNode(), source, sink, "Partial Graph $@.", source.getNode(), "user-provided value."

Running the query gives us two results. In the second path, taint reaches config_file.name but goes no further. What happened?

Another taint step?

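This particular piece of code turns out to be a bit of a special case. As mentioned earlier, the vulnerability is essentially "second-order": we first upload a malicious file, and the application then loads that locally stored file. In cases like this we usually consider the file's path to be tainted, not the file's contents, so CodeQL normally doesn't propagate here. In our Gradio case, however, we do control the file being loaded.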

That's why we need another taint step, from config_file.name to open(config_file.name, 'rb').

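We can write a predicate that propagates from the argument of open() to the result of open(). While we're at it, it can also propagate from the argument of os.open to the os.open call.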

predicate osOpenStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects the argument to `open()` to the result of `open()`
    // And argument to `os.open()` to the result of `os.open()`
    exists(API::CallNode call |
        call = API::moduleImport("os").getMember("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
    or
    exists(API::CallNode call |
        call = API::builtin("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
}

We can then add this second taint step to isAdditionalFlowStep.

predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    nameAttrRead(nodeFrom, nodeTo)
    or
    osOpenStep(nodeFrom, nodeTo)
}
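The same toy model shows how the combined isAdditionalFlowStep behaves: the search may follow an edge if either of the two step predicates produces it. The model uses hypothetical node names and plain Python, not CodeQL semantics. With both extra edges, the source finally reaches the sink, and the search can return one complete path.

```python
from collections import deque

SOURCE = "config_file (parameter)"
SINK = "f (argument to pickle.load)"

# Hypothetical default flow edges for the minimal example.
BASE_EDGES = {
    SOURCE: ["config_file (object of .name read)"],
    "open(config_file.name, 'rb')": [SINK],
}

# Toy counterparts of the two predicates, as (nodeFrom, nodeTo) pairs.
NAME_ATTR_READ = {("config_file (object of .name read)", "config_file.name")}
OS_OPEN_STEP = {("config_file.name", "open(config_file.name, 'rb')")}


def additional_steps(node):
    """Mirrors `nameAttrRead(node, result) or osOpenStep(node, result)`."""
    return [dst for src, dst in NAME_ATTR_READ | OS_OPEN_STEP if src == node]


def find_path(source, sink):
    """Breadth-first search returning one source-to-sink path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in BASE_EDGES.get(node, []) + additional_steps(node):
            if succ not in parents:
                parents[succ] = node
                queue.append(succ)
    return None


path = find_path(SOURCE, SINK)
print(" -> ".join(path))
```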

Let's add both taint steps to the final query and turn it back into a regular taint tracking query.

/**
 * @name Gradio File Input Flow
 * @description This query tracks data flow from Gradio's Button component to a Decoding sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/5
 */

import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking

import MyFlow::PathGraph

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
        n = API::moduleImport("gradio").getMember("Button").getReturn()
        .getMember("click").getACall() |
        this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of its `name` attribute
    exists(DataFlow::AttrRead attr |
      attr.accesses(nodeFrom, "name")
      and nodeTo = attr
    )
}

predicate osOpenStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects the argument to `open()` to the result of `open()`
    // And argument to `os.open()` to the result of `os.open()`
    exists(API::CallNode call |
        call = API::moduleImport("os").getMember("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
    or
    exists(API::CallNode call |
        call = API::builtin("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
}

private module MyConfig implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

    predicate isSink(DataFlow::Node sink) {
        exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput()) }

    predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
        nameAttrRead(nodeFrom, nodeTo)
        or
        osOpenStep(nodeFrom, nodeTo)
        }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Data Flow from a Gradio source to decoding"

Running the query gives one result: the vulnerability we've been looking for!

Prettier taint steps

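Note that the CodeQL in this section is very specific to Gradio, and you're unlikely to encounter similar modeling in other frameworks. What follows is a more advanced version of the previous taint step. I've included it for readers who want to go deeper and write a more maintainable solution to this problem. As a security researcher, you probably won't need to write CodeQL this fine-grained, but if you use CodeQL at work, this section may come in handy.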

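As we mentioned, a taint step that propagates taint through a name attribute read on any object is a hacky solution. Not every object whose name attribute is read leads to a vulnerability. We'd like to restrict the taint step so it only propagates in cases like this one, that is, only for the gr.File type.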

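But we run into a problem. The Gradio source is modeled as any parameter of the function passed to the gr.Button.click event handler, so CodeQL doesn't know the type of any given parameter. As a result, we can't simply write a taint step that checks whether the source is a gr.File before propagating to the name attribute.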

We have to "look back" to where the source was instantiated, check its type, and then connect that object to the name attribute read.

Recall our minimal code example.

import pickle
import gradio as gr

def load_config_from_file(config_file):
    """Load settings from a UUID.pkl file."""
    try:
        with open(config_file.name, 'rb') as f:
            settings = pickle.load(f)
        return settings
    except Exception as e:
        return f"Error loading configuration: {str(e)}"

with gr.Blocks(title="Configuration Loader") as demo:
    config_file_input = gr.File(label="Load Config File")

    load_config_button = gr.Button("Load Existing Config From File", variant="primary")

    config_status = gr.Textbox(label="Status")

    load_config_button.click(
        fn=load_config_from_file,
        inputs=[config_file_input],
        outputs=[config_status]
    )

demo.launch()

Taint steps work by creating an edge (a connection) between two specified nodes. In our case, we want to connect two sets of nodes on the same path.

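First, we want CodeQL to connect the variable passed in inputs (here, config_file_input), for example in gr.Button.click, to the parameter config_file of the load_config_from_file function. That way CodeQL can trace back to the instantiation, config_file_input = gr.File(label="Load Config File").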

Second, we want CodeQL to propagate from the nodes we've confirmed to be of type gr.File to the places where their name attribute is read.
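The first connection relies on how the handler's parameters are paired with inputs: as described above, the i-th component in inputs corresponds to the i-th parameter of fn. The toy Python model below assumes exactly that positional pairing. Component, click and pair_inputs_with_parameters are hypothetical stand-ins, not Gradio's real implementation, which also handles preprocessing and outputs.

```python
import inspect


class Component:
    """Hypothetical stand-in for a Gradio input component such as gr.File."""

    def __init__(self, kind, value):
        self.kind = kind
        self.value = value


def click(fn, inputs):
    """Toy event dispatch: pass inputs[i].value as the i-th positional argument."""
    return fn(*[component.value for component in inputs])


def pair_inputs_with_parameters(fn, inputs):
    """(i, parameter name, component kind) triples, analogous to pairing element
    i of the `inputs` list with parameter i of `fn` in the CodeQL step."""
    params = list(inspect.signature(fn).parameters)
    return [(i, params[i], component.kind) for i, component in enumerate(inputs)]


def load_config_from_file(config_file):
    return f"would open {config_file}"


config_file_input = Component("gr.File", "uploaded.pkl")
pairs = pair_inputs_with_parameters(load_config_from_file, [config_file_input])
print(pairs)  # [(0, 'config_file', 'gr.File')]
print(click(load_config_from_file, [config_file_input]))  # would open uploaded.pkl
```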

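Conveniently, I've already written a taint step called ListTaintStep that can trace back to an instantiation, and I even wrote a section about it in a previous CodeQL zero to hero post. We can reuse that logic in our query by modifying the nameAttrRead predicate.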

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of its `name` attribute
    exists(DataFlow::AttrRead attr |
      attr.accesses(nodeFrom, "name")
      and nodeTo = attr
    )
    and
    exists(API::CallNode node, int i, DataFlow::Node n1, DataFlow::Node n2 |
        node = API::moduleImport("gradio").getAMember().getReturn().getAMember().getACall() and
        n2 = node.getParameter(0, "fn").getParameter(i).asSource()
        and n1.asCfgNode() =
          node.getParameter(1, "inputs").asSink().asCfgNode().(ListNode).getElement(i)
        and n1.getALocalSource() = API::moduleImport("gradio").getMember("File").getReturn().asSource()
        and (DataFlow::localFlow(n2, nodeFrom) or DataFlow::localFlow(nodeTo, n1))
        )
}

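As before, the taint step connects any object to a read of that object's name attribute. It then looks for the function passed as fn and the variables passed in inputs (for example, in gr.Button.click). Using the integer i to track each variable's position, it connects the variable in inputs to the corresponding parameter of the fn function.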

Then, with:

n1.getALocalSource()
        = API::moduleImport("gradio").getMember("File").getReturn().asSource()

we check that the input node n1 is of type gr.File, that is, that its local source is the result of a gr.File(...) call.

and (DataFlow::localFlow(n2, nodeFrom) or DataFlow::localFlow(nodeTo, n1))

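Finally, we check for local flow (zero or more local steps) in one of two places:
between the fn parameter n2 and the object nodeFrom whose name attribute is read;
or between the specific name attribute read nodeTo and the variable n1 passed in inputs to gr.Button.click.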

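What we've done is take two taint steps, each of which connects (creates edges between) two sets of nodes, and join them through local flow into a single taint step. We combine them because neither condition makes sense without the other. We use localFlow because several steps may separate the fn parameter (connected to the variable passed in inputs to gr.Button.click) from the later read of the name attribute on that object. localFlow lets us bridge the two.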

It looks complicated, but it follows naturally from how directed graphs work.
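One way to see the directed-graph reasoning is to build the restricted step out of its parts in plain Python. This is a toy model with hypothetical node names, not CodeQL semantics. local_flow is a reflexive-transitive closure over hand-written local edges. The step from an object to its name read is allowed only when the object is locally reached from a handler parameter whose matching input was created by gr.File. The model covers only the first disjunct, localFlow(n2, nodeFrom). A second, hypothetical handler parameter fed by a non-File component shows a name read that is not connected.

```python
# Hypothetical facts: (handler parameter n2, local source of its matching input n1).
PARAM_INPUT_SOURCES = [
    ("config_file (parameter)", "gr.File(...)"),
    ("other (parameter)", "<non-File component>(...)"),
]

# Hypothetical local flow edges inside the handlers.
LOCAL_EDGES = {
    "config_file (parameter)": ["config_file (object of .name read)"],
    "other (parameter)": ["other (object of .name read)"],
}

# Every read of a `name` attribute: (object nodeFrom, attribute read nodeTo).
NAME_READS = {
    ("config_file (object of .name read)", "config_file.name"),
    ("other (object of .name read)", "other.name"),
}


def local_flow(src, dst):
    """Zero or more LOCAL_EDGES steps from src to dst (toy DataFlow::localFlow)."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        for succ in LOCAL_EDGES.get(node, []):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return False


def name_attr_read(node_from, node_to):
    """Object -> its `name` read, only if the object is locally reached from a
    parameter whose input came from gr.File (models localFlow(n2, nodeFrom))."""
    if (node_from, node_to) not in NAME_READS:
        return False
    return any(
        input_source == "gr.File(...)" and local_flow(param, node_from)
        for param, input_source in PARAM_INPUT_SOURCES
    )


print(name_attr_read("config_file (object of .name read)", "config_file.name"))  # True
print(name_attr_read("other (object of .name read)", "other.name"))  # False
```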

The full CodeQL query:

/**
 * @name Gradio File Input Flow
 * @description This query tracks data flow from Gradio's Button component to a Decoding sink.
 * @kind path-problem
 * @problem.severity warning
 * @id 5/6
 */

import python
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.Concepts
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.ApiGraphs

class GradioButton extends RemoteFlowSource::Range {
    GradioButton() {
        exists(API::CallNode n |
        n = API::moduleImport("gradio").getMember("Button").getReturn()
        .getMember("click").getACall() |
        this = n.getParameter(0, "fn").getParameter(_).asSource())
    }

    override string getSourceType() { result = "Gradio untrusted input" }
}

predicate nameAttrRead(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    // Connects an object to a read of its `name` attribute
    exists(DataFlow::AttrRead attr |
      attr.accesses(nodeFrom, "name")
      and nodeTo = attr
    )
    and
    exists(API::CallNode node, int i, DataFlow::Node n1, DataFlow::Node n2 |
        node = API::moduleImport("gradio").getAMember().getReturn().getAMember().getACall() and
        n2 = node.getParameter(0, "fn").getParameter(i).asSource()
        and n1.asCfgNode() =
          node.getParameter(1, "inputs").asSink().asCfgNode().(ListNode).getElement(i)
        and n1.getALocalSource() = API::moduleImport("gradio").getMember("File").getReturn().asSource()
        and (DataFlow::localFlow(n2, nodeFrom) or DataFlow::localFlow(nodeTo, n1))
        )
}

predicate osOpenStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    exists(API::CallNode call |
        call = API::moduleImport("os").getMember("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
    or
    exists(API::CallNode call |
        call = API::builtin("open").getACall() and
        nodeFrom = call.getArg(0) and
        nodeTo = call)
}

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof GradioButton }

  predicate isSink(DataFlow::Node sink) {
    exists(Decoding d | d.mayExecuteInput() | sink = d.getAnInput())
  }

  predicate isAdditionalFlowStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
    nameAttrRead(nodeFrom, nodeTo)
    or
    osOpenStep(nodeFrom, nodeTo)
   }
}

import MyFlow::PathGraph

module MyFlow = TaintTracking::Global<MyConfig>;

from MyFlow::PathNode source, MyFlow::PathNode sink
where MyFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Data Flow from a Gradio source to decoding"

Running the query returns the full path from gr.File to pickle.load(f).

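A taint step in this form could be contributed upstream to CodeQL. However, it is very specific: it makes sense for some vulnerabilities and not others. For example, it works for the unsafe deserialization vulnerability described in this article, but not for path injection. This is a "second-order" vulnerability: we control the uploaded file, but not its path (stored in "name"). For a path injection query with a sink like open(file.name, 'r'), this step would produce a false positive.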

Conclusion

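Some of the taint tracking questions we get on the GHSL Slack can be challenging. Cases like this one don't come up often, but when they do, they're good candidates for sharing lessons learned in blog posts like this one.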

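I hope my story of chasing taint helps you debug your own queries. If you still have problems with your query after trying the tips in this post, feel free to ask for help in our public GitHub Security Lab Slack instance or in the github/codeql discussions.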
