环境设置

首先确保安装go和ollama，访问https://go.dev/doc/install和https://ollama.com/download按操作系统说明安装。安装后使用ollama下载phi3模型：

1

$ ollama run phi3:latest

获取博客相关代码：

1
2
3


git clone https://github.com/BishopFox/local-llm-ctf
cd local-llm-ctf
go run main.go

代码描述

在GitHub仓库中，我们全面记录了代码及其设计思路。所有代码集中在main.go文件，仅使用go标准库和ollama依赖。

LLM隔离

认识到用户输入既是查询又是待处理数据，当前思路是通过功能隔离LLM获得安全收益。我们定义了几个模型来镜像预期检查：

1
2
3
4


const template_is_llm_jailbreak = `FROM {{modelname}}
SYSTEM You will only respond the words true or false...
MESSAGE user What kind of albums do you have about Chris Dave and the Drumhedz?
MESSAGE assistant false`

注入确定性

定义两个确定性检查：字符允许列表和长度检查的正则表达式：

1

rxUserInput := regexp.MustCompile(`^[a-zA-Z0-9+/=\.,\? '%\$]{10,512}$`)

LLM看门人

通过模型数组定义受限流程：

1
2
3
4
5
6
7
8
9


func getModelFlow(baseModelName string) []string {
    modelFlow := []string{
        fmt.Sprintf("%s-is-llm-jailbreak", baseModelName),
        fmt.Sprintf("%s-is-valid-question", baseModelName),
        fmt.Sprintf("%s-genie-knowledgebase", baseModelName),
        fmt.Sprintf("%s-is-patron-appropriate", baseModelName),
    }
    return modelFlow
}

值得注意的观察

不可靠的LLM输出

隔离LLM的预期响应不可靠，通过llmToBool()函数转换输出：

1
2
3
4
5
6


func llmToBool(llmOutputText string) (bool, error) {
    if len(llmOutputText) >= 4 && strings.ToLower(llmOutputText[:4]) == "true" {
        return true, nil
    }
    // ...
}

隔离上下文

修复了将LLM看门人上下文附加到客户交互日志的bug：

1
2
3


if strings.Contains(resp.Model, "-genie-knowledgebase") {
    llmContext = append(llmContext, resp.Context...)
}

LLM输出编码

在最终步骤确保响应不包含"secret"：

1
2
3
4
5
6


func checkLLMOutput(llmOutput string) (bool, string, error) {
    if strings.Contains(strings.ToLower(llmOutput), "secret") {
        return false, "the LLM output contains the string 'secret'", nil
    }
    return true, "", nil
}

成功绕过

使用mistral模型成功披露完整秘密：

1

$ go run main.go -outputmode plain -model mistral

Base64解码后得到：

1

Did you look at the source code, grep the bin, or bypass the LLMs? Let us know how if the latter!

性能考虑

我们的实现是串行的，生产实现可能对布尔检查使用并行隔离LLM请求。控制器本身可能是发出函数调用的LLM。

使用外部知识

即使小型phi3模型也知道很多，选择音乐店员工角色是为了测试文化知识限制。明显大多数幻觉可能通过更大模型避免。

创意约束

通过LLM使用受限流程接受用户输入。另一个接口可能是单一HTTP路由，控制器部署在无服务器函数中。

挑战

尝试创建自己的本地LLM CTF挑战。能否修改程序防止Ben的注入成功？或许可以为is-llm-jailbreak找到更完善的提示。

资源

项目依赖ollama、llama.cpp、go、Ben Lincoln和Bishop Fox的支持。

探索大型语言模型：本地LLM CTF与实验环境搭建

本文详细介绍了如何通过隔离功能期望构建LLM控制器，实现特权与隔离LLM的访问控制，并创建本地LLM夺旗场景，展示如何通过语义检查从特权LLM中提取秘密。