Wingardium Trivia-osa! An on-device Sorting Hat bot with Gemma, Ollama, USearch and RETSim

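In this post, we explore how to build your own on-device, LLM-powered AI agent that uses RAG (Retrieval Augmented Generation) to correctly answer questions about characters from the Harry Potter wizarding world. To do this, we combine Ollama as the local inference engine, Gemma as the local LLM, our newly released RETSim ultra-fast near-duplicate text embeddings, and USearch for efficient indexing and retrieval. Readers who want to jump straight to the code can find the notebook on the UniSim GitHub.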

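Thanks to Ollama, running a local LLM is very easy when you want to experiment quickly or don't want to depend on a paid API. These models are much smaller (around 8 billion parameters, versus tens to hundreds of billions for server models), so they run on a regular laptop, and they perform surprisingly well. For example, our own LLM, Gemma, was updated at this week's Google I/O and shows very strong performance across benchmarks.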

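However, a smaller model also knows and remembers less about the world, so it is worse at answering questions. It is also worse at answering questions about unique datasets that clearly were not part of its training data. The main way to address this data-freshness problem is Retrieval Augmented Generation (RAG). As illustrated in the figure, RAG pairs the LLM with a retrieval system that quickly finds the data most relevant to the user's query and adds it to the LLM's context, so the model can use that data to answer the question.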
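The retrieve, augment, generate loop described above fits in a few lines. The sketch below is a generic, hypothetical illustration with toy stand-ins: a keyword retriever over two hand-written records and an "LLM" that simply echoes its prompt, so you can see exactly what the model would receive. In the actual pipeline, retrieval is RETSim plus USearch and the LLM is Gemma served by Ollama; rag_answer, keyword_retrieve and echo_llm are illustrative names only.

```python
from typing import Callable, Dict, List


def rag_answer(
    question: str,
    retrieve: Callable[[str], List[Dict[str, str]]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve records relevant to the question, add them to the prompt, then ask the LLM."""
    records = retrieve(question)
    context = "\n".join(str(r) for r in records)
    prompt = (
        f"Using the following data:\n{context}\n"
        f"answer the following question: '{question}'"
    )
    return llm(prompt)


# Toy knowledge base (hand-written for illustration)
DOCS = [
    {"Name": "Aberforth Dumbledore", "Profession": "Barman"},
    {"Name": "Newt Scamander", "Profession": "Magizoologist"},
]


def keyword_retrieve(question: str) -> List[Dict[str, str]]:
    """Toy retriever: return records whose name shares a word with the question."""
    words = set(question.lower().replace("?", "").split())
    return [d for d in DOCS if words & set(d["Name"].lower().split())]


def echo_llm(prompt: str) -> str:
    """Toy 'LLM' that returns its prompt, to show what the model would see."""
    return prompt


print(rag_answer("What is Aberforth job?", keyword_retrieve, echo_llm))
```

Because retrieval and generation are passed in as plain functions, swapping the toy retriever for a real embedding index, or the echo for a real model call, leaves the loop itself unchanged.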

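So far, most text embeddings used for RAG focus on semantic similarity, are not resilient to typos, and are fairly large (hundreds of millions of parameters). That makes them slow on device, and they are not optimized for finding exact or near-duplicate records. Over the past two years, we worked with Marina, Owen and the team on a new kind of text similarity embedding that addresses these shortcomings. The result is RETSim (ICLR '24), a very small custom transformer model (fewer than 1 million parameters) that significantly outperforms other embeddings on near-duplicate text similarity and retrieval.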

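This combination of speed and typo handling makes RETSim an ideal embedding to pair with Gemma and Ollama in a RAG pipeline that looks up data by customer name, address, or any other field likely to contain typos. Note that 20-30% of user queries typically contain typos, so this is not a problem you can ignore.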
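RETSim itself is a learned neural embedding. As a rough, hypothetical illustration of why typo-resilient matching matters, the sketch below uses Python's standard-library difflib as a stand-in (it is not RETSim): an exact substring lookup for the misspelled "dubldore" finds nothing, while a character-level similarity score still ranks the Dumbledore entries first. The name list is a toy example.

```python
from difflib import SequenceMatcher

names = ["albus dumbledore", "aberforth dumbledore", "hermione granger", "newt scamander"]
query = "dubldore"  # misspelled query

# Exact lookup: the typo means there is no hit at all
exact_hits = [name for name in names if query in name]

# Fuzzy lookup: score every name by character-level similarity and rank them
scores = {name: SequenceMatcher(None, query, name).ratio() for name in names}
ranked = sorted(names, key=scores.get, reverse=True)

print(exact_hits)  # []
print(ranked[:2])
```

SequenceMatcher compares strings pairwise and does not scale to large collections; that gap is what a compact learned embedding like RETSim, combined with a vector index, is meant to fill.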

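To show how all these pieces fit together into a working prototype, we will build an end-to-end RAG system that answers questions about characters from the Harry Potter wizarding world, using the Kaggle Harry Potter books characters dataset. The notebook with all the steps discussed here is available on the UniSim GitHub; RETSim is distributed through that package.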

Setup

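First, we install the packages we need: Ollama to run Gemma locally, and UniSim to index the data with RETSim and retrieve it with USearch. Note that the ollama pip package is only the Python client: the Ollama server itself must be installed separately and running locally.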

!pip install -U tqdm Iprogress unisim ollama tabulate

Next, we run a preflight check to make sure everything works, including downloading the latest version of Gemma if needed.

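import json

import ollama
import unisim

# Make sure Gemma is available through Ollama, otherwise pull it
MODEL = 'gemma'
try:
    ollama.show(MODEL)
except ollama.ResponseError as e:
    # ResponseError is raised by the server, e.g. when the model is not installed yet
    print(f"can't find {MODEL}: {e} installing it")
    ollama.pull(MODEL)
info = ollama.show(MODEL)
print(f"{MODEL.capitalize()} {info['details']['parameter_size']} loaded")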
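The preflight check above goes through the ollama Python client. To see what that client talks to, here is a hedged, standard-library-only sketch that queries the local Ollama server's REST API directly. It assumes the default address http://localhost:11434 and the /api/tags endpoint, which lists locally installed models; list_local_models and has_model are hypothetical helper names, not part of the notebook.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address


def list_local_models(base_url: str = OLLAMA_URL) -> list:
    """Return the names of locally installed models via Ollama's /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]


def has_model(model_names: list, model: str) -> bool:
    """Check whether a model is installed.

    An untagged name such as 'gemma' matches any tag ('gemma:latest', 'gemma:7b');
    a tagged name such as 'gemma:7b' must match exactly.
    """
    if ":" in model:
        return model in model_names
    return any(name.split(":")[0] == model for name in model_names)
```

With the server running, has_model(list_local_models(), "gemma") should return True once the pull has finished.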

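Then we check that UniSim/RETSim and USearch work correctly by testing string similarity on a typo. This capability is also useful for many other applications, such as record deduplication and dataset cleaning.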

VERBOSE = True  # interactive demo, so we want to see what is happening
txtsim = unisim.TextSim(verbose=VERBOSE)
# Check that RETSim scores a one-character typo as highly similar
sim_value = txtsim.similarity("Gemma", "Gemmaa")
if sim_value > 0.9:
    print(f"Similarity {sim_value} - TextSim works as intended")
else:
    print(f"Similarity {sim_value} - Something is very wrong with TextSim")

Test questions

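Before building the RAG, let's assess how much Gemma knows about the wizarding world by asking a few questions of increasing difficulty, with some typos sprinkled in to see how they affect the model's performance. I added a type next to each question to indicate what it tests.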

questions = [
             {"q":'Which School is Harry Potter part of?', 'type': 'basic fact'},
             {"q": 'Who is ermionne?', 'type': 'typo'},
             {"q": 'What is Aberforth job?', 'type': 'harder fact'},
             {"q": "what is dubldore job?", 'type': 'harder fact and typo'},
             {"q": 'Which school is  Nympadora from?', 'type': 'hard fact'},
]

Direct answer generation

Let's run these questions through Gemma via Ollama and see what kind of answers we get.

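print("[answers without retrieval]\n")
for q in questions:
    # Ask Gemma directly, without retrieval (generate() is the notebook's helper around Ollama)
    q['direct'] = generate(q['q'])
    a = q['direct'][:100].replace('\n', ' ')
    print(f"Q:{q['q']}? (type: {q['type']})")
    print(f"Direct answer: {a}..")
    print("")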
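The loop above relies on the notebook's generate() helper, whose body the post does not show. A minimal hypothetical version, assuming Ollama's default local address and its non-streaming /api/generate endpoint, could look like this (standard library only; build_generate_request is an illustrative name):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama install


def build_generate_request(
    prompt: str, model: str = "gemma", base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma") -> str:
    """Hypothetical stand-in for the notebook's generate(): return the model's full text reply."""
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=120) as resp:
        return json.load(resp)["response"]
```

With the ollama Python client already installed, ollama.generate(model=MODEL, prompt=prompt)['response'] does the same thing in one line. Setting "stream" to False makes the server return the whole reply as a single JSON object instead of a stream of partial chunks.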

Here are the answers we got from Gemma without any retrieval.

Which School is Harry Potter part of?? (type: basic fact)
Answer: Hogwarts School of Witchcraft and Wizardry is the school that Harry Potter attends…
This is not a great answer: Hogwarts is correct, but Gryffindor is missing.

Who is ermionne?? (type: typo)
Answer: Ermionne is a French fashion designer known for her colorful and playful designs, primarily focused …
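Here the model is completely thrown off by the typo because the query does not give it enough context. Stating in the prompt that the question is specifically about Harry Potter might help (feel free to experiment with prompt tuning!).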

What is Aberforth job?? (type: harder fact)
Answer: Aberforth is a fictional character in the Harry Potter series of books and films. He does not have a…
Here the model cannot answer, because this requires being very familiar with all of Harry Potter's minor characters.

What is dubldore job?? (type: harder fact and typo)
Answer: Dublador is a voice actor who provides voices for characters in animated films, television shows…
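Here the model tries to correct the typo but gets it wrong. RETSim might do better, since it is explicitly trained to be robust to common typos in its embedding space. As with the Hermione question, stating in the prompt that the question is about Harry Potter might help.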

Which school is Nympadora from?? (type: hard fact)
Answer: Nympadora is a character from the book series “Harry Potter” and did not attend any school. She is a…
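As with the Aberforth question, the model knows that Nymphadora Tonks is a character from this world, but it lacks the knowledge needed to answer and gives a wrong answer.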

Indexing the Harry Potter characters data

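The first step in building a RAG pipeline that gives the LLM extra context is to load the data, compute the embeddings, and index them. We simply index the character names with RETSim embeddings and, at retrieval time, return the data associated with the matched name to help the model.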

import json

# Load the Kaggle characters data
with open('data/harry_potter_characters.json') as f:
    raw_data = json.load(f)
CHARACTERS_INFO = {}  # keyed by normalized name to deduplicate the data
for d in raw_data:
    name = d['Name'].lower().strip()
    CHARACTERS_INFO[name] = d
print(f'{len(CHARACTERS_INFO)} characters loaded from harry_potter_characters.json')

# Index the names with TextSim (RETSim embeddings + USearch)
txtsim.reset_index()  # clear the index in case this cell is run multiple times
idx = txtsim.add(list(CHARACTERS_INFO.keys()))
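To make the indexing step concrete without the notebook's dependencies, here is a toy sketch of the same dedup-then-index flow. Character-trigram count vectors stand in for RETSim embeddings, and an exact brute-force scan stands in for USearch, which uses an approximate nearest-neighbor graph so it does not have to scan every vector. BruteForceIndex, char_ngrams and the toy records are illustrative assumptions, not UniSim APIs.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram counts of a lowercased, space-padded string (toy embedding)."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[g] for g, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class BruteForceIndex:
    """Exact top-k search by scanning every stored vector (toy stand-in for an ANN index)."""

    def __init__(self) -> None:
        self.keys: list = []
        self.vectors: list = []

    def add(self, texts) -> None:
        for text in texts:
            self.keys.append(text)
            self.vectors.append(char_ngrams(text))

    def search(self, query: str, k: int = 3) -> list:
        """Return up to k (similarity, key) pairs, best first."""
        q = char_ngrams(query)
        scored = [(cosine(q, v), key) for key, v in zip(self.keys, self.vectors)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Toy records: the second duplicates the first once the name is normalized
raw_data = [
    {"Name": "Newt Scamander", "Profession": "Magizoologist"},
    {"Name": "  newt scamander ", "Profession": "Magizoologist"},
    {"Name": "Hermione Granger"},
]

characters = {}
for record in raw_data:
    # Same dedup rule as above: normalized name as key, later records overwrite earlier ones
    characters[record["Name"].lower().strip()] = record

index = BruteForceIndex()
index.add(list(characters.keys()))
print(len(characters))  # 2
print(index.search("new scamramber", k=2))
```

A brute-force scan is perfectly adequate for a dataset of this size; an approximate index like USearch pays off once the collection grows much larger.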

Let's quickly test the index on one of my favorite characters, Newt Scamander, with a typo in his name.

# lookup() is a notebook helper that queries the index and returns the matching records
r = lookup("New Scamramber", verbose=True)   # verbose shows all the matches
print('')
print('[best lookup result]')
print(f"name: {r[0]['Name']} / School: {r[0]['School']} / Profession: {r[0]['Profession']}")
print(f"Description: {r[0]['Descr']}")

Query 0: “new scamramber”
Most similar matches:

idx	is_match	similarity	text
1005	False	0.81	newt scamander
1006	False	0.71	newt scamander’s mother
1172	False	0.63	sam

[best lookup result]
name: Newt Scamander / School: Hogwarts - Hufflepuff / Profession: Magizoologist
Description: Newton “Newt” Scamander is a famous magizoologist and author of Fantastic Beasts and Where To Find Them (PS5) as well as a number of other books. Now retired, Scamander lives in Dorset with his wife Porpentina (FB). He received the Order of Merlin, second…

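The results look great: the RETSim embeddings work as expected, so from now on we will use only the first result and pass its data into the LLM context before answering the user's question.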
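Only the best match is needed here, and small local models tend to work best with short, focused context. A hypothetical helper that turns the top lookup result into a compact context string might look like this; the field names come from the lookup output shown above, while build_context and the truncation length are assumptions.

```python
from typing import Dict, List

# Fields displayed in the lookup output above
CONTEXT_FIELDS = ("Name", "School", "Profession", "Descr")


def build_context(results: List[Dict[str, str]], max_descr: int = 300) -> str:
    """Keep only the best match and a few fields, truncating the long description."""
    if not results:
        return ""
    best = results[0]
    parts = []
    for field in CONTEXT_FIELDS:
        value = str(best.get(field, "")).strip()
        if field == "Descr" and len(value) > max_descr:
            value = value[:max_descr].rstrip() + "..."
        if value:  # skip fields that are missing or empty
            parts.append(f"{field}: {value}")
    return " / ".join(parts)


print(build_context([{"Name": "Newt Scamander", "School": "Hogwarts - Hufflepuff",
                      "Profession": "Magizoologist"}]))
# Name: Newt Scamander / School: Hogwarts - Hufflepuff / Profession: Magizoologist
```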

RAG implementation

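The RAG implementation has four steps:

1. Ask Gemma for the character's name so we can look it up. Since we have a powerful LLM at hand, using it to extract the named entity is, in my view, the simplest and most robust approach.
2. Retrieve the closest matching information from our UniSim index.
3. Replace the name in the user's query with the looked-up name to fix typos (this is very important and often overlooked), then inject the retrieved information into the query.
4. Answer the user's question and impress them with our extensive knowledge of the Harry Potter wizarding world!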
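Step 3 is the easiest one to get subtly wrong, so here is a hedged sketch of it in isolation. replace_name is a hypothetical helper (the rag() function in this post uses a plain str.replace on the lowercased prompt instead); it performs a case-insensitive, literal replacement and leaves the query untouched when the name Gemma returned does not appear verbatim in it.

```python
import re


def replace_name(query: str, extracted: str, canonical: str) -> str:
    """Swap the name the LLM extracted from the query for the canonical name from the index.

    Matching is case-insensitive and literal (the extracted name is regex-escaped).
    If the extracted name does not appear in the query, the query is returned unchanged.
    """
    extracted = extracted.strip()
    if not extracted:
        return query
    pattern = re.compile(re.escape(extracted), re.IGNORECASE)
    # A function replacement stops backslashes in the canonical name being read as group refs
    return pattern.sub(lambda _match: canonical, query)


print(replace_name("What is dubldore job?", "Dubldore", "Albus Dumbledore"))
# What is Albus Dumbledore job?
```

The unchanged-query fallback matters: if Gemma "corrects" the spelling while extracting the name (for example returning "Hermione" for "ermionne"), there is nothing to replace, and the retrieved record in the augmented prompt has to carry the correction instead.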

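This translates into the following simple code; the helper functions are defined earlier in the notebook and available in the Colab.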

def rag(prompt: str, k: int = 5, threshold: float = 0.9, verbose: bool = False) -> str:
    # Normalize the prompt
    prompt = prompt.lower().strip()

    # Step 1: ask Gemma who the character is
    char_prompt = f"In the following sentence: '{prompt}' who is the subject? reply only with name."
    if verbose:
        print(f"Char prompt: {char_prompt}")
    character = generate(char_prompt).strip()
    if verbose:
        print(f"Character: '{character}'")

    # Step 2: look up the character in the RETSim/USearch index
    data = lookup(character, k=k, threshold=threshold, verbose=verbose)
    if not data:
        # Nothing retrieved: fall back to answering without extra context
        return generate(prompt)

    # Step 3: augment the prompt
    # Replace the (possibly misspelled) name in the prompt with the name found by the lookup
    prompt = prompt.replace(character.lower().strip(), data[0]['Name'].lower().strip())

    aug_prompt = f"Using the following data: {data} answer the following question: '{prompt}'. Don't mention your sources - just the answer."
    if verbose:
        print(f"Augmented prompt: {aug_prompt}")

    # Step 4: answer the user's question
    response = generate(aug_prompt)
    return response

print(rag(questions[-1]['q'], verbose=True))

RAG answers

Let's see our RAG in action and compare it with the direct-generation answers we got earlier.

Here are the answers we got from Gemma with retrieval:

Which School is Harry Potter part of?? (type: basic fact)
Direct Answer: Hogwarts School of Witchcraft and Wizardry is the school that Harry Potter attends…
RAG answer: Harry Potter is part of Hogwarts - Gryffindor.
Here the RAG answer is more precise: it adds Gryffindor, which is the expected answer. The answer is also more concise and to the point.

Who is ermionne?? (type: typo)
Direct Answer: Ermionne is a French fashion designer known for her colorful and playful designs, primarily focused ..
RAG answer: Hermione Granger is a resourceful, principled, and brilliant witch known for her academic prowess an
With the retrieved information, Gemma infers that the question is about Hermione from Harry Potter and answers correctly.

What is Aberforth job?? (type: harder fact)
Direct Answer: Aberforth is a fictional character in the Harry Potter series of books and films. He does not have a..
RAG answer: Aberforth was a barman.
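The LLM is able to use the retrieved information to make up for its missing knowledge, which is the fundamental benefit of RAG. This technique lets us enrich Gemma's reasoning with personalized data to provide more accurate and useful answers.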

What is dubldore job?? (type: harder fact and typo)
Direct Answer: Dublador is a voice actor who provides voices for characters in animated films, television shows..
RAG answer: Headmaster at Hogwarts School
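Again, RAG clearly improves the context and quality of the answer. The typo resilience of the RETSim embeddings works perfectly here, making them a great match for LLMs when building a RAG pipeline that looks up records and uses them to answer typo-ridden user queries.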

Which school is Nympadora from?? (type: hard fact)
Direct Answer: Nympadora is a character from the book series “Harry Potter” and did not attend any school. She is a..
RAG answer: Hogwarts - Hufflepuff
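With the retrieved data, Gemma becomes a master at answering questions about the wizarding world, despite having far fewer parameters than server-side LLMs!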

Conclusion

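In conclusion, this short post hopefully makes clear why retrieval augmentation is essential when building AI agents, especially on-device agents built on smaller models that know less about the world. It also highlights why RETSim's focus on near-duplicate matching and speed, rather than the more traditional semantic text similarity, makes it so useful for RAG, and will hopefully inspire you to use it in your own pipelines. We can't wait to see what you build with Gemma, Ollama, USearch and RETSim; please reach out to us on social media to stay in touch.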
