import spacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Add the chunking component to the pipeline
nlp.add_pipe("spacy_chunks", last=True, config={
    "chunking_method": "sentence",
    "chunk_size": 2,
    "overlap": 1,
    "truncate": True
})

# Process the text
text = "This is the first sentence. This is the second one. And here's the third. The fourth is here. And a fifth."
doc = nlp(text)

# Print the resulting chunks
print("Chunks:")
for i, chunk in enumerate(doc._.chunks, 1):
    print(f"Chunk {i}: {[sent.text for sent in chunk]}")
Output:
Chunks:
Chunk 1: ['This is the first sentence.', 'This is the second one.']
Chunk 2: ['This is the second one.', "And here's the third."]
Chunk 3: ["And here's the third.", 'The fourth is here.']
Chunk 4: ['The fourth is here.', 'And a fifth.']
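The output illustrates the sliding-window behavior: with `chunk_size=2` and `overlap=1`, each chunk shares one sentence with the next, and `truncate=True` drops a trailing chunk that cannot be filled to the full size (here, a fifth chunk containing only "And a fifth."). The same windowing logic can be sketched in plain Python, independent of spaCy; `chunk_sentences` below is a hypothetical helper written for illustration, not part of the spacy_chunks API:

```python
def chunk_sentences(sentences, chunk_size=2, overlap=1, truncate=True):
    """Group sentences into overlapping chunks using a sliding window."""
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(sentences), step):
        chunk = sentences[start:start + chunk_size]
        # With truncate=True, discard a final chunk that is shorter
        # than chunk_size (it would only repeat already-seen sentences
        # plus a partial window).
        if truncate and len(chunk) < chunk_size:
            break
        chunks.append(chunk)
    return chunks

sents = [
    "This is the first sentence.", "This is the second one.",
    "And here's the third.", "The fourth is here.", "And a fifth.",
]
for i, chunk in enumerate(chunk_sentences(sents), 1):
    print(f"Chunk {i}: {chunk}")
```

Running this reproduces the four chunks shown above; setting `truncate=False` would instead append a fifth, single-sentence chunk.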