Open
Description
Why does such a simple task go wrong? The first time I ran the program the extraction was correct, but after several more runs the output became chaotic. I'm using a locally deployed Ollama model, qwen:7b. In addition, the Chinese characters in the generated visualization are garbled. How can I solve these issues?
The code I'm using:
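import langextract as lx
import textwrap

# 1. Define the extraction task: tell the model what to extract and how (simplified)
prompt = "从文本中提取人名"  # "Extract person names from the text"

# 2. Provide examples for the model to follow, so extraction is more precise (simplified)
examples = [
    lx.data.ExampleData(
        text="张三是项目经理",  # "Zhang San is the project manager"
        extractions=[
            lx.data.Extraction(
                extraction_class="personName",
                extraction_text="张三",
            )
        ],
    )
]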
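
# 3. Prepare the input text to extract information from (simplified)
# "Li Si is responsible for technical development, Li Ming for project management"
input_text = "李四负责技术开发,李明负责项目管理"

# 4. Run the extraction (using the local Ollama model)
try:
    result = lx.extract(
        text_or_documents=input_text,
        prompt_description=prompt,
        examples=examples,
        language_model_type=lx.inference.OllamaLanguageModel,
        model_id="qwen:7b",  # the local model you just started
        model_url="http://localhost:11434",
        fence_output=False,
        use_schema_constraints=False,
    )
    print("✅ Model connection succeeded!")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("Please make sure the Ollama service is running")
    exit(1)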

# 5. Inspect the results
print("=== Extraction results ===")
for extraction in result.extractions:
    print(f"Class: {extraction.extraction_class}")
    print(f"Text: {extraction.extraction_text}")
    print(f"Attributes: {extraction.attributes}")
    print("-" * 30)
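
# 6. Visualize the results
print("\n🎨 Generating visualization...")

# Step 1: save the extraction results as a JSONL file
print("📁 Saving extraction results to file...")
lx.io.save_annotated_documents(
    [result],  # list of extraction results
    output_name="extraction_results.jsonl",  # output file name
    output_dir="./temp",  # output directory (temp)
)
print("✅ Saved to: temp/extraction_results.jsonl")

# Step 2: generate the HTML visualization
print("🌐 Generating HTML visualization...")
html_content = lx.visualize("temp/extraction_results.jsonl")  # build the visualization from the file
# In environments with IPython, lx.visualize may return an HTML object
# (with a .data attribute) rather than a plain string, so handle both.
html_text = html_content.data if hasattr(html_content, "data") else html_content
with open("temp/visualization.html", "w", encoding="utf-8") as f:
    f.write(html_text)
print("✅ Generated: temp/visualization.html")

print("\n🎉 Visualization complete!")
print("📂 Generated files:")
print("  - temp/extraction_results.jsonl (extraction result data)")
print("  - temp/visualization.html (visualization web page)")
print("\n💡 Open temp/visualization.html to view the visualization")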
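
To narrow down the two symptoms, here is a minimal sketch of two checks that can be run on top of the script above. Both helpers, `keep_grounded` and `add_charset_meta`, are hypothetical names and not part of langextract. The first assumes the "chaotic" outputs are strings the model made up, so it keeps only candidates that occur verbatim in the input text. The second assumes the garbling comes from the browser guessing the wrong encoding because the saved HTML has no charset declaration; I have not confirmed that this is the actual cause.

```python
import re
from typing import Iterable, List


def keep_grounded(extracted_texts: Iterable[str], source_text: str) -> List[str]:
    """Keep only non-empty extraction strings that occur verbatim in the source."""
    return [t for t in extracted_texts if t and t in source_text]


def add_charset_meta(html: str) -> str:
    """Prepend a UTF-8 charset declaration unless the HTML already declares a charset."""
    if re.search(r"<meta[^>]+charset", html, flags=re.IGNORECASE):
        return html
    return '<meta charset="utf-8">\n' + html


if __name__ == "__main__":
    source = "李四负责技术开发,李明负责项目管理"
    # "王五" and the empty string stand in for spurious model output.
    candidates = ["李四", "李明", "王五", ""]
    print(keep_grounded(candidates, source))  # ['李四', '李明']

    # A stand-in fragment; the real input would be the visualization HTML string.
    print(add_charset_meta("<div>李四</div>"))
```

In the script above, this would mean passing the `extraction_text` values through `keep_grounded` with `input_text`, and passing the HTML string through `add_charset_meta` before writing it to `temp/visualization.html`. If the output still varies between runs after filtering, the variation is more likely coming from the model's sampling than from the parsing step.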