How to ensure reliability #95

@hnsqls

Why does such a simple task go wrong? The first time I ran the program the extraction was correct, but after several more runs the output became chaotic. I'm using a locally deployed Ollama model, qwen:7b. In addition, the Chinese characters in the generated visualization are garbled. How can I solve these two issues?
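To put a number on "chaotic", it may help to repeat the same call several times and count how many distinct results come back. The following is only a minimal sketch under stated assumptions: `stability_report`, `agreement_rate`, and `run_once` are hypothetical names, not part of langextract, and the demo uses canned outputs instead of a real model call.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Pair = Tuple[str, str]  # (extraction_class, extraction_text)


def stability_report(run_once: Callable[[], Iterable[Pair]], runs: int = 5) -> Counter:
    """Call run_once() `runs` times and count how often each distinct result set appears."""
    outcomes: Counter = Counter()
    for _ in range(runs):
        # Sort and freeze each result so ordering differences do not count as a new outcome.
        outcomes[tuple(sorted(run_once()))] += 1
    return outcomes


def agreement_rate(outcomes: Counter) -> float:
    """Fraction of runs that produced the most common result set."""
    total = sum(outcomes.values())
    return outcomes.most_common(1)[0][1] / total if total else 0.0


if __name__ == "__main__":
    # Canned stand-in for a real extraction call (hypothetical data).
    fake_outputs = iter([
        [("personName", "李四"), ("personName", "李明")],
        [("personName", "李明"), ("personName", "李四")],  # same set, different order
        [("personName", "李四")],                          # one name missing
    ])
    report = stability_report(lambda: next(fake_outputs), runs=3)
    for outcome, count in report.most_common():
        print(count, outcome)
    print(f"agreement: {agreement_rate(report):.2f}")
```

With a real model, `run_once` would wrap the `lx.extract(...)` call from this post and return `[(e.extraction_class, e.extraction_text) for e in result.extractions]`. A low agreement rate would suggest the variation comes from the model's sampling rather than from the script itself.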

The script that reproduces the problem:

import langextract as lx
import textwrap


# 1. Define the extraction task: tell the model what to extract and how (simplified)
prompt = "从文本中提取人名"  # "Extract person names from the text"


# 2. Provide examples for the model to follow, so extraction is more accurate (simplified)
examples = [
    lx.data.ExampleData(
        text="张三是项目经理",
        extractions=[
            lx.data.Extraction(
                extraction_class="personName",
                extraction_text="张三"
            )
        ]
    )
]


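# 3. Prepare the input text to extract information from (simplified)
# Meaning: "Li Si is responsible for technical development, Li Ming for project management"
input_text = "李四负责技术开发,李明负责项目管理"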



# 4. Run the extraction (using the local Ollama model)
try:
    result = lx.extract(
        text_or_documents=input_text,
        prompt_description=prompt,
        examples=examples,
        language_model_type=lx.inference.OllamaLanguageModel,
        model_id="qwen:7b",  # the local model that was just started
        model_url="http://localhost:11434",
        fence_output=False,
        use_schema_constraints=False
    )

    print("✅ Model call succeeded!")
except Exception as e:
    # Any failure ends up here, not only connection errors
    print(f"❌ Extraction failed: {e}")
    print("Please make sure the Ollama service is running")
    raise SystemExit(1)


    
# 5. Inspect the results
print("=== Extraction results ===")
for extraction in result.extractions:
    print(f"Class: {extraction.extraction_class}")
    print(f"Text: {extraction.extraction_text}")
    print(f"Attributes: {extraction.attributes}")
    print("-" * 30)

# 6. Visualize the results
print("\n🎨 Generating visualization...")

# Step 1: save the extraction results as a JSONL file
print("📁 Saving extraction results to file...")
lx.io.save_annotated_documents(
    [result],                                # list of extraction results
    output_name="extraction_results.jsonl",  # output file name
    output_dir="./temp"                      # output directory
)
print("✅ Saved to: temp/extraction_results.jsonl")

# Step 2: generate the HTML visualization from the saved file
print("🌐 Generating HTML visualization...")
html_content = lx.visualize("temp/extraction_results.jsonl")
with open("temp/visualization.html", "w", encoding="utf-8") as f:
    f.write(html_content)
print("✅ Generated: temp/visualization.html")

print("\n🎉 Visualization complete!")
print("📂 Generated files:")
print("  - temp/extraction_results.jsonl (extraction result data)")
print("  - temp/visualization.html (visualization page)")
print("\n💡 Open temp/visualization.html to view the visualization")
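To narrow down where the garbled characters come from, one possible first step is to check whether the two generated files are valid UTF-8 and still contain the expected Chinese text. The helper below is a diagnostic sketch that uses only the standard library; `diagnose_text_file` is a hypothetical name, and it makes no assumption about how langextract writes or reads these files.

```python
from pathlib import Path


def diagnose_text_file(path: str, expected: str) -> dict:
    """Report whether a file is valid UTF-8, contains `expected`, and declares a charset."""
    raw = Path(path).read_bytes()
    try:
        text = raw.decode("utf-8")
        utf8_ok = True
    except UnicodeDecodeError:
        # Not UTF-8: decode leniently so the remaining checks can still run.
        text = raw.decode("utf-8", errors="replace")
        utf8_ok = False
    return {
        "utf8_ok": utf8_ok,
        "contains_expected": expected in text,
        # JSON writers may store non-ASCII as \uXXXX escapes, which is valid but not raw text.
        "contains_escaped": expected.encode("unicode_escape").decode("ascii") in text,
        # Only meaningful for HTML: without a charset declaration a browser may guess an encoding.
        "declares_charset": "charset" in text.lower(),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "good.html"
        good.write_text('<meta charset="utf-8"><p>李四</p>', encoding="utf-8")
        bad = Path(tmp) / "bad.html"
        bad.write_bytes("<p>李四</p>".encode("gbk"))  # simulates a legacy-encoded file
        print(diagnose_text_file(str(good), "李四"))
        print(diagnose_text_file(str(bad), "李四"))
```

Running it against `temp/extraction_results.jsonl` and `temp/visualization.html` with `expected="李四"` would show whether the text is already damaged in the JSONL file or only when the HTML is displayed. If `utf8_ok` is false for either file, the platform's default encoding is a likely suspect; if both files look fine, a missing charset declaration in the HTML is worth checking.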
