Hello maintainers @kartikpersistent @jexp @prakriti-solankey,
Firstly, thank you for building this product. I am running llm-graph-builder locally on one of my servers, and I am having trouble scaling my current deployment beyond 150 research papers.
Beyond 150 papers, the /sources_list API takes a long time (approx. 5 minutes) to respond and load the initial list in the UI. After that, extraction runs for an indefinite period when a new file is added (even for small files of around 50 KB).
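For context, this is roughly how I am timing the endpoint (a minimal sketch; the backend URL, port, and form fields are assumptions based on my local Docker setup and may need adjusting):

```python
import time

import requests

# Minimal latency check for the /sources_list endpoint.
# Host/port and form fields below are assumptions from my local
# Docker setup; adjust to match your deployment.
BACKEND_URL = "http://localhost:8000/sources_list"

payload = {
    "uri": "bolt://localhost:7687",  # Neo4j bolt URI (placeholder)
    "userName": "neo4j",             # placeholder credentials
    "password": "password",
    "database": "neo4j",
}

start = time.perf_counter()
resp = requests.post(BACKEND_URL, data=payload, timeout=600)
elapsed = time.perf_counter() - start

print(f"status={resp.status_code} elapsed={elapsed:.1f}s")
```

With ~150 papers ingested, this consistently reports around 300 seconds of elapsed time for me.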
My configuration is as follows:
- Setup: Docker
- LLM: Llama-4-scout (deployed on another server with 2 x NVIDIA H100); average inference time is 83 seconds per token
- LLM Graph Builder server config:
    Model name: AMD Ryzen Threadripper PRO 7985WX 64-Cores
    CPU family: 25
    Model: 24
    Thread(s) per core: 2
    Core(s) per socket: 64
    Socket(s): 1
    Stepping: 1
    Frequency boost: enabled
    CPU max MHz: 8240.6250
    CPU min MHz: 1500.0000
    BogoMIPS: 6390.47
- Neo4j database: Community Edition instance deployed on
    Model name: AMD EPYC-Milan Processor
    CPU family: 25
    Model: 1
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s): 8
    Stepping: 1
    BogoMIPS: 3992.49
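For scale reference, this is the kind of check I run against the Neo4j instance to see how big the graph has grown (a sketch using the official Python driver; the URI and credentials are placeholders, and the Document/Chunk labels are what I see in the graph my deployment builds):

```python
from neo4j import GraphDatabase

# Quick size check on the graph. URI and credentials are
# placeholders for my Community Edition instance.
driver = GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")
)

with driver.session(database="neo4j") as session:
    # One Document node per uploaded paper, many Chunk nodes each.
    docs = session.run("MATCH (d:Document) RETURN count(d) AS n").single()["n"]
    chunks = session.run("MATCH (c:Chunk) RETURN count(c) AS n").single()["n"]
    print(f"Document nodes: {docs}, Chunk nodes: {chunks}")

driver.close()
```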
I want to scale the system to nearly 10k papers and would appreciate any tips and tricks from you all. Please help me get past this issue.
Thank you.
Best.