Feature Request: Add JSON Support #1816
Replies: 12 comments
-
Interested to learn more about this as I was planning to insert JSON. How does it behave currently when you pass a JSON file in?
-
I plan to use this for storing JSON-formatted data, but I haven’t extracted the semantic data from my dataset yet. Once I complete that, I’ll provide more detailed information.
-
I tried uploading a fairly large JSON structure. The results were acceptable but not great. I think the lack of sentences and connective text kept the LightRAG processing calls from understanding the context of the information. For example, the nesting of objects should more or less become the edges in the graph (at least that's how I see it in my data structure), but it failed to output it like that.
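To make the "nesting as edges" idea concrete, here is a minimal sketch that walks a nested JSON value and emits one (source, relation, target) triple per key or list item. It is independent of LightRAG; the function name `json_to_edges`, the dotted node names, and the sample record are all made up for illustration.

```python
from typing import Any, Iterator, Tuple

Edge = Tuple[str, str, str]  # (source node, relation, target node or value)


def json_to_edges(node: Any, name: str) -> Iterator[Edge]:
    """Yield one edge per key or list item, recursing into nested containers."""
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        # List items have no key, so use a generic "item" relation.
        items = [("item", value) for value in node]
    else:
        return  # Scalars have no children, so they produce no edges.
    for index, (relation, value) in enumerate(items):
        if isinstance(value, (dict, list)):
            # A nested container becomes its own node, linked to its parent.
            suffix = f".{relation}" if isinstance(node, dict) else f"[{index}]"
            child = name + suffix
            yield (name, relation, child)
            yield from json_to_edges(value, child)
        else:
            # A scalar becomes the target of an edge from the current node.
            yield (name, relation, str(value))


# Made-up sample data, not taken from any real dataset.
sample = {"owner": {"name": "Ada", "pets": [{"name": "Rex"}]}}
for edge in json_to_edges(sample, "doc"):
    print(edge)
```

Triples like these could be compared against the graph that the extraction step actually produces, to see which structural links were lost.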
-
I anticipated this scenario, as LightRAG is primarily designed for natural language processing and storage. My plan is to refactor the JSON data into complete sentences for storage, rather than directly inputting structured data.
-
If you don't mind sharing what you end up doing, I'd love to see what you found was the best method. I had brainstormed doing a synthesis pass to create those single-fact sentences ("owner has fact"). But I wondered whether it would be best practice to pass in one sentence at a time, or whether I could pass in the entire converted JSON-as-sentences text at once, and how that would link to the embedded unstructured data chunks.
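A rough sketch of what such a synthesis pass could look like without an LLM: a recursive walk that turns every leaf of the JSON into one "owner has fact" sentence. The function name `json_to_sentences`, the sentence templates, and the sample record are assumptions for illustration, not something LightRAG provides.

```python
from typing import Any, List


def json_to_sentences(node: Any, owner: str) -> List[str]:
    """Return one short sentence per leaf value found under `node`."""
    sentences: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            label = key.replace("_", " ")
            if isinstance(value, (dict, list)):
                # Nested containers are described relative to their owner.
                sentences.extend(json_to_sentences(value, f"{owner}'s {label}"))
            else:
                sentences.append(f"{owner} has {label} {value}.")
    elif isinstance(node, list):
        for position, item in enumerate(node, start=1):
            if isinstance(item, (dict, list)):
                sentences.extend(json_to_sentences(item, f"{owner} #{position}"))
            else:
                sentences.append(f"{owner} include {item}.")
    else:
        sentences.append(f"{owner} is {node}.")
    return sentences


# Made-up sample record.
record = {"name": "Rex", "toys": ["ball", "rope"], "vet": {"clinic_name": "Northside"}}
facts = json_to_sentences(record, "The pet")
print("\n".join(facts))
```

The resulting list can be joined into one document or inserted one sentence at a time; which of the two works better for entity extraction is exactly the open question here.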
-
I plan to implement this solution and will evaluate its outcomes in the next phase.
-
Thank you for sharing your thoughts! I'd be happy to share my approach and findings so far, though I am still working through some challenges. Here's an outline of what I have done and the issues I encountered:
Currently, I have designed a template to convert my custom pydantic.BaseModel into a narrative string for insertion into the LightRAG framework. This approach is intended to facilitate entity extraction and allow for structured data to be processed more effectively. While the initial step of converting the model to a narrative string was successful, I noticed that the extracted entities were incomplete and did not meet my expectations. As a result, I have raised an issue to seek further insights from the community: #749
Below is the template I used to convert a CVE descriptor model into a paragraph:
    def convert_model_to_string(model: RAGModel) -> str:
        """
        Convert a RAGModel object to a string for insertion into the LightRAG.
        """
        return f"""
    {model.cve_meta.cve_number} is a vulnerability with title {model.cve_meta.title}. There are several functions related to this vulnerability:
    {chr(10).join([f"{func.function_name}: {func.general_purpose}" for func in model.desc.funcs_desc.functional_desc])}
    {model.cve_meta.cve_number} is related to the following CWEs:{model.cve_meta.weaknesses}
    {model.cve_meta.cve_number} is caused by the following flaws:{model.desc.sec_desc.description}: {model.desc.sec_desc.vulnerability_cause_details}
    {model.cve_meta.cve_number} is fixed by deploying the following patch methods:{model.desc.sec_desc.patch_details}
    """
    # chr(10) is the \n to bypass the SyntaxError: f-string expression part cannot include a backslash
While the narrative string generation works as intended, the entity extraction results are not complete. For example, key terms such as enhance_image, channel_map, and PixelInfo were not extracted as entities, which diminishes the utility of the knowledge graph. Given this, I am uncertain if the issue lies in the configuration of the RAG pipeline, the entity recognition logic, or something else.
Questions I Am Exploring:
I hope this provides some clarity on my approach. I would greatly appreciate any insights or recommendations from collaborators to address these challenges. If you'd like, I can share more details as I iterate further on this and work together.
-
Thank you for this. That's an interesting approach to the narrative string. Have you experimented with having an LLM generate it dynamically by passing in the output of the model_dump method on the BaseModel class? My data structure is still going to change often, so I want to avoid manually declaring it like that.
I found similar results with LightRAG. Have you tried passing the same input into an alternative like GraphRAG (it has had a lot of optimizing updates since LightRAG's release)?
Two questions I would also like to understand. I think the current approach in LightRAG works well for large bodies of text, like the "A Christmas Story" book, but when I'm passing in highly tailored data incrementally, would I be better off just getting an LLM to generate Graph queries without the RAG index pipeline? Maybe an external (from LightRAG) evaluator or guardrail would work better, but that doesn't seem like the most efficient approach.
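A minimal sketch of the model_dump idea raised above, assuming the record is a plain dict such as the one pydantic v2 returns from `model.model_dump()`. The helper names `build_narrative_prompt` and `narrate`, the prompt wording, and the `complete` callable (a stand-in for whatever LLM client is in use) are hypothetical.

```python
import json
from typing import Any, Callable, Dict


def build_narrative_prompt(record: Dict[str, Any]) -> str:
    """Build a prompt asking an LLM to describe `record` in prose."""
    # default=str keeps non-JSON values (dates, enums) from raising.
    payload = json.dumps(record, indent=2, ensure_ascii=False, default=str)
    return (
        "Rewrite the following JSON record as a short narrative paragraph.\n"
        "Mention every field name and value exactly as written, "
        "and do not add facts that are not in the record.\n\n"
        f"{payload}"
    )


def narrate(record: Dict[str, Any], complete: Callable[[str], str]) -> str:
    """Send the prompt to any text-completion callable and return its reply."""
    return complete(build_narrative_prompt(record))


# With pydantic v2 the record would come from `model.model_dump()`.
record = {"title": "example record", "functions": ["enhance_image"]}


def fake_complete(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"(LLM output for a {len(prompt)}-character prompt)"


print(narrate(record, fake_complete))
```

Because the prompt is built from whatever fields the dump contains, nothing has to be redeclared when the data model changes. Whether the generated paragraph extracts better than a hand-written template would still need to be checked.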
-
That's an interesting point. In my initial implementation, I manually created a template and populated it with the relevant fields. If your data models are highly dynamic, leveraging an LLM could indeed be a practical solution. Specifically, you could send a request to the LLM to generate a narrative paragraph based on the
I’ll take another look at the latest version of GraphRAG. I did try it previously, but the results didn’t quite meet my expectations, which is why I shifted to this approach 😊. That said, I’ll revisit GraphRAG to see if the updates address the earlier issues.
That's a valid idea. It might be worth exploring whether LLM-generated Graph queries can work effectively without relying on the RAG index pipeline. However, before diving into this, I’d like to confirm with the LightRAG collaborators whether my results are expected and correct. If not, I’ll consider alternative methods or approaches to refine the workflow.
-
I also have a bunch of technical cybersecurity data I need to process, and I have it in structured JSON format, as stated in #962.
Have you made any progress with GraphRAG? I tried it months ago, but it was buggy and seemed to work only with a story-like text corpus (it is tailored to processing the book, and it breaks if you feed it anything else).
-
Would it work if we changed the "entity_extraction" prompt in prompts.py to reflect that it is being fed a structured JSON document? Something like: source json data
and then modified the entity extraction prompt to something like """You will be given a JSON structure in which the "text" key contains the relevant document, from which you will extract the entities listed under the "entities" key...""" Would that work?
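To illustrate the proposal, here is a small sketch that wraps a document and a list of wanted entities into the JSON shape described above and prepends the instruction. The constant, the function name, and the sample sentence are hypothetical; this does not touch LightRAG's prompts.py, and whether it improves extraction is untested.

```python
import json
from typing import List

# Hypothetical instruction text; this is not LightRAG's actual prompt.
JSON_EXTRACTION_INSTRUCTION = (
    'You will be given a JSON structure in which the "text" key contains '
    'the document and the "entities" key lists the entities to extract.'
)


def build_extraction_input(text: str, entities: List[str]) -> str:
    """Return the instruction followed by the JSON-wrapped document."""
    payload = {"text": text, "entities": entities}
    return JSON_EXTRACTION_INSTRUCTION + "\n\n" + json.dumps(payload, indent=2)


# Entity names borrowed from earlier in the thread; the sentence is made up.
print(
    build_extraction_input(
        "The function enhance_image uses channel_map and PixelInfo.",
        ["enhance_image", "channel_map", "PixelInfo"],
    )
)
```

Listing the expected entities explicitly could help with terms such as enhance_image that were being missed, but it also assumes the entity names are known before extraction.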
-
Unfortunately, it still remains on my to-do list. I will explore an alternative solution in a few days and maybe share some views then.
-
Currently, the project supports plaintext well for embedding and retrieval. However, adding JSON support would significantly enhance its usability by enabling users to integrate the RAG framework more seamlessly into modern workflows where JSON is the standard format for data interchange. Will LightRAG add support for structured data like JSON?