Description
Important Note
Look at Getting Started before asking questions.
Background
Like other professionals, biologists need fast and reliable ways to summarize the massive amount of research content in the literature. AI agents that produce detailed, factual, and unbiased research reports with citations are therefore highly valuable. Tools like OpenAI Deep Research are steps in this direction, along with open-source tools like GPT-Researcher (https://github.com/assafelovic/gpt-researcher). One particularly common type of report seeks to understand the biological networks involved in particular diseases or other processes. Communities like DiseaseMaps (https://disease-maps.io/projects/) do this at the scale of whole diseases such as Parkinson's disease, producing both written material (https://pmc.ncbi.nlm.nih.gov/articles/PMC4153395/) and interactive diagrams (https://pdmap.uni.lu) that often become the source for mathematical models.
Goal
The goal of the project is to develop additional features or fine-tuning capabilities for gpt-researcher to support the automatic generation of reports for the curation of disease maps. The Related Links section lists potential resources, including examples of existing human-written reports and automatically generated reports, as well as resources (e.g., EMMAA) that remain inaccessible to current state-of-the-art tools like Deep Research.
Getting Started
- Read the entirety of this Project Description.
- Look at Getting Started, Goal and Related Links (especially Deep Research versus Human).
- Compare the GPT-Researcher and Deep Research reports against the human reports (see Related Links).
- Write a feasible proposal draft that outlines some of the differences you see between automated and human reports and presents a plan to address this subset of differences.
Important: Avoid asking generic questions such as how to start or whether you can work on this project. If you are interested, just write a draft proposal; feedback can be provided on drafts. Good proposals also tend to include basic demonstrative code or point to the specific code that will be enhanced or modified.
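As an illustration of the kind of basic demonstrative code a proposal might include, the sketch below compares which literature identifiers an automated report and a human report cite. It is only one possible starting point, not a prescribed method: the function names `extract_citations` and `compare_citations` are hypothetical, and the regular expressions are simplifying assumptions that will miss some citation styles.

```python
import re

# Hypothetical patterns (assumptions): they cover common inline forms only.
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)
PMC_RE = re.compile(r"\bPMC(\d+)\b")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s<>)\]]+")


def extract_citations(text: str) -> set[str]:
    """Return normalized PMID, PMC, and DOI identifiers found in a report."""
    ids = {f"pmid:{m}" for m in PMID_RE.findall(text)}
    ids |= {f"pmc:{m}" for m in PMC_RE.findall(text)}
    # Strip trailing punctuation that often follows a DOI in running text.
    ids |= {f"doi:{m.rstrip('.,;').lower()}" for m in DOI_RE.findall(text)}
    return ids


def compare_citations(automated: str, human: str) -> dict:
    """Summarize the citation overlap between two report texts."""
    auto_ids = extract_citations(automated)
    human_ids = extract_citations(human)
    shared = auto_ids & human_ids
    union = auto_ids | human_ids
    return {
        "shared": sorted(shared),
        "only_automated": sorted(auto_ids - human_ids),
        "only_human": sorted(human_ids - auto_ids),
        # Jaccard index: size of the intersection over size of the union.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Toy strings for illustration; real input would be the full report texts.
    automated_text = "Background in PMC3898398 (doi:10.1007/s00335-025-10110-6)."
    human_text = "See PMC3898398 for details."
    print(compare_citations(automated_text, human_text))
```

Citation overlap is only one of many possible differences between automated and human reports. A proposal could extend the same pattern to other measurable features, such as the genes or interactions each report mentions.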
Difficulty Level: Medium
This is a more open-ended project; it will require a proposal that is focused enough to be achievable during the time frame.
Size and Length of Project
- medium: 175 hours
- 12 weeks
Skills
- Essential skills: Python
- Nice-to-have skills: JavaScript
Related Links
- https://github.com/assafelovic/gpt-researcher
- Collections of automatically extracted interactions for diseases (including lung cancer): https://emmaa.indra.bio/
- Deep Research versus Human: Deep Research report (https://drive.google.com/file/d/19z-PIz3JQLlOlv1wj-tFTgtK0px2-rqg/view?usp=sharing); Human report (https://pmc.ncbi.nlm.nih.gov/articles/PMC3898398/)
- Example Reports:
  - https://disease-maps.io/projects/
  - Lung Cancer Report (try to replicate this): https://link.springer.com/article/10.1007/s00335-025-10110-6
  - SIRT1/PARP1: https://pmc.ncbi.nlm.nih.gov/articles/PMC3898398/
Potential Mentors
Anbumathi Palanisamy
Augustin Luna