feat: Enhance academic and medical research capabilities #1406

techycardiac · 2025-05-22T01:53:54Z

This commit introduces several improvements to focus my research on academic and valid medical literature, prioritize high-impact sources, and speed up the retrieval process for such queries.

Key changes:

Retriever Output Standardization:
- All sources I consult now include a `retriever_name` field in their output dictionaries.
- `SemanticScholarSearch` now includes `citation_count`, `venue`, and `year`.
- `PubMedCentralSearch` now includes `journal_title` (when available).
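
A minimal sketch of what a standardized result dictionary could look like after this change. Only the added keys (`retriever_name`, `citation_count`, `venue`, `year`, `journal_title`) come from this PR. The base keys `title`, `href` and `body`, the `SearchResult` type, and the `normalize_semantic_scholar` helper are hypothetical. The sketch also assumes the raw Semantic Scholar record uses the Graph API field names `citationCount`, `venue` and `year`.

```python
from typing import Optional, TypedDict


class SearchResult(TypedDict, total=False):
    """Hypothetical shape of one retriever result after standardization."""
    title: str
    href: str
    body: str
    retriever_name: str            # e.g. "semantic_scholar", "pubmed_central", "arxiv"
    citation_count: Optional[int]  # Semantic Scholar only
    venue: Optional[str]           # Semantic Scholar only
    year: Optional[int]            # Semantic Scholar only
    journal_title: Optional[str]   # PubMed Central only, when available


def normalize_semantic_scholar(paper: dict) -> SearchResult:
    """Map one Semantic Scholar paper record onto the shared result shape.

    Assumes the record uses the keys title, url, abstract, citationCount,
    venue and year; missing values become empty strings or None.
    """
    return SearchResult(
        title=paper.get("title") or "",
        href=paper.get("url") or "",
        body=paper.get("abstract") or "",
        retriever_name="semantic_scholar",
        citation_count=paper.get("citationCount"),
        venue=paper.get("venue") or None,  # treat an empty venue as unknown
        year=paper.get("year"),
    )
```
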

Improved Metadata Pipeline:
- I've modified my approach so that rich metadata from sources (including `retriever_name`, `citation_count`, etc.) is preserved and combined with the scraped web content (`raw_content`).
- This consolidated list of structured dictionaries is now correctly passed on to the curation step.
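
Combining retriever metadata with scraped content amounts to a join on URL. The sketch below assumes results carry their URL under `href` and scraped pages under `url` with the text in `raw_content`. `merge_with_scraped` is a hypothetical helper, not the PR's code.

```python
from typing import Dict, List


def merge_with_scraped(results: List[dict], scraped: List[dict]) -> List[dict]:
    """Attach scraped page text to retriever results, keeping all metadata.

    Results whose page could not be scraped keep their metadata and get
    raw_content set to None, so the curation step can still see them.
    """
    content_by_url: Dict[str, str] = {
        page["url"]: page["raw_content"]
        for page in scraped
        if page.get("raw_content")
    }
    merged = []
    for result in results:
        item = dict(result)  # copy so the retriever's output is not mutated
        item["raw_content"] = content_by_url.get(result.get("href"))
        merged.append(item)
    return merged
```
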

Enhanced Curation Prompt:
- The prompt I use to curate sources has been significantly updated.
- I now explicitly:
  - Prioritize sources from "semantic_scholar", "pubmed_central", and "arxiv".
  - Use `citation_count` from "semantic_scholar" for ranking.
  - Filter out non-academic/medical content.
  - Consider journal quality and relevance to your query.
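
The curation itself happens inside an LLM prompt. The sort key below is therefore only a rough, deterministic approximation of two of its rules: prefer the three academic retrievers, then rank by `citation_count`. `rank_sources` is hypothetical. The content-level filtering and the journal-quality judgment remain with the model.

```python
from typing import List

# Retrievers the curation prompt is told to prioritize.
ACADEMIC_RETRIEVERS = {"semantic_scholar", "pubmed_central", "arxiv"}


def rank_sources(sources: List[dict]) -> List[dict]:
    """Order sources: academic retrievers first, then by citation count."""

    def sort_key(source: dict):
        academic = source.get("retriever_name") in ACADEMIC_RETRIEVERS
        count = source.get("citation_count")
        # Tuples compare element-wise and False < True, so this puts academic
        # sources first, then those with a citation count, highest count first.
        return (not academic, count is None, -(count or 0))

    # sorted() is stable, so ties keep the retrievers' original order.
    return sorted(sources, key=sort_key)
```
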

Academic Search Focus Configuration:
- I now use the existing (and now verified) `FOCUS_ACADEMIC_MEDICAL_SOURCES` configuration flag.
- When `True`, this flag restricts me to academic-focused sources (`semantic_scholar`, `pubmed_central`, `arxiv`), improving relevance and speed for such queries.
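
A sketch of how the flag could gate retriever selection. It assumes the flag is read from an environment variable of the same name. `parse_flag` and `select_retrievers` are hypothetical names, not the project's configuration API.

```python
import os
from typing import List

# Retrievers treated as academic-focused when the flag is on.
ACADEMIC_RETRIEVERS = ["semantic_scholar", "pubmed_central", "arxiv"]


def parse_flag(raw: str) -> bool:
    """Interpret common truthy strings ("true", "1", "yes") as True."""
    return raw.strip().lower() in {"true", "1", "yes"}


def select_retrievers(configured: List[str], focus_academic: bool) -> List[str]:
    """Pick the retrievers to run for a query.

    With the focus flag off, the configured list is used unchanged; with it
    on, only the three academic retrievers are used.
    """
    if focus_academic:
        return list(ACADEMIC_RETRIEVERS)
    return list(configured)


if __name__ == "__main__":
    focus = parse_flag(os.environ.get("FOCUS_ACADEMIC_MEDICAL_SOURCES", "false"))
    print(select_retrievers(["tavily", "arxiv"], focus))
```
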

Speed Enhancements:
- The primary speed improvements come from the focused source selection and more effective curation, which reduce the amount of data I need to process.
- My existing scraping mechanism was confirmed to be asynchronous and efficient.
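
For reference, an asynchronous scraper typically fans out page fetches with `asyncio.gather` and caps concurrency with a semaphore. This is a generic sketch, not the project's scraper. `scrape_one` is a hypothetical stub that sleeps instead of doing network I/O.

```python
import asyncio
from typing import Dict, List


async def scrape_one(url: str) -> Dict[str, str]:
    """Hypothetical stand-in for a real page fetch (no network I/O)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return {"url": url, "raw_content": f"content of {url}"}


async def scrape_all(urls: List[str], max_concurrency: int = 5) -> List[Dict[str, str]]:
    """Scrape URLs concurrently, with at most max_concurrency fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> Dict[str, str]:
        async with semaphore:
            return await scrape_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))


if __name__ == "__main__":
    pages = asyncio.run(scrape_all([f"https://example.org/{i}" for i in range(10)]))
    print(len(pages))
```

With ten URLs, a 0.1 s simulated fetch, and a limit of five, the batch finishes in roughly two rounds (about 0.2 s) rather than ten sequential fetches.
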

These changes collectively enable me to perform more targeted and higher-quality research for academic and medical topics, aligning with your requirements.


assafelovic · 2025-05-31T05:44:58Z

@techycardiac this is truly great! Have you fully tested other retrievers and general experience with this addition? Happy to know how and where to help test this before we merge
