This Flask application, also available at lszoszk.pythonanywhere.com, is designed for in-depth search and analysis of the General Comments/Recommendations adopted by the UN Treaty Bodies. It offers keyword search, filtering by concerned groups, collocation analysis, and export of search results to Excel. 🇺🇳 🔍📊📄
The app processes JSON data, enabling users to search the General Comments/Recommendations at paragraph level by keyword, concerned groups/persons labels, and Treaty Bodies. It features a text analysis pipeline that uses NLTK for tokenization, term frequencies, bigram extraction, and custom stopword processing. The application also provides a search-within-search function, which allows further filtering of an existing set of search results.
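The paragraph-level search and the search-within-search step described above can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the record fields (`committee`, `labels`, `text`), the sample paragraphs, and the `search` function are all assumptions made for the example, and the real JSON structure may differ.

```python
# Hypothetical paragraph records; the field names used by the real
# JSON data are not documented here and may differ.
PARAGRAPHS = [
    {
        "committee": "CRC",
        "labels": ["children"],
        "text": "States parties shall ensure access to education for every child.",
    },
    {
        "committee": "CESCR",
        "labels": ["women", "children"],
        "text": "The right to adequate housing applies to everyone.",
    },
    {
        "committee": "CRC",
        "labels": ["children", "indigenous peoples"],
        "text": "Indigenous children have the right to education in their own language.",
    },
]


def search(paragraphs, keyword=None, labels=None, committees=None):
    """Return paragraphs matching every filter that was supplied.

    keyword    -- case-insensitive substring match on the paragraph text
    labels     -- keep paragraphs carrying at least one of these labels
    committees -- keep paragraphs adopted by one of these treaty bodies
    """
    keyword_lc = keyword.lower() if keyword else None
    wanted_labels = set(labels or [])
    wanted_committees = set(committees or [])

    results = []
    for paragraph in paragraphs:
        if keyword_lc and keyword_lc not in paragraph["text"].lower():
            continue
        if wanted_labels and not wanted_labels & set(paragraph["labels"]):
            continue
        if wanted_committees and paragraph["committee"] not in wanted_committees:
            continue
        results.append(paragraph)
    return results


# First pass: keyword plus treaty body filter.
first_pass = search(PARAGRAPHS, keyword="education", committees=["CRC"])

# Search within search: run a second query on the first result set only.
narrowed = search(first_pass, keyword="language")
```

Because `search` accepts any list of paragraph records, narrowing a result set is just a second call on the output of the first one.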
- Python 3.6+
- Flask
- Pandas
- NLTK
- BeautifulSoup
- GC-info.json file for the app's document metadata
- Clone the repository:
git clone [URL of this repository]
- Navigate to the project directory:
cd [project_name]
- Install the required Python packages:
pip install -r requirements.txt
- Run the Flask application:
python app.py
- Access the application through a web browser at localhost:5000.
- Advanced Search 🔍: Filter relevant paragraphs from the documents by keyword, concerned groups/persons (e.g., children, women, indigenous peoples), and UN Treaty Body (e.g., Committee on the Rights of the Child, Committee on Economic, Social and Cultural Rights).
- Text Analysis 📊: Text processing built on NLTK, covering word frequencies, bigram analysis, a custom list of UN-related stopwords, and search within search results.
- Custom Labels and Stopwords 🏷️: Ability to define and use custom labels (e.g., concerned groups, human rights issues) and custom stopwords for text analysis.
- Interactive Results 💡: Highlights search terms and displays results interactively.
- Data Export 📁: Export search results to Excel format for further analysis.
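The text analysis steps listed above (tokenization, stopword removal, term frequencies, bigram counts) can be illustrated with a short sketch. The app itself uses NLTK; the version below deliberately swaps in the Python standard library (`re` and `collections.Counter`) so that it runs without extra downloads, and it is not the app's actual pipeline. The stopword sets and the sample sentence are assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword sets; the app's real lists are not shown here.
BASE_STOPWORDS = {"the", "of", "to", "and", "in", "a", "is", "for", "on", "that"}
CUSTOM_STOPWORDS = {"committee", "states", "parties", "general", "comment"}


def tokenize(text, stopwords):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stopwords]


def term_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def bigram_frequencies(tokens):
    """Count adjacent token pairs.

    Pairs are formed after stopword removal, so two words count as
    adjacent even if a stopword stood between them in the original text.
    """
    return Counter(zip(tokens, tokens[1:]))


text = (
    "The Committee reminds States parties that the best interests of the "
    "child is a primary consideration. The best interests of the child "
    "apply to all decisions."
)

tokens = tokenize(text, BASE_STOPWORDS | CUSTOM_STOPWORDS)
freqs = term_frequencies(tokens)
bigrams = bigram_frequencies(tokens)

print(freqs.most_common(3))
print(bigrams.most_common(2))
```

Keeping the UN-related stopwords in a separate set makes it easy to see how much terms such as "Committee" or "States parties" would otherwise dominate the frequency counts.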
Main page with search functionality.
Search results. You can visit the source document (OHCHR website) and copy a result to the clipboard with an automatically generated reference.
Analytical dashboard. Insert a query in "Narrow your search" to run an additional, dynamic search within your search results.
If you encounter any issues, please check that all dependencies are correctly installed and that the GC-info.json file is properly formatted and located in the root directory of the project.
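One quick way to rule out a malformed metadata file is to try parsing it with Python's standard `json` module. The helper below is a hypothetical sketch, not part of the app: `check_json_file` is a name introduced for this example, and it only verifies that the file exists and is syntactically valid JSON, not that it contains the fields the app expects.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return (ok, message) describing whether `path` holds valid JSON."""
    file_path = Path(path)
    if not file_path.is_file():
        return False, f"{file_path} not found"
    try:
        with file_path.open(encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as exc:
        # lineno and colno point at the position where parsing failed.
        return False, (
            f"{file_path} is not valid JSON "
            f"(line {exc.lineno}, column {exc.colno}): {exc.msg}"
        )
    return True, f"{file_path} parsed successfully"


if __name__ == "__main__":
    # Run from the project root, where GC-info.json is expected to live.
    ok, message = check_json_file("GC-info.json")
    print(message)
```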
E-mail: l.szoszkiewicz@amu.edu.pl
E-mail: zuzkow4@st.amu.edu.pl
- 0.1. Initial Release (8 January 2024)
  - Includes General Comments adopted by the Committee on the Rights of the Child and the Committee on Economic, Social and Cultural Rights.
- 1.0. Full Release (31 January 2025)
  - Incorporates General Comments from all treaty bodies.
  - Enhancements and updates based on feedback from the initial release.
This project is licensed under the MIT License - see the LICENSE.md file for details.