A simple search engine built in Java that indexes text and PDF documents and lets users search them quickly from the command line. The project applies Object-Oriented Programming (OOP) principles and data structures to a real-world problem, with features such as auto-completion, spell checking, relevance ranking, and pagination.
- Document Indexing: Index text and PDF documents placed in the `db/` folder.
- Relevance Ranking: Search results are ranked by relevance to the user query.
- Auto-Completion: Provides suggestions as you type, based on indexed keywords.
- Spell Checker: Suggests corrections for common spelling mistakes in search queries.
- Pagination: Search results are paginated for better user experience with large data sets.
- Real-Time Folder Monitoring: Automatically detects new files added to the `db/` folder and reindexes them.
- Error Handling: Custom exceptions for file processing errors and invalid queries.
- Command-Line Interface (CLI): User interacts with the search engine through the terminal.
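The indexing, ranking, and pagination features listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual code: the class `TinyIndex` and its methods are hypothetical names, and the relevance score used here (the raw count of query terms in each document) is an assumed stand-in, since the ranking formula the project uses is not described in this README.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the project's actual classes.
class TinyIndex {
    // Inverted index: term -> (document name -> occurrences of the term).
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Split text into lowercase alphanumeric tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Record every term of the document in the inverted index.
    void add(String docName, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docName, 1, Integer::sum);
        }
    }

    // Assumed scoring: sum of query-term counts per document.
    // Returns document names, highest score first, ties broken by name.
    List<String> search(String query) {
        Map<String, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            postings.getOrDefault(term, Map.of())
                    .forEach((doc, count) -> scores.merge(doc, count, Integer::sum));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }

    // Return one page of results; pageNumber starts at 1.
    static <T> List<T> page(List<T> results, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        int from = Math.min((pageNumber - 1) * pageSize, results.size());
        int to = Math.min(from + pageSize, results.size());
        return results.subList(from, to);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("a.txt", "Java search engine in Java");
        index.add("b.txt", "A search tool");
        index.add("c.txt", "Cooking recipes");

        List<String> hits = index.search("java search");
        System.out.println(hits);             // a.txt (score 3) ranks before b.txt (score 1)
        System.out.println(page(hits, 2, 1)); // second page when showing one result per page
    }
}
```

A map from each term to its per-document counts keeps lookups proportional to the number of query terms rather than the number of documents, which is why inverted indexes are a common choice for this kind of search.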
Before running the project, ensure that the following dependencies are installed:
- Java 11+ (JDK)
- Apache PDFBox (for PDF processing)
- VSCode (for development, with the Java Extension Pack)
git clone https://github.com/Ludis-et/search-engine.git
cd search-engine
In order to work with additional libraries like Apache PDFBox (used for PDF text extraction), you need to download the JAR files and include them in your project.
- Download Apache PDFBox from the official PDFBox download page.
- Place the downloaded `.jar` files into the `lib/` folder inside the project directory. Create the `lib/` folder if it does not exist.
- Ensure that the classpath is set to include these JAR files when compiling and running the project. In VSCode, this can be done by adding them to the `launch.json` configuration or manually specifying them in the terminal using the `-cp` flag.
For example, if using the terminal, you might run:
javac -cp ".:lib/pdfbox-app-2.0.27.jar" -d bin src/com/searchengine/*.java
java -cp ".:bin:lib/pdfbox-app-2.0.27.jar" com.searchengine.Main
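If the program later fails with a `ClassNotFoundException` or `NoClassDefFoundError`, the classpath is the usual suspect. The sketch below is one way to confirm that the PDFBox JAR is actually visible to the JVM. `ClasspathCheck` and `isOnClasspath` are hypothetical names and not part of this project; `org.apache.pdfbox.pdmodel.PDDocument` is the PDFBox document class used here as the probe.

```java
// Hypothetical helper for troubleshooting; not part of the project.
class ClasspathCheck {
    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String pdfBoxClass = "org.apache.pdfbox.pdmodel.PDDocument";
        if (isOnClasspath(pdfBoxClass)) {
            System.out.println("PDFBox found on the classpath");
        } else {
            System.out.println("PDFBox NOT found; check the value passed to -cp");
        }
    }
}
```

Run it with the same `-cp` value you intend to use for the search engine, so the check reflects the classpath the application will see.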
To compile and run the project:
- Build the Project: Open the terminal in VSCode or your preferred terminal and navigate to the project directory. Run the following command to compile the Java files:
javac -cp "lib/*" -d bin src/com/searchengine/*.java src/com/searchengine/engine/*.java src/com/searchengine/models/*.java src/com/searchengine/processors/*.java src/com/searchengine/utils/*.java src/com/searchengine/exceptions/*.java
- Run the Project: After the project is compiled, you can run it using the java command. Ensure you're in the project directory and run the following:
java -cp ".:bin:lib/*" com.searchengine.Main
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! If you'd like to contribute, please fork the repository, create a new branch, and submit a pull request. All contributions must adhere to the code of conduct and coding style of the project.