DD2476-Search-Engines-and-Information-Retrieval-Systems

Individual assignments for the course Search Engines and Information Retrieval Systems at KTH Royal Institute of Technology.

Information Retrieval (IR) is finding material (usually documents) of an unstructured nature that satisfies an information need from within large collections (usually stored on computers).

Note: The provided code for running the engine has not been added because of the privacy policy. The files in the assignment folders contain only the parts that were to be implemented. The implementation of the persistent hashed index has also not been uploaded, although all query tests were executed using the persistent hashed index.

The dataset used was davisWiki. A smaller version of it was also used for testing. Finally, my favorite Monte-Carlo method was used to approximate the PageRanks of the full Swedish Wikipedia link structure.

Dataset folders have not been uploaded.

Assignment 1: Boolean Retrieval

The purpose of Assignment 1 is to learn how to build an inverted index. You will learn 1) how to build a basic inverted index; 2) how to handle multiword queries; 3) how to handle phrase queries; 4) how to evaluate a search system; and 5) techniques for handling large indexes.

In realistic applications, we of course cannot index the whole document collection every time we start the search engine. Moreover, the complete index would be too large to fit in working memory. So, we implement the index by means of a persistent hash table on disk.

Indexing the davisWiki corpus takes no more than 3 minutes. Search is immediate (less than 0.1 s) for any search query, so the persistent hash table implementation works well.

Exercises 1.1-1.7 have been implemented.
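As an illustration of the ideas above, here is a minimal in-memory sketch of a positional inverted index with intersection and phrase queries. It is not the course code (which is not included in this repository); the function names, the whitespace tokenization, and the toy documents are all assumptions made for the example, and it keeps everything in memory rather than in a persistent hash table.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def intersection_query(index, query):
    """Return the documents that contain every query term, in any order."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], {}))
    for term in terms[1:]:
        result &= set(index.get(term, {}))
    return sorted(result)

def phrase_query(index, query):
    """Return the documents where the query terms occur consecutively."""
    terms = query.lower().split()
    hits = []
    for doc_id in intersection_query(index, query):
        starts = index[terms[0]][doc_id]
        # A start position p matches if term i occurs at position p + i.
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            hits.append(doc_id)
    return hits

# Toy documents, invented for this example.
docs = {
    1: "davis is a city in california",
    2: "the city of davis has a farmers market",
    3: "a market in the city",
}
index = build_index(docs)
print(intersection_query(index, "city davis"))  # [1, 2]
print(phrase_query(index, "city of davis"))     # [2]
```

Storing positions in the postings is what makes phrase queries possible: the intersection narrows the candidates, and the position check confirms that the terms are adjacent.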

Assignment 2: Ranked Retrieval

The purpose of Assignment 2 is to learn how to implement ranked retrieval. You will learn 1) how to include tf_idf scores in the inverted index; 2) how to handle ranked retrieval from multiword queries; 3) how to use PageRank to score documents; and 4) how to combine tf_idf and PageRank scoring.

Monte-Carlo PageRank approximation has been implemented. The 5 approximations have been tested and compared (results are presented in the Excel file). A record of the experimentation is included, showing the four method variants and their N parameter settings for the linksDavis.txt graph.
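To make the Monte-Carlo idea concrete, the following is a small sketch of one common variant: random walks with a random start node, where the PageRank of a node is estimated as the fraction of walks that end there. This is an assumption about what such a variant looks like, not the code used in the assignment; the function name `mc_pagerank`, the damping value 0.85, the fixed seed, and the toy graph are all hypothetical.

```python
import random

def mc_pagerank(links, n_walks, damping=0.85, seed=0):
    """Estimate PageRank as the fraction of random walks ending at each node.

    links: dict mapping every node to the list of nodes it links to.
    """
    rng = random.Random(seed)
    nodes = sorted(links)
    end_counts = dict.fromkeys(nodes, 0)
    for _ in range(n_walks):
        node = rng.choice(nodes)  # each walk starts at a random node
        # Take another step with probability `damping`, otherwise stop.
        while rng.random() < damping:
            out = links[node]
            # From a dangling node (no out-links), jump to a random node.
            node = rng.choice(out) if out else rng.choice(nodes)
        end_counts[node] += 1
    return {node: count / n_walks for node, count in end_counts.items()}

# Toy link graph, invented for this example.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = mc_pagerank(links, n_walks=20000)
for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

The number of walks plays the role of the N parameter here: more walks give a less noisy estimate at a proportionally higher cost.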

Assignment 3: Relevance Feedback and Tolerant Retrieval

The purpose of Assignment 3 is to learn about ways to get more powerful representations of queries and documents. You will learn 1) how to use relevance feedback to improve the query representation; 2) why query expansion is an alternative to relevance feedback; 3) how to build a k-gram index; and 4) how to perform tolerant retrieval with wildcard queries.

Exercises 3.1-3.4 have been implemented.
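As a rough sketch of tolerant retrieval, the example below builds a bigram index over a vocabulary and uses it to answer wildcard queries. It is an illustration under stated assumptions rather than the assignment code: the function names, the `$` boundary marker, the choice of k = 2, and the sample vocabulary are all made up for this example.

```python
from collections import defaultdict
import re

def kgrams(term, k=2):
    """Return the k-grams of a term padded with '$' boundary markers."""
    padded = f"${term}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}

def build_kgram_index(vocabulary, k=2):
    """Map each k-gram to the set of vocabulary terms that contain it."""
    index = defaultdict(set)
    for term in vocabulary:
        for gram in kgrams(term, k):
            index[gram].add(term)
    return index

def wildcard_lookup(index, pattern, k=2):
    """Return the vocabulary terms matching a pattern with '*' wildcards."""
    # Collect the k-grams of each literal piece of the padded pattern.
    grams = set()
    for piece in f"${pattern}$".split("*"):
        grams |= {piece[i:i + k] for i in range(len(piece) - k + 1)}
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # No usable k-grams (e.g. the pattern "*"): every term is a candidate.
        candidates = set().union(*index.values())
    # Sharing all k-grams is necessary but not sufficient, so post-filter.
    regex = re.compile(".*".join(re.escape(p) for p in pattern.split("*")))
    return sorted(t for t in candidates if regex.fullmatch(t))

# Sample vocabulary, invented for this example.
vocabulary = ["retrieval", "retrieve", "relevance", "revive", "index", "tolerant"]
kgram_index = build_kgram_index(vocabulary)
print(wildcard_lookup(kgram_index, "re*ve"))  # ['retrieve', 'revive']
print(wildcard_lookup(kgram_index, "ind*"))   # ['index']
```

The matched terms would then be looked up in the inverted index as an ordinary multiword query.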

Xenia-Io/Search-Engines-and-Information-Retrieval-Systems

Folders and files

Latest commit

History

Repository files navigation

DD2476-Search-Engines-and-Information-Retrieval-Systems

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages