Suggester module in Solr builds up the dictionary based on sentences and not on words #2

@innovationchef

I added the following XML to solrconfig.xml to implement the suggester module.

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">BioChemEntity.name</str>
        <str name="suggestAnalyzerFieldType">string</str>
      </lst>
    </searchComponent>
    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

This configuration reads the BioChemEntity.name entries and uses them to populate the suggester dictionary that Solr uses internally to provide suggestions.

I requested suggestions for words starting with 'sam' using the following URL:
http://localhost:8983/solr/solr_core_name/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=sam
However, it does not return suggestions such as "sample" or "sample SAMD00007", which are indexed in the core field "name" and were used to build the dictionary. I identified the following issues:

  1. The current Suggester module is case-sensitive. An entry with BioChemEntity.name = "Sample 1" does not appear in the suggestions if the query starts with lowercase letters, e.g. "sam".
  2. The current Suggester module uses the whole phrase in the BioChemEntity.name field as a single dictionary entry. So, if we index BioChemEntity.name = "Source GSM00089" and request suggestions for "GSM", it returns no results, because GSM00089 is the second word of the entry and the suggester matches from the first character of the first word.
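
The two issues above can be illustrated with a small toy model. This is only a sketch in plain Python, not Solr code: the function names `phrase_prefix_suggest` and `token_prefix_suggest` are hypothetical, the entries are the two example values from this issue, and the edit-distance tolerance of FuzzyLookupFactory is deliberately ignored. It contrasts matching against the whole, unmodified phrase (roughly the behavior observed here) with matching against lowercased individual words (the behavior I expected).

```python
def phrase_prefix_suggest(entries, query):
    """Toy model: case-sensitive prefix match against the whole entry."""
    return [entry for entry in entries if entry.startswith(query)]


def token_prefix_suggest(entries, query):
    """Toy model: lowercase everything, split each entry on whitespace,
    and match the query against the start of any word."""
    lowered_query = query.lower()
    return [
        entry
        for entry in entries
        if any(word.startswith(lowered_query) for word in entry.lower().split())
    ]


# Example values taken from the issue text above.
entries = ["Sample 1", "Source GSM00089"]

print(phrase_prefix_suggest(entries, "sam"))  # []
print(phrase_prefix_suggest(entries, "Sam"))  # ['Sample 1']
print(phrase_prefix_suggest(entries, "GSM"))  # []
print(token_prefix_suggest(entries, "sam"))   # ['Sample 1']
print(token_prefix_suggest(entries, "GSM"))   # ['Source GSM00089']
```

In this model, whole-phrase matching misses "sam" because of the capital "S" and misses "GSM" because the entry starts with "Source", while word-level lowercased matching finds both.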
