Context snippet returns first occurrence even if the word appears as a substring #5

You have a small bug in `NYT-first-said.parsers.simple_scrape.context`: if the word appears as a substring of another word before it appears on its own, the context snippet is built around that first (substring) occurrence rather than the standalone word.

The bug shows up whenever a new word appears first in the plural (with a trailing "s") and only later in the singular: the snippet always shows the context of the plural, because `str.find()` returns the index of the first occurrence. See: https://twitter.com/NYT_first_said/status/1135591139413778433
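A minimal reproduction of the `str.find()` behavior, using a made-up sentence (not the article from the tweet):

```python
# Hypothetical article text: the plural appears before the standalone singular.
content = "Several crocodyliforms were found. One crocodyliform was intact."
word = "crocodyliform"

# str.find() returns the lowest index where the substring occurs,
# which here is inside "crocodyliforms", not the standalone word.
loc = content.find(word)
print(loc)                                 # 8
print(content[loc:loc + len(word) + 1])    # crocodyliforms
```

The standalone "crocodyliform" starts at index 39, so any snippet built around `loc` shows the plural's context instead.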

One possible fix would be to find the shortest word (token) in the article that contains the new word and use its position to determine the snippet:
```python
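import re
import string

def context(content, word):
    # Every whitespace-delimited token containing the word, with its offset,
    # so the chosen occurrence can be located directly later on.
    matches = [m for m in re.finditer(r"\S+", content) if word in m.group()]
    # Compare lengths after stripping surrounding punctuation, so that
    # "crocodyliform." counts as shorter than "crocodyliforms".
    # (min() raises ValueError if the word never occurs in content.)
    best = min(matches, key=lambda m: len(m.group().strip(string.punctuation)))
    # Don't use content.find(best.group()) here: for a bare token like
    # "crocodyliform" it would again hit the substring inside "crocodyliforms".
    loc = best.start() + best.group().find(word)
    # existing logic proceeds...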
```
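A simpler alternative is a regular-expression search with `\b` word boundaries, which only matches the word when it is not flanked by other word characters. This is a different technique from the shortest-token approach above, sketched here under assumptions: `find_standalone` is a hypothetical helper name, and falling back to `str.find()` when no standalone occurrence exists is a guess at the desired behavior.

```python
import re

def find_standalone(content, word):
    """Return the index of the first standalone occurrence of `word`.

    Hypothetical helper: falls back to the first substring match
    (or -1, like str.find) when the word never appears on its own.
    """
    # \b matches at a word/non-word transition, so "crocodyliform" will not
    # match inside "crocodyliforms" (the trailing "s" is a word character).
    match = re.search(r"\b" + re.escape(word) + r"\b", content)
    if match:
        return match.start()
    return content.find(word)
```

For example, `find_standalone("Several crocodyliforms were found. One crocodyliform was intact.", "crocodyliform")` returns 39, the start of the singular, while text containing only the plural falls back to the plural's index. Note that `\b` treats apostrophes and hyphens as boundaries, so "crocodyliform's" still counts as a standalone occurrence.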
