Ignore Scopes in word count [including proof of concept] #99

Following a discussion with @X-Raym in the recently accepted PR #94, I wondered why it should not be possible to ignore _scopes_, rather than re-inventing as regexes what the grammars already implement.

Getting at the [tokenised lines](https://discuss.atom.io/t/access-to-tokenized-contents-of-texteditor/24015/4) is possible in principle. Combined with a [scope selector](https://github.com/atom/first-mate), they make it possible to filter (positively or negatively) all text elements that match certain scopes.
This would also make it possible to ignore punctuation (much requested, although now dealt with in a new word count regex), to handle different grammars, etc. (see e.g. #55 and #65).

The following minimal example filters out all tokens whose scopes are comments, quotes, or punctuation in their respective grammar. It should be copyable 1:1 into the Atom console.
```javascript
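// Scope selectors for the tokens to leave out of the count.
let scopes = ['comment.*', 'quote.*', 'punctuation.*'];
let editor = atom.workspace.getActiveTextEditor();
let ScopeSelector = require('first-mate').ScopeSelector;

// Collect the tokens of every line into one flat list (undocumented API).
let buffer = [];
editor.displayBuffer.tokenizedBuffer.tokenizedLines.forEach(line => line.tokens.forEach(token => buffer.push(token)));

// Drop every token whose scopes match one of the selectors.
scopes.forEach(scope => {
    let selector = new ScopeSelector(scope);
    buffer = buffer.filter(token => !selector.matches(token.scopes));
});

// Join the remaining tokens and count whitespace-separated words.
// match() returns null when nothing is left, so fall back to an empty list.
let text = buffer.map(token => token.value).join('');
console.log((text.match(/\S+/g) || []).length);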
```

At the moment it reduces all lines into a single buffer to make looping easier. This destroys line breaks and may thus not be desirable, but I thought it would be sufficient for a proof of concept. It does change the final word count a little, though. It should not be difficult to instead loop through the tokens line by line and drop those that match the ignored selectors.
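
To illustrate the per-line variant, here is a rough sketch that runs outside Atom. It assumes each line is an array of `{ value, scopes }` tokens, as in the example above, and it uses a hypothetical `isIgnored` helper with simple prefix matching as a stand-in for the first-mate `ScopeSelector`. The sample data is hand-written.

```javascript
// Hypothetical stand-in for first-mate's ScopeSelector: a token is ignored
// when any of its scopes equals an ignored prefix or starts with "prefix.".
function isIgnored(token, ignoredPrefixes) {
  return token.scopes.some(scope =>
    ignoredPrefixes.some(prefix => scope === prefix || scope.startsWith(prefix + '.')));
}

// Count words line by line so that line breaks still separate words.
function countWords(lines, ignoredPrefixes) {
  let count = 0;
  for (const tokens of lines) {
    const text = tokens
      .filter(token => !isIgnored(token, ignoredPrefixes))
      .map(token => token.value)
      .join('');
    const words = text.match(/\S+/g);
    if (words !== null) count += words.length;
  }
  return count;
}

// Hand-written sample data in the { value, scopes } shape assumed above.
const sampleLines = [
  [{ value: 'first line', scopes: ['text.plain'] }],
  [
    { value: 'second ', scopes: ['text.plain'] },
    { value: '// a comment', scopes: ['text.plain', 'comment.line'] },
  ],
];

console.log(countWords(sampleLines, ['comment'])); // 3
```

Joining all tokens into one string first would fuse `line` and `second` into a single word in this sample, which is the small difference in the count mentioned above.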

Maybe instead of the many current settings, there could be a single text box listing all to-be-ignored scopes, and a small pop-up menu on right-click on the word count (similar to the one for the minimap) to activate or deactivate individual scopes in the filtering.
This would reduce the bulk in the settings (see discussions elsewhere in this package).
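
As a sketch of how such a setting could be handled, assuming the text box holds a comma-separated list: `parseIgnoredScopes` and `createToggles` are hypothetical names, not part of this package or of Atom, and the actual menu wiring is left out.

```javascript
// Hypothetical setting value: one text box, scopes separated by commas.
function parseIgnoredScopes(setting) {
  return setting
    .split(',')
    .map(scope => scope.trim())
    .filter(scope => scope.length > 0);
}

// Hypothetical toggle state for the right-click menu: every listed scope
// starts out active and can be switched off without editing the setting.
function createToggles(scopes) {
  const active = new Map(scopes.map(scope => [scope, true]));
  return {
    toggle(scope) {
      if (active.has(scope)) active.set(scope, !active.get(scope));
    },
    activeScopes() {
      return scopes.filter(scope => active.get(scope));
    },
  };
}

const listed = parseIgnoredScopes('comment.*, quote.*, , punctuation.*');
const toggles = createToggles(listed);
toggles.toggle('quote.*');
console.log(toggles.activeScopes()); // [ 'comment.*', 'punctuation.*' ]
```

Only the scopes returned by `activeScopes()` would then be turned into selectors for the filter.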

Disclaimer:
(A) The tokenised lines are not documented and thus [subject to change without further notice](https://discuss.atom.io/t/access-to-tokenized-contents-of-texteditor/24015/2), although the current selector seems to have been stable for at least two years.
(B) I do not know how much the speed of calculation would suffer from this way of filtering (compared to regex, and compared to not filtering).
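
Point (B) could be checked with a small timing harness along these lines. This is only a sketch on synthetic tokens, with a simple prefix test standing in for `ScopeSelector.matches`, so the numbers would say little about real grammars; `timeIt` is a hypothetical helper.

```javascript
// Time a function over several runs and return the mean duration in ms.
function timeIt(fn, runs) {
  const start = Date.now();
  for (let i = 0; i < runs; i++) fn();
  return (Date.now() - start) / runs;
}

// Synthetic tokens: every fourth token carries a comment scope.
const tokens = [];
for (let i = 0; i < 20000; i++) {
  const scopes = i % 4 === 0 ? ['source.js', 'comment.line'] : ['source.js'];
  tokens.push({ value: 'word' + i + ' ', scopes });
}
const fullText = tokens.map(token => token.value).join('');

// Baseline: plain regex over the whole text, no filtering.
const countAll = () => (fullText.match(/\S+/g) || []).length;

// Candidate: drop tokens with a comment scope first, then count.
// The prefix test is a simplified stand-in for ScopeSelector.matches.
const countFiltered = () => {
  const kept = tokens.filter(
    token => !token.scopes.some(scope => scope.startsWith('comment')));
  return (kept.map(token => token.value).join('').match(/\S+/g) || []).length;
};

console.log('unfiltered:', countAll(), 'words,', timeIt(countAll, 20), 'ms per run');
console.log('filtered:', countFiltered(), 'words,', timeIt(countFiltered, 20), 'ms per run');
```

Inside Atom, the same harness could wrap the real token access and selectors to compare against the current regex-based count.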

Looking forward to any discussions!
