This isn't an ungluing feature as such; it's more about aggregating by pattern.
Say I have some file names like these, but many more of them and with more patterns:
c(
"John report January.doc",
"Brian report March.doc",
"Summary 2018.xls",
"Summary 2017.xls",
"unstructured isolated file name.doc")
In order to count or aggregate, it would be nice to be able to give the patterns "{name} report {month}.doc" and "Summary {year}.xls" as input, and get as output:
c(
"{name} report {month}.doc",
"{name} report {month}.doc",
"Summary {year}.xls",
"Summary {year}.xls",
"unstructured isolated file name.doc")
Maybe the default should be to output:
c(
"{name} report {month}.doc",
"{name} report {month}.doc",
"Summary {year}.xls",
"Summary {year}.xls",
NA)
Keeping the original string when nothing matches would then be an option?
I don't have a really good name for it. Maybe something like unglue_simplify(), unglue_generalize(), or unglue_to_pattern()?
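To make the request concrete, here is a rough base-R sketch of the behaviour I have in mind. It is not how unglue works internally and none of it exists in the package: the name `unglue_to_pattern()` is just one of the candidates above, and the helper `pattern_to_regex()` and the argument `keep_unmatched` are made up for illustration. It assumes each `{placeholder}` can be treated as a lazy wildcard, that the first matching pattern wins, and that the patterns are written as `{name} report {month}.doc` and `Summary {year}.xls` so that they match the sample file names.

```r
# Hypothetical helper (not part of unglue): turn a glue-like pattern into an
# anchored regex. Literal text is escaped; each {placeholder} becomes "(.*?)".
pattern_to_regex <- function(pattern) {
  # Escape regex metacharacters, leaving the braces untouched.
  escaped <- gsub("([.\\\\|()\\[\\]^$*+?])", "\\\\\\1", pattern, perl = TRUE)
  # Replace every {placeholder} with a lazy wildcard group.
  wildcarded <- gsub("\\{[^{}]*\\}", "(.*?)", escaped, perl = TRUE)
  paste0("^", wildcarded, "$")
}

# Hypothetical function: map each string to the first pattern it matches.
# Unmatched strings become NA, or are kept as-is when keep_unmatched = TRUE.
unglue_to_pattern <- function(x, patterns, keep_unmatched = FALSE) {
  x <- as.character(x)
  out <- if (keep_unmatched) x else rep(NA_character_, length(x))
  matched <- rep(FALSE, length(x))
  for (p in patterns) {
    # Only strings not already claimed by an earlier pattern can match here.
    hit <- !matched & grepl(pattern_to_regex(p), x, perl = TRUE)
    out[hit] <- p
    matched <- matched | hit
  }
  out
}

files <- c(
  "John report January.doc",
  "Brian report March.doc",
  "Summary 2018.xls",
  "Summary 2017.xls",
  "unstructured isolated file name.doc")
patterns <- c("{name} report {month}.doc", "Summary {year}.xls")

res_na   <- unglue_to_pattern(files, patterns)
res_keep <- unglue_to_pattern(files, patterns, keep_unmatched = TRUE)
```

Counting per pattern would then just be `table(res_na, useNA = "ifany")`. Letting the first matching pattern win keeps the result a plain character vector; an actual implementation might prefer to warn or error when several patterns match the same string.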