slevithan edited this page Sep 3, 2014 · 195 revisions

See version history for released versions.

XRegExp 3.0.0-pre: completed changes

The following changes have been completed and are available via the latest build here on GitHub:

  • New feature: Unicode addons: Added the new flag A (A for "astral"), which opts in to full 21-bit Unicode support for \p and \P (i.e., code points up to U+10FFFF, rather than the default upper limit of U+FFFF). This extended support applies to all Unicode categories, scripts, blocks, and properties that include astral code points. Flag A also enables support for scripts and blocks that include only astral code points. In addition, it disables the use of \p and \P within character classes (a descriptive error is thrown), because JavaScript character classes can match code units only, not code points. In astral mode, use e.g. (\p{L}|[0-9_]) rather than [\p{L}0-9_]. Thanks to Mathias Bynens for providing the supporting data. (#25, #29, #33)
  • New feature: Added feature 'astral' to XRegExp.install/uninstall/isInstalled. Running XRegExp.install('astral') implicitly sets the new flag A for all regexes. (#28)
  • New feature: Added function XRegExp.replaceEach for batch replacements. (#20)
  • New feature: Added function XRegExp.match, as a re-implementation of String.prototype.match that gives you the result types you actually want, lets you override flag g and ignore lastIndex, and fixes browser bugs. (#32)
  • New feature: XRegExp-All: Can now be loaded as a RequireJS AMD module that creates no globals. (#38)
  • New feature: XRegExp-All: Can now be imported in Node.js using require('xregexp'), as an alternative to the CommonJS-compatible require('xregexp').XRegExp.
  • New feature: Unicode addons: Single-letter Unicode categories can now be referenced without curly brackets, e.g., \pL or \PL. (#15)
  • New feature: Added the reparse option to the XRegExp.addToken options object, which allows simplified syntax and flag tokens. (#18)
  • New feature: Added function XRegExp.cache.flush for improved control during performance testing.
  • Fine tuning: Major performance optimization for creating regexes. (Pattern caching, #24, etc.)
  • Fine tuning: Major performance optimization for copying regexes with XRegExp and XRegExp.globalize. (Native constructor, #24, etc.)
  • Fine tuning: Major performance optimization for XRegExp.exec/test/replace/forEach/split. (#23, etc.)
  • Fine tuning: Unicode addons: Updated Unicode data from version 6.1.0 to 7.0.0. [Mathias Bynens] (#39, #73)
  • Fine tuning: Unicode addons: Moved character data for Unicode category L (aka Letter) from Unicode Base to Unicode Categories.
  • Fine tuning: Unicode addons: Changed the format for providing Unicode data, via XRegExp.addUnicodeData. (#29)
  • Fine tuning: Replaced XRegExp.addToken's trigger and customFlags options with new flag and optionalFlags options that provide easier-to-use replacements for the same functionality.
  • Fine tuning: Removed the this.hasFlag function previously available within token definition functions. In its place, you can use the new third argument (flags) that is passed into token handler functions.
  • Fine tuning: Using the same name for multiple named capturing groups in a single regex is now a SyntaxError. (#22)
  • Fine tuning: Removed the 'all' shortcut used by XRegExp.install/uninstall. (#27)
  • Fine tuning: XRegExp now overwrites itself when loaded twice, rather than silently skipping the script. (#17)
  • Fine tuning: Removed 'extensibility' as an installable/uninstallable option, since it did little more than add hassle for addons. The functionality it provided is now always available.
  • Fix: XRegExp.matchRecursive addon: When given valueNames, the value and name properties in results were reversed. (#26)
  • Fix: XRegExp.build addon: A trailing unescaped $ in subpattern definitions was stripped even when a leading ^ wasn't present. (#35)
  • Fix: A ReferenceError is now thrown when accessing an unknown backreference via XRegExp.matchChain.
  • Fix: A TypeError is now thrown when XRegExp.replace/split are given a null or undefined subject string, and when String.prototype.replace/split/match are called on a null or undefined context while overridden by XRegExp.install('natives') and running in ES5 strict mode.
  • Fix: A SyntaxError is now thrown if the reserved words length or __proto__ are used as capture names.
  • Fix: A quantifier now applies to the preceding token when separated by a combination of both inline comments and free-spacing.
  • Fix: In edge cases, XRegExp.cache didn't treat a forward slash in the list of flags as an error.
  • Other: Harmonized the version numbers for XRegExp and its official addons.
  • Other: Removed the BackCompat and Prototypes addons.
  • Other: Converted all QUnit unit tests to Jasmine, and added several thousand new tests.
  • Other: Added performance tests in tests/perf/.
  • Other: Added a .editorconfig file for consistent coding styles between different editors. [Mathias Bynens]
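
The restriction on \p and \P inside character classes in astral mode (first item above) follows from how JavaScript stores astral characters. The following is a minimal sketch in plain JavaScript, without XRegExp and without the native u flag, showing that a character class matches a single code unit and so cannot match an astral character, while an alternation can. The variable names are illustrative only.

```javascript
// U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) is stored as two UTF-16 code units.
const astralChar = '\uD835\uDC9C';

// Inside a character class, the surrogate pair is read as two separate
// single-code-unit members, so the class can only ever match half of it.
const asClass = new RegExp('^[' + astralChar + '0-9_]$');

// Outside a class, the pair is matched as a two-code-unit sequence,
// which is why astral mode asks for (\p{L}|[0-9_]) style alternation.
const asAlternation = new RegExp('^(?:' + astralChar + '|[0-9_])$');

console.log(astralChar.length);               // 2
console.log(asClass.test(astralChar));        // false
console.log(asAlternation.test(astralChar));  // true
console.log(asAlternation.test('7'));         // true
```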

Potential changes

The following changes are being considered for v3.0.0, but I haven't had time to work on them yet. Feedback or help is welcome. Some of these ideas might end up being dropped, pushed back to later versions, or significantly adapted prior to release.

  • Change the XRegExp.matchRecursive addon to allow unbalanced delimiters in the target string. Add an option called requireBalance to its options object that, when set to true, restores the current handling, where scanning past an unbalanced delimiter throws an error.
  • Add a way to make XRegExp.union use an empty string joiner/separator. See #43.
  • Allow an options object as the first nonrequired argument for XRegExp.exec/test/replace/match/forEach, and allow an options object for each XRegExp.replaceEach replacement array. This will replace or overload/supersede the following optional arguments: pos and sticky of XRegExp.exec/test, scope of XRegExp.replace/replaceEach/match, and context of XRegExp.forEach.
    • Since XRegExp.forEach's context argument is currently allowed to be a plain object, a fully backward compatible argument overload cannot be achieved. However, the common case of using an array as the context argument can be supported. Alternatively, the options object could be the fifth argument and not overload context.
    • Since XRegExp.replace's current scope argument is infrequently used (it's more common to perform the same mode switching via flag g on the provided regex), it might make sense to not overload the scope argument but instead just replace it (scope will still be settable as a property of the options object). Since XRegExp.replaceEach/match are new in v3.0.0, there is no backward compatibility concern if they support scope via an options object only.
  • Add support for pos and sticky to XRegExp.match/forEach, via their options objects.
  • Add support for pos, sticky, and scope to the XRegExp.matchRecursive addon, via its options object. (XRegExp.matchRecursive already supports scope via flag g, and sticky via flag y.)
  • Stop providing the regex being used to iterate over the string as the fourth argument to XRegExp.forEach callback functions. This is rarely useful, and feels a bit dirty/risky (even though XRegExp.forEach avoids letting any mutation of the regex alter its operation).
  • Change the {{name}} syntax used by the XRegExp.build addon to %{name}, and change ({{name}}) to %(name).
    • The current syntax can be maintained for backward compatibility, if users of this addon provide feedback in favor of doing so.
    • Note that the current XRegExp.build allows {{1}} but not ({{1}}). The latter is blocked by the base library because it doesn't allow bare integers as capture names. When implementing the new syntax, %{1} should become illegal, both because of syntax ambiguity with things like %{1}{2}, and because %(1) wouldn't be allowed anyway.
  • Consider letting the XRegExp.build addon use /[\s\S]*/ (which matches anything and everything) as the default value for named subpatterns when definitions are not provided. This would allow, e.g., XRegExp.build('%(year)/%(month)/%(day)').exec(str).
  • Consider adding $<n> and $<name> as alternative backreference syntax for ${n} and ${name} in replacement text, so they can be used in ES6 template strings. This would be backward compatible, since $< is currently an error in XRegExp replacement text.
  • Add a names property to regexes built or copied by XRegExp, but possibly only if they actually use named capture. <xregexp>.names will be an array of strings holding the names of named capturing groups used by the regex. See #45.
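
To make the proposed %(name) syntax and the [\s\S]* default for undefined subpatterns more concrete, here is a rough sketch in plain JavaScript. It is not XRegExp.build and does not use XRegExp at all: buildSketch is a hypothetical helper that relies on native named capturing groups (ES2018), and it skips everything else the real addon does, such as handling leading ^ and trailing $ in subpatterns or renumbering backreferences.

```javascript
// Hypothetical helper, for illustration only (not part of XRegExp).
// Replaces each %(name) with a named group wrapping the subpattern for
// that name, falling back to [\s\S]* when no definition is provided.
function buildSketch(pattern, subs = {}) {
  const source = pattern.replace(/%\(([A-Za-z_$][\w$]*)\)/g, (match, name) => {
    const hasDefinition = Object.prototype.hasOwnProperty.call(subs, name);
    const sub = hasDefinition ? subs[name] : '[\\s\\S]*';
    return '(?<' + name + '>' + sub + ')';
  });
  return new RegExp(source);
}

// No definitions given, so every subpattern defaults to [\s\S]*.
const date = buildSketch('%(year)/%(month)/%(day)').exec('2014/09/03');
console.log(date.groups.year, date.groups.month, date.groups.day); // 2014 09 03

// Explicit definitions are passed as regex source strings (an assumption
// of this sketch).
const strict = buildSketch('%(year)-%(month)', { year: '\\d{4}', month: '\\d{2}' });
console.log(strict.exec('on 2014-09').groups.year); // 2014
```

Because the default subpattern is greedy and matches anything, the literal separators in the pattern (here, the slashes) are what determine where each group ends.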

Uncompleted planned changes might also be listed in the issue tracker.

Share your ideas

Open a new issue here on GitHub, or fork the repo and start hacking.
