@Copilot Copilot AI commented Oct 17, 2025

This PR ports functional regex tests from the RE2 test suite to improve .NET's regex test coverage, as requested in #120756.

Changes

Test Suite Additions

  • Added RegexRe2Tests.cs: New test file containing 85 unique test cases ported from RE2's re2_test.cc and search_test.cc
  • Test coverage: Each ported case runs once per available regex engine (Interpreter, Compiled, NonBacktracking, SourceGenerated), for roughly 340 executions in total
  • Removed duplicates: Analyzed overlap with existing PCRE, Rust, and core regex tests; removed 57 duplicative test cases to avoid redundancy
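
As a rough illustration of how one case list fans out across engines, the following sketch runs two made-up cases on the three engines that can be selected at run time through RegexOptions. The class name, the cases, and the RunAll helper are hypothetical and are not taken from RegexRe2Tests.cs. The source-generated engine is selected at compile time, so it is left out here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EngineMatrixSketch
{
    // Hypothetical cases for illustration only; not the PR's actual test data.
    private static readonly (string Pattern, string Input, bool Expected)[] Cases =
    {
        (@"(?i)abc", "xABCy", true),
        (@"^foo$", "foo\nbar", false),
    };

    // Engines selectable at run time. The source-generated engine is chosen
    // at compile time via [GeneratedRegex], so it is omitted from this sketch.
    private static readonly (string Name, RegexOptions Options)[] Engines =
    {
        ("Interpreter", RegexOptions.None),
        ("Compiled", RegexOptions.Compiled),
        ("NonBacktracking", RegexOptions.NonBacktracking),
    };

    // Runs every case on every engine and returns the number of executions.
    public static int RunAll()
    {
        int executions = 0;
        foreach (var (pattern, input, expected) in Cases)
        {
            foreach (var (name, options) in Engines)
            {
                bool actual = new Regex(pattern, options).IsMatch(input);
                if (actual != expected)
                {
                    throw new InvalidOperationException(
                        $"{name}: '{pattern}' on '{input}' gave {actual}");
                }
                executions++;
            }
        }
        return executions;
    }

    public static void Main()
    {
        Console.WriteLine($"{RunAll()} executions");
    }
}
```

Two cases on three engines give six executions. The same multiplication with 85 cases and four engines yields the 340 figure quoted above.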

Test Categories Covered

The ported tests validate:

  • Complex matching patterns and alternations
  • Anchors (^, $) in single-line and multiline modes with non-trivial cases
  • Word boundaries (\b, \B) with ASCII and special characters
  • UTF-8/Unicode character handling
  • Escape sequences (octal \141, hexadecimal \x61, Unicode \u0061)
  • Case-insensitive matching ((?i))
  • Non-trivial quantifier combinations ({n}, {n,}, {n,m})
  • Backreferences (excluded for NonBacktracking engine)
  • Edge cases and historical bug patterns from RE2
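
A few of these categories can be sketched with the public Regex API. The patterns and helper names below are illustrative assumptions, not cases copied from the ported file.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CategorySketch
{
    // Octal, hexadecimal, and Unicode escapes that all denote U+0061 ('a').
    public static bool EscapesAllMatchA() =>
        new[] { @"^\141$", @"^\x61$", @"^\u0061$" }.All(p => Regex.IsMatch("a", p));

    // Counts matches of ^.$ with default options and with RegexOptions.Multiline.
    public static (int Default, int Multiline) AnchorCounts(string input) =>
        (Regex.Matches(input, "^.$").Count,
         Regex.Matches(input, "^.$", RegexOptions.Multiline).Count);

    // {n,m} accepts between n and m repetitions, inclusive.
    public static bool BoundedQuantifier(string input) =>
        Regex.IsMatch(input, "^a{2,3}$");

    // \b requires a transition between a word and a non-word character.
    public static bool WholeWordFoo(string input) =>
        Regex.IsMatch(input, @"\bfoo\b");

    public static void Main()
    {
        Console.WriteLine(EscapesAllMatchA());
        Console.WriteLine(AnchorCounts("a\nb"));
        Console.WriteLine(BoundedQuantifier("aaa"));
        Console.WriteLine(WholeWordFoo("a foo b"));
    }
}
```

With default options, ^ and $ anchor to the whole input, so "a\nb" yields no match for ^.$, while Multiline lets each line match separately.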

Compatibility Adjustments

Several RE2-specific patterns were excluded or adapted for .NET compatibility:

  • Removed \C patterns: RE2's byte-matching construct is not supported in .NET
  • Removed \Q...\E patterns: Quote meta syntax is not supported in .NET
  • Converted \x{...} escapes: Changed to .NET-compatible \x (2 digits) or \u (4 digits) format
  • Excluded backreferences for NonBacktracking: Uses RegexHelpers.IsNonBacktracking() check
  • Removed Unicode word boundary test: .NET treats Unicode letters as word characters for \b, whereas RE2 (and PCRE by default) consider only ASCII word characters
  • Removed duplicative tests: Excluded basic patterns like "a", "a*", "a+", "a?", "^$" that are already well-covered in existing test files
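
Three of these adjustments can be sketched in a few lines. ConvertBracedHexEscapes is a hypothetical helper written for this illustration; the PR does not say how the escapes were actually converted. It is deliberately naive: it does not recognize an escaped backslash before x, and it leaves code points above U+FFFF untouched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompatSketch
{
    // Hypothetical helper: rewrites RE2-style \x{...} escapes as \xHH (up to
    // U+00FF) or \uHHHH (up to U+FFFF). Larger code points would need a
    // surrogate pair in .NET, so they are returned unchanged for manual review.
    public static string ConvertBracedHexEscapes(string re2Pattern) =>
        Regex.Replace(re2Pattern, @"\\x\{([0-9A-Fa-f]{1,6})\}", m =>
        {
            int codePoint = Convert.ToInt32(m.Groups[1].Value, 16);
            if (codePoint <= 0xFF) return @"\x" + codePoint.ToString("X2");
            if (codePoint <= 0xFFFF) return @"\u" + codePoint.ToString("X4");
            return m.Value;
        });

    // The NonBacktracking engine rejects backreferences at construction time,
    // which is why such cases have to be skipped for that engine.
    public static bool NonBacktrackingRejectsBackreference()
    {
        try
        {
            _ = new Regex(@"(a)\1", RegexOptions.NonBacktracking);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }

    // In .NET, a non-ASCII letter such as U+00E9 counts as a word character,
    // so \b finds a boundary on both sides of it.
    public static bool UnicodeLetterHasWordBoundary() =>
        Regex.IsMatch("\u00E9", @"^\b\u00E9\b$");

    public static void Main()
    {
        Console.WriteLine(ConvertBracedHexEscapes(@"a\x{61}\x{100}"));
        Console.WriteLine(NonBacktrackingRejectsBackreference());
        Console.WriteLine(UnicodeLetterHasWordBoundary());
    }
}
```

The real test file reportedly guards backreference cases with RegexHelpers.IsNonBacktracking(); the try/catch above only demonstrates why such a guard is needed.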

License Attribution

  • Updated THIRD-PARTY-NOTICES.TXT: Added RE2 BSD 3-Clause license to properly attribute the ported tests
  • Updated .csproj: Added RegexRe2Tests.cs to the compilation

Testing

All 29,621 tests pass, including the newly ported RE2 tests across all regex engines.

Fixes #120756

Original prompt

This section details the original issue you should resolve

<issue_title>Port re2 regex tests</issue_title>
<issue_description>re2 has lots of regex functional tests we should port over to improve our coverage:
https://github.com/google/re2/blob/d826d9fcb68c62996c1b7c0a45d604e22d814952/re2/testing/</issue_description>

<agent_instructions>You can see in https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests we have a file of tests ported from PCRE and a file of tests ported from Rust. Please similarly port the cited RE2 tests (or a newer version if there is one). Any that fail but should pass need to be marked as [ActiveIssue(...)]; any whose failure is expected should be removed.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@ Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions See info in [area-owners.md](https://github.com/dotnet/runtime/blob/main/docs/area-owners.md) if you want to be subscribed.
Issue Details

re2 has lots of regex functional tests we should port over to improve our coverage:
https://github.com/google/re2/blob/d826d9fcb68c62996c1b7c0a45d604e22d814952/re2/testing/

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions

Milestone: .NET 7.0
@joperezr This will likely miss ZBB but will still make it before 7.0. Adjusting the milestone just because this is not technically a blocker for 7.0.

Fixes #61896



Copilot AI and others added 2 commits October 17, 2025 13:52
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@Copilot Copilot AI changed the title [WIP] Port re2 regex tests to improve coverage Port RE2 regex tests to improve test coverage Oct 17, 2025
Copilot finished work on behalf of stephentoub October 17, 2025 14:10
@Copilot Copilot AI requested a review from stephentoub October 17, 2025 14:10
@stephentoub stephentoub marked this pull request as ready for review October 17, 2025 14:14
@Copilot Copilot AI review requested due to automatic review settings October 17, 2025 14:14

@Copilot Copilot AI left a comment

Pull Request Overview

Ports a subset of RE2's functional regex tests into the .NET test suite to expand coverage across multiple regex engines.

  • Adds 142 test cases derived from RE2's re2_test.cc and search_test.cc.
  • Updates test project file to include the new test file and adds RE2's BSD 3-Clause license to third-party notices.
  • Exercises diverse regex features (anchors, word boundaries, quantifiers, escapes, Unicode) across all available engines.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| THIRD-PARTY-NOTICES.TXT | Adds RE2 BSD 3-Clause license attribution for the ported tests. |
| System.Text.RegularExpressions.Tests.csproj | Includes the new RegexRe2Tests.cs file in the test compilation. |
| RegexRe2Tests.cs | Introduces the ported RE2-derived test cases and supporting data generation logic. |

Removed 57 test cases that were already covered by existing tests in PCRE, Rust, or core regex tests:
- Basic patterns like "a", "a*", "a+", "a?" with simple inputs
- Simple anchor tests "^$", "^", "$" that duplicate existing coverage
- Trivial quantifier tests already well-tested elsewhere
- Basic UTF-8 tests with single character patterns

Retained 85 unique test cases that add value:
- Complex patterns and edge cases
- Word boundary tests with various scenarios
- Multiline mode tests
- Case-insensitive matching
- Octal/hexadecimal escapes
- Non-trivial quantifier combinations

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot finished work on behalf of stephentoub October 17, 2025 14:49
@Copilot Copilot AI requested a review from stephentoub October 17, 2025 14:49

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.
