rascmatt/regex-gen

regex-gen

A Kotlin (Java-compatible) library for generating random strings of an exact length N that match a given regular expression, sampled uniformly from all valid strings of that length.

Unlike existing solutions such as Generex, RgxGen, or RegexpGen, this library guarantees:

  • The generated string always has the requested length
  • Sampling is uniform across all valid strings of that length
  • If no such string exists, it reports this deterministically

Other libraries either fail to reliably hit specified length constraints, do not support length constraints at all, or only support a limited subset of regular expression syntax.

regex-gen precomputes the reachable lengths and the number of matching strings for each, which allows it to generate matching strings of a specific length reliably.


Example

import Generator.Companion.generate
import Parser.Companion.parse

fun main() {
    val pattern = "(\\w|-|\\.)+@(\\w|-|\\.)+(\\w|-){2,4}".parse(20)

    repeat(10) {
        println(pattern.generate(20))
    }
}

  • parse(bound) analyzes the regex and precomputes all reachable lengths up to the given bound.
  • generate(n) produces a uniformly random string of length n, or throws if none exists.
  • generate(from, to) produces a uniformly random string with a length between from and to, inclusive, or throws if no such length is reachable.
  • generate() without parameters picks a random reachable length below the bound.


How it works

The parser converts the regex into a syntax tree and records, for each subexpression, how many strings of each possible length it can produce. These counts are then used to sample each choice (alternations, repetitions, etc.) in proportion to the number of valid strings it can generate.

Precomputation time grows polynomially with both regex complexity and the target length bound. Once a pattern is parsed, generation runs in time proportional to the structure of the regex and the length of the generated string, not to the precomputation bound.
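
The counting-and-sampling idea described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the Node types, the Sampler class, and the pick helper are hypothetical names invented for this example. The sketch counts derivations rather than distinct strings, so it is uniform only for unambiguous expressions, and it uses Long counts, which overflow for large lengths.

```kotlin
import kotlin.random.Random

// Hypothetical mini syntax tree (illustration only, not regex-gen's types).
sealed interface Node
data class Chars(val set: String) : Node                // one character out of `set`
data class Seq(val left: Node, val right: Node) : Node  // left followed by right
data class Alt(val left: Node, val right: Node) : Node  // left | right
data class Plus(val inner: Node) : Node                 // inner+, inner must not match ""

class Sampler(private val rng: Random = Random.Default) {
    private val memo = HashMap<Pair<Node, Int>, Long>()

    // Number of derivations of `node` that produce exactly n characters.
    fun count(node: Node, n: Int): Long = memo.getOrPut(node to n) {
        when (node) {
            is Chars -> if (n == 1) node.set.length.toLong() else 0L
            is Alt -> count(node.left, n) + count(node.right, n)
            is Seq -> (0..n).sumOf { k -> count(node.left, k) * count(node.right, n - k) }
            // inner+ = first repetition of length k, then nothing or inner+ again
            is Plus -> (1..n).sumOf { k -> count(node.inner, k) * tail(node, n - k) }
        }
    }

    private fun tail(plus: Plus, rest: Int): Long =
        if (rest == 0) 1L else count(plus, rest)

    // Draws a string of length n, weighting every choice by its count.
    fun sample(node: Node, n: Int): String {
        val total = count(node, n)
        require(total > 0) { "no string of length $n" }
        return when (node) {
            is Chars -> node.set[rng.nextInt(node.set.length)].toString()
            is Alt ->
                if (rng.nextLong(total) < count(node.left, n)) sample(node.left, n)
                else sample(node.right, n)
            is Seq -> {
                val k = pick(0..n) { count(node.left, it) * count(node.right, n - it) }
                sample(node.left, k) + sample(node.right, n - k)
            }
            is Plus -> {
                val k = pick(1..n) { count(node.inner, it) * tail(node, n - it) }
                sample(node.inner, k) + (if (k < n) sample(node, n - k) else "")
            }
        }
    }

    // Picks k from `range` with probability proportional to weight(k).
    private fun pick(range: IntRange, weight: (Int) -> Long): Int {
        var r = rng.nextLong(range.sumOf { weight(it) })
        for (k in range) {
            val w = weight(k)
            if (r < w) return k
            r -= w
        }
        error("unreachable: weights sum to a positive total")
    }
}

fun main() {
    // Hand-built tree for [ab]+@[cd]+
    val pattern = Seq(Plus(Chars("ab")), Seq(Chars("@"), Plus(Chars("cd"))))
    val sampler = Sampler()
    println(sampler.count(pattern, 5))
    repeat(5) { println(sampler.sample(pattern, 5)) }
}
```

Because every branch is chosen with probability proportional to the number of completions it allows, each derivation of the requested length is equally likely. Once the memo table is filled, sampling revisits only entries that already exist, which is in line with the separation between precomputation and generation described above.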


License

MIT License - see LICENSE.

