performance problem

Hello,
I noticed a performance problem whenever a schema contains the following structure:

```
...
"anyOf": [
    {"enum": ["aa", "bb", "cc"]},
    {"pattern": "pattern1"},
    {"pattern": "pattern2"},
    {"pattern": "pattern3"},
    ...
]
...
```

The performance can be massively improved by preprocessing the schema. All enum values and patterns should be combined into a single pattern, as shown in the example below:

```
...
"anyOf": [
    {"pattern": "^aa$|^bb$|^cc$|pattern1|pattern2|pattern3"}
]
...
```
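To make the idea concrete, here is a minimal sketch of such a preprocessing step. It is not the attached script; the function name `merge_string_branches` is hypothetical, and it assumes the `anyOf` is only ever applied to string instances (for example because of a sibling `"type": "string"`), since a bare `enum` and a bare `pattern` treat non-string values differently. It also assumes that Python's `re.escape` produces escapes that the regex dialect in use accepts.

```python
import re


def merge_string_branches(any_of):
    """Merge pure string-enum and pure pattern branches of an anyOf list.

    Branches that contain anything else are kept unchanged.
    """
    alternatives = []  # regex alternatives, in their original order
    others = []        # branches this sketch does not know how to merge
    for branch in any_of:
        keys = set(branch)
        if keys == {"enum"} and all(isinstance(v, str) for v in branch["enum"]):
            # Escape and anchor each literal so it only matches the whole string.
            alternatives.extend("^" + re.escape(v) + "$" for v in branch["enum"])
        elif keys == {"pattern"}:
            alternatives.append(branch["pattern"])
        else:
            others.append(branch)
    if not alternatives:
        return list(any_of)
    # One combined pattern branch, followed by the untouched branches.
    return [{"pattern": "|".join(alternatives)}] + others


merged = merge_string_branches([
    {"enum": ["aa", "bb", "cc"]},
    {"pattern": "pattern1"},
    {"pattern": "pattern2"},
    {"pattern": "pattern3"},
])
print(merged)
```

Since `|` has the lowest precedence in a regex, joining the branches with it keeps each original pattern intact as one alternative.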

Currently, you iteratively append the enum values and regex patterns to a single regex and, in every iteration, compute the intersection between the current pattern and ".*". This is very expensive and results in poor performance for this specific kind of schema.

I attached an example JSON file (anyOf.json) that demonstrates the problem. On my machine, checking the file against itself (command `jsonsubschema anyOf.json anyOf.json`) takes about 50-60 seconds to produce the result (LHS :< RHS and RHS :< LHS). With preprocessing applied, it takes about 0.04 seconds. I also attached a Python script (smaller_anyOf.py) that performs the preprocessing: it combines the string enum values and all patterns into a single pattern, as shown in the example above.
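For anyone who wants to reproduce the timing, a small helper along these lines should be enough. It is only a sketch: `time_command` is a hypothetical name, and it measures wall-clock time for a single run, so the numbers will vary between machines.

```python
import subprocess
import time


def time_command(args):
    """Run a command once and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    # Capture the output so that printing does not influence the comparison.
    completed = subprocess.run(args, capture_output=True, text=True)
    return time.perf_counter() - start, completed.returncode
```

Calling `time_command(["jsonsubschema", "anyOf.json", "anyOf.json"])` once with the original file and once with a preprocessed copy of it gives the two numbers to compare.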

[AnyOf.zip](https://github.com/IBM/jsonsubschema/files/4903276/AnyOf.zip)


When the string enum values are transformed to a regex, special regex characters (e.g. ".", "-", ...) are escaped so that the resulting regex matches exactly the original string.

```
...
"enum": ["ab-c"]
...
```
will be transformed to
```
...
"pattern": "^ab\\-c$"
...
```
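The escaping step can be sketched with Python's standard `re.escape`. The helper name `enum_value_to_pattern` is hypothetical, and whether these escapes are accepted by the regex dialect in use is an assumption. The second half shows why escaping matters: without it, "." would match any character.

```python
import json
import re


def enum_value_to_pattern(value):
    """Turn a string enum value into an anchored regex matching only that string."""
    return "^" + re.escape(value) + "$"


pattern = enum_value_to_pattern("ab-c")
# Serialised as JSON, the single backslash is doubled.
print(json.dumps({"pattern": pattern}))  # {"pattern": "^ab\\-c$"}

# Unescaped, "a.c" would also accept "abc"; the escaped form does not.
unescaped_hit = re.search("^a.c$", "abc") is not None
escaped_hit = re.search(enum_value_to_pattern("a.c"), "abc") is not None
literal_hit = re.search(enum_value_to_pattern("a.c"), "a.c") is not None
print(unescaped_hit, escaped_hit, literal_hit)  # True False True
```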

Be careful: this can currently lead to another problem, see #6.

Best Regards
Michael


