Skip to content

JavaScript highlighting bug in regexp with quotes #2145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mc-butler opened this issue Apr 17, 2010 · 7 comments
Closed

JavaScript highlighting bug in regexp with quotes #2145

mc-butler opened this issue Apr 17, 2010 · 7 comments
Labels
area: mcedit mcedit, the built-in text editor prio: low Minor problem or easily worked around

Comments

@mc-butler
Copy link

Important

This issue was migrated from Trac:

Origin https://midnight-commander.org/ticket/2145
Reporter baryluk (witold.baryluk@….com)

Consider this simple example file.js:

function f(a) {
   var r = /'/;
  return r.test(a);
}

This is correct javascript code which search for occurrence of ' or returns null if there is no such.

mcedit incorrectly highlights ' (green) in the /'/, and assumes it is string which continues beyond end of regexp (including comments, and probably to the end of file).

http://smp.if.uj.edu.pl/~baryluk/mcedit-js-highlight-bug/Def_030.png

All examples here: http://smp.if.uj.edu.pl/~baryluk/mcedit-js-highlight-bug/

Thanks.

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Apr 18, 2010 at 6:15 UTC (comment 1)

  • Component changed from mc-config-ini to mcedit
  • Version changed from version not selected to master

@mc-butler
Copy link
Author

Changed by zaytsev (@zyv) on Apr 18, 2010 at 12:47 UTC (comment 2)

Duplicate: #2142 ?

@mc-butler
Copy link
Author

Changed by baryluk (witold.baryluk@….com) on Apr 18, 2010 at 17:09 UTC (comment 3)

I don't think it is duplicate.

It is different syntax highlighting file and language.
And different problem. It looks that current JS highlighting rules doesn't
include case for regexps at all. Finding regular expression isn't hard,
scan non-comments and non-strings for /\/.+(?<!\\)\//[igmy]* regexp.

As i understand problem is that / can be both division operator and beginig of the regexp. Simples difference is that before / in regexp we need to have: { or ( or [ or , or ; or = or other oprator (+ - / & && || | ^ ~). But befor division operator we need to have ) or digit or letter or dot. (We inore of course whitespaces in the match)

.+ becuase 'var a = //;' is illegal as this is empty regexp (would always match), and actually it is a commened ';' string

@mc-butler
Copy link
Author

Changed by baryluk (witold.baryluk@….com) on Apr 18, 2010 at 17:47 UTC (comment 4)

Regexp should be

/\/.+(?<!\\)\/[igmy]*/

With negative lookbehind and ignore space, to distinguish from the division operator followed by comment:

/(?<=[{(\[,:=\-+\/&|~])[\s\n]*(\/.+(?<!\\)\/[igmy])*/

After further refinmentens (like allowing / in after [ (character class), and then disallowing after [ if it was escaped \[.

/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(\\\/|[^\[]|(\[.*\/\])|\\\[)+?\/[igmy]*)/

this will match to

var a = /"d"/;

but not to

var a = x / y; // 

PS. Some regexp engine have limited lookbehind capabilities, but this shouldn't be probleme here, as we are only matching only single char before possible regexp (ignoring whitespaces).

But perl gives me correctly some comments (which should are processed on its own, so will not be matched in real life:

cat *.js | perl -n -e 'if (/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(\\\/|[^\[]|(\[.*\/\])|\\\[)+?\/[igmy]*)/mg) {print $1,"\n";}'

output

/^0+/
/Apple.*Mobile.*Safari/
/^\/
/^\[object\s(.*)\]$/
/\/\/
/\s+/g
/^\s+/
/::/
/_/
/"/g
/'/g
/\\./g
/&/g
/&lt;/g
/(^|.|\r|\n)(#\{(.*?)\})/
/\s+/
/Konqueror|Safari|KHTML/
/Gecko\/(\d{4})/
/^\s*(text|application)\/(x-)?(java|ecma)script(;.*)?\s*$/i
/^\s*https?:\/\/[^\/]*/
/\S/
/opacity:\s*(\d?\.?\d*)/
/alpha\(opacity=(.*)\)/
/rv:1\.8\.0/
/^(?:object|applet|embed)$/i
/\S/
/\S/
/\S/
/\S/
/^(\d+)$/
/"/g
/^\s*~\s*/
/^\s*>\s*/
/^\s*\+\s*/
/^\s/
/^\d+$/
/^(?:button|reset|submit)$/i
/\s/
/\s/
/\s+/
/ 2/
/213/
/213/
/213/
/213/
/213/
/213/
/\/asd\/asd/
/[a/]*/
/\[a\/\]*/
...

looks very well. "..." have some false positives, but it is ONLY becuase there was comments there of the form:

sadasdasd = asdsad ; // cadasdsad http://google.com.

@mc-butler
Copy link
Author

Changed by baryluk (witold.baryluk@….com) on Apr 18, 2010 at 17:51 UTC (comment 5)

Changing not needed groups to be noncapturing (?: ...):

/(?<=[{(\[,:=\-+\/&|~!])[\s\n]*(\/(?:\\\/|[^\[]|(?:\[.*\/\])|\\\[)+?\/[igmy]*)/

brings slight more efficient matching and gives exactly same matche.

Is it possible to use this somehow in the syntax files? Or something more simple and not based on the regexp need to be used?

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Nov 1, 2011 at 6:03 UTC (comment 6)

  • Branch state set to no branch
  • Milestone changed from 4.7 to Future Releases

@mc-butler
Copy link
Author

Changed by andrew_b (@aborodin) on Jul 21, 2012 at 5:47 UTC

  • Blocked by set to #2142

@zyv zyv changed the title JavaScript highlighting bug in regexp. JavaScript highlighting bug in regexp with quotes May 27, 2025
@zyv zyv closed this as completed May 27, 2025
@zyv zyv removed this from the Future Releases milestone May 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: mcedit mcedit, the built-in text editor prio: low Minor problem or easily worked around
Development

No branches or pull requests

2 participants