-
-
Notifications
You must be signed in to change notification settings - Fork 598
Improve score by supporting extra_phrase
for extra words in rules
#4432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Improve score by supporting extra_phrase
for extra words in rules
#4432
Conversation
6b7da16
to
8fb9f47
Compare
ac99199
to
9b50bff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alok1304! Looking much better
See comments for your consideration. I've updated your PR description to mention that this is a follow up PR, since there is important context and reviews in the previous PR, we need to preserve this as required.
""" | ||
Return True if any of the matches in ``license_matches`` List of LicenseMatch | ||
has extra words are in the correct place. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check both a bit explicitly:
- For all the matches which have extra words, they are in correct location
- For all the matches which does not have extra words, they are correct detections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add a test accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add a test accordingly
where I should add a test and like how I implement for all license_matches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also i added test for is_extra_words_position_valid
see: https://github.com/aboutcode-org/scancode-toolkit/pull/4432/files#diff-d1520ccce311f8f4d4932ba68589bc23b098f8090a307b0b440edb4a846ae21cR1325
b9c7c16
to
a50b3db
Compare
I addet test for |
8a25b51
to
43c6bdb
Compare
…_log` Add test for is correct position of `extra-words` according to `extra-phrases` that is present in rules. if we find `extra-words` are in the right place then we set score to `100`. And also show in `detection_log` why we increasing the score to keep track of this. Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Add new phrases like `extra_phrase` this is special for extra-words. This phrase is represented in the format [[n]], where n indicates the maximum number of extra-words allowed at that position in the rule. If extra-words appear at the correct position and their count does not exceed the allowed limit `n`, then the score is increased to `100`. Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
due to `extra_phrase` in rules, this shows that rules containing `extra-words` Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
add a new `extra-phrase` for a rule i.e bsd-new Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
b38f718
to
61284f4
Compare
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
…rds` Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
Actually, the previous test case is failing because of
because Now, all the test cases are passed :) |
Signed-off-by: Alok Kumar <alokkumarjipura9973@gmail.com>
@@ -1087,7 +1095,7 @@ def is_correct_detection(license_matches): | |||
] | |||
|
|||
return ( | |||
all(matcher in ("1-hash", "1-spdx-id", "2-aho") for matcher in matchers) | |||
all(matcher in ("1-hash", "1-spdx-id", "2-aho", "3-seq") for matcher in matchers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i add 3-seq
because of extra-phrase
is removed from rules while loading, after that when we scan same rules, now matcher is 3-seq
not any one of these "1-hash", "1-spdx-id", "2-aho"
this is due to extra-phrase
marker that are present in rules. That's why this gives 3-seq
. We already loaded the rules where extra-phrase
is removed, but not in the actual rule file.
Follow up of:
extra_phrase
for extra words in rules #4424Add new phrases like
extra_phrase
this is special forextra-words
. This phrase is represented in the format[[n]]
, where n indicates the maximum number ofextra-words
allowed at that position in the rule.If
extra-words
appear at the correct position and their count does not exceed the allowed limitn
, then the score is increased to100
.Reference #4420
Tasks
Run tests locally to check for errors.
Signed-off-by: Alok Kumar alokkumarjipura9973@gmail.com