Commit cb40371

Improve copyright detection

Handle more edge cases

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
1 parent 79fc8ee commit cb40371

10 files changed, 25 additions and 16 deletions

src/cluecode/copyrights.py

Lines changed: 3 additions & 0 deletions
@@ -2335,6 +2335,9 @@ def build_detection_from_node(
  # created by Jason Hunter <jhunter AT jdom DOT org>
  EMAIL: {<EMAIL_START> <AT> <NN|NNP> <DOT> <NN|NNP> } # email_start

+ # Copyright (c) 2001 Karl Garrison (karl AT indy.rr.com)
+ EMAIL: {<NN|NNP> <AT> <URL> } # email_at
+
  EMAIL: { <NN> <CC> <NN> <DOT> <NN> } # foo at bat dot com

  # foo@bar.com or baz@bar.com
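
The new `email_at` rule groups a name-like token, an `AT` token and a URL-like token into a single EMAIL chunk, which matches the case in the added comment (`karl AT indy.rr.com`). As a rough illustration only, the same idea can be approximated with a plain regular expression. This is a sketch under stated assumptions, not the grammar-based detection used in `copyrights.py`; the names `OBFUSCATED_EMAIL` and `strip_obfuscated_email` are hypothetical.

```python
import re

# Hypothetical regex stand-in for the chunk rule <NN|NNP> <AT> <URL>:
# a user token, the literal word AT, then a dotted domain, optionally
# wrapped in parentheses.
OBFUSCATED_EMAIL = re.compile(
    r"\(?\b(?P<user>[A-Za-z0-9._-]+)\s+AT\s+"
    r"(?P<domain>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\)?"
)


def strip_obfuscated_email(holder):
    """Return `holder` with any 'user AT domain.tld' fragment removed."""
    return OBFUSCATED_EMAIL.sub("", holder).strip()


if __name__ == "__main__":
    print(strip_obfuscated_email("Karl Garrison (karl AT indy.rr.com)"))
```

Because the domain part must contain at least one dot, a fragment such as `AT r'b` is left untouched by this sketch.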

src/licensedcode/data/rules/jdom_2.RULE

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
  license_expression: jdom
  is_license_text: yes
  ignorable_authors:
- - Jason Hunter
+ - Jason Hunter jhunter AT jdom DOT org and Brett McLaughlin brett AT jdom DOT org
  - the JDOM Project (http://www.jdom.org/)
  ignorable_urls:
  - http://www.jdom.org/
@@ -51,4 +51,4 @@ ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  This software consists of voluntary contributions made by many individuals on
  behalf of the JDOM Project and was originally created by Jason Hunter <jhunter
  AT jdom DOT org> and Brett McLaughlin <brett AT jdom DOT org>. For more
- information on the JDOM Project, please see <http://www.jdom.org/>.
+ information on the JDOM Project, please see <http://www.jdom.org/>.

src/textcode/markup.py

Lines changed: 5 additions & 2 deletions
@@ -1918,7 +1918,6 @@ def strip_known_markup_from_text(text):
  "<url>",
  "</url>",
  "<date-of-document>",
-
  # common XML namespaces
  "http://www.w3.org/1998/math/mathml",
  "http://www.w3.org/1999/xhtml",
@@ -1937,7 +1936,11 @@ def strip_known_markup_from_text(text):
  "@end",
  "@group",
  "mailto:",
- "xmlns",
+ "xmlns=",
+ "xmlns:",
+ "xml:",
+ "lang=",
+ "<windows"
  )


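
This hunk replaces the bare `"xmlns"` marker with `"xmlns="` and `"xmlns:"`, and adds `"xml:"`, `"lang="` and `"<windows"`. The sketch below shows one way a tuple of such markers could be removed from text. It is a minimal illustration that assumes case-insensitive substring removal; it does not reproduce the body of `strip_known_markup_from_text`, and `MARKERS` and `strip_markers` are hypothetical names covering only a subset of the real tuple.

```python
import re

# Subset of the markers touched by this hunk (assumption: the real tuple
# in markup.py is much longer and may be applied differently).
MARKERS = ("mailto:", "xmlns=", "xmlns:", "xml:", "lang=", "<windows")


def strip_markers(text, markers=MARKERS):
    """Replace each known marker with a space, then normalize whitespace."""
    # Try longer markers first so a long marker is never shadowed by a
    # shorter one that shares its prefix.
    ordered = sorted(markers, key=len, reverse=True)
    pattern = "|".join(re.escape(marker) for marker in ordered)
    stripped = re.sub(pattern, " ", text, flags=re.IGNORECASE)
    return " ".join(stripped.split())


if __name__ == "__main__":
    print(strip_markers("Martin R. Jones xmlns:doc"))
    print(strip_markers("mailto:frob@df.ru"))
```

Markers that end in `=` or `:` only match attribute-like or prefixed occurrences, so a plain word that merely starts with the same letters is left alone in this sketch.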
tests/cluecode/data/copyright_fossology/testdata118.yml

Lines changed: 2 additions & 0 deletions
@@ -39,4 +39,6 @@ holders:
  - Guy Eric Schalnat, Group 42, Inc.
  - Jean-loup Gailly and Mark Adler
  authors:
+ authors:
+ - Masayuki Hatta (mhatta)
  - artofcode LLC. http://artofcode.com

tests/cluecode/data/copyright_fossology/testdata119.yml

Lines changed: 2 additions & 0 deletions
@@ -39,4 +39,6 @@ holders:
  - Guy Eric Schalnat, Group 42, Inc.
  - Jean-loup Gailly and Mark Adler
  authors:
+ authors:
+ - Masayuki Hatta (mhatta)
  - artofcode LLC. http://artofcode.com

tests/cluecode/data/copyright_fossology/testdata127.yml

Lines changed: 1 addition & 0 deletions
@@ -7,4 +7,5 @@ copyrights:
  holders:
  - Valek Filippov
  authors:
+ - Masayuki Hatta (mhatta)
  - URW++ Design & Development GmbH Valek Filippov <frob@df.ru>

tests/cluecode/data/copyright_fossology/testdata128.yml

Lines changed: 1 addition & 0 deletions
@@ -39,4 +39,5 @@ holders:
  - Guy Eric Schalnat, Group 42, Inc.
  - Jean-loup Gailly and Mark Adler
  authors:
+ - Masayuki Hatta (mhatta)
  - artofcode LLC. http://artofcode.com

tests/cluecode/data/copyright_fossology/testdata19.yml

Lines changed: 6 additions & 6 deletions
@@ -9,14 +9,14 @@ copyrights:
  holders:
  - Karl Garrison
  - Martin R.
- - Karl Garrison karl AT
+ - Karl Garrison
  authors:
  - firstname Karl Garrison
  - Martin R. Jones
- - Martin R. Jones xmlns:doc http://nwalsh.com/xsl/documentation/1.0
- - Jean-Claude Dumas xmlns:doc http://nwalsh.com/xsl/documentation/1.0
- - Jean-Claude surname Dumas xmlns:doc http://nwalsh.com/xsl/documentation/1.0
+ - Martin R. Jones
+ - Jean-Claude Dumas
+ - Jean-Claude surname Dumas
  - Masayuki Koba
- - Frank Pieczynski xmlns:doc http://nwalsh.com/xsl/documentation/1.0
- - Bartosz Trudnowski for his wife. xmlns:doc http://nwalsh.com/xsl/documentation/1.0
+ - Frank Pieczynski
+ - Bartosz Trudnowski
  - Daniel Pfeiffer

tests/cluecode/data/copyrights/abiword_common.copyright.yml

Lines changed: 2 additions & 2 deletions
@@ -8,12 +8,12 @@ copyrights:
  - Copyright (c) 2009 Patrik Fimml <patrik@fimml.at>
  holders:
  - AbiSource, Inc. & Co.
- - Masayuki Hatta (mhatta)
+ - Masayuki Hatta
  - Patrik Fimml
  holders_summary:
  - value: AbiSource, Inc. & Co.
  count: 1
- - value: Masayuki Hatta (mhatta)
+ - value: Masayuki Hatta
  count: 1
  - value: Patrik Fimml
  count: 1

tests/cluecode/data/copyrights/misco4/to_improve/junk-copyright-333.txt.yml

Lines changed: 1 addition & 4 deletions
@@ -2,7 +2,4 @@ what:
  - copyrights
  - holders
  - authors
- copyrights:
- - (c) AT r'b
- holders:
- - r'b
+
