This looks like a bug. I have opened #93.
I'm working on a resume parsing project using GateNLP. I have several gazetteer lists to match, and it works well for 2-3 page resumes. However, when the resume is very long, I encounter the following IndexError from TokenGazetteer. Any help or suggestions would be highly appreciated.
2021-04-11 11:32:36,944 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2021-04-11 11:32:36,944|WARNING|tika.tika|Failed to see startup log message; retrying...
2021-04-11 11:32:51,999|DEBUG|urllib3.connectionpool|Starting new HTTP connection (1): localhost:9998
2021-04-11 11:32:52,569|DEBUG|urllib3.connectionpool|http://localhost:9998 "PUT /rmeta/xml HTTP/1.1" 200 None
Trying to start GATE Worker on port=25335 host=127.0.0.1 log=false keep=false
PythonWorkerRunner.java: starting server with 25335/127.0.0.1/sNHip6pVztTKavzaS2-W6TtM8dg/false
Trying to start GATE Worker on port=25335 host=127.0.0.1 log=false keep=false
PythonWorkerRunner.java: starting server with 25335/127.0.0.1/PmBj0aQy1AQk12LoPnfVjy9NIh8/false
2021-04-11 11:33:04,221|INFO|gatenlp.processing.gazetteer|Reading list file data\certification.lst
2021-04-11 11:33:04,270|INFO|gatenlp.processing.gazetteer|Reading list file data\education.lst
2021-04-11 11:33:04,309|INFO|gatenlp.processing.gazetteer|Reading list file data\jobs.lst
IndexError Traceback (most recent call last)
in
8 doc2 = Annie(doc1)
9 properdoc = ProperDoc(doc1)
---> 10 gazdoc = GazDet(properdoc)
11 for ann in gazdoc.annset("Resume"):
12 doc2.annset("Resume").add_ann(ann)
in GazDet(doc)
5 for typ in details:
6 tgaz = TokenGazetteer("data/" + typ + ".def", fmt="gate-def", annset="", outset="Resume", outtype=typ)
----> 7 gazdoc = tgaz(doc)
8 return gazdoc
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in __call__(self, doc, annset, tokentype, septype, splittype, withintype, all, skip)
697 for segment_start, segment_end in segment_offs:
698 tokens = list(anns.within(segment_start, segment_end))
--> 699 for matches in self.find_all(tokens, doc=doc):
700 for match in matches:
701 starttoken = tokens[match.start]
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find_all(self, tokens, doc, all, skip, fromidx, toidx, endidx, matchfunc)
617 idx = fromidx
618 while idx <= toidx:
--> 619 matches, maxlen, idx = self.find(
620 tokens,
621 doc=doc,
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find(self, tokens, doc, all, fromidx, toidx, endidx, matchfunc)
550 endidx = len(tokens)
551 while idx <= toidx:
--> 552 matches, long = self.match(
553 tokens, idx=idx, doc=doc, all=all, endidx=endidx, matchfunc=matchfunc
554 )
~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in match(self, tokens, doc, all, idx, endidx, matchfunc)
454 while j <= endidx:
455 if node.nodes:
--> 456 token = tokens[j]
457 if token.type == self.splittype:
458 break
IndexError: list index out of range
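The last frames of the traceback show `endidx = len(tokens)` in `find` and a loop `while j <= endidx:` that reads `tokens[j]` in `match`. Read together, that looks like an off-by-one at the end of the token list. The sketch below is not gatenlp code: `match_buggy` and `match_fixed` are hypothetical stand-ins that only reproduce the loop bound, assuming `endidx` is meant to be an exclusive upper bound.

```python
def match_buggy(tokens, idx=0, endidx=None):
    """Hypothetical stand-in (not gatenlp code): inclusive loop bound."""
    if endidx is None:
        endidx = len(tokens)  # one past the last valid index
    seen = []
    j = idx
    while j <= endidx:  # inclusive comparison lets j reach len(tokens)
        seen.append(tokens[j])  # raises IndexError when j == len(tokens)
        j += 1
    return seen


def match_fixed(tokens, idx=0, endidx=None):
    """Same loop, treating endidx as an exclusive bound (an assumption)."""
    if endidx is None:
        endidx = len(tokens)
    seen = []
    j = idx
    while j < endidx:  # stops before j reaches len(tokens)
        seen.append(tokens[j])
        j += 1
    return seen


if __name__ == "__main__":
    toks = ["Senior", "Data", "Engineer"]
    print(match_fixed(toks))
    try:
        match_buggy(toks)
    except IndexError as err:
        print("IndexError:", err)
```

In the real `match` the loop can also exit early (the traceback shows a `break` on `splittype`), so the error would presumably surface only when a candidate match is still open at the last token of a segment. That is a guess from the traceback alone, not a confirmed diagnosis.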