
Tokenizer doesn't split Volume/Issue when it is typeset without spaces #212

@EmmanuelCharpentier

Description


Related to #23: given a reference such as:

1.	Felson DT. Epidemiology of hip and knee osteoarthritis. Epidemiol Rev. 1988;10:1‑28. 

the current parser treats "1988;10:1‑28" as a single token and assigns it to Volume/Issue. It should be split approximately as follows:

| Token | Value |
|---|---|
| Year | 1988 |
| Volume | 10 |
| Pages | 1-28 |
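A minimal sketch of that split, in Python, assuming the locator has already been isolated as one token. The pattern and the name split_locator are hypothetical, not part of the existing parser. The issue number in parentheses is optional, and the page range may use an ASCII hyphen, a non-breaking hyphen (U+2011, as in the references above) or an en dash.

```python
import re

# Hypothetical pattern for the compact "year;volume(issue):pages" locator,
# e.g. "1988;10:1‑28" or "2012;20(3):423‑35".
LOCATOR = re.compile(
    r"""
    (?:(?P<year>\d{4});)?                   # optional four-digit year, then ';'
    (?P<volume>\d+)                         # volume number
    (?:\((?P<issue>[^)]+)\))?               # optional issue in parentheses
    :(?P<pages>\d+(?:[-\u2011\u2013]\d+)?)  # first page, optional range
    """,
    re.VERBOSE,
)


def split_locator(token: str) -> dict:
    """Split a locator token into year/volume/issue/pages fields.

    Returns an empty dict when the token does not look like a locator.
    """
    # Drop surrounding whitespace and the reference's final period.
    m = LOCATOR.fullmatch(token.strip().rstrip("."))
    if m is None:
        return {}
    # Keep only the groups that actually matched (issue/year are optional).
    fields = {k: v for k, v in m.groupdict().items() if v is not None}
    # Normalise non-breaking hyphens and en dashes to an ASCII hyphen.
    if "pages" in fields:
        fields["pages"] = re.sub(r"[\u2011\u2013]", "-", fields["pages"])
    return fields
```

With this sketch, split_locator("2012;20(3):423‑35.") yields year 2012, volume 20, issue 3 and pages 423-35, which also covers the second reference below.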

A worse case:

2.	Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. Biomechanical considerations in the pathogenesis of osteoarthritis of the knee. Knee Surg Sports Traumatol Arthrosc. mars 2012;20(3):423‑35. 

is parsed as:

| Token | Value |
|---|---|
| Citation number | 2 |
| Author | Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. |
| Title | Biomechanical considerations in the pathogenesis of osteoarthritis of the knee |
| Journal | Knee Surg Sports Traumatol Arthrosc mars |
| Date | 2012 |
| Volume/Issue | 20(3):423‑35 |

Again, the whole Volume/Issue token isn't split at its punctuation. I would expect:

| Token | Value |
|---|---|
| Citation number | 2 |
| Author | Heijink A, Gomoll AH, Madry H, Drobnič M, Filardo G, Espregueira-Mendes J, et al. |
| Title | Biomechanical considerations in the pathogenesis of osteoarthritis of the knee |
| Journal | Knee Surg Sports Traumatol Arthrosc |
| Date | mars 2012 |
| Volume/Issue | 20(3) |
| Pages | 423‑35 |

Recognizing "mars 2012" (French for "March 2012") as a date is probably harder...
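One possible approach, sketched in Python under the assumption that the tokenizer has already split the reference at the period ending the journal abbreviation: accept a word before the year only if it appears in a known list of month names. The MONTHS set and split_date_prefix are hypothetical, and the list covers only French and English.

```python
import re

# Hypothetical month vocabulary: French and English names plus common
# English abbreviations. A real parser would likely need more languages.
MONTHS = {
    "janvier", "février", "mars", "avril", "mai", "juin", "juillet",
    "août", "septembre", "octobre", "novembre", "décembre",
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
    "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept",
    "oct", "nov", "dec",
}

# "<optional word> <year>;<rest>", as found right after the journal title.
# [^\W\d_]+ matches a run of Unicode letters, so accented names work.
DATE_PREFIX = re.compile(
    r"(?:(?P<month>[^\W\d_]+)\.?\s+)?(?P<year>\d{4})\s*;\s*(?P<rest>.*)"
)


def split_date_prefix(segment: str):
    """Return (date, rest), or None if no '<month> <year>;' prefix is found."""
    m = DATE_PREFIX.match(segment.strip())
    if m is None:
        return None
    month = m.group("month")
    if month is not None and month.lower() not in MONTHS:
        # The word before the year is not a known month, so reject the match.
        return None
    year = m.group("year")
    date = f"{month} {year}" if month else year
    return date, m.group("rest")
```

For example, split_date_prefix("mars 2012;20(3):423‑35.") returns ("mars 2012", "20(3):423‑35."), leaving the remainder for the Volume/Issue and Pages split. The closed month list is deliberate: it keeps an ordinary trailing word of a journal abbreviation from being mistaken for part of the date.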

HTH,
