
support redirects / moved permanent links in HTTP client for fulltext documents #20

@guenterh

The current HTTP client implementation for full-text documents does not follow redirects, such as 301 Moved Permanently responses from a resource.

https://github.com/swissbib/content2SearchDocs/blob/master/src/java/org/swissbib/documentprocessing/plugins/FulltextContentEnrichment.java#L349

compare: http://stackoverflow.com/questions/1884230/java-doesnt-follow-redirect-in-urlconnection

example:
curl -I 'https://www.edubs.ch/unterstuetzung/buecher/bibliothek/angebot/inhaltsverzeichnisse/pzb-d-32-20-9-10-deutsch-an-stationen.pdf'
HTTP/1.1 301 Moved Permanently
Date: Fri, 19 Jun 2015 15:31:40 GMT
Server: Zope/(2.13.21, python 2.7.8, linux2) ZServer/1.1
Strict-Transport-Security: max-age=63072000; includeSubDomains
Content-Length: 15197
Content-Language: de
Expires: Sat, 01 Jan 2000 00:00:00 GMT
X-Ua-Compatible: IE=edge,chrome=1
Content-Type: text/html;charset=utf-8
Location: https://www.edubs.ch/unterstuetzung/bibliothek/bibliothek/angebot/inhaltsverzeichnisse/pzb-d-32-20-9-10-deutsch-an-stationen.pdf
Cache-control: private
Set-Cookie: serverid=p0101; path=/
Vary: Accept-Encoding

Because the redirect is not followed, the permanently moved document is never fetched or parsed and is therefore not used as a full-text search document, even though a matching link pattern is configured in our properties file.
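One possible way to handle such a response is to follow the Location header manually. The following is only a minimal sketch under assumptions: the class name `RedirectFollowingFetcher`, the limit of five redirects, and the use of `HttpURLConnection` are illustrative and not taken from FulltextContentEnrichment.java, which may use a different HTTP client.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper, not part of content2SearchDocs. */
final class RedirectFollowingFetcher {

    // Assumed upper bound on followed redirects, to avoid redirect loops.
    private static final int MAX_REDIRECTS = 5;

    /** True for status codes whose Location header should be followed. */
    static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    /** Resolves a possibly relative Location value against the URL that returned it. */
    static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /**
     * Opens the document at the given http(s) address, following redirects manually.
     * Manual following also covers redirects that switch between http and https.
     */
    static InputStream open(String address) throws IOException {
        String current = address;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            // The cast assumes an http or https URL.
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream(); // throws IOException on 4xx/5xx
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header: " + current);
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects for " + address);
    }
}
```

With this sketch, `RedirectFollowingFetcher.open(...)` on the edubs.ch URL above would issue a second request to the address given in the Location header instead of stopping at the 301 response.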

ALLOWED.DOCUMENTS=http://www.ub.unibas.ch/tox/IDSLUZ/.*?/PDF###http://www.ub.unibas.ch/tox/IDSBB/.*?/PDF###http://www.ub.unibas.ch/tox/HBZ/.*?/OCR###http://d-nb.info/.*?/04###http://aleph.unisg.ch/hsgscan/.*?.pdf###http://opac.nebis.ch/objects/pdf/.*?.pdf###http://biblio.unizh.ch/objects/pdf/.*?.pdf###http://libraries.admin.ch/gw/toc/pdf/.*?.pdf###https://www.edubs.ch/unterstuetzung/buecher/.*?.pdf
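One detail worth keeping in mind when following the redirect: the Location target in the example lies under /unterstuetzung/bibliothek/, not /unterstuetzung/buecher/. The sketch below illustrates this under loud assumptions: that the ALLOWED.DOCUMENTS entries are regular expressions separated by `###` and that a URL must match an entry in full. The class `AllowedDocuments` and its matching logic are hypothetical; the actual check in content2SearchDocs may differ.

```java
import java.util.regex.Pattern;

/** Hypothetical allow-list check, not the project's actual implementation. */
final class AllowedDocuments {

    /** Returns true if the URL fully matches any of the '###'-separated patterns. */
    static boolean isAllowed(String allowedDocuments, String url) {
        for (String regex : allowedDocuments.split("###")) {
            if (Pattern.matches(regex, url)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions, the originally linked URL matches the configured edubs.ch pattern, while the redirect target does not. A fix would therefore probably need to apply the allow-list to the original link only, or add a pattern for the new location.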

Example www.swissbib.ch
https://www.swissbib.ch/Record/328655201
