Look at replacing most of crawler with an external crawling package

This is a big one, but it's possible that most of this crawler should be replaced with Apache Nutch or similar.  I originally hacked this out as a proof-of-concept, but as usual it grew a bit from there.  We are now hitting scalability issues (parallel crawling, possibly on multiple machines; crawling into a large database; etc.), so we need to take a serious look at a well-established alternative like Nutch.

Some questions

* Is Nutch suitable? If so, 1.x or 2.x?

Look at replacing most of crawler with an external crawling package #5

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Look at replacing most of crawler with an external crawling package #5

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions