
TimeOut when extracting a large dataset #7

@AnooshaCherukuri

Description

I have a big dataset with 800,000 records. When I run the extraction, it fails with the following error:

```
[2017-10-10 14:43:08,491: ERROR/MainProcess] Task extractor.extract[401c7ccc-7a3c-455e-a5f4-f23b804ae43d] raised unexpected: SearchIndexError('Solr returned an error: (u"Connection to server 'http://solr_server/solr/ckan/update/?commit=true' timed out: HTTPConnectionPool(host='#########', port=8983): Read timed out. (read timeout=60)",)',)
Traceback (most recent call last):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/lib/ckan/default/src/ckanext-extractor/ckanext/extractor/tasks.py", line 94, in extract
    index_for('package').update_dict(pkg_dict)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 101, in update_dict
    self.index_package(pkg_dict, defer_commit)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/search/index.py", line 295, in index_package
    raise SearchIndexError(msg)
SearchIndexError: Solr returned an error: (u"Connection to server 'http://XXXXXXXXXXXXXXXXX/solr/ckan/update/?commit=true' timed out: HTTPConnectionPool(host='xxxxxxxxxx', port=8983): Read timed out. (read timeout=60)",)
```

Has anyone else had the same issue, or can anyone let me know how to fix it?
Thanks in advance!
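
For reference, the request that fails is the Solr commit triggered during re-indexing (the `?commit=true` call in the traceback), which can take longer than the 60-second read timeout once the index is large. Below is a minimal sketch, not taken from the issue, for checking whether the commit alone is the slow part: it sends the same kind of update request directly with a much longer read timeout using the `requests` library. The Solr URL is a placeholder for the redacted host above.

```python
# Minimal sketch (assumption, not from the issue): issue the same commit that
# the CKAN indexer sends, but with a longer read timeout, to see whether the
# commit itself is what exceeds 60 seconds.
import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/ckan/update"  # placeholder for the redacted host

try:
    # Mirror the failing request: a POST to /update with commit=true and an
    # empty JSON body, but with a 300-second read timeout instead of 60.
    response = requests.post(
        SOLR_UPDATE_URL,
        params={"commit": "true"},
        headers={"Content-Type": "application/json"},
        data="{}",
        timeout=300,
    )
    response.raise_for_status()
    print("Commit finished in %.1f seconds" % response.elapsed.total_seconds())
except requests.exceptions.ReadTimeout:
    print("Commit still timed out after 300 seconds")
```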
