Reindexing is done in background #8

katta · 2013-06-24T10:14:00Z

Made it compatible with ES v0.20.6
Running indexing process in a background thread

karussell · 2013-06-24T10:41:49Z

Cool, thanks!

Won't the CPU hit 100% if there is no sleep in the loop? katta@db0dc31#L2R62

And shouldn't this background work be done via an Elasticsearch feature like the river API?

http://www.elasticsearch.org/guide/reference/river/
http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/

Or why did you decide to do it this way?

The river is also stoppable: https://github.com/elasticsearch/elasticsearch-river-twitter/blob/master/src/main/java/org/elasticsearch/river/twitter/TwitterRiver.java#L389

katta · 2013-06-24T11:36:10Z

Yes, the CPU hits 100%, but I am not sure sleep is a good way to control it.
We should think of a better way to control it; I hadn't thought about it.
Thanks for pointing it out.

Regarding doing it with a river: my understanding is that a river is used
in scenarios where there is a continuous stream of data that it works on
periodically, not for one-time tasks. I see this reindexing as a one-time
task (correct me if my understanding is wrong), which is why I felt a
river is not suitable here. Even if we do use a river, we have to explicitly
make sure the job runs in the background in RiverModule, just as we have
done here. A river just gives the additional option of re-triggering at
periodic intervals.

-katta

karussell · 2013-06-24T11:47:23Z

OK, thanks for the clarification. But this reindexing thread should be stoppable somehow. Isn't that also possible via the river API, or at least a bit easier there?

> Yes CPU hits 100% but not sure if sleep is a good way to control it.

yes, or some wait+notify mechanism (or the lock.await stuff)
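To make the wait+notify idea concrete, here is a minimal sketch of a background worker that blocks on a `java.util.concurrent.locks.Condition` while paused instead of spinning, and that can be stopped from another thread. The class name `PausableReindexWorker`, the `batchStep` callback and the batch counter are hypothetical stand-ins, not code from this PR; the real scroll-and-index step would go where `batchStep.run()` is called.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: a background worker that can be paused, resumed and stopped. */
final class PausableReindexWorker implements Runnable {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition unpaused = lock.newCondition();
    private final Runnable batchStep;      // stand-in for "fetch and index one batch"
    private final int totalBatches;
    private final AtomicInteger done = new AtomicInteger();
    private boolean paused;                // guarded by lock
    private volatile boolean stopped;

    PausableReindexWorker(int totalBatches, Runnable batchStep) {
        this.totalBatches = totalBatches;
        this.batchStep = batchStep;
    }

    @Override
    public void run() {
        try {
            while (!stopped && done.get() < totalBatches) {
                lock.lock();
                try {
                    // Block here while paused; await() releases the lock and uses no CPU.
                    while (paused && !stopped) {
                        unpaused.await();
                    }
                } finally {
                    lock.unlock();
                }
                if (stopped) {
                    break;
                }
                batchStep.run();
                done.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as a stop request
        }
    }

    void pause() {
        lock.lock();
        try { paused = true; } finally { lock.unlock(); }
    }

    void resume() {
        lock.lock();
        try { paused = false; unpaused.signalAll(); } finally { lock.unlock(); }
    }

    void stop() {
        lock.lock();
        try { stopped = true; unpaused.signalAll(); } finally { lock.unlock(); }
    }

    int completedBatches() {
        return done.get();
    }
}
```

Note that this only covers pausing and stopping. If the 100% CPU comes from the indexing work itself rather than from a polling loop, some form of throttling would still be needed.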

katta · 2013-06-24T11:51:50Z

<<ok, thanks for clarification. But this reindexing thread should be
somehow stoppable - isn't that also possible via the river API or at least a
bit easier?>>

You are right, with river we can control it better. And I might be wrong in
saying we have to handle making it background explicitly. Will give it a
shot with river and see how it goes.

karussell · 2013-06-24T11:52:59Z

That would be nice!

shadow000fire · 2013-08-21T23:15:17Z

Hey guys, I was just about to code a reindex plugin when I saw this. Personally, I like the endpoint better than a river for this. I agree with @katta on what rivers are for, and I'm not sure how a river would make it better. Using an endpoint would let you specify a query on the fly to select which documents to reindex. The same endpoint can be used to pause and resume a background thread, or to return the current status if reindexing is in progress. I see a river as an always-on thing.

Also, specifying X docs per second to ingest might be a good way to control CPU.
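One way to read the "X docs per second" suggestion is a small throttle that the indexing loop calls before each bulk request. The sketch below is only an illustration under that assumption: `DocsPerSecondThrottle`, `reserve` and `acquire` are hypothetical names, not part of this plugin or of Elasticsearch. Time is passed into `reserve` explicitly so the pacing logic can be checked without sleeping.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: paces a loop to roughly N documents per second. */
final class DocsPerSecondThrottle {
    private final long nanosPerDoc;
    private long nextFreeSlot; // earliest time (System.nanoTime scale) the next batch may start

    DocsPerSecondThrottle(double docsPerSecond, long nowNanos) {
        if (docsPerSecond <= 0) {
            throw new IllegalArgumentException("docsPerSecond must be positive");
        }
        this.nanosPerDoc = (long) (1_000_000_000L / docsPerSecond);
        this.nextFreeSlot = nowNanos;
    }

    /** Books time for a batch of docs and returns how long the caller should wait first. */
    synchronized long reserve(int docs, long nowNanos) {
        // Compare via subtraction, as recommended for System.nanoTime values.
        long start = (nextFreeSlot - nowNanos > 0) ? nextFreeSlot : nowNanos;
        nextFreeSlot = start + docs * nanosPerDoc;
        return start - nowNanos;
    }

    /** Blocks until the batch is allowed to proceed. */
    void acquire(int docs) throws InterruptedException {
        long waitNanos = reserve(docs, System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

In a reindexing loop this would be used as `throttle.acquire(batch.size())` just before sending each bulk request, with the rate taken from a request parameter. The first batch passes immediately; later batches wait until the time booked by earlier ones has elapsed.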

karussell · 2013-08-23T19:27:05Z

Thanks! Did you try @katta's solution?

JoeZ99 · 2014-02-18T22:42:27Z

Hello,
Is something wrong with this PR, so that it isn't merged?
I also think this is not a "river" thing, but an endpoint one.
I don't know if this is unorthodox, but couldn't we, besides specifying endpoints for status checking, stopping and resuming, add a "callback" by specifying a URL to which the cluster sends a POST request (or something similar) once the task is done?
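As a rough illustration of the status-plus-callback idea, the sketch below tracks a job's state and fires a completion hook exactly once. Everything here is hypothetical (`ReindexJobStatus`, the JSON field names, the `Consumer<String>` hook); in a real plugin the hook might POST the JSON body to the user-supplied URL, and `toJson()` might back a status endpoint.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical sketch: job status that a status endpoint could report, plus a done-callback. */
final class ReindexJobStatus {
    enum State { RUNNING, DONE, FAILED }

    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicLong docsIndexed = new AtomicLong();
    private final Consumer<String> onFinished; // e.g. "POST this body to the callback URL"

    ReindexJobStatus(Consumer<String> onFinished) {
        this.onFinished = onFinished;
    }

    /** Called by the background thread after each batch. */
    void recordIndexed(long docs) {
        docsIndexed.addAndGet(docs);
    }

    /** Marks the job finished; the callback fires only on the first call. */
    void finish(boolean success) {
        State end = success ? State.DONE : State.FAILED;
        if (state.compareAndSet(State.RUNNING, end)) {
            onFinished.accept(toJson());
        }
    }

    /** Body a status endpoint could return while the job runs or after it ends. */
    String toJson() {
        return "{\"state\":\"" + state.get() + "\",\"docs_indexed\":" + docsIndexed.get() + "}";
    }
}
```

Keeping the callback behind a plain `Consumer` leaves open how the notification is delivered (HTTP POST, log line, or nothing at all), which seems appropriate while the design is still under discussion.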

katta added 3 commits June 21, 2013 13:33

Compatible with elasticsearch v0.20.6

7c5765b

Reindexing documents in background

db0dc31

Fixed callback for hits while reindexing

dfbac2c
