This repository was archived by the owner on Aug 16, 2020. It is now read-only.


@katta

katta commented Jun 24, 2013

  • Made it compatible with ES v0.20.6
  • Made the indexing process run in a background thread

@karussell
Owner

Cool, thanks!

Won't the CPU hit 100% if there is no sleep in the loop? katta@db0dc31#L2R62

And shouldn't this background work be done via an Elasticsearch feature like the river API?

http://www.elasticsearch.org/guide/reference/river/
http://blog.trifork.com/2013/01/10/how-to-write-an-elasticsearch-river-plugin/

Or why did you decide to do it this way?

The river is also stoppable: https://github.com/elasticsearch/elasticsearch-river-twitter/blob/master/src/main/java/org/elasticsearch/river/twitter/TwitterRiver.java#L389
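A minimal sketch of what "stoppable" could mean for the background thread in this PR, assuming the reindex loop works batch by batch. The class name `StoppableReindexJob` and the in-memory `batches` list are hypothetical stand-ins for the real scroll and bulk-index calls. The only point is that the loop checks a stop flag (and the thread's interrupt status) between batches.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reindex job that can be stopped from another thread.
// "batches" stands in for results read from the source index.
class StoppableReindexJob implements Runnable {
    private final List<List<String>> batches;
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final AtomicInteger indexedDocs = new AtomicInteger();

    StoppableReindexJob(List<List<String>> batches) {
        this.batches = batches;
    }

    // Request a stop; the loop exits before starting the next batch.
    void stop() {
        stopRequested.set(true);
    }

    int indexedDocs() {
        return indexedDocs.get();
    }

    @Override
    public void run() {
        for (List<String> batch : batches) {
            // Check between batches so a stop never cuts a bulk request in half.
            if (stopRequested.get() || Thread.currentThread().isInterrupted()) {
                return;
            }
            indexBatch(batch);
        }
    }

    // Placeholder for the real bulk-index call: here it only counts documents.
    private void indexBatch(List<String> batch) {
        indexedDocs.addAndGet(batch.size());
    }
}
```

A caller would start it with `new Thread(job, "reindex").start()` and later call `job.stop()`. The job then exits after the batch in progress.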

@katta
Author

katta commented Jun 24, 2013

Yes, the CPU hits 100%, but I'm not sure sleep is a good way to control it.
We should think of a better way to control it; I hadn't thought about that.
Thanks for pointing it out.

Regarding doing it with a river, my understanding is that a river is used
in scenarios where there is a continuous stream of data to process
periodically, not for one-time tasks. I see this reindexing as a one-time
task (correct me if my understanding is wrong), which is why I felt a river
is not suitable here. Even if we did use a river, we would have to
explicitly make sure the job runs in the background in the RiverModule,
just as we have done here. A river just gives the additional option of
re-triggering the job at periodic intervals.
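To illustrate the distinction drawn here, this sketch uses plain `java.util.concurrent` rather than the river API, whose internals are not shown. A one-time reindex is a single `submit` to an executor. Periodic re-triggering, the extra thing a river would add in this reading, corresponds to `scheduleAtFixedRate`. `ReindexScheduling` and its method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helpers contrasting a one-off background job with a periodic one.
class ReindexScheduling {
    // One-time task: runs once on a background thread. The Future lets the
    // caller poll isDone() or cancel(true) the job.
    static Future<?> runOnce(ExecutorService executor, Runnable reindexJob) {
        return executor.submit(reindexJob);
    }

    // Periodic task: the same job re-triggered every periodSeconds.
    static ScheduledFuture<?> runPeriodically(ScheduledExecutorService scheduler,
                                              Runnable reindexJob, long periodSeconds) {
        return scheduler.scheduleAtFixedRate(reindexJob, 0, periodSeconds, TimeUnit.SECONDS);
    }

    // Demo: submit a counting job once, wait for it, and report how often it ran.
    static int runOnceAndCount() {
        AtomicInteger runs = new AtomicInteger();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            runOnce(executor, runs::incrementAndGet).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown();
        }
        return runs.get();
    }
}
```

Either way, the background thread has to be created explicitly. The difference is only whether the job is scheduled once or repeatedly.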

-katta


@karussell
Owner

OK, thanks for the clarification. But this reindexing thread should be stoppable somehow. Isn't that also possible via the river API, or at least a bit easier?

<<Yes CPU hits 100% but not sure if sleep is a good way to control it.>>

Yes, or use some wait/notify mechanism (or the Lock/Condition.await approach).
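Here is a sketch of the `Condition.await` approach mentioned above. It assumes one thread reads batches from the source index and another indexes them. Instead of spinning, the indexing thread blocks in `take()` until a batch arrives or the reader calls `finish()`. `BatchHandoff` is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical handoff between a reader thread and an indexing thread.
// The indexer sleeps on a Condition while idle instead of busy-looping.
class BatchHandoff {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private final Deque<List<String>> pending = new ArrayDeque<>();
    private boolean finished = false;

    // Reader side: hand over a batch and wake the indexer.
    void put(List<String> batch) {
        lock.lock();
        try {
            pending.addLast(batch);
            changed.signal();
        } finally {
            lock.unlock();
        }
    }

    // Reader side: no more batches will arrive.
    void finish() {
        lock.lock();
        try {
            finished = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Indexer side: blocks without using CPU; returns null once finished and drained.
    List<String> take() throws InterruptedException {
        lock.lock();
        try {
            while (pending.isEmpty() && !finished) {
                changed.await(); // releases the lock until signalled
            }
            return pending.pollFirst();
        } finally {
            lock.unlock();
        }
    }
}
```

In practice, `java.util.concurrent.BlockingQueue` (for example `ArrayBlockingQueue.take()`) already packages this pattern. The explicit version only shows what happens underneath.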

@katta
Author

katta commented Jun 24, 2013

<<ok, thanks for clarification. But this reindexing thread should be
somehow stoppable - isn't that also possible via the river API or at least a
bit easier?>>

You are right, with a river we can control it better. I might also have been
wrong in saying we have to make it run in the background explicitly. I will
give it a shot with a river and see how it goes.


@karussell
Owner

That would be nice!

@shadow000fire

Hey guys, I was just about to code a reindex plugin when I saw this. Personally, I prefer an endpoint to a river for this. I agree with @katta on what rivers are for, and I'm not sure how a river would make it better. An endpoint lets you specify a query on the fly to select which documents to reindex. The same endpoint can be used to pause and resume the background thread, or to return the current status while reindexing is in progress. I see a river as an always-on thing.
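A sketch of the control state such an endpoint could delegate to. It assumes the REST handler itself (not shown) maps "pause", "resume" and "status" requests onto these methods, and that the background worker calls `checkpoint()` after each batch. `ReindexControl` and its method names are hypothetical. Pausing blocks the worker on a `Condition`, so a paused job does not burn CPU.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical pause/resume/status state shared by an endpoint and a worker thread.
class ReindexControl {
    enum State { RUNNING, PAUSED, DONE }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition resumed = lock.newCondition();
    private State state = State.RUNNING;
    private long docsIndexed = 0;

    // Endpoint side: takes effect at the worker's next checkpoint.
    void pause() {
        lock.lock();
        try {
            if (state == State.RUNNING) state = State.PAUSED;
        } finally {
            lock.unlock();
        }
    }

    // Endpoint side: wakes a worker blocked in checkpoint().
    void resume() {
        lock.lock();
        try {
            if (state == State.PAUSED) {
                state = State.RUNNING;
                resumed.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    // Worker side: record progress, then block (without spinning) while paused.
    void checkpoint(int docsInBatch) throws InterruptedException {
        lock.lock();
        try {
            docsIndexed += docsInBatch;
            while (state == State.PAUSED) {
                resumed.await();
            }
        } finally {
            lock.unlock();
        }
    }

    // Worker side: the job has finished; later pause() calls have no effect.
    void markDone() {
        lock.lock();
        try {
            state = State.DONE;
            resumed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // What a status request could return.
    String status() {
        lock.lock();
        try {
            return state + " docsIndexed=" + docsIndexed;
        } finally {
            lock.unlock();
        }
    }
}
```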

Also, capping ingestion at X docs per second might be a good way to control CPU usage.
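A sketch of such a docs-per-second cap, assuming the worker reports each bulk batch to the throttle. Here `Thread.sleep` is used deliberately to pace toward a target rate, not as a guard against busy-waiting. `DocsPerSecondThrottle` is a hypothetical name.

```java
// Hypothetical throttle: after each batch, sleep just long enough that the
// average ingestion rate stays at or below docsPerSecond.
class DocsPerSecondThrottle {
    private final double docsPerSecond;
    private final long startNanos;
    private long docsSoFar = 0;

    DocsPerSecondThrottle(double docsPerSecond) {
        if (docsPerSecond <= 0) {
            throw new IllegalArgumentException("docsPerSecond must be > 0");
        }
        this.docsPerSecond = docsPerSecond;
        this.startNanos = System.nanoTime();
    }

    // Pure helper: milliseconds to wait so that `docs` documents take at
    // least docs / docsPerSecond seconds in total.
    static long delayMillis(long docs, long elapsedMillis, double docsPerSecond) {
        long targetMillis = (long) Math.ceil(docs * 1000.0 / docsPerSecond);
        return Math.max(0, targetMillis - elapsedMillis);
    }

    // Called by the worker after each bulk request.
    void throttle(int batchSize) throws InterruptedException {
        docsSoFar += batchSize;
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
        long wait = delayMillis(docsSoFar, elapsedMillis, docsPerSecond);
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

For example, after 1,000 docs at a cap of 500 docs/s, the job should have taken at least 2,000 ms. If only 1,200 ms have elapsed, `delayMillis` returns 800.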

@karussell
Owner

Thanks! Did you try @katta's solution?

@JoeZ99

JoeZ99 commented Feb 18, 2014

Hello,
is something wrong with this PR, or why isn't it merged?
I also think this is not a "river" thing, but an endpoint one.
I don't know if this is unorthodox, but besides specifying endpoints for checking status, stopping, and resuming, couldn't we add a "callback", i.e. a URL to which the cluster sends a POST request (or something similar) once the task is done?
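A sketch of the callback idea. It assumes the caller passes a URL when starting the reindex, and that the job POSTs a short JSON summary when it finishes. `CompletionCallback`, the payload field names, and the use of `HttpURLConnection` are all illustrative choices, not an existing plugin API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical completion hook: POSTs a JSON summary to a caller-supplied URL.
class CompletionCallback {
    private final URL callbackUrl;

    CompletionCallback(URL callbackUrl) {
        this.callbackUrl = callbackUrl;
    }

    // Minimal hand-built JSON; field names are illustrative only.
    // `state` is expected to be a fixed value such as "DONE" (no escaping done).
    static String payload(String state, long docsIndexed) {
        return "{\"state\":\"" + state + "\",\"docs_indexed\":" + docsIndexed + "}";
    }

    // Sends the POST and returns the HTTP status code from the callback endpoint.
    int send(String state, long docsIndexed) throws IOException {
        byte[] body = payload(state, docsIndexed).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) callbackUrl.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(5_000);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

The background job would call `send("DONE", count)` as its last step, ideally catching `IOException` so that an unreachable callback URL does not mark the reindex itself as failed.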


4 participants