[Feature Request] Multi-threading #7
Comments
I was wondering how to implement this. If anyone has an idea and can point me to the relevant locations in the code, I can write and submit a pull request.
Hi @gprime31, @arthur4ires, I totally agree that the current approach is not good enough. I think what we can do without too much effort and refactoring is to split the work so that each domain runs in a different thread.
I didn't envision working with each domain independently; I only envisioned launching multiple processes with Python's multiprocessing library, starting at line 184, where the loop that iterates through each line of the text begins. My idea is to turn the variable that holds the opened file into a real Python list and make it global. The processes would then remove the values they have already used and add them to a new list, which would be used to return the clean values.
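For illustration, here is a minimal sketch of the per-line multiprocessing idea described above. It is not the project's code: `clean_url` is a hypothetical stand-in for whatever the loop body does, and the sample URLs are made up. One caveat worth noting is that each worker process gets its own copy of module-level globals, so removing items from a global list in one process is not seen by the others. The sketch therefore lets `Pool.map` hand out the lines and collect the results instead.

```python
from multiprocessing import Pool


def clean_url(line):
    """Hypothetical stand-in for the per-line work done in the main loop."""
    return line.strip().lower()


def clean_all(lines, workers=4):
    # Pool.map distributes the lines across worker processes and returns
    # the results in the same order as the input list.
    with Pool(processes=workers) as pool:
        return pool.map(clean_url, lines, chunksize=256)


if __name__ == "__main__":
    # Made-up sample input; a real run would read the lines from the input file.
    urls = [" HTTP://Example.com/a \n", "http://example.com/B\n"]
    print(clean_all(urls, workers=2))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module (Windows, and macOS by default).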
Hi @arthur4ires, I actually started working on that a few days ago, and I'm mostly done; I'm just testing and fixing a few bugs that I found along the way. In my experience, I usually have a long list of URLs that combines multiple domains (and sub-domains, of course), rather than a very long list of URLs that all belong to the same domain. Therefore I chose to multi-process by base URL, and not by splitting the list of URLs into smaller chunks. Maybe the approach I went with isn't the most generic and makes a heavy assumption, but it could be tweaked further in the future if needed. Just some stats so far: tested locally on my MacBook Pro (16 cores). If you'd like to discuss it further with me, you can PM me on Twitter (@2RS3C).
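A rough sketch of what multi-processing by base URL could look like, assuming "base URL" means scheme plus host. The function names and the de-duplication step are hypothetical placeholders, not the actual implementation mentioned above.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor
from urllib.parse import urlsplit


def group_by_base(urls):
    """Bucket URLs by scheme://host so each bucket can be handled on its own."""
    groups = defaultdict(list)
    for url in urls:
        url = url.strip()
        parts = urlsplit(url)
        groups[f"{parts.scheme}://{parts.netloc}"].append(url)
    return dict(groups)


def process_group(urls):
    """Hypothetical placeholder for the per-domain work; here it only de-duplicates."""
    return sorted(set(urls))


def run(urls, workers=None):
    groups = group_by_base(urls)
    # One task per base URL; max_workers=None lets the executor pick a
    # default based on the number of available CPUs.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_group, groups.values())
        return dict(zip(groups.keys(), results))


if __name__ == "__main__":
    # Made-up sample input for demonstration only.
    sample = ["https://a.com/x", "https://b.com/y", "https://a.com/z", "https://a.com/x"]
    print(run(sample, workers=2))
```

With this split, a list dominated by a single domain ends up in one large group and gains little from extra processes, which matches the trade-off described in the comment.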
The results are actually much better with threads. Have you released this version on any branch here on GitHub? If so, I could help with bugs or code. Since you've already started with this approach, I think continuing with it would be more prudent. One thing that interests me is displaying the results on screen with a verbose option, so you can tell roughly how far along the script is. I sent you a PM on Twitter.
Multi-threading would be great, if it's possible, seeing how it only uses one core at the moment.