Reworking the threading (at least in my last experience, the input thread was the bottleneck, not the actual computation)