Always worker #320

nexovec · 2024-04-22T15:27:25Z

nexovec
Apr 22, 2024

Hi, how would you make a worker that always runs in the background exactly once and always retries with minimum delay?

Answered by brandur (Apr 24, 2024); the full answer appears later in the thread.

brandur · 2024-04-23T00:29:27Z

brandur
Apr 23, 2024
Maintainer

Hi, we'll need more information on this one. If you just want a job worked once, just insert it once and it'll work one time before being set as completed and ignored thereafter.

If you mean "exactly once" in the distributed systems sense, then it's not possible. All semantics in River are at-least-once [1] because guaranteeing anything stronger is ~impossible without other serious tradeoffs.

See docs here [2] for writing your own custom retry policy. Writing one that retries immediately would look something like:

func (policy *UltraAggressivePolicy) NextAt(job *rivertype.JobRow) time.Time {
    return time.Now()
}

[1] https://riverqueue.com/docs/reliable-workers
[2] https://riverqueue.com/docs/job-retries#client-retry-policy

0 replies

nexovec · 2024-04-23T15:02:19Z

nexovec
Apr 23, 2024
Author

Hi again,
what I want is one single instance of a worker that constantly runs in the background. I don't need a strong guarantee that exactly one runs, as long as it mostly runs. I do need a strong guarantee that there are never multiple instances running at the same time.

Here is my attempt at this using unique jobs. It contains some "helpful" comments, which is why I'm not happy with this solution yet.

type DispatchTestAnswersToJobsJobArgs struct {
	UniqueID int32 `json:"unique_id"` // FIXME: possibly a bug where you can't have an empty args struct for unique jobs
}

func (DispatchTestAnswersToJobsJobArgs) Kind() string {
	return "dispatch_test_answers_to_jobs"
}

func (DispatchTestAnswersToJobsJobArgs) InsertOpts() river.InsertOpts {
	return river.InsertOpts{
		// FIXME: MaxAttempts is actually int16 in the database and this is a very bad bug
		// FIXME: jobsnooze overflows the int16, try putting in 32767
		MaxAttempts: 1_000_000_000, //  NOTE: as many as possible ... this is actually negative in this case because of overflow, this is very confusing
		Priority:    1,
		UniqueOpts: river.UniqueOpts{
			ByPeriod: 99 * 365 * 24 * time.Hour, // for 99 years, yes.
		},
	}
}

type DispatchTestAnswersToJobsJobWorker struct {
	river.WorkerDefaults[DispatchTestAnswersToJobsJobArgs]
}

func (w *DispatchTestAnswersToJobsJobWorker) Work(ctx context.Context, job *river.Job[DispatchTestAnswersToJobsJobArgs]) error {
	slog.Debug("dispatching test answers to jobs")
	return river.JobSnooze(100 * time.Millisecond)
}

func (w DispatchTestAnswersToJobsJobWorker) Timeout(*river.Job[DispatchTestAnswersToJobsJobArgs]) time.Duration {
	return 100 * time.Millisecond
}

Multiple test takers take tests and fill in answers, which then get batched by this job to be passed to another job for verification of the answers. The reason is there is a rate limit on the calls to the verification API, so I need throughput control and batching. It needs to avoid redundant calls and desync issues, so there needs to be a single central worker on the job in the snippet.
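
For illustration, the batching step described above could look roughly like the sketch below. The `Answer` type, the `batchAnswers` helper, and the batch size are all hypothetical names chosen for this example; rate limiting and the hand-off to the verification job are left out.

```go
package main

import "fmt"

// Answer is a hypothetical record of one submitted test answer.
type Answer struct {
	TakerID int
	Value   string
}

// batchAnswers splits pending answers into slices of at most size items,
// preserving order. It returns nil when there is nothing to batch.
func batchAnswers(pending []Answer, size int) [][]Answer {
	if size < 1 {
		size = 1 // guard against a zero or negative batch size
	}
	var batches [][]Answer
	for start := 0; start < len(pending); start += size {
		end := start + size
		if end > len(pending) {
			end = len(pending)
		}
		batches = append(batches, pending[start:end])
	}
	return batches
}

func main() {
	pending := []Answer{{1, "a"}, {2, "b"}, {3, "c"}, {4, "d"}, {5, "e"}}
	for i, b := range batchAnswers(pending, 2) {
		fmt.Printf("batch %d: %d answers\n", i, len(b))
	}
}
```

Each batch would then be handed to the verification job one at a time, which is where the single central dispatcher matters.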

I'm happy to help out with the issues highlighted, but am unsure whether that belongs in this discussion.

The obvious thing that prevents me from shipping this code is that JobSnooze overflows the MaxAttempts counter.

0 replies

nexovec · 2024-04-23T15:11:59Z

nexovec
Apr 23, 2024
Author

See docs here [2] for writing your own custom retry policy. Writing one that retries immediately would look something like:
func (policy *UltraAggressivePolicy) NextAt(job *rivertype.JobRow) time.Time {
    return time.Now()
}

I'd like to point out that the method name is NextRetry, not NextAt.
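
With that rename applied, an immediate-retry policy might be sketched as follows. To keep the example compilable without the River module, `JobRow` here is a local stand-in for `rivertype.JobRow`, and `ImmediateRetryPolicy` and `retryDelay` are hypothetical names; against the real library the method would take a `*rivertype.JobRow` as described in the linked retry docs.

```go
package main

import (
	"fmt"
	"time"
)

// JobRow is a minimal local stand-in for rivertype.JobRow so that this
// sketch compiles on its own. Only the field used for illustration is kept.
type JobRow struct {
	Attempt int
}

// ImmediateRetryPolicy is a hypothetical policy that applies no backoff.
type ImmediateRetryPolicy struct{}

// NextRetry returns the current time, so a failed job becomes eligible
// for another attempt right away regardless of how many attempts it has had.
func (p *ImmediateRetryPolicy) NextRetry(job *JobRow) time.Time {
	return time.Now()
}

// retryDelay reports how far in the future the policy schedules the retry.
// A value of zero or less means "eligible immediately".
func retryDelay(p *ImmediateRetryPolicy, job *JobRow) time.Duration {
	return time.Until(p.NextRetry(job))
}

func main() {
	delay := retryDelay(&ImmediateRetryPolicy{}, &JobRow{Attempt: 3})
	fmt.Println("eligible immediately:", delay <= 0)
}
```

A policy like this removes all backoff, so a job that fails persistently will be retried as fast as the queue can pick it up.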

@brandur this is a good improvement for me, but I would like this to be scoped to the worker instead of the client, else I'd have to start managing multiple queues and client types with different policies, which would needlessly complicate my software design.

Are there any drawbacks of using this super aggressive scheduling for all workers?

0 replies

brandur · 2024-04-24T01:14:32Z

brandur
Apr 24, 2024
Maintainer

What you could do is combine periodic jobs [1] with unique opts.

Configure a periodic job that inserts every five minutes or so, and set it to RunOnStart so that it always gets an insert when a River client starts up.

Then, configure unique options by state [2], where JobStateCompleted is omitted so that if the job was ever finished, a new one gets inserted:

[]rivertype.JobState{
    rivertype.JobStateAvailable,
    rivertype.JobStateRunning,
    rivertype.JobStateRetryable,
    rivertype.JobStateScheduled,
}

The client will start up, insert an initial job, and that job will start work. The periodic job enqueuer will try to insert new versions of it every five minutes, but since the job is already running, those inserts will be no-ops because of the unique opts.
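
To make the dedupe behaviour concrete, here is a small standalone simulation of the rule described above. It does not call River; `JobState`, `uniqueStates`, and `shouldInsert` are local stand-ins written for this sketch, and the real check happens inside River at insert time.

```go
package main

import "fmt"

// JobState is a local stand-in for rivertype.JobState.
type JobState string

const (
	StateAvailable JobState = "available"
	StateCompleted JobState = "completed"
	StateDiscarded JobState = "discarded"
	StateRetryable JobState = "retryable"
	StateRunning   JobState = "running"
	StateScheduled JobState = "scheduled"
)

// uniqueStates mirrors the list from the answer: completed (and discarded)
// are deliberately left out so that finished jobs do not block new inserts.
var uniqueStates = []JobState{StateAvailable, StateRunning, StateRetryable, StateScheduled}

// shouldInsert reports whether a periodic insert would create a new job,
// given the states of the existing jobs of the same kind.
func shouldInsert(existing []JobState) bool {
	for _, have := range existing {
		for _, blocking := range uniqueStates {
			if have == blocking {
				return false // a live job already exists, so the insert is a no-op
			}
		}
	}
	return true
}

func main() {
	fmt.Println(shouldInsert(nil))                                        // true
	fmt.Println(shouldInsert([]JobState{StateRunning}))                   // false
	fmt.Println(shouldInsert([]JobState{StateCompleted}))                 // true
	fmt.Println(shouldInsert([]JobState{StateDiscarded, StateRetryable})) // false
}
```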

I assume the Work implementation will have to have a big loop in it if it's going to be running that long. Make sure that the loop respects context cancellation to be able to leave the loop and return, or else the River client won't be able to stop cleanly [3].

[1] https://riverqueue.com/docs/periodic-jobs#basic-usage
[2] https://riverqueue.com/docs/unique-jobs#unique-by-state
[3] https://riverqueue.com/docs/reliable-workers#timeouts-and-contexts
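
A minimal sketch of a long-running loop that respects context cancellation, using only the standard library. The names `runUntilCancelled`, `step`, and `demo`, and the intervals used, are hypothetical; in a real worker the loop would live inside `Work` and use the `ctx` it receives.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runUntilCancelled calls step once per interval until ctx is cancelled,
// then returns ctx.Err(). It returns early if step itself fails.
func runUntilCancelled(ctx context.Context, interval time.Duration, step func(context.Context) error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			// Shutdown or timeout: leave the loop promptly.
			return ctx.Err()
		case <-ticker.C:
			if err := step(ctx); err != nil {
				return err
			}
		}
	}
}

// demo runs the loop under a short deadline and reports how many steps ran
// and whether the loop ended because the deadline was reached.
func demo() (iterations int, stopped bool) {
	ctx, cancel := context.WithTimeout(context.Background(), 55*time.Millisecond)
	defer cancel()

	err := runUntilCancelled(ctx, 10*time.Millisecond, func(context.Context) error {
		iterations++
		return nil
	})
	return iterations, errors.Is(err, context.DeadlineExceeded)
}

func main() {
	iterations, stopped := demo()
	fmt.Println("iterations:", iterations, "stopped by context:", stopped)
}
```

The important part is the `select` on `ctx.Done()`: a plain `for` with `time.Sleep` would keep running after the client asks the worker to stop.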

0 replies

nexovec · 2024-04-24T08:07:52Z

nexovec
Apr 24, 2024
Author

@brandur that's quite clever. The downside is that when the job occasionally fails, I might have to wait up to 5 minutes for a new one to spawn.

Will it spam the jobs table if I put 1 second timeout instead of 5 minutes or so? If I understand correctly, the redundant jobs don't insert by virtue of failing some SQL constraint roughly equivalent to the unique constraint applied to the job worker.
Did you look at the FIXMEs in the worker snippet? They are potential bugs.
The scheduler doesn't respect it when I pass in low values (like 100ms) to JobSnooze, is this intended? I'm pretty sure this could work with some sort of immediate dispatch of the JobSnooze signal to the running clients. Should I avoid using JobSnooze for "infinite loops"? It currently fails due to the overflow mentioned above, and the same likely applies to NextRetry.

0 replies

brandur · 2024-04-25T00:42:31Z

brandur
Apr 25, 2024
Maintainer

Will it spam the jobs table if I put 1 second timeout instead of 5 minutes or so? If I understand correctly, the redundant jobs don't insert by virtue of failing some SQL constraint.

Personally I wouldn't go crazy with that, but it won't be that big of a deal resource-wise.

Did you look at the FIXMEs in the worker snippet? They are potential bugs.

Don't bother changing MaxAttempts. With the periodic job configuration, if max attempts were to ever be reached, the defunct job would transition to a state of discarded, and the periodic job enqueuer would add a new job that would start work (discarded isn't in your set of unique states).

The scheduler doesn't respect it when I pass in low values (like 100ms) to JobSnooze, is this intended? I'm pretty sure this could work with some sort of immediate dispatch of the JobSnooze signal to the running clients. Should I avoid using JobSnooze for "infinite loops"?

Don't do this, if only to avoid overcomplicating things. Use a for loop instead.

1 reply

nexovec Apr 25, 2024
Author

Regardless of whether it's a good decision to use JobSnooze, the MaxAttempts int16 can still overflow. This will cause a production bug someday.

Also, if you pass in something like a billion as MaxAttempts, it's actually negative in the database and the job won't run.
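
The wraparound can be reproduced in plain Go. This sketch assumes the value reaches the 16-bit column through an unchecked int-to-int16 conversion, which is an assumption about the code path rather than something confirmed in this thread; `truncateToInt16` and `clampMaxAttempts` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"math"
)

// truncateToInt16 mimics an unchecked Go conversion from int to int16:
// only the low 16 bits survive, so large values wrap around.
func truncateToInt16(n int) int16 {
	return int16(n)
}

// clampMaxAttempts is a hypothetical guard that keeps a requested attempt
// count inside the range a 16-bit signed column can store.
func clampMaxAttempts(n int) int {
	switch {
	case n < 1:
		return 1
	case n > math.MaxInt16:
		return math.MaxInt16
	default:
		return n
	}
}

func main() {
	requested := 1_000_000_000
	fmt.Println(truncateToInt16(requested))  // -13824
	fmt.Println(clampMaxAttempts(requested)) // 32767
}
```

Clamping on the application side avoids the negative value, although it does not help with a counter that keeps growing past 32767.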

nexovec · 2024-04-25T15:48:03Z

nexovec
Apr 25, 2024
Author

I'm still curious as to why you can't snooze for 100ms, but besides that, your responses were massively helpful.
Also thanks for making this awesome library.

1 reply

bgentry Apr 25, 2024
Maintainer

You can. However, there’s a scheduler optimization that puts the job straight back into available if it’s going to be due in the very near future. Even so, it will not run before its scheduled_at time, which is based on whatever snooze duration you’ve chosen.

nexovec · 2024-04-25T15:53:42Z

nexovec
Apr 25, 2024
Author

Why is it not allowed to have an empty jobargs struct?

1 reply

bgentry Apr 25, 2024
Maintainer

Can you provide a code example for the issue you’re facing? You can definitely have an args type that serializes to an empty JSON object; we have many such examples in our tests.
