Description
We are using $sample with a size of 2 when getting captchas. This is causing slow queries on the nodes. We need to change this approach as follows:
- Create an index on { datasetId: 1, solved: 1 }
- Instead of $sample, use a random selection method to improve performance. For example:
  - Add a random field to each document at insertion time.
  - Index this field.
  - Query this field with $gte or $lte against a freshly generated random value to efficiently retrieve random documents.
- Use $limit before $sample
Instead of sampling from the entire dataset, limit the query first:
db.captchas.aggregate([
  { $match: { datasetId: "0xe666b35451f302b9fccfbe783b1de9a6a4420b840abed071931d68a9ccc1c21d", solved: true } },
  { $limit: 1000 }, // Get a subset first
  { $sample: { size: 2 } }, // Then sample from that subset
  { $project: { datasetId: 1, datasetContentId: 1, captchaId: 1, captchaContentId: 1, items: 1, target: 1 } }
]);
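This caps the number of documents that $sample has to read at 1000. Note that the sample is then drawn only from the first 1000 matching documents, not from the whole dataset.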
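
The random-field approach described above could look roughly like the sketch below. It is only an illustration under assumptions: the field name `rand`, the helper names `withRandomKey`, `randomFilters` and `getRandomCaptchas`, and the suggested compound index that includes `rand` are all hypothetical and not part of the current schema or code. The `collection` argument is assumed to be a MongoDB collection handle that supports `find(...).sort(...).limit(...).toArray()`.

```javascript
// Sketch only: `rand` and all helper names are assumptions for illustration.
//
// Indexes this sketch assumes (mongosh):
//   db.captchas.createIndex({ datasetId: 1, solved: 1 })
//   db.captchas.createIndex({ datasetId: 1, solved: 1, rand: 1 })

// Attach a random key in [0, 1) to a document before it is inserted.
function withRandomKey(doc, rng = Math.random) {
  return { ...doc, rand: rng() };
}

// Build the filters for one random pick around `pivot`:
// `primary` looks at or above the pivot, `wrapAround` covers the rest.
function randomFilters(datasetId, solved, pivot) {
  const base = { datasetId, solved };
  return {
    sort: { rand: 1 },
    primary: { ...base, rand: { $gte: pivot } },
    wrapAround: { ...base, rand: { $lt: pivot } },
  };
}

// Fetch up to `size` documents starting at a random pivot, wrapping around
// to the lowest `rand` values if too few documents sit above the pivot.
async function getRandomCaptchas(collection, datasetId, solved, size, rng = Math.random) {
  const { primary, wrapAround, sort } = randomFilters(datasetId, solved, rng());
  const docs = await collection.find(primary).sort(sort).limit(size).toArray();
  if (docs.length < size) {
    const rest = await collection
      .find(wrapAround)
      .sort(sort)
      .limit(size - docs.length)
      .toArray();
    docs.push(...rest);
  }
  return docs;
}
```

One trade-off of this sketch: the documents returned by a single call are neighbours in `rand` order, so they are not chosen independently of each other. If that matters, a separate pivot could be drawn for each document with `limit(1)`, at the cost of one query per document.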