
Conversation

@trinity-1686a
Contributor

Description

When scheduling leaf splits, allow up to N queries to run concurrently so that a single large query doesn't hog all resources on a searcher.

How was this PR tested?

Added a unit test verifying that permits are emitted in the expected order.
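For readers unfamiliar with the approach, here is a minimal sketch of what "up to N queries running concurrently" with round-robin permit emission could look like. All names (`Scheduler`, `PermitRequest`, `next_permit`) are illustrative, not Quickwit's actual `SearchPermitProvider` API:

```rust
use std::collections::VecDeque;

// Hypothetical permit request: a query waiting for leaf-split permits.
struct PermitRequest {
    query_id: u64,
    splits_remaining: usize,
}

struct Scheduler {
    requests: VecDeque<PermitRequest>,
    next_query_to_serve: usize,
    max_concurrent_queries: usize, // the "N" from the PR description
}

impl Scheduler {
    /// Emits the query id of the next leaf split to run, cycling round-robin
    /// over the first `max_concurrent_queries` requests in FIFO order.
    fn next_permit(&mut self) -> Option<u64> {
        if self.requests.is_empty() {
            return None;
        }
        let window = self.max_concurrent_queries.min(self.requests.len());
        if self.next_query_to_serve >= window {
            self.next_query_to_serve = 0; // wrap around the serving window
        }
        let idx = self.next_query_to_serve;
        let query_id = self.requests[idx].query_id;
        self.requests[idx].splits_remaining -= 1;
        if self.requests[idx].splits_remaining == 0 {
            // Query done: drop it; the next queued query slides into its slot.
            self.requests.remove(idx);
        } else {
            self.next_query_to_serve += 1;
        }
        Some(query_id)
    }
}

fn main() {
    let mut sched = Scheduler {
        requests: VecDeque::from(vec![
            PermitRequest { query_id: 1, splits_remaining: 2 },
            PermitRequest { query_id: 2, splits_remaining: 1 },
            PermitRequest { query_id: 3, splits_remaining: 1 },
        ]),
        next_query_to_serve: 0,
        max_concurrent_queries: 2,
    };
    let order: Vec<u64> = std::iter::from_fn(|| sched.next_permit()).collect();
    // Queries 1 and 2 alternate; query 3 only enters the window once 2 is done.
    assert_eq!(order, vec![1, 2, 3, 1]);
}
```

The unit test mentioned above presumably asserts an emission order of this kind.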

@fulmicoton
Collaborator

fulmicoton commented Jan 27, 2026

What metric will be improved by this change and which metric will be worse?

My understanding is that:

  • the average latency will be worsened.
  • the average (real latency / expected latency given the query size) will be improved.

Is this a patch meant to address a current urgent PoC, or is it meant as a long-term solution?
I think we can be a tiny bit more ambitious than this.

I think @rdettai's approach using differentiated service was on the right track, but using a separate pool was too wasteful.
We could use a watered-down version of https://en.wikipedia.org/wiki/Shortest_remaining_time, for instance.

@trinity-1686a
Contributor Author

The average latency should be exactly the same, just moving it from one query to another.
I think the latency for pXX where XX is low isn't impacted much (queries run with no load at all are unimpacted; queries run under low load start a bit earlier but compete with the next request, which overall should cancel out).
Latency for pXX where XX is high should be improved (queries stuck behind a big one are no longer entirely stuck, and may complete before that large query).
Latency for p100 should be worse: a single expensive query no longer has 100% of the searchers' compute dedicated to it, so it takes longer to run.


@rdettai-sk rdettai-sk left a comment


For anyone else reading this thread:

  • This PR is about (partially) reverting #5509
  • The alternative with differentiated services mentioned here is #6010

average latency should be exactly the same

Took me a while to understand @fulmicoton's point as to why this isn't true. I had to come up with an example which I think explains it. Say you have 2 queries with 5 splits each and the provider is initially full:

  • sequentially, you would have to wait 5 time units for query 1 and 10 for query 2, so an average of 7.5
  • in round robin, you would have to wait 9 time units for query 1 and 10 for query 2, so an average of 9.5
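The arithmetic above can be checked with a tiny model (hypothetical: a single worker processing one split per time unit; `completion_times` is an illustrative helper, not Quickwit code):

```rust
// Given the order in which splits are executed (one per time unit), compute
// the time at which each query sees its last split finish.
fn completion_times(order: &[usize], num_queries: usize) -> Vec<usize> {
    let mut done = vec![0; num_queries];
    for (t, &query) in order.iter().enumerate() {
        done[query] = t + 1; // this query's latest split finished at time t + 1
    }
    done
}

fn main() {
    // Two queries (0 and 1) with 5 splits each.
    let sequential = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1];
    let round_robin = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1];
    assert_eq!(completion_times(&sequential, 2), vec![5, 10]); // average 7.5
    assert_eq!(completion_times(&round_robin, 2), vec![9, 10]); // average 9.5
}
```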

I really like the idea of shortest remaining time. My only concern is its behavior when the remaining-time estimate is bad, i.e. when some splits are cached and others are not. But I don't think it would make things worse than the current scheduler.

Comment on lines 245 to 250
let (leaf, leaf_position) = if let Some(leaf_permit_request) =
    self.permits_requests.get_mut(self.next_query_to_serve)
{
    (leaf_permit_request, self.next_query_to_serve)
} else if let Some(leaf_permit_request) = self.permits_requests.front_mut() {
    (leaf_permit_request, 0)
Collaborator


Nit: for easier proofreading, I think we can mutate self.next_query_to_serve instead of returning leaf_position.

@trinity-1686a
Contributor Author

Tbh, I don't like shortest remaining time.
First, it needs good estimates, which is far from trivial for a myriad of reasons. Second, it easily has the potential to starve queries (either big ones, or small but mis-estimated ones).
The second may or may not be something we consider bad. Obviously having a query "never" complete isn't great, but assuming there is a timeout somewhere in front of Quickwit, it's better to not work at all on some queries than to have all queries 90% done when hitting a timeout, and possibly getting hit with a new identical query.

@fulmicoton what did you mean by "watered down version" of SRTF? Did you have a specific idea of what part you wanted to cut out or change? It feels like (assuming a good heuristic, which, again, is hard) SRTF is really simple, without any part you can remove.

@fulmicoton
Collaborator

fulmicoton commented Jan 29, 2026

@trinity-1686a By watered down I did not mean simplified. I meant watering down its properties. My main concern was your second point.

I don't have any great idea yet unfortunately, but to give you an example: your job could have a budget of "letting other jobs go in front of you". After this budget is exhausted, you stop the SRTF behavior and just execute before any other job that arrived after you. I don't think this is a great idea either, though.
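To make the "bypass budget" idea above concrete, here is an illustrative sketch (all names hypothetical, and the author himself is not convinced by the idea): SRTF by default, but a job whose budget is exhausted is served in plain arrival order, ahead of anything that arrived after it.

```rust
// Illustrative job state; not Quickwit code.
#[derive(Debug, Clone)]
struct Job {
    id: u64,
    arrival: u64,          // logical arrival order
    remaining_splits: usize,
    bypass_budget: usize,  // how many times other jobs may still jump ahead
}

// Picks the index of the next job to serve.
fn pick_index(jobs: &[Job]) -> Option<usize> {
    // A job with an exhausted budget outranks everything that arrived after
    // it, so among exhausted jobs the earliest arrival wins.
    let exhausted = jobs
        .iter()
        .enumerate()
        .filter(|(_, job)| job.bypass_budget == 0)
        .min_by_key(|(_, job)| job.arrival)
        .map(|(idx, _)| idx);
    if exhausted.is_some() {
        return exhausted;
    }
    // Otherwise, plain shortest-remaining-time-first.
    jobs.iter()
        .enumerate()
        .min_by_key(|(_, job)| job.remaining_splits)
        .map(|(idx, _)| idx)
}

fn main() {
    let mut jobs = vec![
        Job { id: 1, arrival: 0, remaining_splits: 100, bypass_budget: 1 },
        Job { id: 2, arrival: 1, remaining_splits: 2, bypass_budget: 1 },
    ];
    // SRTF picks job 2 (fewest remaining splits)...
    assert_eq!(pick_index(&jobs), Some(1));
    // ...which consumes one unit of job 1's bypass budget.
    jobs[0].bypass_budget -= 1;
    // Job 1's budget is now exhausted, so it goes next despite its size.
    assert_eq!(pick_index(&jobs), Some(0));
}
```

The caller would be responsible for decrementing the budget of each bypassed job whenever a later arrival is scheduled ahead of it.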

@rdettai-sk
Collaborator

rdettai-sk commented Jan 29, 2026

it's better to not work at all on some queries than to have all queries 90% done when hitting a timeout

The worst collapses of search performance I have seen on QW happened in a similar scenario: you end up with all queries starting right before their timeout, most of them just using some resources before getting cancelled, effectively keeping the searchers in a steady regime of high failure rate. Not to mention that this risk gets higher as you scale the searcher fleet, because having only one searcher enter this regime is enough to compromise all searches cluster-wide. I feel like SRTF would better protect us from this while providing good average efficiency. Starvation is the lesser of two evils for us 😅

@trinity-1686a
Contributor Author

I don't have any great idea yet unfortunately, but to give you an example: your job could have a budget of "letting other jobs go in front of you". After this budget is exhausted, you stop the SRTF behavior and just execute before any other job that arrived after you. I don't think this is a great idea either, though.

That kinda sounds like Highest Response Ratio Next.
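For reference, Highest Response Ratio Next schedules the job with the highest ratio (waiting time + estimated service time) / estimated service time, so waiting inflates priority and starvation is bounded. A one-line sketch:

```rust
// HRRN priority: a job that has just arrived has ratio 1.0; the ratio grows
// as the job waits, and grows faster for short jobs.
fn response_ratio(waiting_time: f64, estimated_service_time: f64) -> f64 {
    (waiting_time + estimated_service_time) / estimated_service_time
}

fn main() {
    assert_eq!(response_ratio(0.0, 4.0), 1.0); // fresh job
    assert_eq!(response_ratio(4.0, 4.0), 2.0); // waited as long as its own cost
    // A short job waiting the same 4 units overtakes it:
    assert_eq!(response_ratio(4.0, 1.0), 5.0);
}
```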

I'm worried SRTF might have some bad behaviours in some cases, but FIFO definitely has bad behaviours in cases that matter to us. My N-first RR will have the same issue as FIFO given enough big queries in parallel. @rdettai-sk's 2-FIFO is likely very sensitive to its configuration (and isn't work-conserving, but we could fix that).

So here is what I propose: we start with SRTF (heuristic = number of splits at the leaf level). If it ends up working well, we can leave it that way. If it has issues, we make the SearchPermitProvider support multiple schedulers through configuration, so we can more easily test multiple ideas and eventually pick a winner. wdyt?
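An SRTF scheduler with "remaining split count" as the heuristic can be sketched with a min-heap; this is an illustrative sketch of the proposal above, not Quickwit's actual implementation:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Heap entries are (remaining_splits, query_id), wrapped in Reverse so the
// query with the fewest splits still pending pops first.
fn next_query(queue: &mut BinaryHeap<Reverse<(usize, u64)>>) -> Option<u64> {
    let Reverse((remaining, query_id)) = queue.pop()?;
    if remaining > 1 {
        // One split scheduled; re-insert with the decremented estimate.
        queue.push(Reverse((remaining - 1, query_id)));
    }
    Some(query_id)
}

fn main() {
    let mut queue = BinaryHeap::new();
    queue.push(Reverse((5, 1u64))); // big query: 5 splits
    queue.push(Reverse((2, 2u64))); // small query: 2 splits
    let order: Vec<u64> = std::iter::from_fn(|| next_query(&mut queue)).collect();
    // The small query drains first, then the big one.
    assert_eq!(order, vec![2, 2, 1, 1, 1, 1, 1]);
}
```

Note that this exhibits exactly the starvation behavior discussed above: a big query only runs once no smaller query remains.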

@trinity-1686a trinity-1686a force-pushed the trinity.pointard/concurrent-queries branch from ef75690 to 1dc9229 Compare January 30, 2026 10:09
@trinity-1686a trinity-1686a force-pushed the trinity.pointard/concurrent-queries branch from 1dc9229 to 17535cc Compare January 30, 2026 10:09