Fix GH-18956: PHP-FPM Process Count Inconsistencies#19191
Fix GH-18956: PHP-FPM Process Count Inconsistencies#19191bukka wants to merge 1 commit intophp:PHP-8.3from
Conversation
411f84b to
6b7a76a
Compare
|
This needs more testing - I will go only for WST test as PHP test is too tricky for this and would not most likely work in the pipeline anyway. |
This fixes incorrect decrement of the active number of processes. This was done in accepting stage before incrementing which is problematic if there are already some running processes as it decrements their number first and result in incorrect total (lower than the actual number). In addition it also fixes phpGH-14212 as accept is done just once for keepalive connection so in this case the number of active connection just increasing with each request. Closes phpGH-19191
|
So I have been testing and debugging this. At the end the best way to see what's going on was adding some extra logs which I have in this debug branch: master...bukka:php-src:fpm_gh18956_request_active_decrement_debug . If it's done without keepalive, a single request call looks like following: But with keepalive ( The problem is that there are two on_read calls because keepalive means that FPM (or more fastcgi.c) does second loop but if nginx gets single non keepalive http request, the second read is just 0 so FPM closes connection. In this case it might seem logical to reduce active in fpm_request_finished (on_close hook) except it wouldn't work for non keepalive. Unfortunately there is currently no way to differentiate it in fpm_request_finished. Also on_close can be called in couple of other cases so it would need additional checking. I will need to think about it more and see if there is some clean solution. Alternatively there might be some hacky solution to do that but it will require more checking as well. I will get this sorted - it will just require a bit of time to get right and properly test it. |
|
Would it be possible to explicitly mark keepalive requests and reflect them in the statistics? For keepalive connections, subsequent reuse should only increment the reuse count without affecting the active/idle process state. Based on this idea, I created a demonstration that minimizes changes to the existing API. ref |
6b7a76a to
3586d5e
Compare
This fixes extra increments on keep alive that happen for follow up request in headers reading stage. It is because the accepting stage is only done on the first request. It adds a new flag to the on_read fastcgi callback that sets whether the kept alive request is being done and then skips the further active increments. It also fixes issue with incorrect decrement of the active number of processes. This was done in accepting stage even on the first which might still be problematic if there are already some running processes as it decrements their number first and result in incorrect total (lower than the actual number). The fix is to skip this decrement of active processes on the first accept. Closes phpGH-19191
3586d5e to
a6f3a2b
Compare
|
@Appla I was checking your changes and not sure if it would work unless I'm misunderstanding it. The thing is that I just got an idea to pass that info from fastcgi (extra parameter for on_read and on_accept) and skip the increment for follow up requests and decrement for the first accept. That should hopefully balance it better but will need to properly test it. |
The |
Distinguishing keep‑alive connections is important, as it also relates to the I tried to fix the timing issue with minimal changes (ref), but an overall reconsideration would likely be better. |
This fixes extra increments on keep alive that happen for follow up request in headers reading stage. It is because the accepting stage is only done on the first request. It adds a new flag to the on_read fastcgi callback that sets whether the kept alive request is being done and then skips the further active increments.
It also fixes issue with incorrect decrement of the active number of processes. This was done in accepting stage even on the first accept which might have been be problematic if there were already some running processes as it decremented their number first and resulted in incorrect total (lower than the actual number). The fix is to skip this decrement of active processes on the first accept.