@kpumuk kpumuk commented Dec 21, 2025

Caution

This pull request introduces linger settings aligned with the C++ library, but on further investigation this approach does not appear to be safe. For example, THRIFT-3888 describes abnormal connection termination from the client side.

Lingering is implemented the same way the C++ library handles it. For the client, lingering can be configured:

socket = Thrift::Socket.new('localhost', port)
# Disable lingering
socket.linger(false, 0)
# Enable lingering with timeout 10 seconds
socket.linger(true, 10)
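
For context, here is a minimal sketch of what a linger setting amounts to on a plain Ruby socket, using only the standard `socket` library. The helper name `apply_linger` is hypothetical and is not the `Thrift::Socket` API proposed here; it only shows the underlying `SO_LINGER` option being set and read back.

```ruby
require 'socket'

# Hypothetical helper (not part of Thrift): sets SO_LINGER on a plain Ruby
# socket and returns the [onoff, seconds] pair the kernel reports back.
def apply_linger(sock, enabled, seconds)
  sock.setsockopt(Socket::Option.linger(enabled, seconds))
  sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_LINGER).linger
end

# Port 0 asks the OS for any free port, so the sketch runs standalone.
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])

disabled = apply_linger(client, false, 0) # same idea as socket.linger(false, 0)
enabled  = apply_linger(client, true, 10) # same idea as socket.linger(true, 10)

puts "linger off: #{disabled.inspect}"
puts "linger on:  #{enabled.inspect}"

client.close
server.close
```

The exact values read back can differ between platforms, so the sketch prints them rather than assuming a fixed result.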

Benchmark

Before:

$ THRIFT_NUM_CALLS=1 THRIFT_NUM_CLIENTS=1000 THRIFT_SERVER=Thrift::ThreadPoolServer ruby benchmark/benchmark.rb; netstat -n | wc -l
Starting server...
Spawning benchmark processes...
Collecting output...
Translating output...
Analyzing output...

Server class:        Thrift::ThreadPoolServer
Server interpreter:  ruby
Client interpreter:  ruby
Socket class:        Thrift::Socket
Number of processes: 40
Clients per process: 1000
Calls per client:    1
Using fastthread:    no

Connection failures:               0
Connection errors:                 0
Average time per call:             0.0080 seconds
Average time per client (1 calls): 0.0082 seconds
Total time for all calls:          319.6949 seconds
Real time for benchmarking:        9.1044 seconds
Shortest call time:                0.0001 seconds
Longest call time:                 0.0229 seconds
Shortest client time (1 calls):    0.0003 seconds
Longest client time (1 calls):     0.2283 seconds

====> 13714 open sockets afterwards in TIME_WAIT status

After:

$ THRIFT_NUM_CALLS=1 THRIFT_NUM_CLIENTS=1000 THRIFT_SERVER=Thrift::ThreadPoolServer ruby benchmark/benchmark.rb; netstat -n | wc -l
Starting server...
Spawning benchmark processes...
Collecting output...
Translating output...
Analyzing output...

Server class:        Thrift::ThreadPoolServer
Server interpreter:  ruby
Client interpreter:  ruby
Socket class:        Thrift::Socket
Number of processes: 40
Clients per process: 1000
Calls per client:    1
Using fastthread:    no

Connection failures:               0
Connection errors:                 0
Average time per call:             0.0083 seconds
Average time per client (1 calls): 0.0084 seconds
Total time for all calls:          330.7904 seconds
Real time for benchmarking:        9.2939 seconds
Shortest call time:                0.0001 seconds
Longest call time:                 0.0245 seconds
Shortest client time (1 calls):    0.0003 seconds
Longest client time (1 calls):     0.2390 seconds

====> 4 open sockets afterwards, ~85 in the process
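
Note that `netstat -n | wc -l` counts every line of output, including headers and sockets in other states. As a rough sketch of a narrower count, the hypothetical helper below counts only lines that carry a `TIME_WAIT` state. The sample text is made up for illustration and is not output from this benchmark.

```ruby
# Hypothetical helper: counts sockets in TIME_WAIT in `netstat -n`-style text.
def count_time_wait(netstat_output)
  netstat_output.each_line.count { |line| line.split.include?('TIME_WAIT') }
end

# Made-up sample in the usual netstat column layout (illustration only).
sample = <<~NETSTAT
  Proto Recv-Q Send-Q Local Address    Foreign Address  State
  tcp        0      0 127.0.0.1:9090   127.0.0.1:50001  TIME_WAIT
  tcp        0      0 127.0.0.1:9090   127.0.0.1:50002  ESTABLISHED
  tcp        0      0 127.0.0.1:9090   127.0.0.1:50003  TIME_WAIT
NETSTAT

time_wait_count = count_time_wait(sample)
puts time_wait_count

# Against a live system, assuming netstat is installed:
#   count_time_wait(`netstat -n`)
```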
  • Did you create an Apache Jira ticket? THRIFT-5914
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

kpumuk commented Dec 22, 2025

Found this ticket, and now I am questioning both this and the C++ implementation: https://issues.apache.org/jira/browse/THRIFT-3888

Java does not set lingering on the server socket, but instead disables it on the client, and it is not configurable: https://github.com/apache/thrift/blob/master/lib/java/src/main/java/org/apache/thrift/transport/TSocket.java#L63

Going to switch this to draft and perform more analysis...

@kpumuk kpumuk marked this pull request as draft December 22, 2025 01:39
kpumuk commented Dec 22, 2025

Stress-testing with so_linger(1, 0) on a benchmark of 40,000 separate calls across multiple processes, closing and re-opening the socket for each call, leads to actual data loss on oneway RPCs: 0 to 4 calls out of 40,000 are lost.
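
A likely mechanism, sketched below with plain Ruby sockets rather than the Thrift benchmark: with linger enabled and a timeout of 0, `close` becomes an abortive close, so the kernel may discard unsent data and send RST instead of FIN. The helper `abortive_close_outcome` is hypothetical, and what the receiving side observes depends on timing and platform, so the sketch reports the outcome instead of assuming one.

```ruby
require 'socket'

# Hypothetical helper (not part of Thrift): sends one payload, closes the
# client with SO_LINGER {on, 0}, and reports what the server side observed.
def abortive_close_outcome(payload = 'oneway-call')
  server = TCPServer.new('127.0.0.1', 0) # port 0: OS picks a free port
  client = TCPSocket.new('127.0.0.1', server.addr[1])
  accepted = server.accept

  client.setsockopt(Socket::Option.linger(true, 0))
  client.write(payload)
  client.close # abortive close: the kernel may send RST instead of FIN

  sleep 0.1 # give the loopback a moment to deliver the data and/or the RST
  begin
    data = accepted.read # reads until EOF, or raises if the peer reset
    data == payload ? :delivered : :truncated
  rescue SystemCallError
    # Typically Errno::ECONNRESET: the receiver cannot tell whether the
    # whole message arrived before the connection was torn down.
    :reset
  ensure
    accepted.close
    server.close
  end
end

outcome = abortive_close_outcome
puts "server observed: #{outcome}"
```

A oneway call has no reply for the client to wait on, so nothing stops the client from closing while the request is still in flight, which is consistent with the small, non-deterministic loss seen above.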

kpumuk commented Dec 22, 2025

Current state in other libraries. Libraries omitted from the table do not configure lingering, so it stays off by default:

Language Server Client Notes
c_glib Proposed, and then deleted here THRIFT-2414
cpp [0, 0] [1, 0] Issues with terminating connections THRIFT-3888, in WSL THRIFT-5374
d [0, 0]
delphi [0, 0] [1, 0]
java [0, 0]
netstd Intentionally not used, see THRIFT-904
