Problem
When TokenAwarePolicy wraps RackAwareRoundRobinPolicy, LWT queries may be sent to the wrong replica (not the Paxos leader), adding an unnecessary network hop and increasing latency.
Root Cause
In TokenAwarePolicy.make_query_plan() (cassandra/policies.py:496-529), LWT queries correctly skip replica shuffling (lines 517-518). However, the replicas are still passed through yield_in_order(), which buckets them by distance:
```python
def yield_in_order(hosts):
    for distance in [HostDistance.LOCAL_RACK, HostDistance.LOCAL, HostDistance.REMOTE]:
        for replica in hosts:
            if replica.is_up and child.distance(replica) == distance:
                yield replica
```
With RackAwareRoundRobinPolicy, replicas in the same rack as the client get LOCAL_RACK distance, while replicas in other racks of the local DC get LOCAL. As a result, the Paxos leader (the first natural replica in token-ring order) is demoted behind every same-rack replica whenever it sits in a different rack from the client.
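Because each pass keeps the input order, yield_in_order() behaves like a stable sort on distance: a replica is moved behind every replica with a closer distance, no matter where it sits in the ring. The standalone sketch below checks that equivalence. HostDistance and Host are minimal stubs (not the driver's classes, and the enum values are illustrative), the per-host `distance` field stands in for child.distance(host), and stable_sort_by_distance is a hypothetical helper used only for the comparison.

```python
from dataclasses import dataclass
from enum import IntEnum


class HostDistance(IntEnum):
    # Stub of the driver's distance levels. The numeric values are chosen
    # for this sketch only (smaller = closer) and may differ from the driver.
    LOCAL_RACK = 0
    LOCAL = 1
    REMOTE = 2
    IGNORED = 3


@dataclass(frozen=True)
class Host:
    name: str
    distance: HostDistance  # stands in for child.distance(host)
    is_up: bool = True


def yield_in_order(hosts):
    # Same loop shape as the driver code quoted above: one pass per
    # distance level, keeping the input (ring) order inside each level.
    for distance in [HostDistance.LOCAL_RACK, HostDistance.LOCAL, HostDistance.REMOTE]:
        for replica in hosts:
            if replica.is_up and replica.distance == distance:
                yield replica


def stable_sort_by_distance(hosts):
    # Equivalent formulation: drop down/ignored hosts, then stable-sort by
    # distance. sorted() is stable, so ties keep their ring order.
    live = [h for h in hosts if h.is_up and h.distance != HostDistance.IGNORED]
    return sorted(live, key=lambda h: h.distance)


# Replicas listed in token-ring order; "a" is the first natural replica.
ring = [
    Host("a", HostDistance.LOCAL),
    Host("b", HostDistance.LOCAL_RACK),
    Host("c", HostDistance.LOCAL),
    Host("d", HostDistance.REMOTE),
    Host("e", HostDistance.LOCAL, is_up=False),
]

plan = [h.name for h in yield_in_order(ring)]
print(plan)  # ['b', 'a', 'c', 'd']
```

The ring-first replica "a" ends up second because a single LOCAL_RACK replica exists elsewhere in the list.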
Example
3 replicas in DC1, client in rack1:
- Replica 1 (Paxos leader, first in ring order) → rack2 → distance LOCAL
- Replica 2 → rack1 → distance LOCAL_RACK
- Replica 3 → rack2 → distance LOCAL
Result: yield_in_order yields Replica 2 first (same rack), then Replica 1 (Paxos leader), then Replica 3. The query goes to Replica 2, which must forward the Paxos proposal to Replica 1, adding an extra network hop.
Note: With DCAwareRoundRobinPolicy, all local DC replicas get LOCAL distance, so ring order is preserved and this bug does not manifest.
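The example and the note can be reproduced without a cluster. In this sketch, HostDistance and Host are stubs, and rack_aware_distance / dc_aware_distance are simplified, hypothetical stand-ins for the child policies' distance() methods (they ignore details such as remote-DC host limits). Both plans are built from the same ring-ordered replica list.

```python
from dataclasses import dataclass
from enum import Enum


class HostDistance(Enum):
    # Stub of the driver's distance levels (not the real enum).
    LOCAL_RACK = "LOCAL_RACK"
    LOCAL = "LOCAL"
    REMOTE = "REMOTE"
    IGNORED = "IGNORED"


@dataclass(frozen=True)
class Host:
    name: str
    dc: str
    rack: str
    is_up: bool = True


CLIENT_DC, CLIENT_RACK = "dc1", "rack1"


def rack_aware_distance(host):
    # Simplified stand-in for RackAwareRoundRobinPolicy.distance():
    # same DC and rack -> LOCAL_RACK, same DC -> LOCAL, otherwise REMOTE.
    if host.dc == CLIENT_DC:
        return HostDistance.LOCAL_RACK if host.rack == CLIENT_RACK else HostDistance.LOCAL
    return HostDistance.REMOTE


def dc_aware_distance(host):
    # Simplified stand-in for DCAwareRoundRobinPolicy.distance().
    return HostDistance.LOCAL if host.dc == CLIENT_DC else HostDistance.REMOTE


def yield_in_order(hosts, distance_fn):
    # Bucketing loop from the issue, with the child policy passed in.
    for distance in [HostDistance.LOCAL_RACK, HostDistance.LOCAL, HostDistance.REMOTE]:
        for replica in hosts:
            if replica.is_up and distance_fn(replica) == distance:
                yield replica


# Replicas in token-ring order; r1 is the first natural replica (Paxos leader).
replicas = [
    Host("r1", "dc1", "rack2"),
    Host("r2", "dc1", "rack1"),
    Host("r3", "dc1", "rack2"),
]

rack_aware_plan = [h.name for h in yield_in_order(replicas, rack_aware_distance)]
dc_aware_plan = [h.name for h in yield_in_order(replicas, dc_aware_distance)]
print(rack_aware_plan)  # ['r2', 'r1', 'r3']
print(dc_aware_plan)    # ['r1', 'r2', 'r3']
```

Under the DC-aware stand-in every replica lands in the LOCAL bucket, so the ring order, and with it r1 in first position, survives.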
Impact
- Extra network hop per LWT query when the Paxos leader is in a different rack
- Increased Paxos latency and potential contention
- Only affects users of RackAwareRoundRobinPolicy (or any child policy that distinguishes LOCAL_RACK from LOCAL)
Proposed Fix
For LWT queries, bypass yield_in_order and yield replicas in their natural token-ring order (filtering only down/ignored hosts):
```python
if query.is_lwt():
    # Keep natural token-ring order; skip only down or ignored hosts.
    for replica in replicas:
        if replica.is_up and child.distance(replica) != HostDistance.IGNORED:
            yield replica
else:
    yield from yield_in_order(replicas)
```
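One caveat may be worth checking before adopting the ring-order branch: yield_in_order() currently tries REMOTE replicas last. If the replica list reaching this code can include hosts in other DCs that the child policy rates REMOTE rather than IGNORED (for example, when it is configured to use some remote-DC hosts), pure ring order could put a remote-DC replica first. The sketch below compares the proposed branch with a hypothetical variant that only merges LOCAL_RACK and LOCAL for LWT queries. HostDistance and Host are stubs, lwt_order_ring and lwt_order_local_first are hypothetical helpers rather than driver APIs, and the cross-DC ring layout is an assumed example.

```python
from dataclasses import dataclass
from enum import Enum


class HostDistance(Enum):
    # Stub of the driver's distance levels (not the real enum).
    LOCAL_RACK = "LOCAL_RACK"
    LOCAL = "LOCAL"
    REMOTE = "REMOTE"
    IGNORED = "IGNORED"


@dataclass(frozen=True)
class Host:
    name: str
    distance: HostDistance  # stands in for child.distance(host)
    is_up: bool = True


def lwt_order_ring(replicas):
    # The proposed fix: natural ring order, skipping down/ignored hosts.
    for replica in replicas:
        if replica.is_up and replica.distance != HostDistance.IGNORED:
            yield replica


def lwt_order_local_first(replicas):
    # Hypothetical variant: LOCAL_RACK and LOCAL form one bucket, so the
    # rack no longer reorders local replicas, but REMOTE ones still come last.
    local = (HostDistance.LOCAL_RACK, HostDistance.LOCAL)
    for bucket in (local, (HostDistance.REMOTE,)):
        for replica in replicas:
            if replica.is_up and replica.distance in bucket:
                yield replica


# Assumed ring order across two DCs: a remote-DC replica happens to be first.
replicas = [
    Host("dc2-a", HostDistance.REMOTE),
    Host("dc1-r1", HostDistance.LOCAL),
    Host("dc1-r2", HostDistance.LOCAL_RACK),
    Host("dc1-r3", HostDistance.LOCAL),
]

ring_plan = [h.name for h in lwt_order_ring(replicas)]
local_first_plan = [h.name for h in lwt_order_local_first(replicas)]

# With local-DC replicas only, both orderings agree (pure ring order).
local_only = replicas[1:]
```

When every replica is in the local DC, both functions yield the same ring order, so the variant still removes the rack-based demotion from the example.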
Reference
gocql handles this correctly — for LWT queries, replicas are yielded in natural token-ring order without distance-based reordering.