Fix SimpleStatement.is_lwt(): detect LWT from CQL query string by mykaul · Pull Request #784 · scylladb/python-driver

mykaul · 2026-04-01T09:17:53Z

Summary

SimpleStatement.is_lwt() always returns False (inherited from Statement base class, line 348). This means all LWT-aware optimizations are silently disabled for:

Raw CQL users: session.execute(SimpleStatement("INSERT ... IF NOT EXISTS")) — LWT routing and retry policies never activate
All cqlengine queries: cqlengine wraps every query in SimpleStatement at connection.py:347-349, losing any LWT information

This is particularly impactful because:

PR Fix LWT routing: preserve Paxos leader order in TokenAwarePolicy #782 (Paxos leader routing) only activates when query.is_lwt() returns True
PR Add LWTRetryPolicy: retry CAS timeouts on same host with backoff #783 (LWT retry policy) relies on WriteType.CAS from the server, but the routing optimization from Fix LWT routing: preserve Paxos leader order in TokenAwarePolicy #782 is lost
Any SimpleStatement user gets no LWT-specific behavior at all

Fix

Added regex-based LWT detection to SimpleStatement.is_lwt() that matches CQL patterns:

Pattern	Example
`IF NOT EXISTS`	`INSERT INTO t (a) VALUES (1) IF NOT EXISTS`
`IF EXISTS`	`UPDATE t SET a=1 WHERE k=1 IF EXISTS`
`IF <column>`	`UPDATE t SET a=1 WHERE k=1 IF a = 2`

The regex is case-insensitive and handles multiline queries, extra whitespace, and tabs. The result is cached after the first call to avoid repeated regex matching.

This is a best-effort heuristic — PreparedStatement continues to use the authoritative is_lwt flag from the server. The main known false positive is DDL queries like CREATE TABLE IF NOT EXISTS, which is harmless since DDL doesn't use token-aware routing.

Changes

cassandra/query.py: Added _LWT_PATTERN regex and SimpleStatement.is_lwt() override
tests/unit/test_query.py: Added SimpleStatementIsLwtTest class with 20 tests; updated existing BatchStatement test to use the new detection instead of a workaround subclass

Tests

20 new tests covering:

INSERT/UPDATE/DELETE IF [NOT] EXISTS (various cases)
Conditional updates/deletes (IF col = val, IF col != val, IF col > val)
Multiple conditions (IF a = 1 AND b = 2)
Case insensitivity (lowercase, mixed case)
Non-LWT queries (SELECT, INSERT/UPDATE/DELETE without IF)
DDL with IF NOT EXISTS (documented false positive)
Result caching
Multiline queries, extra whitespace, tab separation

All 26 tests in tests/unit/test_query.py pass. All 82 tests in tests/unit/test_policies.py pass.

SimpleStatement.is_lwt() always returns False (inherited from Statement base class, line 348). This means all LWT-aware optimizations are silently disabled for:

Imo it is working as intended. If someone needs performance (including routing optimizations like the ones for LWT) they should use prepared statements. Drivers don't parse query strings.

The main known false positive is DDL queries like CREATE TABLE IF NOT EXISTS, which is harmless since DDL doesn't use token-aware routing.

DDL is the main use case for non-prepared statements. If you use a statement multiple times, you should prepare it.

This is something we need to agree with - we don't make efforts to improve non-prepared statements performance. I'm OK with that.

Also, a big false positive is a request with a string that contains IF NOT EXISTS, no?

Yes, a known one. I can handle it, if needed.

SimpleStatement.is_lwt() always returned False, which meant LWT-aware routing (TokenAwarePolicy Paxos leader routing) and retry policies never activated for: - Raw CQL queries via session.execute(SimpleStatement(...)) - All cqlengine queries (which wrap everything in SimpleStatement) This adds regex-based LWT detection to SimpleStatement that matches: - INSERT ... IF NOT EXISTS - UPDATE/DELETE ... IF EXISTS - UPDATE/DELETE ... IF <column> <op> <value> (conditional) - UPDATE/DELETE ... IF "column" <op> <value> (quoted identifiers) - BEGIN BATCH ... IF NOT EXISTS ... APPLY BATCH DDL statements (CREATE/ALTER/DROP ... IF [NOT] EXISTS) are correctly excluded by checking that the first word is a DML verb. The result is cached after the first call. This is a best-effort heuristic; PreparedStatement continues to get the authoritative is_lwt flag from the server during PREPARE. Also updates the existing BatchStatement LWT test to use the new SimpleStatement.is_lwt() directly instead of a workaround subclass.

mykaul · 2026-04-01T12:26:01Z

v2 — Rebased on current master (8e6c4d4e7). Changes since v1:

Fixed DDL false positives: Added a DML verb check — is_lwt() now requires the first word of the query to be a DML verb (INSERT, UPDATE, DELETE, or BEGIN) before checking for IF. This eliminates false positives from DDL statements like CREATE TABLE IF NOT EXISTS, DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS, etc.
Fixed quoted identifier false negative: The regex now correctly detects LWT in queries using quoted identifiers, e.g. UPDATE t SET x=1 WHERE id=1 IF "col" = 1.
30 unit tests covering: all LWT forms (INSERT IF NOT EXISTS, UPDATE IF EXISTS, UPDATE IF , DELETE IF EXISTS, DELETE IF , BEGIN BATCH with LWT), non-LWT DML, DDL with IF (6 DDL cases), quoted identifiers, whitespace/tab/multiline variations, caching behavior.

Lorak-mmk · 2026-04-08T07:26:58Z

This is something we need to agree with - we don't make efforts to improve non-prepared statements performance. I'm OK with that.

In my opinion we shouldn't. We will only introduce hacks making the code more complicated, and there will always be cases where the behavior differs from prepared statement. User wants performance, correct routing -> user needs prepared statement.
For simple statement, you are never going to be able to be token-aware anyway, so LWT won't be correctly routed. I don't see the benefit of is_lwt heuristic.

mykaul · 2026-04-08T09:19:05Z

This is something we need to agree with - we don't make efforts to improve non-prepared statements performance. I'm OK with that.

In my opinion we shouldn't. We will only introduce hacks making the code more complicated, and there will always be cases where the behavior differs from prepared statement. User wants performance, correct routing -> user needs prepared statement.
For simple statement, you are never going to be able to be token-aware anyway, so LWT won't be correctly routed. I don't see the benefit of is_lwt heuristic.

Perhaps we should focus on improving cqlengine queries then.

mykaul marked this pull request as draft April 1, 2026 10:41

mykaul force-pushed the feature/simple-statement-is-lwt branch from 00b14ef to e8b3ee6 Compare April 1, 2026 11:21

mykaul force-pushed the feature/simple-statement-is-lwt branch from e8b3ee6 to 09ce230 Compare April 1, 2026 12:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix SimpleStatement.is_lwt(): detect LWT from CQL query string#784

Fix SimpleStatement.is_lwt(): detect LWT from CQL query string#784
mykaul wants to merge 1 commit intoscylladb:masterfrom
mykaul:feature/simple-statement-is-lwt

mykaul commented Apr 1, 2026

Uh oh!

Lorak-mmk commented Apr 1, 2026

Uh oh!

Lorak-mmk commented Apr 1, 2026

Uh oh!

mykaul commented Apr 1, 2026

Uh oh!

mykaul commented Apr 1, 2026

Uh oh!

mykaul commented Apr 1, 2026

Uh oh!

Lorak-mmk commented Apr 8, 2026

Uh oh!

mykaul commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mykaul commented Apr 1, 2026

Summary

Fix

Changes

Tests

Related

Uh oh!

Lorak-mmk commented Apr 1, 2026

Uh oh!

Lorak-mmk commented Apr 1, 2026

Uh oh!

mykaul commented Apr 1, 2026

Uh oh!

mykaul commented Apr 1, 2026

Uh oh!

mykaul commented Apr 1, 2026

Uh oh!

Lorak-mmk commented Apr 8, 2026

Uh oh!

mykaul commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants