Skip to content

Conversation

@mbhaskar
Copy link
Member

Description

This PR adds support for nregion synchronous commit for less than strong consistency to the SDK

Major changes

  • Added isNRegionSynchronousCommitEnabled to DatabaseAccount to get if the feature is enabled on the account. This is then populated on the RequestContext for write requests so that it can be read from ConsistencyWriter.

  • Deserialize GlobalNRegionCommittedGLSN from backend RNTBD response headers

  • Prior to this, we used to issue barrier requests only for global strong account/requests. This PR introduce barrier request calls for write request when N-Region Synchronous commit is enabled for the account and N-Region Committed LSN is returned for less than strong consistency to enforce barrier writes

Testing

  • Added unit tests using mock for global strong and n-region barrier requests for success and failure scenarios
  • Tested manually on a live account that has n-region commit enabled

##Reference
Azure/azure-cosmos-dotnet-v3#5401

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which have an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Adding unit tests
Refactoring and cleanup
# Conflicts:
#	sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/ConsistencyWriter.java
Copilot AI review requested due to automatic review settings January 20, 2026 23:43
@mbhaskar mbhaskar requested review from a team and kirankumarkolli as code owners January 20, 2026 23:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for N-region synchronous commit for less than strong consistency levels in the Azure Cosmos DB Java SDK. The feature introduces barrier write requests similar to global strong consistency, but specifically for N-region commit scenarios when the account has this capability enabled.

Changes:

  • Added deserialization of GlobalNRegionCommittedGLSN header from backend RNTBD responses
  • Added isNRegionSynchronousCommitEnabled property to DatabaseAccount and propagated it through GlobalEndpointManager and request context
  • Refactored ConsistencyWriter barrier request logic to support both global strong writes and N-region synchronous commit scenarios
  • Added comprehensive unit tests for the new barrier request scenarios

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
RntbdConstants.java Added GlobalNRegionCommittedGLSN response header constant with ID 0x0078
RntbdResponseHeaders.java Added deserialization support for GlobalNRegionCommittedGLSN header
WFConstants.java Added GLOBAL_N_REGION_COMMITTED_GLSN backend header constant
StoreResponse.java Added getNumberOfReadRegions() method to retrieve number of read regions from response headers
ConsistencyWriter.java Refactored barrier request logic to support both global strong and N-region commit scenarios with dedicated helper methods
BarrierType.java New enum to distinguish between barrier types (NONE, GLOBAL_STRONG_WRITE, N_REGION_SYNCHRONOUS_COMMIT)
RxDocumentClientImpl.java Set NRegionSynchronousCommitEnabled flag in request context for write operations
RMResources.java Added error message for N-region commit barrier not met scenarios
HttpConstants.java Added sub-status code for N-region commit write barrier failures
GlobalEndpointManager.java Added method to retrieve N-region synchronous commit enabled status from database account
DocumentServiceRequestContext.java Added fields for N-region commit enabled flag and barrier type, renamed globalStrongWriteResponse to cachedWriteResponse
DatabaseAccount.java Added method to check if N-region synchronous commit is enabled on the account
Constants.java Added ENABLE_N_REGION_SYNCHRONOUS_COMMIT property constant
ConsistencyWriterTest.java Added comprehensive unit tests for global strong and N-region commit barrier scenarios
Comments suppressed due to low confidence (1)

sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/DocumentServiceRequestContext.java:170

  • The clone method does not copy the two new fields (nRegionSynchronousCommitEnabled and barrierType) added to DocumentServiceRequestContext. When a context is cloned, these fields will be null/default in the cloned instance, which could lead to incorrect behavior during request processing. Add the following assignments to the clone method:
    context.nRegionSynchronousCommitEnabled = this.nRegionSynchronousCommitEnabled;
    context.barrierType = this.barrierType;
    public DocumentServiceRequestContext clone() {
        DocumentServiceRequestContext context = new DocumentServiceRequestContext();
        context.forceAddressRefresh = this.forceAddressRefresh;
        context.forceRefreshAddressCache = this.forceRefreshAddressCache;
        context.requestChargeTracker = this.requestChargeTracker;
        context.timeoutHelper = this.timeoutHelper;
        context.resolvedCollectionRid = this.resolvedCollectionRid;
        context.sessionToken = this.sessionToken;
        context.quorumSelectedLSN = this.quorumSelectedLSN;
        context.globalCommittedSelectedLSN = this.globalCommittedSelectedLSN;
        context.cachedWriteResponse = this.cachedWriteResponse;
        context.originalRequestConsistencyLevel = this.originalRequestConsistencyLevel;
        context.readConsistencyStrategy = this.readConsistencyStrategy;
        context.resolvedPartitionKeyRange = this.resolvedPartitionKeyRange;
        context.resolvedPartitionKeyRangeForCircuitBreaker = this.resolvedPartitionKeyRangeForCircuitBreaker;
        context.resolvedPartitionKeyRangeForPerPartitionAutomaticFailover = this.resolvedPartitionKeyRangeForPerPartitionAutomaticFailover;
        context.regionIndex = this.regionIndex;
        context.usePreferredLocations = this.usePreferredLocations;
        context.locationIndexToRoute = this.locationIndexToRoute;
        context.regionalRoutingContextToRoute = this.regionalRoutingContextToRoute;
        context.performLocalRefreshOnGoneException = this.performLocalRefreshOnGoneException;
        context.effectivePartitionKey = this.effectivePartitionKey;
        context.performedBackgroundAddressRefresh = this.performedBackgroundAddressRefresh;
        context.cosmosDiagnostics = this.cosmosDiagnostics;
        context.resourcePhysicalAddress = this.resourcePhysicalAddress;
        context.throughputControlRequestContext = this.throughputControlRequestContext;
        context.replicaAddressValidationEnabled = this.replicaAddressValidationEnabled;
        context.endToEndOperationLatencyPolicyConfig = this.endToEndOperationLatencyPolicyConfig;
        context.unavailableRegionsForPartition = this.unavailableRegionsForPartition;
        context.crossRegionAvailabilityContextForRequest = this.crossRegionAvailabilityContextForRequest;
        return context;

private Mono<Boolean> waitForWriteBarrierAsync(
private String getErrorMessageForBarrierRequest(RxDocumentServiceRequest request) {
if (request.requestContext.getBarrierType() == BarrierType.N_REGION_SYNCHRONOUS_COMMIT) {
return String.format("ConsistencyWriter: Write barrier has not been met for n region synchronous commit request. SelectedGlobalCommittedLsn: %s",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log value of n - how many regions have to commit is configurable (should be part of the account metadata)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to keep this consistent with the log message from .net. Not required?

globalCommittedLsn.v = Long.parseLong(headerValue);
}
else if ((headerValue = response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) != null) {
globalCommittedLsn.v = Long.parseLong(headerValue);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we need a separate property in the diagnostic string for N-Region Committed LSN (to represent the LSN) and also account-level N-Region Commit enablement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like (snippet from the diagnostics):

    "consistencyCfg" : "(consistency: Session, nRegionCommitEnabled: true, readConsistencyStrategy: null,  mm: true, prgns: [])",
    "proactiveInitCfg" : "",
    "e2ePolicyCfg" : "",
    "sessionRetryCfg" : ""

https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/directconnectivity/StoreResultDiagnostics.java#L29 - this is the GCLSN serialized as part of the diagnostics so something similar such https://github.com/Azure/azure-cosmos-dotnet-v3/pull/5401/changes#diff-2d305bb945ccb69fdb9eb3404e3d1016302f0bc22a5d32c879513834ed9e9c0fR414 would help.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure it adds clarity, but in current case we can still imply that glsn refers to nregion glsn when n region is enabled on the account

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I can make the change if we feel it is important to have it that way

return globalCommittedGlsnValue != -1;
} catch (NumberFormatException e) {
// Malformed header value: treating as no barrier instead of throwing.
return false;
Copy link
Member

@xinlian12 xinlian12 Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have a warning/error log for this case

if ((headerValue = response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_COMMITTED_LSN)) != null) {
globalCommittedLsn.v = Long.parseLong(headerValue);
}
else if ((headerValue = response.getHeaderValue(WFConstants.BackendHeaders.GLOBAL_N_REGION_COMMITTED_GLSN)) != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does GLOBAL_COMMITTED_LSN and GLOBAL_N_REGION_COMMITTED_GLSN exclusive to each other? like there will always only be 1 populated or > -1?

(for my local account with session consistency with just 1 write region, the GLOBAL_COMMITTED_LSN is populated. so want to understand what it will look like when N-Region commit is enabled)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah they are exclusive. nregion glsn will show up only for non strong consistancies when enabled


try {
if (ReplicatedResourceClient.isGlobalStrongEnabled() && this.isGlobalStrongRequest(request, response)) {
BarrierType barrierType = getBarrierRequestType(request, response);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also refactor the method name? since it will not only be for global strong any more

try {
return Long.parseLong(numberOfReadRegionsString);
} catch (NumberFormatException e) {
// If value cannot be parsed as Long, return -1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warn/error log?

}

// visibility set to package-private for testing
Mono<Boolean> waitForWriteBarrierAsync(
Copy link
Member

@xinlian12 xinlian12 Jan 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For barrier request, should we also check GLOBAL_N_REGION_COMMITTED_GLSN instead? seems like it always validating based on gclsn regardless of the barrier type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For barrier request, the glsn remains the same. I didnt see any change related to this in .NET. Let me double check

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants