Standalone Nexus Operations #685

Open
bergundy wants to merge 44 commits into temporalio:master from bergundy:standalone-nexus-op

Conversation

Member

@bergundy bergundy commented Dec 9, 2025

What changed?

  • Added the full API scope for standalone nexus operations.

Server PR

temporalio/temporal#9869

Comment thread temporal/api/workflowservice/v1/request_response.proto
Comment thread temporal/api/nexus/v1/message.proto Outdated

// The number of attempts made to start/deliver the operation request.
// This number represents a minimum bound since the attempt is incremented after the request completes.
int32 attempt = 9;

Contributor

This is the attempt to deliver the start request. Will we support overall operation retry in the future? Will this name be confusing if we do? Maybe we should call it start_attempt so that people do not confuse it with the activity attempt, which has a different meaning.

Member Author

I want to keep this for consistency with PendingNexusOperationInfo.

Comment thread temporal/api/nexus/v1/message.proto Outdated
string request_id = 19;

// Operation token. Only set for asynchronous operations after a successful StartOperation call.
string operation_token = 20;

Contributor

I thought we said we didn't want to expose this to callers? They should only have one way of referencing their operations: their caller-side operation ID.

Member Author

It's still worth exposing this information as we do for workflow callers.

Contributor

@stephanos stephanos Feb 25, 2026

I have no horse in this race, but I'm curious, why is it useful to have?

Member Author

It's useful for debugging and can be used in the direct Nexus APIs to reattach to the same operation (future capability).

Comment thread temporal/api/workflowservice/v1/request_response.proto
// The run ID of the operation, useful when run_id was not specified in the request.
string run_id = 1;

// Stage to wait for. The operation may be in a more advanced stage when the poll is unblocked.

Contributor

I'm confused, is this the stage the original request sent? Or does it represent the current stage of the operation?

Member Author

The current stage. Let me fix the docstring.

Comment thread temporal/api/workflowservice/v1/service.proto
Comment thread temporal/api/nexus/v1/message.proto Outdated
// Updated on terminal status.
int64 state_transition_count = 10;
// Updated once on scheduled and once on terminal status.
int64 state_size_bytes = 11;

Contributor

Is this intentionally a field only present in list? It was mentioned for standalone activities that everything in list was expected to be in describe.

Also, for standalone activities it was mentioned there would be a tool that would make sure everything in list was also in describe result. Can we prioritize that? It's a lot of effort for me to have to continually confirm our assertion on every PR and find these issues since we chose not to reuse types.

Member Author

Just want to call out that we don't have this guarantee for schedules or batch, which are much older archetypes: https://github.com/temporalio/api/blob/master/temporal/api/schedule/v1/message.proto https://github.com/temporalio/api/blob/master/temporal/api/workflowservice/v1/request_response.proto#L1715-L1751.

I don't think this guarantee needs to be high priority but we should keep track of it because I do think that it is nice to have. Ideally the SDKs would allow the types to have completely different fields, there's no need to reuse the models here.

Contributor

@cretz cretz Dec 9, 2025

Right, but this guarantee/promise was made as part of not reusing models knowing the SDK will need this guarantee. Was not expecting a "nice to have" guarantee when the promise/guarantee was made.

Member Author

Let's take this offline.

Member Author

Discussed offline, we will write a tool soon.
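
The consistency check discussed in this thread could be sketched roughly as follows. This is only an illustration, not the tool that was agreed on: the function name and the two field sets are hypothetical stand-ins written as plain Python sets, and a real tool would presumably read the field names from the compiled proto descriptors instead.

```python
def missing_from_describe(list_fields: set[str], describe_fields: set[str]) -> list[str]:
    """Return the names of fields that appear in the list type but not in the describe type."""
    # Sorted so the report is stable across runs.
    return sorted(list_fields - describe_fields)


# Hypothetical field sets, loosely modeled on fields quoted in this thread.
# They are assumptions for the example, not the actual message definitions.
LIST_INFO_FIELDS = {"operation_id", "run_id", "state_transition_count", "state_size_bytes"}
DESCRIBE_INFO_FIELDS = {"operation_id", "run_id", "state_transition_count", "attempt"}

if __name__ == "__main__":
    for name in missing_from_describe(LIST_INFO_FIELDS, DESCRIBE_INFO_FIELDS):
        print(f"list-only field: {name}")
```

Fields that exist only on the describe side are deliberately not reported, since the guarantee under discussion is one-directional (everything in list should also be in describe).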

// Response to a successful UnpauseWorkflowExecution request.
message UnpauseWorkflowExecutionResponse { }

message StartNexusOperationRequest {

Contributor

Is it intentional we don't have Priority support?

Member Author

Yes, priorities only apply to durable matching queues, those are not used for nexus tasks.

stephanos and others added 9 commits February 23, 2026 12:35
# Conflicts:
#	openapi/openapiv2.json
#	openapi/openapiv3.yaml
#	temporal/api/errordetails/v1/message.proto
#	temporal/api/nexus/v1/message.proto
#	temporal/api/workflowservice/v1/request_response.proto
#	temporal/api/workflowservice/v1/service.proto
string blocked_reason = 7;

// A reason that may be specified in the CancelNexusOperationRequest.
string reason = 8;

Contributor

This was set to 24 and I changed it; I don't see why it should be 24.

Contributor

(I wonder why we don't have a linter for this?)

Member Author

Copy pasta probably. Thanks for catching this.
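
A lint of the kind wondered about above might look something like the sketch below. It is an assumption-laden illustration rather than an existing linter rule: the regex only handles simple one-line field declarations (no `map<>`, `oneof`, or field options), and the function name and sample text are made up for the example. Gaps can also be legitimate, for instance after a field is removed and its number reserved, so the output is better treated as a warning than an error.

```python
import re

# Matches simple field declarations such as `string reason = 8;`.
# Assumption: this is not a full proto parser, just enough for the example.
FIELD_RE = re.compile(
    r"^[ \t]*(?:repeated[ \t]+)?[\w.]+[ \t]+(\w+)[ \t]*=[ \t]*(\d+)[ \t]*;",
    re.MULTILINE,
)


def find_number_jumps(message_body: str) -> list[tuple[str, int, int]]:
    """Return (field_name, previous_number, number) for each non-consecutive step.

    Fields are compared in declaration order; the first field is never flagged.
    """
    jumps = []
    previous = None
    for name, number_text in FIELD_RE.findall(message_body):
        number = int(number_text)
        if previous is not None and number != previous + 1:
            jumps.append((name, previous, number))
        previous = number
    return jumps


# Hypothetical snippet reproducing the situation described in the comment above.
SAMPLE = """
  string blocked_reason = 7;
  // A reason that may be specified in the cancel request.
  string reason = 24;
"""

if __name__ == "__main__":
    for name, prev, num in find_number_jumps(SAMPLE):
        print(f"field '{name}' jumps from {prev} to {num}")
```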


// How long this operation has been running for, including all attempts and backoff between attempts.
// Elapsed time from schedule_time to now for running operations or to close_time for closed
// operations, including all attempts and backoff between attempts.

Contributor

Clarifies it works for running operations (as opposed to NexusOperationListInfo); correct me if that's wrong.

Member Author

LGTM
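
For readers skimming the thread, the rule in the revised docstring can be sketched as a small function. The function and parameter names are hypothetical, and this says nothing about how the server actually computes the value; it only restates the documented definition (schedule time to now while running, schedule time to close time once closed).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def execution_duration(
    schedule_time: datetime,
    close_time: Optional[datetime],
    now: datetime,
) -> timedelta:
    """Elapsed time since scheduling, including all attempts and backoff between them."""
    # A closed operation stops accumulating time at close_time;
    # a running operation is measured up to the current time.
    end = close_time if close_time is not None else now
    return end - schedule_time


if __name__ == "__main__":
    scheduled = datetime(2026, 3, 11, 16, 0, tzinfo=timezone.utc)
    current = scheduled + timedelta(minutes=5)
    print(execution_duration(scheduled, None, current))
```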

stephanos and others added 8 commits March 11, 2026 16:12
Restores the Execution suffix on all Nexus operation types except
Link.NexusOperation in common/v1/message.proto.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Remove state_size_bytes field and renumber execution_duration.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@stephanos stephanos force-pushed the standalone-nexus-op branch from 3b976ca to b314585 Compare March 20, 2026 21:40
// This is the only timeout settable for a Nexus operation.
// (-- api-linter: core::0140::prepositions=disabled
// aip.dev/not-precedent: "to" is used to indicate interval. --)
google.protobuf.Duration schedule_to_close_timeout = 8;

Contributor

What about the other timeouts, like schedule-to-start and start-to-close?

Member Author

Those should be added.

Contributor

Will they be in this PR or later?

Contributor

@stephanos stephanos Apr 21, 2026

I'll add it now: 1d12d58

@stephanos stephanos changed the title from "[DO NOT MERGE] Standalone Nexus Operations" to "Standalone Nexus Operations" Apr 21, 2026
NEXUS_OPERATION_ID_CONFLICT_POLICY_FAIL = 1;
// Don't start a new operation; instead return a handle for the running operation.
NEXUS_OPERATION_ID_CONFLICT_POLICY_USE_EXISTING = 2;
}

Member

We don't support terminate like w/ workflows?

Contributor

I'll add that I don't have context for why it's not there; but it seems like something we can always add later if we want to.

Member Author

We don't support TERMINATE_IF_RUNNING for anything that isn't workflow.
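
To make the two policies quoted above concrete, here is a minimal in-memory sketch. Everything in it is hypothetical (the registry class, the error type, the run ID format); it only models the documented intent that FAIL rejects a start while an operation with the same ID is running and USE_EXISTING returns a handle to the running one. There is intentionally no terminate-style branch, matching the reply above.

```python
from enum import Enum


class ConflictPolicy(Enum):
    FAIL = 1
    USE_EXISTING = 2


class AlreadyStartedError(Exception):
    """Hypothetical error raised when FAIL meets a running operation with the same ID."""


class OperationRegistry:
    """Toy stand-in for server-side state; maps operation_id to the running run_id."""

    def __init__(self) -> None:
        self._running: dict[str, str] = {}
        self._counter = 0

    def start(self, operation_id: str, policy: ConflictPolicy) -> tuple[str, bool]:
        """Return (run_id, started), where started is False if an existing run was reused."""
        existing = self._running.get(operation_id)
        if existing is not None:
            if policy is ConflictPolicy.USE_EXISTING:
                # Don't start a new operation; hand back the running one.
                return existing, False
            raise AlreadyStartedError(operation_id)
        self._counter += 1
        run_id = f"run-{self._counter}"
        self._running[operation_id] = run_id
        return run_id, True
```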

bool include_input = 4;
// Include the outcome (result/failure) in the response if the operation has completed.
bool include_outcome = 5;
// Token from a previous DescribeNexusOperationExecutionResponse. If present, long-poll until operation

Member

Suggested change
// Token from a previous DescribeNexusOperationExecutionResponse. If present, long-poll until operation
// Token from a previous DescribeNexusOperationExecutionResponse. If present, this RPC will long-poll until operation

Took me a minute to understand what this was trying to say w/o this clarification.

Member

Also does this mean in order to long poll on complete you must make a minimum of two RPCs? One to get the token and one to wait on it? If so what's the use case for the long poll token here instead of just using PollNexusOperationExecutionRequest?

Contributor

Polling on DescribeNexusOperationExecution is for the UI. So the UI will describe once to render the page and then long poll so any update can be reflected in the UI.

Contributor

@stephanos stephanos Apr 21, 2026

fwiw, DescribeActivityExecutionRequest follows the same approach.

Also, a notable difference is, like Quinn alludes to, that Describe returns NexusOperationExecutionInfo, which Poll doesn't.
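
The describe-then-long-poll pattern described for the UI can be sketched as below. This is a simulation under stated assumptions: `FakeClient` is a made-up stand-in that replays scripted statuses instead of making RPCs, and the `long_poll_token` key, the status strings, and the `max_polls` guard are all illustrative names, not the actual API.

```python
class FakeClient:
    """Hypothetical client that replays scripted statuses instead of calling a server."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self._index = 0

    def describe(self, operation_id, long_poll_token=None):
        # Without a token the current state is returned immediately.
        # With a token a real server would block until the state changes;
        # here we simply advance to the next scripted status.
        if long_poll_token is not None and self._index < len(self._statuses) - 1:
            self._index += 1
        return {
            "status": self._statuses[self._index],
            "long_poll_token": f"token-{self._index}",
        }


def watch(client, operation_id, terminal=("COMPLETED", "FAILED"), max_polls=100):
    """Describe once to render, then long-poll with the returned token until terminal."""
    response = client.describe(operation_id)
    seen = [response["status"]]
    polls = 0
    while response["status"] not in terminal and polls < max_polls:
        response = client.describe(operation_id, long_poll_token=response["long_poll_token"])
        seen.append(response["status"])
        polls += 1
    return seen
```

The first call carries no token and returns right away, which is what lets a UI render the page before it starts waiting for updates.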

Comment on lines +3153 to +3164
// - OperationId
// - RunId
// - Endpoint
// - Service
// - Operation
// - RequestId
// - StartTime
// - ExecutionTime
// - CloseTime
// - ExecutionStatus
// - ExecutionDuration
// - StateTransitionCount

Member Author

@stephanos can you double check this against the implementation please?

Contributor

@stephanos stephanos Apr 22, 2026

All but RequestId are supported; fixing that up now: temporalio/temporal#10032

Comment thread temporal/api/nexus/v1/message.proto Outdated
Comment on lines +279 to +280
// The number of attempts made to start/deliver the operation request.
// This number represents a minimum bound since the attempt is incremented after the request completes.

Member Author

Suggested change
// The number of attempts made to start/deliver the operation request.
// This number represents a minimum bound since the attempt is incremented after the request completes.
// The number of attempts made to deliver the start operation request.
// This number is approximate, it is incremented when a task is added to the history queue.
// In practice, there could be more attempts if a task is executed but fails to commit, or less attempts if a task was
// never executed.

Member Author

@stephanos please also update this comment for PendingNexusOperationInfo in workflow/v1/message.go.

Contributor

Updated in 734ce9d

6 participants