Skip to content

[Go SDK] Rewrite dot runner to generate DOT from portable pipeline proto#37673

Open
YousufFFFF wants to merge 3 commits intoapache:masterfrom
YousufFFFF:go-dot-skip-composites
Open

[Go SDK] Rewrite dot runner to generate DOT from portable pipeline proto#37673
YousufFFFF wants to merge 3 commits intoapache:masterfrom
YousufFFFF:go-dot-skip-composites

Conversation

@YousufFFFF
Copy link
Copy Markdown

@YousufFFFF YousufFFFF commented Feb 21, 2026

Fixes #27508

This change rewrites the Go SDK dot runner to generate the DOT graph from the portable pipeline proto representation instead of relying on Go SDK internal graph structures.

By basing DOT generation on the portable pipeline model:

• Cross-language pipelines can now be rendered correctly.
• The implementation aligns with the portable runner architecture.
• It enables future reuse in Prism Runner and other portable tooling.

The current implementation focuses on rendering leaf transforms (composites are skipped explicitly), keeping the traversal simple while leaving room for future refinement of composite expansion strategies.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @YousufFFFF, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the Go SDK's DOT graph generation mechanism. Instead of relying on internal Go SDK graph structures, the system now processes the portable pipeline protocol buffer representation to create DOT graphs. This fundamental shift enhances compatibility with cross-language pipelines and aligns the Go SDK with the broader portable runner architecture, paving the way for more integrated and reusable tooling.

Highlights

  • Portable Pipeline Proto Integration: The Go SDK dot runner has been rewritten to generate DOT graphs directly from the portable pipeline protocol buffer representation.
  • Cross-Language Compatibility: This change enables the correct rendering of cross-language pipelines in DOT format.
  • Architectural Alignment: The implementation now aligns with the portable runner architecture, improving consistency and maintainability.
  • Future Reusability: The new approach facilitates future reuse in tools like Prism Runner and other portable tooling.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • sdks/go/pkg/beam/runners/dot/dot.go
    • Replaced the dotlib import with graphx and added the fmt package.
    • Modified the Execute function to marshal pipeline edges into a portable protocol buffer representation using graphx.Marshal.
    • Implemented custom DOT graph generation logic by iterating over portable pipeline components and transforms.
    • Removed the previous reliance on dotlib.Render for DOT graph creation.
    • Added logic to build consumer relationships between PCollections and transforms, correctly handling and skipping composite transforms for accurate graph representation.
Activity
  • No specific activity (comments, reviews, or progress updates) was provided in the context.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@YousufFFFF
Copy link
Copy Markdown
Author

Hi @mohamedawnallah!
CI checks are green and the implementation is complete.
Would appreciate your review when you have time.
Thanks!

@github-actions
Copy link
Copy Markdown
Contributor

Assigning reviewers:

R: @shunping for label go.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@YousufFFFF
Copy link
Copy Markdown
Author

Hi @lostluck !

I’ve pushed updates to align the implementation more closely with the portable pipeline proto.
The DOT generation now operates directly on pipeline.GetComponents() and skips composite transforms explicitly, rather than relying on the previous internal graph utilities.
This keeps the implementation simple for now and leaves room to evolve the composite expansion strategy later based on further discussion.
I’d really appreciate your feedback when you have time.

@YousufFFFF
Copy link
Copy Markdown
Author

R: @lostluck

@github-actions
Copy link
Copy Markdown
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

Copy link
Copy Markdown
Contributor

@mohamedawnallah mohamedawnallah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @YousufFFFF for the PR! Left some comments.

Copy link
Copy Markdown
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all of Mohamed's comments so far.

In particular, the one about simply just rewriting this to use bespoke code instead of the package.

As a first pass, add a few simple test cases that check the returned output. This is just good practice when doing a refactoring to make sure that some things are working at least as good as they were before. This is not a high bar, at least, due to not having any tests in the first place! Even just a 3-4 very basic pipelines as a smoke test would go a long way.

This dot runner is very old, so it is unfortunate that it was authored without tests, since it was actively being used to look at pipeline shapes.


Also, do take a look at the python "render" runner for inspiration, which also creates dot representations.

class PipelineRenderer:

(Admittedly, having done a different python -> go conversion for prism's Fusion handling, the python handling is very Set Theory based rather than graph based, so it can take a moment to grasp what it's doing.)

@YousufFFFF
Copy link
Copy Markdown
Author

YousufFFFF commented Feb 25, 2026

Thank you @lostluck and @mohamedawnallah for the detailed review and guidance.

Based on the feedback, I’ve made the following updates:

• Removed the unused core/util/dot dependency.
• Verified there are no remaining usages in the Go SDK.
• Marked core/util/dot as Deprecated to discourage future use.
• Removed the redundant composite consumer check.
• Added deterministic topological sorting of transforms to ensure stable DOT emission.
• Added a basic deterministic output test for the DOT runner.

I also reviewed the Python render runner for reference and will keep alignment considerations in mind for future improvements while keeping this PR focused.

Appreciate the thoughtful suggestions, they helped improve clarity and maintainability.

@YousufFFFF
Copy link
Copy Markdown
Author

I agree with all of Mohamed's comments so far.

In particular, the one about simply just rewriting this to use bespoke code instead of the package.

As a first pass, add a few simple test cases that check the returned output. This is just good practice when doing a refactoring to make sure that some things are working at least as good as they were before. This is not a high bar, at least, due to not having any tests in the first place! Even just a 3-4 very basic pipelines as a smoke test would go a long way.

This dot runner is very old, so it is unfortunate that it was authored without tests, since it was actively being used to look at pipeline shapes.

Also, do take a look at the python "render" runner for inspiration, which also creates dot representations.

class PipelineRenderer:

(Admittedly, having done a different python -> go conversion for prism's Fusion handling, the python handling is very Set Theory based rather than graph based, so it can take a moment to grasp what it's doing.)

Thanks @lostluck - I’ll take a look at the Python render runner for reference.

That’s a good suggestion, especially since it already handles DOT generation in a more structured way. I’ll review its approach and see if there are any patterns or simplifications we can adopt here while keeping the Go implementation idiomatic.

Appreciate the pointer 👍

@YousufFFFF
Copy link
Copy Markdown
Author

Hey @lostluck
Just a gentle ping on this pr if it may have been buried.

Copy link
Copy Markdown
Contributor

@mohamedawnallah mohamedawnallah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @YousufFFFF for your patience, it is taking shape! Having no further comments.


// Package dot produces DOT graphs from Beam graph representations.
//
// Deprecated:This package is no longer used by the Beam Go SDK.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Deprecated:This package is no longer used by the Beam Go SDK.
// Deprecated: This package is no longer used by the Beam Go SDK.

Comment on lines +18 to +19
// Deprecated:This package is no longer used by the Beam Go SDK.
// It is slated for removal in a future Beam release.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to mentioning this here, perhaps worth adding it in Deprecations section in the release notes of the current unreleased version e.g
https://github.com/apache/beam/blob/master/CHANGES.md#deprecations

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I'd avoid bringing it up until after the more comprehensive refactoring is complete. It's a non-critical package that's not likely to be broadly used, so it doesn't require that attention at this stage.

)

func TestDotRunner_GeneratesDeterministicOutput(t *testing.T) {
ctx := context.Background()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an option of the built-in t.Context(). It is introduced in Go v1.24, and Beam Go SDK supports that version

go 1.25.0

ctx := context.Background()

// Create temporary DOT file
tmpFile, err := os.CreateTemp("", "dot_test_*.dot")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be the built-in t.TempDir() and it is removed automatically when outside the scope of test

@mohamedawnallah
Copy link
Copy Markdown
Contributor

Seeing if Gemini has any comments we can apply here

@mohamedawnallah
Copy link
Copy Markdown
Contributor

/gemini

Copy link
Copy Markdown
Contributor

@mohamedawnallah mohamedawnallah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed submitting that comment from the previous review. Thanks!


if !strings.Contains(content, "->") {
t.Fatalf("dot output contains no edges")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it be something along those lines for more comprehensive checks?

Suggested change
}
// Validate the output is parseable as a basic DOT digraph.
// Checks: header, footer, and that every non-structural line is a valid edge.
lines := strings.Split(strings.TrimSpace(content), "\n")
if lines[0] != "digraph G {" {
t.Fatalf("missing digraph header, got: %s", lines[0])
}
if lines[len(lines)-1] != "}" {
t.Fatalf("missing closing brace, got: %s", lines[len(lines)-1])
}
for _, line := range lines[1 : len(lines)-1] {
line = strings.TrimSpace(line)
if line == "" {
continue
}
if !strings.Contains(line, "->") || !strings.HasSuffix(line, ";") {
t.Fatalf("invalid DOT edge line: %s", line)
}
}

@mohamedawnallah
Copy link
Copy Markdown
Contributor

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request rewrites the Go SDK dot runner to generate the DOT graph from the portable pipeline proto representation, which is a great improvement for supporting cross-language pipelines and aligning with the portable runner architecture. The implementation looks solid, generating the graph by traversing the topologically sorted transforms. I've added a few suggestions to improve robustness and testing.

Comment on lines +68 to +69
transforms := components.GetTransforms()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If pipeline.GetComponents() returns a non-nil Components but with a nil or empty Transforms map, pipelinex.TopologicalSort could panic if pipeline.GetRootTransformIds() returns a non-empty slice. It's safer to handle the case of no transforms explicitly to prevent a potential panic.

	transforms := components.GetTransforms()
	if len(transforms) == 0 {
		buf.WriteString("}\n")
		return nil, os.WriteFile(*dotFile, buf.Bytes(), 0644)
	}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be ignored, as that's not a problem the runner should be dealing with and indicates an issue with the submitting SDK.


// Package dot produces DOT graphs from Beam graph representations.
//
// Deprecated:This package is no longer used by the Beam Go SDK.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

According to Go's style guide for comments, there should be a space after the colon in a Deprecated: notice. This improves readability.

Suggested change
// Deprecated:This package is no longer used by the Beam Go SDK.
// Deprecated: This package is no longer used by the Beam Go SDK.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +46 to +66
// Run with dot runner
_, err = Execute(ctx, p)
if err != nil {
t.Fatalf("Execute failed: %v", err)
}

// Read generated file
data, err := os.ReadFile(tmpFile.Name())
if err != nil {
t.Fatalf("failed to read dot file: %v", err)
}

content := string(data)

if !strings.HasPrefix(content, "digraph G {") {
t.Fatalf("dot output missing header")
}

if !strings.Contains(content, "->") {
t.Fatalf("dot output contains no edges")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test name is TestDotRunner_GeneratesDeterministicOutput, but it doesn't actually verify determinism. It only performs a basic smoke test. To properly test for determinism, you could run the generation twice with identical pipelines and assert that the outputs are identical. This would make the test more aligned with its name and more robust.

	// Run with dot runner
	_, err = Execute(ctx, p)
	if err != nil {
		t.Fatalf("Execute failed: %v", err)
	}

	// Read generated file
	data1, err := os.ReadFile(tmpFile.Name())
	if err != nil {
		t.Fatalf("failed to read dot file: %v", err)
	}

	content1 := string(data1)

	if !strings.HasPrefix(content1, "digraph G {") {
		t.Fatalf("dot output missing header")
	}

	if !strings.Contains(content1, "->") {
		t.Fatalf("dot output contains no edges")
	}

	// Run again on a new identical pipeline to check for determinism.
	p2, s2 := beam.NewPipelineWithRoot()
	col2 := beam.Create(s2, "a", "b", "c")
	passert.Count(s2, col2, "", 3)

	_, err = Execute(ctx, p2)
	if err != nil {
		t.Fatalf("Execute on second pipeline failed: %v", err)
	}
	data2, err := os.ReadFile(tmpFile.Name())
	if err != nil {
		t.Fatalf("failed to read dot file on second run: %v", err)
	}

	if content1 != string(data2) {
		t.Errorf("output is not deterministic. Run 1:\n%s\nRun 2:\n%s", content1, string(data2))
	}

@lostluck
Copy link
Copy Markdown
Contributor

lostluck commented Mar 5, 2026

This hasn't escaped my attention, just how much time I could put in immediately due to travel and other responsibilities. I've scheduled looking at it again tomorrow morning. Thank you for your patience.

Copy link
Copy Markdown
Contributor

@lostluck lostluck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have anything to add to mohamed's and gemini's comments at this stage. Please address them, and I'll be more attentive of this next round so we can merge it in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request][Go SDK]: Rewrite the dot runner in terms of a Portable pipeline.

3 participants