[Go SDK] Rewrite dot runner to generate DOT from portable pipeline proto #37673
YousufFFFF wants to merge 3 commits into apache:master
Summary of Changes
Hello @YousufFFFF, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the Go SDK's DOT graph generation mechanism. Instead of relying on internal Go SDK graph structures, the system now processes the portable pipeline protocol buffer representation to create DOT graphs. This fundamental shift enhances compatibility with cross-language pipelines and aligns the Go SDK with the broader portable runner architecture, paving the way for more integrated and reusable tooling.
Hi @mohamedawnallah!

Assigning reviewers: R: @shunping for label go.
The PR bot will only process comments in the main thread (not review comments).

Hi @lostluck! I’ve pushed updates to align the implementation more closely with the portable pipeline proto.

R: @lostluck

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control.
Thanks @YousufFFFF for the PR! Left some comments.
lostluck left a comment
I agree with all of Mohamed's comments so far.
In particular, the one about simply rewriting this to use bespoke code instead of the package.
As a first pass, add a few simple test cases that check the returned output. This is just good practice when refactoring, to make sure things work at least as well as they did before. That is not a high bar here, since there were no tests in the first place! Even just 3-4 very basic pipelines as a smoke test would go a long way.
This dot runner is very old, so it is unfortunate that it was authored without tests, since it was actively being used to look at pipeline shapes.
Also, do take a look at the Python "render" runner for inspiration, which also creates dot representations.
(Admittedly, having done a different Python -> Go conversion for Prism's Fusion handling, the Python handling is very Set Theory based rather than graph based, so it can take a moment to grasp what it's doing.)
…s, add deterministic ordering and test
…ecated and removed an unreachable block of test code
Thank you @lostluck and @mohamedawnallah for the detailed review and guidance. Based on the feedback, I’ve made the following updates:
• Removed the unused core/util/dot dependency.
I also reviewed the Python render runner for reference and will keep alignment considerations in mind for future improvements while keeping this PR focused. Appreciate the thoughtful suggestions, they helped improve clarity and maintainability.

Thanks @lostluck - I’ll take a look at the Python render runner for reference. That’s a good suggestion, especially since it already handles DOT generation in a more structured way. I’ll review its approach and see if there are any patterns or simplifications we can adopt here while keeping the Go implementation idiomatic. Appreciate the pointer 👍

Hey @lostluck
mohamedawnallah left a comment
Thanks @YousufFFFF for your patience, it is taking shape! I have no further comments.
// Package dot produces DOT graphs from Beam graph representations.
//
// Deprecated:This package is no longer used by the Beam Go SDK.
Suggested change:
- // Deprecated:This package is no longer used by the Beam Go SDK.
+ // Deprecated: This package is no longer used by the Beam Go SDK.
// Deprecated:This package is no longer used by the Beam Go SDK.
// It is slated for removal in a future Beam release.
In addition to mentioning this here, it may be worth adding it to the Deprecations section of the release notes for the current unreleased version, e.g.
https://github.com/apache/beam/blob/master/CHANGES.md#deprecations
In this case, I'd avoid bringing it up until after the more comprehensive refactoring is complete. It's a non-critical package that's not likely to be broadly used, so it doesn't require that attention at this stage.
)

func TestDotRunner_GeneratesDeterministicOutput(t *testing.T) {
	ctx := context.Background()
The built-in t.Context() is an option here. It was introduced in Go 1.24, which the Beam Go SDK supports.
Line 23 in 3197d88
	ctx := context.Background()

	// Create temporary DOT file
	tmpFile, err := os.CreateTemp("", "dot_test_*.dot")
This could use the built-in t.TempDir(); the directory it creates is removed automatically when the test completes.
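A minimal sketch of how the two suggestions above, t.Context() and t.TempDir(), might look together. The names writeGraph and hasDigraphHeader are hypothetical stand-ins for the runner's Execute call and the test's header check; they are not part of the Beam SDK, and how the real test points the runner at its output file is not shown here. t.Context() needs Go 1.24 or newer.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// writeGraph is a hypothetical stand-in for the dot runner: it writes an
// empty digraph to path unless ctx has already been canceled.
func writeGraph(ctx context.Context, path string) error {
	if err := ctx.Err(); err != nil {
		return err
	}
	return os.WriteFile(path, []byte("digraph G {\n}\n"), 0o644)
}

// hasDigraphHeader reports whether content starts with the DOT header.
func hasDigraphHeader(content string) bool {
	return strings.HasPrefix(content, "digraph G {")
}

// TestWriteGraph shows the test-scoped helpers; in a real package it would
// live in a _test.go file.
func TestWriteGraph(t *testing.T) {
	// Canceled just before the test's cleanup functions run (Go 1.24+).
	ctx := t.Context()
	// The directory is removed automatically when the test completes.
	path := filepath.Join(t.TempDir(), "pipeline.dot")

	if err := writeGraph(ctx, path); err != nil {
		t.Fatalf("writeGraph failed: %v", err)
	}
	data, err := os.ReadFile(path)
	if err != nil {
		t.Fatalf("failed to read dot file: %v", err)
	}
	if !hasDigraphHeader(string(data)) {
		t.Fatalf("dot output missing header")
	}
}

// main runs the same helper outside a test, where cleanup is manual.
func main() {
	dir, err := os.MkdirTemp("", "dot_example_")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "pipeline.dot")
	if err := writeGraph(context.Background(), path); err != nil {
		panic(err)
	}
	fmt.Println("wrote", filepath.Base(path))
}
```

The contrast between TestWriteGraph and main is the point: inside a test, neither the context nor the temporary directory needs explicit teardown.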
Seeing if Gemini has any comments we can apply here

/gemini

	if !strings.Contains(content, "->") {
		t.Fatalf("dot output contains no edges")
	}
Could it be something along these lines, for more comprehensive checks?
	}
	// Validate the output is parseable as a basic DOT digraph.
	// Checks: header, footer, and that every non-structural line is a valid edge.
	lines := strings.Split(strings.TrimSpace(content), "\n")
	if lines[0] != "digraph G {" {
		t.Fatalf("missing digraph header, got: %s", lines[0])
	}
	if lines[len(lines)-1] != "}" {
		t.Fatalf("missing closing brace, got: %s", lines[len(lines)-1])
	}
	for _, line := range lines[1 : len(lines)-1] {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		if !strings.Contains(line, "->") || !strings.HasSuffix(line, ";") {
			t.Fatalf("invalid DOT edge line: %s", line)
		}
	}
/gemini review
Code Review
This pull request rewrites the Go SDK dot runner to generate the DOT graph from the portable pipeline proto representation, which is a great improvement for supporting cross-language pipelines and aligning with the portable runner architecture. The implementation looks solid, generating the graph by traversing the topologically sorted transforms. I've added a few suggestions to improve robustness and testing.
	transforms := components.GetTransforms()
If pipeline.GetComponents() returns a non-nil Components but with a nil or empty Transforms map, pipelinex.TopologicalSort could panic if pipeline.GetRootTransformIds() returns a non-empty slice. It's safer to handle the case of no transforms explicitly to prevent a potential panic.
	transforms := components.GetTransforms()
	if len(transforms) == 0 {
		buf.WriteString("}\n")
		return nil, os.WriteFile(*dotFile, buf.Bytes(), 0644)
	}
This can be ignored, as that's not a problem the runner should be dealing with and indicates an issue with the submitting SDK.
	// Run with dot runner
	_, err = Execute(ctx, p)
	if err != nil {
		t.Fatalf("Execute failed: %v", err)
	}

	// Read generated file
	data, err := os.ReadFile(tmpFile.Name())
	if err != nil {
		t.Fatalf("failed to read dot file: %v", err)
	}

	content := string(data)

	if !strings.HasPrefix(content, "digraph G {") {
		t.Fatalf("dot output missing header")
	}

	if !strings.Contains(content, "->") {
		t.Fatalf("dot output contains no edges")
	}
The test name is TestDotRunner_GeneratesDeterministicOutput, but it doesn't actually verify determinism. It only performs a basic smoke test. To properly test for determinism, you could run the generation twice with identical pipelines and assert that the outputs are identical. This would make the test more aligned with its name and more robust.
	// Run with dot runner
	_, err = Execute(ctx, p)
	if err != nil {
		t.Fatalf("Execute failed: %v", err)
	}
	// Read generated file
	data1, err := os.ReadFile(tmpFile.Name())
	if err != nil {
		t.Fatalf("failed to read dot file: %v", err)
	}
	content1 := string(data1)
	if !strings.HasPrefix(content1, "digraph G {") {
		t.Fatalf("dot output missing header")
	}
	if !strings.Contains(content1, "->") {
		t.Fatalf("dot output contains no edges")
	}
	// Run again on a new identical pipeline to check for determinism.
	p2, s2 := beam.NewPipelineWithRoot()
	col2 := beam.Create(s2, "a", "b", "c")
	passert.Count(s2, col2, "", 3)
	_, err = Execute(ctx, p2)
	if err != nil {
		t.Fatalf("Execute on second pipeline failed: %v", err)
	}
	data2, err := os.ReadFile(tmpFile.Name())
	if err != nil {
		t.Fatalf("failed to read dot file on second run: %v", err)
	}
	if content1 != string(data2) {
		t.Errorf("output is not deterministic. Run 1:\n%s\nRun 2:\n%s", content1, string(data2))
	}
This hasn't escaped my attention, just how much time I could put in immediately due to travel and other responsibilities. I've scheduled looking at it again tomorrow morning. Thank you for your patience.
lostluck left a comment
I don't have anything to add to Mohamed's and Gemini's comments at this stage. Please address them, and I'll be more attentive in the next round so we can merge it in.
Fixes #27508
This change rewrites the Go SDK dot runner to generate the DOT graph from the portable pipeline proto representation instead of relying on Go SDK internal graph structures.
By basing DOT generation on the portable pipeline model:
• Cross-language pipelines can now be rendered correctly.
• The implementation aligns with the portable runner architecture.
• It enables future reuse in Prism Runner and other portable tooling.
The current implementation focuses on rendering leaf transforms (composites are skipped explicitly), keeping the traversal simple while leaving room for future refinement of composite expansion strategies.
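To illustrate the leaf-only traversal described above, here is a rough sketch, not the PR's code. The transform struct is a simplified, hypothetical stand-in for a PTransform entry in the portable pipeline proto, and toDOT is an invented name. The sketch also sorts the edge lines lexicographically to get stable output, which is an assumption made for brevity; the review thread indicates the actual change walks topologically sorted transforms.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// transform is a simplified, hypothetical stand-in for a PTransform in the
// portable pipeline proto; the real message carries more fields.
type transform struct {
	UniqueName    string
	Subtransforms []string          // non-empty for composite transforms
	Inputs        map[string]string // local tag -> PCollection ID
	Outputs       map[string]string // local tag -> PCollection ID
}

// toDOT renders one edge per (producer, consumer) pair of leaf transforms
// that share a PCollection. Composites are skipped explicitly.
func toDOT(transforms map[string]transform) string {
	// Map each PCollection ID to the leaf transform that produces it.
	producers := map[string]string{}
	for _, t := range transforms {
		if len(t.Subtransforms) > 0 {
			continue // composite: skip
		}
		for _, pcol := range t.Outputs {
			producers[pcol] = t.UniqueName
		}
	}

	var edges []string
	for _, t := range transforms {
		if len(t.Subtransforms) > 0 {
			continue // composite: skip
		}
		for _, pcol := range t.Inputs {
			if from, ok := producers[pcol]; ok {
				edges = append(edges, fmt.Sprintf("  %q -> %q;", from, t.UniqueName))
			}
		}
	}
	// Map iteration order is random in Go, so sort for stable output.
	sort.Strings(edges)

	var b strings.Builder
	b.WriteString("digraph G {\n")
	for _, e := range edges {
		b.WriteString(e + "\n")
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	transforms := map[string]transform{
		"t1": {UniqueName: "Create", Outputs: map[string]string{"out": "pc1"}},
		"t2": {UniqueName: "Count", Inputs: map[string]string{"in": "pc1"}},
		"c1": {
			UniqueName:    "Composite",
			Subtransforms: []string{"t1", "t2"},
			Outputs:       map[string]string{"out": "pc1"},
		},
	}
	fmt.Print(toDOT(transforms))
}
```

Skipping composites also avoids ambiguity in this sketch: a composite typically lists the same output PCollections as its inner leaf, so only the leaf is recorded as the producer.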