fix: collapse consecutive dashes in NormalizeTag#17
Conversation
"V. Gabriel" normalized to "v--gabriel" because the period converts to a dash (step 5) and the space also converts to a dash (step 4), leaving two consecutive dashes. Add a final step to collapse any run of dashes into a single dash so the output is always "v-gabriel".
📝 WalkthroughWalkthroughExtended the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
tagfilter_test.go (1)
67-71: Add boundary cases for dash-collapse behavior.This regression test is good, but I recommend adding explicit cases for already-consecutive dashes (e.g.,
"a---b") and mixed punctuation runs to harden coverage of-{2,}behavior.Proposed test additions
{ name: "consecutive dashes collapsed", input: "V. Gabriel", want: "v-gabriel", }, + { + name: "three dashes collapsed to one", + input: "a---b", + want: "a-b", + }, + { + name: "mixed punctuation generates one dash", + input: "A.. B", + want: "a-b", + },As per coding guidelines "
**/*.go: Write tests for all new code - comprehensive coverage required".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tagfilter_test.go` around lines 67 - 71, Add explicit boundary test cases in tagfilter_test.go around the existing table-driven case "consecutive dashes collapsed": include inputs that already contain consecutive dashes (e.g., "a---b") and inputs with mixed punctuation runs (e.g., "a-.-b" or "a--!@-b") to ensure the collapse logic for -{2,} is exercised; update the test table (the slice of test structs used by the existing test function) by inserting new entries with clear names (e.g., "already-consecutive dashes" and "mixed-punctuation runs") and their expected outputs (normalized single dash sequences) so the test function that iterates over the table validates these boundary behaviors.tagfilter.go (1)
33-34: Prefer rationale-focused comments over step-by-step behavior comments.Line 33-34 and Line 55-56 still explain what the code does. Consider rewriting them to capture why this normalization exists (e.g., preventing downstream tokenization/mapping errors).
As per coding guidelines "Do not write comments that restate what the code does - comments should explain why, not what".
Also applies to: 55-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tagfilter.go` around lines 33 - 34, Replace the step-by-step comments that start with "periods→dashes, remove special chars (except colon, dash, and comma), and collapse consecutive dashes into one." and the related later comment (around the tag normalization logic) with a rationale-focused comment that explains why this normalization is required (e.g., to prevent downstream tokenization/mapping errors, ensure consistent tag keys across inputs, and avoid collisions from special characters), referencing the tag normalization block or function name used there so future readers know the intent rather than the mechanics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tagfilter_test.go`:
- Around line 67-71: Add explicit boundary test cases in tagfilter_test.go
around the existing table-driven case "consecutive dashes collapsed": include
inputs that already contain consecutive dashes (e.g., "a---b") and inputs with
mixed punctuation runs (e.g., "a-.-b" or "a--!@-b") to ensure the collapse logic
for -{2,} is exercised; update the test table (the slice of test structs used by
the existing test function) by inserting new entries with clear names (e.g.,
"already-consecutive dashes" and "mixed-punctuation runs") and their expected
outputs (normalized single dash sequences) so the test function that iterates
over the table validates these boundary behaviors.
In `@tagfilter.go`:
- Around line 33-34: Replace the step-by-step comments that start with
"periods→dashes, remove special chars (except colon, dash, and comma), and
collapse consecutive dashes into one." and the related later comment (around the
tag normalization logic) with a rationale-focused comment that explains why this
normalization is required (e.g., to prevent downstream tokenization/mapping
errors, ensure consistent tag keys across inputs, and avoid collisions from
special characters), referencing the tag normalization block or function name
used there so future readers know the intent rather than the mechanics.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bf2cceb3-8674-4a8a-b246-1aae3523a23f
📒 Files selected for processing (2)
tagfilter.gotagfilter_test.go
Summary
NormalizeTagconverts periods to dashes (step 5) and spaces to dashes (step 4), so an input like"V. Gabriel"produces"v--gabriel"(two consecutive dashes)"v-gabriel"This was discovered during smoke testing of TOSEC filename parsing: publisher names with initials like
"V. Gabriel"were producingrev:1instead ofpublisher:v-gabrielbecauseparseCommaSeparatedTagssplit on-and matched the lone"v"token against therev:1mapping.Summary by CodeRabbit