Skip to content

fix: collapse consecutive dashes in NormalizeTag#17

Merged
wizzomafizzo merged 1 commit into
mainfrom
fix/normalize-consecutive-dashes
Apr 20, 2026
Merged

fix: collapse consecutive dashes in NormalizeTag#17
wizzomafizzo merged 1 commit into
mainfrom
fix/normalize-consecutive-dashes

Conversation

@wizzomafizzo
Copy link
Copy Markdown
Member

@wizzomafizzo wizzomafizzo commented Apr 20, 2026

Summary

  • NormalizeTag converts periods to dashes (step 5) and spaces to dashes (step 4), so an input like "V. Gabriel" produces "v--gabriel" (two consecutive dashes)
  • Add a final step that collapses any run of two or more dashes into a single dash, so the result is "v-gabriel"

This was discovered during smoke testing of TOSEC filename parsing: publisher names with initials like "V. Gabriel" were producing rev:1 instead of publisher:v-gabriel because parseCommaSeparatedTags split on - and matched the lone "v" token against the rev:1 mapping.

Summary by CodeRabbit

  • Bug Fixes
    • Improved tag normalization to collapse multiple consecutive dashes into a single dash for more consistent tag formatting.

"V. Gabriel" normalized to "v--gabriel" because the period converts to
a dash (step 5) and the space also converts to a dash (step 4), leaving
two consecutive dashes. Add a final step to collapse any run of dashes
into a single dash so the output is always "v-gabriel".
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

📝 Walkthrough

Walkthrough

Extended the NormalizeTag function to collapse runs of consecutive dashes into single dashes after special-character removal by introducing a new regex pattern. Updated variable declarations and corresponding test coverage to verify the normalization behavior.

Changes

Cohort / File(s) Summary
Tag Normalization Logic
tagfilter.go
Added regex pattern reConsecutiveDash to identify and collapse 2+ consecutive dashes into single dashes; reformatted variable declarations; updated function documentation to reflect new normalization step.
Tag Normalization Tests
tagfilter_test.go
Added test case to TestNormalizeTag verifying that input "V. Gabriel" normalizes to "v-gabriel" with consecutive dashes collapsed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 When dashes clash and dashes dash,
We collapse them with a swift regex flash!
Two become one, oh what a sight,
Tags now normalized just right.
The hoppy fix hops merrily along! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: adding dash-collapsing functionality to the NormalizeTag function. It is specific, clear, and directly reflects the primary objective of the pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/normalize-consecutive-dashes

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tagfilter_test.go (1)

67-71: Add boundary cases for dash-collapse behavior.

This regression test is good, but I recommend adding explicit cases for already-consecutive dashes (e.g., "a---b") and mixed punctuation runs to harden coverage of -{2,} behavior.

Proposed test additions
 		{
 			name:  "consecutive dashes collapsed",
 			input: "V. Gabriel",
 			want:  "v-gabriel",
 		},
+		{
+			name:  "three dashes collapsed to one",
+			input: "a---b",
+			want:  "a-b",
+		},
+		{
+			name:  "mixed punctuation generates one dash",
+			input: "A..  B",
+			want:  "a-b",
+		},

As per coding guidelines "**/*.go: Write tests for all new code - comprehensive coverage required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tagfilter_test.go` around lines 67 - 71, Add explicit boundary test cases in
tagfilter_test.go around the existing table-driven case "consecutive dashes
collapsed": include inputs that already contain consecutive dashes (e.g.,
"a---b") and inputs with mixed punctuation runs (e.g., "a-.-b" or "a--!@-b") to
ensure the collapse logic for -{2,} is exercised; update the test table (the
slice of test structs used by the existing test function) by inserting new
entries with clear names (e.g., "already-consecutive dashes" and
"mixed-punctuation runs") and their expected outputs (normalized single dash
sequences) so the test function that iterates over the table validates these
boundary behaviors.
tagfilter.go (1)

33-34: Prefer rationale-focused comments over step-by-step behavior comments.

Line 33-34 and Line 55-56 still explain what the code does. Consider rewriting them to capture why this normalization exists (e.g., preventing downstream tokenization/mapping errors).

As per coding guidelines "Do not write comments that restate what the code does - comments should explain why, not what".

Also applies to: 55-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tagfilter.go` around lines 33 - 34, Replace the step-by-step comments that
start with "periods→dashes, remove special chars (except colon, dash, and
comma), and collapse consecutive dashes into one." and the related later comment
(around the tag normalization logic) with a rationale-focused comment that
explains why this normalization is required (e.g., to prevent downstream
tokenization/mapping errors, ensure consistent tag keys across inputs, and avoid
collisions from special characters), referencing the tag normalization block or
function name used there so future readers know the intent rather than the
mechanics.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tagfilter_test.go`:
- Around line 67-71: Add explicit boundary test cases in tagfilter_test.go
around the existing table-driven case "consecutive dashes collapsed": include
inputs that already contain consecutive dashes (e.g., "a---b") and inputs with
mixed punctuation runs (e.g., "a-.-b" or "a--!@-b") to ensure the collapse logic
for -{2,} is exercised; update the test table (the slice of test structs used by
the existing test function) by inserting new entries with clear names (e.g.,
"already-consecutive dashes" and "mixed-punctuation runs") and their expected
outputs (normalized single dash sequences) so the test function that iterates
over the table validates these boundary behaviors.

In `@tagfilter.go`:
- Around line 33-34: Replace the step-by-step comments that start with
"periods→dashes, remove special chars (except colon, dash, and comma), and
collapse consecutive dashes into one." and the related later comment (around the
tag normalization logic) with a rationale-focused comment that explains why this
normalization is required (e.g., to prevent downstream tokenization/mapping
errors, ensure consistent tag keys across inputs, and avoid collisions from
special characters), referencing the tag normalization block or function name
used there so future readers know the intent rather than the mechanics.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf2cceb3-8674-4a8a-b246-1aae3523a23f

📥 Commits

Reviewing files that changed from the base of the PR and between bc87bb7 and 8c5cf31.

📒 Files selected for processing (2)
  • tagfilter.go
  • tagfilter_test.go

@wizzomafizzo wizzomafizzo merged commit 7e01051 into main Apr 20, 2026
12 checks passed
@wizzomafizzo wizzomafizzo deleted the fix/normalize-consecutive-dashes branch April 20, 2026 03:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant