fix: left-angle-bracket scenario (#733) #780

Merged
willkg merged 2 commits into main from 733-left-angle-bracket on Mar 17, 2026

@willkg willkg commented Mar 17, 2026

This handles the case where there's a < followed by one or more characters, so that it looks like the beginning of a start tag, then a space, and then something that actually is a start tag. Something like:

<tag <b>text</b>

This fixes it by identifying the situation and then shoving everything after the space back into the character stream to get parsed again.
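
To illustrate the idea, here is a minimal, self-contained sketch of "split at the first space and push the rest back". It is not the code from this PR: `PushbackStream`, `unget`, `read_all`, and `split_false_tag` are hypothetical names made up for the example, and the real tokenizer's stream API may look different.

```python
from collections import deque


class PushbackStream:
    """Hypothetical character stream that supports pushing text back."""

    def __init__(self, text):
        self._chars = deque(text)

    def unget(self, text):
        # Put text back at the front so it is the next thing read.
        # extendleft() reverses its input, so reverse it first.
        self._chars.extendleft(reversed(text))

    def read_all(self):
        # Drain and return everything left in the stream.
        out = "".join(self._chars)
        self._chars.clear()
        return out


def split_false_tag(data):
    """Split '<tag <b>...' style data into (literal_text, text_to_reparse).

    Returns (data, "") when the pattern does not apply.
    """
    if " " not in data:
        return data, ""
    head, rest = data.split(" ", 1)
    if rest.startswith("<"):
        # Keep the space with the literal text; reparse from the real tag.
        return head + " ", rest
    return data, ""


# Usage: treat "<tag " as text and hand "<b>text</b>" back to the stream.
stream = PushbackStream("")
literal, rest = split_false_tag("<tag <b>text</b>")
stream.unget(rest)
```

In this sketch the false start (`<tag ` plus the space) would be emitted as plain characters, while the pushed-back remainder gets tokenized again from the real `<b>`.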

Fixes #733.

@willkg willkg force-pushed the 733-left-angle-bracket branch from d83105f to 5b4fb68 Compare March 17, 2026 20:24
and " " in token["data"]
):
# token["data"] may contain something that looks like the
# beginning of a tag, but isn't followed by an actual tag.
willkg (Member, Author) commented:

This isn't clear. Need to make this clearer.

# if so, we want to reparse the tag, so we shove it back
# into the stream
head, rest = token["data"].split(" ", 1)
if rest.startswith("<"):
willkg (Member, Author) commented:

This won't handle the case <foo <b>text</b>.

@willkg willkg force-pushed the 733-left-angle-bracket branch from 5b4fb68 to 5cb5c9f Compare March 17, 2026 20:33
@willkg willkg force-pushed the 733-left-angle-bracket branch 2 times, most recently from b70b9c0 to 29231f1 Compare March 17, 2026 20:57
This handles the case where there's a `<` followed by one or more
characters, so that it looks like the beginning of a start tag, then a
space, and then something that actually is a start tag. Something like:

```
<tag <b>text</b>
```

This fixes it by identifying the situation and then shoving everything
after the space back into the character stream to get parsed again.
@willkg willkg force-pushed the 733-left-angle-bracket branch from 29231f1 to 0c2616d Compare March 17, 2026 21:05
last_error_token["data"]
in (
"invalid-character-in-attribute-name",
"invalid-character-after-attribute-name",
willkg (Member, Author) commented:

This handles both the <foo <b>text</b> case (an invalid-character-in-attribute-name error) and the <foo <bar <b>text</b> case (both invalid-character-in-attribute-name and invalid-character-after-attribute-name errors).
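
As a rough sketch of how the two conditions visible in the diff fragments could combine into one check: the function name `looks_like_false_start_tag`, the constant `REPARSE_ERROR_CODES`, and the assumption that both tokens are dicts with a string `"data"` key (with `None` meaning no parse error was seen) are illustrative guesses, not taken from the merged code.

```python
# Error codes named in the diff above; either one suggests that a "<" was
# seen inside something being parsed as a tag.
REPARSE_ERROR_CODES = (
    "invalid-character-in-attribute-name",
    "invalid-character-after-attribute-name",
)


def looks_like_false_start_tag(last_error_token, token):
    """Return True when the token text is a candidate for split-and-reparse.

    Assumes both arguments are dicts with a string "data" key, and that
    last_error_token is None when no parse error has been recorded.
    """
    if last_error_token is None:
        return False
    return (
        last_error_token["data"] in REPARSE_ERROR_CODES
        and " " in token["data"]
    )
```

Under these assumptions, checking membership in a tuple of error codes rather than comparing against a single code is what lets one branch cover both the `<foo <b>text</b>` and `<foo <bar <b>text</b>` inputs.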

@willkg willkg merged commit cd47b4c into main Mar 17, 2026
36 checks passed

Development

Successfully merging this pull request may close these issues.

bug: Open left angle bracket followed immediately by an alpha character causes next tag to be sanitized
