
Conversation


@ChenZiHong-Gavin ChenZiHong-Gavin commented Jan 13, 2026

This PR made the following changes to vllm_wrapper:

  1. Introduced Repetition Penalty: a repetition_penalty option is now passed to the sampling parameters to curb repetitive output (see the sketch after this list).
  2. Enhanced Prompt Formatting: prompts are now built with self.tokenizer.apply_chat_template.
  3. Increased Default Timeout: The default timeout value in the vllm_wrapper constructor has been increased from 300 to 600 seconds, allowing more time for generation requests to complete.
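
A minimal sketch of how the repetition penalty is expected to reach vLLM's sampler (only the 1.05 default comes from this PR; the helper name and the call sites below are illustrative, not the actual diff):

    from vllm import SamplingParams

    def build_sampling_params(extra: dict) -> SamplingParams:
        # Values above 1.0 penalise tokens that already appear in the output,
        # which is what keeps the model from looping on the same phrase.
        return SamplingParams(repetition_penalty=extra.get("repetition_penalty", 1.05))

    build_sampling_params({})                           # repetition_penalty = 1.05 (default)
    build_sampling_params({"repetition_penalty": 1.2})  # caller-supplied override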

@github-actions github-actions bot added the core label Jan 13, 2026
@gemini-code-assist gemini-code-assist bot commented

Summary of Changes

Hello @ChenZiHong-Gavin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily enhances the vllm_wrapper by integrating a repetition_penalty parameter into the sampling process, which helps in generating more diverse and less repetitive output from the language model. Additionally, it includes a substantial refactor of the prompt building mechanism to use the model's tokenizer for more accurate and standardized chat template application. These changes aim to improve the quality and control over the model's text generation capabilities.

Highlights

  • Repetition Penalty: The repetition_penalty parameter has been introduced to the SamplingParams in the generate_answer method, allowing for better control over generated text to prevent repetitive phrases. It defaults to 1.05 if not provided.
  • Prompt Formatting Refactor: The _build_inputs method has been significantly refactored to utilize the tokenizer's apply_chat_template for constructing prompts. This change ensures more consistent and robust prompt formatting, especially for chat-based models, by leveraging the model's native tokenizer capabilities.
  • Parameter Validation Adjustment: The validation logic for temperature and top_p in SamplingParams has been updated to allow a value of 0 (i.e., >= 0 instead of > 0), which is common for deterministic generation settings.
  • Minor Code Refactoring: The assignment of result_generator has been moved inside the try blocks in both generate_answer and generate_topk_per_token methods for improved error handling and scope management (this and the relaxed validation above are sketched after this list).
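
Taken together, the last two highlights give the method roughly the following shape (an illustrative sketch of a wrapper method, not the actual diff; the request-id handling, timeout wrapping, exception translation, and the asyncio/uuid/SamplingParams imports are assumptions):

    async def generate_answer(self, text, history=None, **extra):
        kwargs = {"repetition_penalty": extra.get("repetition_penalty", 1.05)}
        temperature = extra.get("temperature")
        if temperature is not None and temperature >= 0:  # was "> 0"; 0 now allowed for greedy decoding
            kwargs["temperature"] = temperature
        top_p = extra.get("top_p")
        if top_p is not None and top_p >= 0:              # likewise relaxed from "> 0" to ">= 0"
            kwargs["top_p"] = top_p
        sampling_params = SamplingParams(**kwargs)

        try:
            # Creating the generator inside the try means its failures share the same handler.
            result_generator = self.engine.generate(
                self._build_inputs(text, history), sampling_params, str(uuid.uuid4())
            )
            return await asyncio.wait_for(
                self._consume_generator(result_generator), timeout=self.timeout
            )
        except asyncio.TimeoutError as exc:
            raise RuntimeError(f"vLLM generation timed out after {self.timeout}s") from exc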


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces repetition_penalty to the sampling parameters and refactors the _build_inputs method to use tokenizer chat templates. While adding the new parameter is a good enhancement, the refactoring of _build_inputs introduces a critical issue: it changes the expected type for the history parameter, which breaks the contract with the BaseLLMWrapper base class and will cause runtime errors. Additionally, the new implementation modifies the history list in-place, which is an undesirable side effect. I've also noted a minor issue with an unused import. My review includes suggestions to address these problems.

I am having trouble creating individual review comments, so my feedback is included inline below.

graphgen/models/llm/local/vllm_wrapper.py (48-56)

critical

This refactoring has several issues:

  1. Critical Bug (Contract Violation): The history parameter's type is now Optional[List[dict]], which is incompatible with the Optional[List[str]] defined in the BaseLLMWrapper base class methods. This violates the Liskov Substitution Principle and will cause runtime errors. The method signatures must be consistent across the class hierarchy.
  2. Side Effect: The input history list is modified in-place. A copy should be used to avoid unexpected behavior for the caller.
  3. Vague Type Hint: The return type should be str instead of Any for better type safety.
    def _build_inputs(self, prompt: str, history: Optional[List[dict]] = None) -> str:
        # Copy so the caller's history list is never mutated by the append below.
        messages = (history or []).copy()
        messages.append({"role": "user", "content": prompt})

        return self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,              # return the rendered prompt string, not token IDs
            add_generation_prompt=True,  # append the assistant-turn marker so the model continues from it
        )
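
For context, a call site under this suggestion would pass chat-style message dicts rather than plain strings (hypothetical wrapper instance and message values):

    history = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "assistant", "content": "How can I help?"},
    ]
    prompt_text = wrapper._build_inputs("Summarise the document.", history)
    # history is left untouched because _build_inputs copies it before appending the user turn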

graphgen/models/llm/local/vllm_wrapper.py (2)

medium

The stop import from tracemalloc is not used in this file and should be removed to maintain code cleanliness.

@ChenZiHong-Gavin ChenZiHong-Gavin (Collaborator, PR author) commented

/gemini summary

@gemini-code-assist gemini-code-assist bot commented

Summary of Changes

This pull request significantly improves the vllm_wrapper's text generation capabilities by integrating a repetition_penalty to enhance output diversity and by overhauling the prompt formatting mechanism to leverage the tokenizer's native chat template application for greater consistency. These changes, alongside an extended default timeout and refined error handling, aim to make the language model interaction more robust and controllable.

Highlights

  • Introduced Repetition Penalty: A repetition_penalty parameter has been added to SamplingParams in the generate_answer method, defaulting to 1.05, to help prevent the generation of repetitive text.
  • Enhanced Prompt Formatting: The _build_inputs method has been significantly refactored to utilize the model's tokenizer (self.tokenizer.apply_chat_template) for constructing prompts. This ensures more consistent and robust prompt formatting, especially for chat-based models, by leveraging the tokenizer's native capabilities.
  • Increased Default Timeout: The default timeout value in the vllm_wrapper constructor has been increased from 300 to 600 seconds, allowing more time for generation requests to complete.
  • Improved Parameter Validation: The validation logic for temperature and top_p in SamplingParams has been adjusted to allow values greater than or equal to zero (>= 0), accommodating scenarios where zero values are valid (e.g., for deterministic generation).
  • Robust Error Handling: asyncio.TimeoutError has been added to the exception handling blocks in both generate_answer and generate_topk_per_token methods, improving the resilience of the generation process against timeouts.
  • Optimized Generator Consumption: The _consume_generator method now explicitly breaks its loop once request_output.finished is True, ensuring that only the final output is processed and improving efficiency (sketched below).
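
The generator-consumption change corresponds to roughly this loop (a method sketch; everything except the break on request_output.finished is an assumption about the wrapper):

    async def _consume_generator(self, result_generator):
        final_output = None
        async for request_output in result_generator:
            final_output = request_output
            if request_output.finished:
                break  # vLLM has marked the request finished; only this final output is needed
        return final_output
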
Changelog
  • graphgen/models/llm/local/vllm_wrapper.py
    • Increased the default timeout parameter in the __init__ method from 300 to 600.
    • Initialized self.tokenizer using self.engine.engine.tokenizer.tokenizer in the __init__ method (this and the new timeout default are sketched in a constructor example after this changelog).
    • Refactored the _build_inputs method: changed it from a static method to an instance method, updated its signature to accept List[dict], and implemented self.tokenizer.apply_chat_template for prompt construction.
    • Modified _consume_generator to explicitly break the loop once request_output.finished is True.
    • Added repetition_penalty=extra.get("repetition_penalty", 1.05) to SamplingParams in generate_answer.
    • Updated temperature and top_p conditions in SamplingParams from > 0 to >= 0.
    • Moved the result_generator = self.engine.generate(...) call inside the try block in both generate_answer and generate_topk_per_token.
    • Added asyncio.TimeoutError to the exception handling tuple in generate_answer and generate_topk_per_token.
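
The first two changelog entries amount to a constructor along these lines (a sketch: the engine construction and argument names are assumptions; only the 600-second default and the tokenizer attribute path come from the changelog):

    from vllm import AsyncEngineArgs, AsyncLLMEngine

    def __init__(self, model: str, timeout: int = 600, **engine_kwargs):  # default raised from 300
        self.engine = AsyncLLMEngine.from_engine_args(
            AsyncEngineArgs(model=model, **engine_kwargs)
        )
        # Expose the underlying Hugging Face tokenizer so _build_inputs can call
        # apply_chat_template on it directly.
        self.tokenizer = self.engine.engine.tokenizer.tokenizer
        self.timeout = timeout
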
Activity
  • gemini-code-assist[bot] provided an initial summary of the pull request, highlighting the repetition penalty, prompt formatting refactor, parameter validation adjustment, and minor code refactoring.
  • github-code-quality[bot] identified an unused import of 'stop' in graphgen/models/llm/local/vllm_wrapper.py and suggested its removal.
  • ChenZiHong-Gavin requested a new summary from Gemini Code Assist.

@ChenZiHong-Gavin ChenZiHong-Gavin merged commit d1b3b6e into main Jan 14, 2026
6 checks passed
@ChenZiHong-Gavin ChenZiHong-Gavin deleted the fix/fix-vllm-repeating branch January 14, 2026 04:31