
Interact with LLM in JSON #743

Open
DonggeLiu opened this issue Dec 5, 2024 · 4 comments
@DonggeLiu (Collaborator)

Currently we use XML tags; it would be nice to use JSON, as that seems to be more common, e.g., in tool APIs.

One potential issue is escaping special characters. We have seen cases where the LLM tends to add redundant `\` characters, which may have a performance impact.
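
To illustrate the escaping concern with a toy example (not taken from the project):

```python
import json

# A tiny fuzz-target fragment containing quotes and an escape sequence.
fuzz_target = 'int main() { printf("%s\\n", "hi"); return 0; }'

# Embedding it in JSON requires escaping every quote and backslash.
encoded = json.dumps({"fuzz_target": fuzz_target})
print(encoded)

# If the LLM adds an extra layer of escaping on top of this, json.loads()
# returns code with literal \" or \\n sequences that no longer compiles.
assert json.loads(encoded)["fuzz_target"] == fuzz_target
```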

@nugget-cloud

Hi @DonggeLiu, can you assign this issue to me?

@DonggeLiu (Collaborator, Author)

DonggeLiu commented Mar 3, 2025

Hi @nugget-cloud, thanks for being willing to help.

#744 could be a good starting point for you : )
If you could start a PR based on #744, then once you merge it, we will be able to run an experiment and see the results in the PR.

@nugget-cloud

Hi @DonggeLiu,
I am trying to understand the code changes I need to make to interact with the LLM in JSON:

  • In the prompts/ directory, change the XML templates to JSON.
  • Make changes in llm_toolkit/prompt_builder.py, prompts.py, and output_parser.py to handle JSON output.

Please correct me if I am wrong here.

@DonggeLiu (Collaborator, Author)

DonggeLiu commented Mar 3, 2025

Thanks @nugget-cloud!

Before we dive in, I should mention that this is exploratory work, meaning it may improve results or lead to lower performance. Potential challenges include handling quoted/unquoted strings in JSON, confusing the LLM in some way, or unexpected compatibility issues. Unfortunately, if the outcome turns out to be worse than the current approach (e.g., a lower fuzz target build rate), we won’t be able to merge the changes.

If you are keen, I suggest breaking this task into stages, starting with a simpler change before moving to more complex ones.

Stage 1: Converting XML prompts to JSON

The goal here is to improve clarity in our requests to the LLM by switching from XML to JSON, which is a more commonly used format. The core logic remains the same.
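
To make the switch concrete, here is a hypothetical before/after sketch; the tag and field names below are made up for illustration and are not the actual templates under prompts/:

```python
import json

# Hypothetical XML-tag-style prompt, similar in spirit to the current templates.
xml_style_prompt = """\
<task>Generate a fuzz target for the function below.</task>
<function_signature>int parse_header(const uint8_t *data, size_t size);</function_signature>
"""

# Stage 1 only changes the formatting: the same fields, expressed as JSON.
json_style_prompt = json.dumps(
    {
        "task": "Generate a fuzz target for the function below.",
        "function_signature": "int parse_header(const uint8_t *data, size_t size);",
    },
    indent=2,
)
print(json_style_prompt)
```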

TODOs:

  • This is essentially #744.
  • You can fix the conflict and may also need to check for places (e.g., in agents/) where prompts are constructed dynamically instead of using templates in prompts/.
  • No changes are needed for prompt_builder.py or prompts.py since we still expect the LLM to return plain text responses.

We can stop here if Stage 2 turns out to be too complex—your help is greatly appreciated regardless!

Stage 2: Structuring LLM Responses in JSON

This stage requires deeper modifications, ensuring the LLM directly returns structured JSON responses rather than plain text.

TODOs:

  1. Use GenerationConfig to enforce JSON responses. This involves defining a response_schema, which varies based on the expected output; see the sketch after this list.
  2. To simplify schema management, we will likely need to modify prompt_builder.py and prompts.py, potentially introducing a new class that extends existing text-based ones but is tailored for JSON-based responses.
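
A minimal sketch of TODO 1, assuming the Vertex AI Python SDK; the model name, prompt, and schema below are illustrative placeholders, not the project's actual settings:

```python
import json

from vertexai.generative_models import GenerationConfig, GenerativeModel

# Assumes vertexai.init(project=..., location=...) has already been called.

# OpenAPI-style schema describing the structured response we expect.
response_schema = {
    "type": "object",
    "properties": {
        "fuzz_target": {"type": "string"},
        "explanation": {"type": "string"},
    },
    "required": ["fuzz_target"],
}

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Generate a libFuzzer target for parse_header() and explain your choices.",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=response_schema,
    ),
)

# With the schema enforced, parsing mostly reduces to json.loads().
result = json.loads(response.text)
print(result["fuzz_target"])
```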

If done correctly, we shouldn't need major changes to output_parser.py, as the responses should already be well-formatted. Minor adjustments, such as handling unquoted text, may be required.

Once this is complete, you will have made significant progress, and we can confidently close this issue.

Stage 3 (Optional): Encapsulating LLM Interactions as Function Calls

If you're interested in exploring this further, we could encapsulate LLM interactions as function calls. The idea is to explicitly define how the LLM's responses will be used by a function and request it to return structured parameter values accordingly.

For example, instead of directly asking the LLM to generate a fuzz target, we define a function that takes a fuzz target as a parameter and instruct the LLM to generate the appropriate parameter value.
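
A hypothetical sketch of that idea using the Vertex AI SDK's function calling; the function name and parameters are invented for illustration only:

```python
from vertexai.generative_models import FunctionDeclaration, GenerativeModel, Tool

# Hypothetical function the LLM is asked to "call" with a generated fuzz target.
save_fuzz_target = FunctionDeclaration(
    name="save_fuzz_target",
    description="Stores a generated libFuzzer target so it can be compiled.",
    parameters={
        "type": "object",
        "properties": {
            "source_code": {
                "type": "string",
                "description": "Complete source code of the fuzz target.",
            },
        },
        "required": ["source_code"],
    },
)

model = GenerativeModel(
    "gemini-1.5-pro",
    tools=[Tool(function_declarations=[save_fuzz_target])],
)
response = model.generate_content(
    "Write a libFuzzer target for parse_header() and save it."
)

# Instead of free-form text, the model returns arguments for our function.
call = response.candidates[0].content.parts[0].function_call
print(call.name, dict(call.args))
```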

In practice, this is more complex, but much of the groundwork has already been done in #731, which I put on hold due to other priorities. Feel free to pick it up if you're interested!
