LLM code generation
Improve by adding context
Published: 2025-03-17

Using Large Language Models Effectively
Large Language Models (LLMs) generate responses by creating variations of what they have learned, aiming to best fulfill your prompt. However, these suggestions are not always optimal—or even correct—for your specific situation.
To improve accuracy and relevance, providing context is key. This can include referencing previous work or supplying purpose-written files that clarify what information should be used.
When used correctly and intelligently, LLMs can be a powerful tool for productivity and creativity. Without proper guidance, however, they can quickly become more of a burden than a benefit.
Testing Code Generation LLMs for Simian Web Apps
Code generation LLMs are trained on widely used programming libraries, making them adaptable to a broad range of projects.
We tested Codestral and Copilot to generate code for Simian Web Apps using the Simian Python library. In most cases, the generated code contained hallucinations and bugs, requiring manual corrections.
However, we found that providing context significantly improved the results. By including existing web app code in our prompts, the LLMs generated more relevant suggestions—especially for constructs already present in the provided context files. This highlights the importance of guiding LLMs with project-specific information to achieve better accuracy and reliability.
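As a rough illustration of this approach, the sketch below prepends a purpose-written context file to the actual request before it is sent to a code generation model. The file name, prompt wording, and build_prompt helper are our own illustrative assumptions, not part of the Simian Python library or any particular LLM client:
# Minimal sketch of context-augmented prompting; all names are illustrative.
from pathlib import Path

def build_prompt(task: str, context_path: str = "simian_context.py") -> str:
    # Prepend the purpose-written context so the model favors its constructs.
    context = Path(context_path).read_text()
    return (
        "Use only the constructs shown in the following context:\n\n"
        f"{context}\n\n"
        f"Task: {task}\n"
    )

# Send the resulting prompt to the LLM client of your choice.
prompt = build_prompt("Hide the 'details' control when 'counter' equals 0.")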
From burden to benefit
By carefully crafting a context file, we successfully improved the quality of the code generated for our Simian Web Apps, making it more accurate and relevant to our needs. In the context file, we defined what is available in our library, what each construct does, and when it should be used.
Our Simian Python context file is available from our GitHub repository. For human readers, however, we do recommend using the documentation.
Example: Defining a construct
We have a very simple control-hiding definition with an isequal condition on the value of another control:
# Hide the control when the value of "counter" equals 0.
control.show = False
control.when = "counter"
control.eq = 0
Adding this snippet to our context allowed the models to hide controls when the value of another control equals the given value.
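With the snippet in context, a request such as "hide the details control when mode equals 1" would typically yield code of this shape (illustrative output; the control names are our own):
# Hide the control when the value of "mode" equals 1.
details.show = False
details.when = "mode"
details.eq = 1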
Preventing hallucinations
When outside the bounds of the context, the model will use constructs from similar code that seem logical but are not necessarily correct for the situation.
For greater-than (and other non-equality) comparisons, another, slower construct needs to be used. The model, however, decided to use a non-existent 'gt' attribute for the task. Conventional tab completion would never have created this broken yet convincing-looking code. Without knowledge of the library, you would not easily be able to identify what is going wrong.
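Reconstructed for illustration, the hallucinated code looked roughly like this:
# Plausible-looking, but the 'gt' attribute does not exist in the library.
control.show = False
control.when = "counter"
control.gt = 0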
We were able to prevent these suggestions and get the correct ones by adding a comment to the context:
condition.eq = 2 # Does not contain gt, lt, ge, le.
That a single well-placed comment is enough to steer the model is, in itself, proof of the potential of LLMs.