A Weekend with Code Llama

Zach Wolfe
7 min read · Oct 9, 2023

I used the Code Llama LLM with IDE integration for a weekend, and it’s pretty neat. If you’re a software engineer curious about code-generation LLMs, you should try it!

Photo by Paul Lequay on Unsplash

I’ve been using ChatGPT in personal coding side projects for several months. I find it useful for solving simple tasks like parsing strings and making HTTP calls, and for discovering interface or language features. This lets me spend more energy on what I actually find enjoyable in weekend side projects.

My workflow is generally to get a coding task to “work” by myself with some mocked-out classes or methods. Once I’m happy, I give ChatGPT an English description of the mocks’ expected behavior, along with some examples. I then fold the resulting generated code into my project, modifying it myself to get that last 25% of fit and finish.

After reading about Meta’s open-source Code Llama model in August 2023, I realized a fully-local, project-aware LLM coding tool was on the horizon.

Last week I discovered that Ollama, a fantastic wrapper around Llama-based models, now has an official Docker image that abstracts away all the typical AI environment setup horrors. With that development, I decided it was finally time to give LLMs in my IDE a try!

The tools I used are:

  1. Ollama — the aforementioned Llama wrapper that runs locally
  2. The codellama:13b model (built on Llama 2)
  3. Continue — a VSCode plugin that enables LLM-assisted coding
  4. VSCode — my preferred IDE for TypeScript
  5. Docker and Windows Subsystem for Linux

My Experience

TL;DR: My experience coding for a weekend (~6–8 hours) with Code Llama 13b in my IDE was a mild convenience and latency improvement over a browser-based chatbot like ChatGPT, with a minor quality regression, especially while I was learning how it “likes” to be prompted.

Regardless, the privacy, latency, and in-IDE user experience improvements over ChatGPT’s coding help were enough for me to keep using it going forward.

Example: Refactoring and Logging

Below is a representative example of my first-try experience.

One of the things I did over the weekend was add minor logging improvements to a gaming side project to troubleshoot a bug. While I was adding log statements, I noticed something that was worth refactoring, and had the thought to try this Code Llama tool setup.

I had a simple debug method which accepted an object: any. debug only writes the log when the debug level is enabled, to avoid the serialization cost in tight loops and on huge inputs, which meaningfully slows down my API’s latency.
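
For reference, debug looked roughly like this (a simplified sketch; l is the logger instance from my logger.ts):

export const debug = (object: any) => {
  // Only pay for JSON.stringify when the 'debug' level is actually enabled.
  if (l.isLevelEnabled('debug')) {
    l.debug(JSON.stringify(object));
  }
};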

A pattern I developed when using debug in my application was to emit the calling method as a part of the object for traceability:

debug({ method: 'getReplaySummary', options });

I imagined there was some reflection- or stack-inspection way to get the calling method from within debug, making the method a bit cleaner to use, and resilient to renaming:

debug({ options });

I’m a novice at JS/TS and didn’t know off the top of my head how to get that information.

So I asked Code Llama!

I highlighted the relevant code, then used the prompt:

The ‘debug’ method takes an object with a ‘method’ key. As you see, the method is ‘getReplaySummary’. How would I change the internal implementation of ‘debug’ to calculate the calling method’s name automatically.

Code Llama’s response (on the left of the above image) was useful, and the in-IDE LLM UX was convenient. But this isn’t a mind-blowing improvement over ChatGPT or even a Google search; the only real difference is that it used the highlighted context to name the methods. This “conversation” is essentially the #1 hit for the equivalent Google search.

When I added context (the entire contents of logger.ts) to the conversation, it didn’t quite work as expected, but in my mind it shows where this tech is going. What I wanted was for it to change the method, but Code Llama didn’t quite read my mind, and instead added a somewhat strange wrapper class.

Once I was direct about what I wanted (just change the existing debug method), it made the suggestion, after a fashion. It didn’t understand the spirit of the refactoring, but again, I can see where this tech is going.

What I wanted was for this use of debug:

export async function getReplaySummary(options: ReplayOptions) ... {
  debug({ options });
  ...

to result in this log message:

{
  "method": "getReplaySummary",
  "options": { ... }
}

From the above response, I could easily take the solution generated by Code Llama and fold it into my project. But I was curious, and tried to get Code Llama to write exactly what I wanted.

What transpired was a pretty frustrating oscillation of context loss and intent. Summarized, the exchange was:

Me: I want this logged: { options: options, method: ‘getReplaySummary’ }
Llama:

export const debug = (object: any) => {
  if (l.isLevelEnabled('debug')) {
    l.debug(`${JSON.stringify(object)}`, { method: 'getReplaySummary' });
  }
};

Me: This does not log one object; it logs with two parameters, and doesn’t calculate the method automatically.
Llama: You’re correct! I apologize for the confusion. [Me: lol.. ok… so?]
Me: Make it log one object.
Llama:

export const debug = (object: any) => {
  if (l.isLevelEnabled('debug')) {
    l.debug(`${JSON.stringify(object)}`, { method: 'getReplaySummary' });
  }
};

[Llama forgot the earlier instruction to calculate the method automatically]
Me: Remember that I want to calculate the ‘method’ key automatically.

Llama:

export const debug = (object: any) => {
  if (l.isLevelEnabled('debug')) {
    l.debug(`${JSON.stringify(object)}`, { method: arguments.callee });
  }
};

[Llama forgot the earlier instruction to log everything as one object]

…and so it continued oscillating until I gave up.

It looks like the 13b model wasn’t able to fully use the multiple instructions and clarifications in the context; it “forgot” about the previous conversation.

After a few more interactions like this, I realized that getting the prompt right (prompt engineering) dramatically changes the response. For example:

  1. Examples of input/output/side-effects help a lot
  2. Including what you do not want to happen helps too
  3. Don’t use jargon or keywords that aren’t included in the context
  4. Avoid referencing libraries, unless you also include them in the context

The prompt that eventually got the debug function to what I wanted followed tips 1, 3, and 4.
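
To give a flavor (this isn’t my exact wording), a prompt in that spirit reads something like:

Change the existing debug function so that a call like debug({ options }) inside getReplaySummary logs one JSON object that looks like { "method": "getReplaySummary", "options": { … } }, with the method name calculated inside debug rather than passed in by the caller.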

In the end, for this simple task, it wasn’t immediately worth the time to have Code Llama solve it completely. The real value-add was the quick, context-aware nudge towards a language feature, arguments.callee, that I didn’t know about. I can also see that once I build up some prompt-engineering skill, it could be used to fully solve problems the same way I use ChatGPT.
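
For the curious, here’s a rough sketch of the direction that nudge points in (not necessarily the code I ended up with): arguments.callee isn’t allowed in strict mode, so another option is to parse the caller’s name out of new Error().stack. l is the same logger instance as before.

export const debug = (object: any) => {
  if (!l.isLevelEnabled('debug')) return;

  // Stack lines look like: "    at getReplaySummary (/path/replay.ts:42:7)".
  // Line 0 is "Error", line 1 is debug's own frame, line 2 is the caller's frame.
  const callerFrame = new Error().stack?.split('\n')[2] ?? '';
  const method = callerFrame.trim().match(/^at (?:async )?(\S+)/)?.[1] ?? 'unknown';

  l.debug(JSON.stringify({ method, ...object }));
};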

Setup and Configuration

This post isn’t meant to be a setup guide for the above-mentioned tools; mostly I wanted to share my experience. For detailed instructions, visit the tools’ websites. In total, with WSL and Docker already installed and no previous experience with either Ollama or VSCode plugins, setup took me about an hour.

Below are a few one-liners and configs that might help:

Ollama

As I wanted local-only LLM tools, I ran ollama within a docker network that disabled outgoing connections (ref). There’s probably a more secure way to do this, but having a no-internet docker network is a handy tool for situations like this.

docker network create \
--driver bridge \
--subnet 172.18.0.0/16 \
--gateway 172.18.0.1 \
no-internet

…then boot up the ollama container and watch the logs.

docker run \
--network no-internet \
--detach \
--gpus=all \
--volume ollama:/root/.ollama \
--publish 11434:11434 \
--name ollama \
ollama/ollama
docker logs --follow ollama
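
One step not shown above is downloading the model itself; something like docker exec -it ollama ollama pull codellama:13b fetches the codellama:13b weights into the ollama volume (this does require the container to have internet access).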

And finally, start prompting it via the Continue tab in VSCode’s left-side plugin bar.

Continue

After installing Continue, the VSCode plugin that connects to Code Llama (among other models), I used the model selector to pick the codellama:13b model (see why below).

I adjusted the temperature and disabled telemetry in the UI-referenced config.py.

config = ContinueConfig(
    allow_anonymous_telemetry=False,
    models=Models(
        default=Ollama(
            title="codellama:13b",
            system_message="",
            model="codellama:13b",
        )
    ),
    system_message="",
    temperature=0.85,
)

Crashing

One issue I did have throughout my weekend trial was crashes. The ollama process crashed frequently, mostly when it ran out of memory, and when that happened, Continue would hang as well.

When Code Llama is active, responding to a query or generating code, it uses an enormous amount of memory.

After anywhere from 3–20 interactions (query/response cycles) with the 34b parameter model, the context seems to build up to the point where it segfaults from running out of memory, even on my pretty beefy system with 32GB of RAM and an RTX 4090 with 24GB of VRAM.

Here you can see the conversation happening in the VRAM graph:

My first stab at solving this was to fall back to the 13b parameter model, which crashed rarely enough for me to keep experimenting.

I couldn’t tell the difference in responses at a glance, but from what I’ve read, the higher-parameter models do perform much better. This is probably something I can work around, and I intend to, but I haven’t yet. If you have any ideas, let me know!

Conclusion

Was my first experience using Code Llama for weekend coding mind-blowing?
No, not really..
..but it was cool, and I’m excited about LLM-assisted IDEs in a year or two.

Was it useful?
About as useful as Google and Stack Overflow, which is to say, very!

Will I keep experimenting with Code Llama?
Absolutely!
