Building an AI-Powered Code Evolution System: A Frugal Developer's Journey
AI is fundamentally changing how we write and evaluate code. As a solo developer juggling different AI coding assistants, I’ve explored everything from OpenAI’s offerings to Claude (my personal favorite) to DeepSeek. Each brings something unique to the table, but there’s a catch every independent developer knows too well: API costs add up fast.
Why Local AI? A Practical Choice
After watching my OpenAI credits vanish faster than free pizza at a developer meetup (and probably spending enough on API calls to buy several pizzas), I started exploring alternatives. Enter Ollama, an open-source project that lets you run powerful language models locally.
📘 What is Ollama? Ollama is a framework that makes it easy to run large language models locally. It handles model management, provides a simple API, and optimizes performance for your hardware. Learn more at https://ollama.com
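To give a feel for how simple that API is, here’s a minimal, self-contained sketch of a single non-streaming chat call. It uses Ollama’s default port and the model we’ll pull later; nothing here is specific to this project:

// Minimal call to a local Ollama instance's chat API.
// Assumes Ollama is running on its default port (11434) and the model is pulled.
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "deepseek-r1:1.5b",
    messages: [{ role: "user", content: "Write a one-line JavaScript palindrome check." }],
    stream: false, // ask for a single JSON response instead of a stream
  }),
});
const data = await response.json();
console.log(data.message.content);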
The Fun Begins: An AI Experiment
One evening, while manually copying code between ChatGPT and Claude for generation and review, I had a thought: Why not make the AIs talk to each other? The results were… interesting. Sometimes brilliant, sometimes hilariously off-base. It was like watching two rubber ducks debug each other while occasionally speaking in tongues.
Adventures with DeepSeek: A Tale of Two Models
My journey involved experimenting with both DeepSeek-R1 14B and 1.5B variants. The results? Let’s just say it was a choice between accuracy and keeping my MacBook Pro from achieving liftoff.
The 14B Experience: Power Meets Heat
First, I tried DeepSeek-R1 14B:
- Incredibly Accurate: The responses were spot-on
- Superior Understanding: Handled complex code patterns beautifully
- Resource Hungry: My MacBook Pro nearly went up in flames
- Fan Concert: Constant max-speed fan noise became my new workspace ambiance
The 1.5B Compromise: Keeping Cool
After some thermal throttling adventures, I settled on DeepSeek 1.5B for this experiment:
- Good Enough: Still produces solid code most of the time
- Sometimes Bananas: The hallucinations can be entertaining
- Resource Friendly: My MacBook can actually breathe
- Free to Experiment: No API costs and no fire extinguisher needed
The system configuration is straightforward:
const CONFIG = {
  MODEL: 'deepseek-r1:1.5b', // Previously used 14b but switched for thermal reasons
  DEFAULT_ROUNDS: 5,
  MIN_ROUNDS: 2,
  MAX_ROUNDS: 6,
  PORTS: {
    DEFAULT: 5100,
    SOLVER: 11434,
    REVIEWER: 11435
  },
  CORS_ORIGINS: ['http://localhost:5173']
};
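The code below refers to this.solverHost and this.reviewerHost. Deriving them from CONFIG.PORTS might look like this (assumed wiring on my part; the post’s excerpt doesn’t show it):

// Assumed setup: build the two Ollama base URLs from CONFIG.PORTS.
this.solverHost = `http://127.0.0.1:${CONFIG.PORTS.SOLVER}`;     // first Ollama instance
this.reviewerHost = `http://127.0.0.1:${CONFIG.PORTS.REVIEWER}`; // second Ollama instance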
System Architecture: The Complete Picture
Here’s a breakdown of the system’s core components. It’s built with Express.js and interacts with two Ollama instances (a sketch of the orchestration loop follows the list):
- A solver (port 11434) that generates code solutions
- A reviewer (port 11435) that evaluates the code
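To make the flow concrete, here’s a minimal sketch of how an Express route could drive the solver/reviewer loop. The route path and the orchestrator object are my assumptions for illustration; only generateSolutionStream, reviewSolutionStream, and processStream come from the actual code shown below:

import express from "express";

// Assumed glue code: "orchestrator" stands for whatever object exposes the
// generateSolutionStream / reviewSolutionStream / processStream methods below.
const app = express();
app.use(express.json());

app.post("/api/evolve", async (req, res) => {
  const { prompt, rounds = CONFIG.DEFAULT_ROUNDS } = req.body;
  let solution = "";
  let feedback = "";

  for (let round = 1; round <= rounds; round++) {
    // The solver proposes (or improves) a solution...
    const solverRes = await orchestrator.generateSolutionStream(prompt, solution, feedback);
    solution = await orchestrator.processStream(solverRes, (chunk) => res.write(chunk));

    // ...then the reviewer scores it, and that feedback seeds the next round.
    const reviewRes = await orchestrator.reviewSolutionStream(prompt, solution, round);
    feedback = await orchestrator.processStream(reviewRes, (chunk) => res.write(chunk));
  }

  res.end();
});

app.listen(CONFIG.PORTS.DEFAULT);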
The Code Generation Process
The solver instance generates solutions using Ollama’s chat API:
async generateSolutionStream(prompt, previousSolution = "", reviewFeedback = "") {
  try {
    const contextPrompt = previousSolution && reviewFeedback
      ? `Previous solution:\n\`\`\`javascript\n${previousSolution}\n\`\`\`\n\n${reviewFeedback}\n\nImprove the solution based on the feedback.`
      : `Create a JavaScript solution for: ${prompt}\n\nReturn ONLY the code.`;

    const response = await fetch(`${this.solverHost}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: CONFIG.MODEL,
        messages: [
          {
            role: "system",
            content: "You are a JavaScript expert. Provide only clean, working code without explanations."
          },
          {
            role: "user",
            content: contextPrompt,
          }
        ],
        stream: true,
      })
    });

    if (!response.ok) {
      throw new Error(`Solver API error: ${response.statusText}`);
    }

    return response;
  } catch (error) {
    throw new Error(`Solution generation failed: ${error.message}`);
  }
}
The Review Process
The reviewer evaluates solutions with specific criteria:
async reviewSolutionStream(problem, solution, round) {
  try {
    const prompt = `Review the following solution for Round ${round}:
**Problem:**
${problem}
**Solution:**
\`\`\`
${solution}
\`\`\`
Provide a detailed review with:
1. What works well
2. What could be improved
3. Score out of 10
Format as:
### Score: [X/10]
### Review:
[Your detailed review]`;

    const response = await fetch(`${this.reviewerHost}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: CONFIG.MODEL,
        messages: [
          {
            role: "user",
            content: prompt,
          }
        ],
        stream: true,
      })
    });

    if (!response.ok) {
      throw new Error(`Reviewer API error: ${response.statusText}`);
    }

    return response;
  } catch (error) {
    throw new Error(`Review generation failed: ${error.message}`);
  }
}
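Since the reviewer is asked to follow the "### Score: [X/10]" format, comparing rounds only needs a small parser. The post doesn’t show one, but a hypothetical helper could look like this:

// Hypothetical helper: extract the numeric score from the reviewer's output.
// Tolerates both "### Score: [7/10]" and "### Score: 7/10".
function extractScore(reviewText) {
  const match = reviewText.match(/###\s*Score:\s*\[?(\d+(?:\.\d+)?)\s*\/\s*10\]?/i);
  return match ? Number(match[1]) : null;
}

// e.g. extractScore("### Score: [8/10]\n### Review: ...") === 8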
Stream Processing Magic
One of the trickier parts was handling the streaming responses from Ollama, which arrive as newline-delimited JSON objects rather than a single payload. Here’s how we manage that:
async processStream(response, writeStream) {
  let fullText = "";
  let isCodeBlock = false;
  let codeContent = "";
  let buffer = "";

  // Ollama's /api/chat endpoint streams newline-delimited JSON,
  // so we buffer raw text and parse one complete line at a time.
  const handleLine = (line) => {
    if (!line.trim()) return;
    try {
      const data = JSON.parse(line);
      if (data.message?.content) {
        const content = data.message.content;
        // Toggle on code fences so code blocks can be cleaned before emitting
        if (content.includes("```")) {
          isCodeBlock = !isCodeBlock;
          if (!isCodeBlock && codeContent) {
            const cleanCode = this.cleanCodeBlock(codeContent);
            fullText += cleanCode;
            writeStream(cleanCode);
            codeContent = "";
          }
        } else if (isCodeBlock) {
          codeContent += content;
        } else {
          fullText += content;
          writeStream(content);
        }
      }
    } catch (error) {
      console.error("Error parsing stream chunk:", error);
    }
  };

  try {
    const decoder = new TextDecoder();
    for await (const chunk of response.body) {
      // stream: true keeps multi-byte characters split across chunks intact
      buffer += decoder.decode(chunk, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop(); // hold back the trailing partial line
      lines.forEach(handleLine);
    }
    if (buffer) handleLine(buffer);
  } catch (error) {
    console.error("Error processing stream:", error);
    throw error;
  }

  return fullText;
}
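The code above delegates to this.cleanCodeBlock, which the post doesn’t show. A plausible minimal version (my assumption, not the project’s actual implementation) lives on the same class and strips fence leftovers:

// Hypothetical implementation of cleanCodeBlock (method on the same class):
// drop a leading language hint like "javascript" left over from the opening
// fence, and trim stray whitespace around the captured code.
cleanCodeBlock(code) {
  return code.replace(/^\s*[a-zA-Z]+\n/, "").trim() + "\n";
}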
Real-World Challenges (aka The Fun Part)
1. Resource Management
Running DeepSeek locally on an M1 MacBook Pro taught me some valuable lessons about hardware limits:
- Memory Usage: The 14B model needed about 8GB per instance, while the 1.5B runs comfortably in 2-3GB
- Thermal Management: With the 14B model, my MacBook went into jet engine mode. The 1.5B version keeps things much cooler
- Disk Space: The 14B model needs about 8GB of disk space, while 1.5B only needs about 2GB
- Performance vs Temperature: I found that the 14B model's accuracy gains often weren't worth the thermal throttling
2. Connection Management
The system includes robust connection checking:
async checkConnection() {
try {
const [solverTags, reviewerTags] = await Promise.all([
fetch(`${this.solverHost}/api/tags`),
fetch(`${this.reviewerHost}/api/tags`)
]);
if (!solverTags.ok || !reviewerTags.ok) {
return false;
}
const [solverData, reviewerData] = await Promise.all([
solverTags.json(),
reviewerTags.json()
]);
const solverHasModel = solverData.models?.some(
(model) => model.name === CONFIG.MODEL
);
const reviewerHasModel = reviewerData.models?.some(
(model) => model.name === CONFIG.MODEL
);
if (!solverHasModel || !reviewerHasModel) {
console.warn(`Model ${CONFIG.MODEL} not found. Please run: ollama pull ${CONFIG.MODEL}`);
return false;
}
return true;
} catch (error) {
console.error("Connection check failed:", error);
return false;
}
}
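One natural way to use this check (assumed wiring on my part; the post doesn’t show the startup code) is as a guard before the server starts accepting requests:

// Assumed startup guard: refuse to boot if either Ollama instance
// is unreachable or missing the model.
async function start() {
  const ready = await orchestrator.checkConnection();
  if (!ready) {
    console.error("Ollama isn't ready. Start both instances and pull the model first.");
    process.exit(1);
  }
  app.listen(CONFIG.PORTS.DEFAULT);
}
start();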
Setting Up Your Own Instance
Want to join the fun? Here’s how to get started:
1. Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

2. Pull the DeepSeek model:

ollama pull deepseek-r1:1.5b

3. Start two Ollama instances:

# Terminal 1
ollama serve

# Terminal 2
OLLAMA_HOST=127.0.0.1:11435 ollama serve

4. Clone and run the project:

git clone https://github.com/mrSamDev/ai-code-evolution
cd ai-code-evolution/server
npm install
node x
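Before starting the server, you can sanity-check that both instances respond. A quick throwaway script (ports match CONFIG above; save as check.mjs and run with node check.mjs):

// Quick check: both Ollama instances should answer /api/tags.
for (const port of [11434, 11435]) {
  try {
    const res = await fetch(`http://127.0.0.1:${port}/api/tags`);
    console.log(`port ${port}:`, res.ok ? "up" : "down");
  } catch {
    console.log(`port ${port}: down`); // connection refused means nothing is listening
  }
}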
What I’ve Learned
This experiment has been great for:
- Solo Projects: When you need a second pair of eyes (even if they’re occasionally cross-eyed)
- Learning: Both about AI and how creative its mistakes can be
- Code Review: When it works, it’s impressive. When it doesn’t, it’s entertaining
- Cost Savings: My wallet is happier, even if my CPU isn’t
Join the Adventure
Got your own AI coding stories? Watched your local model go completely off the rails? Let’s share our experiences - both the successes and the hilarious failures. After all, who doesn’t love a good “AI went bananas” story?
The complete code is available on GitHub: https://github.com/mrSamDev/ai-code-evolution
Updates and Discussion
This has been a fun experiment in pushing the boundaries of what’s possible with local AI models. Sometimes it works brilliantly, sometimes it hallucinates wildly, but it’s always interesting. Have you tried running AI models locally? What chaos have you witnessed? Let’s discuss in the comments below!