Research Log

Experiments

Building an LLM agent on consumer hardware means running into walls. We document what we hit, what we tried, and what we learned.

Completed

Experiments with results and conclusions.

Planned

Experiments with hypotheses, waiting to run.

0002

Context Compaction Under Sustained Tool Use

Does summarize-and-replace preserve enough state across tasks of 10+ turns?
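The mechanism under test can be sketched in a few lines. This is an illustrative shape for summarize-and-replace, not cando's implementation: the token estimate, the budget split, and the `summarize` callback are all assumptions.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[dict], budget: int, summarize) -> list[dict]:
    """Once the history exceeds the token budget, replace the oldest turns
    with a single summary message and keep the most recent turns verbatim."""
    total = sum(estimate_tokens(m["content"]) for m in history)
    if total <= budget:
        return history
    keep = []
    running = 0
    # Walk backwards, keeping recent turns until half the budget is used.
    for msg in reversed(history):
        running += estimate_tokens(msg["content"])
        if running > budget // 2:
            break
        keep.append(msg)
    keep.reverse()
    old = history[: len(history) - len(keep)]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + keep
```

The open question is whether the summary retains enough task state (file paths, partial edits, tool results) for the model to keep making progress.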

0003

Tree-Sitter Aware File Reading

Can AST-based outlines reduce context waste for code files like heading extraction does for markdown?
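The idea in miniature, using Python's stdlib `ast` as a stand-in (tree-sitter would make the same approach language-agnostic): emit one line per definition instead of the whole file, the way heading extraction emits one line per markdown section.

```python
import ast

def outline(source: str) -> list[str]:
    """Return a one-line-per-definition outline of a Python file."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args})  # line {node.lineno}")
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}  # line {node.lineno}")
    return lines
```

The experiment would measure how often the agent can act on the outline alone versus having to fall back to reading full definitions.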

0004

Quant Quality vs Instruction Following

Does Q8 Gemma 4 follow tool-use instructions more reliably than Q4?

0005

Per-Expert MoE Offloading in Upstream llama.cpp

When upstream adds per-expert offloading, does it close the gap with ollama?

0006

Ollama GGUF Format Differences

Document the exact structural differences between ollama's GGUF files and ggml-org's, and assess the feasibility of a converter.
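A structural diff would start at the fixed header. A minimal sketch of parsing it, per ggml-org's published GGUF layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata-KV count, all little-endian); comparing these fields and then walking the metadata KVs of both files would be the first step, and none of this is cando code:

```python
import struct

def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header fields from the start of a file."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```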

0007

System Prompt Engineering for Q4 Models

Systematic testing of prompt variants: role framing, negative constraints, few-shot examples.
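"Systematic" here means a full grid over the variant axes. A sketch of generating that grid; the axis values shown are hypothetical placeholders, not the experiment's actual prompts:

```python
from itertools import product

ROLE_FRAMES = ["You are a coding agent.", "You are a careful tool-using assistant."]
NEGATIVE_CONSTRAINTS = ["", "Never invent tool names."]
FEW_SHOT = ["", "Example:\nuser: list files\nassistant: <tool_call>ls</tool_call>"]

def prompt_variants() -> list[str]:
    """One system prompt per combination of the three axes;
    empty strings mean the axis is absent from that variant."""
    return [
        "\n\n".join(part for part in combo if part)
        for combo in product(ROLE_FRAMES, NEGATIVE_CONSTRAINTS, FEW_SHOT)
    ]
```

With two values per axis this yields eight prompts, each to be scored on the same tool-use task set.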

0008

Output Token Budget and File Generation

Map ctx-size to max reliable output tokens. What's the biggest file cando can write in one tool call?
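The mapping is mostly arithmetic, with the constants being what the experiment would measure. A sketch, where the safety margin and the bytes-per-token ratio are guesses to be replaced by measured values:

```python
def max_output_tokens(ctx_size: int, prompt_tokens: int, safety_margin: int = 64) -> int:
    """Output budget is whatever the context window leaves after the prompt
    (system prompt + history + tool schemas), minus a safety margin."""
    return max(0, ctx_size - prompt_tokens - safety_margin)

def max_file_bytes(ctx_size: int, prompt_tokens: int, bytes_per_token: float = 3.5) -> int:
    """Rough upper bound on the file size writable in one tool call,
    assuming ~3.5 bytes of source text per token (an assumption to verify)."""
    return int(max_output_tokens(ctx_size, prompt_tokens) * bytes_per_token)
```

The interesting result is where reliable generation actually stops relative to this upper bound, since quality may degrade well before the budget runs out.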

0009

Task Decomposition Strategies

Multiple focused cando runs vs one big prompt. External orchestration vs in-context planning.

Topics

Context Window

The primary constraint for complex tasks. Smart file reading, compaction, and output budgeting.

EXP-0001, 0002, 0008

Tooling

Structural file reading, tree-sitter awareness, sed reliability, write strategies.

EXP-0001, 0003, 0011

Model Behavior

System prompt engineering, quant effects on instruction following, reviewer-mode drift.

EXP-0004, 0007, 0010

Infrastructure

Ollama vs llama.cpp, GGUF formats, MoE offloading, VRAM constraints.

EXP-0001, 0005, 0006