Completed
Experiments with results and conclusions.
LLM Backend Selection and Context Window Management
Compared llama.cpp vs ollama for Gemma 4 26B on 12GB VRAM. Ollama wins on GPU utilization due to MoE-aware offloading. Context management and smart file reading essential for multi-step tool use.
System Prompt Role in Tool-Use Behavior
System prompt is load-bearing. "Helpful assistant" → model reviews files. "Task executor, NEVER review" → model acts. Small wording changes produce large behavioral shifts at Q4 quant.
Model sed Capability and HTML Corruption
sed is unreliable for HTML modification by LLMs. Partial application compounds errors. Atomic file rewrites are safer than incremental edits for structured documents.
Planned
Experiments with hypotheses, waiting to run.
Context Compaction Under Sustained Tool Use
Does summarize-and-replace preserve enough state for 10+ turn tasks?
Tree-Sitter Aware File Reading
Can AST-based outlines reduce context waste for code files like heading extraction does for markdown?
Quant Quality vs Instruction Following
Does Q8 Gemma 4 follow tool-use instructions more reliably than Q4?
Per-Expert MoE Offloading in Upstream llama.cpp
When upstream adds per-expert offloading, does it close the gap with ollama?
Ollama GGUF Format Differences
Document exact structural differences between ollama's GGUF and ggml-org's. Converter feasibility.
System Prompt Engineering for Q4 Models
Systematic testing of prompt variants: role framing, negative constraints, few-shot examples.
Output Token Budget and File Generation
Map ctx-size to max reliable output tokens. What's the biggest file cando can write in one tool call?
Task Decomposition Strategies
Multiple focused cando runs vs one big prompt. External orchestration vs in-context planning.
Topics
Context Window
The primary constraint for complex tasks. Smart file reading, compaction, and output budgeting.
EXP-0001, 0002, 0008
Tooling
Structural file reading, tree-sitter awareness, sed reliability, write strategies.
EXP-0001, 0003, 0011
Model Behavior
System prompt engineering, quant effects on instruction following, reviewer-mode drift.
EXP-0004, 0007, 0010
Infrastructure
Ollama vs llama.cpp, GGUF formats, MoE offloading, VRAM constraints.
EXP-0001, 0005, 0006