ai / claude

Britt and I use Anthropic's Claude 4 models via API to power our internal software tool at work. Here is what we learned about structured outputs, system vs. user prompts, and hallucination guardrails.

Thin client

We wrote a thin API client over the Messages API using the HTTP gem instead of depending on an Anthropic SDK.

Four methods cover what we need:

Every method requires a json_schema and returns [text, err] (chat_with_tools returns [parsed_json, conversation, err]).

Model selection

We use three models:

We use model aliases (claude-sonnet-4-5) instead of dated snapshots so the API resolves aliases to the latest version automatically.

Always use structured outputs

The most important thing we did was require structured outputs via json_schema in output_config on every API call.

Without it, Claude returns conversational preamble ("Okay, I'll help you with that...") or wraps text in markdown fences. Prompting it away is fragile.

With a schema, the API enforces the format at the protocol level. No preamble, no post-processing, no strip_surrounding_double_quotes.

Define a JSON_SCHEMA constant in each job:

JSON_SCHEMA = {
  type: "object",
  properties: {
    headline: {
      type: "string",
      description: "A Y Combinator-style company headline, 80 characters or less."
    }
  },
  required: ["headline"],
  additionalProperties: false
}.freeze

Then parse the response:

response, err = client.chat(
  model: MODEL_HAIKU,
  json_schema: JSON_SCHEMA,
  user_prompt: prompt
)
headline = JSON.parse(response).fetch("headline")

Every API call in the codebase now requires a schema.

System prompt vs. user prompt

After some experimentation, we settled on a rule:

Reserve system_prompt for separating instructions from untrusted data.

When you pass scraped websites, user-generated notes, or raw email threads in user_prompt, the model can confuse data for instructions. A system prompt carries higher authority and keeps your instructions safe from prompt injection.

If the prompt is self-contained (you control all the data), put everything in user_prompt and skip system_prompt entirely. We don't use "You are a..." persona lines. Detailed instructions already constrain output.

Tool use

chat_with_tools runs a multi-turn loop: send messages, receive tool_use blocks, execute each tool locally, send tool_result back, repeat until Claude emits end_turn.

Define tools as Anthropic-format hashes and pass a callable tool_handler that receives (name, input) and returns a result string:

response, conversation, err = client.chat_with_tools(
  model: MODEL_SONNET,
  system_prompt: system_prompt,
  user_prompt: prompt,
  tools: TOOLS,
  tool_handler: method(:handle_tool),
  json_schema: JSON_SCHEMA
)

json_schema constrains the final text response. During tool-use turns, Claude returns tool_use blocks instead. Both features work together in the same request.

The method returns [parsed_json, conversation, err]. conversation is the full message array for logging. A max_iterations parameter (default 10) caps the loop.

Reduce hallucinations

When provided data is thin, Claude can fabricate details, supplemented with training-data knowledge. We apply Anthropic's hallucination minimization guidelines across all prompts:

Restrict to provided data. Add "base your response only on the information provided" when the prompt passes structured context that should not be supplemented with outside knowledge.

Allow expressing uncertainty. Instruct Claude to output "Insufficient data" when context is too thin. Handle that sentinel string before writing to the database: filter out insufficient sections, NULL the column, or skip the record.

Require citations. For research tasks, require inline [n] citations that map to sources. Instruct Claude to omit claims it cannot cite rather than stating them without attribution.

Format data as JSON. We use Ruby's .to_json when passing data in prompts. Clean input reduces misinterpretation by the model.

Context window math

The client computes a max input size per model:

(context_window - max_output_tokens - buffer) * 4 chars/token

200k context window, minus 64k max output for Haiku/Sonnet (128k for Opus), minus a 5k buffer for the system prompt. User prompts are truncated to this limit before sending.

← All articles