ruby / haml renderer

The templates in the web framework use a restricted "dumb" subset of Haml. A custom renderer replaced the haml gem. It parses and renders only the subset grammar, rejecting everything else at parse time.

Two files, ~1,200 lines total.

Why replace the gem

The haml gem evaluates arbitrary Ruby at render time. Templates can call methods, access constants, assign variables. The dumb template subset uses none of this.

A custom renderer enforces the subset by construction: if the parser has no node type for a construct, it can't appear in templates. This removes Ruby eval from the rendering path and drops a dependency.

The prerequisite was restricting all ~360 templates to the dumb subset first: moving method calls, hash access, and formatting into handlers with Data.define structs. Once every template conformed, a CI linter prevented regressions, and the renderer could be built against a frozen grammar.
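
As a rough illustration of moving logic out of a template and into a handler, here is a minimal sketch. InvoiceRow and build_invoice_row are hypothetical names, not taken from the codebase, and Data.define needs Ruby 3.2 or newer.

```ruby
# Hypothetical example: names are illustrative, not from the real codebase.
# Requires Ruby 3.2+ for Data.define.
InvoiceRow = Data.define(:customer, :total, :overdue)

# The handler does the hash access and formatting that the template
# would otherwise have done inline.
def build_invoice_row(record)
  InvoiceRow.new(
    customer: record[:customer][:name],
    total: format("$%.2f", record[:cents] / 100.0),
    overdue: record[:days_late] > 0
  )
end

row = build_invoice_row({ customer: { name: "Acme" }, cents: 123_450, days_late: 3 })
row.total # "$1234.50"
```

The template is then left with plain field access such as = row.customer and - if row.overdue.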

Parser

Haml::Subset.new(source, path:) parses source into a tree at construction time. Lines are classified into node types:

:doctype     # !!!
:comment     # -# ...
:filter      # :javascript, :css
:if          # - if expr
:elsif       # - elsif expr
:else        # - else
:each        # - collection.each do |item|
:render      # = render "name", key: value
:output      # = expr (HTML-escaped)
:raw_output  # != expr (raw)
:tag         # %tag.class#id{ attrs }
:text        # static text

There is no :eval or :ruby node. A method call, constant reference, or variable assignment has no node type to parse into — the parser raises.
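
A line classifier along these lines might look like the sketch below. It is a guess at the shape, not the real parser: the classify name and the exact regexes are assumptions, and expression validation is left to the expression parser.

```ruby
# Hypothetical classifier: illustrates "no node type means parse error".
# The real parser's rules and regexes may differ.
def classify(stripped)
  case stripped
  when /\A!!!/ then :doctype
  when /\A-#/ then :comment
  when /\A:(javascript|css)\z/ then :filter
  when /\A- if / then :if
  when /\A- elsif / then :elsif
  when /\A- else\z/ then :else
  when /\A- [\w.]+\.each do \|\w+\|\z/ then :each
  when /\A= render / then :render
  when /\A!= / then :raw_output
  when /\A= / then :output
  when /\A(?:%|\.|#[a-zA-Z_])/ then :tag
  when /\A[-=]/
    # Any other "-" or "=" line is Ruby the subset has no node for.
    raise ArgumentError, "not in subset: #{stripped}"
  else :text
  end
end

classify("%h1.title#main")         # :tag
classify("- items.each do |item|") # :each
```

A line such as - total = price * qty falls through to the raise, because no branch produces a node for assignment.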

Indentation determines nesting. The parser walks lines at each indent level and recursively parses children:

private def parse(lines, base_indent, from, to)
  nodes = []
  i = from

  while i < to
    line = lines[i]
    stripped = line.lstrip
    indent = line.length - stripped.length

    if stripped == ""
      i += 1
      next
    end

    if indent != base_indent
      raise "#{@path}:#{i + 1}: expected indent #{base_indent}, got #{indent}"
    end

    # Find children (lines with greater indent)
    child_end = i + 1
    while child_end < to
      next_line = lines[child_end]
      next_stripped = next_line.lstrip
      if next_stripped != ""
        if (next_line.length - next_stripped.length) <= indent
          break
        end
      end
      child_end += 1
    end

    node = parse_line(stripped, indent, lines, i + 1, child_end)
    nodes << node
    i = child_end
  end

  nodes
end
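
To see what the indentation walk produces, here is a stripped-down stand-in run on a small input. parse_outline is a hypothetical helper: it uses the same child-range scan as the method above but turns every line into a generic node instead of calling parse_line.

```ruby
# Simplified stand-in for the parser: every line becomes a generic node,
# so only the indentation handling is exercised.
def parse_outline(lines, base_indent = 0, from = 0, to = lines.length)
  nodes = []
  i = from
  while i < to
    stripped = lines[i].lstrip
    indent = lines[i].length - stripped.length
    if stripped.empty?
      i += 1
      next
    end
    raise "line #{i + 1}: expected indent #{base_indent}, got #{indent}" if indent != base_indent

    # Children are the following lines indented deeper than this one.
    child_end = i + 1
    while child_end < to
      nxt = lines[child_end]
      nxt_stripped = nxt.lstrip
      break if !nxt_stripped.empty? && nxt.length - nxt_stripped.length <= indent
      child_end += 1
    end

    nodes << { line: stripped, children: parse_outline(lines, indent + 2, i + 1, child_end) }
    i = child_end
  end
  nodes
end

tree = parse_outline(["%ul", "  %li", "    One", "  %li", "    Two", "%p"])
tree.map { |n| n[:line] }               # ["%ul", "%p"]
tree[0][:children].map { |n| n[:line] } # ["%li", "%li"]
```

A child indented by three spaces instead of two fails the indent check on the recursive call, so inconsistent indentation is a parse error, not a silent guess.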

The tag parser extracts the tag name, classes, and ID. This excerpt leaves out attribute-hash parsing:

private def parse_tag(stripped, indent, lines, child_from, child_to)
  rest = stripped.dup
  tag_name = "div"
  classes = []
  id = nil

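  if rest.start_with?("%")
    m = rest.match(/\A%(\w[\w-]*)/)
    # A bare "%" would otherwise fail with NoMethodError on nil
    raise "#{@path}: invalid tag name: #{stripped}" unless m
    tag_name = m[1]
    rest = rest[m[0].length..]
  end

  while rest.match?(/\A[.#]/)
    if rest.start_with?(".")
      m = rest.match(/\A\.(-?[a-zA-Z_][\w-]*)/)
      raise "#{@path}: invalid class name: #{stripped}" unless m
      classes << m[1]
      rest = rest[m[0].length..]
    elsif rest.start_with?("#")
      m = rest.match(/\A#([a-zA-Z_][\w-]*)/)
      raise "#{@path}: invalid id: #{stripped}" unless m
      id = m[1]
      rest = rest[m[0].length..]
    end
  end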

  # Reject inline content — inner content must be on a new line
  rest = rest.strip
  if rest != ""
    raise "#{@path}: inline content on tags is not allowed: #{stripped}"
  end

  children = parse(lines, indent + 2, child_from, child_to)
  { type: :tag, tag: tag_name, classes: classes, id: id,
    children: children }
end

Inline content on tags is banned. %h1 Title must be written as:

%h1
  Title

This simplifies parsing (every tag's content is children) and makes the structure explicit.

Expression evaluator

Expressions in = field, - if expr, and #{} interpolation go through Haml::Expr, a recursive-descent parser with a constrained grammar:

expr         ::= or_expr
or_expr      ::= and_expr ('||' and_expr)*
and_expr     ::= not_expr ('&&' not_expr)*
not_expr     ::= '!' not_expr | cmp_expr
cmp_expr     ::= primary (('==' | '!=') primary)?
primary      ::= STRING | NUMBER | BOOL | NIL | field_access
field_access ::= IDENT ('.' IDENT)*

Three stages — tokenize, parse to AST, evaluate:

def self.eval_string(src, ctx)
  tokens = tokenize(src.strip)
  parser = Parser.new(tokens)
  node = parser.parse_expr
  evaluate(node, ctx)
end
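
The tokenizer and parser are not shown here, so the following is only a sketch of how the grammar could map onto a recursive-descent parser. MiniExpr and everything inside it is hypothetical; it handles integers and double-quoted strings only, and emits the same node shapes the evaluator expects.

```ruby
require "strscan"

# Hypothetical mini-implementation of the grammar; not the real Haml::Expr.
module MiniExpr
  def self.tokenize(src)
    ss = StringScanner.new(src)
    tokens = []
    until ss.eos?
      next if ss.skip(/\s+/)
      if (t = ss.scan(/\|\||&&|==|!=|!|\./)) then tokens << [:op, t]
      elsif (t = ss.scan(/"[^"]*"/)) then tokens << [:string, t[1..-2]]
      elsif (t = ss.scan(/\d+/)) then tokens << [:number, t.to_i]
      elsif (t = ss.scan(/[a-zA-Z_]\w*/)) then tokens << [:ident, t]
      else raise ArgumentError, "unexpected character at #{ss.pos}"
      end
    end
    tokens
  end

  class Parser
    def initialize(tokens)
      @tokens = tokens
      @pos = 0
    end

    def parse_expr
      node = parse_or
      raise ArgumentError, "trailing token #{@tokens[@pos].inspect}" if @pos < @tokens.length
      node
    end

    private

    # Consume the operator if it is next; report whether it was.
    def accept(op)
      return false unless @tokens[@pos] == [:op, op]
      @pos += 1
      true
    end

    def parse_or
      left = parse_and
      left = { type: :or, left: left, right: parse_and } while accept("||")
      left
    end

    def parse_and
      left = parse_not
      left = { type: :and, left: left, right: parse_not } while accept("&&")
      left
    end

    def parse_not
      return { type: :not, operand: parse_not } if accept("!")
      parse_cmp
    end

    def parse_cmp
      left = parse_primary
      ["==", "!="].each do |op|
        return { type: :cmp, op: op, left: left, right: parse_primary } if accept(op)
      end
      left
    end

    def parse_primary
      kind, value = @tokens[@pos]
      @pos += 1
      case kind
      when :string then { type: :string, value: value }
      when :number then { type: :number, value: value }
      when :ident
        return { type: :bool, value: value == "true" } if %w[true false].include?(value)
        return { type: :nil } if value == "nil"
        parts = [value]
        while accept(".")
          kind, value = @tokens[@pos]
          raise ArgumentError, "expected identifier after '.'" unless kind == :ident
          parts << value
          @pos += 1
        end
        { type: :field, parts: parts }
      else
        raise ArgumentError, "unexpected token #{[kind, value].inspect}"
      end
    end
  end
end

ast = MiniExpr::Parser.new(MiniExpr.tokenize('!user.admin && status == "open"')).parse_expr
ast[:type] # :and
```

Anything the tokenizer has no rule for, such as ( or a lone =, raises before parsing starts, which is how the grammar stays closed.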

The evaluator walks the AST and resolves values against a context object:

def self.evaluate(node, ctx)
  case node[:type]
  when :string  then node[:value]
  when :number  then node[:value]
  when :bool    then node[:value]
  when :nil     then nil
  when :field   then eval_field(node[:parts], ctx)
  when :cmp
    left = evaluate(node[:left], ctx)
    right = evaluate(node[:right], ctx)
    case node[:op]
    when "==" then left == right
    when "!=" then left != right
    end
  when :and
    evaluate(node[:left], ctx) && evaluate(node[:right], ctx)
  when :or
    evaluate(node[:left], ctx) || evaluate(node[:right], ctx)
  when :not
    !evaluate(node[:operand], ctx)
  end
end

Field access resolves through send:

def self.eval_field(parts, ctx)
  val = ctx.send(parts[0].to_sym)
  i = 1
  while i < parts.length
    val = val.send(parts[i].to_sym)
    i += 1
  end
  val
end

= data.name becomes ctx.send(:data).send(:name), which works with Data.define structs and singleton methods on the context.
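
A quick way to see this in isolation is the standalone copy of the loop below. DemoUser and DemoAddress are hypothetical structs made up for the example, and Data.define needs Ruby 3.2 or newer.

```ruby
# Standalone copy of the field-access loop, for illustration only.
def eval_field(parts, ctx)
  val = ctx.send(parts[0].to_sym)
  i = 1
  while i < parts.length
    val = val.send(parts[i].to_sym)
    i += 1
  end
  val
end

DemoAddress = Data.define(:city)        # hypothetical structs
DemoUser = Data.define(:name, :address)

user = DemoUser.new(name: "Ada", address: DemoAddress.new(city: "London"))
ctx = Object.new
ctx.define_singleton_method(:data) { user }

eval_field(%w[data name], ctx)         # "Ada"
eval_field(%w[data address city], ctx) # "London"
```

One thing to keep in mind: send also dispatches to private methods. Ruby's public_send is the stricter standard alternative and raises NoMethodError for them, which may be worth considering if template paths are not fully trusted.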

Beyond the core grammar shown above, the evaluator also handles hash literals with ** splat (for tag attributes), array literals, interpolated strings, and function calls (for ViewHelper methods on the context).

Rendering

The renderer walks the AST, appending HTML to a buffer:

private def render_nodes(nodes, buf, ctx, partial_renderer)
  i = 0
  while i < nodes.length
    node = nodes[i]
    case node[:type]
    when :doctype    then buf << "<!DOCTYPE html>\n"
    when :comment    then nil
    when :text       then buf << Expr.interpolate(node[:text], ctx) << "\n"
    when :output
      buf << escape_val(Expr.eval_string(node[:expr], ctx)) << "\n"
    when :raw_output
      buf << Expr.eval_string(node[:expr], ctx).to_s << "\n"
    when :render
      buf << render_partial_call(node[:expr], ctx, partial_renderer)
    when :tag
      render_tag(node, buf, ctx, partial_renderer)
    when :filter
      render_filter(node, buf, ctx)
    when :if
      chain = [node]
      while i + 1 < nodes.length &&
          (nodes[i + 1][:type] == :elsif || nodes[i + 1][:type] == :else)
        i += 1
        chain << nodes[i]
      end
      render_conditional(chain, buf, ctx, partial_renderer)
    when :each
      render_each(node, buf, ctx, partial_renderer)
    end
    i += 1
  end
end

= expr always HTML-escapes. != expr outputs raw. Handlers pre-escape anything that needs !=.
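
The difference is easy to see on a hostile value. The escape_val below is a hypothetical stand-in, since the real helper is not shown; it assumes escaping goes through CGI.escapeHTML (as the tag attributes do) and that nil renders as an empty string.

```ruby
require "cgi"

# Hypothetical stand-in for the renderer's escape_val.
# Assumption: nil renders as an empty string.
def escape_val(value)
  CGI.escapeHTML(value.to_s)
end

name = %(<script>alert("x")</script>)

escaped = escape_val(name) # what `= name` would emit
raw = name.to_s            # what `!= name` would emit

escaped # "&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;"
```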

Tags emit opening and closing HTML with escaped attributes:

private def render_tag(node, buf, ctx, partial_renderer)
  tag = node[:tag]
  attrs = build_attrs(node, ctx)
  attr_str = attrs.map { |k, v|
    if v == true
      " #{k}"
    else
      " #{k}=\"#{CGI.escapeHTML(v.to_s)}\""
    end
  }.join

  if VOID_ELEMENTS.include?(tag)
    buf << "<#{tag}#{attr_str}>\n"
    return
  end

  if node[:children] != []
    buf << "<#{tag}#{attr_str}>\n"
    render_nodes(node[:children], buf, ctx, partial_renderer)
    buf << "</#{tag}>\n"
  else
    buf << "<#{tag}#{attr_str}></#{tag}>\n"
  end
end

Context

Templates receive data through a context object. Locals become singleton methods:

private def make_context(locals, context: nil)
  env = context || Object.new
  locals.each do |k, v|
    env.define_singleton_method(k) { v }
  end
  env
end

For loop variables, the renderer clones the context so the parent binding is not mutated:

private def clone_context(ctx, name, value)
  child = ctx.clone
  child.define_singleton_method(name.to_sym) { value }
  child
end

Without cloning, - items.each do |data| would overwrite the parent data local for the rest of the template.
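
The shadowing can be checked with a standalone copy of the method. This works because Object#clone copies singleton methods, whereas dup does not.

```ruby
# Standalone copy of clone_context, for illustration only.
def clone_context(ctx, name, value)
  child = ctx.clone # clone keeps singleton methods; dup would drop them
  child.define_singleton_method(name.to_sym) { value }
  child
end

parent = Object.new
parent.define_singleton_method(:data) { "page-level" }
parent.define_singleton_method(:title) { "Orders" }

child = clone_context(parent, "data", "row-level")

child.data  # "row-level"  (shadowed inside the loop body)
child.title # "Orders"     (carried over from the parent)
parent.data # "page-level" (unchanged after the loop)
```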

The renderer IS the linter

Before the custom renderer, a regex-based linter scanned templates for banned constructs. It was incomplete: regexes can't parse nested expressions, and every new violation pattern needed a new rule.

The custom renderer replaced the linter. Templates are parsed at boot by cache_all. A template with a construct outside the subset crashes the process before it serves a request.
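
cache_all itself is not shown, so the following is only a guess at its shape. StubSubset is a hypothetical stand-in for Haml::Subset that rejects a few obviously out-of-subset lines, and the glob pattern and return value are assumptions.

```ruby
# Hypothetical stand-in for Haml::Subset: rejects "- " lines that are not
# if/elsif/else/each, which is enough to demonstrate the boot-time failure.
class StubSubset
  def initialize(source, path:)
    source.each_line.with_index(1) do |line, n|
      s = line.strip
      next unless s.start_with?("- ")
      next if s.match?(/\A- (if |elsif |else\z|\S+\.each do \|\w+\|\z)/)
      raise ArgumentError, "#{path}:#{n}: not in subset: #{s}"
    end
  end
end

# Parse every template under root once. The first failure propagates,
# so a bad template stops the process before any request is served.
def cache_all(root, parser: StubSubset)
  Dir.glob(File.join(root, "**", "*.haml")).sort.to_h do |path|
    [path, parser.new(File.read(path), path: path)]
  end
end
```

Called once during startup, this doubles as the lint step: there is no separate rule list to keep in sync with the parser.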

If it parses, it's in the subset. If it's not in the subset, it doesn't parse.
