Monorepo

Imagine we are software engineers at a devtool startup. Our team will write and maintain multiple programs.

We work incrementally, use technologies suited to their tasks and our preferences, and write custom tooling when needed, such as testbot, a Continuous Integration tool for private GitHub monorepos.

New repo

We initialize a monorepo on GitHub: org/repo:

.
└── README.md

We write a README.md:

Add to your shell profile:

# Set environment variable to monorepo path

export ORG="$HOME/org"

# Prepend monorepo scripts

export PATH="$ORG/bin:$PATH"

Clone:

git clone https://github.com/org/repo.git $ORG
cd $ORG

The $ORG environment variable will be used throughout the codebase.

Design

We design an architecture like this:

.
├── ...
├── cmd
│   └── serverd
│       └── main.go
├── dashboard
├── sdk
│   ├── go
│   ├── node
│   └── ruby
└── server
    └── http_handler.go

Commit messages

We use commit message conventions with a subject line prefix:

$ git log
dashboard: redirect to new project form after sign up
sdk/node: enable Keep-Alive in HTTP agent
server: document access to prod db

The prefix refers to the module that is changing. It often matches a filesystem directory. Sometimes the change will touch files across modules, but the prefix should be "where the action is."

We try to keep the subject line under 50 characters and err on the side of brevity for module names.

Dependencies

To aid local development and help onboard teammates, we provide a setup.sh script for each module. In turn, each setup.sh script might invoke more general $ORG/bin/setup-* scripts.

Since everyone has $ORG/bin on their $PATH, scripts in bin/ are available in everyone's shells without a directory prefix.

.
├── ...
├── bin
│   ├── setup-go
│   ├── setup-node
│   ├── setup-postgres
│   └── setup-ruby
├── dashboard
│   └── setup.sh
├── sdk
│  ├── go
│  │   └── setup.sh
│  ├── node
│  │   └── setup.sh
│  └── ruby
│      └── setup.sh
└── server
    └── setup.sh

Formatting

Our Go code is already formatted with gofmt but we could be more consistent in our TypeScript and Ruby code. We add config files for Prettier and Rubocop at the top of the file hierarchy:

.
├── ...
├── .rubocop.yml
└── .prettierrc.yml

CI

To move quickly without breaking things, we write automated tests and run those tests continuously when we integrate our changes.

We want our Continuous Integration (CI) service to:

Common causes of slow-starting tests on hosted CI services are multi-tenant queues and containerization: noisy neighbors backlog the queues and containers have cache misses.

To meet our design goals, we write our own CI service, testbot:

.
├── ...
├── cmd
│   └── testbot
│       └── main.go
├── dashboard
│   └── Testfile
├── sdk
│   ├── go
│   │   └── Testfile
│   ├── node
│   │   └── Testfile
│   └── ruby
│       └── Testfile
├── server
│   └── Testfile
└── testbot
    ├── Testfile
    ├── farmer
    │   ├── Procfile
    │   └── main.go
    └── worker
        └── main.go

We open a GitHub pull request to add a new feature to the SDKs:

sdk/go/account.go             | 10 +++++-----
sdk/go/account_test.go        | 10 +++++-----
sdk/node/src/account.ts       | 10 +++++-----
sdk/node/test/account.ts      | 10 +++++-----
sdk/ruby/lib/account.rb       | 10 +++++-----
sdk/ruby/spec/account_spec.rb | 10 +++++-----
6 files changed, 60 insertions(+), 60 deletions(-)

A testbot farmer process on a server responds to the GitHub webhook by:

Each Testfile defines test jobs for its directory. Ours might be:

$ cat $ORG/sdk/go/Testfile
tests: cd $ORG/sdk && go test -cover ./...
$ cat $ORG/sdk/node/Testfile
tests: cd $ORG/sdk/node && npm install && npm test
$ cat $ORG/sdk/ruby/Testfile
tests: $ORG/sdk/ruby/test.sh

A single Testfile can define multiple test jobs. As test scripts become lengthier, it is convenient to extract them to their own script.

Each line contains a name and command separated by a colon. The name appears in GitHub checks. The command is run by a testbot worker, which is continuously long polling testbot farmer via HTTP over TLS, asking for test jobs.

Each testbot worker runs on Heroku with Go, Node, and Ruby buildpacks. We run more instances to increase test parallelism.

In this example, the tests for the Go, Node, and Ruby SDKs will begin to run almost simultaneously as different testbot worker processes pick them up.

Tests for dashboard and server will not run in this pull request because no files in their directories were changed.

Testing across services

To make it convenient to test across service boundaries, we write a with-serverd script that:

We place this script in $ORG/bin to make it available on our $PATH.

This script can be used in Testfiles:

$ cat $ORG/dashboard/Testfile
tests: cd $ORG/dashboard && with-serverd ./test.sh
$ cat $ORG/sdk/go/Testfile
tests: cd $ORG/sdk && with-serverd go test -cover ./...

Tests that depend on with-serverd can avoid mocking on HTTP boundaries, making actual requests to the backing service and generating observable logs.

To broadly cover the product's surface area, we move the Go SDK's Testfile to the top of the file hierarchy so its test jobs will run on all pull requests:

.
├── ...
├── Testfile
├── bin
│   └── with-serverd
└── sdk
    └── go

Open source

Our SDKs should be open source on GitHub and available on registries such as NPM and Rubygems but we like the monorepo as the canonical place for our development workflow.

So, we mirror the SDK code to open source repos. This provides community access. Although the community patch workflow is a bit manual, these are SDKs tightly coupled to our product and we expect community patches to be rare.

To meet our design goals, we write a mirrorbot program:

.
├── ...
├── .mirrorbot.yml
└── cmd
    └── mirrorbot
        ├── Procfile
        ├── Testfile
        └── main.go

mirrorbot runs as a command-line program that copies relevant changes from the $ORG monorepo to one or more target repos. It can be run on a laptop or a server.

On startup, it clones the monorepo and each target repo. Each copied commit includes an upstream:[SHA] line (example). mirrorbot reads that SHA from the latest commit on the destination branch of each target repo to get its cursor.

Mirror then fast-forwards the local monorepo, looking for new commits.

For each target repo, a patch file is produced for each new commit. The patch file is then filtered to remove anything not applicable to that repo. If anything remains, the patch is applied and committed. Any new commits are then pushed to GitHub.

A .mirrorbot.yml file configures the program. It maps monorepo branches to target repo branches and monorepo directories and files to target repo directories and files. For example:

github.com/org/org-sdk-node:
  - branch:
      - main: main
  - mirror:
      - sdk/node: /
github.com/org/org-sdk-ruby:
  - branch:
      - main: main
  - mirror:
      - sdk/ruby: /

Backwards compatibility

As our customers use the software, we better understand their needs and begin to design v2.

To do this well, we add new interfaces and deprecate old interfaces in the v1.x series of the SDKs. The server continues to support the v1.x series now and for a well-communicated time period past the release of v2.0 (such as 90 days).

As we release minor versions v1.1 and v1.2, patch versions v1.2.1 and v1.2.2, and eventually v2, we want to carefully ensure backwards compatibility.

To meet our design goals, we write with-go-sdk, with-node-sdk, and with-ruby-sdk scripts:

.
├── ...
└── bin
    ├── with-go-sdk
    ├── with-node-sdk
    └── with-ruby-sdk

We use these scripts to run processes using a given version of our SDKs:

with-go-sdk [version] [command]
with-node-sdk [version] [command]
with-ruby-sdk [version] [command]

The version can be a released version to a registry such as NPM or Rubygems or a special value monorepo to use the current $ORG source code.

For example:

with-ruby-sdk 1.0 ruby -e "require 'org-sdk'; puts OrgSDK::VERSION"

We update the Testfile at the top of the file hierarchy to test supported releases and the current source code (pre-release) on every pull request:

$ cat $ORG/Testfile
gohead: with-serverd with-go-sdk monorepo go test -cover ./...
gov1: with-serverd with-go-sdk 1.5 go test -cover ./...
gov2: with-serverd with-go-sdk 2 go test -cover ./...
# ... etc.
rubyv2: with-serverd with-ruby-sdk 2 $ORG/sdk/ruby/test.sh

The SDK scripts are composable with the with-serverd script.

In addition to running on CI, they are useful for quickly testing bug reports from customers on a particular version, and hopefully providing a great customer experience for them.

Deploy

We'll deploy mirrorbot, serverd, and testbot to Heroku, which expects a Procfile manifest for each program:

.
├── ...
├── cmd
│   ├── mirrorbot
│   │   ├── Procfile
│   │   └── main.go
│   ├── serverd
│   │   ├── Procfile
│   │   └── main.go
│   └── testbot
│       ├── Procfile
│       └── main.go

Heroku uses a stack of "buildpacks" to set up its build system. We use a "monorepo" buildpack first in the stack, which uses an APP_BASE=subdir config variable to tell Heroku where to find the Procfile program.

heroku buildpacks:add -a <app> https://github.com/lstoll/heroku-buildpack-monorepo
heroku config:set APP_BASE=cmd/serverd -a <app>

We use the Go buildpack second in the stack, which will install our Go program as a binary in bin/program.

heroku buildpacks:add -a <app> heroku/go

Our Procfile contains, for example:

web: bin/mirrorbot

Conclusion

Working in a monorepo can encourage a feeling of "tight integration", where service boundaries are well-defined and less likely to be mocked out. Some tasks are a particular pleasure, such as searching across projects for callsites of a function or RPC.

For other tasks, it can help to write custom tools. Writing those custom tools offers an opportunity to design an ideal experience for the engineering team.

The final directory structure looks like this:

.
├── .mirrorbot.yml
├── .prettierrc.yml
├── .rubocop.yml
├── .tool-versions
├── README.md
├── Testfile
├── bin
│   ├── setup-go
│   ├── setup-node
│   ├── setup-postgres
│   ├── setup-ruby
│   ├── with-go-sdk
│   ├── with-node-sdk
│   ├── with-ruby-sdk
│   └── with-serverd
├── cmd
│   ├── mirrorbot
│   │   ├── Procfile
│   │   ├── Testfile
│   │   ├── main.go
│   │   └── setup.sh
│   ├── serverd
│   │   ├── Procfile
│   │   └── main.go
│   └── testbot
│       ├── Procfile
│       └── main.go
├── dashboard
│   ├── Testfile
│   └── setup.sh
├── sdk
│   ├── go
│   │   └── setup.sh
│   ├── node
│   │   ├── Testfile
│   │   └── setup.sh
│   └── ruby
│       ├── Testfile
│       └── setup.sh
├── server
│   ├── Testfile
│   ├── http_handler.go
│   └── setup.sh
└── testbot
    ├── Testfile
    ├── farmer
    │   └── main.go
    ├── setup.sh
    └── worker
        └── main.go