Postgres tips

Here are a few quick Postgres tips.

db-download-prod, db-restore-dev

I frequently download my production database and restore it on my laptop using the following scripts, which I keep in my project's Git repo.

They depend on standard Unix tools, the Postgres client tools (pg_dump, pg_restore, psql, dropdb, createdb), and the Crunchy Bridge CLI (cb).

The db-download-prod script dumps the production database, in Postgres' directory format, to tmp/latest_backup_dir:

#!/bin/bash
set -euo pipefail

# Delete/create target directory
backup_dir="tmp/latest_backup_dir"
rm -rf "$backup_dir"
mkdir -p "$backup_dir"

# Detect the number of CPU cores
case "$(uname -s)" in
    Linux*)     cores=$(nproc);;
    Darwin*)    cores=$(sysctl -n hw.ncpu);;
    *)          cores=1;;
esac

# Use one less than the total number of cores, but ensure at least 1 is used.
# Note: a bare `(( jobs = cores - 1 ))` would exit under `set -e` when the
# result is 0, so use a plain arithmetic assignment instead.
jobs=$(( cores - 1 ))
if (( jobs < 1 )); then
    jobs=1
fi

echo "Downloading with $jobs parallel job(s)"

# Use the directory format and specify the number of jobs for parallel dumping
pg_dump -Fd "$(cb uri app-prod --role application)" -j "$jobs" -f "$backup_dir"
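
One way to sanity-check a download (the scripts live in bin/ here, matching the comment in db-restore-dev) is to list the archive's table of contents with pg_restore:

bin/db-download-prod
pg_restore --list tmp/latest_backup_dir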

The db-restore-dev script restores from the tmp/latest_backup_dir directory and does custom post-processing as needed for the project:

#!/bin/bash
set -euo pipefail

db="app_dev"

dropdb --if-exists "$db"
createdb "$db"
psql "$db" <<SQL
  CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
  CREATE EXTENSION IF NOT EXISTS pg_trgm;
  CREATE EXTENSION IF NOT EXISTS plpgsql;
SQL

# Same directory defined in `bin/db-download-prod`
backup_dir="tmp/latest_backup_dir"

# Detect the number of CPU cores
case "$(uname -s)" in
    Linux*)     cores=$(nproc);;
    Darwin*)    cores=$(sysctl -n hw.ncpu);;
    *)          cores=1;;
esac

# Use one less than the total number of cores, but ensure at least 1 is used.
# Note: a bare `(( jobs = cores - 1 ))` would exit under `set -e` when the
# result is 0, so use a plain arithmetic assignment instead.
jobs=$(( cores - 1 ))
if (( jobs < 1 )); then
    jobs=1
fi

echo "Restoring with $jobs parallel job(s)"

# Restore from directory
pg_restore -d "$db" --verbose --no-acl --no-owner -j "$jobs" "$backup_dir"

# Post-process
psql "$db" <<SQL
  UPDATE ar_internal_metadata
  SET value = 'development'
  WHERE key = 'environment';

  -- Avoid re-running incomplete jobs
  DELETE FROM jobs
  WHERE status IN ('pending', 'started');

  -- Avoid emailing production users
  UPDATE users
  SET active = false;

  -- Turn on flags for developers
  UPDATE
    users
  SET
    active = true,
    admin = true
  WHERE
    email IN (
      '[email protected]',
      '[email protected]'
    );
SQL
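
With both scripts in place, a full refresh of the development database is two commands (the psql line is just an illustrative sanity check):

bin/db-download-prod
bin/db-restore-dev
psql app_dev -c 'SELECT count(*) FROM users;'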

concat_ws

Postgres provides a concat_ws() string function with this signature:

concat_ws(sep text, str "any" [, str "any" [, ...] ])
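
The first argument is the separator; NULL arguments after it are ignored. A minimal example:

SELECT concat_ws(', ', 'a', NULL, 'b');
-- returns: a, b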

Consider a schema where the projects.second_user_id and projects.third_user_id columns can be NULL:

              Table "public.projects"
     Column     |            Type        | Nullable |
----------------+------------------------+-----------
 id             | bigint                 | not null |
 name           | character varying(255) | not null |
 lead_user_id   | bigint                 | not null |
 second_user_id | bigint                 |          |
 third_user_id  | bigint                 |          |

                Table "public.users"
     Column     |            Type        | Nullable |
----------------+------------------------+-----------
 id             | bigint                 | not null |
 initials       | character varying(255) | not null |

A query to get each project's team, shown as the users' initials:

SELECT
  projects.name,
  concat_ws(' / ', u1.initials, u2.initials, u3.initials) AS team
FROM
  projects
  LEFT JOIN users u1 ON u1.id = projects.lead_user_id
  LEFT JOIN users u2 ON u2.id = projects.second_user_id
  LEFT JOIN users u3 ON u3.id = projects.third_user_id
ORDER BY
  projects.name ASC;

Example output:

     name     |     team
--------------+--------------
 Private Beta | AB / CD
 SLA          | EF / GH / IJ
 Treasury Ops | KL
(3 rows)

Time: 1 ms

Using concat_ws() instead of concat() prevents stray separators like AB / / CD: both functions ignore NULL arguments, but with concat() the hand-written separators around a NULL remain, while concat_ws() only adds a separator between the values it keeps.
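
The difference with literal values:

SELECT concat('AB', ' / ', NULL, ' / ', 'CD');  -- AB /  / CD
SELECT concat_ws(' / ', 'AB', NULL, 'CD');      -- AB / CD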

Create indexes concurrently

By default, Postgres' CREATE INDEX locks writes (but not reads) to a table. That can be unacceptable during a production deploy. On a large table, indexing can take hours.

Postgres has a CONCURRENTLY option for CREATE INDEX that creates the index without preventing concurrent INSERTs, UPDATEs, or DELETEs on the table.

One caveat is that concurrent indexes must be created outside a transaction.
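
In raw SQL, that means running the statement on its own, outside any BEGIN/COMMIT block (index_asks_on_active is the default name Rails would generate for the migration below):

CREATE INDEX CONCURRENTLY index_asks_on_active ON asks (active);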

If you want to do this in ActiveRecord:

class AddIndexToAsksActive < ActiveRecord::Migration[7.1]
  disable_ddl_transaction!

  def change
    add_index :asks, :active, algorithm: :concurrently
  end
end

The disable_ddl_transaction! method applies only to that migration file. Adjacent migrations still run in their own transactions and roll back automatically if they fail. Therefore, it's a good idea to isolate concurrent index migrations to their own migration files.