Cache API Calls
To ingest data from Foursquare's Places API, the terms of use include:
- data cannot be cached for longer than 24 hours
- calls are subject to an hourly rate limit and a daily quota, whichever is reached first
A database table caches the API responses:
-- createdb venues
-- psql -d venues -f schema.sql
CREATE TABLE cache_foursquare (
  req_url text NOT NULL,
  resp_body jsonb NOT NULL,
  fetched_at timestamp NOT NULL,
  UNIQUE (req_url)
);
-- the UNIQUE constraint already backs req_url lookups with an index,
-- so a separate CREATE INDEX is unnecessary
A Ruby client makes API calls:
Foursquare.explore("tacos", near: "San Francisco, CA")
The first time this code runs, an HTTP request is made, and the request URL (hashed), response body, and timestamp are saved to Postgres.
When it runs again within Foursquare's 24-hour cache window, the data is retrieved from Postgres and no HTTP request is made.
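The freshness test lives in the SQL `WHERE` clause, but the same predicate can be sketched in plain Ruby to show what "within the cache window" means (`fresh?` is a hypothetical helper, not part of the client):

```ruby
# hypothetical helper mirroring the SQL predicate:
#   fetched_at > now() - '24 hours'::interval
def fresh?(fetched_at, now: Time.now)
  # Time subtraction yields seconds; compare against 24 hours
  now - fetched_at < 24 * 60 * 60
end

fresh?(Time.now - 3600)          # a row fetched an hour ago is fresh
fresh?(Time.now - 25 * 60 * 60)  # a row fetched 25 hours ago is stale
```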
The HTTP request URL is hashed as an extra security measure to obfuscate sensitive data (client ID and secret) in the query params.
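Hashing turns the full URL, credentials included, into a fixed-length hex digest before it touches the database. A small sketch (the URL here is a made-up example, with `SECRET` standing in for real credentials):

```ruby
require "digest"

# hypothetical request URL; the real one carries client_id and client_secret
url = "https://api.foursquare.com/v2/venues/explore?client_id=SECRET&query=tacos"

# the cache key is a 32-character hex digest; the raw URL never reaches the table
key = Digest::MD5.hexdigest(url)
puts key
```

The same URL always produces the same digest, which is what makes it usable as the `UNIQUE` cache key.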
require "bundler/inline"
require "digest"
require "json"
gemfile do
  source "https://rubygems.org"
  gem "dotenv"
  gem "http"
  gem "pg"
end
Dotenv.load
DB = PG.connect("postgres://postgres:postgres@localhost:5432/venues")
class Foursquare
  def self.explore(query, near:)
    http = HTTP::Client.new
    req = http.build_request(
      :get,
      "https://api.foursquare.com/v2/venues/explore",
      params: {
        client_id: ENV.fetch("FSQ_ID"),
        client_secret: ENV.fetch("FSQ_SECRET"),
        v: "20180323",
        query: query,
        near: near
      }
    )
    # hash the full URL so the client ID and secret never land in the table
    req_url = Digest::MD5.hexdigest(req.uri.to_s)

    # look up the cache
    cache = DB.exec_params(<<~SQL, [req_url]).first
      SELECT
        resp_body
      FROM
        cache_foursquare
      WHERE
        req_url = $1
        AND fetched_at > now() - '24 hours'::interval
    SQL
    if cache
      # return the cached response while it is still fresh
      return [200, JSON.parse(cache["resp_body"])]
    end

    # GET req.uri
    resp = http.perform(req, HTTP::Options.new({}))
    # read the body once; it streams and is memoized by to_s
    body = resp.body.to_s
    if resp.code != 200
      return [resp.code, JSON.parse(body)]
    end

    # add to the cache, or overwrite a stale entry
    DB.exec_params(<<~SQL, [req_url, body])
      INSERT INTO cache_foursquare (fetched_at, req_url, resp_body)
      VALUES (now(), $1, $2)
      ON CONFLICT (req_url)
      DO UPDATE SET
        fetched_at = EXCLUDED.fetched_at, resp_body = EXCLUDED.resp_body
    SQL

    # return the fresh data
    [200, JSON.parse(body)]
  end
end
if $0 == __FILE__
  Foursquare.explore("tacos", near: "San Francisco, CA")
end
Old data can be deleted via a clock process or pg_cron:
DELETE FROM cache_foursquare
WHERE fetched_at < now() - '24 hours'::interval;
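With pg_cron, that purge can be scheduled inside Postgres itself; a sketch, assuming the extension is installed (the job name and hourly schedule here are arbitrary choices):

```sql
-- run the purge at the top of every hour
SELECT cron.schedule('purge-cache-foursquare', '0 * * * *', $$
  DELETE FROM cache_foursquare
  WHERE fetched_at < now() - '24 hours'::interval
$$);
```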