Distributed caching dramatically improves Jekyll site performance by serving content from edge locations worldwide. By combining Ruby's processing power with Cloudflare Workers' edge execution, you can build a caching system that intelligently manages content distribution, invalidation, and synchronization. This guide explores distributed caching architectures that use Ruby for cache management logic and Cloudflare Workers for edge delivery, creating a performant global caching layer for static sites.
A distributed caching architecture for Jekyll involves multiple cache layers and synchronization mechanisms to ensure fast, consistent content delivery worldwide. The system must handle cache population, invalidation, and consistency across edge locations.
The architecture employs a hierarchical cache structure: an origin cache (Ruby-managed), an edge cache (Cloudflare Workers), and a client cache (the browser). Cache keys are derived from content hashes so that changed content can be invalidated precisely. The system uses event-driven synchronization to propagate cache updates across regions while maintaining eventual consistency: Ruby components manage the cache logic while Cloudflare Workers handle low-latency edge delivery.
# Distributed Cache Architecture:
# 1. Origin Layer (Ruby):
# - Content generation and processing
# - Cache key generation and management
# - Invalidation triggers and queue
#
# 2. Edge Layer (Cloudflare Workers):
# - Global cache storage (KV + R2)
# - Request routing and cache serving
# - Stale-while-revalidate patterns
#
# 3. Synchronization Layer:
# - WebSocket connections for real-time updates
# - Cache replication across regions
# - Conflict resolution mechanisms
#
# 4. Monitoring Layer:
# - Cache hit/miss analytics
# - Performance metrics collection
# - Automated optimization suggestions
# Cache Key Structure:
# - Content: content_{md5_hash}
# - Page: page_{path}_{locale}_{hash}
# - Fragment: fragment_{type}_{id}_{hash}
# - Asset: asset_{path}_{version}
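To make this key structure concrete, a small Ruby helper can derive each key type. This is a minimal sketch; the module name and digest choices are illustrative, not prescribed by the architecture.
# lib/distributed_cache/cache_key.rb (illustrative sketch)
require 'digest'

module DistributedCache
  module CacheKey
    def self.for_content(content)
      "content_#{Digest::MD5.hexdigest(content)}"
    end

    def self.for_page(path, locale, content)
      "page_#{path}_#{locale}_#{Digest::MD5.hexdigest(content)[0, 12]}"
    end

    def self.for_fragment(type, id, content)
      "fragment_#{type}_#{id}_#{Digest::MD5.hexdigest(content)[0, 12]}"
    end

    def self.for_asset(path, version)
      "asset_#{path}_#{version}"
    end
  end
end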
The Ruby cache manager orchestrates cache operations, implements sophisticated invalidation strategies, and maintains cache consistency. It integrates with Jekyll's build process to optimize cache population.
# lib/distributed_cache/manager.rb
# Store backends (MemoryStore, DiskStore, RedisStore), MetricsCollector and
# helpers such as fetch_from_cache, expiration_time, find_keys_by_tag,
# key_to_url, process_invalidation and the warm_* methods are assumed to be
# defined elsewhere in the library.
require 'digest'
require 'json'

module DistributedCache
  class Manager
    def initialize(config)
      @config = config
      @stores = {}
      @invalidation_queue = InvalidationQueue.new
      @metrics = MetricsCollector.new
    end

    def store(key, value, options = {})
      # Determine storage tier based on options
      store = select_store(options[:tier])

      # Generate cache metadata
      metadata = {
        stored_at: Time.now.utc,
        expires_at: expiration_time(options[:ttl]),
        version: options[:version] || 'v1',
        tags: options[:tags] || []
      }

      # Store with metadata
      store.write(key, value, metadata)

      # Track in metrics
      @metrics.record_store(key, value.bytesize)

      value
    end

    def fetch(key, options = {}, &generator)
      # Try to fetch from cache
      cached = fetch_from_cache(key, options)
      if cached
        @metrics.record_hit(key)
        return cached
      end

      # Cache miss - generate and store
      @metrics.record_miss(key)
      value = generator.call

      # Store asynchronously so the caller is not blocked
      Thread.new { store(key, value, options) }

      value
    end

    def invalidate(tags: nil, keys: nil, pattern: nil)
      if tags
        invalidate_by_tags(tags)
      elsif keys
        invalidate_by_keys(keys)
      elsif pattern
        invalidate_by_pattern(pattern)
      end
    end

    def warm_cache(site_content)
      # Pre-warm cache with site content
      warm_pages_cache(site_content.pages)
      warm_assets_cache(site_content.assets)
      warm_data_cache(site_content.data)
    end

    private

    def select_store(tier)
      tier ||= :memory # default to the in-memory tier

      @stores[tier] ||= case tier
                        when :memory then MemoryStore.new(@config.memory_limit)
                        when :disk   then DiskStore.new(@config.disk_path)
                        when :redis  then RedisStore.new(@config.redis_url)
                        else              MemoryStore.new(@config.memory_limit)
                        end
    end

    def invalidate_by_tags(tags)
      tags.each do |tag|
        # Find all keys with this tag
        keys = find_keys_by_tag(tag)

        # Add to invalidation queue
        @invalidation_queue.add(keys)

        # Propagate to edge caches
        propagate_invalidation(keys) if @config.edge_invalidation
      end
    end

    def propagate_invalidation(keys)
      # Purge the edge cache via the Cloudflare API
      # (Cloudflare::Client is a thin wrapper around the REST API,
      # assumed to be defined elsewhere in the project)
      client = Cloudflare::Client.new(@config.cloudflare_token)
      client.purge_cache(keys.map { |k| key_to_url(k) })
    end
  end

  # Invalidation queue that processes keys in priority order
  class InvalidationQueue
    def initialize
      @queue = []
      @mutex = Mutex.new
      @processing = false
    end

    def add(keys, priority: :normal)
      @mutex.synchronize do
        @queue << {
          keys: Array(keys),
          priority: priority,
          added_at: Time.now.utc
        }

        # Sort by priority, then by insertion time
        @queue.sort_by! { |item| [priority_score(item[:priority]), item[:added_at]] }
      end

      # Start processing if not already running
      process_queue unless @processing
    end

    private

    def priority_score(priority)
      case priority
      when :critical then 0
      when :high     then 1
      when :normal   then 2
      when :low      then 3
      else 2
      end
    end

    def process_queue
      @processing = true
      Thread.new do
        while (item = @mutex.synchronize { @queue.shift })
          process_invalidation(item[:keys])
          sleep(0.1) # Throttle invalidation
        end
        @processing = false
      end
    end
  end

  # Jekyll integration
  class JekyllCacheGenerator < Jekyll::Generator
    def generate(site)
      cache_manager = DistributedCache::Manager.new(site.config['cache'])

      # Generate cache keys for all content.
      # NOTE: page.output is only populated after rendering, so in practice
      # this logic belongs in a :site, :post_render hook rather than a generator.
      site.pages.each do |page|
        cache_key = generate_cache_key(page)
        cache_manager.store(cache_key, page.output,
                            ttl: page.data['cache_ttl'] || 3600,
                            tags: page.data['tags'] || [])
      end

      # Warm API data cache
      warm_api_data_cache(site, cache_manager)
    end

    def generate_cache_key(page)
      # Deterministic key from path, content, front matter and site version
      hash_input = [
        page.path,
        page.content,
        page.data.to_json,
        page.site.config['version']
      ].join('|')

      "page_#{Digest::MD5.hexdigest(hash_input)}"
    end
  end
end
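In practice the manager is driven from build scripts or server-side hooks. Here is a minimal usage sketch; the cache_config object and render_page helper are hypothetical placeholders.
# Usage sketch (cache_config and render_page are hypothetical placeholders)
manager = DistributedCache::Manager.new(cache_config)

# fetch runs the block only on a cache miss, then caches the result
html = manager.fetch('page_about', tier: :memory, ttl: 3600, tags: ['static']) do
  render_page('about.md')
end

# Invalidate everything tagged 'blog' after publishing a new post
manager.invalidate(tags: ['blog'])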
Cloudflare Workers provide edge caching with global distribution and millisecond-level response times on cache hits. The Workers implement the caching logic itself, including stale-while-revalidate and cache partitioning by language and device type.
// workers/edge-cache.js
// Global edge cache implementation.
// Policy helpers (shouldBypassCache, isStale, markResponseAsStale,
// markResponseAsCached, isCacheable, calculateTTL, revalidateCache) are
// assumed to be defined alongside this module.
export default {
  async fetch(request, env, ctx) {
    const cacheKey = generateCacheKey(request)

    // Check if we should bypass cache
    if (shouldBypassCache(request)) {
      return fetch(request)
    }

    // Try to get from cache
    let response = await getFromCache(cacheKey, env)

    if (response) {
      // Cache hit - check if stale
      if (isStale(response)) {
        // Serve stale content while revalidating in the background
        ctx.waitUntil(revalidateCache(request, cacheKey, env))
        return markResponseAsStale(response)
      }
      // Fresh cache hit
      return markResponseAsCached(response)
    }

    // Cache miss - fetch from origin
    response = await fetch(request)

    // Cache the response if cacheable (clone before the body is consumed)
    if (isCacheable(response)) {
      ctx.waitUntil(cacheResponse(cacheKey, response.clone(), env))
    }

    return response
  }
}

async function getFromCache(cacheKey, env) {
  // Try KV first; entries whose body lives in R2 only hold metadata here
  const cached = await env.EDGE_CACHE_KV.get(cacheKey, { type: 'json' })

  if (cached && cached.storage !== 'r2') {
    return new Response(cached.content, {
      headers: cached.headers,
      status: cached.status
    })
  }

  // Large assets are stored in R2
  const object = await env.EDGE_CACHE_R2.get(`cache/${cacheKey}`)
  if (object) {
    const headers = JSON.parse(object.customMetadata?.headers || '{}')
    return new Response(object.body, { headers })
  }

  return null
}

async function cacheResponse(cacheKey, response, env) {
  const headers = Object.fromEntries(response.headers.entries())
  const status = response.status

  // Read the (cloned) response body
  const body = await response.text()
  const size = body.length

  const cacheData = {
    content: body,
    headers: headers,
    status: status,
    cachedAt: Date.now(),
    ttl: calculateTTL(response)
  }

  if (size > 1024 * 1024) { // 1MB threshold
    // Store large responses in R2; R2 has no arbitrary header metadata,
    // so the original headers are serialized into customMetadata
    await env.EDGE_CACHE_R2.put(`cache/${cacheKey}`, body, {
      customMetadata: { headers: JSON.stringify(headers) }
    })

    // Store metadata (without the body) in KV
    await env.EDGE_CACHE_KV.put(cacheKey, JSON.stringify({
      ...cacheData,
      content: null,
      storage: 'r2'
    }))
  } else {
    // Store in KV (expirationTtl must be at least 60 seconds)
    await env.EDGE_CACHE_KV.put(cacheKey, JSON.stringify(cacheData), {
      expirationTtl: Math.max(cacheData.ttl, 60)
    })
  }
}

function generateCacheKey(request) {
  const url = new URL(request.url)

  // Create cache key based on request characteristics
  const components = [
    request.method,
    url.hostname,
    url.pathname,
    url.search,
    request.headers.get('accept-language') || 'en',
    request.headers.get('cf-device-type') || 'desktop'
  ]

  // Hash the joined components
  return hashString(components.join('|'))
}

function hashString(str) {
  // Simple non-cryptographic string hash
  let hash = 0
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i)
    hash = ((hash << 5) - hash) + char
    hash = hash & hash // Convert to 32bit integer
  }
  return Math.abs(hash).toString(36)
}

// Cache invalidation worker (deployed as a Durable Object)
export class CacheInvalidationWorker {
  constructor(state, env) {
    this.state = state
    this.env = env
  }

  async fetch(request) {
    const url = new URL(request.url)

    if (url.pathname === '/invalidate' && request.method === 'POST') {
      return this.handleInvalidation(request)
    }

    return new Response('Not found', { status: 404 })
  }

  async handleInvalidation(request) {
    const { keys, tags, pattern } = await request.json()

    let keysToInvalidate = []
    if (keys) {
      keysToInvalidate = keys
    } else if (tags) {
      keysToInvalidate = await this.findKeysByTags(tags)
    } else if (pattern) {
      keysToInvalidate = await this.findKeysByPattern(pattern)
    }

    // Invalidate each key
    await Promise.all(
      keysToInvalidate.map(key => this.invalidateKey(key))
    )

    // Propagate to other edge locations
    await this.propagateInvalidation(keysToInvalidate)

    return new Response(JSON.stringify({
      invalidated: keysToInvalidate.length
    }))
  }

  async invalidateKey(key) {
    // Delete from KV
    await this.env.EDGE_CACHE_KV.delete(key)

    // Delete from R2 (a no-op if the key was never stored there)
    await this.env.EDGE_CACHE_R2.delete(`cache/${key}`)
  }
}
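On the origin side, Ruby can drive this worker by POSTing to its /invalidate route, for example right after a rebuild. A minimal sketch; the worker hostname is a placeholder.
# Trigger edge invalidation from Ruby (sketch; hostname is a placeholder)
require 'net/http'
require 'json'
require 'uri'

def invalidate_edge_cache(tags)
  uri = URI('https://cache-invalidation.example.workers.dev/invalidate')
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(tags: tags)

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end

  JSON.parse(response.body)['invalidated']
end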
Jekyll build-time optimization involves generating cache-friendly content, computing per-page cache headers, and creating a cache manifest the edge layer can use for intelligent delivery.
# _plugins/cache_optimizer.rb
# Helpers such as inject_cache_headers, generate_expires_header,
# optimize_assets_for_cache and find_page_dependencies are assumed to be
# defined elsewhere in the plugin.
require 'digest'
require 'json'

module Jekyll
  class CacheOptimizer
    # One-shot entry point; the hooks below split this across build phases
    def optimize_site(site)
      # Add cache headers to all pages
      site.pages.each do |page|
        add_cache_headers(page)
      end

      # Generate cache manifest
      generate_cache_manifest(site)

      # Optimize assets for caching
      optimize_assets_for_cache(site)
    end

    def add_cache_headers(page)
      cache_control = generate_cache_control(page)
      expires = generate_expires_header(page)

      page.data['cache_control'] = cache_control
      page.data['expires'] = expires

      # Record the headers on the rendered output so the edge layer can use them
      if page.output
        page.output = inject_cache_headers(page.output, cache_control, expires)
      end
    end

    def generate_cache_control(page)
      # Determine cache strategy based on page type
      if page.data['layout'] == 'default'
        # Static content - cache for longer
        "public, max-age=3600, stale-while-revalidate=7200"
      elsif page.path.include?('_posts')
        # Blog posts - moderate cache
        "public, max-age=1800, stale-while-revalidate=3600"
      else
        # Default cache
        "public, max-age=300, stale-while-revalidate=600"
      end
    end

    def generate_cache_manifest(site)
      manifest = {
        version: '1.0',
        generated: Time.now.utc.iso8601,
        pages: {},
        assets: {},
        invalidation_map: {}
      }

      # Map pages to cache keys
      site.pages.each do |page|
        cache_key = generate_page_cache_key(page)
        manifest[:pages][page.url] = {
          key: cache_key,
          hash: Digest::SHA256.hexdigest(page.content),
          dependencies: find_page_dependencies(page)
        }

        # Build invalidation map
        add_to_invalidation_map(page, manifest[:invalidation_map])
      end

      # Save manifest
      File.write(File.join(site.dest, 'cache-manifest.json'),
                 JSON.pretty_generate(manifest))
    end

    def generate_page_cache_key(page)
      components = [
        page.url,
        page.content,
        page.data.to_json
      ]

      Digest::SHA256.hexdigest(components.join('|'))[0..31]
    end

    def add_to_invalidation_map(page, map)
      # Map tags and categories to pages for quick invalidation
      tags = page.data['tags'] || []
      categories = page.data['categories'] || []

      (tags + categories).each do |tag|
        map[tag] ||= []
        map[tag] << page.url
      end
    end
  end

  # Hook into Jekyll's build process. page.output can only be modified
  # before files are written, so header injection runs at :post_render;
  # the manifest is generated at :post_write, once the destination exists.
  Jekyll::Hooks.register :site, :post_render do |site|
    optimizer = CacheOptimizer.new
    site.pages.each { |page| optimizer.add_cache_headers(page) }
  end

  Jekyll::Hooks.register :site, :post_write do |site|
    optimizer = CacheOptimizer.new
    optimizer.generate_cache_manifest(site)
    optimizer.optimize_assets_for_cache(site)
  end
end
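One caveat: a statically hosted page cannot set HTTP response headers from its own markup. On Cloudflare Pages (and Netlify) the conventional mechanism is a _headers file in the output directory, so the computed Cache-Control values are usually materialized there. A sketch under that assumption:
# Sketch: emit a Cloudflare Pages/Netlify-style _headers file so the
# Cache-Control values computed above become real HTTP response headers
def write_headers_file(site)
  lines = site.pages.map do |page|
    next unless page.data['cache_control']
    "#{page.url}\n  Cache-Control: #{page.data['cache_control']}"
  end.compact

  File.write(File.join(site.dest, '_headers'), lines.join("\n\n"))
end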
# Rake task for cache warm-up
namespace :cache do
  desc 'Warm cache for entire site'
  task :warm do
    require 'net/http'
    require 'uri'

    site_url = ENV['SITE_URL'] || 'https://yourdomain.com'
    urls_file = '_site/urls.txt'

    # Read URLs from a pre-generated list, or fall back to the sitemap
    urls = if File.exist?(urls_file)
             File.readlines(urls_file).map(&:chomp)
           else
             generate_urls_from_sitemap
           end

    puts "Warming cache for #{urls.size} URLs..."

    # Warm cache with batches of concurrent requests
    threads = []
    urls.each_slice(10) do |batch|
      batch.each do |url|
        threads << Thread.new do
          uri = URI.parse("#{site_url}#{url}")
          Net::HTTP.get(uri)
          puts "Warmed: #{url}"
        end
      end
      threads.each(&:join)
      threads.clear
      sleep(0.5) # Rate limiting between batches
    end

    puts "Cache warming completed!"
  end
end
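The task references a generate_urls_from_sitemap fallback that is not defined above; a minimal implementation with Ruby's bundled REXML could look like this:
# Minimal sketch of the sitemap fallback using stdlib REXML
def generate_urls_from_sitemap
  require 'rexml/document'
  require 'uri'

  doc = REXML::Document.new(File.read('_site/sitemap.xml'))

  urls = []
  doc.elements.each('urlset/url/loc') do |loc|
    # Keep only the path so it can be appended to SITE_URL
    urls << URI.parse(loc.text).path
  end
  urls
end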
Multi-region cache synchronization ensures consistency across global edge locations. The system uses a combination of replication strategies and conflict resolution.
# lib/distributed_cache/synchronizer.rb
# read_from_region, replicate_delete/replicate_update, replicate_resolution,
# read_version_vector/write_version_vector and the resolve_* strategies are
# assumed to be defined elsewhere in the library.
module DistributedCache
  class Synchronizer
    def initialize(config)
      @config = config
      @regions = config.regions
      @connections = {}
      @replication_queue = ReplicationQueue.new(method(:connection_for_region))
    end

    def synchronize(key, value, operation = :write)
      case operation
      when :write
        replicate_write(key, value)
      when :delete
        replicate_delete(key)
      when :update
        replicate_update(key, value)
      end
    end

    def replicate_write(key, value)
      # Synchronous write to the primary region
      primary_region = @config.primary_region
      write_to_region(primary_region, key, value)

      # Async replication to the remaining regions
      (@regions - [primary_region]).each do |region|
        @replication_queue.add(
          type: :write,
          region: region,
          key: key,
          value: value,
          priority: :high
        )
      end
    end

    def ensure_consistency(key)
      # Read the key from every region
      values = {}
      @regions.each do |region|
        values[region] = read_from_region(region, key)
      end

      # Find inconsistencies
      unique_values = values.values.compact.uniq
      if unique_values.size > 1
        # Conflict detected - resolve and replicate the winner
        resolved_value = resolve_conflict(key, values)
        replicate_resolution(key, resolved_value, values)
      end
    end

    def resolve_conflict(key, regional_values)
      # Pick the configured conflict resolution strategy
      case @config.conflict_resolution
      when :last_write_wins
        resolve_last_write_wins(regional_values)
      when :priority_region
        resolve_priority_region(regional_values)
      when :merge
        resolve_merge(regional_values)
      else
        resolve_last_write_wins(regional_values)
      end
    end

    private

    def write_to_region(region, key, value)
      connection = connection_for_region(region)
      connection.write(key, value)

      # Update version vector for later conflict resolution
      update_version_vector(key, region)
    end

    def connection_for_region(region)
      @connections[region] ||= case region
                               when /cf-/
                                 CloudflareConnection.new(@config.cloudflare_token, region)
                               when /aws-/
                                 AWSConnection.new(@config.aws_config, region)
                               else
                                 RedisConnection.new(@config.redis_urls[region])
                               end
    end

    def update_version_vector(key, region)
      # The vector is persisted alongside the cached value
      vector = read_version_vector(key) || {}
      vector[region] = Time.now.utc.to_i
      write_version_vector(key, vector)
    end
  end

  # Region-specific connections.
  # Cloudflare::Client is a thin wrapper over the Cloudflare REST API,
  # assumed to be defined elsewhere in the project.
  class CloudflareConnection
    def initialize(api_token, region)
      @client = Cloudflare::Client.new(api_token)
      @region = region
    end

    def write(key, value)
      # Write to Cloudflare KV for the given region
      @client.put_kv(@region, key, value)
    end

    def read(key)
      @client.get_kv(@region, key)
    end
  end

  # Replication queue with retry and exponential backoff
  class ReplicationQueue
    def initialize(connection_provider)
      @connection_provider = connection_provider
      @queue = []
      @failed_replications = {}
      @max_retries = 5
      @processing = false
    end

    def add(item)
      @queue << item

      # Process queue if not already processing
      process_queue unless @processing
    end

    def process_queue
      @processing = true
      Thread.new do
        while (item = @queue.shift)
          begin
            # Simple backoff: sleeping here delays the whole queue; a real
            # scheduler would requeue the item with a due time instead
            sleep(item[:retry_delay]) if item[:retry_delay]
            execute_replication(item)
          rescue => e
            handle_replication_failure(item, e)
          end
        end
        @processing = false
      end
    end

    def execute_replication(item)
      case item[:type]
      when :write
        replicate_write(item)
      when :delete
        replicate_delete(item)
      when :update
        replicate_update(item)
      end

      # Clear failure count on success
      @failed_replications.delete(item[:key])
    end

    def replicate_write(item)
      connection = @connection_provider.call(item[:region])
      connection.write(item[:key], item[:value])
    end

    def handle_replication_failure(item, error)
      failure_count = @failed_replications[item[:key]] || 0

      if failure_count < @max_retries
        # Retry with exponential backoff
        @failed_replications[item[:key]] = failure_count + 1
        item[:retry_delay] = 2**failure_count
        @queue << item
        warn "Replication failed for #{item[:key]}, retrying in #{item[:retry_delay]}s"
      else
        warn "Replication permanently failed for #{item[:key]}: #{error.message}"
        @failed_replications.delete(item[:key])
      end
    end
  end
end
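The individual resolution strategies are left abstract above. A minimal last-write-wins resolver, assuming each regional read returns a hash carrying the value and its version-vector timestamp, might look like this:
# Sketch: last-write-wins resolution using version-vector timestamps.
# Assumes read_from_region returns { value: ..., written_at: epoch_seconds }.
def resolve_last_write_wins(regional_values)
  latest = regional_values.values.compact.max_by { |entry| entry[:written_at] }
  latest && latest[:value]
end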
Cache monitoring provides insights into cache effectiveness, hit rates, and performance metrics for continuous optimization.
# lib/distributed_cache/monitoring.rb
module DistributedCache
  class Monitoring
    def initialize(config)
      @config = config
      @metrics = {
        hits: 0,
        misses: 0,
        writes: 0,
        invalidations: 0,
        regional_hits: Hash.new(0),
        response_times: []
      }
      @start_time = Time.now
    end

    def record_hit(key, region = nil)
      @metrics[:hits] += 1
      @metrics[:regional_hits][region] += 1 if region
    end

    def record_miss(key, region = nil)
      @metrics[:misses] += 1
    end

    def record_response_time(milliseconds)
      @metrics[:response_times] << milliseconds

      # Keep only the last 1000 measurements
      @metrics[:response_times].shift if @metrics[:response_times].size > 1000
    end

    def generate_report
      uptime = Time.now - @start_time
      total_requests = @metrics[:hits] + @metrics[:misses]
      hit_rate = total_requests > 0 ? (@metrics[:hits].to_f / total_requests * 100).round(2) : 0

      avg_response_time = if @metrics[:response_times].any?
                            (@metrics[:response_times].sum / @metrics[:response_times].size.to_f).round(2)
                          else
                            0
                          end

      {
        general: {
          uptime_hours: (uptime / 3600).round(2),
          total_requests: total_requests,
          hit_rate_percent: hit_rate,
          hit_count: @metrics[:hits],
          miss_count: @metrics[:misses],
          write_count: @metrics[:writes],
          invalidation_count: @metrics[:invalidations]
        },
        performance: {
          avg_response_time_ms: avg_response_time,
          p95_response_time_ms: percentile(95),
          p99_response_time_ms: percentile(99),
          min_response_time_ms: @metrics[:response_times].min || 0,
          max_response_time_ms: @metrics[:response_times].max || 0
        },
        regional: @metrics[:regional_hits],
        recommendations: generate_recommendations
      }
    end

    def generate_recommendations
      recommendations = []
      total_requests = @metrics[:hits] + @metrics[:misses]
      return recommendations if total_requests.zero?

      hit_rate = (@metrics[:hits].to_f / total_requests * 100).round(2)

      if hit_rate < 70
        recommendations << "Low cache hit rate (#{hit_rate}%). Consider increasing cache TTLs or caching more aggressively."
      end

      if @metrics[:response_times].any? && percentile(95) > 100
        recommendations << "High p95 response time (#{percentile(95)}ms). Consider optimizing cache lookup or reducing cache key complexity."
      end

      if @metrics[:invalidations] > @metrics[:writes] * 0.1
        recommendations << "High invalidation rate. Review cache key strategy to reduce unnecessary invalidations."
      end

      recommendations
    end

    private

    def percentile(p)
      return 0 if @metrics[:response_times].empty?

      sorted = @metrics[:response_times].sort
      index = (p / 100.0 * (sorted.length - 1)).ceil
      sorted[index]
    end
  end

  # Integration with external monitoring services
  class MetricsExporter
    def initialize(monitoring, exporters = [])
      @monitoring = monitoring
      @exporters = exporters
      @export_interval = 60 # seconds
      @export_thread = nil
    end

    def start
      @export_thread = Thread.new do
        loop do
          export_metrics
          sleep @export_interval
        end
      end
    end

    def stop
      @export_thread&.kill
      @export_thread = nil
    end

    private

    def export_metrics
      metrics = @monitoring.generate_report

      @exporters.each do |exporter|
        begin
          exporter.export(metrics)
        rescue => e
          warn "Failed to export metrics to #{exporter.class}: #{e.message}"
        end
      end
    end
  end

  # Cloudflare Analytics exporter.
  # Cloudflare::Client#send_analytics is an assumed wrapper method; in
  # practice this would post to a Workers Analytics Engine dataset or a
  # custom collection endpoint.
  class CloudflareAnalyticsExporter
    def initialize(api_token, zone_id)
      @client = Cloudflare::Client.new(api_token)
      @zone_id = zone_id
    end

    def export(metrics)
      # Shape the report for the analytics endpoint
      analytics_data = {
        cache_hit_rate: metrics[:general][:hit_rate_percent],
        cache_requests: metrics[:general][:total_requests],
        avg_response_time: metrics[:performance][:avg_response_time_ms],
        timestamp: Time.now.utc.iso8601
      }

      @client.send_analytics(@zone_id, analytics_data)
    end
  end
end
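Wiring the monitor and exporter together is straightforward; a hypothetical setup in which the API token and zone ID come from the environment:
# Example wiring (cache_config is a placeholder; credentials come from ENV)
monitoring = DistributedCache::Monitoring.new(cache_config)

exporter = DistributedCache::MetricsExporter.new(
  monitoring,
  [DistributedCache::CloudflareAnalyticsExporter.new(ENV['CF_API_TOKEN'], ENV['CF_ZONE_ID'])]
)
exporter.start

# ... record metrics as requests flow through ...
monitoring.record_hit('page_about', 'cf-fra')
monitoring.record_response_time(12.4)

puts monitoring.generate_report[:general]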
This distributed caching system provides enterprise-grade caching capabilities for Jekyll sites, combining Ruby's processing power with Cloudflare's global edge network. The system ensures fast content delivery worldwide while maintaining cache consistency and providing comprehensive monitoring for continuous optimization.