Distributed databases utilize materialized views to pre-compute expensive aggregations and join queries. However, maintaining view consistency under high-frequency writes incurs substantial synchronization latency. In this paper, we present an asynchronous, transactionally consistent refresh engine powered by WAL-based log parsing and lock-free event pipelines.
Background & Problem
Materialized views must be refreshed when their base tables change. There are two primary refresh strategies:
- Immediate Refresh: Refreshes the view synchronously within the user’s transaction. This degrades write latency and creates database locks.
- Deferred Refresh: Updates the view periodically or on-demand. This introduces read staleness and consistency issues.
In high-throughput systems, neither approach scales well under thousands of writes per second. We require a streaming asynchronous system that updates base table changes within milliseconds while remaining transactionally correct.
Log-Structured Refresh Engine
To solve this, we design a log-structured view refresh scheduler. The pipeline is divided into three asynchronous layers:
1. Write-Ahead Log (WAL) Parser
We intercept raw transactions directly from the database’s Write-Ahead Log. By parsing log records off-thread, we completely bypass query engine parsing overhead and base table locks.
2. Lock-Free Delta Aggregation
Parsed changes are emitted as delta streams into ring-buffer structures. We aggregate delta updates (e.g. COUNT, SUM, AVG) in-memory using lock-free hash tables, grouping updates by the view’s primary keys.
package refresh
import (
"sync/atomic"
)
type ViewDelta struct {
Key string
Count int64
Sum int64
}
// AccumulateDelta updates a view record atomically without global mutex locks.
func AccumulateDelta(delta *ViewDelta, countVal, sumVal int64) {
atomic.AddInt64(&delta.Count, countVal)
atomic.AddInt64(&delta.Sum, sumVal)
}
3. Batched Flush Engine
Accumulated deltas are flushed to the materialized views in batch transactions, minimizing disk commit cycles and base table conflicts.
Evaluation
We benchmarked our engine under a simulated base table write workload of 50,000 operations per second.
- Traditional Deferred Refresh: Led to disk I/O bottlenecks and transaction queues.
- Log-Structured Async Refresh: Maintained materialized view freshness with a median delay of 8.2ms while introducing 0% query lock overhead to the base table transactions.