Flink: Decoding Time
So far in this series, we’ve built a system that captures events, enriches them with AI agents, and moves them through Kafka topics. Each event is processed individually: a link arrives, gets fetched, gets summarized, gets stored.
But here’s the thing: the most interesting insights don’t live in single events. They live in the spaces between events. The pattern of what you’ve been reading this week. The cluster of related topics you’ve been exploring. The sudden spike in activity around a particular project.
To decode these temporal patterns, we need to think about time differently. We need stream processing.
Why Stream Processing?
Let’s start with what we already have. Our Kafka consumers process events one at a time:
// Current approach: process each event independently
await consumer.subscribe({ topic: 'links.enriched' });
await consumer.run({
  eachMessage: async ({ message }) => {
    const link = JSON.parse(message.value.toString());
    await processLink(link); // No awareness of surrounding events
  },
});
This works great for enrichment. But what if we want to answer questions like:
- “What topics have I been most interested in over the past 7 days?”
- “Are there clusters of related links I’ve saved recently?”
- “Has my reading pattern changed this month compared to last?”
These questions require temporal context: the ability to look at events not in isolation, but as part of a flowing stream with a sense of time.
This is where Apache Flink enters the picture. Flink is a stream processing framework that treats time as a first-class citizen. It can window events, detect patterns across time, and maintain state as data flows through.
Time Windows
The fundamental building block of temporal analysis is the window: a bounded slice of the infinite stream. Flink supports several window types, each suited to different questions.
Tumbling Windows
Tumbling windows divide time into fixed, non-overlapping chunks. Think of them as buckets that fill up and then close.
// Conceptual Flink: count links saved per hour
DataStream<LinkEvent> links = source.map(this::parseLink);
links
.keyBy(link -> link.getUserId())
.window(TumblingEventTimeWindows.of(Time.hours(1)))
.aggregate(new LinkCounter())
.addSink(new AlertSink());
Use case: “How many links did I save each hour today?” Perfect for regular reporting intervals and activity summaries.
Timeline: |----Hour 1----|----Hour 2----|----Hour 3----|
Events:      [* * * *]        [* *]       [* * * * *]
Output:       count: 4       count: 2       count: 5
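The LinkCounter used above is just a name in that pipeline; here's a minimal sketch of what it could look like, assuming Flink's AggregateFunction interface (the class name and LinkEvent type are the ones from the example, nothing more):
// Conceptual: a minimal counting aggregate for the tumbling window above
import org.apache.flink.api.common.functions.AggregateFunction;

public class LinkCounter implements AggregateFunction<LinkEvent, Long, Long> {
    @Override
    public Long createAccumulator() { return 0L; }

    @Override
    public Long add(LinkEvent link, Long count) { return count + 1; } // one more link this hour

    @Override
    public Long getResult(Long count) { return count; }

    @Override
    public Long merge(Long a, Long b) { return a + b; } // combine partial counts
}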
Sliding Windows
Sliding windows overlap, giving you a moving average view of the stream. They’re defined by both size and slide interval.
// Conceptual Flink: rolling 24-hour topic frequency
links
.keyBy(link -> extractMainTopic(link))
.window(SlidingEventTimeWindows.of(Time.hours(24), Time.hours(1)))
.aggregate(new TopicFrequencyAggregator())
.addSink(new TrendTracker());
Use case: “What are my trending topics over the last 24 hours, updated hourly?” The overlap means you catch patterns that might straddle window boundaries.
Window 1: |----------24h----------|
Window 2:          |----------24h----------|
Window 3:                   |----------24h----------|
          ^---1h---^---1h---^
Session Windows
Session windows are perhaps the most interesting for personal streams. They group events by activity gaps: events close together form a session, and a period of inactivity closes it.
// Conceptual Flink: detect research sessions
links
.keyBy(link -> link.getUserId())
.window(EventTimeSessionWindows.withGap(Time.minutes(30)))
.process(new SessionAnalyzer())
.addSink(new SessionStore());
Use case: “What did my research session look like this afternoon?” If you save five links in quick succession, then nothing for an hour, that’s one session, likely representing a focused burst of exploration on a topic.
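SessionAnalyzer is likewise a placeholder above; a minimal sketch, assuming a ProcessWindowFunction keyed by user ID and a hypothetical SessionSummary type for the output:
// Conceptual: summarize one research session from the session window above
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class SessionAnalyzer
        extends ProcessWindowFunction<LinkEvent, SessionSummary, String, TimeWindow> {

    @Override
    public void process(String userId,
                        Context context,
                        Iterable<LinkEvent> links,
                        Collector<SessionSummary> out) {
        // Count the links saved in this burst of activity
        int count = 0;
        for (LinkEvent link : links) {
            count++;
        }
        // One summary per session: who, when it started and ended, how many links
        out.collect(new SessionSummary(
            userId, context.window().getStart(), context.window().getEnd(), count));
    }
}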
Pattern Detection
Beyond simple aggregations, Flink can detect complex patterns across events using its CEP (Complex Event Processing) library. This is where things get interesting for Vibe Decoding.
Connecting Related Events
Remember our schema’s correlation_id field? It exists precisely for this use case:
-- From our schema: tracing fields in the events table
correlation_id uuid null, -- ties together a pipeline run
causation_id uuid null, -- which prior event caused this event
With Flink, we can use these IDs to reconstruct the full journey of an event through our system:
// Conceptual: track an event through its entire lifecycle
Pattern<Event, ?> lifecyclePattern = Pattern.<Event>begin("created")
.where(event -> event.getType().equals("created"))
.followedBy("fetched")
.where(event -> event.getType().equals("content.fetched"))
.followedBy("enriched")
.where(event -> event.getType().equals("enriched"))
.within(Time.minutes(5));
CEP.pattern(events.keyBy(Event::getCorrelationId), lifecyclePattern)
.process(new LifecycleTracker());
This lets us answer: “How long does it typically take for a link to move from saved to enriched? Are there bottlenecks?”
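LifecycleTracker is another placeholder; a rough sketch of it, assuming a PatternProcessFunction that pulls the created and enriched events out of each match and emits a hypothetical PipelineLatency value:
// Conceptual: measure saved-to-enriched latency for each matched lifecycle
import java.util.List;
import java.util.Map;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.util.Collector;

public class LifecycleTracker extends PatternProcessFunction<Event, PipelineLatency> {

    @Override
    public void processMatch(Map<String, List<Event>> match,
                             Context ctx,
                             Collector<PipelineLatency> out) {
        Event created = match.get("created").get(0);
        Event enriched = match.get("enriched").get(0);
        // How long the pipeline took from the original save to the enriched event
        long millis = enriched.getOccurredAt().toEpochMilli()
                - created.getOccurredAt().toEpochMilli();
        out.collect(new PipelineLatency(created.getCorrelationId(), millis));
    }
}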
Detecting Reading Patterns
More powerful still is detecting semantic patterns. Imagine identifying when you’re going deep on a topic:
// Conceptual: detect "deep dive" behavior
Pattern<EnrichedLink, ?> deepDivePattern = Pattern.<EnrichedLink>begin("first")
.where(link -> true) // any link
.followedBy("related")
.where((link, ctx) -> {
    EnrichedLink previous = ctx.getEventsForPattern("first").iterator().next();
    return semanticSimilarity(previous, link) > 0.7;
})
.oneOrMore()
.within(Time.hours(2));
When triggered, this could fire a notification: “You seem to be researching distributed systems. Want me to surface related bookmarks from your archive?”
Use Cases for Vibe Decoding
Let’s ground this in concrete scenarios for our personal life stream.
Time-Series Anomaly Detection
Our schema includes temperature readings from home sensors:
-- From schema.sql
create table if not exists temperature_readings (
subject_id text not null, -- "sensor:living_room"
occurred_at timestamptz not null,
celsius double precision not null,
humidity double precision null,
battery double precision null,
primary key (subject_id, occurred_at)
);
Flink can process this stream to detect anomalies:
// Conceptual: detect unusual temperature patterns
temperatureStream
.keyBy(reading -> reading.getSensorId())
.window(SlidingEventTimeWindows.of(Time.hours(1), Time.minutes(5)))
.aggregate(new StatsAggregator()) // mean, stddev
.process(new AnomalyDetector()) // flag readings > 2 stddev
.addSink(new AlertChannel());
Beyond simple thresholds, we can detect patterns: “The living room temperature has been climbing for three hours straight”, which might indicate a stuck thermostat or an open window.
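That kind of trend check is easy to express as a window function. A rough sketch, assuming a sliding multi-hour window keyed by sensor and hypothetical TemperatureReading/TrendAlert types (the real pipeline would tune the window size and threshold):
// Conceptual: flag a steady multi-hour climb in one sensor's readings
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class ClimbDetector
        extends ProcessWindowFunction<TemperatureReading, TrendAlert, String, TimeWindow> {

    @Override
    public void process(String sensorId,
                        Context context,
                        Iterable<TemperatureReading> readings,
                        Collector<TrendAlert> out) {
        // Window contents aren't guaranteed to be ordered; sort by event time first
        List<TemperatureReading> sorted = new ArrayList<>();
        readings.forEach(sorted::add);
        sorted.sort(Comparator.comparing(TemperatureReading::getOccurredAt));

        boolean climbing = sorted.size() > 1;
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i).getCelsius() < sorted.get(i - 1).getCelsius()) {
                climbing = false;
                break;
            }
        }

        double rise = sorted.isEmpty() ? 0
                : sorted.get(sorted.size() - 1).getCelsius() - sorted.get(0).getCelsius();

        // Alert only if readings never dropped and the total rise is meaningful
        if (climbing && rise > 1.0) {
            out.collect(new TrendAlert(sensorId, rise, context.window().getEnd()));
        }
    }
}
It could run alongside the stddev-based AnomalyDetector above, for example on SlidingEventTimeWindows.of(Time.hours(3), Time.minutes(15)).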
Topic Clustering Over Time
As links flow through our system, Flink can maintain a running view of topic clusters:
// Conceptual: maintain evolving topic clusters
enrichedLinks
.keyBy(link -> "global") // single partition for clustering
.window(TumblingEventTimeWindows.of(Time.days(7)))
.process(new TopicClusterer())
.addSink(new WeeklyDigestGenerator());
The output: “This week you saved 23 links. They cluster into three main areas: stream processing (8), personal knowledge management (9), and home automation (6).”
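Real clustering would lean on the embeddings our agents already produce, but even a simple frequency count gets us that digest. A minimal sketch of TopicClusterer in that spirit, with a hypothetical WeeklyTopicSummary output and a getMainTopic() accessor standing in for whatever topic field enrichment gives us:
// Conceptual: a frequency-based stand-in for TopicClusterer
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class TopicClusterer
        extends ProcessWindowFunction<EnrichedLink, WeeklyTopicSummary, String, TimeWindow> {

    @Override
    public void process(String key,
                        Context context,
                        Iterable<EnrichedLink> links,
                        Collector<WeeklyTopicSummary> out) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (EnrichedLink link : links) {
            counts.merge(link.getMainTopic(), 1, Integer::sum); // tally links per topic
            total++;
        }
        // e.g. 23 links: stream processing (8), knowledge management (9), home automation (6)
        out.collect(new WeeklyTopicSummary(
            context.window().getStart(), context.window().getEnd(), total, counts));
    }
}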
Cross-Stream Correlation
The real power emerges when we join multiple streams. Imagine correlating your reading patterns with your calendar:
// Conceptual: correlate reading with context
DataStream<LinkEvent> links = ...;
DataStream<CalendarEvent> calendar = ...;
links
.join(calendar)
.where(link -> link.getUserId())
.equalTo(event -> event.getUserId())
.window(TumblingEventTimeWindows.of(Time.days(1)))
.apply((link, calEvent) -> {
return new ContextualReading(link, calEvent.getContext());
});
This could reveal: “You tend to save AI/ML links on days with ‘Research’ blocks in your calendar. You save productivity links on heavy meeting days.” Patterns you might never notice consciously.
Architecture Preview
Where does Flink fit in our existing system? Here’s how the pieces connect:
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Sources   │────▶│    Kafka    │────▶│   Agents    │
│  (Chrome,   │     │  (Topics)   │     │ (Enrichment)│
│   Phone...) │     └─────────────┘     └─────────────┘
└─────────────┘            │                    │
                           │                    │
                           ▼                    ▼
                    ┌─────────────┐     ┌─────────────┐
                    │    Flink    │────▶│  Supabase   │
                    │  (Temporal  │     │  (State +   │
                    │   Analysis) │     │   Query)    │
                    └─────────────┘     └─────────────┘
                           │
                           ▼
                    ┌─────────────┐
                    │  Insights   │
                    │  (Alerts,   │
                    │   Digests)  │
                    └─────────────┘
Flink doesn’t replace our agents; it complements them. Agents handle per-event intelligence (summarization, entity extraction). Flink handles cross-event intelligence (patterns, trends, correlations).
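To make the Kafka-to-Flink edge of that diagram concrete, here's a rough sketch of a job consuming the links.enriched topic from Part 3 via Flink's Kafka connector (the broker address, group id, and the EnrichedLink.fromJson helper are assumptions):
// Conceptual: wiring a Flink job to the existing Kafka topic
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

KafkaSource<String> source = KafkaSource.<String>builder()
    .setBootstrapServers("localhost:9092")           // assumption: local broker from Part 3
    .setTopics("links.enriched")
    .setGroupId("flink-temporal-analysis")
    .setStartingOffsets(OffsetsInitializer.earliest())
    .setValueOnlyDeserializer(new SimpleStringSchema())
    .build();

DataStream<EnrichedLink> links = env
    .fromSource(source, WatermarkStrategy.noWatermarks(), "links.enriched")
    .map(EnrichedLink::fromJson);                    // assumption: a JSON parsing helper
    // event-time timestamps and watermarks get assigned next (see the watermark section below)

// Windowed analysis attaches here; results flow to Supabase via a JDBC sink
env.execute("vibe-decoding-temporal-analysis");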
Event Time vs. Processing Time
One crucial concept: Flink distinguishes between event time (when something happened) and processing time (when we process it). Our schema already captures this:
occurred_at timestamptz not null, -- when it happened at the source
received_at timestamptz not null, -- when we stored it
For Vibe Decoding, we almost always want event time. If you saved a link at 3pm but we didn’t process it until 5pm, we want window assignments based on 3pm. This matters for accurate pattern detection.
Watermarks and Late Data
Real-world streams are messy. Events arrive late, out of order, or not at all. Flink handles this with watermarks: markers that say “we believe we’ve seen all events up to this time.”
For a personal system, we can be relatively lenient:
// Conceptual: allow events up to 1 hour late
WatermarkStrategy
.<Event>forBoundedOutOfOrderness(Duration.ofHours(1))
.withTimestampAssigner((event, timestamp) ->
event.getOccurredAt().toEpochMilli()
);
Late data isn’t lost; it just might miss its intended window. For historical analysis, we can always reprocess.
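If we'd rather keep late links in the live pipeline than reprocess, Flink also offers allowed lateness plus a side output for anything later still. A sketch applied to the hourly count from earlier (the OutputTag name and the LateEventLog sink are assumptions):
// Conceptual: tolerate late links, and capture the stragglers on a side output
OutputTag<LinkEvent> lateLinks = new OutputTag<LinkEvent>("late-links") {};

SingleOutputStreamOperator<Long> hourlyCounts = links
    .keyBy(link -> link.getUserId())
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .allowedLateness(Time.hours(1))   // re-fire the window for events up to 1 hour late
    .sideOutputLateData(lateLinks)    // anything later lands here instead of being dropped
    .aggregate(new LinkCounter());

// Stragglers can be logged now and folded in when we reprocess historically
hourlyCounts.getSideOutput(lateLinks).addSink(new LateEventLog());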
What’s Coming
When we implement Flink, we’ll add:
- Weekly Digest Pipeline - Automated summaries of reading patterns
- Real-time Trend Detection - Alerts when topics spike in your stream
- Session Reconstruction - Grouping related research into coherent sessions
- Cross-Stream Joins - Correlating different event types for richer context
The key insight is that Flink transforms our event store from a log into a living, temporal data structure. Events stop being isolated facts and start forming narratives.
Stream processing is where Vibe Decoding moves from “capturing signals” to “understanding patterns.” The events we’ve been collecting since Part 1 become raw material for temporal analysis. The agents from Part 2 provide the semantic foundation. Kafka from Part 3 ensures we never lose a signal.
Flink ties it all together with the dimension we’ve been missing: time.
Ready to continue? Part 5 explores voice notes and context triggers: how we capture and surface insights through natural conversation (coming soon).