The prototype currently treats the CSV as a read-only data source generated by backend analysis. The user wants to flip this: every user action becomes a hypergraph mutation appended to the CSV. The CSV becomes the single source of truth – an append-only event log of the hypergraph’s evolution. This aligns with the VISION.md concept of “time as computational progression” and “rule application scrubber.”
Key principles:
Replace the current unidirectional SSE (server->client) + HTTP POST (client->server) with a single WebSocket connection that carries both directions.
Protocol:
Client -> Server (JSON lines):
{"rows": [{"t":1234,"type":"DISTANCE","label":"spatial","source":"A","target":"B","importance_xi":142.5}, ...]}
Server -> Client (JSON lines, same format):
{"rows": [...]} // rows from other clients, or from backend analysis
{"ack": 5, "t": 1234} // acknowledgement of N rows received
The existing SSE streams (/stream, /graph-events, /focus-events) can be consolidated into this single WebSocket over time, but initially the WebSocket handles only frontend->backend append + backend->frontend rebroadcast.
Current CSV format: t,type,label,source,target,importance_xi,cluster
Extended format (backwards compatible – new row types, same columns):
| type | label | source | target | importance_xi | cluster | meaning |
|---|---|---|---|---|---|---|
ACTION |
stick |
nodeId | User pinned node | |||
ACTION |
unstick |
nodeId | User unpinned node | |||
ACTION |
lock |
nodeId | User locked node | |||
ACTION |
unlock |
nodeId | User unlocked node | |||
ACTION |
select |
nodeId | User selected node | |||
ACTION |
deselect |
nodeId | User deselected node | |||
ACTION |
expand |
clusterId | User expanded cluster | |||
ACTION |
collapse |
clusterId | User collapsed cluster | |||
ACTION |
importance |
nodeId | scaleFactor | User changed importance | ||
DISTANCE |
spatial | nodeA | nodeB | distance | User-defined spatial relation | |
EDGE |
user-group | nodeA | nodeB | weight | User-created grouping edge | |
EDGE |
remove-edge | nodeA | nodeB | edgeType | User removed edge |
Coalescence: High-frequency rows (select/deselect, rapid DISTANCE updates) accumulate in the CSV. A background process (or the graph engine itself) can periodically “rewrite” – replacing 50 select/deselect toggles with a single weighted “attention” edge. This is the hypergraph forgetting: recent events are granular, old events coalesce into summary edges. The append-only log IS the full history; the coalesced view is a projection.
When a user drags node A (throttled at 500ms during drag, plus final on drag-end):
DISTANCE rows over WebSocketTriangulation: 3+ distance constraints in 2D uniquely determine position (up to rotation/reflection, resolved by the physics sim). Adaptive K ensures sparse areas don’t over-constrain while dense areas are robust.
Recovery: On reload, the layout engine reads the most recent DISTANCE edges per node-pair and uses them as spring rest-lengths. Position is fully recoverable from the CSV alone – no localStorage needed.
Expand cluster C:
ACTION:expand, source:CCollapse cluster C:
ACTION:collapse, source:CinitialLayoutPositions, savedPositions, stickyNodes – CSV is truthCreating a “user cluster” = creating N edges of type user-group between selected nodes, with weights derived from current screen distances. Deleting = sending remove-edge rows. Claude naming can happen server-side as reaction to seeing new user-group edges.
What: Replace HTTP POST with WebSocket for frontend->backend communication. Backend appends to CSV and rebroadcasts.
| Dimension | Assessment |
|---|---|
| Importance | CRITICAL – foundation for everything else. Without this, every drag emits HTTP requests at 500ms intervals. WebSocket makes the throttled stream viable. |
| Ease | Medium. Node.js has ws package or built-in WebSocket via node:http upgrade. Frontend WebSocket API is native. The existing SSE streaming code in streamer.mjs provides a pattern. |
| Cost | ~200 lines backend (WebSocket server, CSV append, broadcast), ~100 lines frontend (WebSocket client, send/receive). 1-2 sessions. |
| Probability of success | 95%. WebSocket is well-understood. The only risk is coordinating with the existing SSE streams during transition. |
| Dependencies | None. Can be added alongside existing endpoints. |
What: Wire up nodeDrag.on('drag') (throttled 500ms) and nodeDrag.on('end') to compute adaptive K distances and send over the stream.
| Dimension | Assessment |
|---|---|
| Importance | HIGH – this is the core of “position as relational distance.” Without it, the CSV can’t recover positions. |
| Ease | Easy. The spatial grid (_nearbyNodes) already exists for collision detection. Computing K nearest neighbors is a minor extension. |
| Cost | ~80 lines. Find neighbors, compute distances, format rows, send. |
| Probability of success | 90%. Straightforward computation. The 10% risk is calibrating K and distance scaling so the physics engine converges to the right layout on reload. |
| Dependencies | Option A (needs stream to send over). |
What: Wire up all state-mutation points to also send ACTION rows over the stream.
| Dimension | Assessment |
|---|---|
| Importance | MEDIUM-HIGH – completes the “all actions are hypergraph mutations” vision. But the app works without it (local state still functions). |
| Ease | Easy. Each integration point is 2-5 lines: format a row, send it. The integration points are well-identified in the review. |
| Cost | ~60 lines across 8-10 call sites. |
| Probability of success | 95%. Trivial per-site changes. Only risk is missing an integration point. |
| Dependencies | Option A. |
What: Server-side handler that receives rows from WebSocket, validates, appends to history.csv, and rebroadcasts to other connected clients.
| Dimension | Assessment |
|---|---|
| Importance | CRITICAL – paired with Option A. The backend must persist what the frontend sends. |
| Ease | Easy. appendFileSync or fs.createWriteStream in append mode. Validation is a type whitelist. Broadcast reuses existing SSE pattern. |
| Cost | ~100 lines. Row validation, CSV formatting, file append, broadcast. |
| Probability of success | 95%. File append is simple. Risk: concurrent writes if multiple clients connect (mitigated by serializing through a write queue). |
| Dependencies | Option A (WebSocket server). |
What: Modify GraphPhysics and computeLayout to read DISTANCE edges as spring rest-lengths, with recent ones overriding older ones.
| Dimension | Assessment |
|---|---|
| Importance | HIGH – without this, DISTANCE rows are recorded but don’t affect layout. Position recovery on reload won’t work. |
| Ease | Medium. GraphPhysics already has edge attraction with type multipliers (EDGE_TYPE_MULT). Adding DISTANCE as another type with rest-length semantics fits the architecture. The “most recent overrides” logic needs a timestamp index. |
| Cost | ~150 lines. New edge type handling in physics tick, timestamp-based deduplication, rest-length spring math. |
| Probability of success | 75%. The physics engine may need tuning to balance DISTANCE springs against other forces. Risk of oscillation if DISTANCE constraints conflict with cluster forces. Needs careful testing. |
| Dependencies | Option B (DISTANCE rows must exist). |
What: Replace /cluster POST/DELETE with user-group EDGE rows sent over WebSocket. Remove the codemap-based cluster persistence.
| Dimension | Assessment |
|---|---|
| Importance | MEDIUM – conceptual unification. Current cluster system works but is a separate persistence mechanism (codemap file). |
| Ease | Medium. The cluster creation UI, rendering, and deletion all need rewiring. Claude naming becomes a server-side reaction rather than inline. |
| Cost | ~200 lines of changes across frontend and backend. |
| Probability of success | 80%. The codemap format is deeply integrated (historygen reads it, clustering uses it). Removing it has ripple effects. Safer to run both in parallel initially. |
| Dependencies | Options A+D. |
What: Implement cluster expand (node->children) and collapse (children->node) operations that preserve surrounding layout.
| Dimension | Assessment |
|---|---|
| Importance | HIGH – the fundamental missing operation from the review. Key to node-cluster duality. |
| Ease | Hard. This requires: (1) a cluster-as-node representation, (2) meta-edge to individual-edge transition, (3) position animation that grows/shrinks space, (4) DISTANCE recording for recovery. |
| Cost | ~500+ lines. New data structures, new rendering path, new animation. |
| Probability of success | 55%. Most architecturally complex change. The “don’t disturb surrounding nodes” constraint is the hard part – requires the physics engine to selectively freeze nodes. Many edge cases (what if surrounding nodes have DISTANCE edges to members of the expanding cluster?). |
| Dependencies | Options A+B+D+E. |
What: X-key computes target positions from most recent DISTANCE edges and animates back. Remove localStorage-based spatial memory.
| Dimension | Assessment |
|---|---|
| Importance | MEDIUM – clean conceptual simplification, but current X-rewind “works” (returns to initialLayoutPositions). |
| Ease | Medium. Requires solving DISTANCE constraints into positions (essentially running a mini physics sim to converge). The animation path already exists (animateToPositions). |
| Cost | ~150 lines. Constraint solver + wiring. |
| Probability of success | 70%. The constraint solver might not converge to the “right” positions if DISTANCE data is sparse or conflicting. Needs the DISTANCE model (Option E) to be solid first. |
| Dependencies | Option E. |
What: Background process that rewrites old high-frequency rows into summary edges.
| Dimension | Assessment |
|---|---|
| Importance | LOW (for now) – the CSV will work fine without coalescence for months. Only matters at scale. |
| Ease | Medium. Read CSV, group by time windows, replace N select/deselect with weighted attention edge, rewrite file. |
| Cost | ~200 lines. |
| Probability of success | 85%. Straightforward data processing. Risk: defining the right coalescence rules. |
| Dependencies | Options A+C+D (need action rows to exist first). |
┌──────────────┐
│ A: WebSocket │ ← Foundation, do first
│ Stream │
└──────┬───────┘
│
┌────────────┼────────────┐
│ │ │
┌────────▼──┐ ┌──────▼─────┐ ┌──▼──────────┐
│ B: DISTANCE│ │ C: ACTION │ │ D: Backend │
│ on drag │ │ rows │ │ CSV append │
└────────┬──┘ └────────────┘ └──┬──────────┘
│ │
└──────────┬─────────────┘
│
┌────────▼────────┐
│ E: DISTANCE in │
│ layout engine │
└────────┬────────┘
│
┌──────────┼──────────┐
│ │ │
┌────────▼──┐ ┌────▼─────┐ ┌─▼──────────┐
│ F: Clusters│ │H: X-key │ │G: Expand/ │
│ as edges │ │rewind │ │ Collapse │
└───────────┘ └──────────┘ └────────────┘
│
┌───────▼───────┐
│ I: Coalescence│
└───────────────┘
Tier 1 (Foundation): A + D together. WebSocket + backend append. ~300 lines, 1 session. Without this nothing else works.
Tier 2 (Core value): B + C in parallel. Frontend sends DISTANCE and ACTION rows. ~140 lines, 1 session. This is where user actions start flowing into the CSV.
Tier 3 (Layout integration): E alone. DISTANCE edges affect physics. ~150 lines, needs careful tuning. This is where position recovery becomes real.
Tier 4 (Conceptual cleanup): F + H in parallel. Clusters-as-edges + X-rewind-from-CSV. ~350 lines. Removes legacy persistence mechanisms.
Tier 5 (The big one): G. Expand/collapse. ~500+ lines. Hardest, most impactful, most risky.
Tier 6 (Scale): I. Coalescence. Only when the CSV gets large enough to matter.
runtime/history.csv for DISTANCE rows within 500ms/cluster