depgraph

Plan: Hypergraph-Native Architecture – Frontend Writes Back to CSV

Context

The prototype currently treats the CSV as a read-only data source generated by backend analysis. The user wants to flip this: every user action becomes a hypergraph mutation appended to the CSV. The CSV becomes the single source of truth – an append-only event log of the hypergraph’s evolution. This aligns with the VISION.md concept of “time as computational progression” and “rule application scrubber.”

Key principles:

Node positions are relational distances (edges to neighbors), not absolute coordinates. Position is emergent from triangulation.
User clusters are just user-created edges with weights/distances.
ALL user actions become CSV rows – including high-frequency ones like select/deselect. The hypergraph itself handles coalescence: high-volume rewrites naturally decay into unified edges over time. Actions taken ON a selection (commit, cluster, lock) are more durable.
Cluster expand/collapse must not disturb surrounding nodes.
Transient visual actions (Ctrl+click repulse, attractor) are visual-only. X-key rewinds to committed CSV state.

Decisions Made

Select/deselect: Persist ALL to CSV. The graph naturally “forgets” high-volume events through coalescence. Committed actions (cluster creation, lock) that reference a selection are more durable rewrites.
Distance neighbors: Adaptive K = min(8, visible_neighbors). Sparse areas get fewer, dense areas get more.
Drag commit: Throttled at 500ms during movement + debounced on drag-end. NOT via HTTP request – via bidirectional WebSocket stream.
Communication: Replace request/response with a persistent bidirectional stream (WebSocket). Frontend sends CSV rows continuously, backend appends and rebroadcasts.

Architecture

Bidirectional WebSocket Stream

Replace the current unidirectional SSE (server->client) + HTTP POST (client->server) with a single WebSocket connection that carries both directions.

Protocol:

Client -> Server (JSON lines):
  {"rows": [{"t":1234,"type":"DISTANCE","label":"spatial","source":"A","target":"B","importance_xi":142.5}, ...]}

Server -> Client (JSON lines, same format):
  {"rows": [...]}  // rows from other clients, or from backend analysis
  {"ack": 5, "t": 1234}  // acknowledgement of N rows received

The existing SSE streams (/stream, /graph-events, /focus-events) can be consolidated into this single WebSocket over time, but initially the WebSocket handles only frontend->backend append + backend->frontend rebroadcast.

The CSV as Append-Only Event Log

Current CSV format: t,type,label,source,target,importance_xi,cluster

Extended format (backwards compatible – new row types, same columns):

type	label	source	target	importance_xi	cluster	meaning
`ACTION`	`stick`	nodeId				User pinned node
`ACTION`	`unstick`	nodeId				User unpinned node
`ACTION`	`lock`	nodeId				User locked node
`ACTION`	`unlock`	nodeId				User unlocked node
`ACTION`	`select`	nodeId				User selected node
`ACTION`	`deselect`	nodeId				User deselected node
`ACTION`	`expand`	clusterId				User expanded cluster
`ACTION`	`collapse`	clusterId				User collapsed cluster
`ACTION`	`importance`	nodeId		scaleFactor		User changed importance
`DISTANCE`	spatial	nodeA	nodeB	distance		User-defined spatial relation
`EDGE`	user-group	nodeA	nodeB	weight		User-created grouping edge
`EDGE`	remove-edge	nodeA	nodeB		edgeType	User removed edge

Coalescence: High-frequency rows (select/deselect, rapid DISTANCE updates) accumulate in the CSV. A background process (or the graph engine itself) can periodically “rewrite” – replacing 50 select/deselect toggles with a single weighted “attention” edge. This is the hypergraph forgetting: recent events are granular, old events coalesce into summary edges. The append-only log IS the full history; the coalesced view is a projection.

Position-as-Distance Model

When a user drags node A (throttled at 500ms during drag, plus final on drag-end):

Find K nearest visible neighbors (K = min(8, visible_neighbors))
Compute Euclidean distance from A to each neighbor
Send DISTANCE rows over WebSocket
These become spring rest-lengths in the physics engine

Triangulation: 3+ distance constraints in 2D uniquely determine position (up to rotation/reflection, resolved by the physics sim). Adaptive K ensures sparse areas don’t over-constrain while dense areas are robust.

Recovery: On reload, the layout engine reads the most recent DISTANCE edges per node-pair and uses them as spring rest-lengths. Position is fully recoverable from the CSV alone – no localStorage needed.

Cluster Expand/Collapse Without Disturbance

Expand cluster C:

Send ACTION:expand, source:C
Cluster node replaced by its member nodes, placed at cluster’s position
Members spread using their internal DISTANCE edges (recorded when cluster was last open)
Surrounding nodes DO NOT MOVE – their distance-edges reference each other, not cluster internals
Space grows from cluster position outward

Collapse cluster C:

Record member-to-member distances (send DISTANCE rows)
Send ACTION:collapse, source:C
Members merge into cluster node at centroid
Surrounding distance-edges reference the cluster node
No position disturbance

X-Key Rewind: Committed vs Transient

Committed state: everything in the CSV
Transient state: current frame positions from visual interactions
X-key rewind: interpolates back to positions implied by most recent DISTANCE edges
Replaces initialLayoutPositions, savedPositions, stickyNodes – CSV is truth

User Clusters as Edges

Creating a “user cluster” = creating N edges of type user-group between selected nodes, with weights derived from current screen distances. Deleting = sending remove-edge rows. Claude naming can happen server-side as reaction to seeing new user-group edges.

Development Options: Prioritized Analysis

Option A: WebSocket Bidirectional Stream

What: Replace HTTP POST with WebSocket for frontend->backend communication. Backend appends to CSV and rebroadcasts.

Dimension	Assessment
Importance	CRITICAL – foundation for everything else. Without this, every drag emits HTTP requests at 500ms intervals. WebSocket makes the throttled stream viable.
Ease	Medium. Node.js has `ws` package or built-in WebSocket via `node:http` upgrade. Frontend WebSocket API is native. The existing SSE streaming code in `streamer.mjs` provides a pattern.
Cost	~200 lines backend (WebSocket server, CSV append, broadcast), ~100 lines frontend (WebSocket client, send/receive). 1-2 sessions.
Probability of success	95%. WebSocket is well-understood. The only risk is coordinating with the existing SSE streams during transition.
Dependencies	None. Can be added alongside existing endpoints.

Option B: Frontend Sends DISTANCE Rows on Drag

What: Wire up nodeDrag.on('drag') (throttled 500ms) and nodeDrag.on('end') to compute adaptive K distances and send over the stream.

Dimension	Assessment
Importance	HIGH – this is the core of “position as relational distance.” Without it, the CSV can’t recover positions.
Ease	Easy. The spatial grid (`_nearbyNodes`) already exists for collision detection. Computing K nearest neighbors is a minor extension.
Cost	~80 lines. Find neighbors, compute distances, format rows, send.
Probability of success	90%. Straightforward computation. The 10% risk is calibrating K and distance scaling so the physics engine converges to the right layout on reload.
Dependencies	Option A (needs stream to send over).

Option C: Frontend Sends ACTION Rows (stick/lock/select/importance)

What: Wire up all state-mutation points to also send ACTION rows over the stream.

Dimension	Assessment
Importance	MEDIUM-HIGH – completes the “all actions are hypergraph mutations” vision. But the app works without it (local state still functions).
Ease	Easy. Each integration point is 2-5 lines: format a row, send it. The integration points are well-identified in the review.
Cost	~60 lines across 8-10 call sites.
Probability of success	95%. Trivial per-site changes. Only risk is missing an integration point.
Dependencies	Option A.

Option D: Backend CSV Append + Rebroadcast

What: Server-side handler that receives rows from WebSocket, validates, appends to history.csv, and rebroadcasts to other connected clients.

Dimension	Assessment
Importance	CRITICAL – paired with Option A. The backend must persist what the frontend sends.
Ease	Easy. `appendFileSync` or `fs.createWriteStream` in append mode. Validation is a type whitelist. Broadcast reuses existing SSE pattern.
Cost	~100 lines. Row validation, CSV formatting, file append, broadcast.
Probability of success	95%. File append is simple. Risk: concurrent writes if multiple clients connect (mitigated by serializing through a write queue).
Dependencies	Option A (WebSocket server).

Option E: DISTANCE Edges in Layout Engine

What: Modify GraphPhysics and computeLayout to read DISTANCE edges as spring rest-lengths, with recent ones overriding older ones.

Dimension	Assessment
Importance	HIGH – without this, DISTANCE rows are recorded but don’t affect layout. Position recovery on reload won’t work.
Ease	Medium. `GraphPhysics` already has edge attraction with type multipliers (`EDGE_TYPE_MULT`). Adding DISTANCE as another type with rest-length semantics fits the architecture. The “most recent overrides” logic needs a timestamp index.
Cost	~150 lines. New edge type handling in physics tick, timestamp-based deduplication, rest-length spring math.
Probability of success	75%. The physics engine may need tuning to balance DISTANCE springs against other forces. Risk of oscillation if DISTANCE constraints conflict with cluster forces. Needs careful testing.
Dependencies	Option B (DISTANCE rows must exist).

Option F: User Clusters as Edges

What: Replace /cluster POST/DELETE with user-group EDGE rows sent over WebSocket. Remove the codemap-based cluster persistence.

Dimension	Assessment
Importance	MEDIUM – conceptual unification. Current cluster system works but is a separate persistence mechanism (codemap file).
Ease	Medium. The cluster creation UI, rendering, and deletion all need rewiring. Claude naming becomes a server-side reaction rather than inline.
Cost	~200 lines of changes across frontend and backend.
Probability of success	80%. The codemap format is deeply integrated (historygen reads it, clustering uses it). Removing it has ripple effects. Safer to run both in parallel initially.
Dependencies	Options A+D.

Option G: Expand/Collapse Without Disturbance

What: Implement cluster expand (node->children) and collapse (children->node) operations that preserve surrounding layout.

Dimension	Assessment
Importance	HIGH – the fundamental missing operation from the review. Key to node-cluster duality.
Ease	Hard. This requires: (1) a cluster-as-node representation, (2) meta-edge to individual-edge transition, (3) position animation that grows/shrinks space, (4) DISTANCE recording for recovery.
Cost	~500+ lines. New data structures, new rendering path, new animation.
Probability of success	55%. Most architecturally complex change. The “don’t disturb surrounding nodes” constraint is the hard part – requires the physics engine to selectively freeze nodes. Many edge cases (what if surrounding nodes have DISTANCE edges to members of the expanding cluster?).
Dependencies	Options A+B+D+E.

Option H: X-Key Rewind from CSV State

What: X-key computes target positions from most recent DISTANCE edges and animates back. Remove localStorage-based spatial memory.

Dimension	Assessment
Importance	MEDIUM – clean conceptual simplification, but current X-rewind “works” (returns to `initialLayoutPositions`).
Ease	Medium. Requires solving DISTANCE constraints into positions (essentially running a mini physics sim to converge). The animation path already exists (`animateToPositions`).
Cost	~150 lines. Constraint solver + wiring.
Probability of success	70%. The constraint solver might not converge to the “right” positions if DISTANCE data is sparse or conflicting. Needs the DISTANCE model (Option E) to be solid first.
Dependencies	Option E.

Option I: Coalescence (Graph Forgetting)

What: Background process that rewrites old high-frequency rows into summary edges.

Dimension	Assessment
Importance	LOW (for now) – the CSV will work fine without coalescence for months. Only matters at scale.
Ease	Medium. Read CSV, group by time windows, replace N select/deselect with weighted attention edge, rewrite file.
Cost	~200 lines.
Probability of success	85%. Straightforward data processing. Risk: defining the right coalescence rules.
Dependencies	Options A+C+D (need action rows to exist first).

Recommended Development Sequence

                    ┌──────────────┐
                    │ A: WebSocket │  ← Foundation, do first
                    │   Stream     │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
     ┌────────▼──┐  ┌──────▼─────┐  ┌──▼──────────┐
     │ B: DISTANCE│  │ C: ACTION  │  │ D: Backend  │
     │ on drag   │  │ rows       │  │ CSV append  │
     └────────┬──┘  └────────────┘  └──┬──────────┘
              │                        │
              └──────────┬─────────────┘
                         │
                ┌────────▼────────┐
                │ E: DISTANCE in  │
                │ layout engine   │
                └────────┬────────┘
                         │
              ┌──────────┼──────────┐
              │          │          │
     ┌────────▼──┐ ┌────▼─────┐ ┌─▼──────────┐
     │ F: Clusters│ │H: X-key  │ │G: Expand/  │
     │ as edges  │ │rewind    │ │ Collapse   │
     └───────────┘ └──────────┘ └────────────┘
                                      │
                              ┌───────▼───────┐
                              │ I: Coalescence│
                              └───────────────┘

Tier 1 (Foundation): A + D together. WebSocket + backend append. ~300 lines, 1 session. Without this nothing else works.

Tier 2 (Core value): B + C in parallel. Frontend sends DISTANCE and ACTION rows. ~140 lines, 1 session. This is where user actions start flowing into the CSV.

Tier 3 (Layout integration): E alone. DISTANCE edges affect physics. ~150 lines, needs careful tuning. This is where position recovery becomes real.

Tier 4 (Conceptual cleanup): F + H in parallel. Clusters-as-edges + X-rewind-from-CSV. ~350 lines. Removes legacy persistence mechanisms.

Tier 5 (The big one): G. Expand/collapse. ~500+ lines. Hardest, most impactful, most risky.

Tier 6 (Scale): I. Coalescence. Only when the CSV gets large enough to matter.

Files to Modify

depgraph-server.mjs – WebSocket upgrade handler, CSV append, broadcast
prototypes/index.html – WebSocket client, drag/action emission, layout engine DISTANCE support
stream/streamer.mjs – may need WebSocket forwarding or consolidation

Verification

Open WebSocket connection in browser DevTools Network tab – verify bidirectional messages
Drag a node – check runtime/history.csv for DISTANCE rows within 500ms
Restart the app – verify dragged node returns to committed position
Create user cluster – verify EDGE rows, no POST to /cluster
X-key after drag – verify rewind to CSV-committed positions
Two browser tabs open – verify one tab’s drags appear in the other

Open Questions

Should DISTANCE edges decay? Or are they permanent? The coalescence model suggests they naturally compress over time, but the most recent ones should always be authoritative.
WebSocket vs SSE bidirectional: WebSocket is the obvious choice, but the existing SSE infrastructure works well for server->client. Worth considering: keep SSE for streaming history, add WebSocket only for client->server? Or consolidate everything?
Conflict resolution: If two clients drag the same node simultaneously, whose DISTANCE edges win? Last-write-wins by timestamp is simplest.
CSV file size: With 500ms throttled drags, a 10-minute session with active dragging produces ~1200 DISTANCE rows. Manageable, but coalescence will matter eventually.