depgraph

Plan: Hypergraph-Native Architecture – Frontend Writes Back to CSV

Context

The prototype currently treats the CSV as a read-only data source generated by backend analysis. The user wants to flip this: every user action becomes a hypergraph mutation appended to the CSV. The CSV becomes the single source of truth – an append-only event log of the hypergraph’s evolution. This aligns with the VISION.md concept of “time as computational progression” and “rule application scrubber.”

Key principles:

  1. Node positions are relational distances (edges to neighbors), not absolute coordinates. Position is emergent from triangulation.
  2. User clusters are just user-created edges with weights/distances.
  3. ALL user actions become CSV rows – including high-frequency ones like select/deselect. The hypergraph itself handles coalescence: high-volume rewrites naturally decay into unified edges over time. Actions taken ON a selection (commit, cluster, lock) are more durable.
  4. Cluster expand/collapse must not disturb surrounding nodes.
  5. Transient visual actions (Ctrl+click repulse, attractor) are visual-only. X-key rewinds to committed CSV state.

Decisions Made

Architecture

Bidirectional WebSocket Stream

Replace the current unidirectional SSE (server->client) + HTTP POST (client->server) with a single WebSocket connection that carries both directions.

Protocol:

Client -> Server (JSON lines):
  {"rows": [{"t":1234,"type":"DISTANCE","label":"spatial","source":"A","target":"B","importance_xi":142.5}, ...]}

Server -> Client (JSON lines, same format):
  {"rows": [...]}  // rows from other clients, or from backend analysis
  {"ack": 5, "t": 1234}  // acknowledgement of N rows received

The existing SSE streams (/stream, /graph-events, /focus-events) can be consolidated into this single WebSocket over time, but initially the WebSocket handles only frontend->backend append + backend->frontend rebroadcast.

The CSV as Append-Only Event Log

Current CSV format: t,type,label,source,target,importance_xi,cluster

Extended format (backwards compatible – new row types, same columns):

type label source target importance_xi cluster meaning
ACTION stick nodeId       User pinned node
ACTION unstick nodeId       User unpinned node
ACTION lock nodeId       User locked node
ACTION unlock nodeId       User unlocked node
ACTION select nodeId       User selected node
ACTION deselect nodeId       User deselected node
ACTION expand clusterId       User expanded cluster
ACTION collapse clusterId       User collapsed cluster
ACTION importance nodeId   scaleFactor   User changed importance
DISTANCE spatial nodeA nodeB distance   User-defined spatial relation
EDGE user-group nodeA nodeB weight   User-created grouping edge
EDGE remove-edge nodeA nodeB   edgeType User removed edge

Coalescence: High-frequency rows (select/deselect, rapid DISTANCE updates) accumulate in the CSV. A background process (or the graph engine itself) can periodically “rewrite” – replacing 50 select/deselect toggles with a single weighted “attention” edge. This is the hypergraph forgetting: recent events are granular, old events coalesce into summary edges. The append-only log IS the full history; the coalesced view is a projection.

Position-as-Distance Model

When a user drags node A (throttled at 500ms during drag, plus final on drag-end):

  1. Find K nearest visible neighbors (K = min(8, visible_neighbors))
  2. Compute Euclidean distance from A to each neighbor
  3. Send DISTANCE rows over WebSocket
  4. These become spring rest-lengths in the physics engine

Triangulation: 3+ distance constraints in 2D uniquely determine position (up to rotation/reflection, resolved by the physics sim). Adaptive K ensures sparse areas don’t over-constrain while dense areas are robust.

Recovery: On reload, the layout engine reads the most recent DISTANCE edges per node-pair and uses them as spring rest-lengths. Position is fully recoverable from the CSV alone – no localStorage needed.

Cluster Expand/Collapse Without Disturbance

Expand cluster C:

  1. Send ACTION:expand, source:C
  2. Cluster node replaced by its member nodes, placed at cluster’s position
  3. Members spread using their internal DISTANCE edges (recorded when cluster was last open)
  4. Surrounding nodes DO NOT MOVE – their distance-edges reference each other, not cluster internals
  5. Space grows from cluster position outward

Collapse cluster C:

  1. Record member-to-member distances (send DISTANCE rows)
  2. Send ACTION:collapse, source:C
  3. Members merge into cluster node at centroid
  4. Surrounding distance-edges reference the cluster node
  5. No position disturbance

X-Key Rewind: Committed vs Transient

User Clusters as Edges

Creating a “user cluster” = creating N edges of type user-group between selected nodes, with weights derived from current screen distances. Deleting = sending remove-edge rows. Claude naming can happen server-side as reaction to seeing new user-group edges.


Development Options: Prioritized Analysis

Option A: WebSocket Bidirectional Stream

What: Replace HTTP POST with WebSocket for frontend->backend communication. Backend appends to CSV and rebroadcasts.

Dimension Assessment
Importance CRITICAL – foundation for everything else. Without this, every drag emits HTTP requests at 500ms intervals. WebSocket makes the throttled stream viable.
Ease Medium. Node.js has ws package or built-in WebSocket via node:http upgrade. Frontend WebSocket API is native. The existing SSE streaming code in streamer.mjs provides a pattern.
Cost ~200 lines backend (WebSocket server, CSV append, broadcast), ~100 lines frontend (WebSocket client, send/receive). 1-2 sessions.
Probability of success 95%. WebSocket is well-understood. The only risk is coordinating with the existing SSE streams during transition.
Dependencies None. Can be added alongside existing endpoints.

Option B: Frontend Sends DISTANCE Rows on Drag

What: Wire up nodeDrag.on('drag') (throttled 500ms) and nodeDrag.on('end') to compute adaptive K distances and send over the stream.

Dimension Assessment
Importance HIGH – this is the core of “position as relational distance.” Without it, the CSV can’t recover positions.
Ease Easy. The spatial grid (_nearbyNodes) already exists for collision detection. Computing K nearest neighbors is a minor extension.
Cost ~80 lines. Find neighbors, compute distances, format rows, send.
Probability of success 90%. Straightforward computation. The 10% risk is calibrating K and distance scaling so the physics engine converges to the right layout on reload.
Dependencies Option A (needs stream to send over).

Option C: Frontend Sends ACTION Rows (stick/lock/select/importance)

What: Wire up all state-mutation points to also send ACTION rows over the stream.

Dimension Assessment
Importance MEDIUM-HIGH – completes the “all actions are hypergraph mutations” vision. But the app works without it (local state still functions).
Ease Easy. Each integration point is 2-5 lines: format a row, send it. The integration points are well-identified in the review.
Cost ~60 lines across 8-10 call sites.
Probability of success 95%. Trivial per-site changes. Only risk is missing an integration point.
Dependencies Option A.

Option D: Backend CSV Append + Rebroadcast

What: Server-side handler that receives rows from WebSocket, validates, appends to history.csv, and rebroadcasts to other connected clients.

Dimension Assessment
Importance CRITICAL – paired with Option A. The backend must persist what the frontend sends.
Ease Easy. appendFileSync or fs.createWriteStream in append mode. Validation is a type whitelist. Broadcast reuses existing SSE pattern.
Cost ~100 lines. Row validation, CSV formatting, file append, broadcast.
Probability of success 95%. File append is simple. Risk: concurrent writes if multiple clients connect (mitigated by serializing through a write queue).
Dependencies Option A (WebSocket server).

Option E: DISTANCE Edges in Layout Engine

What: Modify GraphPhysics and computeLayout to read DISTANCE edges as spring rest-lengths, with recent ones overriding older ones.

Dimension Assessment
Importance HIGH – without this, DISTANCE rows are recorded but don’t affect layout. Position recovery on reload won’t work.
Ease Medium. GraphPhysics already has edge attraction with type multipliers (EDGE_TYPE_MULT). Adding DISTANCE as another type with rest-length semantics fits the architecture. The “most recent overrides” logic needs a timestamp index.
Cost ~150 lines. New edge type handling in physics tick, timestamp-based deduplication, rest-length spring math.
Probability of success 75%. The physics engine may need tuning to balance DISTANCE springs against other forces. Risk of oscillation if DISTANCE constraints conflict with cluster forces. Needs careful testing.
Dependencies Option B (DISTANCE rows must exist).

Option F: User Clusters as Edges

What: Replace /cluster POST/DELETE with user-group EDGE rows sent over WebSocket. Remove the codemap-based cluster persistence.

Dimension Assessment
Importance MEDIUM – conceptual unification. Current cluster system works but is a separate persistence mechanism (codemap file).
Ease Medium. The cluster creation UI, rendering, and deletion all need rewiring. Claude naming becomes a server-side reaction rather than inline.
Cost ~200 lines of changes across frontend and backend.
Probability of success 80%. The codemap format is deeply integrated (historygen reads it, clustering uses it). Removing it has ripple effects. Safer to run both in parallel initially.
Dependencies Options A+D.

Option G: Expand/Collapse Without Disturbance

What: Implement cluster expand (node->children) and collapse (children->node) operations that preserve surrounding layout.

Dimension Assessment
Importance HIGH – the fundamental missing operation from the review. Key to node-cluster duality.
Ease Hard. This requires: (1) a cluster-as-node representation, (2) meta-edge to individual-edge transition, (3) position animation that grows/shrinks space, (4) DISTANCE recording for recovery.
Cost ~500+ lines. New data structures, new rendering path, new animation.
Probability of success 55%. Most architecturally complex change. The “don’t disturb surrounding nodes” constraint is the hard part – requires the physics engine to selectively freeze nodes. Many edge cases (what if surrounding nodes have DISTANCE edges to members of the expanding cluster?).
Dependencies Options A+B+D+E.

Option H: X-Key Rewind from CSV State

What: X-key computes target positions from most recent DISTANCE edges and animates back. Remove localStorage-based spatial memory.

Dimension Assessment
Importance MEDIUM – clean conceptual simplification, but current X-rewind “works” (returns to initialLayoutPositions).
Ease Medium. Requires solving DISTANCE constraints into positions (essentially running a mini physics sim to converge). The animation path already exists (animateToPositions).
Cost ~150 lines. Constraint solver + wiring.
Probability of success 70%. The constraint solver might not converge to the “right” positions if DISTANCE data is sparse or conflicting. Needs the DISTANCE model (Option E) to be solid first.
Dependencies Option E.

Option I: Coalescence (Graph Forgetting)

What: Background process that rewrites old high-frequency rows into summary edges.

Dimension Assessment
Importance LOW (for now) – the CSV will work fine without coalescence for months. Only matters at scale.
Ease Medium. Read CSV, group by time windows, replace N select/deselect with weighted attention edge, rewrite file.
Cost ~200 lines.
Probability of success 85%. Straightforward data processing. Risk: defining the right coalescence rules.
Dependencies Options A+C+D (need action rows to exist first).

                    ┌──────────────┐
                    │ A: WebSocket │  ← Foundation, do first
                    │   Stream     │
                    └──────┬───────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
     ┌────────▼──┐  ┌──────▼─────┐  ┌──▼──────────┐
     │ B: DISTANCE│  │ C: ACTION  │  │ D: Backend  │
     │ on drag   │  │ rows       │  │ CSV append  │
     └────────┬──┘  └────────────┘  └──┬──────────┘
              │                        │
              └──────────┬─────────────┘
                         │
                ┌────────▼────────┐
                │ E: DISTANCE in  │
                │ layout engine   │
                └────────┬────────┘
                         │
              ┌──────────┼──────────┐
              │          │          │
     ┌────────▼──┐ ┌────▼─────┐ ┌─▼──────────┐
     │ F: Clusters│ │H: X-key  │ │G: Expand/  │
     │ as edges  │ │rewind    │ │ Collapse   │
     └───────────┘ └──────────┘ └────────────┘
                                      │
                              ┌───────▼───────┐
                              │ I: Coalescence│
                              └───────────────┘

Tier 1 (Foundation): A + D together. WebSocket + backend append. ~300 lines, 1 session. Without this nothing else works.

Tier 2 (Core value): B + C in parallel. Frontend sends DISTANCE and ACTION rows. ~140 lines, 1 session. This is where user actions start flowing into the CSV.

Tier 3 (Layout integration): E alone. DISTANCE edges affect physics. ~150 lines, needs careful tuning. This is where position recovery becomes real.

Tier 4 (Conceptual cleanup): F + H in parallel. Clusters-as-edges + X-rewind-from-CSV. ~350 lines. Removes legacy persistence mechanisms.

Tier 5 (The big one): G. Expand/collapse. ~500+ lines. Hardest, most impactful, most risky.

Tier 6 (Scale): I. Coalescence. Only when the CSV gets large enough to matter.

Files to Modify

Verification

Open Questions

  1. Should DISTANCE edges decay? Or are they permanent? The coalescence model suggests they naturally compress over time, but the most recent ones should always be authoritative.
  2. WebSocket vs SSE bidirectional: WebSocket is the obvious choice, but the existing SSE infrastructure works well for server->client. Worth considering: keep SSE for streaming history, add WebSocket only for client->server? Or consolidate everything?
  3. Conflict resolution: If two clients drag the same node simultaneously, whose DISTANCE edges win? Last-write-wins by timestamp is simplest.
  4. CSV file size: With 500ms throttled drags, a 10-minute session with active dragging produces ~1200 DISTANCE rows. Manageable, but coalescence will matter eventually.