TypeGraph: Graph queries without a graph database
April 23, 2026
You keep building the same thing.
Permissions that inherit through a role hierarchy. An org chart where “reports to” needs to be traversable. Content that links to other content — related articles, prerequisites, see-also. A marketplace where buyers, sellers, products, and reviews all connect in ways that matter.
Each time, you model it a little differently. Different tables, different join patterns, different bugs. But the shape of the problem is always the same: entities connected by typed, meaningful relationships, and queries that need to follow those connections.
These are graph problems. You’ve been solving them in SQL for years. The joins are ugly, the recursive CTEs are fragile, but the system is stable. It ships. It works. You move on.
Then someone walks into a planning meeting and says “we need to build a knowledge graph” or “can we RAG this?” or “the AI needs to understand how these entities relate to each other.”
And suddenly the relational model that was good enough for a decade is actively in the way.
AI changed the equation. The data that used to sit in tables and get queried by humans through forms now needs to be traversable by models. An LLM doesn’t want to know that user_id = 47 has role_id = 3 which maps to permission_id = 12. It needs to know that Alice is an Admin, which means she’s also an Editor and a Viewer, and that she manages the Engineering team, which owns these three projects. It needs the graph — the relationships, the inheritance, the structure — not the join tables.
RAG pipelines need to explain why documents are related, not just that their embeddings are similar. Recommendation systems need to traverse actual connections, not approximate them with collaborative filtering. Permission systems that were “fine” now need to be introspectable so an AI agent can reason about what it’s allowed to do.
The graph problems that were hiding in your relational model? They just got promoted to the critical path.
The options aren’t great
You look at what’s available and the choices are:
- Deploy a graph database — Neo4j, Amazon Neptune, etc. Real graph semantics, real query languages. Also real operational overhead: a new database to deploy, monitor, back up, and keep in sync with your primary data store. A new query language to learn. For most applications, this is a sledgehammer for a nail.
- Write more recursive SQL — CTEs, self-joins, adjacency lists. The approach that was already straining under the old requirements now needs to support AI-driven traversals, ontological reasoning, and vector search. The SQL is doing triple duty: encoding the mechanics of traversal, the semantics of your domain, and now the context pipeline for your LLM.
Most teams pick option 2 because option 1 is too much infrastructure for the problem at hand. And that’s the right call — until the SQL becomes the thing everyone’s afraid to touch.
TypeGraph is a third option. It’s an open-source knowledge graph library for TypeScript that runs on the Postgres or SQLite you already have. Graph semantics as an application layer, not a new database. npm install and go.
I built it because I kept hitting this same gap across projects. Permissions, content relationships, knowledge graphs for RAG — these are all graph problems, and they all eventually outgrow flat relational modeling. But they rarely justify a dedicated graph database.
What makes it a graph library and not just an ORM
ORMs model tables. TypeGraph models relationships.
The distinction matters more than it sounds. An ORM gives you foreign keys and joins — which is enough to store a graph, but not enough to reason about one. TypeGraph gives you typed edges with properties, recursive traversals with depth tracking, connectivity algorithms, native fulltext and vector search, and — critically — an ontology layer that lets you declare the meaning of relationships, not just their existence.
The examples below are where “meaning” earns its keep.
Permissions that inherit
Let’s start with something every application has: access control. “Can this user do this thing?” Simple question. Here’s what the SQL looks like when permissions inherit through a role hierarchy:
WITH RECURSIVE role_hierarchy AS (
  SELECT r.id, r.name, r.parent_role_id
  FROM roles r
  JOIN user_roles ur ON ur.role_id = r.id
  WHERE ur.user_id = $1

  UNION ALL

  SELECT r.id, r.name, r.parent_role_id
  FROM roles r
  JOIN role_hierarchy rh ON rh.id = r.parent_role_id
),
effective_permissions AS (
  SELECT DISTINCT p.action, p.resource
  FROM permissions p
  JOIN role_permissions rp ON rp.permission_id = p.id
  JOIN role_hierarchy rh ON rh.id = rp.role_id
)
SELECT EXISTS (
  SELECT 1 FROM effective_permissions
  WHERE action = $2 AND resource = $3
);
It works. You ship it. The recursive CTE is ugly, but it answers the question correctly and the query plan is stable. That’s what matters.
Then the graph gets more complicated
Six months in, requirements pile up.
Permissions need to inherit at the team level too — if you’re on the Engineering team, you inherit Engineering’s permissions on Engineering’s resources. Add a teams table, a user_teams junction, a team_permissions table. Your CTE grows a second recursive branch to walk team memberships alongside role hierarchies.
Then product wants time-bounded delegations — Alice grants her approval rights to Bob while she’s on vacation. Now every role assignment needs valid_from and valid_to columns, and every permission check needs a WHERE valid_from <= now() AND (valid_to IS NULL OR valid_to > now()) clause threaded through every branch of the CTE.
Then resources become hierarchical. Permission on a folder cascades to the files inside it. Another recursive CTE, this time walking the resource tree, joined against the permission CTE you already had.
Then someone asks “why does Alice have edit access to this file?” and you need to return the path — which role granted it, through which inheritance chain, via which team, for which reason. You bolt on array columns to accumulate the path as you traverse. The query planner starts making choices you don’t understand.
Your permission check is now eighty lines of SQL that only two engineers on the team fully understand. The rest copy-paste and hope.
This is the ugly-but-stable phase. It works. Nobody’s happy, but it works.
The real kicker: reasoning
Then a requirement lands that breaks the model entirely.
“Admin implies Editor implies Viewer. If someone is an Admin, they should automatically have every Editor permission, which includes every Viewer permission. We shouldn’t have to grant the same permissions to all three roles manually.”
You reach for SQL and realize: where does that rule live?
You can’t put it in the roles table — that table stores roles, not rules about roles. You could add a role_implies junction and walk it recursively, but that just encodes which roles imply which others. It doesn’t capture what implication means. And the moment you have more than one kind of hierarchical relationship — role inheritance, team inheritance, resource containment — each one needs its own traversal logic, written by hand, in every query that cares about it.
So the rule ends up in application code. A getEffectivePermissions function in your service layer that knows Admin expands to Editor. Middleware that hardcodes if (role === 'admin') also grant editor. The mobile app duplicates the logic in TypeScript. The background worker duplicates it in Python. The data warehouse has its own copy in SQL for analytics queries.
The meaning of your permission model is now scattered across six codebases. It drifts — the mobile app thinks Admin implies Editor, but someone forgot to update the analytics copy, and your “users who can edit” dashboard is wrong for three quarters before anyone notices. And critically: the meaning is invisible to anything that can’t read your source code.
Which, now, includes your LLM.
This is the thing that breaks when AI enters the picture. An AI agent trying to reason about “what can this user do?” can’t read your middleware. It can only see the data. If the meaning of your permission model lives in code, the agent is blind to it. And no amount of prompt engineering fixes that — you can’t describe implicit logic that’s scattered across six services in a system prompt.
The same domain in TypeGraph
Here’s the schema:
const User = defineNode("User", {
  schema: z.object({ username: z.string() }),
});

const Role = defineNode("Role", {
  schema: z.object({ name: z.string() }),
});

const Permission = defineNode("Permission", {
  schema: z.object({
    action: z.string(),
    resource: z.string(),
  }),
});

const hasRole = defineEdge("hasRole");
const hasPermission = defineEdge("hasPermission");
Nodes are entities. Edges are relationships. The graph wires them together:
const graph = defineGraph({
  id: "rbac_system",
  nodes: {
    User: { type: User },
    Role: { type: Role },
    Permission: { type: Permission },
  },
  edges: {
    hasRole: { type: hasRole, from: [User], to: [Role] },
    hasPermission: { type: hasPermission, from: [Role, User], to: [Permission] },
  },
});
hasPermission can originate from either a Role or a User — one relationship type, multiple valid sources. In SQL that’s a junction table with nullable foreign keys, or a separate table per source type. Here it’s from: [Role, User].
Now the permission check:
async function checkPermission(userId: string, action: string) {
  const result = await store
    .query()
    .from("User", "u")
    .whereNode("u", (p) => p.id.eq(userId))
    .optionalTraverse("hasRole", "r_edge")
    .to("Role", "r")
    .traverse("hasPermission", "p_edge")
    .to("Permission", "p")
    .whereNode("p", (p) => p.action.eq(action))
    .execute();

  return result.length > 0;
}
Read it out loud: start at a User, optionally traverse to their Roles, then traverse to Permissions, filter by action. The optionalTraverse means direct user permissions also match — no role required. The TypeGraph version reads like a description of the problem. The CTE reads like an algorithm for solving it.
But the interesting part isn’t the query. It’s what happens next.
Declaring meaning, not just structure
Remember the rule that broke the SQL model — “Admin implies Editor implies Viewer”? In TypeGraph, that’s a declaration in your schema:
const admin = defineEdge("admin");
const editor = defineEdge("editor");
const viewer = defineEdge("viewer");
admin.implies(editor);
editor.implies(viewer);
That’s the whole thing. You’ve now told the system that these relationships chain. When you query for everyone with viewer access, the engine automatically includes people with editor or admin — through the implication chain, at the database level, as part of the compiled query.
You don’t maintain a getEffectivePermissions function. You don’t duplicate the logic in the mobile app. You don’t hardcode if role === 'admin' in middleware. The rule is data. Every query respects it. Every service that reads the graph inherits the semantics for free.
And because it’s data, it’s introspectable. An AI agent can query the ontology directly and get a machine-readable answer about what Admin means — not buried in a TypeScript function three microservices away, but right there in the graph alongside everything else.
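Mechanically, implication is a transitive closure over edge kinds: a query for one kind must also match every kind that eventually implies it. Here's a self-contained sketch of that expansion in plain TypeScript — an illustration of the idea, not TypeGraph's internals (the library resolves this at query-compilation time, in SQL):

```typescript
// Direct implications, as declared in the schema: admin → editor → viewer.
const implies = new Map<string, string[]>([
  ["admin", ["editor"]],
  ["editor", ["viewer"]],
]);

// For a queried kind, collect every kind that grants it, directly or transitively.
function kindsGranting(target: string): Set<string> {
  // Invert the map: "viewer" ← ["editor"], "editor" ← ["admin"].
  const grantedBy = new Map<string, string[]>();
  for (const [from, tos] of implies) {
    for (const to of tos) {
      const list = grantedBy.get(to) ?? [];
      list.push(from);
      grantedBy.set(to, list);
    }
  }

  // BFS over the inverted edges walks the whole implication chain.
  const result = new Set<string>([target]);
  const queue = [target];
  while (queue.length > 0) {
    const kind = queue.shift()!;
    for (const granter of grantedBy.get(kind) ?? []) {
      if (!result.has(granter)) {
        result.add(granter);
        queue.push(granter);
      }
    }
  }
  return result;
}
```

So `kindsGranting("viewer")` yields viewer, editor, and admin — which is exactly why a query for viewer access picks up Admins without any application code knowing the rule exists.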
The same pattern extends beyond implication. subClassOf lets you declare that Podcast is a kind of Media, and queries for Media automatically include Podcasts — add Newsletter later and existing queries pick it up without changes. inverseOf lets you declare that manages is the inverse of reportsTo — create one, get the other for free. equivalentTo maps identical concepts across systems. disjointWith lets you declare that FullTime and Contractor are mutually exclusive — the schema enforces it, no application-level validation drifting out of sync. broader gives you concept hierarchies for taxonomies and tagging.
These aren’t academic features borrowed from OWL/RDF and awkwardly grafted onto TypeScript. They’re practical tools for moving domain rules out of scattered procedural code and into a single place where both your application and your LLMs can see them.
Extending the model
Want team-level inheritance? Add a Team node and a memberOf edge, make Team a valid source for hasPermission, and the traversal naturally expands. No new CTE. No second recursive branch. No new copy of the inheritance rules in your mobile app.
Want to know why a user has a permission? The query result already contains the full traversal path — which user, which role, which permission, through which implication. That’s not an afterthought; it’s the data model.
RAG that actually explains itself
Vector search is the default retrieval strategy for RAG. Embed your documents, embed the query, find the closest chunks. It works well for surface-level similarity. It falls apart when the user types an exact phrase, a proper noun, a SKU, a rare term — embeddings are great at “vibes,” bad at identifiers. It also falls apart when the answer requires connecting information across documents or explaining why two things are related.
“What companies has Elon Musk founded?” Vector similarity will surface chunks that mention Elon Musk. But the answer requires traversing relationships — from the entity “Elon Musk” through “founded” edges to company entities — regardless of which chunks those facts appear in. A flat vector index doesn’t have that structure. A graph does.
TypeGraph gives you three retrieval primitives in the same store: BM25 fulltext (FTS5 on SQLite, tsvector + GIN on Postgres), vector similarity (sqlite-vec or pgvector), and graph traversal. Hybrid RAG means using all three.
Here’s the schema:
import { defineNode, searchable, embedding } from "@nicia-ai/typegraph";

const Document = defineNode("Document", {
  schema: z.object({
    title: searchable({ language: "english" }),
    source: z.string(),
  }),
});

const Chunk = defineNode("Chunk", {
  schema: z.object({
    text: searchable({ language: "english" }),
    position: z.number().int(),
    embedding: embedding(1536),
  }),
});

const Entity = defineNode("Entity", {
  schema: z.object({
    name: searchable({ language: "english" }),
    type: z.enum(["person", "organization", "concept", "product"]),
  }),
});

const containsChunk = defineEdge("containsChunk");
const mentions = defineEdge("mentions");
const relatesTo = defineEdge("relatesTo", {
  schema: z.object({
    relationship: z.string(), // "founded", "competes_with", etc.
  }),
});
searchable() marks fields for fulltext indexing. embedding(1536) declares a vector column with the right backend-specific index. Both stay in sync with node data automatically through every create, update, and delete — no separate pipeline to maintain.
Hybrid retrieval in one query
Fuse BM25 and vector similarity with Reciprocal Rank Fusion, then traverse into the graph for context:
async function hybridContext(query: string) {
  const queryEmbedding = await generateEmbedding(query);

  return store
    .query()
    .from("Chunk", "c")
    .whereNode("c", (c) =>
      c.$fulltext
        .matches(query, 40)
        .and(c.embedding.similarTo(queryEmbedding, 40, { metric: "cosine" })),
    )
    .traverse("mentions", "m")
    .to("Entity", "e")
    .traverse("containsChunk", "d_edge", { direction: "in", from: "c" })
    .to("Document", "d")
    .select((ctx) => ({
      text: ctx.c.text,
      source: ctx.d.title,
      entityName: ctx.e.name,
      entityType: ctx.e.type,
    }))
    .limit(10)
    .execute();
}
Three things are happening in that query. $fulltext.matches() runs the BM25 search. .similarTo() runs the vector search. Both return ranked candidate sets that get fused with RRF inside the same SQL statement. Then the traversal fans out — from: "c" means the second traversal branches from the same Chunk, so each result carries both its mentioned Entities and its parent Document.
The result isn’t “here are chunks that vaguely match.” It’s “here are chunks that scored well on both exact-match and semantic similarity, here’s where they came from, and here are the entities they reference.” That’s context an LLM can actually reason about.
When you need more control over the fusion, there’s a store-level API with tunable weights:
const hits = await store.search.hybrid("Chunk", {
  limit: 10,
  vector: {
    fieldPath: "embedding",
    queryEmbedding: await generateEmbedding(query),
    metric: "cosine",
    k: 50,
  },
  fulltext: {
    query,
    k: 50,
    includeSnippets: true,
  },
  fusion: {
    method: "rrf",
    k: 60,
    weights: { vector: 1.0, fulltext: 1.5 },
  },
});
Bump the fulltext weight when users are typing proper nouns. Bump the vector weight when they’re asking fuzzy conceptual questions. Same data, two knobs, no new infrastructure.
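The fusion formula itself is small: each candidate earns 1/(k + rank) for every list it appears in, scaled by that list's weight, and candidates are sorted by the sum. Here's a plain-TypeScript sketch of weighted RRF — the parameter names echo the options above, but this illustrates the math, not the library's implementation:

```typescript
// Weighted Reciprocal Rank Fusion over ranked candidate lists.
// Each list is an array of ids, best match first. `k` (conventionally 60)
// dampens how much the very top ranks dominate the final score.
function rrfFuse(
  lists: { ids: string[]; weight: number }[],
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, rank) => {
      // rank is 0-based, so the top result in a list scores weight / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// A chunk ranked decently by BOTH retrievers ("b") beats a chunk ranked
// first by only one of them ("a").
const fused = rrfFuse([
  { ids: ["a", "b", "c"], weight: 1.5 }, // fulltext ranking
  { ids: ["b", "c", "d"], weight: 1.0 }, // vector ranking
]);
```

That "appears in both lists" bonus is the whole reason hybrid retrieval handles proper nouns and fuzzy questions at the same time.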
Multi-hop entity traversal
Once you have entity relationships in a graph, multi-hop questions become straightforward:
async function findRelatedEntities(entityName: string, maxHops = 2) {
  return store
    .query()
    .from("Entity", "e")
    .whereNode("e", (e) => e.name.eq(entityName))
    .traverse("relatesTo", "r")
    .recursive({ maxHops, depth: "depth", cyclePolicy: "prevent" })
    .to("Entity", "related")
    .select((ctx) => ({
      name: ctx.related.name,
      depth: ctx.depth,
    }))
    .execute();
}
“Find everything within 2 hops of Elon Musk” — the depth tracking tells you how far each result is from the starting entity, and cyclePolicy: "prevent" stops the traversal from looping back on itself. Try writing that as a SQL CTE that also tracks depth, handles cycles, and returns typed results. It’s possible. It’s not fun.
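For intuition about what the query is doing for you, here's the equivalent hand-rolled traversal over an in-memory adjacency list — depth tracking and cycle prevention included. This is illustrative only (TypeGraph compiles the same logic into a recursive CTE), but it's the bookkeeping you'd otherwise thread through SQL by hand:

```typescript
// BFS with per-result depth and cycle prevention, over a simple adjacency list.
function withinHops(
  adjacency: Map<string, string[]>,
  start: string,
  maxHops: number,
): { id: string; depth: number }[] {
  const seen = new Set<string>([start]); // the "prevent" cycle policy
  const results: { id: string; depth: number }[] = [];
  let frontier = [start];

  for (let depth = 1; depth <= maxHops; depth++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const neighbor of adjacency.get(id) ?? []) {
        if (seen.has(neighbor)) continue; // never loop back to a visited node
        seen.add(neighbor);
        results.push({ id: neighbor, depth });
        next.push(neighbor);
      }
    }
    frontier = next; // expand one hop at a time
  }
  return results;
}
```

Every result carries the hop count at which it was first reached, which is exactly what the `depth: "depth"` option surfaces in the typed query result.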
Why this matters for entity resolution
If you’ve built knowledge graphs from unstructured data — LLM-extracted entities from documents, transcripts, web scrapes — you know the hard part isn’t extraction. It’s deciding when two entities are the same.
“Apple” in a technology article and “Apple” in a recipe blog are not the same entity. But “Apple Inc.” and “Apple” in the same SEC filing probably are. Getting this wrong poisons your graph: either you split entities that should be merged (losing connections) or you merge entities that should be distinct (creating false relationships).
TypeGraph handles this with unique constraints on nodes:
const graph = defineGraph({
  nodes: {
    Entity: {
      type: Entity,
      unique: [
        {
          name: "entity_name_type",
          fields: ["name", "type"],
          scope: "kind",
        },
      ],
    },
  },
  // ...
});
The constraint deduplicates on (name, type) — so ("Apple", "organization") and ("Apple", "fruit") coexist, but you can’t accidentally create two ("Apple", "organization") nodes. And when ingesting new data, getOrCreateByConstraint handles the upsert atomically:
const { node, action } = await store.nodes.Entity.getOrCreateByConstraint(
  "entity_name_type",
  { name: "Apple", type: "organization" },
  { ifExists: "return" },
);
// action: "created" | "found" | "updated" | "resurrected"
You always know whether you created a new entity or found an existing one. No INSERT ... ON CONFLICT gymnastics. No separate “did it already exist?” query. This is the kind of operation that’s trivial to describe but surprisingly annoying to get right in raw SQL, especially when you’re processing documents concurrently.
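The contract is easy to state in miniature. Here's the same semantics sketched over an in-memory map — the real operation is an atomic SQL upsert that stays correct under concurrent writers, which this toy version deliberately ignores:

```typescript
// Miniature get-or-create keyed by a unique constraint on (name, type).
type EntityProps = { name: string; type: string };

const byConstraint = new Map<string, EntityProps>();

function getOrCreateEntity(
  props: EntityProps,
): { node: EntityProps; action: "created" | "found" } {
  // The constraint's fields form the deduplication key.
  const key = `${props.name}\u0000${props.type}`;
  const existing = byConstraint.get(key);
  if (existing) return { node: existing, action: "found" };
  byConstraint.set(key, props);
  return { node: props, action: "created" };
}
```

Two calls with `("Apple", "organization")` yield one node and the actions `created` then `found`; `("Apple", "fruit")` creates a separate node, because the type field is part of the key.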
Pulling a neighborhood for context
Once you’ve retrieved the right chunks, the next question is usually “what else does the LLM need to reason about these?” For the “load an entity and everything it touches” pattern — the detail page, the agent context bundle, the tool-call payload — store.subgraph() pulls a typed neighborhood in one SQL round trip:
const context = await store.subgraph(entityId, {
  edges: ["mentions", "relatesTo", "containsChunk"],
  maxDepth: 2,
  project: {
    nodes: {
      Entity: ["name", "type"],
      Chunk: ["text"],
      Document: ["title", "source"],
    },
  },
});

// Iterate with full type narrowing per kind
for (const node of context.nodes.values()) {
  if (node.kind === "Entity") {
    const related = context.adjacency.get(node.id)?.get("relatesTo") ?? [];
    // ...
  }
}
Under the hood it’s still one WITH RECURSIVE CTE, but the result comes back pre-indexed: nodes keyed by ID, edges organized into forward and reverse adjacency maps. The project option extracts only the fields you need via json_extract() / JSONB paths, which matters when you’re streaming the result into a prompt and every extra kilobyte is tokens you’re paying for.
Compare that to the alternative: N parallel findFrom calls for each edge type, then getByIds calls to hydrate them, then manual bookkeeping to build the adjacency you actually wanted. One query versus O(edges) round trips, and the types flow through the whole thing.
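The pre-indexed shape is the part worth internalizing. Given a flat edge list, the forward/reverse adjacency indexes look roughly like this — a sketch of the shape, not the library's exact types:

```typescript
// Index a flat edge list into forward and reverse adjacency maps:
// node id → edge kind → neighbor ids.
type Edge = { from: string; to: string; kind: string };
type Adjacency = Map<string, Map<string, string[]>>;

function indexEdges(edges: Edge[]): { forward: Adjacency; reverse: Adjacency } {
  const forward: Adjacency = new Map();
  const reverse: Adjacency = new Map();

  const add = (index: Adjacency, nodeId: string, kind: string, otherId: string) => {
    const byKind = index.get(nodeId) ?? new Map<string, string[]>();
    const list = byKind.get(kind) ?? [];
    list.push(otherId);
    byKind.set(kind, list);
    index.set(nodeId, byKind);
  };

  for (const e of edges) {
    add(forward, e.from, e.kind, e.to); // outgoing edges
    add(reverse, e.to, e.kind, e.from); // incoming edges
  }
  return { forward, reverse };
}
```

Once the result is indexed this way, "which chunks mention this entity" or "which document contains this chunk" is a constant-time lookup instead of another round trip.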
Connectivity questions in one line
The long tail of graph questions is surprisingly narrow. “Is A connected to B?” “What’s the shortest path?” “Who’s within 2 hops?” “How many incident edges does this node have?” Writing those as recursive CTEs is repetitive, so TypeGraph ships them as a small facade:
// Is Alice connected to Bob at all? Short-circuits on first hit.
await store.algorithms.canReach(alice, bob, { edges: ["knows"] });
// What's the fewest-hop route between them?
const path = await store.algorithms.shortestPath(alice, bob, {
edges: ["knows", "collaboratesWith"],
maxHops: 6,
});
// Everything Alice can reach within 3 hops, annotated with depth.
await store.algorithms.reachable(alice, { edges: ["knows"], maxHops: 3 });
// Alice's two-hop neighborhood.
await store.algorithms.neighbors(alice, { edges: ["knows"], depth: 2 });
// Count her incident edges.
await store.algorithms.degree(alice, { edges: ["knows"], direction: "both" });
Each call compiles to a single recursive CTE (or a plain COUNT for degree). They honor the same temporal semantics as the rest of the store — pass temporalMode: "asOf" with a timestamp and you get the connectivity of the graph as it existed at that point in time. That’s “which agents could this user invoke on March 15th?” as a single function call.
The algorithms return lightweight { id, kind, depth } records — use them for fast checks and ranking, then hydrate full nodes with getByIds or subgraph() when you need the data.
Recommendations in six lines
Put the query builder and the graph model together and you get things like friend-of-friend recommendations — the kind of thing that sounds simple until you try to write the SQL:
const recommendations = await store
  .query()
  .from("User", "me")
  .whereNode("me", (u) => u.id.eq(currentUserId))
  .traverse("follows", "f1")
  .to("User", "friend")
  .traverse("follows", "f2")
  .to("User", "fof")
  .select((ctx) => ({ handle: ctx.fof.handle }))
  .limit(10)
  .execute();
Start at me, traverse to people I follow, traverse again to people they follow. Six lines, fully typed, readable. The SQL equivalent is a self-join with a NOT IN subquery to exclude existing connections, and it gets worse the moment you want to weight by mutual connection count or filter by activity.
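To make "gets worse" concrete, here's the in-memory version with exactly those extras — excluding yourself and people you already follow, and ranking candidates by mutual-connection count. Illustrative plain TypeScript, not the library:

```typescript
// Friend-of-friend candidates, ranked by how many of my follows also follow them.
function recommend(follows: Map<string, string[]>, me: string): string[] {
  const mine = new Set(follows.get(me) ?? []);
  const counts = new Map<string, number>();

  for (const friend of mine) {
    for (const fof of follows.get(friend) ?? []) {
      if (fof === me || mine.has(fof)) continue; // exclude self + existing follows
      counts.set(fof, (counts.get(fof) ?? 0) + 1); // one vote per mutual connection
    }
  }

  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1]) // most mutual connections first
    .map(([id]) => id);
}
```

In SQL, each of those three concerns — exclusion, grouping, ordering — is another clause tangled into the self-join; in the graph version they're a `whereNode` filter and an aggregation away.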
Type safety through the whole stack
One thing I was determined to get right: types shouldn’t stop at the schema definition.
TypeGraph uses a single Zod schema as the source of truth for each node and edge. That schema drives runtime validation, database storage, and TypeScript type inference. When you write a query, the result type is inferred from your select clause. When you traverse an edge, autocomplete shows you only the valid target node types. When you filter on a property, the compiler knows the property’s type. When you project fields on a subgraph() call, accessing an omitted field is a compile-time error.
// The compiler knows this returns { person: string; project: string }[]
const results = await store
  .query()
  .from("Person", "p")
  .traverse("worksOn", "e")
  .to("Project", "proj")
  .select((ctx) => ({
    person: ctx.p.name, // TS knows this is string
    project: ctx.proj.name, // TS knows this is string
  }))
  .execute();
This seems like table stakes for a TypeScript library, but compare it with writing SQL strings or even most ORM query builders. The error feedback loop is fundamentally different: mistakes surface at compile time in your editor, not at runtime in production when a query returns an unexpected shape.
What TypeGraph is not
Honest boundaries matter more than ambitious claims.
TypeGraph is not a graph database. It’s a graph modeling and query layer that compiles to SQL. It runs on your existing Postgres or SQLite. That gives you unified transactions, no data syncing, and zero new infrastructure — but it also means you inherit the performance characteristics of your underlying database.
For the vast majority of graph workloads — thousands to millions of nodes, traversals up to 5-10 hops deep, the kind of graph queries that show up in application code — this is more than sufficient. The query compiler generates efficient SQL, and your database’s query planner does the rest.
Where it’s not the right tool:
- Billions of edges — at planetary scale, you need a distributed graph engine
- Advanced graph analytics — PageRank, community detection, weighted shortest path (Dijkstra / A*), centrality beyond degree. TypeGraph ships shortest path, reachability, neighborhoods, and degree; for the rest, export edges via .query().traverse() or store.subgraph() and run an in-memory library like graphology
- Real-time streaming graph data — if your graph is updating at high velocity and you need sub-millisecond traversal, you want an in-memory graph engine
TypeGraph is for the much more common case: you have a domain with meaningful relationships, you’re already running a SQL database, and you want to query those relationships — with fulltext, vectors, ontology, and algorithms — without fighting the relational model.
Getting started
npm install @nicia-ai/typegraph zod drizzle-orm better-sqlite3
Define your nodes and edges, create a graph, create a store, and start writing queries. The getting started guide walks through a complete working example. The recipes cover common patterns — RBAC, social graphs, tagging, tree navigation, content versioning. The examples go deeper with full implementations for RAG, product catalogs, and more.
If you’re currently maintaining recursive CTEs, polymorphic join tables, or hand-rolled hierarchy traversals — or if you’ve been putting off a feature because “that’s really a graph problem” — give it a look.