Querying Data
Last updated March 21, 2026
This page explains how to read and combine SynthLink data effectively — which endpoint to call, how to filter results, and how to interpret the fields you receive. It is a usage guide, not a parameter reference.
Choosing an endpoint
SynthLink exposes four read-only endpoints. Start with GET /api/v1/sources to discover available sources and update intervals, then choose the data endpoint that fits your needs.
GET /api/v1/sources
When you need the catalog. Returns the list of sources with descriptions, update intervals, content_source types, and quality metadata. Use this to build filters or validate inputs.
GET /api/v1/documents
When you only need source metadata. Returns collected documents with title, URL, summary, content (when available), source, and timestamps. Use this when you do not need enrichment fields, or when you want to check whether a document exists before querying its insight.
GET /api/v1/insights
When you only need the insight layer. Returns completed insight records with llm_summary, keywords, tags, category, source, and created_at. Use this when you only need the structured interpretation layer.
GET /api/v1/combined
When you need both in one request. Returns document and insight merged into a single object. Use this when rendering feeds, summaries, or any view that needs both source metadata and enrichment fields at the same time.
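The choice between the four endpoints can be sketched as a small decision function. This is only an illustration of the guidance above: endpointFor and the Needs type are hypothetical names, not part of SynthLink.

```typescript
// Hypothetical helper (not part of SynthLink): maps what a view needs
// to one of the four read-only endpoints described above.
type Needs = { document: boolean; insight: boolean };

function endpointFor(needs: Needs): string {
  // Both layers in one request.
  if (needs.document && needs.insight) return "/api/v1/combined";
  // Only the structured interpretation layer.
  if (needs.insight) return "/api/v1/insights";
  // Only the collected documents.
  if (needs.document) return "/api/v1/documents";
  // Neither layer requested: fall back to the source catalog.
  return "/api/v1/sources";
}
```

For example, a feed that renders titles alongside generated summaries would call endpointFor({ document: true, insight: true }) and request /api/v1/combined.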
Filtering results
The data endpoints (/documents, /insights, /combined) accept query parameters for narrowing results. The most useful combinations depend on your access pattern.
Source-based filtering
Use source to limit results to a specific source. This is the most direct way to build a feed for one topic area — for example, all OpenAI announcements or all trending GitHub repositories. If you omit source, the API returns results across all sources. You can use source with /insights as well.
GET /api/v1/documents?source=openai_news&limit=20
Insight availability
The /api/v1/insights endpoint returns only completed insight records. If you need to know whether a document already has an insight, use /api/v1/combined and check whether insight is null.
GET /api/v1/insights?limit=10
Ingestion method filtering
content_source describes how content was collected, not where it came from. Filter by rss, detail, or api when the ingestion method affects how you process the result — for example, when you expect api-sourced content to be more structured than RSS-only content. This parameter is optional; omit it to return all ingestion methods.
GET /api/v1/documents?content_source=api&limit=20
Time-based filtering
Use start_date and end_date to fetch documents inside a time window. This is the right filter for incremental syncs, because created_at represents when the document entered SynthLink.
GET /api/v1/documents?start_date=2026-03-20T00:00:00Z&end_date=2026-03-24T00:00:00Z&limit=20
The /api/v1/insights endpoint also supports start_date and end_date on the insight created_at timestamp.
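An incremental sync built on these parameters might look like the sketch below. The helper names toApiTimestamp and nextSyncPath are hypothetical. The timestamp format is copied from the example request above, and it is an assumption that the API expects exactly that form. This guide does not say whether the window boundaries are inclusive, so a client that syncs back-to-back windows may want to deduplicate by url.

```typescript
// Formats a Date like the example request above (UTC, no milliseconds).
// Assumption: the API expects this form; the guide does not say otherwise.
function toApiTimestamp(date: Date): string {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper: builds the request path for the window between the
// last successful sync and now.
function nextSyncPath(lastSyncedAt: Date, now: Date, limit = 100): string {
  const params = new URLSearchParams({
    start_date: toApiTimestamp(lastSyncedAt),
    end_date: toApiTimestamp(now),
    limit: String(limit),
  });
  return `/api/v1/documents?${params.toString()}`;
}
```

A caller would store the value used for now after each successful sync and pass it as lastSyncedAt on the next run.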
Limiting result size
Use limit to control how many records are returned. The default is 10 and the maximum is 100. Values less than 1 are coerced to 1. Request only what your UI or pipeline actually needs — fetching 100 records when you display 5 wastes quota and increases latency.
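If the limit comes from user input or configuration, it can be normalized on the client before the request is built. The sketch below mirrors the rules stated above (default 10, minimum 1, maximum 100). clampLimit is a hypothetical helper. Capping values above 100 on the client is a choice made for this example, since this guide does not say how the API responds to larger values.

```typescript
const DEFAULT_LIMIT = 10; // documented default
const MAX_LIMIT = 100;    // documented maximum

// Hypothetical helper: returns a limit the API is documented to accept.
function clampLimit(requested?: number): number {
  // Missing or non-numeric input falls back to the documented default.
  if (requested === undefined || Number.isNaN(requested)) return DEFAULT_LIMIT;
  // Whole numbers only, kept within 1..100.
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(requested)));
}
```

With this in place, a UI that displays five items can pass clampLimit(5) and avoid requesting records it will never render.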
Interpreting fields
Several fields carry meanings that are easy to misread without context.
created_at
When the document entered SynthLink, not when the source published it. There is always a gap between these two times, determined by the crawl interval for that source.
content_source
How the content was obtained: rss, detail, or api. This tells you about completeness and reliability of the raw text, not which source the document belongs to.
summary
A raw excerpt from the source, not LLM-generated. This field may be short or inconsistent depending on what the source feed provides.
llm_summary
A generated plain-language summary produced by the insight pipeline. The /insights endpoint returns only completed enrichments. Do not use this as a source of truth; always link back to the original URL.
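Because summary and llm_summary have different origins, a client may want to track which one it is displaying. The sketch below assumes a /combined record shaped like the access-pattern examples in this guide (item.document, item.insight). Placing summary under document is an assumption, and pickSummary and the kind labels are hypothetical names.

```typescript
// Assumed shape of a /combined record, limited to the fields used here.
interface CombinedItem {
  document: { title: string; url: string; summary?: string | null };
  insight: { llm_summary: string } | null;
}

type DisplaySummary = {
  text: string;
  kind: "generated" | "excerpt" | "none"; // where the text came from
  url: string;                            // always kept for verification
};

// Hypothetical helper: prefers the generated summary, falls back to the raw
// excerpt, and records which one was used.
function pickSummary(item: CombinedItem): DisplaySummary {
  const url = item.document.url;
  if (item.insight !== null) {
    return { text: item.insight.llm_summary, kind: "generated", url };
  }
  const excerpt = item.document.summary?.trim();
  if (excerpt) return { text: excerpt, kind: "excerpt", url };
  return { text: "", kind: "none", url };
}
```

Keeping kind next to the text lets the UI label generated summaries as such and always render the link to the original URL.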
Access patterns
The following patterns cover most common use cases when building on top of SynthLink.
Source-first exploration
Start by fetching combined records from a specific source, then keep only items that already have an insight. This is useful when you want document metadata and enrichment together without running a separate join.
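// 1. Fetch combined data for a source and parse the JSON body
//    (this example assumes the body is an array of combined records)
const response = await fetch("/api/v1/combined?source=arxiv&limit=20");
const combined = await response.json();
// 2. Keep items that already have an insight
const enriched = combined.filter((item) => item.insight !== null);
Category-first exploration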
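Use /combined and filter by category on the client side after fetching. This is the simplest way to build a categorized feed without needing to know which sources cover which topics. Note that the filter only sees the records returned by that request, so a small limit can leave a category with few or no matches.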
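// Parse the JSON body before filtering (assumed to be an array of records)
const response = await fetch("/api/v1/combined?limit=50");
const combined = await response.json();
const aiResearch = combined.filter(
  (item) => item.insight?.category === "AI Research"
);
Summary feed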
Fetch combined records and use llm_summary for display, with a link to the original url. Only render records where insight is present to avoid showing empty summaries.
// combined is the parsed array of records from /api/v1/combined
const feed = combined
  .filter((item) => item.insight !== null)
  .map((item) => ({
    title: item.document.title,
    summary: item.insight.llm_summary,
    url: item.document.url,
    tags: item.insight.tags,
  }));
Freshness-aware fetching
Align your fetch schedule with the source update interval to avoid redundant requests. There is no benefit to querying a source more often than it updates.
// openai_news updates every 12h, so there is no need to fetch more often
const TWELVE_HOURS = 12 * 60 * 60 * 1000;
// Resolves to undefined when the last fetch is still fresh
async function maybeRefresh(lastFetchedAt: number) {
  if (Date.now() - lastFetchedAt < TWELVE_HOURS) return;
  return fetch("/api/v1/documents?source=openai_news&limit=10");
}
Client-side post-processing
SynthLink returns results in reverse chronological order by default. Most filtering beyond what the query parameters support — category matching, keyword search, deduplication — should happen on the client after fetching.
A few patterns are worth keeping in mind. When displaying multiple sources together, sort by created_at descending to get a unified timeline. When showing enriched content, use /api/v1/insights (which returns completed records only) or filter /api/v1/combined to insight !== null before rendering insight fields. When building search, prefer keywords and tags over llm_summary for matching — they are more consistent and compact.
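The timeline, deduplication, and search patterns above can be sketched as two small functions. The record shape is an assumption: created_at is placed under document, and keywords and tags are treated as string arrays, which this guide does not state explicitly. unifiedTimeline and matchesTerm are hypothetical names.

```typescript
// Assumed record shape, limited to the fields these helpers touch.
interface FeedItem {
  document: { url: string; created_at: string };
  insight: { keywords: string[]; tags: string[] } | null;
}

// Sorts newest first by created_at, then keeps the first (newest) entry
// for each URL.
function unifiedTimeline<T extends FeedItem>(items: T[]): T[] {
  const seen = new Set<string>();
  return [...items]
    .sort(
      (a, b) =>
        Date.parse(b.document.created_at) - Date.parse(a.document.created_at)
    )
    .filter((item) => {
      if (seen.has(item.document.url)) return false;
      seen.add(item.document.url);
      return true;
    });
}

// Case-insensitive substring match against keywords and tags only;
// llm_summary is deliberately not searched.
function matchesTerm(item: FeedItem, term: string): boolean {
  if (item.insight === null) return false;
  const needle = term.trim().toLowerCase();
  if (needle === "") return false;
  return [...item.insight.keywords, ...item.insight.tags].some((value) =>
    value.toLowerCase().includes(needle)
  );
}
```

A search view could then be built as unifiedTimeline(items).filter((item) => matchesTerm(item, query)).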
Warning: Never treat llm_summary as a verbatim quote from the source. It is a generated interpretation. Always provide a link to the original url for verification.