Querying Data

Last updated March 21, 2026

This page explains how to read and combine SynthLink data effectively — which endpoint to call, how to filter results, and how to interpret the fields you receive. It is a usage guide, not a parameter reference.

Choosing an endpoint

SynthLink exposes four read-only endpoints. Start with GET /api/v1/sources to discover available sources and their update intervals, then choose the data endpoint that fits your needs.

GET /api/v1/sources: when you need the catalog

Returns the list of sources with descriptions, update intervals, content_source types, and quality metadata. Use this to build filters or validate inputs.
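To validate inputs against the catalog, you can load GET /api/v1/sources once and check user-supplied source names before you build a data request. The sketch below makes two assumptions: the endpoint returns a JSON array, and each entry carries its identifier in a field called name. Both the field name and the SourceEntry type are hypothetical, so adjust them to the actual response.

```typescript
// Hypothetical shape of one catalog entry; the real field names may differ.
interface SourceEntry {
  name: string; // assumed identifier field, e.g. "openai_news"
}

// Fetch the catalog once and parse the JSON body.
// Assumes the endpoint returns a bare JSON array of entries.
async function fetchSourceCatalog(baseUrl: string): Promise<SourceEntry[]> {
  const res = await fetch(`${baseUrl}/api/v1/sources`);
  if (!res.ok) throw new Error(`GET /api/v1/sources failed: ${res.status}`);
  return (await res.json()) as SourceEntry[];
}

// Build a lookup set so names can be checked without re-scanning the array.
function knownSourceNames(catalog: SourceEntry[]): Set<string> {
  return new Set(catalog.map((s) => s.name));
}

// Reject unknown names before they reach a data endpoint.
function validateSource(input: string, known: Set<string>): string {
  if (!known.has(input)) {
    throw new Error(`Unknown source "${input}"`);
  }
  return input;
}
```

Because the catalog changes far less often than the data, caching the set for the lifetime of the process is usually enough.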

GET /api/v1/documents: when you only need source metadata

Returns collected documents with title, URL, summary, content (when available), source, and timestamps. Use this when you do not need enrichment fields, or when you want to check whether a document exists before querying its insight.

GET /api/v1/insights: when you only need the insight layer

Returns completed insight records with llm_summary, keywords, tags, category, source, and created_at. Use this when you only need the structured interpretation layer.

GET /api/v1/combined: when you need both in one request

Returns document and insight merged into a single object. Use this when rendering feeds, summaries, or any view that needs both source metadata and enrichment fields at the same time.

Filtering results

The data endpoints (/documents, /insights, /combined) accept query parameters for narrowing results. The most useful combinations depend on your access pattern.
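When you combine several of these parameters, it helps to build the query string in one place rather than concatenating it by hand. The sketch below uses the standard URLSearchParams class and skips any value that was not set. buildDataUrl and DataQuery are illustrative names. The helper does not check which endpoint accepts which parameter.

```typescript
// Query parameters described on this page for the data endpoints.
interface DataQuery {
  source?: string;
  content_source?: "rss" | "detail" | "api";
  start_date?: string; // ISO 8601, e.g. "2026-03-20T00:00:00Z"
  end_date?: string;
  limit?: number;
}

// Build a request path for /documents, /insights, or /combined,
// omitting any parameter whose value is undefined.
function buildDataUrl(
  endpoint: "documents" | "insights" | "combined",
  query: DataQuery = {}
): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined) params.set(key, String(value));
  }
  const qs = params.toString();
  return qs ? `/api/v1/${endpoint}?${qs}` : `/api/v1/${endpoint}`;
}
```

URLSearchParams percent-encodes the colons in ISO timestamps (00:00:00 becomes 00%3A00%3A00). That is valid URL encoding and should decode to the same value on the server.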

Source-based filtering

Use source to limit results to a specific source. This is the most direct way to build a feed for one topic area — for example, all OpenAI announcements or all trending GitHub repositories. If you omit source, the API returns results across all sources. You can use source with /insights as well.

Filter by source

GET /api/v1/documents?source=openai_news&limit=20

Insight availability

The /api/v1/insights endpoint returns only completed insight records. If you need to know whether a document already has an insight, use /api/v1/combined and check whether insight is null.
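You can wrap the null check in a small helper that separates enriched records from those still waiting on the insight pipeline. This sketch assumes each /combined item has the document and insight properties used in the examples on this page, with insight set to null when no insight exists yet. CombinedItem is a hypothetical minimal shape.

```typescript
// Hypothetical minimal shape of a /combined item.
interface CombinedItem {
  document: { title: string; url: string };
  insight: { llm_summary: string } | null;
}

// Split combined records into those that already have an insight
// and those that are still pending enrichment.
function partitionByInsight(items: CombinedItem[]) {
  const withInsight: CombinedItem[] = [];
  const pending: CombinedItem[] = [];
  for (const item of items) {
    (item.insight !== null ? withInsight : pending).push(item);
  }
  return { withInsight, pending };
}
```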

Completed insights

GET /api/v1/insights?limit=10

Ingestion method filtering

content_source describes how content was collected, not where it came from. Filter by rss, detail, or api when the ingestion method affects how you process the result — for example, when you expect api-sourced content to be more structured than RSS-only content. This parameter is optional; omit it to return all ingestion methods.
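If your code branches on ingestion method, a type guard keeps content_source values limited to the three documented options. The sketch below also groups an already-fetched result set by method. ContentSource, isContentSource, and groupByContentSource are illustrative names, and the document shape is reduced to the one field the sketch reads.

```typescript
// The three ingestion methods listed on this page.
const CONTENT_SOURCES = ["rss", "detail", "api"] as const;
type ContentSource = (typeof CONTENT_SOURCES)[number];

// Narrow an arbitrary string (e.g. from config or a query string)
// to a valid content_source before using it in a request.
function isContentSource(value: string): value is ContentSource {
  return (CONTENT_SOURCES as readonly string[]).includes(value);
}

// Group fetched documents by how their content was collected.
function groupByContentSource<T extends { content_source: ContentSource }>(
  docs: T[]
): Record<ContentSource, T[]> {
  const groups: Record<ContentSource, T[]> = { rss: [], detail: [], api: [] };
  for (const doc of docs) groups[doc.content_source].push(doc);
  return groups;
}
```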

API-sourced documents only

GET /api/v1/documents?content_source=api&limit=20

Time-based filtering

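Use start_date and end_date to fetch documents inside a time window. This is the right filter for incremental syncs because created_at records when the document entered SynthLink rather than when the source published it. A document crawled some time after publication therefore still falls inside the window that follows your last sync.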

Documents within a time window

GET /api/v1/documents?start_date=2026-03-20T00:00:00Z&end_date=2026-03-24T00:00:00Z&limit=20

The /api/v1/insights endpoint also supports start_date and end_date on the insight created_at timestamp.
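An incremental sync can then request the window between the previous sync and now. This sketch formats timestamps like the example above (UTC, no milliseconds) and builds a /documents request. nextSyncUrl is a hypothetical helper. This page does not say whether the window bounds are inclusive, and it does not describe paging beyond the 100-record maximum. It is therefore safer to deduplicate results on the client and to shorten the window if it might hold more than 100 documents.

```typescript
// Format a Date like the examples on this page: ISO 8601 in UTC
// without milliseconds, e.g. "2026-03-20T00:00:00Z".
function toApiTimestamp(d: Date): string {
  return d.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Build the next incremental-sync request for documents that entered
// SynthLink between the previous sync and now.
function nextSyncUrl(lastSyncedAt: Date, now: Date, limit = 100): string {
  const params = new URLSearchParams({
    start_date: toApiTimestamp(lastSyncedAt),
    end_date: toApiTimestamp(now),
    limit: String(limit),
  });
  return `/api/v1/documents?${params.toString()}`;
}
```

Store the end_date you sent, not the time the response arrived, as the next lastSyncedAt. That way, consecutive windows meet without a gap.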

Limiting result size

Use limit to control how many records are returned. The default is 10 and the maximum is 100. Values less than 1 are coerced to 1. Request only what your UI or pipeline actually needs — fetching 100 records when you display 5 wastes quota and increases latency.
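To keep logs and UI assumptions in line with what the server applies, you can mirror the documented limit rules on the client. This is a sketch, not the server's implementation. The page states the default (10), the maximum (100), and that values below 1 become 1. It does not say how values above 100 or non-integers are handled, so capping at 100 and rounding down are assumptions.

```typescript
// Documented values for the limit parameter.
const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 100;

// Client-side mirror of the limit rules. Assumptions: a missing or NaN value
// falls back to the default, fractions are rounded down, and values above
// 100 are capped (the page states the maximum but not how excess is handled).
function effectiveLimit(requested?: number): number {
  if (requested === undefined || Number.isNaN(requested)) return DEFAULT_LIMIT;
  const whole = Math.floor(requested);
  return Math.min(MAX_LIMIT, Math.max(1, whole));
}
```

For a view that shows five items, pass 5 rather than the maximum; effectiveLimit(5) stays 5.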

Interpreting fields

Several fields carry meanings that are easy to misread without context.

created_at

When the document entered SynthLink — not when the source published it. There is always a gap between these two times, determined by the crawl interval for that source.

content_source

How the content was obtained — rss, detail, or api. This tells you about completeness and reliability of the raw text, not which source the document belongs to.

summary

A raw excerpt from the source — not LLM-generated. This field may be short or inconsistent depending on what the source feed provides.

llm_summary

A generated plain-language summary produced by the insight pipeline. The /insights endpoint returns only completed enrichments. Do not use this as a source of truth — always link back to the original URL.

Access patterns

The following patterns cover most common use cases when building on top of SynthLink.

Source-first exploration

Start by fetching combined records from a specific source, then keep only items that already have an insight. This is useful when you want document metadata and enrichment together without running a separate join.

Source-first

// 1. Fetch combined data for a source and parse the JSON body
const res = await fetch("/api/v1/combined?source=arxiv&limit=20");
const combined = await res.json();

// 2. Keep items that already have an insight
const enriched = combined.filter((item) => item.insight !== null);

Category-first exploration

Use /combined and filter by category on the client side after fetching. This is the fastest way to build a categorized feed without needing to know which sources cover which topics.

Category-first

const res = await fetch("/api/v1/combined?limit=50");
const combined = await res.json();
// Category filtering happens on the client after the body is parsed
const aiResearch = combined.filter(
  (item) => item.insight?.category === "AI Research"
);
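A categorized feed usually needs every category at once rather than a single filtered slice. This sketch groups already-fetched combined records by insight category. It assumes the combined item shape used in the examples on this page and skips records whose insight is null, since they have no category yet. groupByCategory is an illustrative name.

```typescript
// Hypothetical minimal shape of a /combined item for this sketch.
interface CombinedItem {
  document: { title: string; url: string };
  insight: { category: string } | null;
}

// Group combined records by insight category, preserving fetch order
// within each group. Items without an insight are skipped.
function groupByCategory(items: CombinedItem[]): Map<string, CombinedItem[]> {
  const groups = new Map<string, CombinedItem[]>();
  for (const item of items) {
    if (item.insight === null) continue;
    const key = item.insight.category;
    const bucket = groups.get(key);
    if (bucket) bucket.push(item);
    else groups.set(key, [item]);
  }
  return groups;
}
```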

Summary feed

Fetch combined records and use llm_summary for display, with a link to the original url. Only render records where insight is present to avoid showing empty summaries.

Summary feed

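// Fetch combined records and parse the JSON body
const res = await fetch("/api/v1/combined?limit=20");
const combined = await res.json();

// Keep only enriched records and always carry the original url
const feed = combined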
  .filter((item) => item.insight !== null)
  .map((item) => ({
    title: item.document.title,
    summary: item.insight.llm_summary,
    url: item.document.url,
    tags: item.insight.tags,
  }));

Freshness-aware fetching

Align your fetch schedule with each source's update interval to avoid redundant requests. There is no benefit to querying a source more often than it updates. If a single request covers several sources, polling faster than the fastest-updating source among them gains nothing.

Freshness-aware

// openai_news updates every 12h — no need to fetch more often
const TWELVE_HOURS = 12 * 60 * 60 * 1000;

async function maybeRefresh(lastFetchedAt: number) {
  if (Date.now() - lastFetchedAt < TWELVE_HOURS) return;
  return fetch("/api/v1/documents?source=openai_news&limit=10");
}
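With several sources, you can schedule each one on its own interval instead of polling everything at once. The sketch below returns the sources that are currently due. The interval values would come from GET /api/v1/sources, but this page does not specify their format, so converting them to milliseconds is left to the caller. dueSources is a hypothetical helper.

```typescript
// Return the sources whose refresh interval has elapsed since they were
// last fetched. A source that has never been fetched is always due.
function dueSources(
  intervalsMs: Record<string, number>,
  lastFetchedAt: Record<string, number>,
  now: number = Date.now()
): string[] {
  return Object.keys(intervalsMs).filter((source) => {
    const last = lastFetchedAt[source];
    return last === undefined || now - last >= intervalsMs[source];
  });
}
```

Each due source can then be fetched with its own source parameter, and its lastFetchedAt entry updated after a successful response.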

Client-side post-processing

SynthLink returns results in reverse chronological order by default. Most filtering beyond what the query parameters support — category matching, keyword search, deduplication — should happen on the client after fetching.
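Merging pages from several sources into one timeline combines deduplication with a created_at sort. This sketch assumes each document carries url, source, and created_at (as an ISO 8601 string). It uses url as the deduplication key, which is an assumption because this page does not name a unique identifier. mergeTimeline is an illustrative name.

```typescript
// Minimal assumed document shape for this sketch.
interface TimelineDoc {
  url: string;
  source: string;
  created_at: string; // ISO 8601
}

// Merge result pages from several sources: drop repeated URLs
// (keeping the first copy seen), then sort newest first by created_at.
function mergeTimeline(...pages: TimelineDoc[][]): TimelineDoc[] {
  const seen = new Set<string>();
  const merged: TimelineDoc[] = [];
  for (const page of pages) {
    for (const doc of page) {
      if (seen.has(doc.url)) continue;
      seen.add(doc.url);
      merged.push(doc);
    }
  }
  return merged.sort(
    (a, b) => Date.parse(b.created_at) - Date.parse(a.created_at)
  );
}
```

Array.prototype.sort is stable in modern JavaScript engines, so documents with the same created_at keep the order in which they were merged.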

A few patterns are worth keeping in mind. When displaying multiple sources together, sort by created_at descending to get a unified timeline. When showing enriched content, use /api/v1/insights (which returns completed records only) or filter /api/v1/combined to insight !== null before rendering insight fields. When building search, prefer keywords and tags over llm_summary for matching — they are more consistent and compact.
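Matching on keywords and tags can be as simple as a case-insensitive comparison against both arrays. This sketch assumes keywords and tags are arrays of strings on each insight, which this page does not state explicitly. It intentionally ignores llm_summary. searchInsights is an illustrative name.

```typescript
// Assumed insight shape: keywords and tags as arrays of strings.
interface InsightLike {
  keywords: string[];
  tags: string[];
  llm_summary: string;
}

// Case-insensitive exact match of a term against keywords and tags only;
// llm_summary is deliberately not searched.
function matchesTerm(insight: InsightLike, term: string): boolean {
  const needle = term.trim().toLowerCase();
  if (needle === "") return false;
  return insight.keywords
    .concat(insight.tags)
    .some((value) => value.toLowerCase() === needle);
}

function searchInsights<T extends InsightLike>(insights: T[], term: string): T[] {
  return insights.filter((i) => matchesTerm(i, term));
}
```

Exact matching is a deliberate simplification. Prefix or fuzzy matching would be a separate design choice.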

Warning: Never treat llm_summary as a verbatim quote from the source. It is a generated interpretation. Always provide a link to the original url for verification.
