Concepts
Last updated March 21, 2026
This page explains the core ideas behind SynthLink — how data moves through the system, what each layer does, and why the API is structured the way it is.
Pipeline
SynthLink processes public data through a four-stage pipeline before it reaches your application. Each stage has a distinct responsibility.
The Crawler fetches raw content from public sources on a fixed schedule and normalizes each item into a structured record. The Database deduplicates records by URL and tracks timestamps. The LLM Enrichment stage processes each new document to generate summaries, keywords, tags, and categories. The REST API exposes the enriched records as read-only JSON.
Processing time
After a document is collected, enrichment typically completes within a few minutes, though it may take longer during periods of high volume. To see whether enrichment has finished for a document, check whether its entry in /api/v1/combined includes an insight.
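The check described above can be sketched as a small helper. This is a minimal sketch that assumes each record returned by /api/v1/combined carries its enrichment output under an `insight` property that is absent or null until enrichment finishes. That property name and the helper names are illustrative assumptions, not confirmed response fields.

```javascript
// Hypothetical helper: assumes a combined record exposes its enrichment
// output under an `insight` property (an assumption, not a confirmed field).
function isEnriched(record) {
  return record != null && record.insight != null;
}

// Split a list of combined records into enriched ones and ones without an insight.
function partitionByEnrichment(records) {
  const enriched = [];
  const waiting = [];
  for (const record of records) {
    (isEnriched(record) ? enriched : waiting).push(record);
  }
  return { enriched, waiting };
}
```

A record in the `waiting` group is not necessarily still in progress: a missing insight can also mean that enrichment failed for that document.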
Failure scenarios
If the crawler fails to reach a source, no new documents are written for that cycle. Existing documents remain unchanged. If enrichment fails after the maximum number of retries, the insight record is marked failed and the document is still accessible via /api/v1/documents — only the insight is missing.
Note: The /status page shows the latest crawler run results and enrichment integrity checks so you can verify the pipeline is healthy.
Data freshness
SynthLink is not a real-time API. Data is collected on a schedule that varies by source. There is always some lag between when a document is published and when it appears in the API.
Update intervals by source
created_at
created_at is set when the document is first ingested and never changes. It tells you when the document entered SynthLink, not when the source published it.
Filtering by freshness
If you need only recent content, filter on the client side after fetching. By default, the API returns documents in reverse chronological order.
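// Fetch the 50 most recent documents (the API returns newest first).
const res = await fetch(
  "https://synth-link.com/api/v1/documents?limit=50",
  { headers: { "X-SYNTHLINK-KEY": process.env.SYNTHLINK_KEY } }
);
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
const docs = await res.json();

// Keep only documents ingested within the last 24 hours.
const oneDayAgo = Date.now() - 24 * 60 * 60 * 1000;
const recent = docs.filter(
  (doc) => new Date(doc.created_at).getTime() > oneDayAgo
);
Enrichment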
Every document is sent to a language model which produces four outputs — a plain-language summary, a list of keywords, semantic tags, and a category label.
Example output
{
  "llm_summary": "OpenAI releases GPT-4o, a new multimodal model capable of reasoning across text, audio, and images with improved latency.",
  "keywords": ["gpt-4o", "multimodal", "openai", "model release"],
  "tags": ["AI", "language model", "product launch"],
  "category": "AI Research",
  "source": "openai_news",
  "created_at": "2026-03-19T06:01:00Z"
}
Status lifecycle
Enrichment is asynchronous. A new document starts with status: pending and transitions to completed when the model finishes, or to failed after the maximum retry count is exceeded.
Status is tracked internally and is not returned in the public insight response. The /api/v1/insights endpoint returns only completed insight records.
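The lifecycle above can be modeled as a small state machine. This is a sketch for reasoning about the states only, not SynthLink's implementation: status is internal to the service, and the `MAX_RETRIES` value and the function names below are hypothetical.

```javascript
// Illustrative model of the enrichment lifecycle; not SynthLink's code.
const MAX_RETRIES = 3; // hypothetical value; the real limit is not stated on this page

function newJob() {
  return { status: "pending", attempts: 0 };
}

// Apply the result of one enrichment attempt to a job.
function recordAttempt(job, succeeded) {
  // completed and failed are terminal states.
  if (job.status !== "pending") return job;
  const attempts = job.attempts + 1;
  if (succeeded) return { status: "completed", attempts };
  // The first attempt is not a retry, so retries used so far = attempts - 1.
  const retriesUsed = attempts - 1;
  return { status: retriesUsed >= MAX_RETRIES ? "failed" : "pending", attempts };
}
```

In this model a job fails permanently only after one initial attempt plus `MAX_RETRIES` retries have all failed, which mirrors the statement that transient errors are retried before a failure is recorded.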
Common failure causes
Most failures are caused by documents with very little extractable text — empty pages, paywalled content, or documents in unsupported languages. Transient model errors are retried automatically and rarely result in a permanent failure.
Document and insight
SynthLink separates raw content from enriched content into two concepts. A document is the original collected item. An insight is the LLM output attached to it, linked internally in the enrichment pipeline.
Side by side
Document:
{
  "title": "GPT-4o System Card",
  "url": "https://openai.com/...",
  "summary": "OpenAI releases...",
  "source": "openai_news",
  "content_source": "rss",
  "created_at": "2026-03-19T06:00:00Z"
}
Insight:
{
  "llm_summary": "OpenAI releases GPT-4o...",
  "keywords": ["gpt-4o", "multimodal"],
  "tags": ["AI", "product launch"],
  "category": "AI Research",
  "source": "openai_news",
  "created_at": "2026-03-19T06:01:00Z"
}
When to use /combined
Use /api/v1/combined when you need both the document and its insight in a single request — for example, when rendering a feed that shows the title, source, and LLM summary together.
Use the separate endpoints when you only need one side, when you want different filters on each, or when you're paginating large result sets and want finer control over each query.
Source identifiers
Every document has a source field that identifies where it was collected from. Use this value with the source query parameter to filter results to a specific source.
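As a rough illustration of source filtering, the request URL can be assembled with the standard `URL` API. The base URL and the `source` and `limit` parameters are taken from this page; the function name `buildSourceUrl` is only illustrative.

```javascript
// Build a list-endpoint URL filtered to a single source.
// `endpoint` is one of "documents", "insights", or "combined".
function buildSourceUrl(endpoint, source, limit) {
  const url = new URL(`https://synth-link.com/api/v1/${endpoint}`);
  url.searchParams.set("source", source);
  // Only send limit when the caller asked for one; otherwise the API default applies.
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  return url.toString();
}
```

For example, `buildSourceUrl("documents", "openai_news", 50)` produces a URL that requests up to 50 documents collected from the `openai_news` source. Using `searchParams` rather than string concatenation keeps source values correctly URL-encoded.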
Note: New sources are added over time. Check the Sources reference for the full up-to-date list.
Pagination
The data list endpoints (/documents, /insights, /combined) accept a limit parameter that controls the maximum number of records returned per request. The default is 10 and the maximum is 100.
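One way to keep requests inside the documented range is to normalize the value on the client before sending it. This sketch uses the default of 10 and the maximum of 100 stated above; how the API itself responds to out-of-range values is not described here, and treating 1 as the smallest useful limit is an assumption.

```javascript
const DEFAULT_LIMIT = 10; // documented default
const MAX_LIMIT = 100;    // documented maximum

// Return a limit value that fits the documented range.
function normalizeLimit(requested) {
  const n = Number.parseInt(requested, 10);
  // Fall back to the default for missing, non-numeric, or non-positive input.
  if (!Number.isInteger(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}
```

With this helper, `normalizeLimit(250)` yields 100 and `normalizeLimit(undefined)` yields 10, so a caller never asks for more than a single request can return.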
SynthLink does not currently support cursor-based or offset-based pagination. To retrieve large datasets, request one batch at a time with limit and filter by created_at to walk through the records in batches.
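// Walk backward through documents in batches of 100.
// Assumption: the endpoint accepts a `before` query parameter that filters by
// created_at. This page does not document one, so confirm the name in the API reference.
let before = new Date().toISOString();
const all = [];
while (true) {
  const res = await fetch(
    `https://synth-link.com/api/v1/documents?limit=100&before=${encodeURIComponent(before)}`,
    { headers: { "X-SYNTHLINK-KEY": process.env.SYNTHLINK_KEY } }
  );
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const batch = await res.json();
  if (batch.length === 0) break;
  all.push(...batch);
  // Use the created_at of the last (oldest) item as the boundary for the next batch.
  const next = batch[batch.length - 1].created_at;
  // Stop on a short final batch, or if the boundary did not move (avoids an endless loop).
  if (batch.length < 100 || next === before) break;
  before = next;
}
Note: Cursor-based pagination is planned for a future API version. Check the Changelog for updates.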