Insights
Last updated March 21, 2026
SynthLink Insights are the enrichment layer generated after a document is collected. Crawlers store normalized source documents; the insight pipeline then turns each document into a compact, structured interpretation that is easier to search, filter, and consume in downstream applications.
What insights are
Insights do not replace the original document. They sit on top of it and provide a machine-friendly summary of what the document is about, which themes it contains, and how it can be categorized.
Insights are generated per document and exposed as standalone insight records. Every record has the same fields, regardless of which source the original content came from:
{
"llm_summary": string, // concise plain-language summary
"keywords": string[], // key terms extracted from the document
"tags": string[], // semantic topic tags
"category": string, // top-level category label
"source": string, // document source
"created_at": string // ISO 8601 timestamp
}
Why they exist
Raw source documents are useful for traceability, but they are often too long, inconsistent, or source-specific to use directly in application logic. The insight layer gives every document a consistent analytical shape, so applications can:
- Render concise feeds and summaries without processing raw content
- Filter insights by category or keyword across all sources
- Build search and recommendation features on top of structured fields
- Consume multiple sources through a common interpretation layer
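The insight record maps naturally onto a typed interface. The sketch below is a minimal TypeScript illustration, not an official SDK: it mirrors the listed fields and filters records by category or keyword, which works across sources because every record has the same shape. The names `Insight`, `InsightFilter`, and `filterInsights`, and the choice to match keywords case-insensitively against both `keywords` and `tags`, are assumptions made for illustration.

```typescript
// Mirrors the insight record fields documented above (illustrative, not an official SDK type).
interface Insight {
  llm_summary: string;
  keywords: string[];
  tags: string[];
  category: string;
  source: string;
  created_at: string; // ISO 8601 timestamp
}

// Hypothetical filter options; unset fields are ignored.
interface InsightFilter {
  category?: string;
  keyword?: string;
}

// Keeps insights that match every criterion that is set.
// Assumption: keyword matching is case-insensitive and checks both keywords and tags.
function filterInsights(insights: Insight[], filter: InsightFilter): Insight[] {
  const wanted = filter.keyword?.toLowerCase();
  return insights.filter((insight) => {
    if (filter.category !== undefined && insight.category !== filter.category) {
      return false;
    }
    if (wanted !== undefined) {
      const terms = [...insight.keywords, ...insight.tags].map((t) => t.toLowerCase());
      if (!terms.includes(wanted)) {
        return false;
      }
    }
    return true;
  });
}
```

Because the filter reads only the shared fields, it behaves the same whatever value `source` holds.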
Insight pipeline
The insight pipeline starts after a document has been written to the documents table. It selects either new documents that do not yet have an insight record, or previously failed insight jobs that are eligible for retry.
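The selection rule can be sketched as a pure function over documents and existing insight jobs. This is an assumption-laden illustration: the `InsightJob` shape, its `status` values, and the `next_retry_at` field are hypothetical stand-ins for however the worker actually tracks job state.

```typescript
// Hypothetical job states; the real pipeline may track these differently.
type InsightStatus = "succeeded" | "failed_retryable" | "failed_permanent";

interface DocumentRow {
  id: string;
}

interface InsightJob {
  document_id: string;
  status: InsightStatus;
  next_retry_at?: number; // epoch milliseconds when a retry becomes eligible
}

// Selects documents with no insight job yet, plus documents whose failed job
// is retryable and whose retry time has arrived.
function selectCandidates(docs: DocumentRow[], jobs: InsightJob[], now: number): DocumentRow[] {
  const jobByDoc = new Map<string, InsightJob>();
  for (const job of jobs) {
    jobByDoc.set(job.document_id, job);
  }
  return docs.filter((doc) => {
    const job = jobByDoc.get(doc.id);
    if (job === undefined) {
      return true; // new document: no insight record yet
    }
    return job.status === "failed_retryable" && (job.next_retry_at ?? 0) <= now;
  });
}
```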
For each target document, the pipeline chooses the best available source text — content when present, falling back to summary when full content is not available. This ensures documents collected from different sources can pass through the same enrichment flow.
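A minimal sketch of that input choice, assuming documents expose optional `content` and `summary` string fields (the field names come from this page; their nullability and the whitespace rule are assumptions). A `null` result corresponds to the "no usable text" case, which the pipeline treats as non-retryable.

```typescript
// Assumed document shape: only the two text fields the pipeline reads.
interface SourceDocument {
  content?: string | null;
  summary?: string | null;
}

// Prefers full content, falls back to summary, and returns null when neither
// contains usable text. Assumption: whitespace-only text counts as unusable.
function chooseInputText(doc: SourceDocument): string | null {
  const content = doc.content?.trim();
  if (content) {
    return content;
  }
  const summary = doc.summary?.trim();
  if (summary) {
    return summary;
  }
  return null;
}
```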
documents table
→ insight-worker selects unprocessed or retryable documents
→ chooses content (preferred) or summary (fallback) as input
→ produces llm_summary, keywords, tags, category
→ writes to insights table with document_id link
Note: The insight pipeline runs on a recurring schedule, decoupled from the crawl cycle. A document may appear in the documents API before its insight is ready.
What the API returns
Insights are exposed through two read-only endpoints. Use /api/v1/insights when you only need insight records. Use /api/v1/combined when you need the source document and its insight together in a single response.
/api/v1/insights
Returns insight records only. Useful when you already have document data and need the insight layer.
/api/v1/combined
Returns document and insight merged into one payload. Useful when building feeds that show both source and analysis.
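Both endpoints are read-only, so a client can stay small. The sketch below assumes a hypothetical base URL, assumes responses are JSON, and covers neither authentication nor pagination, which this page does not describe. `endpointUrl` and `getJson` are illustrative names, not part of a published SDK.

```typescript
// Hypothetical host; substitute the address of your SynthLink deployment.
const BASE_URL = "https://synthlink.example.com";

type Endpoint = "insights" | "combined";

// Builds the absolute URL for one of the two read-only endpoints.
// The path is absolute, so `base` should be an origin: any path on it is replaced.
function endpointUrl(endpoint: Endpoint, base: string = BASE_URL): string {
  return new URL(`/api/v1/${endpoint}`, base).toString();
}

// Minimal GET helper using the global fetch (Node 18+ or a browser).
// Assumption: the endpoint responds with JSON; its exact shape is not modeled here.
async function getJson(endpoint: Endpoint, base: string = BASE_URL): Promise<unknown> {
  const res = await fetch(endpointUrl(endpoint, base), {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) {
    throw new Error(`GET /api/v1/${endpoint} failed with HTTP ${res.status}`);
  }
  return res.json();
}
```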
Freshness and timing
Insight generation is asynchronous. A document may appear in the documents API before its insight is available — this is expected behavior, not an error. In normal operation, insights are generated shortly after ingestion, but availability depends on queue volume and retry state.
The worker processes a bounded number of items per cycle and runs repeatedly on a schedule rather than inline with crawling. This keeps source collection and analysis decoupled, so a slow enrichment queue does not delay document availability.
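The bounded, scheduled loop can be sketched as follows. The batch size, the interval, and the `runCycle` callback are hypothetical; the point is that each cycle takes at most a fixed number of items and cycles never overlap, so a backlog lengthens the queue instead of slowing crawling.

```typescript
// Hypothetical per-cycle bound; the real limit is not documented here.
const MAX_ITEMS_PER_CYCLE = 25;

// Takes at most `limit` items for one cycle; the rest wait for the next cycle.
function takeBatch<T>(queue: T[], limit: number = MAX_ITEMS_PER_CYCLE): { batch: T[]; remaining: T[] } {
  return { batch: queue.slice(0, limit), remaining: queue.slice(limit) };
}

// Runs one enrichment cycle per interval, independent of any crawl loop.
// Returns a function that stops the schedule.
function startWorker(runCycle: () => Promise<void>, intervalMs: number): () => void {
  let running = false;
  const timer = setInterval(async () => {
    if (running) {
      return; // previous cycle still in progress: skip this tick
    }
    running = true;
    try {
      await runCycle();
    } catch (err) {
      console.error("insight cycle failed", err);
    } finally {
      running = false;
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```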
Note: To check whether an insight is ready, query /api/v1/combined and test whether the record's insight field is null. The /api/v1/insights endpoint returns completed insight records only.
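The readiness check in the note amounts to a null test on each combined record. This sketch assumes each record carries its insight under an `insight` key that is null until generated; it also treats an absent key as not ready, which is a defensive assumption rather than documented behavior.

```typescript
// Assumed minimal shape of a /api/v1/combined record; other fields omitted.
interface CombinedRecord {
  url: string;
  insight?: { llm_summary: string } | null;
}

// An insight is ready once the merged record carries a non-null insight.
function isInsightReady(record: CombinedRecord): boolean {
  return record.insight != null; // treats both null and a missing key as "not ready"
}

// Splits records so pending ones can be re-checked on a later poll.
function partitionByReadiness(records: CombinedRecord[]): { ready: CombinedRecord[]; pending: CombinedRecord[] } {
  const ready: CombinedRecord[] = [];
  const pending: CombinedRecord[] = [];
  for (const record of records) {
    (isInsightReady(record) ? ready : pending).push(record);
  }
  return { ready, pending };
}
```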
Failure and retry
Insight generation is designed as a retryable background job. If processing fails for a temporary reason, the job is retried with backoff at 1, 5, and 15 minute intervals up to a maximum of 3 attempts.
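The backoff schedule maps directly onto a lookup table. One reading of the policy is assumed here: three retries, one per listed interval, after which the job is not retried again. `nextRetryDelayMs` is an illustrative helper, not the worker's actual code.

```typescript
// Retry intervals from the policy above, in minutes.
const RETRY_DELAYS_MINUTES: readonly number[] = [1, 5, 15];

// Returns the wait before the next retry, in milliseconds, given how many
// retries have already run; returns null once the retry budget is spent.
// Assumption: "3 attempts" means three retries, one per listed interval.
function nextRetryDelayMs(retriesSoFar: number): number | null {
  const minutes = RETRY_DELAYS_MINUTES[retriesSoFar];
  return minutes === undefined ? null : minutes * 60_000;
}
```

Under this reading, a job that has just failed for the first time waits one minute (`nextRetryDelayMs(0)`) before its first retry.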
Some failure types are classified as non-retryable — for example, when the source document has no usable text, or when the document record is missing. These are marked failed immediately without further retries.
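Classifying failures up front keeps permanent errors out of the retry queue. The failure codes below are hypothetical: only the two non-retryable cases (no usable text, missing document record) come from this page. The `maxRetries` default of 3 reflects the limit described above.

```typescript
// Hypothetical failure codes. Only the first two cases are described on this page.
type FailureCode =
  | "no_usable_text"
  | "document_missing"
  | "model_timeout"
  | "rate_limited";

// Failures that cannot succeed on retry: mark the job failed immediately.
const NON_RETRYABLE = new Set<FailureCode>(["no_usable_text", "document_missing"]);

// Decides what to do with a failed job given how many retries have already run.
function resolveFailure(code: FailureCode, retriesSoFar: number, maxRetries = 3): "retry" | "failed" {
  if (NON_RETRYABLE.has(code)) {
    return "failed";
  }
  return retriesSoFar < maxRetries ? "retry" : "failed";
}
```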
Importantly, the document remains accessible even when its insight is missing or failed. The insight layer is additive — document availability is never blocked by enrichment state.
Usage notes
Insights should be treated as a convenience layer for discovery and application logic, not as a replacement for the original source. If precision matters, use insight fields for filtering and triage, then refer back to the source document URL for final verification.
A typical pattern for building a document feed:
1. Fetch recent combined records (document plus insight) from /api/v1/combined.
2. Use category, keywords, and tags to narrow the set.
3. Use llm_summary for preview rendering.
4. Use the original document url for full verification.
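The four steps can be combined into one function over combined records. This is a sketch under stated assumptions: the record shape (a `url` plus a nullable `insight`), the `FeedItem` output type, and the "Summary pending" placeholder are hypothetical. Records without an insight still appear in an unfiltered feed, since document availability never depends on enrichment, but they cannot match insight-based filters.

```typescript
// Assumed shape of a /api/v1/combined record; only fields named on this page are modeled.
interface CombinedRecord {
  url: string;
  insight: {
    llm_summary: string;
    keywords: string[];
    tags: string[];
    category: string;
  } | null;
}

// Hypothetical feed entry: a preview plus a link back to the source.
interface FeedItem {
  url: string;
  preview: string;
  category: string | null;
}

interface FeedFilter {
  category?: string;
  keyword?: string;
  tag?: string;
}

function buildFeed(records: CombinedRecord[], filter: FeedFilter = {}): FeedItem[] {
  const filtering =
    filter.category !== undefined || filter.keyword !== undefined || filter.tag !== undefined;
  const items: FeedItem[] = [];
  for (const record of records) {
    const insight = record.insight;
    if (insight === null) {
      // No insight yet (pending or failed): keep the document in unfiltered feeds only.
      if (!filtering) {
        items.push({ url: record.url, preview: "Summary pending", category: null });
      }
      continue;
    }
    if (filter.category !== undefined && insight.category !== filter.category) continue;
    if (filter.keyword !== undefined && !insight.keywords.includes(filter.keyword)) continue;
    if (filter.tag !== undefined && !insight.tags.includes(filter.tag)) continue;
    // llm_summary is generated text: show it as a preview and link to url for verification.
    items.push({ url: record.url, preview: insight.llm_summary, category: insight.category });
  }
  return items;
}
```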
Warning: llm_summary is a generated interpretation, not a verbatim excerpt. Always link users to the original source URL for authoritative content.