Integration Patterns

Last updated March 21, 2026

This page describes how to connect SynthLink data to products and workflows — not how to call the API, but how to structure the integration itself. The goal is to provide a basis for design decisions, not a list of code examples.

Overview

SynthLink is a read-only external data layer. It normalizes publicly available information from multiple sources and exposes it through a consistent API. It is not the primary system of record for your application — it is a supply layer for structured public information that your product or workflow can consume.

This distinction matters for integration design. SynthLink works best when treated as an upstream data source that your system polls, caches, and interprets — not as a real-time event stream or a database you query on demand for every user interaction. The right integration structure depends on what your product needs from the data and how much freshness matters for your use case.

Choosing an integration pattern

Before deciding how to integrate, it helps to identify which endpoint model fits your product structure.

Documents-first

You want direct control over source metadata and plan to apply your own post-processing or enrichment. Use /documents as the primary source and fetch insights separately when needed.

Insights-first

You want to consume the structured insight layer directly and already have a way to retrieve or cache document metadata. Use /insights as the primary source and join to document data on demand.

Combined

You need both document metadata and enrichment fields in one response — for example, when rendering a card UI that shows title, source, summary, and category together. Use /combined as the primary source.

The choice affects more than response shape — it determines where transformation logic lives, how you handle partial records, and what your application looks like when enrichment is delayed or incomplete.
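As a rough sketch, the shapes involved can be modeled with types like the following. Field names beyond those discussed on this page are assumptions, not the API contract:

```typescript
// Hypothetical record shapes for the endpoint models described above.
interface Document {
  id: string;
  title: string;
  source: string;
  url: string;
  summary: string;
  created_at: string; // ISO 8601 timestamp
}

interface Insight {
  llm_summary: string;
  category: string;
  keywords: string[];
  tags: string[];
}

// /combined pairs a document with its insight; insight is null while
// enrichment is pending or when it has failed.
interface CombinedRecord {
  document: Document;
  insight: Insight | null;
}

// A record is "complete" once its insight has arrived.
const isComplete = (r: CombinedRecord): boolean => r.insight !== null;
```

Making the nullability of insight explicit in your types forces the question of where partial records are handled, rather than discovering it at render time.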

Feed experiences

The most common integration type is a content feed — a UI that displays recent documents from one or more sources, grouped or filtered by category, with source attribution and a link to the original.

For this pattern, /combined is usually the right choice. The fields map naturally to feed UI elements.

Field         UI role
source        Source badge or tab label
created_at    Sort order and recency indicator
llm_summary   Card preview text
category      Topic filter or grouping
tags          Tag display or secondary filter
url           Link to the original source

Only render insight fields when insight is present. When an insight is still missing, show the raw summary from the document as a fallback rather than leaving the card empty.

Feed with fallback summary
const cards = combined.map((item) => ({
  title: item.document.title,
  preview: item.insight ? item.insight.llm_summary : item.document.summary,
  source: item.document.source,
  url: item.document.url,
  category: item.insight?.category ?? null,
}));

Background sync

Many integrations benefit from fetching SynthLink data periodically and storing it locally — rather than querying the API on every user request. This pattern reduces latency, avoids redundant API calls, and decouples your application's read path from SynthLink's update schedule.

Because SynthLink is a periodic collection system rather than a real-time stream, polling more frequently than the source update interval provides no benefit. Aligning your sync schedule with the source update interval is both more efficient and more accurate.

Background sync aligned to source update interval
const SOURCE_INTERVALS: Record<string, number> = {
  openai_news: 12 * 60 * 60 * 1000,  // 12h
  nasa_news:   24 * 60 * 60 * 1000,  // 24h
  github_trending: 6 * 60 * 60 * 1000,  // 6h
  arxiv:       12 * 60 * 60 * 1000,  // 12h
  hn:           3 * 60 * 60 * 1000,  // 3h
};

async function syncSource(source: string, lastSyncedAt: number) {
  const interval = SOURCE_INTERVALS[source];
  if (interval === undefined) return; // unknown source
  if (Date.now() - lastSyncedAt < interval) return;

  const res = await fetch(
    `/api/v1/combined?source=${source}&limit=50`,
    { headers: { "X-SYNTHLINK-KEY": process.env.SYNTHLINK_KEY ?? "" } }
  );
  if (!res.ok) return; // let the next scheduled run retry
  const records = await res.json();
  await saveToLocalStore(source, records);
}

Note: Background sync also makes it easier to handle rate limits gracefully — your application reads from a local cache and the sync job handles retries independently.

Internal tools and dashboards

SynthLink works well as a backend for internal monitoring tools, research dashboards, and curation interfaces. In these contexts, the value is not just access to the data but access to a normalized view across multiple public sources — so operators do not need to read each source individually.

Common patterns for internal tools include monitoring the latest documents per source to detect new developments, reviewing all documents in a specific category, and inspecting failed insights alongside their source documents to assess enrichment quality. For these use cases, the /documents and /insights endpoints are often more useful separately than /combined, because they allow different filters on each layer.

Internal tools can also benefit from tracking enrichment state by comparing /documents to /combined (missing insights show up as null), and by checking the status page for pipeline health signals.
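A sketch of that comparison, assuming /combined represents a pending or failed insight as null (the record shape and helper name here are illustrative):

```typescript
// Track enrichment state: count recent records whose insight has not
// arrived yet. The record shape mirrors the examples on this page.
interface CombinedRecord {
  document: { id: string; source: string };
  insight: { category: string } | null;
}

function enrichmentGap(records: CombinedRecord[]): { total: number; missing: number } {
  const missing = records.filter((r) => r.insight === null).length;
  return { total: records.length, missing };
}
```

A rising missing count relative to total is a useful dashboard signal that enrichment is falling behind, which you can then cross-check against the status page.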

Agent and research workflows

SynthLink's separation of documents and insights makes it a natural input layer for automated analysis pipelines and agent-based workflows. Documents provide the raw signal — title, URL, source, and raw text. Insights provide a structured first-pass interpretation — category, keywords, tags, and a concise summary.

In agent workflows, the most reliable pattern is to use insight fields for candidate selection and triage, then retrieve the original document URL for verification before acting on the result. Insight fields are useful for narrowing a large set of documents down to a relevant subset, but they should not be treated as a final source of truth.

Agent triage pattern
// 1. Fetch recent combined records
const records = await fetchCombined({ limit: 100 });

// 2. Use insight fields to narrow candidates
const candidates = records.filter(
  (r) =>
    r.insight &&
    r.insight.category === "AI Research" &&
    r.insight.keywords.includes("safety")
);

// 3. Pass original URLs to the next agent step for verification
const urls = candidates.map((r) => r.document.url);

Warning: Do not use llm_summary as a factual claim in agent outputs. Always trace back to the original url before presenting content as authoritative.

Triage and alerting

Some workflows use SynthLink as a signal source — detecting when new documents matching certain criteria arrive and triggering a downstream action. This is a pull-based workflow, not a push notification system. SynthLink does not emit events or webhooks, so alerting-style integrations work by comparing the current API response against a previously stored state.

A reliable approach is to store the created_at of the most recently processed document and, on each sync, fetch records newer than that timestamp. Records that appear for the first time are treated as new signals and passed to the downstream action.

New document detection
async function checkForNew(lastCreatedAt: string) {
  const records = await fetchCombined({
    source: "openai_news",
    limit: 20,
  });

  const newRecords = records.filter(
    (r) => r.document.created_at > lastCreatedAt
  );

  if (newRecords.length > 0) {
    await triggerDownstream(newRecords);
  }

  // Advance the watermark; assumes the API returns newest records first.
  return records[0]?.document.created_at ?? lastCreatedAt;
}

Freshness and caching

SynthLink data has a natural update interval determined by each source's crawl interval. Most integrations should match their cache invalidation strategy to this interval rather than polling continuously.

For server-rendered applications, use Next.js revalidate or equivalent framework-level caching to avoid redundant API calls. For background sync jobs, schedule them no more frequently than the fastest source you care about — currently Hacker News at 3 hours.
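One way to keep that cache interval honest as pages gain sources is to derive it rather than hard-code it. The interval values below echo the background-sync table above; the helper itself is illustrative:

```typescript
// Derive a framework-level revalidate interval (in seconds) as the
// fastest crawl interval among the sources a page renders.
const SOURCE_INTERVAL_SECONDS: Record<string, number> = {
  hn: 3 * 60 * 60,
  github_trending: 6 * 60 * 60,
  arxiv: 12 * 60 * 60,
  openai_news: 12 * 60 * 60,
  nasa_news: 24 * 60 * 60,
};

function revalidateFor(sources: string[]): number {
  // Default unknown sources to the slowest known interval.
  const seconds = sources.map((s) => SOURCE_INTERVAL_SECONDS[s] ?? 24 * 60 * 60);
  return Math.min(...seconds);
}
```

A page showing hn and arxiv would then revalidate every 3 hours, matching the fastest source it displays.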

Caching also smooths over temporary API unavailability. If SynthLink returns an error, a stale cache is almost always better than an empty state for end users. Design your integration to serve stale data rather than fail visibly when the upstream is temporarily degraded.
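A minimal sketch of that stale-on-error behavior, with the fetch function injected so the cache logic stays testable (names are illustrative):

```typescript
// Serve the last good response when the upstream errors, instead of
// failing visibly. fetchFeed stands in for whatever client you use.
let lastGood: unknown[] | null = null;

async function getFeedResilient(
  fetchFeed: () => Promise<unknown[]>
): Promise<unknown[]> {
  try {
    const records = await fetchFeed();
    lastGood = records; // refresh the stale copy on every success
    return records;
  } catch {
    // Stale data beats an empty state for end users.
    return lastGood ?? [];
  }
}
```

In a real deployment the stale copy would live in your local store rather than a module-level variable, but the decision point is the same: on upstream failure, return what you had.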

Source attribution

SynthLink normalizes content from public sources, but the authoritative version of every document lives at its original URL. Any integration that exposes SynthLink data to end users should preserve the source attribution and link back to the original.

At minimum, every document displayed to a user should include the source identifier and the url field as a navigable link. This applies regardless of how much transformation or summarization happens between the API response and the final UI.
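As a minimal sketch, an attribution fragment that keeps both pieces visible (field names follow the examples on this page; the markup itself is illustrative):

```typescript
// Render source attribution: a navigable link to the original plus
// the source identifier.
function attribution(doc: { source: string; url: string; title: string }): string {
  return `<a href="${doc.url}" rel="noopener">${doc.title}</a> · ${doc.source}`;
}
```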

Note: Preserving source attribution is not just a best practice — it is the correct way to use generated summaries. llm_summary is a convenience layer for discovery, not a replacement for the original content.

Handling partial records

SynthLink data becomes available progressively. A document may exist before its insight is ready. A document may have a summary but no full content. An insight may be in a pending or failed state when your application first encounters it.

Integrations that require complete records before rendering will produce a worse experience than integrations designed to handle partial data gracefully. The practical implication is that your rendering logic should treat insight fields as optional enhancements rather than required inputs.

Document-first, insight-optional

Show the document immediately when it is available. Enrich the display with insight fields when an insight is present. This is the most resilient pattern and works well for feeds and dashboards.

Insight-required

Only surface records where an insight is present. Simpler to implement but introduces latency — documents may be available for minutes or hours before appearing in your UI. Works well for automated pipelines that need complete records.
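The two strategies reduce to a single filtering decision. A sketch, with a record shape that mirrors the examples above:

```typescript
interface CombinedRecord {
  document: { title: string };
  insight: { llm_summary: string } | null;
}

// Document-first, insight-optional: surface every document immediately.
const documentFirst = (records: CombinedRecord[]): CombinedRecord[] => records;

// Insight-required: only surface complete records, accepting latency.
const insightRequired = (records: CombinedRecord[]): CombinedRecord[] =>
  records.filter((r) => r.insight !== null);
```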
