Scientific - Medical Q&A

Scientific is Captain’s agentic Q&A dataset for the scientific literature. Send a natural-language question and the agent federates across multiple live sources, then returns a synthesized answer with real, verifiable citations.

What Scientific Covers

| Source | Description | Key Data Points |
| --- | --- | --- |
| PubMed | NCBI E-utilities over 37M+ biomedical abstracts | PMID, title, abstract, journal, year, authors, DOI, PMCID |
| PMC Full Text | Open-access article bodies, tables, and figures from PubMed Central | Section headings and text, table captions, figure captions |
| ClinicalTrials.gov | v2 API over all registered trials | NCT ID, phase, overall status, conditions, interventions, brief summary, start/completion dates |
| Semantic Scholar | Embedding-backed paper search for true semantic recall | Paper ID, title, abstract, year, citation count, venue, authors, DOI, PMID |

How It Works

  1. Ask a natural-language question about a medical or biomedical topic
  2. The agent plans which sources to call and fans out up to 6 tool calls in parallel
  3. An answer is synthesized with inline citations to the exact records retrieved
  4. A gaps[] field surfaces what the agent could not find or verify

All requests require authentication via Authorization: Bearer {api_key} and X-Organization-ID headers.
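A minimal sketch of building these headers in one place. The helper name `build_headers` and the environment variable names `CAPTAIN_API_KEY` and `CAPTAIN_ORG_ID` are assumptions made for illustration; they are not part of the API. The `Idempotency-Key` header mirrors the request examples on this page.

```python
import os
import uuid


def build_headers(api_key: str, org_id: str, idempotent: bool = True) -> dict:
    """Return the headers each request needs (hypothetical helper)."""
    if not api_key or not org_id:
        raise ValueError("both api_key and org_id are required")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Organization-ID": org_id,
        "Content-Type": "application/json",
    }
    if idempotent:
        # A fresh UUID per logical request, as in the request examples.
        headers["Idempotency-Key"] = str(uuid.uuid4())
    return headers


if __name__ == "__main__":
    # Assumed environment variable names; substitute your own.
    print(build_headers(os.environ.get("CAPTAIN_API_KEY", "demo-key"),
                        os.environ.get("CAPTAIN_ORG_ID", "demo-org")))
```

Failing fast on an empty key or organization ID keeps a misconfigured client from sending a request that can only be rejected.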

Search Medical Papers

POST /v2/datasets/scientific/medical/ask with a question in the body.

Asking a Question

Get a synthesized answer with cited sources. The first invocation of the day can hit a cold Lambda start (~60s) on top of the agent loop (~30-60s), so set client timeouts to 180s and guard against non-JSON responses (504/502) so that errors surface cleanly:

Python
import json
import uuid

import requests

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID,
    "Content-Type": "application/json",
    "Idempotency-Key": str(uuid.uuid4()),
}

response = requests.post(
    f"{BASE_URL}/v2/datasets/scientific/medical/ask",
    headers=headers,
    json={
        "question": "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
        "max_sources": 10,
        "include_trials": True,
        "recency_years": 10,
    },
    timeout=180.0,
)

# Surface non-JSON responses (504 gateway timeout, 502 bad gateway, empty body,
# HTML error pages) instead of blowing up with an opaque JSONDecodeError.
if response.headers.get("content-type", "").startswith("application/json"):
    print(json.dumps(response.json(), indent=2))
else:
    print(f"HTTP {response.status_code} content-type={response.headers.get('content-type')!r}")
    print(f"body: {response.text[:500]!r}")
TypeScript
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 180_000);

try {
  const response = await fetch(
    `${BASE_URL}/v2/datasets/scientific/medical/ask`,
    {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${API_KEY}`,
        "X-Organization-ID": ORG_ID,
        "Content-Type": "application/json",
        "Idempotency-Key": crypto.randomUUID()
      },
      body: JSON.stringify({
        question: "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
        max_sources: 10,
        include_trials: true,
        recency_years: 10
      }),
      signal: controller.signal
    }
  );

  // Surface non-JSON responses (504/502/empty body) instead of crashing.
  const contentType = response.headers.get("content-type") || "";
  if (!contentType.startsWith("application/json")) {
    const body = await response.text();
    console.log(`HTTP ${response.status} content-type=${contentType}`);
    console.log(`body: ${body.slice(0, 500)}`);
  } else {
    const result = await response.json();
    console.log(JSON.stringify(result, null, 2));
  }
} finally {
  // Clear the timer only after the body has been read, so the 180s limit
  // covers the whole exchange and the timer is released on errors too.
  clearTimeout(timeoutId);
}
Ruby
require 'net/http'
require 'json'
require 'securerandom'

uri = URI("#{BASE_URL}/v2/datasets/scientific/medical/ask")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.read_timeout = 180

request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{API_KEY}"
request["X-Organization-ID"] = ORG_ID
request["Content-Type"] = "application/json"
request["Idempotency-Key"] = SecureRandom.uuid
request.body = {
  question: "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
  max_sources: 10,
  include_trials: true,
  recency_years: 10
}.to_json

response = http.request(request)

# Surface non-JSON responses (504/502/empty body) instead of raising a parse error.
content_type = response["content-type"] || ""
if content_type.start_with?("application/json")
  puts JSON.pretty_generate(JSON.parse(response.body))
else
  puts "HTTP #{response.code} content-type=#{content_type}"
  puts "body: #{response.body.to_s[0, 500]}"
end

Example Response

{
  "domain": "medical",
  "answer": "Strong evidence supports PARP inhibitor use in BRCA1-mutated breast cancer across the metastatic and adjuvant settings. The OlympiAD phase 3 trial showed olaparib monotherapy extended progression-free survival versus standard chemotherapy in metastatic HER2-negative disease (7.0 vs. 4.2 months; HR 0.58) [PMID:28578601]. EMBRACA demonstrated similar benefit with talazoparib (8.6 vs. 5.6 months; HR 0.54) [PMID:30110579]. In the adjuvant setting, OlympiA [NCT:NCT02000622] randomized 1,836 patients with germline BRCA1/2 mutations to one year of olaparib versus placebo, showing improvements in overall survival (HR 0.68) and invasive disease-free survival [PMID:34081848][PMID:36228963]. Mechanistic studies attribute the response to synthetic lethality between PARP inhibition and BRCA1-driven homologous recombination deficiency [PMID:33710534][PMC:PMC6503629].",
  "sources": [
    {
      "type": "pubmed",
      "pmid": "28578601",
      "title": "Olaparib for Metastatic Breast Cancer in Patients with a Germline BRCA Mutation.",
      "journal": "The New England journal of medicine",
      "year": 2017,
      "doi": "10.1056/NEJMoa1706450"
    },
    {
      "type": "pubmed",
      "pmid": "30110579",
      "title": "Talazoparib in Patients with Advanced Breast Cancer and a Germline BRCA Mutation.",
      "journal": "The New England journal of medicine",
      "year": 2018,
      "doi": "10.1056/NEJMoa1802905"
    },
    {
      "type": "clinical_trial",
      "nct_id": "NCT02000622",
      "title": "Olaparib as Adjuvant Treatment in Patients With Germline BRCA Mutated High Risk HER2 Negative Primary Breast Cancer (OlympiA)",
      "phase": "PHASE3",
      "status": "ACTIVE_NOT_RECRUITING"
    },
    {
      "type": "pubmed",
      "pmid": "34081848",
      "title": "Adjuvant Olaparib for Patients with BRCA1- or BRCA2-Mutated Breast Cancer.",
      "journal": "The New England journal of medicine",
      "year": 2021,
      "doi": "10.1056/NEJMoa2105215"
    },
    {
      "type": "pmc_full_text",
      "pmcid": "PMC6503629",
      "quoted": "PARP inhibitors exploit synthetic lethality in cells with homologous recombination deficiency..."
    }
  ],
  "gaps": [
    "Long-term (>10 year) survival data for patients on adjuvant olaparib",
    "Head-to-head comparative efficacy between BRCA1 and BRCA2 subgroups"
  ],
  "tool_calls": 5,
  "latency_ms": 7420
}
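The inline markers in answer can be cross-checked against sources on the client. The sketch below is one way to do that in Python, assuming the marker syntax shown in the example ([PMID:…], [PMC:…], [NCT:…], [S2:…]) and assuming each marker's identifier matches the corresponding source field verbatim. The function names and the `MARKER_FIELDS` mapping are hypothetical helpers, not part of the API.

```python
import re

# Marker prefix -> (source "type", identifier field), following the
# source record shapes documented on this page. The mapping is an assumption.
MARKER_FIELDS = {
    "PMID": ("pubmed", "pmid"),
    "PMC": ("pmc_full_text", "pmcid"),
    "NCT": ("clinical_trial", "nct_id"),
    "S2": ("semantic_scholar", "paper_id"),
}
MARKER_RE = re.compile(r"\[(PMID|PMC|NCT|S2):([^\]\s]+)\]")


def extract_markers(answer: str) -> list:
    """Return unique (prefix, identifier) pairs in order of first appearance."""
    seen, ordered = set(), []
    for prefix, ident in MARKER_RE.findall(answer):
        if (prefix, ident) not in seen:
            seen.add((prefix, ident))
            ordered.append((prefix, ident))
    return ordered


def unmatched_markers(answer: str, sources: list) -> list:
    """List markers in the answer that have no matching record in sources."""
    available = set()
    for src in sources:
        for prefix, (src_type, field) in MARKER_FIELDS.items():
            if src.get("type") == src_type and field in src:
                available.add((prefix, str(src[field])))
    return [m for m in extract_markers(answer) if m not in available]


answer = "Benefit shown [PMID:28578601] and in OlympiA [NCT:NCT02000622][PMID:34081848]."
sources = [
    {"type": "pubmed", "pmid": "28578601"},
    {"type": "clinical_trial", "nct_id": "NCT02000622"},
]
print(unmatched_markers(answer, sources))  # [('PMID', '34081848')]
```

A non-empty result is worth logging: it means the answer cites a record that the client cannot expand into a full source entry.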

Asking with Streaming

Get real-time progress as the agent calls tools and writes the answer token-by-token:

Python
import json

import requests

response = requests.post(
    f"{BASE_URL}/v2/datasets/scientific/medical/ask",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Organization-ID": ORG_ID,
        "Content-Type": "application/json",
    },
    json={
        "question": "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
        "stream": True,
    },
    stream=True,  # Important: enable streaming
    timeout=180,  # Connect/read timeout so a stalled stream cannot hang forever
)

# Process streamed response
for line in response.iter_lines():
    if line:
        line_text = line.decode("utf-8")
        if line_text.startswith("data: "):
            data = line_text[6:]
            try:
                parsed = json.loads(data)
                if parsed.get("type") == "tool.start":
                    print(f"\n[tool] {parsed['name']}: {parsed['args']}")
                elif parsed.get("type") == "text.delta":
                    print(parsed["data"], end="", flush=True)
                elif parsed.get("type") == "stream_complete":
                    print("\nStream complete!")
                    break
            except json.JSONDecodeError:
                # Not JSON: print the raw payload rather than dropping it.
                print(data, end="", flush=True)
TypeScript
const response = await fetch(
  `${BASE_URL}/v2/datasets/scientific/medical/ask`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "X-Organization-ID": ORG_ID,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      question: "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
      stream: true
    })
  }
);

// Process streamed response
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // Chunks can end mid-line, so buffer and hold back the trailing partial line.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? "";

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    try {
      const parsed = JSON.parse(data);
      if (parsed.type === 'tool.start') {
        console.log(`\n[tool] ${parsed.name}:`, parsed.args);
      } else if (parsed.type === 'text.delta') {
        process.stdout.write(parsed.data);
      } else if (parsed.type === 'stream_complete') {
        console.log("\nStream complete!");
        finished = true; // a bare break would only leave the inner loop
        break;
      }
    } catch {
      process.stdout.write(data);
    }
  }
}
Ruby
require 'net/http'
require 'json'

uri = URI("#{BASE_URL}/v2/datasets/scientific/medical/ask")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.read_timeout = 180

request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{API_KEY}"
request["X-Organization-ID"] = ORG_ID
request["Content-Type"] = "application/json"
request.body = {
  question: "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
  stream: true
}.to_json

# Process streamed response
buffer = +""
finished = false
http.request(request) do |response|
  response.read_body do |chunk|
    next if finished # ignore anything after the terminal event
    buffer << chunk
    # Chunks can end mid-line, so only consume complete lines from the buffer.
    while (line = buffer.slice!(/\A.*\n/))
      next unless line.start_with?("data: ")
      data = line[6..-1].strip
      begin
        parsed = JSON.parse(data)
        case parsed["type"]
        when "tool.start" then puts "\n[tool] #{parsed['name']}: #{parsed['args']}"
        when "text.delta" then print parsed["data"]
        when "stream_complete"
          puts "\nStream complete!"
          finished = true
          break
        end
      rescue JSON::ParserError
        print data
      end
    end
  end
end

Event Types

| Event | When it fires | Payload |
| --- | --- | --- |
| tool.start | Agent is about to call a source (PubMed, PMC, CT.gov, or Semantic Scholar) | {type, tool_call_id, name, args} |
| tool.end | The tool returned | {type, tool_call_id, name, ok, result_summary} |
| text.delta | Answer tokens as the model writes them | {type, data} |
| stream_complete | Terminal event; stream has ended | {type, stats, metadata}; metadata carries tool_calls, sources_count, latency_ms |
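The event types above can be folded into a final answer, a per-tool status log, and the closing metadata. The following is a minimal Python sketch that works on already-split `data: ` lines, assuming the payload fields listed in the table. The function names are hypothetical, and the tool name `pubmed_search` in the usage lines is made up for illustration.

```python
import json


def parse_sse_line(line: str):
    """Return the decoded event dict for a `data: ` line, else None."""
    if not line.startswith("data: "):
        return None
    try:
        event = json.loads(line[len("data: "):])
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None


def consume(lines):
    """Fold SSE lines into (answer_text, tools_by_call_id, metadata)."""
    text_parts, tools, metadata = [], {}, None
    for line in lines:
        event = parse_sse_line(line)
        if event is None:
            continue
        kind = event.get("type")
        if kind == "tool.start":
            tools[event["tool_call_id"]] = {"name": event["name"], "ok": None}
        elif kind == "tool.end":
            entry = tools.setdefault(event["tool_call_id"], {"name": event["name"]})
            entry["ok"] = event["ok"]
        elif kind == "text.delta":
            text_parts.append(event["data"])
        elif kind == "stream_complete":
            metadata = event.get("metadata")
            break  # terminal event: ignore anything after it
    return "".join(text_parts), tools, metadata


sample = [
    'data: {"type": "tool.start", "tool_call_id": "t1", "name": "pubmed_search", "args": {}}',
    'data: {"type": "tool.end", "tool_call_id": "t1", "name": "pubmed_search", "ok": true, "result_summary": "3 hits"}',
    'data: {"type": "text.delta", "data": "CAR-T "}',
    'data: {"type": "text.delta", "data": "outcomes"}',
    'data: {"type": "stream_complete", "stats": {}, "metadata": {"tool_calls": 1}}',
]
text, tools, metadata = consume(sample)
print(text)  # CAR-T outcomes
```

Keying tool entries by tool_call_id lets a UI pair each tool.start with its tool.end even when several calls run in parallel.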

Request Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| question | string (3–2000 chars) | required | Natural-language biomedical question. |
| max_sources | integer (1–25) | 10 | Target number of distinct cited sources in the final answer. |
| include_trials | boolean | true | Whether the agent may call ClinicalTrials.gov. Set false to skip trials entirely. |
| recency_years | integer (1–50) | 10 | Prefer evidence from the last N years where the question allows. |
| stream | boolean | false | If true, response is text/event-stream; otherwise JSON. |
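Checking these ranges on the client avoids a round trip for requests that cannot succeed. A minimal Python sketch follows, assuming the limits and defaults in the table above. The helper `build_request` is hypothetical, and whether the server counts characters exactly as Python's `len` does is an assumption.

```python
def build_request(question: str, max_sources: int = 10, include_trials: bool = True,
                  recency_years: int = 10, stream: bool = False) -> dict:
    """Validate fields against the documented ranges and return the JSON body."""
    if not isinstance(question, str) or not 3 <= len(question) <= 2000:
        raise ValueError("question must be a string of 3-2000 characters")
    if not 1 <= max_sources <= 25:
        raise ValueError("max_sources must be between 1 and 25")
    if not 1 <= recency_years <= 50:
        raise ValueError("recency_years must be between 1 and 50")
    return {
        "question": question,
        "max_sources": max_sources,
        "include_trials": bool(include_trials),
        "recency_years": recency_years,
        "stream": bool(stream),
    }


body = build_request("What is known about PARP inhibitor resistance?", max_sources=5)
print(body["max_sources"])  # 5
```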

Response Fields

Top-level response

| Field | Description | Example |
| --- | --- | --- |
| domain | The scientific domain resolved for the request. | "medical" |
| answer | Synthesized answer with inline [PMID:…] / [PMC:…] / [NCT:…] / [S2:…] citation markers. | "Strong evidence supports..." |
| sources | Array of typed source records expanded from the citation markers. | See below. |
| gaps | What the agent could NOT find or verify. | ["Long-term (>10 year) survival data..."] |
| tool_calls | Number of tool calls the agent made. | 5 |
| latency_ms | End-to-end wall-clock latency. | 7420 |

PubMed source (type: "pubmed")

| Field | Description | Example |
| --- | --- | --- |
| pmid | PubMed ID | "28578601" |
| title | Article title | "Olaparib for Metastatic Breast Cancer..." |
| journal | Journal name | "The New England journal of medicine" |
| year | Publication year | 2017 |
| doi | DOI, if available | "10.1056/NEJMoa1706450" |

PMC full-text source (type: "pmc_full_text")

| Field | Description | Example |
| --- | --- | --- |
| pmcid | PMC ID | "PMC6503629" |
| quoted | Short opening quote from the referenced article | "PARP inhibitors exploit synthetic lethality..." |

Clinical trial source (type: "clinical_trial")

| Field | Description | Example |
| --- | --- | --- |
| nct_id | ClinicalTrials.gov identifier | "NCT02000622" |
| title | Trial brief title | "Olaparib as Adjuvant Treatment..." |
| phase | Phase(s), comma-joined | "PHASE3" |
| status | Overall status | "ACTIVE_NOT_RECRUITING" |

Semantic Scholar source (type: "semantic_scholar")

| Field | Description | Example |
| --- | --- | --- |
| paper_id | Semantic Scholar paper ID | "a1b2c3d4..." |
| title | Paper title | "Mechanisms of PARP inhibitor resistance..." |
| year | Publication year | 2023 |
| citation_count | Total citations | 412 |
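Because each source record carries a type and a stable identifier, a client can turn it into a link for readers to verify. The Python sketch below assumes the four record shapes documented above. The URL patterns follow each site's public conventions and are assumptions: the API response itself does not include URLs, and `source_link` is a hypothetical helper.

```python
def source_link(source: dict) -> str:
    """Map a typed source record to a public landing-page URL (assumed patterns)."""
    kind = source.get("type")
    if kind == "pubmed":
        return f"https://pubmed.ncbi.nlm.nih.gov/{source['pmid']}/"
    if kind == "pmc_full_text":
        return f"https://pmc.ncbi.nlm.nih.gov/articles/{source['pmcid']}/"
    if kind == "clinical_trial":
        return f"https://clinicaltrials.gov/study/{source['nct_id']}"
    if kind == "semantic_scholar":
        return f"https://www.semanticscholar.org/paper/{source['paper_id']}"
    raise ValueError(f"unknown source type: {kind!r}")


print(source_link({"type": "pubmed", "pmid": "28578601"}))
# https://pubmed.ncbi.nlm.nih.gov/28578601/
```

Raising on an unknown type, rather than returning an empty string, makes it obvious when a new source type appears that the client does not yet handle.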

Guarantees and Limits

  • Citations are real. Every [PMID:…], [PMC:…], [NCT:…], and [S2:…] marker in the answer corresponds to a record the agent actually retrieved during the request. The agent is explicitly instructed not to cite from memory.
  • Tool-call budget: 8. The agent loop hard-caps at 8 calls per request; the system prompt asks the model to stay under 6.
  • No shared corpus yet. Every request fetches live from the four upstream APIs. Federation happens in real time.
  • Latency. p50 around 8s; p95 up to 12s when full-text fetches are involved. Set client timeouts to at least 90s; the request examples use 180s to absorb a cold start on the first invocation of the day.
  • Gaps are load-bearing. If the agent cannot find evidence at the specificity you asked for, it says so in gaps[] rather than hallucinating. Display that in your UI.
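Since every request fetches live from upstream APIs and a cold start can surface as a 502 or 504, a client may want a bounded retry around the call. The Python sketch below shows one way to do it with exponential backoff. `ask_with_retry` is a hypothetical helper, `send` is any zero-argument callable you supply that performs the POST and returns an object with a `status_code`, and the retry counts and delays are illustrative. Reusing one Idempotency-Key across retries is assumed to be safe; this page does not state how the server treats that header.

```python
import time

RETRYABLE = {502, 504}  # gateway errors mentioned for cold starts


def ask_with_retry(send, attempts: int = 3, base_delay: float = 2.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    response = None
    for attempt in range(attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return response  # still a 502/504 after the final attempt


class FakeResponse:
    """Stand-in for an HTTP response, used only to demonstrate the helper."""

    def __init__(self, status_code: int):
        self.status_code = status_code


queue = [FakeResponse(504), FakeResponse(502), FakeResponse(200)]
delays = []
result = ask_with_retry(lambda: queue.pop(0), base_delay=1.0, sleep=delays.append)
print(result.status_code, delays)  # 200 [1.0, 2.0]
```

With `requests`, `send` could be a lambda wrapping `requests.post(...)` with the headers built once outside the loop, so every attempt carries the same Idempotency-Key.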

Deprecated URL Aliases

/v2/datasets/pubmed/* and /v2/datasets/medical/* return HTTP 400 with a pointer to /v2/datasets/scientific/medical/ask. Update your clients to use the new URL.