Scientific - Medical Q&A

Scientific is Captain’s agentic Q&A dataset for the scientific literature. Send a natural-language question and the agent federates across multiple live sources, then returns a synthesized answer with real, verifiable citations.

What Scientific Covers

Source	Description	Key Data Points
PubMed	NCBI E-utilities over 37M+ biomedical abstracts	PMID, title, abstract, journal, year, authors, DOI, PMCID
PMC Full Text	Open-access article bodies, tables, and figures from PubMed Central	Section headings and text, table captions, figure captions
ClinicalTrials.gov	v2 API over all registered trials	NCT ID, phase, overall status, conditions, interventions, brief summary, start/completion dates
Semantic Scholar	Embedding-backed paper search for true semantic recall	Paper ID, title, abstract, year, citation count, venue, authors, DOI, PMID

How It Works

1. Ask a natural-language question about a medical or biomedical topic.
2. The agent plans which sources to call and fans out up to 6 tool calls in parallel.
3. An answer is synthesized with inline citations to the exact records retrieved.
4. A gaps[] field surfaces what the agent could not find or verify.

All requests require authentication via Authorization: Bearer {api_key} and X-Organization-ID headers.
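
One way to keep credentials out of source code is to build the two required headers from environment variables. This is a minimal sketch: the variable names `CAPTAIN_API_KEY` and `CAPTAIN_ORG_ID` and the helper `build_auth_headers` are hypothetical, chosen for illustration rather than defined by the API.

```python
import os


def build_auth_headers(env=None):
    """Return the two required auth headers, read from environment variables.

    CAPTAIN_API_KEY and CAPTAIN_ORG_ID are hypothetical names used only in
    this sketch; substitute whatever your deployment already defines.
    """
    env = os.environ if env is None else env
    required = ("CAPTAIN_API_KEY", "CAPTAIN_ORG_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "Authorization": f"Bearer {env['CAPTAIN_API_KEY']}",
        "X-Organization-ID": env["CAPTAIN_ORG_ID"],
    }
```

Failing early on a missing variable avoids sending a request with an empty bearer token.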

Search Medical Papers

POST /v2/datasets/scientific/medical/ask with a question in the body.

Asking a Question

Get a synthesized answer with cited sources. The first invocation of the day can hit a cold Lambda (~60s) followed by the agent loop (~30-60s), so set client timeouts to 180s. Guard against non-JSON responses (502/504) as well, so that gateway errors surface cleanly:

Python

import requests
import uuid
import json

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "X-Organization-ID": ORG_ID,
    "Content-Type": "application/json",
    "Idempotency-Key": str(uuid.uuid4())
}

response = requests.post(
    f"{BASE_URL}/v2/datasets/scientific/medical/ask",
    headers=headers,
    json={
        "question": "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
        "max_sources": 10,
        "include_trials": True,
        "recency_years": 10
    },
    timeout=180.0,
)

# Surface non-JSON responses (504 gateway timeout, 502 bad gateway, empty body,
# HTML error pages) instead of blowing up with an opaque JSONDecodeError.
if response.headers.get("content-type", "").startswith("application/json"):
    print(json.dumps(response.json(), indent=2))
else:
    print(f"HTTP {response.status_code}  content-type={response.headers.get('content-type')!r}")
    print(f"body: {response.text[:500]!r}")

TypeScript

const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 180_000);

const response = await fetch(
  `${BASE_URL}/v2/datasets/scientific/medical/ask`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "X-Organization-ID": ORG_ID,
      "Content-Type": "application/json",
      "Idempotency-Key": crypto.randomUUID()
    },
    body: JSON.stringify({
      question: "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
      max_sources: 10,
      include_trials: true,
      recency_years: 10
    }),
    signal: controller.signal
  }
);
clearTimeout(timeoutId);

// Surface non-JSON responses (504/502/empty body) instead of crashing.
const contentType = response.headers.get("content-type") || "";
if (!contentType.startsWith("application/json")) {
  const body = await response.text();
  console.log(`HTTP ${response.status}  content-type=${contentType}`);
  console.log(`body: ${body.slice(0, 500)}`);
} else {
  const result = await response.json();
  console.log(JSON.stringify(result, null, 2));
}

Ruby

require 'net/http'
require 'json'
require 'uri'
require 'securerandom'

uri = URI("#{BASE_URL}/v2/datasets/scientific/medical/ask")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.read_timeout = 180

request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{API_KEY}"
request["X-Organization-ID"] = ORG_ID
request["Content-Type"] = "application/json"
request["Idempotency-Key"] = SecureRandom.uuid
request.body = {
  question: "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
  max_sources: 10,
  include_trials: true,
  recency_years: 10
}.to_json

# Surface non-JSON responses (504/502/empty body) instead of raising JSON::ParserError.
response = http.request(request)
content_type = response["content-type"] || ""
if content_type.start_with?("application/json")
  puts JSON.pretty_generate(JSON.parse(response.body))
else
  puts "HTTP #{response.code}  content-type=#{content_type}"
  puts "body: #{response.body.to_s[0, 500]}"
end

Example Response

{
  "domain": "medical",
  "answer": "Strong evidence supports PARP inhibitor use in BRCA1-mutated breast cancer across the metastatic and adjuvant settings. The OlympiAD phase 3 trial showed olaparib monotherapy extended progression-free survival versus standard chemotherapy in metastatic HER2-negative disease (7.0 vs. 4.2 months; HR 0.58) [PMID:28578601]. EMBRACA demonstrated similar benefit with talazoparib (8.6 vs. 5.6 months; HR 0.54) [PMID:30110579]. In the adjuvant setting, OlympiA [NCT:NCT02000622] randomized 1,836 patients with germline BRCA1/2 mutations to one year of olaparib versus placebo, showing improvements in overall survival (HR 0.68) and invasive disease-free survival [PMID:34081848][PMID:36228963]. Mechanistic studies attribute the response to synthetic lethality between PARP inhibition and BRCA1-driven homologous recombination deficiency [PMID:33710534][PMC:PMC6503629].",
  "sources": [
    {
      "type": "pubmed",
      "pmid": "28578601",
      "title": "Olaparib for Metastatic Breast Cancer in Patients with a Germline BRCA Mutation.",
      "journal": "The New England journal of medicine",
      "year": 2017,
      "doi": "10.1056/NEJMoa1706450"
    },
    {
      "type": "pubmed",
      "pmid": "30110579",
      "title": "Talazoparib in Patients with Advanced Breast Cancer and a Germline BRCA Mutation.",
      "journal": "The New England journal of medicine",
      "year": 2018,
      "doi": "10.1056/NEJMoa1802905"
    },
    {
      "type": "clinical_trial",
      "nct_id": "NCT02000622",
      "title": "Olaparib as Adjuvant Treatment in Patients With Germline BRCA Mutated High Risk HER2 Negative Primary Breast Cancer (OlympiA)",
      "phase": "PHASE3",
      "status": "ACTIVE_NOT_RECRUITING"
    },
    {
      "type": "pubmed",
      "pmid": "34081848",
      "title": "Adjuvant Olaparib for Patients with BRCA1- or BRCA2-Mutated Breast Cancer.",
      "journal": "The New England journal of medicine",
      "year": 2021,
      "doi": "10.1056/NEJMoa2105215"
    },
    {
      "type": "pmc_full_text",
      "pmcid": "PMC6503629",
      "quoted": "PARP inhibitors exploit synthetic lethality in cells with homologous recombination deficiency..."
    }
  ],
  "gaps": [
    "Long-term (>10 year) survival data for patients on adjuvant olaparib",
    "Head-to-head comparative efficacy between BRCA1 and BRCA2 subgroups"
  ],
  "tool_calls": 5,
  "latency_ms": 7420
}

Asking with Streaming

Get real-time progress as the agent calls tools and writes the answer token-by-token:

Python

import json
import requests

response = requests.post(
    f"{BASE_URL}/v2/datasets/scientific/medical/ask",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-Organization-ID": ORG_ID,
        "Content-Type": "application/json"
    },
    json={
        "question": "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
        "stream": True
    },
    stream=True,  # Important: enable streaming
    timeout=180.0,  # same cold-start allowance as the non-streaming call
)
response.raise_for_status()  # fail fast on 4xx/5xx rather than parsing an error body as a stream

# Process streamed response
for line in response.iter_lines():
    if line:
        line_text = line.decode('utf-8')
        if line_text.startswith('data: '):
            data = line_text[6:]
            try:
                parsed = json.loads(data)
                if parsed.get('type') == 'tool.start':
                    print(f"\n[tool] {parsed['name']}: {parsed['args']}")
                elif parsed.get('type') == 'text.delta':
                    print(parsed['data'], end='', flush=True)
                elif parsed.get('type') == 'stream_complete':
                    print("\nStream complete!")
                    break
            except json.JSONDecodeError:
                print(data, end='', flush=True)

TypeScript

const response = await fetch(
  `${BASE_URL}/v2/datasets/scientific/medical/ask`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "X-Organization-ID": ORG_ID,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      question: "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
      stream: true
    })
  }
);

// Process streamed response
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  // A chunk can end mid-line, so keep the trailing partial line for the next read.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? "";

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = line.slice(6);
      try {
        const parsed = JSON.parse(data);
        if (parsed.type === 'tool.start') {
          console.log(`\n[tool] ${parsed.name}:`, parsed.args);
        } else if (parsed.type === 'text.delta') {
          process.stdout.write(parsed.data);
        } else if (parsed.type === 'stream_complete') {
          console.log("\nStream complete!");
          finished = true; // a bare break only exits the inner for loop
          break;
        }
      } catch {
        process.stdout.write(data);
      }
    }
  }
}

Ruby

require 'net/http'
require 'json'
require 'uri'

uri = URI("#{BASE_URL}/v2/datasets/scientific/medical/ask")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.read_timeout = 180

request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{API_KEY}"
request["X-Organization-ID"] = ORG_ID
request["Content-Type"] = "application/json"
request.body = {
  question: "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
  stream: true
}.to_json

# Process streamed response
buffer = +""
done = false
http.request(request) do |response|
  response.read_body do |chunk|
    next if done
    # A chunk can end mid-line, so only consume complete lines from the buffer.
    buffer << chunk
    while (line = buffer.slice!(/\A.*\n/))
      next unless line.start_with?("data: ")
      data = line[6..-1].strip
      begin
        parsed = JSON.parse(data)
        case parsed["type"]
        when "tool.start"       then puts "\n[tool] #{parsed['name']}: #{parsed['args']}"
        when "text.delta"       then print parsed["data"]
        when "stream_complete"
          puts "\nStream complete!"
          done = true
          break # exits the while loop; the flag skips any later chunks
        end
      rescue JSON::ParserError
        print data
      end
    end
  end
end

Event Types

Event	When it fires	Payload
`tool.start`	Agent is about to call a source (PubMed, PMC, CT.gov, or Semantic Scholar)	`{type, tool_call_id, name, args}`
`tool.end`	The tool returned	`{type, tool_call_id, name, ok, result_summary}`
`text.delta`	Answer tokens as the model writes them	`{type, data}`
`stream_complete`	Terminal event; stream has ended	`{type, stats, metadata}` — `metadata` carries `tool_calls`, `sources_count`, `latency_ms`
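
The four event types can be folded into a single result once the stream has been split into lines. The sketch below assumes each event arrives as one `data: {json}` line, as in the streaming examples above; the function name `consume_events` and the shape of its return value are illustrative, not part of the API.

```python
import json


def consume_events(lines):
    """Fold an iterable of decoded stream lines into (answer, tools, metadata).

    tools maps tool_call_id to {"name", "args", "ok"}; "ok" stays None until
    the matching tool.end event arrives.
    """
    answer_parts, tools, metadata = [], {}, {}
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data fields
        try:
            event = json.loads(line[len("data: "):])
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        kind = event.get("type")
        if kind == "tool.start":
            tools[event["tool_call_id"]] = {
                "name": event["name"],
                "args": event.get("args"),
                "ok": None,
            }
        elif kind == "tool.end":
            entry = tools.setdefault(
                event["tool_call_id"], {"name": event.get("name"), "args": None}
            )
            entry["ok"] = event.get("ok")
        elif kind == "text.delta":
            answer_parts.append(event["data"])
        elif kind == "stream_complete":
            metadata = event.get("metadata", {})
            break  # terminal event: nothing after it is processed
    return "".join(answer_parts), tools, metadata
```

With `requests`, the input could be `(raw.decode("utf-8") for raw in response.iter_lines())`. Tracking calls by `tool_call_id` lets a UI show which parallel calls are still pending.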

Request Fields

Field	Type	Default	Description
`question`	string (3–2000 chars)	required	Natural-language biomedical question.
`max_sources`	integer (1–25)	`10`	Target number of distinct cited sources in the final answer.
`include_trials`	boolean	`true`	Whether the agent may call ClinicalTrials.gov. Set `false` to skip trials entirely.
`recency_years`	integer (1–50)	`10`	Prefer evidence from the last N years where the question allows.
`stream`	boolean	`false`	If `true`, response is `text/event-stream`; otherwise JSON.
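
Checking these ranges on the client avoids a round trip for requests that cannot succeed. The helper below is a sketch: `build_ask_payload` is a hypothetical name, and it assumes the server counts question length in characters the same way Python's `len` does.

```python
def build_ask_payload(question, max_sources=10, include_trials=True,
                      recency_years=10, stream=False):
    """Validate arguments against the documented ranges and return the JSON body."""
    if not isinstance(question, str) or not 3 <= len(question) <= 2000:
        raise ValueError("question must be a string of 3 to 2000 characters")
    ranged = (("max_sources", max_sources, 1, 25),
              ("recency_years", recency_years, 1, 50))
    for name, value, low, high in ranged:
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    for name, value in (("include_trials", include_trials), ("stream", stream)):
        if not isinstance(value, bool):
            raise ValueError(f"{name} must be a boolean")
    return {
        "question": question,
        "max_sources": max_sources,
        "include_trials": include_trials,
        "recency_years": recency_years,
        "stream": stream,
    }
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.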

Response Fields

Top-level response

Field	Description	Example
`domain`	The scientific domain resolved for the request.	`"medical"`
`answer`	Synthesized answer with inline `[PMID:…]` / `[PMC:…]` / `[NCT:…]` / `[S2:…]` citation markers.	`"Strong evidence supports..."`
`sources`	Array of typed source records expanded from the citation markers.	See below.
`gaps`	What the agent could NOT find or verify.	`["Long-term (>10 year) survival data..."]`
`tool_calls`	Number of tool calls the agent made.	`5`
`latency_ms`	End-to-end wall-clock latency.	`7420`
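
Because the citation markers follow a fixed `[PREFIX:id]` shape, a client can cross-check them against `sources` before rendering. The sketch below assumes each prefix maps to the identifier field of its source type (`pmid`, `pmcid`, `nct_id`, `paper_id`); that mapping for `S2` in particular, and the helper name `unmatched_citations`, are assumptions made for illustration.

```python
import re

# Matches markers such as [PMID:28578601] or [NCT:NCT02000622].
CITATION_RE = re.compile(r"\[(PMID|PMC|NCT|S2):([^\]\s]+)\]")

# Assumed mapping from marker prefix to the identifier field on a source record.
MARKER_FIELD = {"PMID": "pmid", "PMC": "pmcid", "NCT": "nct_id", "S2": "paper_id"}


def unmatched_citations(result):
    """Return (prefix, id) pairs cited in the answer but absent from sources."""
    known = {
        (prefix, str(source[field]))
        for source in result.get("sources", [])
        for prefix, field in MARKER_FIELD.items()
        if field in source
    }
    missing = []
    for pair in CITATION_RE.findall(result.get("answer", "")):
        if pair not in known and pair not in missing:
            missing.append(pair)  # keep first-seen order, drop repeats
    return missing
```

A non-empty return value means the answer text cites a record that `sources` does not describe, which a UI might render as a plain identifier rather than a rich card.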

PubMed source (`type: "pubmed"`)

Field	Description	Example
`pmid`	PubMed ID	`"28578601"`
`title`	Article title	`"Olaparib for Metastatic Breast Cancer..."`
`journal`	Journal name	`"The New England journal of medicine"`
`year`	Publication year	`2017`
`doi`	DOI, if available	`"10.1056/NEJMoa1706450"`

PMC full-text source (`type: "pmc_full_text"`)

Field	Description	Example
`pmcid`	PMC ID	`"PMC6503629"`
`quoted`	Short opening quote from the referenced article	`"PARP inhibitors exploit synthetic lethality..."`

Clinical trial source (`type: "clinical_trial"`)

Field	Description	Example
`nct_id`	ClinicalTrials.gov identifier	`"NCT02000622"`
`title`	Trial brief title	`"Olaparib as Adjuvant Treatment..."`
`phase`	Phase(s), comma-joined	`"PHASE3"`
`status`	Overall status	`"ACTIVE_NOT_RECRUITING"`

Semantic Scholar source (`type: "semantic_scholar"`)

Field	Description	Example
`paper_id`	Semantic Scholar paper ID	`"a1b2c3d4..."`
`title`	Paper title	`"Mechanisms of PARP inhibitor resistance..."`
`year`	Publication year	`2023`
`citation_count`	Total citations	`412`
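
Each source type carries an identifier that resolves to a public landing page, so a client can turn `sources` into links. The URL patterns below are the public ones used by PubMed, PMC, ClinicalTrials.gov, and Semantic Scholar; they are not returned by this API, and `source_url` is a hypothetical helper.

```python
def source_url(source):
    """Return a public landing-page URL for a source record, or None if unknown."""
    kind = source.get("type")
    if kind == "pubmed":
        return f"https://pubmed.ncbi.nlm.nih.gov/{source['pmid']}/"
    if kind == "pmc_full_text":
        return f"https://pmc.ncbi.nlm.nih.gov/articles/{source['pmcid']}/"
    if kind == "clinical_trial":
        return f"https://clinicaltrials.gov/study/{source['nct_id']}"
    if kind == "semantic_scholar":
        return f"https://www.semanticscholar.org/paper/{source['paper_id']}"
    return None  # unrecognized type: let the caller decide how to render it
```

Returning `None` for an unrecognized type keeps the client working if new source types are added later.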

Guarantees and Limits

Citations are real. Every [PMID:…], [PMC:…], [NCT:…], and [S2:…] marker in the answer corresponds to a record the agent actually retrieved during the request. The agent is explicitly instructed not to cite from memory.
Tool-call budget: 8. The agent loop hard-caps at 8 calls per request; the system prompt asks the model to stay under 6.
No shared corpus yet. Every request fetches live from the four upstream APIs. Federation happens in real time.
Latency. p50 around 8s; p95 up to 12s when full-text fetches are involved. Set client timeouts ≥ 90s, or 180s to cover a cold start on the first invocation of the day.
Gaps are load-bearing. If the agent cannot find evidence at the specificity you asked for, it says so in gaps[] rather than hallucinating. Display that in your UI.
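
A minimal way to honor the last point is to render `gaps[]` directly beneath the answer. The sketch below produces plain text; the function name `render_result` and the "Not found or not verified" label are illustrative choices, not API output.

```python
def render_result(result):
    """Return the answer followed by a plain-text list of gaps, if any."""
    lines = [result.get("answer", "").strip()]
    gaps = result.get("gaps") or []
    if gaps:
        lines.append("")
        lines.append("Not found or not verified:")
        lines.extend(f"- {gap}" for gap in gaps)
    return "\n".join(lines)
```

When `gaps` is empty or missing, only the answer is returned, so the label never appears without content.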

Deprecated URL Aliases

/v2/datasets/pubmed/* and /v2/datasets/medical/* return HTTP 400 with a pointer to /v2/datasets/scientific/medical/ask. Update your clients to use the new URL.
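
Clients that build request paths dynamically can rewrite the deprecated prefixes in one place during migration. This sketch assumes every deprecated sub-path should map to the single ask endpoint; `migrate_path` is a hypothetical helper.

```python
DEPRECATED_PREFIXES = ("/v2/datasets/pubmed/", "/v2/datasets/medical/")
CURRENT_PATH = "/v2/datasets/scientific/medical/ask"


def migrate_path(path):
    """Map a deprecated dataset path to the current endpoint; pass others through."""
    if path.startswith(DEPRECATED_PREFIXES):
        return CURRENT_PATH
    return path
```

The current path is left untouched because it starts with `/v2/datasets/scientific/`, which matches neither deprecated prefix.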