> For a complete page index of the Captain API documentation, fetch https://docs.runcaptain.com/llms.txt?excludeSpec=true

# Scientific - Medical Q&A

> Captain's Scientific dataset provides agentic Q&A over the biomedical literature. One endpoint federates PubMed, PMC open-access full text, ClinicalTrials.gov, and Semantic Scholar, then returns a synthesized answer with cited sources.

## Agent Quick Reference - Scientific Medical Q\&A

* **Endpoint**: `POST /v2/datasets/scientific/medical/ask`
* **Auth**: `Authorization: Bearer {api_key}` (`X-Organization-ID: {org_id}` optional; defaults to the key's organization)
* **Request body**: `{"question": "...", "max_sources": 10, "include_trials": true, "recency_years": 10, "stream": false}`
* **Modes**: JSON (default) and Server-Sent Events (`stream: true`). SSE event types: `tool.start`, `tool.end`, `text.delta`, `stream_complete`.
* **Federated sources**: PubMed E-utilities, PMC OA full text, ClinicalTrials.gov v2, Semantic Scholar Graph API.
* **Citations**: Answer contains inline `[PMID:...]`, `[PMC:PMC...]`, `[NCT:NCT...]`, `[S2:...]` markers. The response `sources[]` array maps each marker to a typed record.
* **Gaps**: The `gaps[]` array surfaces what the agent could NOT find or verify. Display it; this is load-bearing honesty.
* **Latency**: p50 \~8s, p95 \~12s. Set client timeouts ≥ 90s; use 180s to absorb a cold start on the first request of the day.
* **Legacy URLs**: `/v2/datasets/pubmed/*` and `/v2/datasets/medical/*` return a 400 pointing at `/v2/datasets/scientific/medical/ask`.

Scientific is Captain's agentic Q\&A dataset for the scientific literature. Send a natural-language question and the agent federates across multiple live sources, then returns a synthesized answer with real, verifiable citations.

## What Scientific Covers

| Source                 | Description                                                         | Key Data Points                                                                                 |
| ---------------------- | ------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| **PubMed**             | NCBI E-utilities over 37M+ biomedical abstracts                     | PMID, title, abstract, journal, year, authors, DOI, PMCID                                       |
| **PMC Full Text**      | Open-access article bodies, tables, and figures from PubMed Central | Section headings and text, table captions, figure captions                                      |
| **ClinicalTrials.gov** | v2 API over all registered trials                                   | NCT ID, phase, overall status, conditions, interventions, brief summary, start/completion dates |
| **Semantic Scholar**   | Embedding-backed paper search for true semantic recall              | Paper ID, title, abstract, year, citation count, venue, authors, DOI, PMID                      |

## How It Works

1. **Ask** a natural-language question about a medical or biomedical topic
2. The agent **plans** which sources to call and fans out up to 6 tool calls in parallel (hard cap: 8 calls per request)
3. An **answer** is synthesized with inline citations to the exact records retrieved
4. A **`gaps[]`** field surfaces what the agent could not find or verify
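The plan-and-fan-out loop can be pictured with a toy model: each planning round runs its tool calls concurrently, and all rounds share one per-request budget. This is an illustrative sketch, not Captain's implementation; `fake_tool`, `run_agent`, and the round structure are hypothetical, and only the hard cap of 8 comes from this page.

```python
import asyncio

HARD_CAP = 8  # per-request tool-call ceiling stated under "Guarantees and Limits"


async def fake_tool(name: str, query: str) -> dict:
    """Hypothetical stand-in for one upstream source call (PubMed, PMC, etc.)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"tool": name, "query": query}


async def run_agent(rounds: list[list[tuple[str, str]]]) -> list[dict]:
    """Run each planned round of (tool, query) calls in parallel within a shared budget."""
    results: list[dict] = []
    for planned in rounds:
        remaining = HARD_CAP - len(results)
        if remaining <= 0:
            break  # budget exhausted: stop calling tools and synthesize from what we have
        batch = planned[:remaining]
        results.extend(await asyncio.gather(*(fake_tool(n, q) for n, q in batch)))
    return results


if __name__ == "__main__":
    plan = [
        [("pubmed", "BRCA1 PARP inhibitor"),
         ("clinical_trials", "olaparib adjuvant"),
         ("semantic_scholar", "PARP synthetic lethality")],
        [("pmc_full_text", "PMC6503629")] * 7,  # oversized follow-up round
    ]
    calls = asyncio.run(run_agent(plan))
    print(len(calls))  # 8: three calls in round one, then five more to reach the cap
```

Because the budget is shared across rounds, a large follow-up round is truncated rather than rejected, which mirrors the "hard cap" wording above.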

All requests require authentication via an `Authorization: Bearer {api_key}` header. `X-Organization-ID` is optional: include it only when your key is not already scoped to an organization.

## Search Medical Papers

Send a `POST` request to `/v2/datasets/scientific/medical/ask` with a JSON body containing your question.

### Asking a Question

Get a synthesized answer with cited sources. The first request of the day can hit a cold Lambda start (\~60s) on top of the agent loop (\~30-60s). Set client timeouts to 180s, and guard against non-JSON responses (such as 502 or 504 gateway errors) so that errors surface cleanly:

```python title="Python"
import requests
import uuid
import json

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
    "Idempotency-Key": str(uuid.uuid4())
}

response = requests.post(
    f"{BASE_URL}/v2/datasets/scientific/medical/ask",
    headers=headers,
    json={
        "question": "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
        "max_sources": 10,
        "include_trials": True,
        "recency_years": 10
    },
    timeout=180.0,
)

# Surface non-JSON responses (504 gateway timeout, 502 bad gateway, empty body,
# HTML error pages) instead of blowing up with an opaque JSONDecodeError.
if response.headers.get("content-type", "").startswith("application/json"):
    print(json.dumps(response.json(), indent=2))
else:
    print(f"HTTP {response.status_code}  content-type={response.headers.get('content-type')!r}")
    print(f"body: {response.text[:500]!r}")
```

```typescript title="TypeScript"
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 180_000);

const response = await fetch(
  `${BASE_URL}/v2/datasets/scientific/medical/ask`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
      "Idempotency-Key": crypto.randomUUID()
    },
    body: JSON.stringify({
      question: "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
      max_sources: 10,
      include_trials: true,
      recency_years: 10
    }),
    signal: controller.signal
  }
);
clearTimeout(timeoutId);

// Surface non-JSON responses (504/502/empty body) instead of crashing.
const contentType = response.headers.get("content-type") || "";
if (!contentType.startsWith("application/json")) {
  const body = await response.text();
  console.log(`HTTP ${response.status}  content-type=${contentType}`);
  console.log(`body: ${body.slice(0, 500)}`);
} else {
  const result = await response.json();
  console.log(JSON.stringify(result, null, 2));
}
```

```ruby title="Ruby"
require 'net/http'
require 'json'
require 'uri'
require 'securerandom'

uri = URI("#{BASE_URL}/v2/datasets/scientific/medical/ask")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.read_timeout = 180

request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{API_KEY}"
request["Content-Type"] = "application/json"
request["Idempotency-Key"] = SecureRandom.uuid
request.body = {
  question: "What is the evidence that BRCA1-mutated breast cancer patients benefit from PARP inhibitors?",
  max_sources: 10,
  include_trials: true,
  recency_years: 10
}.to_json

response = http.request(request)
content_type = response["content-type"] || ""
if content_type.start_with?("application/json")
  puts JSON.pretty_generate(JSON.parse(response.body))
else
  puts "HTTP #{response.code}  content-type=#{content_type}"
  puts "body: #{response.body.to_s[0, 500]}"
end
```
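When a gateway error does come back, a bounded retry is often reasonable. The sketch below is hypothetical: `ask_with_retry` and the injected `send` transport are illustrative names. This page does not say whether the API deduplicates requests by `Idempotency-Key`; the sketch assumes it does, which is why every attempt reuses one key.

```python
import time
import uuid
from typing import Callable

# Hypothetical transport: (headers, payload) -> (status, content_type, body_text).
# A real one would wrap requests.post(...) and add the Authorization header;
# injecting it keeps the retry policy testable without network access.
Send = Callable[[dict, dict], tuple[int, str, str]]

RETRYABLE_STATUSES = {502, 504}  # gateway errors named in the guidance above


def ask_with_retry(send: Send, payload: dict, attempts: int = 3, backoff_s: float = 2.0):
    """Send the request, retrying only gateway errors and reusing one Idempotency-Key."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: the server treats requests sharing this key as one logical request.
        "Idempotency-Key": str(uuid.uuid4()),
    }
    for attempt in range(1, attempts + 1):
        status, content_type, body = send(headers, payload)
        if status not in RETRYABLE_STATUSES or attempt == attempts:
            return status, content_type, body
        time.sleep(backoff_s * 2 ** (attempt - 1))  # exponential backoff: 2s, 4s, ...
```

A `send` built on `requests` might call `requests.post(url, headers={**headers, "Authorization": f"Bearer {API_KEY}"}, json=payload, timeout=180.0)` and return `(r.status_code, r.headers.get("content-type", ""), r.text)`. With a 180s timeout, each extra attempt can add minutes of wall time, so keep `attempts` small.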

#### Example Response

```json
{
  "domain": "medical",
  "answer": "Strong evidence supports PARP inhibitor use in BRCA1-mutated breast cancer across the metastatic and adjuvant settings. The OlympiAD phase 3 trial showed olaparib monotherapy extended progression-free survival versus standard chemotherapy in metastatic HER2-negative disease (7.0 vs. 4.2 months; HR 0.58) [PMID:28578601]. EMBRACA demonstrated similar benefit with talazoparib (8.6 vs. 5.6 months; HR 0.54) [PMID:30110579]. In the adjuvant setting, OlympiA [NCT:NCT02000622] randomized 1,836 patients with germline BRCA1/2 mutations to one year of olaparib versus placebo, showing improvements in overall survival (HR 0.68) and invasive disease-free survival [PMID:34081848][PMID:36228963]. Mechanistic studies attribute the response to synthetic lethality between PARP inhibition and BRCA1-driven homologous recombination deficiency [PMID:33710534][PMC:PMC6503629].",
  "sources": [
    {
      "type": "pubmed",
      "pmid": "28578601",
      "title": "Olaparib for Metastatic Breast Cancer in Patients with a Germline BRCA Mutation.",
      "journal": "The New England journal of medicine",
      "year": 2017,
      "doi": "10.1056/NEJMoa1706450"
    },
    {
      "type": "pubmed",
      "pmid": "30110579",
      "title": "Talazoparib in Patients with Advanced Breast Cancer and a Germline BRCA Mutation.",
      "journal": "The New England journal of medicine",
      "year": 2018,
      "doi": "10.1056/NEJMoa1802905"
    },
    {
      "type": "clinical_trial",
      "nct_id": "NCT02000622",
      "title": "Olaparib as Adjuvant Treatment in Patients With Germline BRCA Mutated High Risk HER2 Negative Primary Breast Cancer (OlympiA)",
      "phase": "PHASE3",
      "status": "ACTIVE_NOT_RECRUITING"
    },
    {
      "type": "pubmed",
      "pmid": "34081848",
      "title": "Adjuvant Olaparib for Patients with BRCA1- or BRCA2-Mutated Breast Cancer.",
      "journal": "The New England journal of medicine",
      "year": 2021,
      "doi": "10.1056/NEJMoa2105215"
    },
    {
      "type": "pmc_full_text",
      "pmcid": "PMC6503629",
      "quoted": "PARP inhibitors exploit synthetic lethality in cells with homologous recombination deficiency..."
    }
  ],
  "gaps": [
    "Long-term (>10 year) survival data for patients on adjuvant olaparib",
    "Head-to-head comparative efficacy between BRCA1 and BRCA2 subgroups"
  ],
  "tool_calls": 5,
  "latency_ms": 7420
}
```
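To turn the inline markers into links or footnotes, a client can index `sources[]` by identifier. The sketch below is hypothetical (`resolve_citations` is not part of any Captain SDK); it assumes the marker-to-field mapping implied by the source tables under Response Fields, and it reports any marker it cannot resolve instead of silently dropping it.

```python
import re

# Marker prefix -> (source type, field holding the identifier), per the source tables.
MARKER_FIELDS = {
    "PMID": ("pubmed", "pmid"),
    "PMC": ("pmc_full_text", "pmcid"),
    "NCT": ("clinical_trial", "nct_id"),
    "S2": ("semantic_scholar", "paper_id"),
}
MARKER_RE = re.compile(r"\[(PMID|PMC|NCT|S2):([^\]]+)\]")


def resolve_citations(result: dict) -> tuple[dict, list[str]]:
    """Map each inline marker in result['answer'] to its source record.

    Returns (resolved, unresolved): resolved maps 'PREFIX:ID' to the record,
    unresolved lists markers with no matching entry in result['sources'].
    """
    index = {}
    for src in result.get("sources", []):
        for prefix, (src_type, field) in MARKER_FIELDS.items():
            if src.get("type") == src_type and field in src:
                index[f"{prefix}:{src[field]}"] = src

    resolved, unresolved = {}, []
    for prefix, ident in MARKER_RE.findall(result.get("answer", "")):
        key = f"{prefix}:{ident}"
        if key in index:
            resolved[key] = index[key]
        elif key not in unresolved:
            unresolved.append(key)
    return resolved, unresolved
```

Note that the PMC and NCT identifiers keep their own prefixes, so `[PMC:PMC6503629]` resolves to the record whose `pmcid` is `"PMC6503629"`.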

### Asking with Streaming

Get real-time progress as the agent calls tools and writes the answer token-by-token:

```python title="Python"
import json
import requests

response = requests.post(
    f"{BASE_URL}/v2/datasets/scientific/medical/ask",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json={
        "question": "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
        "stream": True
    },
    stream=True,  # Important: enable streaming
    timeout=180,  # allow for a cold start before the first event arrives
)
response.raise_for_status()

# Process streamed response
for line in response.iter_lines():
    if line:
        line_text = line.decode('utf-8')
        if line_text.startswith('data: '):
            data = line_text[6:]
            try:
                parsed = json.loads(data)
                if parsed.get('type') == 'tool.start':
                    print(f"\n[tool] {parsed['name']}: {parsed['args']}")
                elif parsed.get('type') == 'text.delta':
                    print(parsed['data'], end='', flush=True)
                elif parsed.get('type') == 'stream_complete':
                    print("\nStream complete!")
                    break
            except json.JSONDecodeError:
                print(data, end='', flush=True)
```

```typescript title="TypeScript"
const response = await fetch(
  `${BASE_URL}/v2/datasets/scientific/medical/ask`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      question: "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
      stream: true
    })
  }
);

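if (!response.ok || !response.body) {
  throw new Error(`HTTP ${response.status}: ${await response.text()}`);
}

// Process streamed response. A network chunk can end mid-line (or mid-character),
// so decode with { stream: true } and carry the incomplete tail over to the next read.
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let finished = false;

while (!finished) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // incomplete last line (or "") waits for the next chunk

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice(6).trimEnd();
    try {
      const parsed = JSON.parse(data);
      if (parsed.type === "tool.start") {
        console.log(`\n[tool] ${parsed.name}:`, parsed.args);
      } else if (parsed.type === "text.delta") {
        process.stdout.write(parsed.data);
      } else if (parsed.type === "stream_complete") {
        console.log("\nStream complete!");
        finished = true; // also ends the outer read loop
        break;
      }
    } catch {
      process.stdout.write(data);
    }
  }
}

if (finished) await reader.cancel(); // release the connection early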
```

```ruby title="Ruby"
uri = URI("#{BASE_URL}/v2/datasets/scientific/medical/ask")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true

request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer #{API_KEY}"
request["Content-Type"] = "application/json"
request.body = {
  question: "What are the latest published outcomes for CAR-T therapy in multiple myeloma?",
  stream: true
}.to_json

# Process streamed response
http.request(request) do |response|
  response.read_body do |chunk|
    chunk.each_line do |line|
      if line.start_with?("data: ")
        data = line[6..-1].strip
        begin
          parsed = JSON.parse(data)
          case parsed["type"]
          when "tool.start"       then puts "\n[tool] #{parsed['name']}: #{parsed['args']}"
          when "text.delta"       then print parsed["data"]
          when "stream_complete"
            puts "\nStream complete!"
            break
          end
        rescue JSON::ParserError
          print data
        end
      end
    end
  end
end
```

#### Event Types

| Event             | When it fires                                                              | Payload                                                                                   |
| ----------------- | -------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| `tool.start`      | Agent is about to call a source (PubMed, PMC, CT.gov, or Semantic Scholar) | `{type, tool_call_id, name, args}`                                                        |
| `tool.end`        | The tool returned                                                          | `{type, tool_call_id, name, ok, result_summary}`                                          |
| `text.delta`      | Answer tokens as the model writes them                                     | `{type, data}`                                                                            |
| `stream_complete` | Terminal event; stream has ended                                           | `{type, stats, metadata}`. `metadata` carries `tool_calls`, `sources_count`, `latency_ms` |
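All three streaming examples repeat the same loop: buffer raw bytes until a full line arrives, parse each `data:` line as JSON, and stop at `stream_complete`. The transport-agnostic sketch below factors that loop out; `iter_events` is a hypothetical helper, and it assumes, as the examples do, that each event arrives on a single `data:` line.

```python
import json
from typing import Iterable, Iterator


def iter_events(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield parsed SSE `data:` payloads from raw byte chunks, stopping after stream_complete."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Keep the trailing piece: it is an incomplete line until a later chunk ends it.
        *lines, buffer = buffer.split(b"\n")
        for raw in lines:
            # Splitting on the newline byte never cuts a UTF-8 character, so decoding is safe.
            line = raw.decode("utf-8").rstrip("\r")
            if not line.startswith("data:"):
                continue  # blank separators and non-data fields
            payload = line[5:]
            if payload.startswith(" "):
                payload = payload[1:]  # SSE removes exactly one leading space
            try:
                event = json.loads(payload)
            except json.JSONDecodeError:
                continue  # skip payloads that are not valid JSON
            if not isinstance(event, dict):
                continue
            yield event
            if event.get("type") == "stream_complete":
                return
```

With `requests`, you could pass `response.iter_content(chunk_size=None)` from a `stream=True` response as `chunks` and branch on `event["type"]` using the table above.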

## Request Fields

| Field            | Type                  | Default  | Description                                                                         |
| ---------------- | --------------------- | -------- | ----------------------------------------------------------------------------------- |
| `question`       | string (3–2000 chars) | required | Natural-language biomedical question.                                               |
| `max_sources`    | integer (1–25)        | `10`     | Target number of distinct cited sources in the final answer.                        |
| `include_trials` | boolean               | `true`   | Whether the agent may call ClinicalTrials.gov. Set `false` to skip trials entirely. |
| `recency_years`  | integer (1–50)        | `10`     | Prefer evidence from the last N years where the question allows.                    |
| `stream`         | boolean               | `false`  | If `true`, response is `text/event-stream`; otherwise JSON.                         |
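Checking these ranges client-side turns an out-of-range value into an immediate local error instead of a wasted round trip. A minimal sketch follows; `AskRequest` is hypothetical, and it assumes the server measures `question` length in characters the way Python's `len` does.

```python
from dataclasses import asdict, dataclass


@dataclass
class AskRequest:
    """Hypothetical client-side mirror of the documented request fields and ranges."""
    question: str
    max_sources: int = 10
    include_trials: bool = True
    recency_years: int = 10
    stream: bool = False

    def __post_init__(self) -> None:
        if not 3 <= len(self.question) <= 2000:
            raise ValueError("question must be 3-2000 characters")
        if not 1 <= self.max_sources <= 25:
            raise ValueError("max_sources must be between 1 and 25")
        if not 1 <= self.recency_years <= 50:
            raise ValueError("recency_years must be between 1 and 50")

    def to_json_body(self) -> dict:
        """Return the dict to pass as `json=` to the HTTP client."""
        return asdict(self)
```

The defaults match the table, so `AskRequest(question=...).to_json_body()` produces the same body the server would assume if those fields were omitted.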

## Response Fields

### Top-level response

| Field        | Description                                                                                    | Example                                     |
| ------------ | ---------------------------------------------------------------------------------------------- | ------------------------------------------- |
| `domain`     | The scientific domain resolved for the request.                                                | `"medical"`                                 |
| `answer`     | Synthesized answer with inline `[PMID:…]` / `[PMC:…]` / `[NCT:…]` / `[S2:…]` citation markers. | `"Strong evidence supports..."`             |
| `sources`    | Array of typed source records expanded from the citation markers.                              | See below.                                  |
| `gaps`       | What the agent could NOT find or verify.                                                       | `["Long-term (>10 year) survival data..."]` |
| `tool_calls` | Number of tool calls the agent made.                                                           | `5`                                         |
| `latency_ms` | End-to-end wall-clock latency.                                                                 | `7420`                                      |

### PubMed source (`type: "pubmed"`)

| Field     | Description       | Example                                      |
| --------- | ----------------- | -------------------------------------------- |
| `pmid`    | PubMed ID         | `"28578601"`                                 |
| `title`   | Article title     | `"Olaparib for Metastatic Breast Cancer..."` |
| `journal` | Journal name      | `"The New England journal of medicine"`      |
| `year`    | Publication year  | `2017`                                       |
| `doi`     | DOI, if available | `"10.1056/NEJMoa1706450"`                    |

### PMC full-text source (`type: "pmc_full_text"`)

| Field    | Description                                     | Example                                            |
| -------- | ----------------------------------------------- | -------------------------------------------------- |
| `pmcid`  | PMC ID                                          | `"PMC6503629"`                                     |
| `quoted` | Short opening quote from the referenced article | `"PARP inhibitors exploit synthetic lethality..."` |

### Clinical trial source (`type: "clinical_trial"`)

| Field    | Description                   | Example                               |
| -------- | ----------------------------- | ------------------------------------- |
| `nct_id` | ClinicalTrials.gov identifier | `"NCT02000622"`                       |
| `title`  | Trial brief title             | `"Olaparib as Adjuvant Treatment..."` |
| `phase`  | Phase(s), comma-joined        | `"PHASE3"`                            |
| `status` | Overall status                | `"ACTIVE_NOT_RECRUITING"`             |

### Semantic Scholar source (`type: "semantic_scholar"`)

| Field            | Description               | Example                                        |
| ---------------- | ------------------------- | ---------------------------------------------- |
| `paper_id`       | Semantic Scholar paper ID | `"a1b2c3d4..."`                                |
| `title`          | Paper title               | `"Mechanisms of PARP inhibitor resistance..."` |
| `year`           | Publication year          | `2023`                                         |
| `citation_count` | Total citations           | `412`                                          |
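Because every record carries a `type` discriminator, a client can dispatch on it to render a reference list. The display format below is an arbitrary choice for illustration; `format_source` is hypothetical, and it treats `doi` as the only optional field, following the tables above.

```python
def format_source(src: dict) -> str:
    """Render one typed source record as a one-line reference (illustrative format)."""
    kind = src.get("type")
    if kind == "pubmed":
        line = f"{src['title']} {src['journal']} ({src['year']}). PMID {src['pmid']}"
        return line + (f", doi:{src['doi']}" if src.get("doi") else "")
    if kind == "pmc_full_text":
        return f"{src['pmcid']}: \"{src['quoted']}\""
    if kind == "clinical_trial":
        return f"{src['nct_id']}: {src['title']} [{src['phase']}, {src['status']}]"
    if kind == "semantic_scholar":
        return f"{src['title']} ({src['year']}), {src['citation_count']} citations. S2 {src['paper_id']}"
    # Fall through visibly rather than crash if a new source type appears.
    return f"Unrecognized source type: {kind!r}"
```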

## Guarantees and Limits

* **Citations are real.** Every `[PMID:…]`, `[PMC:…]`, `[NCT:…]`, and `[S2:…]` marker in the answer corresponds to a record the agent actually retrieved during the request. The agent is explicitly instructed not to cite from memory.
* **Tool-call budget: 8.** The agent loop hard-caps at 8 calls per request; the system prompt asks the model to stay under 6.
* **No shared corpus yet.** Every request fetches live from the four upstream APIs, so results reflect those sources at query time.
* **Latency.** p50 around 8s; p95 up to 12s when full-text fetches are involved. Set client timeouts ≥ 90s; use 180s to absorb a cold start on the first request of the day.
* **Gaps are load-bearing.** If the agent cannot find evidence at the specificity you asked for, it says so in `gaps[]` rather than hallucinating. Display that in your UI.
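Surfacing `gaps[]` can be as simple as listing it directly under the answer so readers see what the evidence does not cover. A minimal plain-text sketch; `render_for_ui` and its heading text are hypothetical choices, not part of the API.

```python
def render_for_ui(result: dict) -> str:
    """Build a plain-text view that keeps the agent's gaps visible next to the answer."""
    lines = [result.get("answer", "").strip()]
    gaps = result.get("gaps") or []
    if gaps:
        lines += ["", "What the agent could not find or verify:"]
        lines += [f"- {gap}" for gap in gaps]
    return "\n".join(lines)
```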

## Deprecated URL Aliases

`/v2/datasets/pubmed/*` and `/v2/datasets/medical/*` return HTTP 400 with a pointer to `/v2/datasets/scientific/medical/ask`. Update your clients to use the new URL.