> For a complete page index of the Captain API documentation, fetch https://docs.runcaptain.com/llms.txt?excludeSpec=true

# Metadata Filtering

> Attach custom metadata to indexed files and filter search results by metadata fields. Supports equality, comparison, set membership, and logical operators.

## Agent Quick Reference - Metadata Filtering

* **Attach metadata at index time**: Pass `custom_metadata` (object) on any indexing endpoint. Keys are strings; values are string, int, float, bool, or string array.
* **Filter at query time**: Pass `metadata_filter` (object) on `POST /v2/collections/{name}/query`.
* **Operators**: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`. Bare values default to `$eq`.
* **Logical**: Multiple top-level keys are ANDed. Use `$or` / `$and` for explicit logic.
* **Works with**: All file types (text, PDF, DOCX, images, video, audio, JSON, spreadsheets). Works with both `inference: true` and `inference: false` queries.

Metadata filtering lets you attach structured key-value pairs to your files at index time, then narrow search results at query time using those fields. This is useful for scoping searches by department, date range, access level, content type, or any other dimension relevant to your data.

## Attaching Metadata at Index Time

Pass `custom_metadata` on any indexing endpoint. Every chunk from that file inherits the metadata.

### Plain text

```python
import requests

BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"  # replace with your Captain API key
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

response = requests.post(
    f"{BASE_URL}/v2/collections/my_collection/index/text",
    headers=headers,
    json={
        "content": "Q4 2024 revenue grew 23% year-over-year to $4.2B...",
        "filename": "q4-2024-earnings.txt",
        "custom_metadata": {
            "department": "finance",
            "year": 2024,
            "quarter": "Q4",
            "is_public": False
        }
    }
)
```

```typescript
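const BASE_URL = "https://api.runcaptain.com";
const API_KEY = "your_api_key"; // replace with your Captain API key

const response = await fetch(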
  `${BASE_URL}/v2/collections/my_collection/index/text`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      content: "Q4 2024 revenue grew 23% year-over-year to $4.2B...",
      filename: "q4-2024-earnings.txt",
      custom_metadata: {
        department: "finance",
        year: 2024,
        quarter: "Q4",
        is_public: false
      }
    })
  }
);
```
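
Because `custom_metadata` is set per request, a script that indexes many text files usually derives the fields from something it already knows about each file. The sketch below assumes a hypothetical naming convention (`q<quarter>-<year>-<description>.<ext>`, matching `q4-2024-earnings.txt` above). `metadata_from_filename` is an illustrative helper, not part of the Captain API; its return value would go in the `custom_metadata` field of the request body.

```python
import re
from typing import Any

# Hypothetical naming convention: "q<quarter>-<year>-<description>.<ext>",
# e.g. "q4-2024-earnings.txt".
_PATTERN = re.compile(r"^q([1-4])-(\d{4})-", re.IGNORECASE)


def metadata_from_filename(filename: str, **extra: Any) -> dict[str, Any]:
    """Derive custom_metadata fields from a filename, merged with fixed extras."""
    metadata: dict[str, Any] = dict(extra)
    match = _PATTERN.match(filename)
    if match:
        metadata["quarter"] = f"Q{match.group(1)}"
        # Stored as an int so it lines up with numeric filters like {"$gte": 2024}.
        metadata["year"] = int(match.group(2))
    return metadata


print(metadata_from_filename("q4-2024-earnings.txt", department="finance", is_public=False))
# {'department': 'finance', 'is_public': False, 'quarter': 'Q4', 'year': 2024}
```

Files that don't follow the convention simply get the fixed extras, so the helper never invents a `year` or `quarter` it can't read from the name.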

### Cloud storage (S3, GCS, Azure, R2)

Metadata applies to all files in the indexing job.

```python
response = requests.post(
    f"{BASE_URL}/v2/collections/my_collection/index/s3",
    headers=headers,
    json={
        "bucket_name": "company-docs",
        "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "bucket_region": "us-east-1",
        "processing_type": "advanced",
        "custom_metadata": {
            "source": "s3",
            "department": "engineering",
            "confidentiality": "internal"
        }
    }
)
```

### Supported value types

| Type         | Example                             |
| ------------ | ----------------------------------- |
| String       | `"department": "legal"`             |
| Integer      | `"year": 2024`                      |
| Float        | `"confidence": 0.95`                |
| Boolean      | `"is_public": true`                 |
| String array | `"tags": ["earnings", "quarterly"]` |
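
Catching an unsupported value before the request is sent can save a failed indexing call. Below is a minimal client-side check against the table above, written as a sketch: `validate_custom_metadata` is a hypothetical helper and the server remains the authority. This page doesn't say whether the service accepts empty arrays, `null`, or nested objects; the sketch assumes empty arrays are fine and rejects the other two.

```python
from typing import Any

# Scalar types from the "Supported value types" table (bool is listed explicitly
# even though Python's bool is already a subclass of int).
_SCALARS = (str, int, float, bool)


def validate_custom_metadata(metadata: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the object looks valid."""
    problems: list[str] = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        elif isinstance(value, list):
            # Arrays are only documented for strings.
            if not all(isinstance(item, str) for item in value):
                problems.append(f"{key}: arrays must contain only strings")
        elif not isinstance(value, _SCALARS):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    return problems


print(validate_custom_metadata({"department": "legal", "year": 2024, "tags": ["earnings"]}))
# []
print(validate_custom_metadata({"owner": None, "scores": [1, 2]}))
# ['owner: unsupported type NoneType', 'scores: arrays must contain only strings']
```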

## Filtering at Query Time

Pass `metadata_filter` in the query request body to restrict results to chunks matching your criteria.

```python
response = requests.post(
    f"{BASE_URL}/v2/collections/my_collection/query",
    headers=headers,
    json={
        "query": "What were the key revenue drivers?",
        "inference": False,
        "top_k": 10,
        "metadata_filter": {
            "department": "finance",
            "year": {"$gte": 2024}
        }
    }
)
```

```typescript
const response = await fetch(
  `${BASE_URL}/v2/collections/my_collection/query`,
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      query: "What were the key revenue drivers?",
      inference: false,
      top_k: 10,
      metadata_filter: {
        department: "finance",
        year: { $gte: 2024 }
      }
    })
  }
);
```

Filters work with both `inference: true` (AI-powered answers) and `inference: false` (raw search results).

## Filter Operators

| Operator       | Description           | Example                                         |
| -------------- | --------------------- | ----------------------------------------------- |
| *(bare value)* | Equals                | `{"department": "legal"}`                       |
| `$eq`          | Equals (explicit)     | `{"department": {"$eq": "legal"}}`              |
| `$ne`          | Not equals            | `{"department": {"$ne": "hr"}}`                 |
| `$gt`          | Greater than          | `{"year": {"$gt": 2023}}`                       |
| `$gte`         | Greater than or equal | `{"year": {"$gte": 2024}}`                      |
| `$lt`          | Less than             | `{"year": {"$lt": 2025}}`                       |
| `$lte`         | Less than or equal    | `{"year": {"$lte": 2024}}`                      |
| `$in`          | In set                | `{"department": {"$in": ["legal", "finance"]}}` |
| `$nin`         | Not in set            | `{"department": {"$nin": ["hr", "ops"]}}`       |
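
To reason about what a filter will match, or to unit-test filter-building code without calling the API, a small local evaluator helps. This is a sketch of the operator semantics in the table, not Captain's implementation. `matches` is a hypothetical helper, and it rests on several assumptions this page doesn't settle: a missing field never matches (even for `$ne` and `$nin`), values of mismatched types are treated as non-matching, several operators in one object (e.g. `{"$gte": 2023, "$lte": 2024}`) must all hold, `$and` takes a list of sub-filters like `$or`, and string-array fields are compared as whole values because the page doesn't specify how scalar operators apply to arrays.

```python
from typing import Any, Callable

# Field operators from the table above, as Python predicates.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, target: value == target,
    "$ne": lambda value, target: value != target,
    "$gt": lambda value, target: value > target,
    "$gte": lambda value, target: value >= target,
    "$lt": lambda value, target: value < target,
    "$lte": lambda value, target: value <= target,
    "$in": lambda value, target: value in target,
    "$nin": lambda value, target: value not in target,
}


def _field_matches(metadata: dict[str, Any], field: str, condition: Any) -> bool:
    if field not in metadata:
        return False  # assumption: a missing field never matches, even for $ne/$nin
    value = metadata[field]
    if not isinstance(condition, dict):
        return value == condition  # bare value is shorthand for $eq
    for op, target in condition.items():  # every operator in the object must hold
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        try:
            if not OPERATORS[op](value, target):
                return False
        except TypeError:
            return False  # e.g. "2024" > 2023: treated as no match
    return True


def matches(metadata: dict[str, Any], metadata_filter: dict[str, Any]) -> bool:
    """Return True if a chunk's metadata satisfies the filter (local sketch)."""
    for key, condition in metadata_filter.items():
        if key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        elif key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        else:
            ok = _field_matches(metadata, key, condition)
        if not ok:
            return False  # top-level keys are ANDed
    return True


chunk = {"department": "finance", "year": 2025, "is_public": False}
print(matches(chunk, {"department": "finance", "year": {"$gte": 2024}}))  # True
print(matches(chunk, {"$or": [{"department": "legal"}, {"year": {"$lt": 2025}}]}))  # False
```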

## Combining Filters

### Implicit AND

Multiple fields at the top level are combined with AND:

```json
{
  "metadata_filter": {
    "department": "finance",
    "year": {"$gte": 2024},
    "is_public": false
  }
}
```

This matches chunks where `department` is "finance" **AND** `year` is at least 2024 **AND** `is_public` is false.
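
Implicit AND makes it easy to assemble a filter from optional search criteria such as UI facets: add a key only when that criterion is set. In this sketch, `build_filter` and its parameters are hypothetical. Because this page doesn't say how the API treats an empty `metadata_filter` object, the helper returns `None` when no criteria are set so the caller can leave the field out of the request entirely.

```python
from typing import Any, Optional


def build_filter(
    department: Optional[str] = None,
    min_year: Optional[int] = None,
    is_public: Optional[bool] = None,
) -> Optional[dict[str, Any]]:
    """Combine whichever criteria are set into one implicit-AND filter."""
    metadata_filter: dict[str, Any] = {}
    if department is not None:
        metadata_filter["department"] = department  # bare value means $eq
    if min_year is not None:
        metadata_filter["year"] = {"$gte": min_year}
    if is_public is not None:
        metadata_filter["is_public"] = is_public
    # No criteria set: return None so the caller can omit metadata_filter.
    return metadata_filter or None


body: dict[str, Any] = {"query": "What were the key revenue drivers?", "inference": False, "top_k": 10}
flt = build_filter(department="finance", min_year=2024)
if flt is not None:
    body["metadata_filter"] = flt
print(body["metadata_filter"])  # {'department': 'finance', 'year': {'$gte': 2024}}
```

The `is not None` checks matter: `is_public=False` is a real criterion, and a plain truthiness test would silently drop it.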

### Explicit OR

Use `$or` to match chunks that satisfy any of the conditions:

```json
{
  "metadata_filter": {
    "$or": [
      {"department": "legal"},
      {"department": "finance"}
    ]
  }
}
```
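
When every branch tests the same field, as above, `{"department": {"$in": ["legal", "finance"]}}` should select the same chunks more compactly; `$or` is most useful when branches test different fields. A small sketch of a builder for that case, where `any_of` is a hypothetical helper and keyword arguments force each branch to name a distinct field:

```python
from typing import Any


def any_of(**conditions: Any) -> dict[str, Any]:
    """Build an $or filter with one branch per keyword argument (one field each)."""
    if not conditions:
        raise ValueError("$or needs at least one branch")
    return {"$or": [{field: condition} for field, condition in conditions.items()]}


# Chunks that are public, OR that come from the legal or finance department.
print(any_of(is_public=True, department={"$in": ["legal", "finance"]}))
# {'$or': [{'is_public': True}, {'department': {'$in': ['legal', 'finance']}}]}
```

The keyword-argument form can't express two branches on the same field, such as `year` below 2020 or above 2024; write that `$or` list by hand.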

### Mixing AND and OR

Combine top-level AND with nested OR:

```json
{
  "metadata_filter": {
    "year": {"$gte": 2024},
    "$or": [
      {"department": "legal"},
      {"department": "finance"}
    ]
  }
}
```

This matches chunks from 2024 or later in either the legal or finance department.
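
A filter object can't reliably hold two `$or` keys (most JSON parsers keep only the last duplicate), so two independent OR groups, for example a department group and a visibility group, cannot both sit at the top level. Merging two Python dicts that each contain `$or` has the same problem: the second silently replaces the first. The quick reference lists `$and` for explicit logic like this. The sketch below assumes `$and` takes a list of sub-filters in the same shape as `$or`, which this page doesn't show directly; `all_of` is a hypothetical helper.

```python
from typing import Any, Optional

departments = {"$or": [{"department": "legal"}, {"department": "finance"}]}
visibility = {"$or": [{"is_public": True}, {"confidentiality": "internal"}]}

# Pitfall: a dict holds one "$or" key, so this merge silently drops `departments`.
naive = {**departments, **visibility}
print(naive == visibility)  # True


def all_of(*filters: Optional[dict[str, Any]]) -> dict[str, Any]:
    """AND filters together under an explicit $and (assumed to take a list)."""
    parts = [f for f in filters if f]  # skip None and empty filters
    if not parts:
        raise ValueError("at least one non-empty filter is required")
    return parts[0] if len(parts) == 1 else {"$and": parts}


metadata_filter = all_of({"year": {"$gte": 2024}}, departments, visibility)
print(metadata_filter)
```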

## Full Example

Index documents with metadata, then query with filters:

```python
import requests

BASE_URL = "https://api.runcaptain.com"
API_KEY = "your_api_key"
COLLECTION = "company_docs"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# 1. Index with metadata
requests.post(
    f"{BASE_URL}/v2/collections/{COLLECTION}/index/text",
    headers=headers,
    json={
        "content": "The board approved a 15% increase in R&D spending for fiscal year 2025...",
        "filename": "board-minutes-2025.txt",
        "custom_metadata": {
            "department": "executive",
            "year": 2025,
            "document_type": "minutes",
            "is_public": False
        }
    }
)

# 2. Query with filters
response = requests.post(
    f"{BASE_URL}/v2/collections/{COLLECTION}/query",
    headers=headers,
    json={
        "query": "R&D budget decisions",
        "inference": False,
        "top_k": 5,
        "metadata_filter": {
            "department": {"$in": ["executive", "finance"]},
            "year": {"$gte": 2024}
        }
    }
)

response.raise_for_status()  # surface HTTP errors instead of silently printing nothing
for result in response.json().get("search_results", []):
    print(f"  {result['filename']} (score: {result['score']:.3f})")
```