Parsers (BETA)
What are Parsers?
Parsers let you tell Captain how to read your JSON files. Instead of indexing raw JSON as text, you write a short JavaScript function that extracts the content you actually want to search over.
Without a parser, Captain indexes your JSON as-is:
With a parser, Captain indexes clean, searchable text:
Same data. Much better search results.
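As a concrete illustration of the difference, the sketch below uses a hypothetical help-center record (the field names `id`, `title`, `body`, and `updated_at` are assumptions, not a Captain schema). `rawText` approximates what is indexed without a parser; `parsedText` is what a parser could return instead.

```javascript
// Hypothetical record; field names and values are illustrative only.
var article = {
  id: "kb-0042",
  title: "Resetting your password",
  body: "Open Settings, choose Security, then select Reset password.",
  updated_at: "2024-01-15T09:30:00Z"
};

// Roughly what gets indexed without a parser: the JSON text itself,
// including braces, quotes, and system fields.
var rawText = JSON.stringify(article);

// What a parser could return instead: only the human-readable content.
function parse(doc) {
  return "# " + doc.title + "\n\n" + doc.body;
}

var parsedText = parse(article);

console.log(rawText);
console.log(parsedText);
```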
How It Works
Other file types (PDF, DOCX, images, etc.) are completely unaffected. The parser only applies to .json files.
Writing Your First Parser
A parsing script is a JavaScript function that receives your JSON object and returns a string:
That’s it. doc is your parsed JSON. Return the string you want Captain to index.
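A minimal sketch of such a function, assuming a hypothetical document with `title` and `body` fields. The validation section says a script must export a default function; the exact export syntax is an assumption here, so it appears only as a comment.

```javascript
// Minimal parser sketch. `title` and `body` are hypothetical field names.
function parse(doc) {
  // `doc` is the already-parsed JSON object, not a JSON string.
  return doc.title + "\n\n" + doc.body;
}

// In an uploaded script this function would be the default export
// (assumed form: `export default parse;`).

console.log(parse({ title: "Release notes", body: "Fixed login timeout." }));
```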
The text field
The text you return becomes the searchable content. You can format it however you want. Markdown works great because it preserves structure:
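For instance, a parser might build Markdown headings and a bullet list line by line. This is a sketch only; `name`, `summary`, and `tags` are hypothetical fields.

```javascript
// Sketch: format a hypothetical record as Markdown.
function parse(doc) {
  var lines = [];
  lines.push("# " + doc.name);      // top-level heading
  lines.push("");
  lines.push("## Summary");
  lines.push(doc.summary);
  lines.push("");
  lines.push("## Tags");
  for (var i = 0; i < doc.tags.length; i++) {
    lines.push("- " + doc.tags[i]); // one bullet per tag
  }
  return lines.join("\n");
}

console.log(parse({
  name: "Onboarding guide",
  summary: "Steps for new team members.",
  tags: ["hr", "getting-started"]
}));
```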
What to return
Your function must return a string. That string becomes the searchable content.
What does NOT work:
Examples
Research Papers
Input JSON:
Parsing script:
Indexed text:
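As a stand-in illustration of the three pieces above (input, script, indexed text), here is a sketch that assumes a hypothetical paper record with `title`, `authors`, `year`, `abstract`, and `keywords` fields. The sample values are made up; running it prints the text that would be indexed.

```javascript
// Hypothetical input JSON, already parsed into an object.
var paper = {
  title: "Sample Study of Widget Durability",
  authors: [{ name: "A. Author" }, { name: "B. Writer" }],
  year: 2024,
  abstract: "We measure how long widgets last under load.",
  keywords: ["widgets", "durability"]
};

// Parsing script sketch: title first, then metadata, then the abstract.
function parse(doc) {
  var authorNames = [];
  for (var i = 0; i < doc.authors.length; i++) {
    authorNames.push(doc.authors[i].name);
  }
  return [
    "# " + doc.title,
    "",
    "**Authors:** " + authorNames.join(", "),
    "**Year:** " + doc.year,
    "**Keywords:** " + doc.keywords.join(", "),
    "",
    "## Abstract",
    doc.abstract
  ].join("\n");
}

console.log(parse(paper));
```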
Product Catalog
Input JSON:
Parsing script:
Indexed text:
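A comparable sketch for a catalog entry, assuming a hypothetical product shape with a nested `price` object and a free-form `specs` map. None of these field names come from Captain; adapt them to your own catalog.

```javascript
// Hypothetical input JSON, already parsed into an object.
var product = {
  sku: "WID-100",
  name: "Standard Widget",
  price: { amount: 19.5, currency: "USD" },
  description: "A general-purpose widget.",
  specs: { weight: "120 g", color: "blue" }
};

// Parsing script sketch: name and description first, then price and specs.
function parse(doc) {
  var lines = ["# " + doc.name, "", doc.description, ""];
  lines.push("**SKU:** " + doc.sku);
  lines.push("**Price:** " + doc.price.amount.toFixed(2) + " " + doc.price.currency);
  lines.push("");
  lines.push("## Specifications");
  var keys = Object.keys(doc.specs);
  for (var i = 0; i < keys.length; i++) {
    lines.push("- " + keys[i] + ": " + doc.specs[keys[i]]);
  }
  return lines.join("\n");
}

console.log(parse(product));
```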
Healthcare Records (FHIR)
Input JSON:
Parsing script:
Indexed text:
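For FHIR data, a parser can branch on `resourceType`. The sketch below handles only Patient and Condition and falls back to the resource type name for anything else. Field names follow the FHIR R4 Patient and Condition resources; the sample values are synthetic, and a real parser would likely cover more resource types.

```javascript
// Join the parts of a FHIR HumanName into a display string.
function humanName(name) {
  var given = name.given ? name.given.join(" ") : "";
  var family = name.family || "";
  return (given + " " + family).trim();
}

function parse(doc) {
  if (doc.resourceType === "Patient") {
    var name = doc.name && doc.name.length ? humanName(doc.name[0]) : "Unknown";
    var lines = ["# Patient: " + name];
    if (doc.gender) lines.push("**Gender:** " + doc.gender);
    if (doc.birthDate) lines.push("**Birth date:** " + doc.birthDate);
    return lines.join("\n");
  }
  if (doc.resourceType === "Condition") {
    var label = doc.code && doc.code.text ? doc.code.text : "Unspecified condition";
    var out = ["# Condition: " + label];
    if (doc.onsetDateTime) out.push("**Onset:** " + doc.onsetDateTime);
    return out.join("\n");
  }
  // Fallback for resource types this sketch does not handle.
  return "# " + (doc.resourceType || "Unknown resource");
}

console.log(parse({
  resourceType: "Patient",
  name: [{ family: "Example", given: ["Jane", "Q"] }],
  gender: "female",
  birthDate: "1980-01-01"
}));
```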
Generic Key-Value Flattener
Don’t know your schema yet? This script handles any JSON by flattening all fields into readable text:
This works for any JSON structure. Good for prototyping before you write a specialized parser.
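One way to write such a flattener is a short recursive walk that emits one `path: value` line per leaf. This is a sketch of the idea, not necessarily the script this section refers to; the path notation (`a.b`, `items[0]`) is a choice made here.

```javascript
// Recursively collect "path: value" lines for every leaf value.
function flatten(value, prefix, lines) {
  if (value === null || value === undefined) {
    return; // skip empty values
  }
  if (Array.isArray(value)) {
    for (var i = 0; i < value.length; i++) {
      flatten(value[i], prefix + "[" + i + "]", lines);
    }
  } else if (typeof value === "object") {
    var keys = Object.keys(value);
    for (var k = 0; k < keys.length; k++) {
      var path = prefix ? prefix + "." + keys[k] : keys[k];
      flatten(value[keys[k]], path, lines);
    }
  } else {
    // Strings, numbers, and booleans become one line each.
    lines.push(prefix + ": " + String(value));
  }
}

function parse(doc) {
  var lines = [];
  flatten(doc, "", lines);
  return lines.join("\n");
}

console.log(parse({ name: "Widget", dims: { w: 2, h: 3 }, tags: ["a", "b"] }));
```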
Validating Scripts Programmatically
Before uploading a script, you can validate it via the API to catch syntax errors and structural problems. Captain loads your script in the same sandboxed V8 engine that json_handler uses at indexing time, so a script that passes validation will also load when you index files.
Endpoint: POST /v2/parsing-scripts/validate
Content type: multipart/form-data. Upload your .js file under the file field.
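A sketch of building that request from a client (Node 18+ or a browser, where `FormData`, `Blob`, and `fetch` are built in). The base URL, the Bearer authorization header, and the `parser.js` filename are assumptions; only the path and the `file` field come from the text above.

```javascript
// Build the multipart request for the validate endpoint without sending it.
// `baseUrl` and the Authorization header are assumptions; use whatever
// host and auth scheme your Captain account actually requires.
function buildValidateRequest(baseUrl, apiKey, scriptSource) {
  var form = new FormData();
  // The script goes under the `file` field.
  form.append(
    "file",
    new Blob([scriptSource], { type: "text/javascript" }),
    "parser.js" // hypothetical filename
  );
  return {
    url: baseUrl + "/v2/parsing-scripts/validate",
    options: {
      method: "POST",
      headers: { Authorization: "Bearer " + apiKey }, // assumed auth scheme
      body: form // fetch sets the multipart boundary automatically
    }
  };
}

// Placeholder script source and host, for illustration only.
var req = buildValidateRequest(
  "https://api.example.com",
  "YOUR_API_KEY",
  "/* your parsing script source */"
);

// To send it:
// fetch(req.url, req.options).then(function (res) { return res.json(); });
console.log(req.url);
```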
Validation does NOT run your script against real data. It only confirms:
- The code is syntactically valid JavaScript
- It exports a default function
The actual execution against your JSON files happens at indexing time in json_handler, which enforces the return-type contract (must be a string) on real data.
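Because the string contract is only checked against real data at indexing time, it can help to run a parser over a few sample documents locally first. The helper below is a hypothetical pre-flight check, not part of Captain: it reports every sample for which the parser throws or returns something other than a string.

```javascript
// Local pre-flight check (not a Captain API): returns a list of problems,
// one per sample that throws or yields a non-string result.
function checkParser(parse, samples) {
  var problems = [];
  for (var i = 0; i < samples.length; i++) {
    try {
      var result = parse(samples[i]);
      if (typeof result !== "string") {
        problems.push("sample " + i + ": returned " + typeof result + ", expected string");
      }
    } catch (err) {
      problems.push("sample " + i + ": threw " + err.message);
    }
  }
  return problems;
}

// A parser that forgets to guard a missing field.
function fragileParser(doc) {
  return doc.title;
}

console.log(checkParser(fragileParser, [{ title: "ok" }, {}]));
```

An empty result means every sample produced a string; it does not guarantee the output is useful for search.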
Valid response:
Invalid response:
Error types:
The endpoint returns HTTP 200 for both valid and invalid scripts. A response with valid: false is a normal result, not an error. HTTP 4xx/5xx codes are reserved for auth failures and malformed requests.
Use this endpoint in CI/CD pipelines to catch bad scripts before they get uploaded, or in your own tools before calling the S3 upload proxy.
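In a pipeline, the status code and the `valid` flag need to be checked separately, since an invalid script still comes back as HTTP 200. A sketch of that decision logic follows; only the `valid` field is taken from the text above, and the rest of the response body is passed through untouched rather than assumed.

```javascript
// Decide whether a CI step should pass, given the HTTP status and the
// parsed response body from the validate endpoint.
function gate(status, body) {
  if (status !== 200) {
    // 4xx/5xx: auth failure or malformed request, not a script verdict.
    return { ok: false, reason: "request failed with HTTP " + status, details: body };
  }
  if (body && body.valid === true) {
    return { ok: true, reason: "script is valid", details: body };
  }
  // HTTP 200 with valid: false is a normal "invalid script" result.
  return { ok: false, reason: "script is invalid", details: body };
}

console.log(gate(200, { valid: false }).reason);
```

A CI script could exit non-zero whenever `ok` is false and print `details` so the full response is visible in the build log.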
Using Parsers with the API
Add the parsing_script parameter to any indexing endpoint:
The parsing_script parameter is the relative path to your script. Captain resolves it to your org’s script storage automatically:
You can include or omit the .js extension. Both work.
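The sketch below shows, from the client side, what "both work" means and how the parameter might be attached to a request body. It is an illustration only: Captain performs the real path resolution, the script path `parsers/papers` is hypothetical, and every payload field other than `parsing_script` is a placeholder.

```javascript
// Illustration of the equivalence: both spellings name the same script.
// This is not Captain's resolution logic.
function normalizeScriptPath(path) {
  return /\.js$/.test(path) ? path : path + ".js";
}

// Copy a request payload and add the parsing_script parameter.
function withParsingScript(payload, scriptPath) {
  var out = {};
  var keys = Object.keys(payload);
  for (var i = 0; i < keys.length; i++) {
    out[keys[i]] = payload[keys[i]];
  }
  out.parsing_script = scriptPath;
  return out;
}

// `collection` is a placeholder field, not a documented parameter.
var body = withParsingScript({ collection: "research" }, "parsers/papers");

console.log(normalizeScriptPath("parsers/papers") === normalizeScriptPath("parsers/papers.js"));
console.log(JSON.stringify(body));
```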
Without parsing_script
If you don’t include parsing_script, JSON files are indexed as raw text. This is the existing behavior and it’s unchanged. Parsers are opt-in.
Supported Endpoints
parsing_script works on all indexing endpoints:
Tips for Writing Good Parsers
1. Return markdown, not plain text. Headers, bold, and lists help Captain understand document structure and improve search relevance.
2. Put the most important content first. Title, key identifiers, and summary should come before detailed content. Search results show previews from the beginning of each chunk.
3. Skip noise. Internal IDs, timestamps, and system fields don’t help search. Only extract what a human would search for.
4. Handle missing fields. Not every document has every field. Use guards:
5. Use var, not let/const. The V8 sandbox uses a JavaScript version that works best with var declarations. Avoid optional chaining (?.) and use explicit null checks instead.
6. Test in the Parser Studio. The browser-based test runner gives instant feedback. Write your script, paste sample JSON, click Run, and see the output immediately.
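Tips 4 and 5 can be combined in one short sketch: `var` declarations, explicit null checks in place of optional chaining, and a fallback for each optional field. The field names (`title`, `author`, `tags`, `body`) are hypothetical.

```javascript
function parse(doc) {
  var lines = [];

  // Fall back to a default when the field is missing or empty.
  var title = doc.title ? doc.title : "Untitled";
  lines.push("# " + title);

  // Explicit null check instead of doc.author?.name
  if (doc.author && doc.author.name) {
    lines.push("**Author:** " + doc.author.name);
  }

  // Only emit the tags line when there is at least one tag.
  if (Array.isArray(doc.tags) && doc.tags.length > 0) {
    lines.push("**Tags:** " + doc.tags.join(", "));
  }

  if (typeof doc.body === "string" && doc.body !== "") {
    lines.push("");
    lines.push(doc.body);
  }

  return lines.join("\n");
}

console.log(parse({}));
console.log(parse({ title: "T", author: { name: "N" }, tags: ["a"], body: "B" }));
```

Even an empty document yields a string here, so a sparse record does not fail the file.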
Limits
Error Handling
If your parsing script fails, that specific file is marked as failed in the indexing job. Other files in the same batch continue processing normally.
Common errors:
Check the job status for per-file error details:
FAQ
Can I use the same parser across multiple collections?
Yes. Parsers are org-scoped, not collection-scoped. Any indexing job in your org can reference any parser.
What happens to non-JSON files when I set parsing_script?
Nothing. PDFs, DOCX, images, and other files are processed normally. The parser only affects .json files.
Can I use TypeScript?
Not yet. Write plain JavaScript. TypeScript support would require shipping a compiler to the sandbox.
Can my script make API calls or read files?
No. The sandbox is completely isolated. No network access, no filesystem, no require/import. Your script receives the JSON object and returns text. That’s it.
Can I use async/await?
No. The sandbox runs synchronous JavaScript only. No Promises, no setTimeout, no async patterns.
How do I update a parser?
Upload a new version with the same path. The next indexing job that references it will use the updated script. There’s no versioning. The latest upload is what runs.