Parsers (BETA)
What are Parsers?
Parsers let you tell Captain how to read your JSON files. Instead of indexing raw JSON as text, you write a short JavaScript function that extracts the content you actually want to search over.
Without a parser, Captain indexes your JSON as-is:
With a parser, Captain indexes clean, searchable text:
Same data. Much better search results.
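To make the contrast concrete, here is a sketch using an invented product record (every field name here is an assumption, not a real Captain schema):

```javascript
// An invented example record -- field names are hypothetical.
var doc = {
  id: "prod_8812",
  attrs: { name: "Trail Runner 2", price_cents: 12999 },
  meta: { created_at: "2024-01-05T09:12:00Z" }
};

// Without a parser, the raw JSON string is what gets indexed:
var rawIndexed = JSON.stringify(doc);
// -> a wall of braces, quotes, and system fields

// With a parser, only the human-searchable content is indexed:
var parsedIndexed =
  "# " + doc.attrs.name +
  "\n\nPrice: $" + (doc.attrs.price_cents / 100).toFixed(2);
```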
How It Works
Other file types (PDF, DOCX, images, etc.) are completely unaffected. The parser only applies to .json files.
Writing Your First Parser
A parsing script is a JavaScript function that receives your parsed JSON object and returns an object with a text field:
That’s it. doc is your parsed JSON. Return an object with a text field containing the string you want Captain to index.
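A minimal sketch, assuming a document with title and body fields (both field names are invented for illustration):

```javascript
// Minimal parsing script sketch. `doc` is the parsed JSON object
// Captain passes in; `title` and `body` are hypothetical fields.
function parse(doc) {
  var text = "# " + doc.title + "\n\n" + doc.body;
  return { text: text };
}
```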
The text field
The text you return becomes the searchable content. You can format it however you want. Markdown works great because it preserves structure:
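For example, a parser might build a small markdown document from a few fields (the schema below is invented; note the var-only, explicit-check style the sandbox expects):

```javascript
// Formatting extracted fields as markdown. The `title`,
// `summary`, and `tags` fields are assumptions for illustration.
function parse(doc) {
  var lines = [];
  lines.push("# " + doc.title);
  if (doc.summary) {
    lines.push("**Summary:** " + doc.summary);
  }
  var i;
  for (i = 0; i < doc.tags.length; i++) {
    lines.push("- " + doc.tags[i]);
  }
  return { text: lines.join("\n\n") };
}
```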
What to return
Your function must return an object whose text field is a string. That string becomes the searchable content.
What does NOT work:
Examples
Research Papers
Input JSON:
Parsing script:
Indexed text:
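As an illustrative sketch with invented fields, a research-paper parser might look like this, with the input and output shown inline:

```javascript
// Hypothetical research-paper JSON -- field names are assumptions.
var paper = {
  title: "Sparse Retrieval at Scale",
  authors: ["A. Rivera", "B. Chen"],
  year: 2023,
  abstract: "We study retrieval over large sparse indexes."
};

function parse(doc) {
  var text =
    "# " + doc.title + "\n\n" +
    "Authors: " + doc.authors.join(", ") + " (" + doc.year + ")\n\n" +
    "## Abstract\n\n" + doc.abstract;
  return { text: text };
}
// parse(paper).text starts with "# Sparse Retrieval at Scale"
```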
Product Catalog
Input JSON:
Parsing script:
Indexed text:
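A product-catalog parser might look like the following sketch (sku, name, price, and categories are invented fields; note that the internal SKU is deliberately skipped in the output):

```javascript
// Hypothetical product record -- field names are assumptions.
var product = {
  sku: "SKU-4417",
  name: "Thermal Mug",
  description: "Keeps drinks hot for 6 hours.",
  price: 24.5,
  categories: ["kitchen", "travel"]
};

function parse(doc) {
  var text =
    "# " + doc.name + "\n\n" +
    doc.description + "\n\n" +
    "Price: $" + doc.price.toFixed(2) + "\n" +
    "Categories: " + doc.categories.join(", ");
  return { text: text };
}
```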
Healthcare Records (FHIR)
Input JSON:
Parsing script:
Indexed text:
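For FHIR data, a sketch over a trimmed Patient resource might look like this (the record below is a small illustrative subset of a real FHIR Patient; guards handle missing fields):

```javascript
// Trimmed FHIR Patient resource -- illustrative subset only.
var record = {
  resourceType: "Patient",
  name: [{ given: ["Maria"], family: "Lopez" }],
  birthDate: "1984-07-02",
  gender: "female"
};

function parse(doc) {
  var n = doc.name && doc.name[0];
  var fullName = n ? n.given.join(" ") + " " + n.family : "Unknown";
  var text = "# Patient: " + fullName;
  if (doc.birthDate) {
    text += "\n\nBirth date: " + doc.birthDate;
  }
  if (doc.gender) {
    text += "\nGender: " + doc.gender;
  }
  return { text: text };
}
```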
Generic Key-Value Flattener
Don’t know your schema yet? This script handles any JSON by flattening all fields into readable text:
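One way to write such a flattener, sketched in the sandbox's var-only style, is a recursive walk that emits one "path: value" line per leaf:

```javascript
// Recursively walk any JSON value, collecting "path: value" lines.
function flatten(value, path, out) {
  if (value === null || typeof value !== "object") {
    out.push(path + ": " + String(value));
    return;
  }
  var key;
  for (key in value) {
    if (Object.prototype.hasOwnProperty.call(value, key)) {
      var next = path ? path + "." + key : key;
      flatten(value[key], next, out);
    }
  }
}

function parse(doc) {
  var lines = [];
  flatten(doc, "", lines);
  return { text: lines.join("\n") };
}
```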
This works for any JSON structure. Good for prototyping before you write a specialized parser.
Using Parsers with the API
Add the parsing_script parameter to any indexing endpoint:
The parsing_script parameter is the relative path to your script. Captain resolves it to your org’s script storage automatically:
You can include or omit the .js extension. Both work.
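As a sketch, a request body might look like the following. Only the parsing_script field comes from this doc; the other field names and the endpoint in the comment are assumptions:

```javascript
// Hypothetical indexing request body. `parsing_script` is the
// documented parameter; `collection` and `files` are invented here.
var body = {
  collection: "papers",
  files: ["uploads/paper-001.json"],
  parsing_script: "parsers/research_paper" // ".js" suffix optional
};

// e.g. (endpoint URL is a placeholder):
// fetch("https://api.example.com/v1/index", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(body)
// });
```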
Without parsing_script
If you don’t include parsing_script, JSON files are indexed as raw text. This is the existing behavior and it’s unchanged. Parsers are opt-in.
Supported Endpoints
parsing_script works on all indexing endpoints:
Tips for Writing Good Parsers
1. Return markdown, not plain text. Headers, bold, and lists help Captain understand document structure and improve search relevance.
2. Put the most important content first. Title, key identifiers, and summary should come before detailed content. Search results show previews from the beginning of each chunk.
3. Skip noise. Internal IDs, timestamps, and system fields don’t help search. Only extract what a human would search for.
4. Handle missing fields. Not every document has every field. Use guards:
5. Use var, not let/const. The V8 sandbox runs an older JavaScript dialect, so declare variables with var. Avoid optional chaining (?.) and use explicit null checks instead.
6. Test in the Parser Studio. The browser-based test runner gives instant feedback. Write your script, paste sample JSON, click Run, and see the output immediately.
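Tips 4 and 5 together might look like this guarded, var-only sketch (the title, author, and summary fields are invented):

```javascript
function parse(doc) {
  var lines = [];
  // Guard every optional field with an explicit check --
  // no optional chaining in the sandbox.
  if (doc.title) {
    lines.push("# " + doc.title);
  }
  if (doc.author && doc.author.name) {
    lines.push("Author: " + doc.author.name);
  }
  if (doc.summary) {
    lines.push(doc.summary);
  }
  return { text: lines.join("\n\n") };
}
```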
Limits
Error Handling
If your parsing script fails, that specific file is marked as failed in the indexing job. Other files in the same batch continue processing normally.
Common errors:
Check the job status for per-file error details:
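The payload shape below is an assumption, but collecting the failed files from a job-status response might look like:

```javascript
// Hypothetical job-status response; the real field names may differ.
var job = {
  status: "completed_with_errors",
  files: [
    { path: "a.json", status: "indexed" },
    { path: "b.json", status: "failed", error: "parse script threw: TypeError" }
  ]
};

// Collect per-file errors for the failed entries.
var failed = [];
var i;
for (i = 0; i < job.files.length; i++) {
  if (job.files[i].status === "failed") {
    failed.push(job.files[i].path + ": " + job.files[i].error);
  }
}
```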
FAQ
Can I use the same parser across multiple collections?
Yes. Parsers are org-scoped, not collection-scoped. Any indexing job in your org can reference any parser.
What happens to non-JSON files when I set parsing_script?
Nothing. PDFs, DOCX, images, and other files are processed normally. The parser only affects .json files.
Can I use TypeScript?
Not yet. Write plain JavaScript. TypeScript support would require shipping a compiler to the sandbox.
Can my script make API calls or read files?
No. The sandbox is completely isolated. No network access, no filesystem, no require/import. Your script receives the JSON object and returns text. That’s it.
Can I use async/await?
No. The sandbox runs synchronous JavaScript only. No Promises, no setTimeout, no async patterns.
How do I update a parser?
Upload a new version with the same path. The next indexing job that references it will use the updated script. There’s no versioning. The latest upload is what runs.