A Python parser for BlackLab Corpus Query Language
A full-coverage Python parser for the BlackLab Corpus Query Language (BCQL) that converts query strings into a Pydantic v2 AST (Abstract Syntax Tree) with lossless round-trip reconstruction and structured error reporting.
To get started, you can check out:
- A Quickstart guide
- bcql_py and BCQL general guides
- The full API reference
- Python code examples
- A Gradio demo
Features
- Complete BCQL coverage: token queries, sequences, repetitions, spans, lookarounds, captures, global constraints, relations, alignments, and built-in functions.
- Immutable Pydantic v2 AST: every node is a frozen BaseModel subclass with a node_type discriminator, making inspection and pattern matching straightforward.
- Lossless BCQL round-trip: to_bcql() reproduces the original query (preserving shorthand forms, quote characters, sensitivity flags, etc.).
- Position-aware syntax errors: BCQLSyntaxError carries the original query, the 0-based offset, and a caret-annotated message, ready to forward to a user or LLM.
- Optional semantic validation: a CorpusSpec describes which annotations, span tags, alignment fields, and dependency relations your corpus supports. Pass it as parse(query, spec=spec) to catch typos and unsupported features before they reach the corpus. See the tagset validation guide.
- Zero runtime dependencies beyond Pydantic.
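To make the "position-aware syntax errors" idea concrete, here is a minimal, standard-library-only sketch of how a 0-based offset can be turned into a caret-annotated message. It is not the library's implementation: `SketchSyntaxError` and `check_brackets` are hypothetical names invented for this illustration, and the toy check only looks at square brackets outside double-quoted strings.

```python
class SketchSyntaxError(ValueError):
    """Hypothetical stand-in for illustration; NOT the library's BCQLSyntaxError."""

    def __init__(self, message: str, query: str, offset: int) -> None:
        self.query = query    # the original query string
        self.offset = offset  # 0-based index of the offending character
        # Three lines: the message, the query, and a caret under the offset.
        super().__init__(f"{message}\n{query}\n{' ' * offset}^")


def check_brackets(query: str) -> None:
    """Toy check: raise if '[' and ']' are unbalanced outside quoted strings."""
    stack: list[int] = []  # offsets of '[' characters that are still open
    in_string = False
    for i, ch in enumerate(query):
        if ch == '"':
            in_string = not in_string  # toy rule: no escape handling
        elif in_string:
            continue
        elif ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise SketchSyntaxError("unmatched ']'", query, i)
            stack.pop()
    if stack:
        raise SketchSyntaxError("unclosed '['", query, stack[-1])


if __name__ == "__main__":
    try:
        check_brackets('"the" [pos="ADJ"')
    except SketchSyntaxError as err:
        print(err)  # message, query, and a caret line pointing at the '['
```

Keeping the query and offset as attributes, in addition to the formatted message, lets a caller either show the caret line to a person or pass the structured fields to another program.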
Installation
Or with uv:
Try the demo
A small Gradio app under app/
lets you paste a BCQL query, pick or build a CorpusSpec, and inspect parse +
validation results. The hosted demo runs on Hugging Face Spaces at
BramVanroy/bcql_py_validation.
To run it locally:
Supported BCQL constructs
| Category | Examples |
|---|---|
| Token queries | [word="man"], "man", [], [pos != "noun"] |
| Regex & literal strings | "(wo)?man", l"e.g.", "(?-i)Panama" |
| Boolean constraints | [lemma="search" & pos="noun"], [a="x" \| b="y"] |
| Sequences | "the" "tall" "man" |
| Repetitions | [pos="ADJ"]+, []{2,5}, "word"? |
| Spans | <s/>, <s>, </s>, <ne type="PERS"/> |
| Position filters | "baker" within <person/>, <s/> containing "dog" |
| Captures | A:[pos="ADJ"], A:[] "by" B:[] :: A.word = B.word |
| Relations | _ -obj-> _, _ -subj-> _ ; -obj-> _, ^--> "have" |
| Alignments | "cat" ==>nl _, "cat" ==>nl? _ |
| Lookaround | (?= "next"), (?<= "prev"), (?! "not") |
| Functions | meet(...), rspan(...), rfield(...) |
See the cheatsheet for a quick-reference table of every operator.