A Python parser for BlackLab Corpus Query Language¶
A full-coverage Python parser for the BlackLab Corpus Query Language (BCQL) that converts query strings into a Pydantic v2 AST (Abstract Syntax Tree) with lossless round-trip reconstruction and structured error reporting.
To get started, you can check out:
- A Quickstart guide
- bcql_py and BCQL general guides
- The full API reference
- Python code examples
- A Gradio demo
Features¶
- Complete BCQL coverage: token queries, sequences, repetitions, spans, lookarounds, captures, global constraints, relations, alignments, and built-in functions.
- Immutable Pydantic v2 AST: every node is a frozen BaseModel subclass with a node_type discriminator, making inspection and pattern matching straightforward.
- Lossless BCQL round-trip: to_bcql() reproduces the original query, preserving shorthand forms, quote characters, and sensitivity flags.
- Position-aware syntax errors: BCQLSyntaxError carries the original query, the 0-based offset, and a caret-annotated message ready to forward to a user or LLM.
- Optional semantic validation: a CorpusSpec describes which annotations, span tags, alignment fields, and dependency relations your corpus supports. Pass it as parse(query, spec=spec) to catch typos and unsupported features before they reach the corpus. See the tagset validation guide.
- Zero runtime dependencies beyond Pydantic.
Installation¶
Or with uv:
Try the Demo¶
A small Gradio app under app/ lets you paste a BCQL query, pick or build a CorpusSpec, and inspect parse and validation results. The hosted demo runs on Hugging Face Spaces at BramVanroy/bcql_py_validation.
To run it locally:
Development¶
Clone and set up the project:
Enable pre-commit hooks:
After installation, the hooks run automatically on every git commit. Style checking is done with ruff and type checking with mypy. You can also run the hooks manually across the whole repo:
To work on documentation locally:
This rebuilds a fresh local mike preview before serving it, which avoids re-using stale versioned docs while testing.
Run the tests before pushing to the remote; a GitHub workflow also runs them on every push. To run them locally: