Quickstart

Install

pip install bcql_py
# or:
uv add bcql_py

Parse a query

parse() runs the lexer and the recursive-descent parser and returns the root BCQLNode. The returned tree is a frozen Pydantic v2 model: inspect it, pattern-match on node_type, or round-trip it back to BCQL.

from bcql_py import parse

ast = parse('[word="man"]')
ast.node_type       # 'token_query'
ast.to_bcql()       # '[word="man"]'

Reconstruct the query

Every node implements to_bcql(). The output is guaranteed to re-parse to an AST that is == to the original:

from bcql_py import parse

source = '"the" [pos="ADJ"]+ "man"'
ast = parse(source)
assert parse(ast.to_bcql()) == ast

See AST & parser design for the full picture of the AST hierarchy.

Handle errors

BCQLSyntaxError exposes the original query and the error position so you can render them inline or forward them to another system.

from bcql_py import parse, BCQLSyntaxError

try:
    parse('[word=')
except BCQLSyntaxError as err:
    err.position        # int: 0-based character offset
    err.query           # str: the original query
    str(err)            # Full message with a caret under err.position
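str(err) already includes a caret, but err.query and err.position are enough to build your own rendering, e.g. for a web UI. A minimal sketch — render_inline_error is a hypothetical helper, not part of bcql_py:

```python
def render_inline_error(query: str, position: int, message: str = "syntax error") -> str:
    # Line 1: the original query.
    # Line 2: a caret under the 0-based character offset.
    # Line 3: the message with the offset spelled out.
    return "\n".join([query, " " * position + "^", f"{message} at offset {position}"])

print(render_inline_error('[word=', 6))
```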

Validate against a corpus

Pass a CorpusSpec to parse() to enforce corpus-specific rules (which annotations exist, which closed-class values are allowed, whether relations or alignment are supported). See the tagset validation guide for the complete rules.

from bcql_py import CorpusSpec, parse

spec = CorpusSpec(
    open_attributes={"word", "lemma"},
    closed_attributes={"pos": {"NOUN", "VERB", "ADJ"}},
    strict_attributes=True,
)
parse('[pos="NOUN"]', spec=spec)   # ok
parse('[pos="NUMBER"]', spec=spec) # raises BCQLValidationError
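The open/closed distinction boils down to a pure data check. A hedged sketch of the rule illustrated above — check_attribute is hypothetical and only models the behaviour shown, not CorpusSpec's internals:

```python
def check_attribute(attr, value, open_attrs, closed_attrs, strict=True):
    # Open attributes accept any value.
    if attr in open_attrs:
        return True
    # Closed attributes accept only their listed values.
    if attr in closed_attrs:
        return value in closed_attrs[attr]
    # Unknown attribute: rejected when strict, otherwise allowed.
    return not strict

open_attrs = {"word", "lemma"}
closed_attrs = {"pos": {"NOUN", "VERB", "ADJ"}}
print(check_attribute("pos", "NOUN", open_attrs, closed_attrs))    # True
print(check_attribute("pos", "NUMBER", open_attrs, closed_attrs))  # False
```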

Tokenize without parsing

Need just the token stream (e.g. for syntax highlighting)? Use tokenize():

from bcql_py import tokenize

tokens = tokenize('"man"')
for tok in tokens:
    print(tok.type.name, tok.value, tok.position)
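For highlighting, each token's type name and value are all you need. A sketch that wraps values in HTML spans, fed plain (type_name, value) pairs standing in for real Token objects — the bcql- CSS class names are an assumption, not part of the library:

```python
import html

def highlight(tokens):
    # tokens: iterable of (type_name, value) pairs,
    # e.g. built as (tok.type.name, tok.value) from tokenize().
    parts = []
    for type_name, value in tokens:
        css = f"bcql-{type_name.lower()}"
        parts.append(f'<span class="{css}">{html.escape(value)}</span>')
    return "".join(parts)

print(highlight([("STRING", '"man"')]))
```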

Next steps