# Quickstart
## Install
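Assuming the package is published on PyPI under the distribution name `bcql-py` (an assumption — check the project's README for the canonical name):

```shell
# Assumption: the PyPI distribution name is bcql-py
pip install bcql-py
```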
## Parse a query
`parse()` runs the lexer and the recursive-descent parser and returns the root
`BCQLNode`. The returned tree is a frozen Pydantic v2 model:
inspect it, pattern-match on `node_type`, or round-trip it back to BCQL.
```python
from bcql_py import parse

ast = parse('[word="man"]')
ast.node_type  # 'token_query'
ast.to_bcql()  # '[word="man"]'
```
## Reconstruct the query
Every node implements `to_bcql()`. The output is
guaranteed to re-parse to an AST that compares equal (`==`) to the original:
```python
from bcql_py import parse

source = '"the" [pos="ADJ"]+ "man"'
ast = parse(source)
assert parse(ast.to_bcql()) == ast
```
See AST & parser design for the full picture of the AST hierarchy.
## Handle errors
`BCQLSyntaxError` exposes the original query and the error position so
you can render them inline or forward them to another system.
```python
from bcql_py import parse, BCQLSyntaxError

try:
    parse('[word=')
except BCQLSyntaxError as err:
    err.position  # int: 0-based character offset
    err.query     # str: the original query
    str(err)      # full message with a caret under err.position
```
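The caret layout that `str(err)` produces can be reproduced with plain string formatting when another system wants to control the rendering itself. A minimal sketch; the helper name `render_caret` is ours, not part of the library:

```python
def render_caret(query: str, position: int) -> str:
    """Render a query with a caret under the 0-based error offset."""
    return f"{query}\n{' ' * position}^"

# For the failing query '[word=' with an error at the end of input:
print(render_caret('[word=', 6))
# [word=
#       ^
```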
## Validate against a corpus
Pass a `CorpusSpec` to `parse()` to enforce corpus-specific
rules: which annotations exist, which closed-class values are allowed, and whether
relations or alignment are supported. See the tagset validation guide for
the full picture.
```python
from bcql_py import CorpusSpec, parse

spec = CorpusSpec(
    open_attributes={"word", "lemma"},
    closed_attributes={"pos": {"NOUN", "VERB", "ADJ"}},
    strict_attributes=True,
)

parse('[pos="NOUN"]', spec=spec)    # ok
parse('[pos="NUMBER"]', spec=spec)  # raises BCQLValidationError
```
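Conceptually, closed-attribute validation is a membership check against the spec's value sets, while strict mode additionally rejects unknown attribute names. A minimal sketch of that logic, independent of the library (the function and its exact behavior are illustrative, not the library's implementation):

```python
def check_attribute(name, value, open_attrs, closed_attrs, strict):
    """Return None if the attribute/value pair is valid, else an error message."""
    if name in open_attrs:
        return None  # open attributes accept any value
    if name in closed_attrs:
        if value in closed_attrs[name]:
            return None
        return f"{value!r} is not a valid value for {name!r}"
    if strict:
        return f"unknown attribute {name!r}"
    return None  # lenient mode: unknown attributes pass through

open_attrs = {"word", "lemma"}
closed_attrs = {"pos": {"NOUN", "VERB", "ADJ"}}
check_attribute("pos", "NOUN", open_attrs, closed_attrs, True)    # None (valid)
check_attribute("pos", "NUMBER", open_attrs, closed_attrs, True)  # error message
```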
## Tokenize without parsing
Need just the token stream (e.g. for syntax highlighting)?
Use `tokenize()`:
```python
from bcql_py import tokenize

tokens = tokenize('"man"')
for tok in tokens:
    print(tok.type.name, tok.value, tok.position)
```
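For highlighting, the token positions are enough to slice the source and wrap each token's span in markup. A minimal sketch over `(type_name, value, position)` triples; the token type names and the sample stream below are assumptions for illustration, not the library's actual output:

```python
def highlight(query, tokens, markup):
    """Wrap each token's slice of the query in type-specific markup."""
    out, cursor = [], 0
    for type_name, value, position in tokens:
        out.append(query[cursor:position])  # any text between tokens
        open_tag, close_tag = markup.get(type_name, ("", ""))
        out.append(f"{open_tag}{value}{close_tag}")
        cursor = position + len(value)
    out.append(query[cursor:])  # trailing text after the last token
    return "".join(out)

# Illustrative token stream for '"man"' (the STRING name is an assumption):
tokens = [("STRING", '"man"', 0)]
highlight('"man"', tokens, {"STRING": ("<s>", "</s>")})  # '<s>"man"</s>'
```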
## Next steps
- Guides: in-depth guides covering both this library and BCQL in general.
- Example scripts.