# Quickstart
## Install
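Assuming the package is published on PyPI under the distribution name `bcql-py` (an assumption — check the project's README for the canonical name):

```shell
# Assumption: the PyPI distribution name is bcql-py
pip install bcql-py
```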
## Parse a query
`parse()` runs the lexer and the recursive-descent parser and returns the root
`BCQLNode`. The returned tree is a frozen Pydantic v2 model:
inspect it, pattern-match on `node_type`, or round-trip it back to BCQL.
```python
from bcql_py import parse

ast = parse('[word="man"]')
ast.node_type  # 'token_query'
ast.to_bcql()  # '[word="man"]'
```
## Reconstruct the query
Every node implements `to_bcql()`. The output is
guaranteed to re-parse to an AST that compares equal (`==`) to the original:
```python
from bcql_py import parse

source = '"the" [pos="ADJ"]+ "man"'
ast = parse(source)
assert parse(ast.to_bcql()) == ast
```
See AST & parser design for the full picture of the AST hierarchy.
## Handle errors
`BCQLSyntaxError` exposes the original query and the error position so
you can render them inline or forward them to another system.
```python
from bcql_py import parse, BCQLSyntaxError

try:
    parse('[word=')
except BCQLSyntaxError as err:
    err.position  # int: 0-based character offset
    err.query     # str: the original query
    str(err)      # full message with a caret under err.position
```
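The caret layout that `str(err)` produces can be reproduced with plain string formatting when another system wants to control the rendering itself. A minimal sketch; the helper name `render_caret` is ours, not part of the library:

```python
def render_caret(query: str, position: int) -> str:
    """Render a query with a caret under the 0-based error offset."""
    return f"{query}\n{' ' * position}^"

# For the failing query '[word=' with an error at the end of input:
print(render_caret('[word=', 6))
# [word=
#       ^
```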
## Validate against a corpus
Pass a `CorpusSpec` to `parse()` to enforce corpus-specific
rules: which annotations exist, which closed-class values are allowed, and whether
relations or alignment are supported. See the tagset validation guide for
the full picture.
```python
from bcql_py import CorpusSpec, parse

spec = CorpusSpec(
    open_attributes={"word", "lemma"},
    closed_attributes={"pos": {"NOUN", "VERB", "ADJ"}},
    strict_attributes=True,
)

parse('[pos="NOUN"]', spec=spec)    # ok
parse('[pos="NUMBER"]', spec=spec)  # raises BCQLValidationError
```
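Conceptually, closed-attribute validation is a membership check against the spec's value sets, while strict mode additionally rejects unknown attribute names. A minimal sketch of that logic, independent of the library (the function and its exact behavior are illustrative, not the library's implementation):

```python
def check_attribute(name, value, open_attrs, closed_attrs, strict):
    """Return None if the attribute/value pair is valid, else an error message."""
    if name in open_attrs:
        return None  # open attributes accept any value
    if name in closed_attrs:
        if value in closed_attrs[name]:
            return None
        return f"{value!r} is not a valid value for {name!r}"
    if strict:
        return f"unknown attribute {name!r}"
    return None  # lenient mode: unknown attributes pass through

open_attrs = {"word", "lemma"}
closed_attrs = {"pos": {"NOUN", "VERB", "ADJ"}}
check_attribute("pos", "NOUN", open_attrs, closed_attrs, True)    # None (valid)
check_attribute("pos", "NUMBER", open_attrs, closed_attrs, True)  # error message
```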
## Tokenize without parsing
Need just the token stream (e.g. for syntax highlighting)?
Use `tokenize()`:
```python
from bcql_py import tokenize

tokens = tokenize('"man"')
for tok in tokens:
    print(tok.type.name, tok.value, tok.position)
```
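For highlighting, the token positions are enough to slice the source and wrap each token's span in markup. A minimal sketch over `(type_name, value, position)` triples; the token type names and the sample stream below are assumptions for illustration, not the library's actual output:

```python
def highlight(query, tokens, markup):
    """Wrap each token's slice of the query in type-specific markup."""
    out, cursor = [], 0
    for type_name, value, position in tokens:
        out.append(query[cursor:position])  # any text between tokens
        open_tag, close_tag = markup.get(type_name, ("", ""))
        out.append(f"{open_tag}{value}{close_tag}")
        cursor = position + len(value)
    out.append(query[cursor:])  # trailing text after the last token
    return "".join(out)

# Illustrative token stream for '"man"' (the STRING name is an assumption):
tokens = [("STRING", '"man"', 0)]
highlight('"man"', tokens, {"STRING": ("<s>", "</s>")})  # '<s>"man"</s>'
```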
## Next steps
- Guides: in-depth guides covering both this library and BCQL in general.
- Example scripts.