bcql_py: Top-level API¶

The public interface of bcql_py is importable directly from the top-level bcql_py package.

CorpusSpec ¶

Bases: BaseModel

Immutable description of a corpus' semantic vocabulary.

All fields default to the most permissive setting ("anything goes") so that a bare CorpusSpec() is a no-op validator. Narrow the spec by listing the annotations, tags, and relations your corpus actually supports.

Attributes:

Name	Type	Description
`open_attributes`	`frozenset[str]`	Annotation names whose value space is unconstrained (e.g. `word`, `lemma`).
`closed_attributes`	`dict[str, frozenset[str]]`	Annotation names whose values are restricted to a fixed set (e.g. `pos` -> `{"NOUN", "VERB", ...}`).
`strict_attributes`	`bool`	When `True`, any annotation not listed in `open_attributes` or `closed_attributes` is an error. When `False` (default), unknown annotations are accepted.
`allowed_span_tags`	`frozenset[str] \| None`	Allowed XML span tag names (e.g. `s`, `p`, `ne`), or `None` to allow any tag.
`allowed_span_attributes`	`dict[str, frozenset[str]] \| None`	Per-tag allowed XML attribute values. Missing tags default to no constraint. Use `None` to allow any attribute.
`allow_alignment`	`bool`	If `False`, any use of the alignment (`==>`) operator raises a validation error.
`allowed_alignment_fields`	`frozenset[str] \| None`	Allowed target field names for alignment queries, or `None` to allow any.
`allow_relations`	`bool`	If `False`, any relation operator (`-type->` or `^-type->`) raises a validation error.
`allowed_relations`	`frozenset[str] \| None`	Allowed relation type names, or `None` to allow any. An empty set means that no named relation is allowed; to disable relations altogether, use `allow_relations=False` instead.

Example::

spec = CorpusSpec(open_attributes={"word"}, closed_attributes={"pos": {"NOUN", "VERB"}})
"pos" in spec.closed_attributes
# True
sorted(spec.closed_attributes["pos"])
# ['NOUN', 'VERB']
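The interplay of `open_attributes`, `closed_attributes`, and `strict_attributes` described in the table above can be sketched as a small decision function. This is a simplified model, not the library's validator: the helper `classify_annotation` and the label `"unknown_annotation"` are hypothetical, and values are compared as literal strings.

```python
from __future__ import annotations


def classify_annotation(
    name: str,
    value: str,
    open_attributes: frozenset[str],
    closed_attributes: dict[str, frozenset[str]],
    strict_attributes: bool = False,
) -> str | None:
    """Return None when the pair is accepted, else a short problem label."""
    if name in open_attributes:
        return None  # open attribute: any value is accepted
    if name in closed_attributes:
        if value in closed_attributes[name]:
            return None
        return "invalid_annotation_value"
    # Unknown annotation: only a problem in strict mode.
    # "unknown_annotation" is a hypothetical label, not taken from the library.
    return "unknown_annotation" if strict_attributes else None


open_attrs = frozenset({"word"})
closed_attrs = {"pos": frozenset({"NOUN", "VERB"})}

print(classify_annotation("pos", "ADJ", open_attrs, closed_attrs))
# invalid_annotation_value
print(classify_annotation("lemma", "be", open_attrs, closed_attrs))
# None
print(classify_annotation("lemma", "be", open_attrs, closed_attrs, strict_attributes=True))
# unknown_annotation
```

With the default `strict_attributes=False`, only values of closed attributes can fail, which matches the "anything goes" behaviour of a bare spec.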

description `property` ¶

description: str

A human-readable description of this spec. Can be overridden in subclasses. Potentially useful for error messages, debugging, or as information to LLM agents.

extend ¶

extend(
    *,
    open_attributes: Iterable[str] | None = None,
    closed_attributes: Mapping[str, Iterable[str]]
    | None = None,
    allowed_span_tags: Iterable[str] | None = None,
    allowed_span_attributes: Mapping[str, Iterable[str]]
    | None = None,
    allowed_alignment_fields: Iterable[str] | None = None,
    allowed_relations: Iterable[str] | None = None,
    strict_attributes: bool | None = None,
    allow_alignment: bool | None = None,
    allow_relations: bool | None = None,
) -> CorpusSpec

Return a new spec with the given additions/overrides merged in. Similar to merge(), but with a more granular API that allows adding specific entries without having to construct a full spec.

Parameters:

Name	Type	Description	Default
`open_attributes`	`Iterable[str] \| None`	Extra open-class annotation names to union in.	`None`
`closed_attributes`	`Mapping[str, Iterable[str]] \| None`	Extra closed-class attributes; per-key values union.	`None`
`allowed_span_tags`	`Iterable[str] \| None`	Extra allowed span tag names.	`None`
`allowed_span_attributes`	`Mapping[str, Iterable[str]] \| None`	Extra per-tag attribute names.	`None`
`allowed_alignment_fields`	`Iterable[str] \| None`	Extra alignment target fields.	`None`
`allowed_relations`	`Iterable[str] \| None`	Extra relation type names.	`None`
`strict_attributes`	`bool \| None`	Override the strict-attributes flag.	`None`
`allow_alignment`	`bool \| None`	Override the alignment allowed flag.	`None`
`allow_relations`	`bool \| None`	Override the relations allowed flag.	`None`

Returns:

Type	Description
`CorpusSpec`	A new CorpusSpec; the receiver is not modified.

Example::

base = CorpusSpec(open_attributes={"word"})
extended = base.extend(open_attributes={"lemma"})
sorted(extended.open_attributes)
# ['lemma', 'word']
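The "per-key values union" behaviour documented for `closed_attributes` can be illustrated with plain dictionaries. The helper `union_closed` below is a hypothetical stand-in that models the documented rule; it is not how the library implements extend().

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def union_closed(
    base: Mapping[str, frozenset[str]],
    extra: Mapping[str, Iterable[str]] | None,
) -> dict[str, frozenset[str]]:
    """Return a new mapping in which each key holds the union of both sides."""
    merged = dict(base)  # shallow copy, so the input mapping is left untouched
    for key, values in (extra or {}).items():
        merged[key] = merged.get(key, frozenset()) | frozenset(values)
    return merged


base = {"pos": frozenset({"NOUN", "VERB"})}
extended = union_closed(base, {"pos": ["ADJ"], "number": ["sg", "pl"]})

print(sorted(extended["pos"]))
# ['ADJ', 'NOUN', 'VERB']
print(sorted(extended["number"]))
# ['pl', 'sg']
print(sorted(base["pos"]))
# ['NOUN', 'VERB']
```

Existing keys gain the extra values, new keys are added, and the original mapping stays unchanged, in line with the statement that the receiver is not modified.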

View source on GitHub: src/bcql_py/validation/spec.py lines 145–221

merge ¶

merge(other: CorpusSpec) -> CorpusSpec

Return a new spec combining this spec with other. In case of conflict, other wins (except for boolean flags, see below).

Set-valued fields are unioned. For the nullable set-valued fields (allowed_span_tags, allowed_alignment_fields, allowed_relations, and the dict-shaped allowed_span_attributes), None means "no constraint". A concrete set/dict is treated as more restrictive than None, so when one side is None and the other lists entries, the result is the listed entries: None survives only when both sides are None. This mirrors the boolean rule below: a concrete restriction always beats "no constraint".

WARNING: For boolean flags, other wins only when it is more restrictive (False beats True) so that merging in a preset cannot silently re-enable something the caller disabled.

Parameters:

Name	Type	Description	Default
`other`	`CorpusSpec`	Another spec to merge into this one.	required

Returns:

Type	Description
`CorpusSpec`	A new CorpusSpec representing the union.

Example::

spec1 = CorpusSpec(open_attributes={"word"}, allow_alignment=True)
spec2 = CorpusSpec(open_attributes={"lemma"}, closed_attributes={"pos": {"NOUN", "VERB"}}, allow_alignment=False)
merged = spec1.merge(spec2)
sorted(merged.open_attributes)
# ['lemma', 'word']
"pos" in merged.closed_attributes
# True
merged.allow_alignment
# False
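The two merge rules described above, for nullable set-valued fields and for the `allow_*` flags, can be modelled in a few lines. This is a sketch of the documented semantics only; `merge_nullable` and `merge_allow_flag` are hypothetical helper names, not part of bcql_py.

```python
from __future__ import annotations


def merge_nullable(
    mine: frozenset[str] | None,
    other: frozenset[str] | None,
) -> frozenset[str] | None:
    """None means "no constraint"; a concrete set always beats None."""
    if mine is None:
        return other
    if other is None:
        return mine
    return mine | other  # both sides list entries: union them


def merge_allow_flag(mine: bool, other: bool) -> bool:
    """The more restrictive side wins, so False beats True."""
    return mine and other


print(merge_nullable(None, None))
# None
print(sorted(merge_nullable(None, frozenset({"s", "p"}))))
# ['p', 's']
print(sorted(merge_nullable(frozenset({"s"}), frozenset({"ne"}))))
# ['ne', 's']
print(merge_allow_flag(False, True))
# False
```

The last call shows why a preset cannot silently re-enable a feature: once either side sets the flag to `False`, the merged result stays `False`.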

View source on GitHub: src/bcql_py/validation/spec.py lines 223–304

has_annotation ¶

has_annotation(name: str) -> bool

Return whether name is a known annotation on this spec.

An annotation is considered known when it is listed in either open_attributes or closed_attributes. This method is independent of strict_attributes: it only reports membership, not whether an unknown annotation would raise during validation.

Parameters:

Name	Type	Description	Default
`name`	`str`	The annotation name to check.	required

Returns:

Type	Description
`bool`	`True` if name is either an open or closed attribute on this spec, `False` otherwise.

Example::

spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
spec.has_annotation("word")
# True
spec.has_annotation("pos")
# True
spec.has_annotation("lemma")
# False

View source on GitHub: src/bcql_py/validation/spec.py lines 306–334

BCQLSyntaxError ¶

BCQLSyntaxError(
    error_message: str,
    *,
    bcql_query: str = "",
    error_position: int | None = None,
)

Bases: Exception

A syntax error with optional source and position, raised when tokenization or parsing of a BCQL query fails.

Attributes:

Name	Type	Description
`error_message`	`str`	Human-readable parse or lexing error message.
`bcql_query`	`str`	Original BCQL source query.
`error_position`	`int \| None`	0-based character position in `bcql_query`, or `None` when unknown.

View source on GitHub: src/bcql_py/exceptions.py lines 21–31

__str__ ¶

__str__() -> str

Return a readable message including a caret position when available.
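As an illustration of what a caret-annotated message can look like, the sketch below places a `^` under a 0-based position in a single-line query. The layout and the helper `format_with_caret` are assumptions made for this example; the exact text that BCQLSyntaxError produces may differ.

```python
from __future__ import annotations


def format_with_caret(
    message: str,
    query: str = "",
    position: int | None = None,
) -> str:
    """Append the query and a caret line when both are available."""
    if not query or position is None:
        return message  # nothing to point at
    # Assumed layout: message, then the query, then a caret under `position`.
    return f"{message}\n{query}\n{' ' * position}^"


# Hypothetical error at the end of an unterminated token query.
print(format_with_caret("unexpected end of input", '[word="a"', 9))
```

This simple version assumes a single-line query; a multi-line query would need the position translated into a line and column first.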

View source on GitHub: src/bcql_py/exceptions.py lines 33–41

BCQLValidationError ¶

BCQLValidationError(issues: list[ValidationIssue])

Bases: Exception

Raised when an AST does not satisfy a CorpusSpec.

Collects one or more ValidationIssue instances so that callers can surface every problem at once (when fail_fast=False) or just the first (default).

Attributes:

Name	Type	Description
`issues`	`list[ValidationIssue]`	List of ValidationIssue entries describing what went wrong.

View source on GitHub: src/bcql_py/exceptions.py lines 93–99

__str__ ¶

__str__() -> str

Return one issue or a multi-line list of all validation issues as a string.

View source on GitHub: src/bcql_py/exceptions.py lines 101–107

ValidationIssue `dataclass` ¶

ValidationIssue(
    kind: IssueKind,
    message: str,
    node_type: str,
    context: dict[str, Any] = dict(),
)

A single semantic validation problem found during validate(). A BCQLValidationError may carry several of these so that all problems are reported at once rather than only the first.

Attributes:

Name	Type	Description
`kind`	`IssueKind`	A short machine-readable label identifying the issue category.
`message`	`str`	Human-readable description of the problem.
`node_type`	`str`	The `node_type` discriminator of the offending AST node.
`context`	`dict[str, Any]`	Extra context (e.g. the offending annotation name, value, or tag).
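The signature above renders the default for `context` as `dict()`. In a dataclass, a mutable default like this is conventionally declared with `field(default_factory=dict)` so that every instance gets its own dictionary. The sketch below shows that pattern on a stand-in class; `IssueSketch` is hypothetical, `kind` is typed as a plain `str` in place of `IssueKind`, and the `node_type` values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class IssueSketch:
    """Stand-in with the same field names as ValidationIssue."""

    kind: str  # stand-in for IssueKind
    message: str
    node_type: str
    context: dict[str, Any] = field(default_factory=dict)


first = IssueSketch("invalid_annotation_value", "value not allowed", "example_node")
second = IssueSketch("invalid_annotation_value", "value not allowed", "example_node")

first.context["annotation"] = "pos"
print(first.context)
# {'annotation': 'pos'}
print(second.context)
# {}
```

Because each instance owns its `context`, adding details to one issue does not leak into another.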

__str__ ¶

__str__() -> str

Return this issue as a compact single-line message.

View source on GitHub: src/bcql_py/exceptions.py lines 75–80

tokenize `cached` ¶

tokenize(source: str) -> tuple[Token, ...]

Tokenize a BCQL query string into a tuple of Tokens.

Parameters:

Name	Type	Description	Default
`source`	`str`	The BCQL query to tokenize.	required

Returns:

Type	Description
`tuple[Token, ...]`	The tuple of tokens.
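The `cached` label and the tuple return type fit together: a memoized function hands the same result object to every caller, so an immutable return value is the safe choice. The sketch below shows the idea with `functools.lru_cache` on a toy whitespace tokenizer. That tokenize() uses `lru_cache` specifically is an assumption, and `toy_tokenize` is a hypothetical stand-in that does not implement BCQL lexing.

```python
from __future__ import annotations

from functools import lru_cache


@lru_cache(maxsize=128)
def toy_tokenize(source: str) -> tuple[str, ...]:
    """Split on whitespace; a tuple is immutable, so sharing it is safe."""
    return tuple(source.split())


first = toy_tokenize('"the" "cat"')
second = toy_tokenize('"the" "cat"')

print(first)
# ('"the"', '"cat"')
print(first is second)
# True
print(toy_tokenize.cache_info().hits)
# 1
```

The second call is served from the cache, so both names refer to the very same tuple.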

View source on GitHub: src/bcql_py/parser/__init__.py lines 14–25

parse ¶

parse(
    source: str,
    *,
    spec: CorpusSpec | None = None,
    fail_fast: bool = True,
) -> BCQLNode

Tokenize then parse a BCQL query string and return the root AST node.

When spec is given, the parsed AST is additionally run through validate() so that any corpus-specific semantic problems are surfaced immediately rather than at query-execution time.

Parameters:

Name	Type	Description	Default
`source`	`str`	The BCQL query to parse.	required
`spec`	`CorpusSpec \| None`	Optional CorpusSpec describing the target corpus. When provided, semantic validation runs after a successful parse.	`None`
`fail_fast`	`bool`	Forwarded to validate(); only has an effect when spec is provided. `True` raises on the first validation issue, `False` collects every issue before raising.	`True`

Returns:

Type	Description
`BCQLNode`	The root BCQLNode of the parsed AST.

Raises:

Type	Description
`BCQLSyntaxError`	If the query cannot be parsed.
`BCQLValidationError`	If spec is provided and the AST violates it.

View source on GitHub: src/bcql_py/parser/__init__.py lines 35–63

parse_from_tokens ¶

parse_from_tokens(
    tokens: Sequence[Token],
    source: str,
    *,
    spec: CorpusSpec | None = None,
    fail_fast: bool = True,
) -> BCQLNode

Parse a BCQL token list into an abstract syntax tree.

Parameters:

Name	Type	Description	Default
`tokens`	`Sequence[Token]`	The list of tokens to parse (from tokenize()).	required
`source`	`str`	The original source string.	required
`spec`	`CorpusSpec \| None`	Optional CorpusSpec; see parse().	`None`
`fail_fast`	`bool`	Forwarded to validate() when spec is provided.	`True`

Returns:

Type	Description
`BCQLNode`	The root BCQLNode.

View source on GitHub: src/bcql_py/parser/__init__.py lines 66–88

validate ¶

validate(
    ast: BCQLNode,
    spec: CorpusSpec,
    *,
    fail_fast: bool = True,
)

Validate a parsed BCQL AST against spec, raising on any issue.

Parameters:

Name	Type	Description	Default
`ast`	`BCQLNode`	The root BCQLNode returned by parse().	required
`spec`	`CorpusSpec`	The CorpusSpec describing what the corpus allows.	required
`fail_fast`	`bool`	When `True` (default), raise as soon as the first issue is found. When `False`, collect every issue and raise once at the end so callers can report them all together.	`True`

Raises:

Type	Description
`BCQLValidationError`	If one or more validation issues are found. The raised exception's `issues` attribute holds the full list.

Example::

from bcql_py import BCQLValidationError, CorpusSpec, parse, validate
spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
validate(parse('[pos="NOUN"]'), spec)  # passes silently
try:
    validate(parse('[pos="ADJ"]'), spec)
except BCQLValidationError as exc:
    print(exc.issues[0].kind)
# invalid_annotation_value
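The difference between the two `fail_fast` modes can be sketched with a small stand-alone model. Everything here is hypothetical and independent of bcql_py: `SketchValidationError` mimics an exception carrying an `issues` list, and `run_checks` stands in for a validator that inspects a list of values.

```python
from __future__ import annotations


class SketchValidationError(Exception):
    """Stand-in for an exception that carries a list of issues."""

    def __init__(self, issues: list[str]) -> None:
        super().__init__("; ".join(issues))
        self.issues = issues


def run_checks(values: list[str], allowed: set[str], *, fail_fast: bool = True) -> None:
    """Raise once, with the first issue (fail_fast) or with all of them."""
    issues: list[str] = []
    for value in values:
        if value not in allowed:
            issues.append(f"value {value!r} is not allowed")
            if fail_fast:
                break  # stop at the first problem
    if issues:
        raise SketchValidationError(issues)


def issues_for(values: list[str], allowed: set[str], *, fail_fast: bool) -> list[str]:
    """Helper for the demo: return the issues instead of raising."""
    try:
        run_checks(values, allowed, fail_fast=fail_fast)
    except SketchValidationError as exc:
        return exc.issues
    return []


allowed = {"NOUN", "VERB"}
print(len(issues_for(["ADJ", "NOUN", "ADV"], allowed, fail_fast=True)))
# 1
print(len(issues_for(["ADJ", "NOUN", "ADV"], allowed, fail_fast=False)))
# 2
```

In both modes a single exception is raised; the only difference is how many issues it carries.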

View source on GitHub: src/bcql_py/validation/validator.py lines 435–467
