
bcql_py: Top-level API

The public interface of bcql_py can be imported directly from the top-level bcql_py package.

CorpusSpec

Bases: BaseModel

Immutable description of a corpus' semantic vocabulary.

All fields default to the most permissive setting ("anything goes") so that a bare CorpusSpec() is a no-op validator. Narrow the spec by listing the annotations, tags, and relations your corpus actually supports.

Attributes:

Name Type Description
open_attributes frozenset[str]

Annotation names whose value space is unconstrained (e.g. word, lemma).

closed_attributes dict[str, frozenset[str]]

Annotation names whose values are restricted to a fixed set (e.g. pos -> {"NOUN", "VERB", ...}).

strict_attributes bool

When True, any annotation not listed in open_attributes or closed_attributes is an error. When False (default), unknown annotations are accepted.

allowed_span_tags frozenset[str] | None

Allowed XML span tag names (e.g. s, p, ne), or None to allow any tag.

allowed_span_attributes dict[str, frozenset[str]] | None

Per-tag allowed XML attribute names. Tags missing from the mapping are unconstrained. Use None to allow any attribute on any tag.

allow_alignment bool

If False, any use of the alignment (==>) operator raises a validation error.

allowed_alignment_fields frozenset[str] | None

Allowed target field names for alignment queries, or None to allow any.

allow_relations bool

If False, any relation operator (-type-> or ^-type->) raises a validation error.

allowed_relations frozenset[str] | None

Allowed relation type names, or None to allow any. An empty set means that no named relation is allowed; to disable relations entirely, set allow_relations=False instead.

Example:

spec = CorpusSpec(open_attributes={"word"}, closed_attributes={"pos": {"NOUN", "VERB"}})
"pos" in spec.closed_attributes
# True
sorted(spec.closed_attributes["pos"])
# ['NOUN', 'VERB']
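
The interplay of open_attributes, closed_attributes, and strict_attributes described above can be sketched in plain Python. This is an illustrative stand-in, not the library's validator: check_annotation is a hypothetical helper, and it treats each value as a literal string, whereas real BCQL values may be patterns.

```python
from __future__ import annotations


def check_annotation(
    name: str,
    value: str,
    *,
    open_attributes: frozenset[str],
    closed_attributes: dict[str, frozenset[str]],
    strict_attributes: bool = False,
) -> str | None:
    """Hypothetical helper: return a problem description, or None if acceptable."""
    if name in closed_attributes:
        # Closed attributes only accept values from their fixed set.
        if value not in closed_attributes[name]:
            return f"value {value!r} is not allowed for {name!r}"
        return None
    if name in open_attributes:
        # Open attributes accept any value.
        return None
    # Unknown annotation: an error only when strict_attributes is True.
    if strict_attributes:
        return f"unknown annotation {name!r}"
    return None


open_attrs = frozenset({"word"})
closed_attrs = {"pos": frozenset({"NOUN", "VERB"})}

print(check_annotation("pos", "ADJ", open_attributes=open_attrs, closed_attributes=closed_attrs))
# value 'ADJ' is not allowed for 'pos'
print(check_annotation("lemma", "run", open_attributes=open_attrs, closed_attributes=closed_attrs))
# None
print(
    check_annotation(
        "lemma",
        "run",
        open_attributes=open_attrs,
        closed_attributes=closed_attrs,
        strict_attributes=True,
    )
)
# unknown annotation 'lemma'
```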

description property

description: str

A human-readable description of this spec. Can be overridden in subclasses. Potentially useful for error messages, debugging, or as information to LLM agents.

extend

extend(
    *,
    open_attributes: Iterable[str] | None = None,
    closed_attributes: Mapping[str, Iterable[str]]
    | None = None,
    allowed_span_tags: Iterable[str] | None = None,
    allowed_span_attributes: Mapping[str, Iterable[str]]
    | None = None,
    allowed_alignment_fields: Iterable[str] | None = None,
    allowed_relations: Iterable[str] | None = None,
    strict_attributes: bool | None = None,
    allow_alignment: bool | None = None,
    allow_relations: bool | None = None,
) -> CorpusSpec

Return a new spec with the given additions/overrides merged in. Similar to merge, but with a more granular API that allows adding specific entries without having to construct a full spec.

Parameters:

Name Type Description Default
open_attributes Iterable[str] | None

Extra open-class annotation names to union in.

None
closed_attributes Mapping[str, Iterable[str]] | None

Extra closed-class attributes; per-key values union.

None
allowed_span_tags Iterable[str] | None

Extra allowed span tag names.

None
allowed_span_attributes Mapping[str, Iterable[str]] | None

Extra per-tag attribute names.

None
allowed_alignment_fields Iterable[str] | None

Extra alignment target fields.

None
allowed_relations Iterable[str] | None

Extra relation type names.

None
strict_attributes bool | None

Override the strict-attributes flag.

None
allow_alignment bool | None

Override the alignment allowed flag.

None
allow_relations bool | None

Override the relations allowed flag.

None

Returns:

Type Description
CorpusSpec

A new CorpusSpec; the receiver is not modified.

Example:

base = CorpusSpec(open_attributes={"word"})
extended = base.extend(open_attributes={"lemma"})
sorted(extended.open_attributes)
# ['lemma', 'word']
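
The "per-key values union" behaviour documented for closed_attributes can be pictured with a small standalone sketch. The union_closed function below is a hypothetical illustration written with plain dicts and frozensets; it does not call bcql_py and may differ from the library's internals.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def union_closed(
    base: Mapping[str, Iterable[str]],
    extra: Mapping[str, Iterable[str]],
) -> dict[str, frozenset[str]]:
    """Hypothetical helper: union value sets key by key, keeping keys from both sides."""
    merged = {key: frozenset(values) for key, values in base.items()}
    for key, values in extra.items():
        # Existing keys gain the extra values; new keys are added as-is.
        merged[key] = merged.get(key, frozenset()) | frozenset(values)
    return merged


base = {"pos": {"NOUN", "VERB"}}
extra = {"pos": {"ADJ"}, "number": {"sg", "pl"}}
merged = union_closed(base, extra)

print(sorted(merged))
# ['number', 'pos']
print(sorted(merged["pos"]))
# ['ADJ', 'NOUN', 'VERB']
```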

merge

merge(other: CorpusSpec) -> CorpusSpec

Return a new spec combining this spec with other. In case of conflict, other wins (except for boolean flags, see below).

Set-valued fields are unioned. For the nullable set-valued fields (allowed_span_tags, allowed_alignment_fields, allowed_relations, and the dict-shaped allowed_span_attributes), None means "no constraint". A concrete set/dict is treated as more restrictive than None, so when one side is None and the other lists entries, the result is the listed entries: None survives only when both sides are None. This mirrors the boolean rule below: a concrete restriction always beats "no constraint".

WARNING: For boolean flags, other wins only when it is more restrictive (False beats True) so that merging in a preset cannot silently re-enable something the caller disabled.

Parameters:

Name Type Description Default
other CorpusSpec

Another spec to merge into this one.

required

Returns:

Type Description
CorpusSpec

A new CorpusSpec representing the union.

Example:

spec1 = CorpusSpec(open_attributes={"word"}, allow_alignment=True)
spec2 = CorpusSpec(open_attributes={"lemma"}, closed_attributes={"pos": {"NOUN", "VERB"}}, allow_alignment=False)
merged = spec1.merge(spec2)
sorted(merged.open_attributes)
# ['lemma', 'word']
"pos" in merged.closed_attributes
# True
merged.allow_alignment
# False
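
The merge rules for nullable set-valued fields and for the allow_* flags can be summarised in two small functions. This is a sketch of the documented behaviour only: merge_nullable and merge_flag are hypothetical names, and strict_attributes is left out because the text above does not spell out which of its values counts as more restrictive.

```python
from __future__ import annotations


def merge_nullable(
    mine: frozenset[str] | None,
    other: frozenset[str] | None,
) -> frozenset[str] | None:
    """Hypothetical helper: None means "no constraint" and loses to any concrete set."""
    if mine is None:
        return other  # stays None only when both sides are None
    if other is None:
        return mine
    return mine | other  # two concrete sets are unioned


def merge_flag(mine: bool, other: bool) -> bool:
    """Hypothetical helper for allow_* flags: False (more restrictive) beats True."""
    return mine and other


print(merge_nullable(None, None))
# None
print(sorted(merge_nullable(None, frozenset({"s"}))))
# ['s']
print(sorted(merge_nullable(frozenset({"s"}), frozenset({"p"}))))
# ['p', 's']
print(merge_flag(False, True))
# False
```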

has_annotation

has_annotation(name: str) -> bool

Return whether name is a known annotation on this spec.

An annotation is considered known when it is listed in either open_attributes or closed_attributes. This method is independent of strict_attributes: it only reports membership, not whether an unknown annotation would raise during validation.

Parameters:

Name Type Description Default
name str

The annotation name to check.

required

Returns:

Type Description
bool

True if name is either an open or closed attribute on this spec, False otherwise.

Example:

spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
spec.has_annotation("word")
# True
spec.has_annotation("pos")
# True
spec.has_annotation("lemma")
# False

BCQLSyntaxError

BCQLSyntaxError(
    error_message: str,
    *,
    bcql_query: str = "",
    error_position: int | None = None,
)

BCQLValidationError

BCQLValidationError(issues: list[ValidationIssue])

Bases: Exception

Raised when an AST does not satisfy a CorpusSpec.

Collects one or more ValidationIssue instances so that callers can surface every problem at once (when fail_fast=False) or just the first (default).

Attributes:

Name Type Description
issues list[ValidationIssue]

One or more ValidationIssue entries describing what went wrong.

ValidationIssue dataclass

ValidationIssue(
    kind: IssueKind,
    message: str,
    node_type: str,
    context: dict[str, Any] = dict(),
)

A single semantic validation problem found during bcql_py.validate. In practice, several issues may be collected in one BCQLValidationError so that they are all reported at once instead of just the first one.

Attributes:

Name Type Description
kind IssueKind

A short machine-readable label identifying the issue category.

message str

Human-readable description of the problem.

node_type str

The node_type discriminator of the offending AST node.

context dict[str, Any]

Extra context (e.g. the offending annotation name, value, or tag).

tokenize cached

tokenize(source: str) -> tuple[Token, ...]

Tokenize a BCQL query string into a tuple of Tokens.

Parameters:

Name Type Description Default
source str

The BCQL query to tokenize.

required

Returns:

Type Description
tuple[Token, ...]

The tuple of tokens.
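
The cached label together with the immutable tuple return type suggests memoization. The sketch below assumes something along the lines of functools.lru_cache, which the source does not confirm; toy_tokenize is a hypothetical whitespace splitter, not the real tokenizer. It shows why returning a tuple is convenient: the same object can safely be handed back to every caller.

```python
from __future__ import annotations

from functools import lru_cache


@lru_cache(maxsize=128)
def toy_tokenize(source: str) -> tuple[str, ...]:
    """Hypothetical stand-in: split on whitespace and return an immutable tuple."""
    return tuple(source.split())


first = toy_tokenize('[pos="NOUN"] [word="cat"]')
second = toy_tokenize('[pos="NOUN"] [word="cat"]')

print(first)
# ('[pos="NOUN"]', '[word="cat"]')
print(first is second)  # the second call is served from the cache
# True
```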

parse

parse(
    source: str,
    *,
    spec: CorpusSpec | None = None,
    fail_fast: bool = True,
) -> BCQLNode

Tokenize then parse a BCQL query string and return the root AST node.

When spec is given, the parsed AST is additionally run through bcql_py.validation.validate so that any corpus-specific semantic problems are surfaced immediately rather than at query-execution time.

Parameters:

Name Type Description Default
source str

The BCQL query to parse.

required
spec CorpusSpec | None

Optional CorpusSpec describing the target corpus. When provided, semantic validation runs after a successful parse.

None
fail_fast bool

Forwarded to bcql_py.validation.validate; only has an effect when spec is provided. True raises on the first validation issue, False collects every issue before raising.

True

Returns:

Type Description
BCQLNode

The root BCQLNode of the parsed AST.

Raises:

Type Description
BCQLSyntaxError

If the query cannot be parsed.

BCQLValidationError

If spec is provided and the AST violates it.

parse_from_tokens

parse_from_tokens(
    tokens: Sequence[Token],
    source: str,
    *,
    spec: CorpusSpec | None = None,
    fail_fast: bool = True,
) -> BCQLNode

Parse a BCQL token list into an abstract syntax tree.

Parameters:

Name Type Description Default
tokens Sequence[Token]

The sequence of tokens to parse, as returned by tokenize.

required
source str

The original source string.

required
spec CorpusSpec | None

Optional CorpusSpec; see parse.

None
fail_fast bool

Forwarded to bcql_py.validation.validate when spec is provided.

True

Returns:

Type Description
BCQLNode

The root BCQLNode.

validate

validate(
    ast: BCQLNode,
    spec: CorpusSpec,
    *,
    fail_fast: bool = True,
)

Validate a parsed BCQL AST against spec, raising on any issue.

Parameters:

Name Type Description Default
ast BCQLNode

The root BCQLNode returned by bcql_py.parse.

required
spec CorpusSpec

The CorpusSpec describing what the corpus allows.

required
fail_fast bool

When True (default), raise as soon as the first issue is found. When False, collect every issue and raise once at the end so callers can report them all together.

True

Raises:

Type Description
BCQLValidationError

If one or more validation issues are found. The raised exception's issues attribute holds the full list.

Example:

from bcql_py import BCQLValidationError, CorpusSpec, parse, validate
spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
validate(parse('[pos="NOUN"]'), spec)  # passes silently
try:
    validate(parse('[pos="ADJ"]'), spec)
except BCQLValidationError as exc:
    print(exc.issues[0].kind)
# invalid_annotation_value
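
The difference between fail_fast=True and fail_fast=False can be illustrated with a self-contained toy validator. Everything here is a hypothetical stand-in (ToyIssue, ToyValidationError, toy_validate); it mimics the documented raise-on-first versus collect-then-raise behaviour over simple (annotation, value) pairs and does not use bcql_py itself.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyIssue:
    """Hypothetical stand-in for a validation issue."""
    kind: str
    message: str
    context: dict[str, Any] = field(default_factory=dict)


class ToyValidationError(Exception):
    """Hypothetical stand-in that carries every collected issue."""

    def __init__(self, issues: list[ToyIssue]) -> None:
        super().__init__("; ".join(issue.message for issue in issues))
        self.issues = issues


def toy_validate(
    pairs: list[tuple[str, str]],
    closed: dict[str, set[str]],
    *,
    fail_fast: bool = True,
) -> None:
    issues: list[ToyIssue] = []
    for name, value in pairs:
        if name in closed and value not in closed[name]:
            issues.append(
                ToyIssue(
                    kind="invalid_annotation_value",
                    message=f"{value!r} is not a valid value for {name!r}",
                    context={"annotation": name, "value": value},
                )
            )
            if fail_fast:
                # Stop at the first problem.
                raise ToyValidationError(issues)
    if issues:
        # Report everything that was collected in one exception.
        raise ToyValidationError(issues)


closed = {"pos": {"NOUN", "VERB"}}
query = [("pos", "ADJ"), ("pos", "NOUN"), ("pos", "DET")]

for flag in (True, False):
    try:
        toy_validate(query, closed, fail_fast=flag)
    except ToyValidationError as exc:
        print(flag, len(exc.issues))
# True 1
# False 2
```

Collecting every issue is mainly useful when the result is shown to a person or fed back to a tool that can fix several problems in one pass.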