bcql_py: Top-level API
The public interface of bcql_py is exposed directly from the top-level bcql_py package.
CorpusSpec
Bases: BaseModel
Immutable description of a corpus' semantic vocabulary.
All fields default to the most permissive setting ("anything goes") so that a
bare CorpusSpec() is a no-op validator. Narrow the spec by listing the
annotations, tags, and relations your corpus actually supports.
Attributes:
| Name | Type | Description |
|---|---|---|
| open_attributes | frozenset[str] | Annotation names whose value space is unconstrained (e.g. … |
| closed_attributes | dict[str, frozenset[str]] | Annotation names whose values are restricted to a fixed set (e.g. … |
| strict_attributes | bool | When … |
| allowed_span_tags | frozenset[str] \| None | Allowed XML span tag names (e.g. … |
| allowed_span_attributes | dict[str, frozenset[str]] \| None | Per-tag allowed XML attribute values. Missing tags default to no constraint. Use … |
| allow_alignment | bool | If … |
| allowed_alignment_fields | frozenset[str] \| None | Allowed target field names for alignment queries, or … |
| allow_relations | bool | If … |
| allowed_relations | frozenset[str] \| None | Allowed relation type names, or … |
Example::
spec = CorpusSpec(open_attributes={"word"}, closed_attributes={"pos": {"NOUN", "VERB"}})
"pos" in spec.closed_attributes
# True
sorted(spec.closed_attributes["pos"])
# ['NOUN', 'VERB']
description
property
A human-readable description of this spec, useful for error messages, debugging, or as context for LLM agents. Subclasses can override it.
extend
extend(
    *,
    open_attributes: Iterable[str] | None = None,
    closed_attributes: Mapping[str, Iterable[str]] | None = None,
    allowed_span_tags: Iterable[str] | None = None,
    allowed_span_attributes: Mapping[str, Iterable[str]] | None = None,
    allowed_alignment_fields: Iterable[str] | None = None,
    allowed_relations: Iterable[str] | None = None,
    strict_attributes: bool | None = None,
    allow_alignment: bool | None = None,
    allow_relations: bool | None = None,
) -> CorpusSpec
Return a new spec with the given additions/overrides merged in. Similar to merge(), but with a more granular API that allows adding specific entries without having to construct a full spec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| open_attributes | Iterable[str] \| None | Extra open-class annotation names to union in. | None |
| closed_attributes | Mapping[str, Iterable[str]] \| None | Extra closed-class attributes; per-key values union. | None |
| allowed_span_tags | Iterable[str] \| None | Extra allowed span tag names. | None |
| allowed_span_attributes | Mapping[str, Iterable[str]] \| None | Extra per-tag attribute names. | None |
| allowed_alignment_fields | Iterable[str] \| None | Extra alignment target fields. | None |
| allowed_relations | Iterable[str] \| None | Extra relation type names. | None |
| strict_attributes | bool \| None | Override the strict-attributes flag. | None |
| allow_alignment | bool \| None | Override the alignment allowed flag. | None |
| allow_relations | bool \| None | Override the relations allowed flag. | None |
Returns:
| Type | Description |
|---|---|
| CorpusSpec | A new CorpusSpec; the receiver is not modified. |
Example::
base = CorpusSpec(open_attributes={"word"})
extended = base.extend(open_attributes={"lemma"})
sorted(extended.open_attributes)
# ['lemma', 'word']
View source on GitHub: src/bcql_py/validation/spec.py lines 145–221
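The "per-key values union" behaviour that extend() describes for closed_attributes can be pictured with plain dictionaries. The following is only a minimal sketch under that reading, not the library's implementation; union_closed is a hypothetical helper name introduced for illustration.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def union_closed(
    base: Mapping[str, frozenset[str]],
    extra: Mapping[str, Iterable[str]] | None,
) -> dict[str, frozenset[str]]:
    """Union `extra` into `base` key by key and return a new dict.

    Hypothetical helper: it models the documented rule, nothing more.
    """
    merged = dict(base)  # shallow copy, so the input mapping is not modified
    for name, values in (extra or {}).items():
        # Existing keys gain the new values; unseen keys start from an empty set.
        merged[name] = merged.get(name, frozenset()) | frozenset(values)
    return merged


base = {"pos": frozenset({"NOUN", "VERB"})}
extended = union_closed(base, {"pos": ["ADJ"], "number": ["sg", "pl"]})

print(sorted(extended["pos"]))     # ['ADJ', 'NOUN', 'VERB']
print(sorted(extended["number"]))  # ['pl', 'sg']
print(sorted(base["pos"]))         # ['NOUN', 'VERB'] (input left untouched)
```

Returning a fresh dict mirrors the documented guarantee that the receiver is not modified.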
merge
Return a new spec combining this spec with other. In case of conflict, other wins (except for boolean flags, see below).
Set-valued fields are unioned. For the nullable set-valued fields
(allowed_span_tags, allowed_alignment_fields, allowed_relations,
and the dict-shaped allowed_span_attributes), None means "no
constraint". A concrete set/dict is treated as more restrictive than
None, so when one side is None and the other lists entries, the
result is the listed entries: None survives only when both sides are
None. This mirrors the boolean rule below: a concrete restriction
always beats "no constraint".
WARNING: For boolean flags, other wins only when it is more restrictive
(False beats True) so that merging in a preset cannot silently
re-enable something the caller disabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | CorpusSpec | Another spec to merge into this one. | required |
Returns:
| Type | Description |
|---|---|
| CorpusSpec | A new CorpusSpec representing the union. |
Example::
spec1 = CorpusSpec(open_attributes={"word"}, allow_alignment=True)
spec2 = CorpusSpec(open_attributes={"lemma"}, closed_attributes={"pos": {"NOUN", "VERB"}}, allow_alignment=False)
merged = spec1.merge(spec2)
sorted(merged.open_attributes)
# ['lemma', 'word']
"pos" in merged.closed_attributes
# True
merged.allow_alignment
# False
View source on GitHub: src/bcql_py/validation/spec.py lines 223–304
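The two merge() rules for nullable sets and boolean flags are easy to misread, so here is a small sketch that models them in isolation. It is not bcql_py code: merge_nullable and merge_allow_flag are hypothetical names, and the flag helper only covers flags for which False is the restrictive value, as the warning above states.

```python
from __future__ import annotations


def merge_nullable(
    mine: frozenset[str] | None,
    other: frozenset[str] | None,
) -> frozenset[str] | None:
    """Combine two nullable sets where None means "no constraint"."""
    if mine is None:
        return other  # also covers the case where both sides are None
    if other is None:
        return mine  # a concrete set beats "no constraint"
    return mine | other  # both sides concrete: plain union


def merge_allow_flag(mine: bool, other: bool) -> bool:
    """False (the restrictive value) wins, whichever side holds it."""
    return mine and other


print(merge_nullable(None, None))                             # None
print(sorted(merge_nullable(None, frozenset({"s"}))))         # ['s']
print(sorted(merge_nullable(frozenset({"p"}), frozenset({"s"}))))  # ['p', 's']
print(merge_allow_flag(False, True))                          # False
print(merge_allow_flag(True, True))                           # True
```

Under this model, merging in a permissive preset can never re-enable a feature that either side has switched off.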
has_annotation
Return whether name is a known annotation on this spec.
An annotation is considered known when it is listed in either
open_attributes or closed_attributes. This method is
independent of strict_attributes: it only reports membership,
not whether an unknown annotation would raise during validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The annotation name to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if name is listed in open_attributes or closed_attributes of this spec, False otherwise. |
Example::
spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
spec.has_annotation("word")
# True
spec.has_annotation("pos")
# True
spec.has_annotation("lemma")
# False
View source on GitHub: src/bcql_py/validation/spec.py lines 306–334
BCQLSyntaxError
Bases: Exception
A syntax error with optional source and position, raised when tokenization or parsing of a BCQL query fails.
Attributes:
| Name | Type | Description |
|---|---|---|
| error_message | | Human-readable parse or lexing error message. |
| bcql_query | | Original BCQL source query. |
| error_position | | 0-based character position in … |
View source on GitHub: src/bcql_py/exceptions.py lines 21–31
__str__
Return a readable message including a caret position when available.
View source on GitHub: src/bcql_py/exceptions.py lines 33–41
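To illustrate what a message "including a caret position" can look like, the sketch below renders a query with a caret under a 0-based character position. The exact layout produced by BCQLSyntaxError.__str__ is not shown on this page, so the format, the function name format_with_caret, and the sample error text are all assumptions.

```python
from __future__ import annotations


def format_with_caret(
    message: str,
    query: str | None = None,
    position: int | None = None,
) -> str:
    """Render `message`, plus the query and a caret line when both are known.

    Hypothetical layout, chosen only to illustrate the idea.
    """
    if query is None or position is None:
        return message  # no source or position available: message only
    caret_line = " " * position + "^"  # position is 0-based
    return f"{message}\n{query}\n{caret_line}"


# Made-up error text; position 10 is the '&' character in this query.
rendered = format_with_caret("unexpected token", '[word="a" &]', 10)
print(rendered)
```

Falling back to the bare message keeps the output readable when the source or position is missing, which matches the "when available" wording above.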
BCQLValidationError
Bases: Exception
Raised when an AST does not satisfy a CorpusSpec.
Collects one or more ValidationIssue instances so that callers can surface
every problem at once (when fail_fast=False) or just the first (default).
Attributes:
| Name | Type | Description |
|---|---|---|
| issues | | List of ValidationIssue entries describing what went wrong. |
View source on GitHub: src/bcql_py/exceptions.py lines 93–99
__str__
Return one issue or a multi-line list of all validation issues as a string.
View source on GitHub: src/bcql_py/exceptions.py lines 101–107
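The difference between collecting every issue and stopping at the first one, as described for BCQLValidationError and fail_fast, follows a common pattern. The sketch below shows that pattern with stand-in types; Issue, ManyIssuesError, and check_values are hypothetical and do not reproduce how bcql_py walks an AST.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Stand-in for a single validation problem."""

    kind: str
    message: str


class ManyIssuesError(Exception):
    """Stand-in exception that carries every collected issue."""

    def __init__(self, issues: list[Issue]) -> None:
        self.issues = issues
        super().__init__("\n".join(issue.message for issue in issues))


def check_values(
    values: list[str],
    allowed: frozenset[str],
    *,
    fail_fast: bool = True,
) -> None:
    """Raise ManyIssuesError if any value falls outside `allowed`."""
    issues: list[Issue] = []
    for value in values:
        if value not in allowed:
            issues.append(Issue("invalid_value", f"{value!r} is not allowed"))
            if fail_fast:
                break  # stop at the first problem
    if issues:
        raise ManyIssuesError(issues)


def issue_count(fail_fast: bool) -> int:
    """Return how many issues were reported for a fixed sample input."""
    try:
        check_values(
            ["NOUN", "ADJ", "ADV"],
            frozenset({"NOUN", "VERB"}),
            fail_fast=fail_fast,
        )
    except ManyIssuesError as exc:
        return len(exc.issues)
    return 0


print(issue_count(True))   # 1
print(issue_count(False))  # 2
```

Raising once at the end, even in fail-fast mode, means callers handle a single exception type with the same issues attribute in both modes.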
ValidationIssue
dataclass
A single semantic validation problem found during validate(). A BCQLValidationError may collect several issues so that they are reported together rather than one at a time.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | IssueKind | A short machine-readable label identifying the issue category. |
| message | str | Human-readable description of the problem. |
| node_type | str | The … |
| context | dict[str, Any] | Extra context (e.g. the offending annotation name, value, or tag). |
__str__
Return this issue as a compact single-line message.
View source on GitHub: src/bcql_py/exceptions.py lines 75–80
tokenize
cached
Tokenize a BCQL query string into a tuple of Tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The BCQL query to tokenize. | required |
Returns:
| Type | Description |
|---|---|
| tuple[Token, ...] | The tuple of tokens. |
View source on GitHub: src/bcql_py/parser/__init__.py lines 14–25
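A cached function that returns a tuple rather than a list is a common pairing: the str argument is hashable, and an immutable result can be shared between callers safely. This page does not say which caching mechanism tokenize() uses, so the sketch below assumes functools.lru_cache and uses a toy whitespace splitter (toy_tokenize, a hypothetical name) in place of the real lexer.

```python
from __future__ import annotations

from functools import lru_cache


@lru_cache(maxsize=128)
def toy_tokenize(source: str) -> tuple[str, ...]:
    """Split on whitespace; a stand-in for a real lexer."""
    # Returning a tuple keeps the cached value immutable, so one caller
    # cannot corrupt the result that later callers receive.
    return tuple(source.split())


first = toy_tokenize('"the" "cat"')
second = toy_tokenize('"the" "cat"')  # served from the cache

print(first)                           # ('"the"', '"cat"')
print(first is second)                 # True
print(toy_tokenize.cache_info().hits)  # 1
```

If the real function is cached in a similar way, repeated calls with the same query string would skip lexing entirely.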
parse
Tokenize then parse a BCQL query string and return the root AST node.
When spec is given, the parsed AST is additionally run through validate() so that any corpus-specific semantic problems are surfaced immediately rather than at query-execution time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The BCQL query to parse. | required |
| spec | CorpusSpec \| None | Optional CorpusSpec describing the target corpus. When provided, semantic validation runs after a successful parse. | None |
| fail_fast | bool | Forwarded to validate(); only has an effect when spec is provided. | True |
Returns:
| Type | Description |
|---|---|
| BCQLNode | The root BCQLNode of the parsed AST. |
Raises:
| Type | Description |
|---|---|
| BCQLSyntaxError | If the query cannot be parsed. |
| BCQLValidationError | If spec is provided and the AST violates it. |
View source on GitHub: src/bcql_py/parser/__init__.py lines 35–63
parse_from_tokens
parse_from_tokens(
    tokens: Sequence[Token],
    source: str,
    *,
    spec: CorpusSpec | None = None,
    fail_fast: bool = True,
) -> BCQLNode
Parse a BCQL token list into an abstract syntax tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokens | Sequence[Token] | The list of tokens to parse (from tokenize()). | required |
| source | str | The original source string. | required |
| spec | CorpusSpec \| None | Optional CorpusSpec; see parse(). | None |
| fail_fast | bool | Forwarded to validate() when spec is provided. | True |
Returns:
| Type | Description |
|---|---|
| BCQLNode | The root BCQLNode. |
View source on GitHub: src/bcql_py/parser/__init__.py lines 66–88
validate
Validate a parsed BCQL AST against spec, raising on any issue.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ast | BCQLNode | The parsed BCQL AST to validate. | required |
| spec | CorpusSpec | The CorpusSpec describing what the corpus allows. | required |
| fail_fast | bool | When … | True |
Raises:
| Type | Description |
|---|---|
| BCQLValidationError | If one or more validation issues are found. The raised exception's … |
Example::
from bcql_py import BCQLValidationError, CorpusSpec, parse, validate
spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
validate(parse('[pos="NOUN"]'), spec)  # passes silently
try:
    validate(parse('[pos="ADJ"]'), spec)
except BCQLValidationError as exc:
    print(exc.issues[0].kind)
# invalid_annotation_value
View source on GitHub: src/bcql_py/validation/validator.py lines 435–467