bcql_py: Top-level API
The public interface of bcql_py can be imported directly from the top-level bcql_py package.
CorpusSpec
Bases: BaseModel
Immutable description of a corpus' semantic vocabulary.
All fields default to the most permissive setting ("anything goes") so that a
bare CorpusSpec() is a no-op validator. Narrow the spec by listing the
annotations, tags, and relations your corpus actually supports.
Attributes:
| Name | Type | Description |
|---|---|---|
| open_attributes | frozenset[str] | Annotation names whose value space is unconstrained (e.g. …). |
| closed_attributes | dict[str, frozenset[str]] | Annotation names whose values are restricted to a fixed set (e.g. …). |
| strict_attributes | bool | When … |
| allowed_span_tags | frozenset[str] \| None | Allowed XML span tag names (e.g. …). |
| allowed_span_attributes | dict[str, frozenset[str]] \| None | Per-tag allowed XML attribute values. Missing tags default to no constraint. Use … |
| allow_alignment | bool | If … |
| allowed_alignment_fields | frozenset[str] \| None | Allowed target field names for alignment queries, or … |
| allow_relations | bool | If … |
| allowed_relations | frozenset[str] \| None | Allowed relation type names, or … |
Example::
spec = CorpusSpec(open_attributes={"word"}, closed_attributes={"pos": {"NOUN", "VERB"}})
"pos" in spec.closed_attributes
# True
sorted(spec.closed_attributes["pos"])
# ['NOUN', 'VERB']
description
property
A human-readable description of this spec. Can be overridden in subclasses. Potentially useful for error messages, debugging, or as information to LLM agents.
extend
extend(
    *,
    open_attributes: Iterable[str] | None = None,
    closed_attributes: Mapping[str, Iterable[str]] | None = None,
    allowed_span_tags: Iterable[str] | None = None,
    allowed_span_attributes: Mapping[str, Iterable[str]] | None = None,
    allowed_alignment_fields: Iterable[str] | None = None,
    allowed_relations: Iterable[str] | None = None,
    strict_attributes: bool | None = None,
    allow_alignment: bool | None = None,
    allow_relations: bool | None = None,
) -> CorpusSpec
Return a new spec with the given additions/overrides merged in.
Similar to merge, but with a more granular API that lets you add
specific entries without constructing a full spec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| open_attributes | Iterable[str] \| None | Extra open-class annotation names to union in. | None |
| closed_attributes | Mapping[str, Iterable[str]] \| None | Extra closed-class attributes; per-key values union. | None |
| allowed_span_tags | Iterable[str] \| None | Extra allowed span tag names. | None |
| allowed_span_attributes | Mapping[str, Iterable[str]] \| None | Extra per-tag attribute names. | None |
| allowed_alignment_fields | Iterable[str] \| None | Extra alignment target fields. | None |
| allowed_relations | Iterable[str] \| None | Extra relation type names. | None |
| strict_attributes | bool \| None | Override the strict-attributes flag. | None |
| allow_alignment | bool \| None | Override the alignment allowed flag. | None |
| allow_relations | bool \| None | Override the relations allowed flag. | None |
Returns:
| Type | Description |
|---|---|
| CorpusSpec | A new CorpusSpec. |
Example::
base = CorpusSpec(open_attributes={"word"})
extended = base.extend(open_attributes={"lemma"})
sorted(extended.open_attributes)
# ['lemma', 'word']
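The per-key union described for closed_attributes can be illustrated with plain Python sets. This is a minimal sketch of the documented behavior, not the library's implementation. The helper name union_closed is hypothetical and is not part of bcql_py.

```python
from collections.abc import Iterable, Mapping


def union_closed(
    base: Mapping[str, frozenset[str]],
    extra: Mapping[str, Iterable[str]],
) -> dict[str, frozenset[str]]:
    """Hypothetical helper: union `extra` into `base` key by key."""
    merged = dict(base)
    for name, values in extra.items():
        # Existing keys gain the new values; unseen keys are added as-is.
        merged[name] = merged.get(name, frozenset()) | frozenset(values)
    return merged


base = {"pos": frozenset({"NOUN", "VERB"})}
result = union_closed(base, {"pos": ["ADJ"], "number": ["sg", "pl"]})
print(sorted(result["pos"]))     # ['ADJ', 'NOUN', 'VERB']
print(sorted(result["number"]))  # ['pl', 'sg']
```

The input mapping is left untouched, which matches the documented contract that extend returns a new spec.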
View source on GitHub: src/bcql_py/validation/spec.py lines 144–220
merge
Return a new spec combining this spec with other. In case of conflict, other wins (except for boolean flags, see below).
Set-valued fields are unioned. For the nullable set-valued fields
(allowed_span_tags, allowed_alignment_fields, allowed_relations,
and the dict-shaped allowed_span_attributes), None means "no
constraint". A concrete set/dict is treated as more restrictive than
None, so when one side is None and the other lists entries, the
result is the listed entries: None survives only when both sides are
None. This mirrors the boolean rule below: a concrete restriction
always beats "no constraint".
WARNING: For boolean flags, other wins only when it is more restrictive
(False beats True) so that merging in a preset cannot silently
re-enable something the caller disabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | CorpusSpec | Another spec to merge into this one. | required |
Returns:
| Type | Description |
|---|---|
| CorpusSpec | A new CorpusSpec. |
Example::
spec1 = CorpusSpec(open_attributes={"word"}, allow_alignment=True)
spec2 = CorpusSpec(open_attributes={"lemma"}, closed_attributes={"pos": {"NOUN", "VERB"}}, allow_alignment=False)
merged = spec1.merge(spec2)
sorted(merged.open_attributes)
# ['lemma', 'word']
"pos" in merged.closed_attributes
# True
merged.allow_alignment
# False
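The two merge rules described above (nullable sets and boolean flags) can be modelled with a few lines of plain Python. This is a sketch of the documented semantics only. The helpers merge_nullable and merge_flag are hypothetical names, and the dict-shaped allowed_span_attributes field is not modelled here.

```python
from __future__ import annotations


def merge_nullable(
    mine: frozenset[str] | None,
    other: frozenset[str] | None,
) -> frozenset[str] | None:
    """Hypothetical helper modelling the nullable set rule."""
    if mine is None:
        return other  # also covers the case where both sides are None
    if other is None:
        return mine  # a concrete set beats "no constraint"
    return mine | other  # two concrete sets are unioned


def merge_flag(mine: bool, other: bool) -> bool:
    """Hypothetical helper: False on either side wins."""
    return mine and other


print(merge_nullable(None, None))                 # None
print(merge_nullable(None, frozenset({"s"})))     # frozenset({'s'})
print(merge_flag(True, False))                    # False
print(merge_flag(False, True))                    # False
```

The last line shows the case the warning is about: merging a permissive spec into a restrictive one does not re-enable the flag.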
View source on GitHub: src/bcql_py/validation/spec.py lines 222–303
has_annotation
Return whether name is a known annotation on this spec.
An annotation is considered known when it is listed in either
open_attributes or closed_attributes. This method is
independent of strict_attributes: it only reports membership,
not whether an unknown annotation would raise during validation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | The annotation name to check. | required |
Returns:
| Type | Description |
|---|---|
| bool | True if name is a known annotation on this spec, False otherwise. |
Example::
spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
spec.has_annotation("word")
# True
spec.has_annotation("pos")
# True
spec.has_annotation("lemma")
# False
View source on GitHub: src/bcql_py/validation/spec.py lines 305–333
BCQLSyntaxError
BCQLValidationError
Bases: Exception
Raised when an AST does not satisfy a CorpusSpec.
Collects one or more ValidationIssue instances so that callers can surface
every problem at once (when fail_fast=False) or just the first (the default).
Attributes:
| Name | Type | Description |
|---|---|---|
| issues | | One or more ValidationIssue instances. |
View source on GitHub: src/bcql_py/exceptions.py lines 81–87
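The difference between stopping at the first problem and collecting all of them can be sketched without the library. Everything below is a hypothetical stand-in written for illustration: Issue and SketchValidationError only mimic the shape of ValidationIssue and BCQLValidationError, and check_values is not a bcql_py function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for ValidationIssue."""

    kind: str
    message: str


class SketchValidationError(Exception):
    """Hypothetical stand-in for BCQLValidationError."""

    def __init__(self, issues: list[Issue]) -> None:
        super().__init__("; ".join(issue.message for issue in issues))
        self.issues = issues


def check_values(values: list[str], allowed: set[str], fail_fast: bool = True) -> None:
    """Raise on disallowed values, at the first one or after scanning all."""
    issues: list[Issue] = []
    for value in values:
        if value not in allowed:
            issues.append(Issue("invalid_annotation_value", f"{value!r} is not allowed"))
            if fail_fast:
                break  # stop scanning as soon as one problem is found
    if issues:
        raise SketchValidationError(issues)


def count_issues(fail_fast: bool) -> int:
    """Return how many issues the raised error carries (0 if none raised)."""
    try:
        check_values(["ADJ", "NOUN", "ADV"], {"NOUN", "VERB"}, fail_fast=fail_fast)
    except SketchValidationError as exc:
        return len(exc.issues)
    return 0


print(count_issues(fail_fast=True))   # 1
print(count_issues(fail_fast=False))  # 2
```

Collecting every issue is typically more useful when the query comes from a user or an automated agent that can fix several problems in one pass.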
ValidationIssue
dataclass
A single semantic validation problem found during bcql_py.validate.
In practice, several issues may be collected in one BCQLValidationError
so that they are all reported at once instead of just the first.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | IssueKind | A short machine-readable label identifying the issue category. |
| message | str | Human-readable description of the problem. |
| node_type | str | The … |
| context | dict[str, Any] | Extra context (e.g. the offending annotation name, value, or tag). |
tokenize
cached
Tokenize a BCQL query string into a tuple of Tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The BCQL query to tokenize. | required |
Returns:
| Type | Description |
|---|---|
| tuple[Token, ...] | The tuple of tokens. |
View source on GitHub: src/bcql_py/parser/__init__.py lines 12–23
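The cached label and the tuple return type fit together: a memoized function hands the same object to every caller, so an immutable result is the safe choice. The sketch below illustrates that pattern with functools.lru_cache. Whether bcql_py uses this exact decorator is an assumption, and toy_tokenize is a hypothetical whitespace splitter, not the real lexer.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def toy_tokenize(source: str) -> tuple[str, ...]:
    """Hypothetical whitespace tokenizer; stands in for a real lexer."""
    return tuple(source.split())


first = toy_tokenize('"the" "cat"')
second = toy_tokenize('"the" "cat"')
print(first is second)                 # True: the second call is a cache hit
print(toy_tokenize.cache_info().hits)  # 1
```

Because both calls return the same tuple, no caller can mutate the cached result for the others.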
parse
Tokenize then parse a BCQL query string and return the root AST node.
When spec is given, the parsed AST is additionally run through
bcql_py.validation.validate so that any corpus-specific semantic
problems are surfaced immediately rather than at query-execution time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| source | str | The BCQL query to parse. | required |
| spec | CorpusSpec \| None | Optional CorpusSpec to validate the parsed AST against. | None |
| fail_fast | bool | Forwarded to validate. | True |
Returns:
| Type | Description |
|---|---|
| BCQLNode | The root BCQLNode. |
Raises:
| Type | Description |
|---|---|
| BCQLSyntaxError | If the query cannot be parsed. |
| BCQLValidationError | If spec is provided and the AST violates it. |
View source on GitHub: src/bcql_py/parser/__init__.py lines 32–60
parse_from_tokens
parse_from_tokens(
    tokens: Sequence[Token],
    source: str,
    *,
    spec: CorpusSpec | None = None,
    fail_fast: bool = True,
) -> BCQLNode
Parse a BCQL token list into an abstract syntax tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tokens | Sequence[Token] | The list of tokens to parse (from tokenize). | required |
| source | str | The original source string. | required |
| spec | CorpusSpec \| None | Optional CorpusSpec to validate the parsed AST against. | None |
| fail_fast | bool | Forwarded to validate. | True |
Returns:
| Type | Description |
|---|---|
| BCQLNode | The root BCQLNode. |
View source on GitHub: src/bcql_py/parser/__init__.py lines 63–86
validate
Validate a parsed BCQL AST against spec, raising on any issue.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
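| ast | BCQLNode | The root BCQLNode of the parsed query. | required |
| spec | CorpusSpec | The CorpusSpec to validate against. | required |
| fail_fast | bool | When True, raise on the first issue found; when False, collect every issue before raising. | True |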
Raises:
| Type | Description |
|---|---|
| BCQLValidationError | If one or more validation issues are found. The raised exception's issues attribute lists them. |
Example::
from bcql_py import BCQLValidationError, CorpusSpec, parse, validate
spec = CorpusSpec(
    open_attributes={"word"},
    closed_attributes={"pos": {"NOUN", "VERB"}},
)
validate(parse('[pos="NOUN"]'), spec)  # passes silently
try:
    validate(parse('[pos="ADJ"]'), spec)
except BCQLValidationError as exc:
    # Catch the specific error type so that .issues is known to exist.
    print(exc.issues[0].kind)
# invalid_annotation_value
View source on GitHub: src/bcql_py/validation/validator.py lines 420–452