Models overview
All AST node types inherit from BCQLNode.
Each node carries a node_type literal discriminator, making the full tree
serializable to/from JSON.
BCQLNodeUnion
module-attribute
BCQLNodeUnion = Union[
    StringValue,
    AnnotationConstraint,
    IntegerRangeConstraint,
    FunctionConstraint,
    NotConstraint,
    BoolConstraint,
    TokenQuery,
    SequenceNode,
    RepetitionNode,
    GroupNode,
    SequenceBoolNode,
    NegationNode,
    UnderscoreNode,
    LookaheadNode,
    LookbehindNode,
    SpanQuery,
    PositionFilterNode,
    CaptureNode,
    AnnotationRef,
    ConstraintLiteral,
    ConstraintInteger,
    ConstraintComparison,
    ConstraintBoolean,
    ConstraintNot,
    ConstraintFunctionCall,
    GlobalConstraintNode,
    RelationOperator,
    ChildConstraint,
    RelationNode,
    RootRelationNode,
    AlignmentOperator,
    AlignmentConstraint,
    AlignmentNode,
    FunctionCallNode,
]
Annotated union of every concrete BCQL AST node, discriminated by node_type.
Use this anywhere a field can hold any BCQL node (sub-queries, sequence
children, relation targets, etc.). For fields restricted to a smaller subset
of node types, prefer the narrower unions defined alongside their owners
(e.g. ConstraintExpr in bcql_py.models.token for token-level constraints,
CaptureConstraintExpr in bcql_py.models.capture for capture constraints).
Narrower unions give better validation errors and make the schema honest about
which nodes are actually legal in that position.
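The role of the node_type discriminator can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the two classes below are simplified stand-ins, the node_type strings ("string_value", "underscore") and the NODE_TYPES registry are hypothetical, and the real models presumably delegate this dispatch to their BaseModel machinery.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StringValue:
    """Simplified stand-in; the node_type value is a hypothetical choice."""
    value: str
    node_type: str = "string_value"


@dataclass(frozen=True)
class UnderscoreNode:
    """Simplified stand-in; the node_type value is a hypothetical choice."""
    node_type: str = "underscore"


# Hypothetical registry: maps each discriminator string to its class.
NODE_TYPES = {"string_value": StringValue, "underscore": UnderscoreNode}


def node_from_dict(data: dict):
    """Pick the concrete class from the node_type field and rebuild the node."""
    cls = NODE_TYPES[data["node_type"]]
    return cls(**data)


# Round trip: node -> JSON text -> node.
payload = json.dumps(asdict(StringValue(value="man")))
restored = node_from_dict(json.loads(payload))
```

Because every serialized node carries its own node_type, the loader never has to guess which class a dictionary belongs to, which is what makes a union over many node classes safe to deserialize.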
AlignmentConstraint
Bases: BCQLNode
One alignment constraint: operator target.
Multiple alignment constraints are separated by ;.
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | AlignmentOperator | The AlignmentOperator. |
| target | BCQLNodeUnion | The target sub-query. |
AlignmentNode
Bases: BCQLNode
A parallel alignment query: source ==>field target [; ==>field target]*.
Attributes:
source: The source query in the primary field.
alignments: One or more alignment constraints.
AlignmentOperator
Bases: BCQLNode
The operator in an alignment query: =type=>field or ==>field?.
See https://github.com/instituutnederlandsetaal/BlackLab/blob/dev/site/docs/guide/040_query-language/030_parallel.md
Attributes:
| Name | Type | Description |
|---|---|---|
| target_field | str | The target field name (e.g. |
| optional | bool | |
| relation_type | str \| None | Optional type filter (e.g. |
| capture_name | str \| None | Override for the capture group name (default |
BCQLNode
Bases: BaseModel, ABC
Abstract base for every node in the BCQL abstract syntax tree.
Sub-classes must override to_bcql and set node_type
to a unique Literal string so that discrimination works correctly.
Configuration
- frozen = True: instances are immutable after creation
- use_enum_values = True: enum fields store their .value
bcql
cached property
Convenience property to get the BCQL string representation of this node.
to_bcql
abstractmethod
Reconstruct a BCQL query string from this AST node.
The returned string is functionally equivalent to the original query but may differ in trivial whitespace and formatting.
View source on GitHub: src/bcql_py/models/base.py lines 32–38
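The contract described above (an abstract to_bcql, a cached bcql convenience property, and immutable instances) can be sketched with the standard library alone. This is an illustration under assumptions, not the code in base.py: Node and Word are hypothetical names, and a frozen dataclass stands in for the frozen = True model configuration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from functools import cached_property


class Node(ABC):
    """Hypothetical stand-in for the abstract base class."""

    @abstractmethod
    def to_bcql(self) -> str:
        """Rebuild the query string for this node."""

    @cached_property
    def bcql(self) -> str:
        # Computed on first access, then stored on the instance.
        return self.to_bcql()


@dataclass(frozen=True)
class Word(Node):
    """Hypothetical concrete node: a bare quoted string."""
    text: str

    def to_bcql(self) -> str:
        return f'"{self.text}"'


word = Word("man")
rendered = word.bcql  # same value as word.to_bcql()
```

Caching is safe here precisely because instances are immutable: the rendered string cannot go stale after the node has been created.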
AnnotationRef
Bases: BCQLNode
Reference to a captured token's annotation: label.annotation, or a bare capture label.
Examples:
- A.word refers to the word annotation of capture A.
- A as a bare label (typically used as a function argument, e.g. start(A)).
Attributes:
| Name | Type | Description |
|---|---|---|
| label | str | Capture group name. |
| annotation | str | Annotation name, or empty string for a bare label reference. |
CaptureNode
Bases: BCQLNode
A capture label applied to a sub-query: label:body, e.g. A:[word="hello"].
Everything matched by body is captured under label in the match info.
Attributes:
| Name | Type | Description |
|---|---|---|
| label | str | The capture group name (e.g. |
| body | BCQLNodeUnion | The sub-query whose match is captured. |
ConstraintBoolean
Bases: BCQLNode
Boolean combination of capture constraints: left op right.
Operators: & (AND), | (OR), -> (implication). All three share the same precedence
per Bcql.g4's booleanOperator rule. The -> implication operator is most commonly
seen in capture constraints (e.g. A.word = "cat" -> B.word = "dog") but the grammar
allows it at every level.
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | Literal['&', '\|', '->'] | |
| left | CaptureConstraintExpr | Left operand. |
| right | CaptureConstraintExpr | Right operand. |
ConstraintComparison
Bases: BCQLNode
A comparison in a capture constraint: left op right.
Supported operators: =, !=, <, <=, >, >=.
The operator is stored as a plain string literal; a dedicated operator class should not be needed here.
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | Literal['=', '!=', '<', '<=', '>', '>='] | The comparison operator. |
| left | CaptureConstraintExpr | Left-hand operand (usually an AnnotationRef). |
| right | CaptureConstraintExpr | Right-hand operand (annotation ref, literal, or function call). |
ConstraintFunctionCall
Bases: BCQLNode
A function call in a capture constraint.
Examples: start(A) or end(B) used in expressions like
start(B) < start(A).
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Function name (e.g. |
| args | list[CaptureConstraintExpr] | Function arguments (annotation refs, literals, etc.). |
ConstraintInteger
Bases: BCQLNode
An integer literal in a capture constraint.
Example: the 5 in focus.pos > 5.
Attributes:
| Name | Type | Description |
|---|---|---|
| value | int | The integer value. |
ConstraintLiteral
Bases: BCQLNode
A literal string value in a capture constraint.
Example: the "over" in A.word = "over".
Attributes:
| Name | Type | Description |
|---|---|---|
| value | str | The literal string (without quotes). |
| quote_char | Literal['"', "'"] | The quote character used in the original query, either |
ConstraintNot
Bases: BCQLNode
Logical NOT in a capture constraint.
Attributes:
| Name | Type | Description |
|---|---|---|
| operand | CaptureConstraintExpr | The constraint being negated. |
GlobalConstraintNode
Bases: BCQLNode
A query with a global capture constraint.
The constraint expression follows the :: operator and relates captures defined in body.
Example: A:[] "by" B:[] :: A.word = B.word, where A:[] "by" B:[] is the body and A.word = B.word is the constraint expression.
Attributes:
| Name | Type | Description |
|---|---|---|
| body | BCQLNodeUnion | The main query containing captures. |
| constraint | CaptureConstraintExpr | The constraint expression relating captures. |
FunctionCallNode
Bases: BCQLNode
A built-in function call at the sequence level.
Function arguments can be sub-queries (BCQLNode), string values (StringValue), or integers.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Function name. |
| args | list[BCQLNodeUnion \| int] | Positional arguments. |
LookaheadNode
Bases: BCQLNode
A lookahead assertion: (?=...) (positive) or (?!...) (negative).
Matches a position only if the enclosed query matches (or doesn't match) the tokens that follow.
Attributes:
| Name | Type | Description |
|---|---|---|
| positive | bool | |
| body | BCQLNodeUnion | The sub-query that must (or must not) match ahead. |
LookbehindNode
Bases: BCQLNode
A lookbehind assertion: (?<=...) (positive) or (?<!...) (negative).
Matches a position only if the enclosed query matches (or doesn't match) the tokens that precede.
Attributes:
| Name | Type | Description |
|---|---|---|
| positive | bool | |
| body | BCQLNodeUnion | The sub-query that must (or must not) match behind. |
ChildConstraint
Bases: BCQLNode
A single child constraint in a relation query.
Represents [-label:] -type-> target inside a relation expression.
Multiple child constraints are separated by ;. The target itself can be any BCQL sub-query,
including another relation query (e.g. _ -nsubj-> (_ -amod-> _)).
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | RelationOperator | The relation operator (type, negation, target field). |
| target | BCQLNodeUnion | The target sub-query. |
| label | str \| None | Optional capture label on this child relation (e.g. |
RelationNode
Bases: BCQLNode
A dependency relation query: source -type-> target [; -type-> target]*.
The source is specified once; one or more child constraints follow, separated by ;.
Attributes:
| Name | Type | Description |
|---|---|---|
| source | BCQLNodeUnion | The source of the relation. |
| children | list[ChildConstraint] | One or more target constraints. |
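The surface form source -type-> target [; -type-> target]* can be sketched as a small rendering helper. This is an assumption-laden illustration rather than the library's to_bcql: Child and render_relation are hypothetical names, operands are plain strings instead of nodes, and the exact whitespace around ; is a guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Child:
    """Hypothetical, flattened stand-in for a child constraint."""
    relation_type: str
    target: str
    negated: bool = False

    def to_bcql(self) -> str:
        # A negative child constraint is written with a leading "!".
        bang = "!" if self.negated else ""
        return f"{bang}-{self.relation_type}-> {self.target}"


def render_relation(source: str, children: list) -> str:
    """Write the source once, then each child constraint separated by ';'."""
    if not children:
        raise ValueError("a relation needs at least one child constraint")
    return f"{source} " + " ; ".join(child.to_bcql() for child in children)


query = render_relation(
    "_", [Child("nsubj", "_"), Child("amod", "_", negated=True)]
)
```

Keeping the source outside the child objects mirrors the model layout described above, where the source is stored once on the relation and each child only knows its operator and target.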
RelationOperator
Bases: BCQLNode
The operator in a relation query: -type-> or !-type->.
See https://github.com/instituutnederlandsetaal/BlackLab/blob/dev/site/docs/guide/040_query-language/020_relations.md#negative-child-constraints
for details on negative relations.
Attributes:
| Name | Type | Description |
|---|---|---|
| relation_type | str \| None | The relation type as a string or regex pattern (e.g. |
| negated | bool | |
| target_field | str \| None | For cross-field relations (e.g. |
RootRelationNode
Bases: BCQLNode
A root relation query: ^-type-> target or label:^-type-> target.
Root relations have no source, only a target. They match the root of a dependency tree.
Usually this relation does not have a "type" (since ROOT is the dependency relation from the root), but some corpora may differ.
TODO: see if the Validator and CorpusSpec should account for "allowed root relations"
Attributes:
| Name | Type | Description |
|---|---|---|
| relation_type | str \| None | Optional relation type filter (usually |
| target | BCQLNodeUnion | The target sub-query. |
| label | str \| None | Optional capture label. |
GroupNode
Bases: BCQLNode
A parenthesized group of sub-queries.
Groups allow applying repetition operators or capture constraints to a complex sub-expression. A group holds exactly one child node: typically a SequenceNode when the group contains multiple adjacent tokens, or otherwise a single token-level node.
Attributes:
| Name | Type | Description |
|---|---|---|
| child | BCQLNodeUnion | The inner sub-query. |
NegationNode
Bases: BCQLNode
Sequence-level negation (!).
Negation sits at the span level in the precedence chain (above repetition), so
!"man"+ parses as !("man"+) per Bcql.g4's sequencePartNoCapture rule.
The child is always a single span-level node (never a bare sequence), so
to_bcql just prepends ! without extra parentheses.
Attributes:
| Name | Type | Description |
|---|---|---|
| child | BCQLNodeUnion | The sub-query being negated. |
RepetitionNode
Bases: BCQLNode
A repetition quantifier applied to a sub-query.
Supports + (1+), * (0+), ? (0 or 1), {n}, {n,m},
{n,}. Note that "up to" quantifiers like {0,m} are exported as
{,m} and may therefore differ in surface form from the original.
Attributes:
| Name | Type | Description |
|---|---|---|
| child | BCQLNodeUnion | The sub-query being repeated. |
| min_count | int | Minimum number of repetitions (inclusive, min. 0). |
| max_count | int \| None | Maximum number of repetitions (inclusive), or |
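How a (min_count, max_count) pair might map back to a quantifier string can be sketched as follows. Only the {0,m} to {,m} normalization is stated above; the function name render_quantifier, the use of None for "no upper bound", and the choice to emit the +, * and ? shorthands are assumptions made for illustration.

```python
from typing import Optional


def render_quantifier(min_count: int, max_count: Optional[int]) -> str:
    """Hypothetical helper: turn repetition bounds into a quantifier suffix."""
    if min_count < 0:
        raise ValueError("min_count must be >= 0")
    if max_count is not None and max_count < min_count:
        raise ValueError("max_count must be >= min_count")

    # Assumed shorthands for the three classic quantifiers.
    if max_count is None:
        if min_count == 0:
            return "*"
        if min_count == 1:
            return "+"
        return f"{{{min_count},}}"
    if (min_count, max_count) == (0, 1):
        return "?"
    if min_count == max_count:
        return f"{{{min_count}}}"
    if min_count == 0:
        # "Up to" form: {0,m} is written as {,m}.
        return f"{{,{max_count}}}"
    return f"{{{min_count},{max_count}}}"
```

Under these assumptions a query written with {0,3} would round-trip as {,3}: the same bounds, but a different surface form, which is the behaviour the note above warns about.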
SequenceBoolNode
Bases: BCQLNode
Sequence-level boolean combination (&, |, ->).
Binary, left-associative node mirroring the booleanOperator rule in Bcql.g4:
all three operators share the same precedence. For example, "a" | "b" & "c"
parses as ("a" | "b") & "c".
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | Literal['&', '\|', '->'] | The boolean operator. |
| left | BCQLNodeUnion | The left operand. |
| right | BCQLNodeUnion | The right operand. |
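The left-associative, single-precedence grouping can be made concrete with a small fold over operands and operators. This sketch is not the parser: BoolNode, fold_left, and render are hypothetical names, and leaves are plain strings rather than AST nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoolNode:
    """Hypothetical binary node: operator plus left and right operands."""
    operator: str
    left: object
    right: object


def fold_left(operands: list, operators: list) -> object:
    """Combine operands left to right, ignoring any operator precedence."""
    node = operands[0]
    for op, rhs in zip(operators, operands[1:]):
        # The tree built so far always becomes the LEFT child.
        node = BoolNode(op, node, rhs)
    return node


def render(node: object) -> str:
    """Fully parenthesized rendering, to make the grouping visible."""
    if isinstance(node, BoolNode):
        return f"({render(node.left)} {node.operator} {render(node.right)})"
    return str(node)


tree = fold_left(['"a"', '"b"', '"c"'], ["|", "&"])
grouped = render(tree)
```

Here grouped shows | binding before &, simply because it appears first; a grammar with conventional precedence would instead group "b" & "c" together.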
SequenceNode
Bases: BCQLNode
An ordered sequence of adjacent tokens or sub-queries. This is a very high-level node type that can represent an entire query or a sub-sequence.
Attributes:
| Name | Type | Description |
|---|---|---|
| children | list[BCQLNodeUnion] | The ordered list of child nodes in the sequence. |
UnderscoreNode
Bases: BCQLNode
The _ wildcard used in relation queries.
Distinct from [] (match-all token): _ means "any source or
target" in a relation expression without constraining token count.
PositionFilterNode
Bases: BCQLNode
A position-filter operator: within, containing, or overlap.
Example: "baker" within <person/> means find "baker" inside a <person/> span.
These operators are right-associative, so A within B within C is parsed as A within (B within C).
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | Literal['within', 'containing', 'overlap'] | One of |
| left | BCQLNodeUnion | The query whose hits are filtered. |
| right | BCQLNodeUnion | The span/query that defines the positional constraint. |
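Right-associativity means a chain of position filters nests towards the right. The sketch below builds that shape with a fold from the last operand backwards; PositionFilter, fold_right, and render are hypothetical names and the leaves are plain strings, so this illustrates the grouping only, not the actual parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionFilter:
    """Hypothetical binary node for within / containing / overlap."""
    operator: str
    left: object
    right: object


def fold_right(operands: list, operators: list) -> object:
    """Combine operands right to left, so the tail nests on the right."""
    node = operands[-1]
    for op, lhs in zip(reversed(operators), reversed(operands[:-1])):
        # The tree built so far always becomes the RIGHT child.
        node = PositionFilter(op, lhs, node)
    return node


def render(node: object) -> str:
    """Render with parentheses only around a nested right operand."""
    if isinstance(node, PositionFilter):
        right = render(node.right)
        if isinstance(node.right, PositionFilter):
            right = f"({right})"
        return f"{render(node.left)} {node.operator} {right}"
    return str(node)


tree = fold_right(["A", "B", "C"], ["within", "within"])
grouped = render(tree)
```

The outermost node keeps A as its left operand, so A is the query whose hits are filtered, while the nested filter on the right defines the positional constraint.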
SpanQuery
Bases: BCQLNode
A span (XML tag) query.
Three forms exist per Bcql.g4's tag rule:
- Whole span: <s/> or <ne type="PERS"/>
- Start tag: <s>
- End tag: </s>
The tag name can be a plain identifier (s, ne) or a quoted string
for regex patterns (<"person|location"/>).
Attributes:
| Name | Type | Description |
|---|---|---|
| tag_name | str \| StringValue | The tag name as a plain string or |
| position | Literal['whole', 'start', 'end'] | |
| attributes | dict[str, StringValue] | XML attributes as |
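The three tag forms can be sketched as a rendering helper keyed on position. This is a simplified illustration, not the library's to_bcql: render_span is a hypothetical name, attribute values are plain strings rather than StringValue nodes, and allowing attributes on the start tag is an assumption.

```python
from typing import Dict, Optional


def render_span(
    tag_name: str,
    position: str = "whole",
    attributes: Optional[Dict[str, str]] = None,
) -> str:
    """Hypothetical helper: render a span query in one of its three forms."""
    # Attributes are rendered as name="value" pairs, in insertion order.
    attrs = "".join(f' {name}="{value}"' for name, value in (attributes or {}).items())
    if position == "whole":
        return f"<{tag_name}{attrs}/>"
    if position == "start":
        # Assumption: attributes may also appear on a start tag.
        return f"<{tag_name}{attrs}>"
    if position == "end":
        return f"</{tag_name}>"
    raise ValueError(f"unknown position: {position!r}")


whole = render_span("ne", attributes={"type": "PERS"})
```

A regex tag name would be passed already quoted in this sketch, for example render_span('"person|location"'), since the helper does not distinguish identifiers from quoted strings.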
AnnotationConstraint
Bases: BCQLNode
A single annotation comparison: annotation op "value".
Typically consists of an identifier, an operator, and a string value.
Note that the identifier is not semantically specified here! It fully depends
on the corpus which annotations (like word, lemma, pos) are available. So here
annotation is underspecified as just a string.
Example: word="man" or pos != "noun".
Attributes:
| Name | Type | Description |
|---|---|---|
| annotation | str | The annotation name (e.g. |
| operator | Literal['=', '!=', '<', '<=', '>', '>='] | |
| value | StringValue | The value being compared against. |
BoolConstraint
Bases: BCQLNode
Boolean combination of token-level constraints: left op right.
The operator is & (AND), | (OR), or -> (implication). Per the BCQL spec / Bcql.g4,
all three share identical precedence and are left-associative. See the booleanOperator rule
in Bcql.g4. Calling the node "boolean" may be somewhat confusing for the implication case.
Not to be confused with sequence-level boolean operators (also &, |, and ->), which
combine whole sub-queries instead of token constraints. See sequence.SequenceBoolNode for those.
Attributes:
| Name | Type | Description |
|---|---|---|
| operator | Literal['&', '\|', '->'] | |
| left | ConstraintExpr | Left operand. |
| right | ConstraintExpr | Right operand. |
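To see why identical precedence matters, the sketch below evaluates a flat chain of truth values strictly left to right. It assumes the standard logical reading of the three operators (in particular, a -> b as "not a, or b"); how BlackLab itself evaluates constraints is not shown in this document, and apply_bool and evaluate_chain are hypothetical names.

```python
def apply_bool(operator: str, left: bool, right: bool) -> bool:
    """Standard logical reading of &, | and -> (assumed, not taken from BlackLab)."""
    if operator == "&":
        return left and right
    if operator == "|":
        return left or right
    if operator == "->":
        # Implication is false only when the premise holds and the conclusion fails.
        return (not left) or right
    raise ValueError(f"unknown operator: {operator!r}")


def evaluate_chain(values: list, operators: list) -> bool:
    """Left-associative evaluation with no precedence between operators."""
    result = values[0]
    for op, value in zip(operators, values[1:]):
        result = apply_bool(op, result, value)
    return result


# a | b & c is grouped as (a | b) & c under a single precedence level.
same_precedence = evaluate_chain([True, False, False], ["|", "&"])
# Python's own precedence groups it as a | (b & c).
python_precedence = True or False and False
```

With a = True, b = False, c = False the two groupings disagree, so parentheses are worth writing explicitly whenever & and | are mixed in one constraint.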
FunctionConstraint
Bases: BCQLNode
A function-call constraint inside token brackets.
TODO: check for predefined functions in blacklab?
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | The function / pseudo-annotation name. |
| args | list[StringValue] | The string arguments to the function. |
IntegerRangeConstraint
Bases: BCQLNode
An integer range constraint, such as a parser's confidence: annotation=in[min,max].
Example: pos_confidence=in[50,100].
Note that both the min and max values must be given. There are no implicit "infinite" or "zero" bounds.
Attributes:
| Name | Type | Description |
|---|---|---|
| annotation | str | The annotation name. |
| min_val | int | Inclusive lower bound. |
| max_val | int | Inclusive upper bound. |
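A minimal sketch of the annotation=in[min,max] form, using a frozen dataclass as a stand-in for the real model. IntRange is a hypothetical name; the point is only that both bounds are required fields without defaults and that the rendered string follows the pattern shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntRange:
    """Hypothetical stand-in: both bounds are mandatory, with no defaults."""
    annotation: str
    min_val: int
    max_val: int

    def to_bcql(self) -> str:
        # Both bounds are inclusive and always written out explicitly.
        return f"{self.annotation}=in[{self.min_val},{self.max_val}]"


constraint = IntRange("pos_confidence", 50, 100)
rendered = constraint.to_bcql()
```

Omitting either bound in this sketch fails at construction time with a TypeError, which corresponds to the rule that no implicit lower or upper bound exists.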
NotConstraint
Bases: BCQLNode
Logical NOT on a token-level constraint: !expr.
Typically for a capture group: !(pos="noun" | pos="verb").
Attributes:
| Name | Type | Description |
|---|---|---|
| operand | ConstraintExpr | The constraint being negated. |
StringValue
Bases: BCQLNode
A quoted string value inside a BCQL query.
Handles regular strings, literal strings (prefixed with l), and
sensitivity flags ((?-i) for sensitive, (?i) for insensitive).
Attributes:
| Name | Type | Description |
|---|---|---|
| value | str | The raw string content (without surrounding quotes). |
| is_literal | bool | |
| sensitivity | Literal['default', 'sensitive', 'insensitive'] | |
Example::
    StringValue(value="(?-i)Panama").to_bcql()
    # '"(?-i)Panama"'
TokenQuery
Bases: BCQLNode
A single token query: [...], "string" shorthand, or [].
Attributes:
| Name | Type | Description |
|---|---|---|
| constraint | ConstraintExpr \| None | The constraint expression inside the brackets, or |
| negated | bool | |
| shorthand | StringValue \| None | When the query was written as a bare string like |