
Models overview

All AST node types inherit from BCQLNode. Each node carries a node_type literal discriminator, making the full tree serializable to/from JSON.

BCQLNodeUnion module-attribute

BCQLNodeUnion = Union[
    StringValue,
    AnnotationConstraint,
    IntegerRangeConstraint,
    FunctionConstraint,
    NotConstraint,
    BoolConstraint,
    TokenQuery,
    SequenceNode,
    RepetitionNode,
    GroupNode,
    SequenceBoolNode,
    NegationNode,
    UnderscoreNode,
    LookaheadNode,
    LookbehindNode,
    SpanQuery,
    PositionFilterNode,
    CaptureNode,
    AnnotationRef,
    ConstraintLiteral,
    ConstraintInteger,
    ConstraintComparison,
    ConstraintBoolean,
    ConstraintNot,
    ConstraintFunctionCall,
    GlobalConstraintNode,
    RelationOperator,
    ChildConstraint,
    RelationNode,
    RootRelationNode,
    AlignmentOperator,
    AlignmentConstraint,
    AlignmentNode,
    FunctionCallNode,
]

Annotated union of every concrete BCQL AST node, discriminated by node_type.

Use this anywhere a field can hold any BCQL node (sub-queries, sequence children, relation targets, etc.). For fields restricted to a smaller subset of node types, prefer the narrower unions defined alongside their owners (e.g. ConstraintExpr in bcql_py.models.token for token-level constraints, CaptureConstraintExpr in bcql_py.models.capture for capture constraints). Narrower unions give better validation errors and keep the schema accurate about which nodes are legal in each position.
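To illustrate how a node_type discriminator lets a tree round-trip through JSON, here is a minimal sketch that uses stdlib dataclasses as stand-ins for the real Pydantic models. The two class names mirror nodes from this page, but the NODE_TYPES registry, the from_dict helper, and the discriminator strings are hypothetical and not part of bcql_py.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UnderscoreNode:
    node_type: str = "underscore"  # assumed discriminator value


@dataclass(frozen=True)
class GroupNode:
    child: object
    node_type: str = "group"  # assumed discriminator value


# Hypothetical registry mapping each discriminator string to its class.
NODE_TYPES = {"underscore": UnderscoreNode, "group": GroupNode}


def from_dict(data):
    """Rebuild a node from plain JSON data by dispatching on node_type."""
    cls = NODE_TYPES[data["node_type"]]
    kwargs = {
        key: from_dict(value) if isinstance(value, dict) else value
        for key, value in data.items()
        if key != "node_type"
    }
    return cls(**kwargs)


tree = GroupNode(child=GroupNode(child=UnderscoreNode()))
payload = json.dumps(asdict(tree))
restored = from_dict(json.loads(payload))
```

In the sketch the discriminator is the only information needed to pick the right class, which is the same role node_type plays in the annotated union.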

AlignmentConstraint

Bases: BCQLNode

One alignment constraint: operator target

Multiple alignment constraints are separated by ;.

Attributes:

Name Type Description
operator AlignmentOperator
target BCQLNodeUnion

The target sub-query.

AlignmentNode

Bases: BCQLNode

A parallel alignment query: source ==>field target [; ==>field target]*.

Attributes:

source: The source query in the primary field.

alignments: One or more alignment constraints.

AlignmentOperator

Bases: BCQLNode

The operator in an alignment query: =type=>field or ==>field?.

See https://github.com/instituutnederlandsetaal/BlackLab/blob/dev/site/docs/guide/040_query-language/030_parallel.md

Attributes:

Name Type Description
target_field str

The target field name (e.g. "nl").

optional bool

True when alignment is optional (==>nl?).

relation_type str | None

Optional type filter (e.g. "word" in =word=>nl). None means any alignment relation.

capture_name str | None

Override for the capture group name (default "rels"). Set by the name:==>field syntax.
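The operator forms listed above can be sketched as a small rendering function. This is an illustration only: alignment_operator is a hypothetical helper, not the library's to_bcql, and it assumes that a capture_name of None means no name: prefix is written.

```python
def alignment_operator(target_field, optional=False, relation_type=None,
                       capture_name=None):
    """Render an alignment operator from its four fields (sketch)."""
    prefix = f"{capture_name}:" if capture_name else ""
    suffix = "?" if optional else ""
    # An absent relation type leaves the slot between the "=" signs empty.
    return f"{prefix}={relation_type or ''}=>{target_field}{suffix}"
```

For example, alignment_operator("nl", relation_type="word") gives =word=>nl, matching the form shown in the relation_type description.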

BCQLNode

Bases: BaseModel, ABC

Abstract base for every node in the BCQL abstract syntax tree.

Sub-classes must override to_bcql and set node_type to a unique Literal string so that discrimination works correctly.

Configuration
  • frozen = True: instances are immutable after creation
  • use_enum_values = True: enum fields store their .value

bcql cached property

bcql: str

Convenience property to get the BCQL string representation of this node.

to_bcql abstractmethod

to_bcql() -> str

Reconstruct a BCQL query string from this AST node.

The returned string is functionally equivalent to the original query but may differ in trivial whitespace and formatting.
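A minimal sketch of this contract, using an ABC plus a frozen stdlib dataclass in place of the real Pydantic base. Node and Underscore are hypothetical stand-ins, and the bcql property here is a plain property rather than a cached one.

```python
from abc import ABC, abstractmethod
from dataclasses import FrozenInstanceError, dataclass


class Node(ABC):
    """Hypothetical stand-in for BCQLNode."""

    @abstractmethod
    def to_bcql(self) -> str:
        """Return the BCQL text for this node."""

    @property
    def bcql(self) -> str:
        # Convenience accessor that delegates to to_bcql().
        return self.to_bcql()


@dataclass(frozen=True)
class Underscore(Node):
    node_type: str = "underscore"  # assumed discriminator value

    def to_bcql(self) -> str:
        return "_"


node = Underscore()
try:
    node.node_type = "other"  # frozen: assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Immutability is what makes caching the rendered string safe: a node's BCQL text cannot change after construction.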

AnnotationRef

Bases: BCQLNode

Reference to a captured token's annotation: label.annotation, or a bare capture label.

Examples:
  • A.word refers to the word annotation of capture A.
  • A as a bare label (typically used as a function argument, e.g. start(A)).

Attributes:

Name Type Description
label str

Capture group name.

annotation str

Annotation name, or empty string for a bare label reference.

CaptureNode

Bases: BCQLNode

A capture label applied to a sub-query: label:body, e.g. A:[word="hello"].

Everything matched by body is captured under label in the match info.

Attributes:

Name Type Description
label str

The capture group name (e.g. "A").

body BCQLNodeUnion

The sub-query whose match is captured.

ConstraintBoolean

Bases: BCQLNode

Boolean combination of capture constraints: left op right.

Operators: & (AND), | (OR), -> (implication). All three share the same precedence per Bcql.g4's booleanOperator rule. The -> implication operator is most commonly seen in capture constraints (e.g. A.word = "cat" -> B.word = "dog") but the grammar allows it at every level.

Attributes:

Name Type Description
operator Literal['&', '|', '->']

"&", "|", or "->".

left CaptureConstraintExpr

Left operand.

right CaptureConstraintExpr

Right operand.

ConstraintComparison

Bases: BCQLNode

A comparison in a capture constraint: left op right.

Supported operators: =, !=, <, <=, >, >=. These operators are stored as plain string literals and do not get their own node class, since a dedicated class should not be needed here.

Attributes:

Name Type Description
operator Literal['=', '!=', '<', '<=', '>', '>=']

The comparison operator.

left CaptureConstraintExpr

Left-hand operand (usually an AnnotationRef).

right CaptureConstraintExpr

Right-hand operand (annotation ref, literal, or function call).

ConstraintFunctionCall

Bases: BCQLNode

A function call in a capture constraint.

Examples: start(A) or end(B) used in expressions like start(B) < start(A).

Attributes:

Name Type Description
name str

Function name (e.g. "start", "end").

args list[CaptureConstraintExpr]

Function arguments (annotation refs, literals, etc.).

ConstraintInteger

Bases: BCQLNode

An integer literal in a capture constraint.

Example: the 5 in focus.pos > 5.

Attributes:

Name Type Description
value int

The integer value.

ConstraintLiteral

Bases: BCQLNode

A literal string value in a capture constraint.

Example: the "over" in A.word = "over".

Attributes:

Name Type Description
value str

The literal string (without quotes).

quote_char Literal['"', "'"]

The quote character used in the original query, either " or '.

ConstraintNot

Bases: BCQLNode

Logical NOT in a capture constraint.

Attributes:

Name Type Description
operand CaptureConstraintExpr

The constraint being negated.

GlobalConstraintNode

Bases: BCQLNode

A query with a global capture constraint.

The constraint expression follows the :: operator and relates captures defined in body.

Example: A:[] "by" B:[] :: A.word = B.word where A:[] "by" B:[] is the body and A.word = B.word is the constraint expression.

Attributes:

Name Type Description
body BCQLNodeUnion

The main query containing captures.

constraint CaptureConstraintExpr

The constraint expression relating captures.
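As a sketch of how the example query decomposes, the following builds it from simplified stand-ins for AnnotationRef, ConstraintComparison, and GlobalConstraintNode. The class names Ref, Comparison, and GlobalConstraint are hypothetical, the body is kept as raw text for brevity, and the exact whitespace the real to_bcql emits may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Stand-in for AnnotationRef; an empty annotation means a bare label."""
    label: str
    annotation: str = ""

    def to_bcql(self):
        return f"{self.label}.{self.annotation}" if self.annotation else self.label


@dataclass(frozen=True)
class Comparison:
    """Stand-in for ConstraintComparison."""
    operator: str
    left: Ref
    right: Ref

    def to_bcql(self):
        return f"{self.left.to_bcql()} {self.operator} {self.right.to_bcql()}"


@dataclass(frozen=True)
class GlobalConstraint:
    """Stand-in for GlobalConstraintNode."""
    body: str  # raw BCQL text here; the real field holds a node
    constraint: Comparison

    def to_bcql(self):
        return f"{self.body} :: {self.constraint.to_bcql()}"


query = GlobalConstraint(
    body='A:[] "by" B:[]',
    constraint=Comparison("=", Ref("A", "word"), Ref("B", "word")),
)
```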

FunctionCallNode

Bases: BCQLNode

A built-in function call at the sequence level.

Function arguments can be sub-queries (BCQLNode), string values (StringValue), or integers.

Attributes:

Name Type Description
name str

Function name

args list[BCQLNodeUnion | int]

Positional arguments

LookaheadNode

Bases: BCQLNode

A lookahead assertion: (?=...) (positive) or (?!...) (negative).

Matches a position only if the enclosed query matches (or doesn't match) the tokens that follow.

Attributes:

Name Type Description
positive bool

True for (?= ...), False for (?! ...).

body BCQLNodeUnion

The sub-query that must (or must not) match ahead.

LookbehindNode

Bases: BCQLNode

A lookbehind assertion: (?<=...) (positive) or (?<!...) (negative).

Matches a position only if the enclosed query matches (or doesn't match) the tokens that precede.

Attributes:

Name Type Description
positive bool

True for (?<=...), False for (?<!...).

body BCQLNodeUnion

The sub-query that must (or must not) match behind.
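The four assertion forms for LookaheadNode and LookbehindNode differ only in their marker, which the following sketch makes explicit. The lookaround helper is hypothetical, and it takes the body as already-rendered BCQL text rather than a node.

```python
def lookaround(body, positive=True, behind=False):
    """Wrap rendered BCQL text in a lookahead or lookbehind assertion."""
    if behind:
        marker = "?<=" if positive else "?<!"
    else:
        marker = "?=" if positive else "?!"
    return f"({marker}{body})"
```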

ChildConstraint

Bases: BCQLNode

A single child constraint in a relation query.

Represents [-label:] -type-> target inside a relation expression. Multiple child constraints are separated by ;. The target itself can be any BCQL sub-query, including another relation query (e.g. _ -nsubj-> (_ -amod-> _)).

Attributes:

Name Type Description
operator RelationOperator

The relation operator (type, negation, target field).

target BCQLNodeUnion

The target sub-query.

label str | None

Optional capture label on this child relation (e.g. rel:-obj-> _).

RelationNode

Bases: BCQLNode

A dependency relation query: source -type-> target [; -type-> target]*.

The source is specified once; one or more child constraints follow, separated by ;.

Attributes:

Name Type Description
source BCQLNodeUnion

The source of the relation.

children list[ChildConstraint]

One or more target constraints.

RelationOperator

Bases: BCQLNode

The operator in a relation query: -type-> or !-type->. See https://github.com/instituutnederlandsetaal/BlackLab/blob/dev/site/docs/guide/040_query-language/020_relations.md#negative-child-constraints for details on negative relations.

Attributes:

Name Type Description
relation_type str | None

The relation type as a string or regex pattern (e.g. "obj", "subj|obj"), or None for any type.

negated bool

True for !-type->.

target_field str | None

For cross-field relations (e.g. -->corrected), the target field name. None for same-field relations.
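A minimal sketch of how the three fields could combine into operator text, based only on the forms quoted on this page (-type->, !-type->, -->corrected). The relation_operator helper is hypothetical, and the placement of the target field directly after the arrow is inferred from the -->corrected example.

```python
def relation_operator(relation_type=None, negated=False, target_field=None):
    """Render a relation operator from its fields (sketch)."""
    # None for the type or the field leaves that slot empty.
    text = f"-{relation_type or ''}->{target_field or ''}"
    return f"!{text}" if negated else text
```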

RootRelationNode

Bases: BCQLNode

A root relation query: ^-type-> target or label:^-type-> target.

Usually this relation does not have a "type" (since ROOT is the dependency relation from the root), but some corpora may differ.

TODO: see if the Validator and CorpusSpec should account for "allowed root relations"

Root relations have no source, only a target. They match the root of a dependency tree.

Attributes:

Name Type Description
relation_type str | None

Optional relation type filter (usually None meaning any root).

target BCQLNodeUnion

The target sub-query.

label str | None

Optional capture label.

GroupNode

Bases: BCQLNode

A parenthesized group of sub-queries.

Groups allow applying repetition operators or capture constraints to a complex sub-expression. A group holds exactly one child node, which would typically be a SequenceNode if there are multiple adjacent tokens, or a single token-level node otherwise.

Attributes:

Name Type Description
child BCQLNodeUnion

The inner sub-query.

NegationNode

Bases: BCQLNode

Sequence-level negation (!).

Negation sits at the span level in the precedence chain (above repetition), so !"man"+ parses as !("man"+) per Bcql.g4's sequencePartNoCapture rule. The child is always a single span-level node (never a bare sequence), so to_bcql just prepends ! without extra parentheses.

Attributes:

Name Type Description
child BCQLNodeUnion

The sub-query being negated.

RepetitionNode

Bases: BCQLNode

A repetition quantifier applied to a sub-query.

Supports + (1+), * (0+), ? (0 or 1), {n}, {n,m}, {n,}. Note that "up to" quantifiers like {0,m} are exported as {,m} and may therefore be different in surface form from the original.

Attributes:

Name Type Description
child BCQLNodeUnion

The sub-query being repeated.

min_count int

Minimum number of repetitions (inclusive, min. 0).

max_count int | None

Maximum number of repetitions (inclusive), or None for unlimited.
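The mapping from (min_count, max_count) to a quantifier can be sketched as below. Only the {0,m} to {,m} rewrite is stated on this page; the choice to emit the +, *, and ? shorthands whenever the counts allow it is an assumption, and quantifier is a hypothetical helper rather than the library's implementation.

```python
def quantifier(min_count, max_count):
    """Render a repetition quantifier from inclusive bounds (sketch)."""
    if min_count < 0:
        raise ValueError("min_count must be >= 0")
    if max_count is not None and max_count < min_count:
        raise ValueError("max_count must be >= min_count")
    if max_count is None:
        if min_count == 0:
            return "*"
        if min_count == 1:
            return "+"
        return f"{{{min_count},}}"
    if (min_count, max_count) == (0, 1):
        return "?"
    if min_count == max_count:
        return f"{{{min_count}}}"
    if min_count == 0:
        return f"{{,{max_count}}}"  # "up to" form, as described above
    return f"{{{min_count},{max_count}}}"
```

Because several surface forms map to the same pair of counts, a parse followed by a render can change the quantifier's spelling without changing its meaning.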

SequenceBoolNode

Bases: BCQLNode

Sequence-level boolean combination (&, |, ->).

Binary, left-associative node mirroring the booleanOperator rule in Bcql.g4: all three operators share the same precedence. For example, "a" | "b" & "c" parses as ("a" | "b") & "c".

Attributes:

Name Type Description
operator Literal['&', '|', '->']

The boolean operator.

left BCQLNodeUnion

The left operand.

right BCQLNodeUnion

The right operand.
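Since all three operators share one precedence level, the tree shape depends only on operand order. The sketch below shows that left fold; BoolNode and fold_left are hypothetical stand-ins, and operands are plain strings instead of real nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoolNode:
    """Hypothetical stand-in for SequenceBoolNode."""
    operator: str
    left: object
    right: object


def fold_left(operands, operators):
    """Combine operands left to right, regardless of which operator appears."""
    node = operands[0]
    for op, operand in zip(operators, operands[1:]):
        node = BoolNode(operator=op, left=node, right=operand)
    return node


# "a" | "b" & "c"  ->  ("a" | "b") & "c"
tree = fold_left(['"a"', '"b"', '"c"'], ["|", "&"])
```

Note that this differs from most programming languages, where & would bind tighter than |.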

SequenceNode

Bases: BCQLNode

An ordered sequence of adjacent tokens / sub-queries. This is a very high-level node type that can represent an entire query or a sub-sequence.

Attributes:

Name Type Description
children list[BCQLNodeUnion]

The ordered list of child nodes in the sequence.

UnderscoreNode

Bases: BCQLNode

The _ wildcard used in relation queries.

Distinct from [] (match-all token): _ means "any source or target" in a relation expression without constraining token count.

PositionFilterNode

Bases: BCQLNode

A position-filter operator: within, containing, or overlap.

Example: "baker" within <person/> means find "baker" inside a <person/> span.

These operators are right-associative, so A within B within C is parsed as A within (B within C).

Attributes:

Name Type Description
operator Literal['within', 'containing', 'overlap']

One of "within", "containing", "overlap".

left BCQLNodeUnion

The query whose hits are filtered.

right BCQLNodeUnion

The span/query that defines the positional constraint.
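Right-associativity means a chain of position filters nests toward the right. A minimal sketch of that fold follows; PositionFilter and fold_right are hypothetical stand-ins, and operands are plain strings instead of real nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionFilter:
    """Hypothetical stand-in for PositionFilterNode."""
    operator: str
    left: object
    right: object


def fold_right(operands, operators):
    """Combine operands right to left, so the last pair nests deepest."""
    node = operands[-1]
    for op, operand in zip(reversed(operators), reversed(operands[:-1])):
        node = PositionFilter(operator=op, left=operand, right=node)
    return node


# A within B within C  ->  A within (B within C)
tree = fold_right(["A", "B", "C"], ["within", "within"])
```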

SpanQuery

Bases: BCQLNode

A span (XML tag) query.

Three forms exist per Bcql.g4's tag rule:
  • Whole span: <s/> or <ne type="PERS"/>
  • Start tag: <s>
  • End tag: </s>

The tag name can be a plain identifier (s, ne) or a quoted string for regex patterns (<"person|location"/>).

Attributes:

Name Type Description
tag_name str | StringValue

The tag name as a plain string or StringValue for regex.

position Literal['whole', 'start', 'end']

"whole" for <s/>, "start" for <s>, "end" for </s>.

attributes dict[str, StringValue]

XML attributes as name: StringValue pairs (e.g. type="PERS").
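The three span forms can be sketched as a single rendering function. This is an illustration under assumptions: span_query is a hypothetical helper, attribute values are plain strings rather than StringValue nodes, the regex flag stands in for passing a StringValue tag name, and allowing attributes on start tags is a guess not confirmed by this page.

```python
def span_query(tag_name, position="whole", attributes=None, regex=False):
    """Render a span query in whole, start, or end form (sketch)."""
    name = f'"{tag_name}"' if regex else tag_name
    attrs = "".join(
        f' {key}="{value}"' for key, value in (attributes or {}).items()
    )
    if position == "whole":
        return f"<{name}{attrs}/>"
    if position == "start":
        return f"<{name}{attrs}>"
    if position == "end":
        return f"</{name}>"
    raise ValueError(f"unknown position: {position!r}")
```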

AnnotationConstraint

Bases: BCQLNode

A single annotation comparison: annotation op "value", typically made up of an identifier, an operator, and a string value. The identifier is not semantically constrained here: which annotations (like word, lemma, pos) are available depends entirely on the corpus, so annotation is left as a plain string.

Example: word="man" or pos != "noun".

Attributes:

Name Type Description
annotation str

The annotation name (e.g. "word", "lemma").

operator Literal['=', '!=', '<', '<=', '>', '>=']

"=" or "!=" in the typical case; the declared type also admits "<", "<=", ">", and ">=".

value StringValue

The value being compared against.

BoolConstraint

Bases: BCQLNode

Boolean combination of token-level constraints: left op right.

The operator is & (AND), | (OR), or -> (implication). Per the BCQL spec / Bcql.g4, all three share identical precedence and are left-associative; see the booleanOperator rule in Bcql.g4. The name "boolean" is admittedly a loose fit for the implication case.

Not to be confused with sequence-level boolean operators (also &, |, and ->) which combine whole sub-queries instead of token constraints. See sequence.SequenceBoolNode for those.

Attributes:

Name Type Description
operator Literal['&', '|', '->']

"&", "|", or "->".

left ConstraintExpr

Left operand.

right ConstraintExpr

Right operand.

FunctionConstraint

Bases: BCQLNode

A function-call constraint inside token brackets.

TODO: check for predefined functions in blacklab?

Attributes:

Name Type Description
name str

The function / pseudo-annotation name.

args list[StringValue]

The string arguments to the function.

IntegerRangeConstraint

Bases: BCQLNode

An integer range constraint, such as a parser's confidence: annotation=in[min,max].

Example: pos_confidence=in[50,100].

Note that we require both min and max vals to be given. No implicit "infinite" or "zero" bounds.

Attributes:

Name Type Description
annotation str

The annotation name.

min_val int

Inclusive lower bound.

max_val int

Inclusive upper bound.
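A minimal sketch of the range syntax with both bounds required. The integer_range helper is hypothetical; the extra check that min_val does not exceed max_val is an assumption and may not be enforced by the library.

```python
def integer_range(annotation, min_val, max_val):
    """Render annotation=in[min,max]; both bounds are mandatory (sketch)."""
    if min_val is None or max_val is None:
        raise ValueError("both min_val and max_val must be given")
    if min_val > max_val:  # assumed sanity check, not confirmed by the docs
        raise ValueError("min_val must not exceed max_val")
    return f"{annotation}=in[{min_val},{max_val}]"
```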

NotConstraint

Bases: BCQLNode

Logical NOT on a token-level constraint: !expr.

Typically applied to a parenthesized group: !(pos="noun" | pos="verb").

Attributes:

Name Type Description
operand ConstraintExpr

The constraint being negated.

StringValue

Bases: BCQLNode

A quoted string value inside a BCQL query.

Handles regular strings, literal strings (prefixed with l), and sensitivity flags ((?-i) for sensitive, (?i) for insensitive).

Attributes:

Name Type Description
value str

The raw string content (without surrounding quotes).

is_literal bool

True when prefixed with l (e.g. l"e.g.").

sensitivity Literal['default', 'sensitive', 'insensitive']

"default" follows the default value (unspecified), "sensitive" for (?-i), "insensitive" for (?i).

Example:

StringValue(value="(?-i)Panama").to_bcql()
# '"(?-i)Panama"'

TokenQuery

Bases: BCQLNode

A single token query: [...], "string" shorthand, or [].

Attributes:

Name Type Description
constraint ConstraintExpr | None

The constraint expression inside the brackets, or None for match-all ([]).

negated bool

True for the negated form ![...].

shorthand StringValue | None

When the query was written as a bare string like "man" (shorthand for [word="man"]), this stores the StringValue. If set, constraint is None.
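How the three fields interact can be sketched as follows. The token_query helper is hypothetical and takes already-rendered text for constraint and shorthand; how the real model handles negated together with shorthand is not stated here, so the sketch ignores that combination.

```python
def token_query(constraint=None, negated=False, shorthand=None):
    """Render a token query from its fields (sketch)."""
    if shorthand is not None:
        if constraint is not None:
            raise ValueError("shorthand and constraint are mutually exclusive")
        return f'"{shorthand}"'  # bare-string form
    body = f"[{constraint or ''}]"  # no constraint gives match-all []
    return f"!{body}" if negated else body
```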