BCQL syntax cheatsheet

A comprehensive reference for BlackLab's flavour of the Corpus Query Language (BCQL).

Schema-dependent attributes

This cheatsheet uses common attribute names (word, lemma, pos) and tag values ("noun", "ADJ") for readability. The exact attributes and values you can query depend entirely on how your corpus was annotated. Your corpus might use [pos="NOU-C"] or [tag="NN"] instead.

1. Basic token matching & attributes

BlackLab defaults to case- and diacritics-insensitive search.

Query	Meaning
`"man"` or `[word="man"]`	Finds all occurrences of the word form man
`[lemma="search"]`	Finds all forms of the lemma search (e.g. search, searches)
`[lemma="run" & pos="noun"]`	AND operator: run only when tagged as a noun
`[pos != "noun"]`	Negation: all tokens except nouns
`[]`	Match-all pattern: matches any single token
`"(?-i)Apple"`	Forces case/diacritics-sensitive matching
`"e\.g\."` or `l"e.g."`	Literal string: backslash escaping or `l` prefix

2. Regular expressions within tokens

Query	Meaning
`"man|woman"`	Matches man or woman
`[lemma="under.*"]`	`.*`: any character 0 or more times
`[word="a?n"]`	`?`: preceding character is optional (matches an or n)
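As a rough analogy for how the patterns above behave, the following Python sketch treats a token-level pattern as a regular expression that must match the whole token, ignoring case and diacritics unless told otherwise. It assumes whole-token matching and uses Python's `re` syntax, which is not identical to the regex dialect BlackLab uses. `token_matches` and `strip_diacritics` are hypothetical helper names, not part of BlackLab.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, e.g. 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def token_matches(pattern: str, token: str, sensitive: bool = False) -> bool:
    """Hypothetical helper: does `pattern` match the entire `token`?

    Assumption: insensitive matching is approximated by ignoring case
    and stripping diacritics from both the pattern and the token.
    """
    if sensitive:
        return re.fullmatch(pattern, token) is not None
    return (
        re.fullmatch(
            strip_diacritics(pattern), strip_diacritics(token), flags=re.IGNORECASE
        )
        is not None
    )


print(token_matches("a?n", "an"))    # True
print(token_matches("a?n", "man"))   # False: the whole token must match
print(token_matches("Apple", "apple", sensitive=True))  # False
```

The `sensitive=True` branch plays the role of the `(?-i)` prefix from section 1; it is only an illustration of the idea, not of how BlackLab implements it.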

3. Sequences, gaps, and repetition

Query	Meaning
`"the" "tall" "tree"`	Exact phrase search
`"an?|the" [pos="ADJ"] "man"`	Article + exactly one adjective + man
`"make" [] "big"`	Gap of exactly one token
`[pos="ADJ"]+`	One or more adjectives
`[pos="ADJ"]*`	Zero or more adjectives
`[pos="ADJ"]?`	Optional adjective
`[]{2,5}`	Gap of 2–5 arbitrary tokens
`("the"? [pos="noun"])+`	Sequence of nouns, each optionally preceded by the
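To make the repetition operators concrete, here is a minimal Python sketch of what a query like `[pos="ADJ"]+ "man"` asks for, run over a hand-tagged toy sentence. The `Token` type, the `adjectives_then_man` function and the tag values are all assumptions made for illustration. The sketch reports every start position that satisfies the pattern; it makes no claim about how BlackLab itself orders or de-duplicates overlapping hits.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    word: str
    pos: str


def adjectives_then_man(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Find spans of one or more ADJ tokens followed by the word 'man'.

    Returns (start, end) token offsets with an exclusive end.
    """
    hits = []
    for start in range(len(tokens)):
        i = start
        # Consume adjectives one at a time; after each one, check for "man".
        while i < len(tokens) and tokens[i].pos == "ADJ":
            i += 1
            if i < len(tokens) and tokens[i].word.lower() == "man":
                hits.append((start, i + 1))
    return hits


sentence = [
    Token("the", "DET"),
    Token("tall", "ADJ"),
    Token("old", "ADJ"),
    Token("man", "NOUN"),
    Token("walked", "VERB"),
]
print(adjectives_then_man(sentence))  # [(1, 4), (2, 4)]
```

Swapping the `while` loop for a single `if` would give the "exactly one adjective" behaviour of `[pos="ADJ"] "man"`.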

4. Sequence-level logic & filtering

Query	Meaning
`"happy" "dog" | "sad" "cat"`	OR at sequence level
`("double" [] & [] "trouble")`	AND at sequence level: intersection (yields double trouble)

5. Context, lookarounds & punctuation

Query	Meaning
`"cat" (?= "in" "the" "hat")`	Positive lookahead
`(?<= "very" "good") "dog"`	Positive lookbehind
`"cat" (?! "call")`	Negative lookahead
`(?<! "bad") "dog"`	Negative lookbehind
`[word="dog" & punctAfter=","]`	dog immediately followed by a comma (pseudo-annotation)
`meet("cat", "fluffy", 5)`	cat within 5 tokens of fluffy

6. XML elements and spans

Query	Meaning
`<s/>`	Whole sentence spans
`<s>` / `</s>`	Start / end position of a span
`"baker" within <person/>`	baker inside a `<person>` span
`<person/> containing "baker"`	Entire `<person>` span containing baker
`<"person|location"/>`	Span matching a regex on the tag name
`([pos="ADJ"]+ containing "tall") "man"`	Adjective sequence containing tall, followed by man
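The difference between `within` and `containing` is which side of the relationship comes back as the hit. The Python sketch below illustrates this on a toy sentence, assuming spans are stored as `(start, end)` token offsets with an exclusive end. The function names simply mirror the BCQL keywords and are not a BlackLab API.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def within(hits: List[int], spans: List[Span]) -> List[int]:
    """Keep token hits that fall inside some span (returns the tokens)."""
    return [p for p in hits if any(s <= p < e for s, e in spans)]


def containing(spans: List[Span], hits: List[int]) -> List[Span]:
    """Keep spans that hold at least one token hit (returns the spans)."""
    return [(s, e) for s, e in spans if any(s <= p < e for p in hits)]


words = ["Mr", "Baker", "met", "the", "baker"]
person_spans = [(0, 2)]  # assumed <person> annotation over "Mr Baker"
baker_hits = [i for i, w in enumerate(words) if w.lower() == "baker"]

print(within(baker_hits, person_spans))      # [1]
print(containing(person_spans, baker_hits))  # [(0, 2)]
```

Here `within` returns only the token at position 1, while `containing` returns the whole two-token span.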

7. Captures and global constraints

Query	Meaning
`A:[pos="ADJ"]`	Capture the matched adjective as group A
`A:[] "by" B:[] :: A.word = B.word`	Global constraint: A and B must be the same word
`<s/> containing A:[] []* B:[] :: A.word = "fluffy" & B.word = "cat"`	Both words occur in the sentence in that order
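A global constraint is checked after the token pattern has matched and filters hits by comparing captured groups. The following Python sketch mimics `A:[] "by" B:[] :: A.word = B.word` on a plain list of words. `same_word_around_by` is a hypothetical name, and comparing the two words case-insensitively is an assumption of this sketch rather than documented BlackLab behaviour.

```python
from typing import List, Tuple


def same_word_around_by(words: List[str]) -> List[Tuple[int, int]]:
    """Find three-token spans 'X by X' where both X tokens are the same word.

    Returns (start, end) token offsets with an exclusive end.
    """
    hits = []
    for i in range(1, len(words) - 1):
        if words[i].lower() != "by":
            continue  # the middle token must be "by"
        a, b = words[i - 1], words[i + 1]  # captures A and B
        if a.lower() == b.lower():  # the constraint A.word = B.word
            hits.append((i - 1, i + 2))
    return hits


text = "they walked side by side day by night".split()
print(same_word_around_by(text))  # [(2, 5)]
```

Only "side by side" survives: "day by night" matches the token pattern but fails the constraint.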

8. Relations querying

Supported from BlackLab v4.0

Query	Meaning
`_ -obj-> _`	Any object relation
`_ -obj-> "cat"`	Object relation where target is cat
`_ -subj-> _ ; -obj-> _`	Same source has both a subject and an object
`_ !-obj-> "dog"`	Source has no object relation with target dog
`^--> "have"`	Root relation pointing to have
`(_ -amod-> "fluffy") -subj-> _`	Multi-level relation chain
`rspan(_ -amod-> _, "full")`	Full span covering source and target of `amod`
`rcapture(<s/>)`	Capture all relations within the matched sentence

9. Parallel corpora querying

Supported from BlackLab v4.0

Query	Meaning
`"cat" ==>nl _`	English cat with its Dutch alignment
`"cat" ==>nl? _`	Same, but alignment is optional
`"fluffy" ==>nl "pluizig"`	English fluffy aligned to Dutch pluizig
`w1:"cat" ==>nl w2:_`	Capture source as `w1`, target as `w2`
`rfield("cat" ==>nl _, "nl")`	Return only hits from the Dutch field