BCQL syntax cheatsheet

A comprehensive reference for BlackLab's flavour of the Corpus Query Language (BCQL).

Schema-dependent attributes

This cheatsheet uses common attribute names (word, lemma, pos) and tag values ("noun", "ADJ") for readability. The exact attributes and values you can query depend entirely on how your corpus was annotated. Your corpus might use [pos="NOU-C"] or [tag="NN"] instead.


1. Basic token matching & attributes

BlackLab defaults to case- and diacritics-insensitive search.

| Query | Meaning |
| --- | --- |
| `"man"` or `[word="man"]` | Finds all occurrences of the word form man |
| `[lemma="search"]` | Finds all forms of the lemma search (e.g. search, searches) |
| `[lemma="run" & pos="noun"]` | AND operator: run only when tagged as a noun |
| `[pos != "noun"]` | Negation: all tokens except nouns |
| `[]` | Match-all pattern: matches any single token |
| `"(?-i)Apple"` | Forces case- and diacritics-sensitive matching |
| `"e\.g\."` or `l"e.g."` | Literal string: backslash escaping or the `l` prefix |

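When queries like the ones in this section are generated by a program, literal search terms need the backslash escaping described above. The following is a minimal Python sketch, not part of BlackLab: the helper names `escape_literal` and `token_query` are hypothetical, and the set of escaped characters is an assumption (a conservative list of regex metacharacters plus the double quote).

```python
# Assumption: a conservative set of regex metacharacters, plus the
# double quote that delimits BCQL strings. Not taken from BlackLab itself.
SPECIAL = set('\\.[]{}()*+?|^$"')

def escape_literal(text: str) -> str:
    """Backslash-escape every character listed in SPECIAL."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def token_query(attribute: str, value: str, sensitive: bool = False) -> str:
    """Build a single-token query for a literal value."""
    prefix = "(?-i)" if sensitive else ""   # request sensitive matching
    return f'[{attribute}="{prefix}{escape_literal(value)}"]'

print(token_query("word", "e.g."))         # [word="e\.g\."]
print(token_query("word", "Apple", True))  # [word="(?-i)Apple"]
```

Escaping too many characters is usually harmless, while escaping too few changes the meaning of the query, which is why the sketch errs on the side of a longer list.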
2. Regular expressions within tokens

| Query | Meaning |
| --- | --- |
| `"man\|woman"` | Matches man or woman |
| `[lemma="under.*"]` | `.*`: any character, zero or more times |
| `[word="a?n"]` | `?`: the preceding character is optional (matches an or n) |
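The patterns above are applied to the whole token value, not to a substring of it. As a rough illustration, the Python sketch below uses `re.fullmatch` with `re.IGNORECASE` as a stand-in for BlackLab's regex engine. That equivalence is an assumption that holds for these simple patterns only; the two engines differ in other respects, and diacritics-insensitivity is not modelled.

```python
import re

def matching_tokens(pattern: str, tokens: list[str]) -> list[str]:
    """Return the tokens that the pattern matches in full, ignoring case."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [tok for tok in tokens if compiled.fullmatch(tok)]

words = ["Man", "woman", "human", "an", "n", "understand", "under"]
print(matching_tokens("man|woman", words))  # ['Man', 'woman']
print(matching_tokens("under.*", words))    # ['understand', 'under']
print(matching_tokens("a?n", words))        # ['an', 'n']
```

Note that human is not returned for `man|woman`: the pattern has to cover the entire token.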

3. Sequences, gaps, and repetition

| Query | Meaning |
| --- | --- |
| `"the" "tall" "tree"` | Exact phrase search |
| `"an?\|the" [pos="ADJ"] "man"` | Article (a, an or the) + exactly one adjective + man |
| `"make" [] "big"` | Gap of exactly one token |
| `[pos="ADJ"]+` | One or more adjectives |
| `[pos="ADJ"]*` | Zero or more adjectives |
| `[pos="ADJ"]?` | Optional adjective |
| `[]{2,5}` | Gap of 2–5 arbitrary tokens |
| `("the"? [pos="noun"])+` | Sequence of nouns, each optionally preceded by `the` |
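To make the repetition operators concrete, here is a toy token-sequence matcher in Python. It is a sketch for intuition only and says nothing about how BlackLab evaluates queries. The names `match_ends`, `find`, `word`, `pos` and `anything` are hypothetical, each pattern element is a `(test, min, max)` triple, and the matcher reports every possible span rather than modelling how a real engine reports overlapping hits.

```python
def match_ends(tokens, start, pattern):
    """Return every end index at which `pattern` can match from `start`."""
    if not pattern:
        return {start}
    test, lo, hi = pattern[0]          # hi=None means "no upper bound"
    ends, i, count = set(), start, 0
    while True:
        if count >= lo:                # enough repetitions: try the rest
            ends |= match_ends(tokens, i, pattern[1:])
        if count == hi or i >= len(tokens) or not test(tokens[i]):
            break
        i, count = i + 1, count + 1    # consume one more token
    return ends

def find(tokens, pattern):
    """Return all non-empty (start, end) spans matched by `pattern`."""
    return sorted((s, e) for s in range(len(tokens))
                  for e in match_ends(tokens, s, pattern) if e > s)

def word(w): return lambda t: t["word"] == w
def pos(p): return lambda t: t["pos"] == p
def anything(t): return True

sentence = [{"word": "the", "pos": "DET"}, {"word": "big", "pos": "ADJ"},
            {"word": "old", "pos": "ADJ"}, {"word": "man", "pos": "noun"}]

# [pos="ADJ"]+ "man"
print(find(sentence, [(pos("ADJ"), 1, None), (word("man"), 1, 1)]))  # [(1, 4), (2, 4)]
# "the" []{2,5}
print(find(sentence, [(word("the"), 1, 1), (anything, 2, 5)]))       # [(0, 3), (0, 4)]
```

In this encoding `+` is `(test, 1, None)`, `*` is `(test, 0, None)`, `?` is `(test, 0, 1)` and `{2,5}` is `(test, 2, 5)`.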

4. Sequence-level logic & filtering

| Query | Meaning |
| --- | --- |
| `"happy" "dog" \| "sad" "cat"` | OR at sequence level |
| `("double" [] & [] "trouble")` | AND at sequence level: intersection (yields double trouble) |
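Sequence-level AND can be read as a set intersection over hit spans: a span is kept only if both operands match exactly that span. A small Python sketch of this reading follows; the function names are hypothetical and spans are `(start, end)` token offsets with an exclusive end.

```python
def spans_double_any(words):
    # Spans matching: "double" []
    return {(i, i + 2) for i in range(len(words) - 1) if words[i] == "double"}

def spans_any_trouble(words):
    # Spans matching: [] "trouble"
    return {(i, i + 2) for i in range(len(words) - 1) if words[i + 1] == "trouble"}

words = "double trouble is double fun and big trouble".split()
both = spans_double_any(words) & spans_any_trouble(words)
print(sorted(both))  # [(0, 2)]
```

Neither double fun nor big trouble survives, because each satisfies only one of the two operands.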

5. Context, lookarounds & punctuation

| Query | Meaning |
| --- | --- |
| `"cat" (?= "in" "the" "hat")` | Positive lookahead |
| `(?<= "very" "good") "dog"` | Positive lookbehind |
| `"cat" (?! "call")` | Negative lookahead |
| `(?<! "bad") "dog"` | Negative lookbehind |
| `[word="dog" & punctAfter=","]` | dog immediately followed by a comma (pseudo-annotation) |
| `meet("cat", "fluffy", 5)` | cat within 5 tokens of fluffy |
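A lookahead checks the context after a match without making that context part of the hit. The Python sketch below illustrates this for the two lookahead rows, assuming the usual regex lookaround semantics carry over to token sequences; `lookahead_hits` is a hypothetical helper, not a BlackLab function.

```python
def lookahead_hits(words, target, following, negate=False):
    """Spans for `target` that are (or, with negate, are not) followed by `following`."""
    hits = []
    for i, w in enumerate(words):
        if w != target:
            continue
        followed = words[i + 1:i + 1 + len(following)] == following
        if followed != negate:
            hits.append((i, i + 1))   # the span covers the target only
    return hits

words = "the cat in the hat saw a cat call".split()
print(lookahead_hits(words, "cat", ["in", "the", "hat"]))   # [(1, 2)]
print(lookahead_hits(words, "cat", ["call"], negate=True))  # [(1, 2)]
```

Both calls return the one-token span for the first cat: the words in the lookahead decide whether the hit is kept but never extend it.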

6. XML elements and spans

| Query | Meaning |
| --- | --- |
| `<s/>` | Whole sentence spans |
| `<s>` / `</s>` | Start / end position of a span |
| `"baker" within <person/>` | baker inside a `<person>` span |
| `<person/> containing "baker"` | Entire `<person>` span containing baker |
| `<"person\|location"/>` | Span matching a regex on the tag name |
| `([pos="ADJ"]+ containing "tall") "man"` | Adjective sequence containing tall, followed by man |
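The difference between `within` and `containing` is which side is returned: `within` keeps the inner hits, `containing` keeps the outer spans. A minimal Python sketch over `(start, end)` token offsets follows; the span data is a hypothetical `<person>` annotation and the function names are illustrative only.

```python
def within(hits, spans):
    """Keep hits that lie entirely inside at least one span."""
    return [h for h in hits if any(s <= h[0] and h[1] <= e for s, e in spans)]

def containing(spans, hits):
    """Keep spans that enclose at least one hit."""
    return [(s, e) for s, e in spans if any(s <= h[0] and h[1] <= e for h in hits)]

words = "the baker met John Baker yesterday".split()
person_spans = [(3, 5)]  # hypothetical <person> annotation over "John Baker"
baker_hits = [(i, i + 1) for i, w in enumerate(words) if w.lower() == "baker"]

print(within(baker_hits, person_spans))      # [(4, 5)]
print(containing(person_spans, baker_hits))  # [(3, 5)]
```

The first baker (the profession) is dropped by `within` because it lies outside every `<person>` span.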

7. Captures and global constraints

| Query | Meaning |
| --- | --- |
| `A:[pos="ADJ"]` | Capture the matched adjective as group A |
| `A:[] "by" B:[] :: A.word = B.word` | Global constraint: A and B must be the same word |
| `<s/> containing A:[] []* B:[] :: A.word = "fluffy" & B.word = "cat"` | Both words occur in the sentence in that order |
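A global constraint filters candidate matches by comparing their captured tokens. The Python sketch below emulates the second query for a plain word list; `same_word_around` is a hypothetical helper, and matching is done on exact strings rather than with BlackLab's insensitive defaults.

```python
def same_word_around(words, middle):
    """Emulate: A:[] "middle" B:[] :: A.word = B.word"""
    results = []
    for i in range(len(words) - 2):
        a, mid, b = words[i], words[i + 1], words[i + 2]
        if mid == middle and a == b:          # the global constraint
            results.append({"span": (i, i + 3), "A": a, "B": b})
    return results

words = "we go step by step and side by side not one by two".split()
for hit in same_word_around(words, "by"):
    print(hit)
# {'span': (2, 5), 'A': 'step', 'B': 'step'}
# {'span': (6, 9), 'A': 'side', 'B': 'side'}
```

The sequence one by two matches the token pattern but is rejected by the constraint, since A and B differ.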

8. Relations querying

Supported from BlackLab v4.0

| Query | Meaning |
| --- | --- |
| `_ -obj-> _` | Any object relation |
| `_ -obj-> "cat"` | Object relation where target is cat |
| `_ -subj-> _ ; -obj-> _` | Same source has both a subject and an object |
| `_ !-obj-> "dog"` | Source has no object relation with target dog |
| `^--> "have"` | Root relation pointing to have |
| `(_ -amod-> "fluffy") -subj-> _` | Multi-level relation chain |
| `rspan(_ -amod-> _, "full")` | Full span covering source and target of amod |
| `rcapture(<s/>)` | Capture all relations within the matched sentence |
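Relation queries can be pictured as filters over `(source, type, target)` triples. The Python sketch below is an illustration under simplifying assumptions: the `Relation` type, the helper names and the example parse are all hypothetical, and words stand in for the token positions that a real corpus would link.

```python
from typing import NamedTuple, Optional

class Relation(NamedTuple):
    source: str    # head word (simplification: real relations link positions)
    rel_type: str
    target: str

def find_relations(relations, rel_type, source: Optional[str] = None,
                   target: Optional[str] = None):
    """Emulate `source -rel_type-> target`; None plays the role of `_`."""
    return [r for r in relations
            if r.rel_type == rel_type
            and source in (None, r.source)
            and target in (None, r.target)]

def sources_with_all(relations, rel_types):
    """Emulate `_ -t1-> _ ; -t2-> _`: sources that have every listed type."""
    by_source = {}
    for r in relations:
        by_source.setdefault(r.source, set()).add(r.rel_type)
    return sorted(s for s, types in by_source.items() if set(rel_types) <= types)

# Hypothetical parse of "I have a fluffy cat"
RELATIONS = [Relation("have", "subj", "I"), Relation("have", "obj", "cat"),
             Relation("cat", "amod", "fluffy")]

print(find_relations(RELATIONS, "obj", target="cat"))
print(sources_with_all(RELATIONS, ["subj", "obj"]))  # ['have']
```

The first call corresponds to `_ -obj-> "cat"`, the second to `_ -subj-> _ ; -obj-> _`.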

9. Parallel corpora querying

Supported from BlackLab v4.0

| Query | Meaning |
| --- | --- |
| `"cat" ==>nl _` | English cat with its Dutch alignment |
| `"cat" ==>nl? _` | Same, but alignment is optional |
| `"fluffy" ==>nl "pluizig"` | English fluffy aligned to Dutch pluizig |
| `w1:"cat" ==>nl w2:_` | Capture source as w1, target as w2 |
| `rfield("cat" ==>nl _, "nl")` | Return only hits from the Dutch field |
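The required and optional alignment operators differ only in what happens when no alignment exists. The Python sketch below shows one reading of that difference. It rests on assumptions throughout: the alignment table is invented, `aligned` is a hypothetical helper, and an unaligned optional match is represented here by `None` on the target side.

```python
# Hypothetical word alignments from the English field to the Dutch ("nl") field.
ALIGNMENTS = {"cat": ["kat"], "fluffy": ["pluizig"], "the": []}

def aligned(source_word, target_word=None, optional=False):
    """Emulate `"source" ==>nl target`; target_word=None stands for `_`."""
    targets = ALIGNMENTS.get(source_word, [])
    if target_word is not None:
        targets = [t for t in targets if t == target_word]
    if targets:
        return [(source_word, t) for t in targets]
    # No alignment found: only the optional form still yields a hit.
    return [(source_word, None)] if optional else []

print(aligned("cat"))                 # [('cat', 'kat')]
print(aligned("fluffy", "pluizig"))   # [('fluffy', 'pluizig')]
print(aligned("the"))                 # []
print(aligned("the", optional=True))  # [('the', None)]
```

Each returned pair plays the role of the `w1` and `w2` captures in the table above.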