Skip to content

Base Client

llm_annotator.clients.base

Abstract interface for LLM provider clients.

ProviderRuntimeOptions dataclass

ProviderRuntimeOptions(
    max_tokens: int | None = None,
    json_schema: dict[str, Any] | None = None,
)

Shared generation options for provider calls; can be subclassed and extended.

Attributes:

Name Type Description
max_tokens int | None

Optional maximum output token count.

json_schema dict[str, Any] | None

Optional JSON schema dict for structured output. When provided, clients that support guided decoding (e.g. vLLM) will constrain generation to valid JSON matching the schema. Other clients will use the schema for post-processing / parsing only.

to_payload

to_payload() -> dict[str, Any]

Convert options to a provider-specific API request payload dict.

Subclasses override this to build the exact kwargs expected by their SDK. The default implementation returns an empty dict.

Returns:

Type Description
dict[str, Any]

A dict of provider-specific request parameters.

Response dataclass

Response(
    text: str,
    stop_reason: str | None = None,
    model: str | None = None,
    provider: Provider | None = None,
    num_output_tokens: int | None = None,
    full_response: object | None = None,
    error: str | None = None,
    error_type: str | None = None,
)

Structured response object returned by provider clients.

Client

Client(
    model: str,
    max_workers: int = 4,
    on_error: OnError = "warn",
)

Bases: ABC, Generic[T_Options]

Base client interface used by all provider adapters.

Initialize a provider client.

Parameters:

Name Type Description Default
model str

Provider-specific model name.

required
max_workers int

Maximum number of concurrent worker threads for batch_generate. Clients that support native batching may ignore this parameter.

4
on_error OnError

Error behavior for provider failures. - "raise": raise a :class:ProviderError (default). - "ignore": return a :class:Response with error set. - "warn": log a warning and return an error :class:Response.

'warn'

__enter__

__enter__() -> Self

Enter the context manager, returning this client instance.

__exit__

__exit__(exc_type: Any, exc: Any, tb: Any) -> None

generate abstractmethod

generate(
    *,
    messages: list[dict[str, str]],
    options: T_Options | None = None,
    gen_kwargs: dict[str, Any] | None = None,
) -> Response

Generate a response from the provider.

Parameters:

Name Type Description Default
messages list[dict[str, str]]

List of message dicts with "role" and "content" keys.

required
options T_Options | None

Provider-specific generation options. NOTE: using this over gen_kwargs is preferred and implemented to facilitate sub-classing and satisfying typing and code-hinting.

None
gen_kwargs dict[str, Any] | None

Additional provider-specific generation kwargs that are not covered by the standard options. Has precedence over options.

None

Returns:

Type Description
Response

A Response object containing the generated response.

batch_generate

batch_generate(
    *,
    messages: list[list[dict[str, str]]],
    options: T_Options | None = None,
    gen_kwargs: dict[str, Any] | None = None,
) -> list[Response]

Generate responses for a batch of inputs.

The default implementation calls :meth:generate sequentially. Override this method in subclasses that support native batching (e.g. vLLM offline and vLLM server) for better throughput.

Parameters:

Name Type Description Default
messages list[list[dict[str, str]]]

List of message lists, where each message dict has "role" and "content" keys.

required
options T_Options | None

Provider-specific generation options.

None
gen_kwargs dict[str, Any] | None

Additional provider-specific generation kwargs that are not covered by the standard options. Has precedence over options.

None

Returns:

Type Description
list[Response]

A list of Response objects containing the generated responses.

warm_up

warm_up(
    *,
    system_message: str | None = None,
    prompt_prefix: str | None = None,
    options: T_Options | None = None,
) -> None

Prime the client before the main workload (no-op by default).

Override in clients that benefit from a warm-up pass (e.g. :class:~llm_annotator.clients.VLLMOfflineClient uses this to prime the KV-cache with a shared prefix before the first real batch).

Parameters:

Name Type Description Default
system_message str | None

Optional system message shared across all requests.

None
prompt_prefix str | None

Optional fixed prefix that starts every user turn.

None
options T_Options | None

Optional generation options used to derive the warm-up params.

None

destroy

destroy() -> None