Base Client¶
llm_annotator.clients.base
¶
Abstract interface for LLM provider clients.
ProviderRuntimeOptions
dataclass
¶
Shared generation options for provider calls; can be subclassed and extended.
Attributes:
| Name | Type | Description |
|---|---|---|
max_tokens |
int | None
|
Optional maximum output token count. |
json_schema |
dict[str, Any] | None
|
Optional JSON schema dict for structured output. When provided, clients that support guided decoding (e.g. vLLM) will constrain generation to valid JSON matching the schema. Other clients will use the schema for post-processing / parsing only. |
to_payload
¶
Convert options to a provider-specific API request payload dict.
Subclasses override this to build the exact kwargs expected by their SDK. The default implementation returns an empty dict.
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
A dict of provider-specific request parameters. |
View source on GitHub: src/llm_annotator/clients/base.py lines 41–50
Response
dataclass
¶
Response(
text: str,
stop_reason: str | None = None,
model: str | None = None,
provider: Provider | None = None,
num_output_tokens: int | None = None,
full_response: object | None = None,
error: str | None = None,
error_type: str | None = None,
)
Structured response object returned by provider clients.
Client
¶
Bases: ABC, Generic[T_Options]
Base client interface used by all provider adapters.
Initialize a provider client.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model
|
str
|
Provider-specific model name. |
required |
max_workers
|
int
|
Maximum number of concurrent worker threads for |
4
|
on_error
|
OnError
|
Error behavior for provider failures.
- |
'warn'
|
View source on GitHub: src/llm_annotator/clients/base.py lines 72–96
__enter__
¶
Enter the context manager, returning this client instance.
View source on GitHub: src/llm_annotator/clients/base.py lines 160–162
__exit__
¶
Exit the context manager cleanup.
View source on GitHub: src/llm_annotator/clients/base.py lines 164–166
generate
abstractmethod
¶
generate(
*,
messages: list[dict[str, str]],
options: T_Options | None = None,
gen_kwargs: dict[str, Any] | None = None,
) -> Response
Generate a response from the provider.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
list[dict[str, str]]
|
List of message dicts with "role" and "content" keys. |
required |
options
|
T_Options | None
|
Provider-specific generation options. NOTE: using this over gen_kwargs is preferred and implemented to facilitate sub-classing and satisfying typing and code-hinting. |
None
|
gen_kwargs
|
dict[str, Any] | None
|
Additional provider-specific generation kwargs that are not covered by the standard options.
Has precedence over |
None
|
Returns:
| Type | Description |
|---|---|
Response
|
A Response object containing the generated response. |
View source on GitHub: src/llm_annotator/clients/base.py lines 175–198
batch_generate
¶
batch_generate(
*,
messages: list[list[dict[str, str]]],
options: T_Options | None = None,
gen_kwargs: dict[str, Any] | None = None,
) -> list[Response]
Generate responses for a batch of inputs.
The default implementation calls :meth:generate sequentially. Override
this method in subclasses that support native batching (e.g. vLLM offline
and vLLM server) for better throughput.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
messages
|
list[list[dict[str, str]]]
|
List of message lists, where each message dict has "role" and "content" keys. |
required |
options
|
T_Options | None
|
Provider-specific generation options. |
None
|
gen_kwargs
|
dict[str, Any] | None
|
Additional provider-specific generation kwargs that are not covered by the standard options.
Has precedence over |
None
|
Returns:
| Type | Description |
|---|---|
list[Response]
|
A list of Response objects containing the generated responses. |
View source on GitHub: src/llm_annotator/clients/base.py lines 200–229
warm_up
¶
warm_up(
*,
system_message: str | None = None,
prompt_prefix: str | None = None,
options: T_Options | None = None,
) -> None
Prime the client before the main workload (no-op by default).
Override in clients that benefit from a warm-up pass (e.g.
:class:~llm_annotator.clients.VLLMOfflineClient uses this to prime
the KV-cache with a shared prefix before the first real batch).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
system_message
|
str | None
|
Optional system message shared across all requests. |
None
|
prompt_prefix
|
str | None
|
Optional fixed prefix that starts every user turn. |
None
|
options
|
T_Options | None
|
Optional generation options used to derive the warm-up params. |
None
|
View source on GitHub: src/llm_annotator/clients/base.py lines 231–248
destroy
¶
Clean up any resources used by the client.
View source on GitHub: src/llm_annotator/clients/base.py lines 250–251