Skip to content

OpenAI Client

llm_annotator.clients.openai_client

OpenAI provider implementation.

OpenAIRuntimeOptions dataclass

OpenAIRuntimeOptions(
    max_tokens: int | None = None,
    json_schema: dict[str, Any] | None = None,
    frequency_penalty: float | None = None,
    reasoning_effort: Literal[
        "none", "minimal", "low", "medium", "high", "xhigh"
    ]
    | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    presence_penalty: float | None = None,
)

Bases: ProviderRuntimeOptions

frequency_penalty class-attribute instance-attribute

frequency_penalty: float | None = None

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

reasoning_effort class-attribute instance-attribute

reasoning_effort: (
    Literal[
        "none", "minimal", "low", "medium", "high", "xhigh"
    ]
    | None
) = None

Only for supported reasoning models. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.

temperature class-attribute instance-attribute

temperature: float | None = None

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

top_p class-attribute instance-attribute

top_p: float | None = None

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

presence_penalty class-attribute instance-attribute

presence_penalty: float | None = None

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

OpenAIClient

OpenAIClient(
    model: str,
    max_workers: int = 4,
    base_url: str | None = None,
    api_key: str | None = None,
    on_error: OnError = "warn",
)

Bases: Client[T_OpenAIOptions]

Client wrapper for OpenAI APIs.

Initialize the OpenAI client.

Parameters:

Name Type Description Default
model str

OpenAI model identifier.

required
max_workers int

Maximum number of concurrent worker threads for batch_generate. Lower this value if you are getting rate limited.

4
base_url str | None

Base URL for the OpenAI API endpoint.

None
api_key str | None

OpenAI API key. If omitted, the SDK will use OPENAI_API_KEY from the environment.

None
on_error OnError

Error behavior when generation fails. Valid options are: - "raise": raise a :class:ProviderError (default). - "ignore": return a :class:Response with error set. - "warn": log a warning and return an error :class:Response.

'warn'

destroy

destroy() -> None

generate

generate(
    *,
    messages: list[dict[str, str]],
    options: T_OpenAIOptions | None = None,
    gen_kwargs: dict[str, Any] | None = None,
) -> Response

Generate a response using OpenAI.

Parameters:

Name Type Description Default
messages list[dict[str, str]]

List of message dictionaries.

required
options T_OpenAIOptions | None

Optional generation configuration.

None
gen_kwargs dict[str, Any] | None

Additional provider-specific generation kwargs that are not covered by the standard options. Has precedence over options.

None

Returns:

Type Description
Response

A Response object containing the generated response.

Raises:

Type Description
ProviderError

If the provider call fails.

batch_generate

batch_generate(
    *,
    messages: list[list[dict[str, str]]],
    options: T_OpenAIOptions | None = None,
    gen_kwargs: dict[str, Any] | None = None,
    use_batch_api: bool = False,
    poll_interval: float = 10.0,
) -> list[Response]

Generate responses for a batch of inputs.

By default, requests are dispatched in parallel using a thread pool. When use_batch_api=True, the OpenAI Batch API is used instead: all requests are submitted as a single batch job and results are retrieved once the job completes. The Batch API supports a completion window of up to 24 hours and offers lower cost, but adds latency.

Parameters:

Name Type Description Default
messages list[list[dict[str, str]]]

List of message lists, one per request.

required
options T_OpenAIOptions | None

Optional generation configuration.

None
gen_kwargs dict[str, Any] | None

Additional provider-specific generation kwargs that are not covered by the standard options. Has precedence over options.

None
use_batch_api bool

When True, use the OpenAI Batch API instead of concurrent individual requests. Defaults to False.

False
poll_interval float

Seconds between batch status polls. Only used when use_batch_api=True. Defaults to 10.0.

10.0

Returns:

Type Description
list[Response]

A list of Response objects in the same order as the input.

Raises:

Type Description
ProviderError

If any individual request fails.