Adds one or more columns to .data that are produced by a large language model.
Usage
llm_mutate(
  .data,
  output,
  prompt = NULL,
  .messages = NULL,
  .config,
  .system_prompt = NULL,
  .before = NULL,
  .after = NULL,
  .return = c("columns", "text", "object"),
  .structured = FALSE,
  .schema = NULL,
  .fields = NULL,
  ...
)
Arguments
- .data
A data.frame / tibble.
- output
Unquoted name that becomes the new column (generative) or the prefix for embedding columns.
- prompt
Optional glue template string for a single user turn; reference any columns in .data (e.g. "{id}. {question}\nContext: {context}"). Ignored if .messages is supplied.
- .messages
Optional named character vector of glue templates used to build a multi-turn message, with roles in c("system", "user", "assistant", "file"). Values are glue templates evaluated per row; all can reference multiple columns. For multimodal input, use role "file" with a column containing a path template.
- .config
An llm_config object (generative or embedding).
- .system_prompt
Optional system message sent with every request when .messages does not include a system entry.
- .before, .after
Standard dplyr::relocate helpers controlling where the generated column(s) are placed.
- .return
One of c("columns", "text", "object"). In generative mode, controls how results are added: "columns" (default) adds the text plus diagnostic columns; "text" adds a single text column; "object" adds a list-column of llmr_response objects. See the sketch after this list.
- .structured
Logical. If TRUE, enables structured JSON output with automatic parsing. Requires .schema to be provided. When enabled, this is equivalent to calling llm_mutate_structured(). Default is FALSE.
- .schema
Optional JSON Schema (R list). When .structured = TRUE, this schema is sent to the provider for validation and used for local parsing. When NULL, only JSON mode is enabled (no strict schema validation).
- .fields
Optional character vector of fields to extract from the parsed JSON. Supports nested paths (e.g., "user.name" or "/data/items/0"). When NULL and .schema is provided, all top-level schema properties are auto-extracted. Set to FALSE to skip field extraction entirely.
- ...
Passed to the underlying calls: call_llm_broadcast() in generative mode, get_batched_embeddings() in embedding mode.
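A minimal sketch of the .return variants (using the df and cfg defined in the Examples below; the commented column shapes follow the argument descriptions above):

df |> llm_mutate(ans, prompt = "{question}", .config = cfg, .return = "text")
# adds a single character column `ans`

df |> llm_mutate(ans, prompt = "{question}", .config = cfg, .return = "object")
# adds a list-column `ans` of llmr_response objects; the default "columns"
# would instead add `ans` plus the diagnostic columns listed in Details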
Details
- Multi-column injection: templating is NA-safe (NA -> empty string).
- Multi-turn templating: supply .messages = c(system = ..., user = ..., file = ...). Duplicate role names are allowed (e.g., two user turns).
- Generative mode: one request per row via call_llm_broadcast(). Parallel execution follows the active future plan; see setup_llm_parallel() and the sketch after this list.
- Embedding mode: the per-row text is embedded via get_batched_embeddings(). The result expands to numeric columns named paste0(<output>, 1:N). If all rows fail to embed, a single <output>1 column of NA is returned.
- Diagnostic columns use suffixes: _finish, _sent, _rec, _tot, _reason, _ok, _err, _id, _status, _ecode, _param, _t.
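A sketch of generative mode under an explicit parallel plan. The workers argument shown for setup_llm_parallel() is an assumption, and the diagnostic column names assume the output name is prefixed to the suffixes above:

setup_llm_parallel(workers = 4)  # configure the future plan (assumed signature)
out <- df |> llm_mutate(answer, prompt = "{question}", .config = cfg)
names(out)
# e.g. "id" "question" "hint" "answer" "answer_finish" "answer_sent" ... "answer_ok" ...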
Shorthand
You can supply the output column and prompt in one argument:
df |> llm_mutate(answer = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer = c(system = "One word.", user = "{question}"), .config = cfg)
This is equivalent to:
df |> llm_mutate(answer, prompt = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer, .messages = c(system = "One word.", user = "{question}"), .config = cfg)
Examples
if (FALSE) { # \dontrun{
library(dplyr)

df <- tibble::tibble(
  id = 1:2,
  question = c("Capital of France?", "Author of 1984?"),
  hint = c("European city", "English novelist")
)

cfg <- llm_config("openai", "gpt-4o-mini", temperature = 0)

# Generative: single-turn with multi-column injection
df |>
  llm_mutate(
    answer,
    prompt = "{question} (hint: {hint})",
    .config = cfg,
    .system_prompt = "Respond in one word."
  )

# Generative: multi-turn via .messages (system + user)
df |>
  llm_mutate(
    advice,
    .messages = c(
      system = "You are a helpful zoologist. Keep answers short.",
      user = "What is a key fact about this? {question} (hint: {hint})"
    ),
    .config = cfg
  )

# Multimodal: include an image path with role 'file'
pics <- tibble::tibble(
  img = c("inst/extdata/cat.png", "inst/extdata/dog.jpg"),
  prompt = c("Describe the image.", "Describe the image.")
)
pics |>
  llm_mutate(
    vision_desc,
    .messages = c(user = "{prompt}", file = "{img}"),
    .config = llm_config("openai", "gpt-4.1-mini")
  )

# Embeddings: output name becomes the prefix of embedding columns
emb_cfg <- llm_config("voyage", "voyage-3.5-lite", embedding = TRUE)
df |>
  llm_mutate(
    vec,
    prompt = "{question}",
    .config = emb_cfg,
    .after = id
  )

# Structured output: using .structured = TRUE (equivalent to llm_mutate_structured)
schema <- list(
  type = "object",
  properties = list(
    answer = list(type = "string"),
    confidence = list(type = "number")
  ),
  required = list("answer", "confidence")
)
df |>
  llm_mutate(
    result,
    prompt = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema
  )

# Structured with shorthand
df |>
  llm_mutate(
    result = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema
  )
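
# .fields: extract only selected fields from the parsed JSON (a sketch
# following the .fields documentation; nested paths such as "user.name"
# are also supported)
df |>
  llm_mutate(
    result = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema,
    .fields = c("answer")
  )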
} # }