Adds one or more columns to .data that are produced by a large language model (LLM).
Usage
llm_mutate(
.data,
output,
prompt = NULL,
.messages = NULL,
.config,
.system_prompt = NULL,
.before = NULL,
.after = NULL,
.return = c("columns", "text", "object"),
.structured = FALSE,
.schema = NULL,
.fields = NULL,
...
)

Arguments
- .data
A data.frame / tibble.
- output
Unquoted name that becomes the new column (generative) or the prefix for embedding columns.
- prompt
Optional glue template string for a single user turn; reference any columns in
.data (e.g. "{id}. {question}\nContext: {context}"). Ignored if .messages is supplied.
- .messages
Optional named character vector of glue templates to build a multi-turn message, using roles in
c("system","user","assistant","file"). Values are glue templates evaluated per-row; all can reference multiple columns. For multimodal, use role"file"with a column containing a path template.- .config
An llm_config object (generative or embedding).
- .system_prompt
Optional system message sent with every request when
.messages does not include a system entry.
- .before, .after
Standard dplyr::relocate helpers controlling where the generated column(s) are placed.
- .return
One of
c("columns","text","object"). For generative mode, controls how results are added."columns"(default) adds text plus diagnostic columns;"text"adds a single text column;"object"adds a list-column ofllmr_responseobjects.- .structured
Logical. If
TRUE, enables structured JSON output with automatic parsing. Requires .schema to be provided. When enabled, this is equivalent to calling llm_mutate_structured(). Default is FALSE.
- .schema
Optional JSON Schema (R list). When
.structured = TRUE, this schema is sent to the provider for validation and used for local parsing. When NULL, only JSON mode is enabled (no strict schema validation).
- .fields
Optional character vector of fields to extract from parsed JSON. Supports nested paths (e.g.,
"user.name"or"/data/items/0"). WhenNULLand.schemais provided, auto-extracts all top-level schema properties. Set toFALSEto skip field extraction entirely.- ...
Passed to the underlying calls:
call_llm_broadcast() in generative mode; get_batched_embeddings() in embedding mode.
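A hedged sketch of the .return modes (cfg and df as defined in the Examples below; diagnostic names assume the suffixes from Details attach to the output column name):
df |> llm_mutate(answer, prompt = "{question}", .config = cfg)                      # "columns": answer plus e.g. answer_finish, answer_ok
df |> llm_mutate(answer, prompt = "{question}", .config = cfg, .return = "text")    # single text column
df |> llm_mutate(answer, prompt = "{question}", .config = cfg, .return = "object")  # list-column of llmr_response objects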
Details
- Multi-column injection: templating is NA-safe (NA -> empty string).
- Multi-turn templating: supply .messages = c(system = ..., user = ..., file = ...). Duplicate role names are allowed (e.g., two user turns); see the sketch in the Examples.
- Generative mode: one request per row via call_llm_broadcast(). Parallel execution follows the active future plan; see setup_llm_parallel() and the sketch after this list.
- Embedding mode: the per-row text is embedded via get_batched_embeddings(). The result expands to numeric columns named paste0(<output>, 1:N). If all rows fail to embed, a single <output>1 column of NA is returned.
- Diagnostic columns use the suffixes _finish, _sent, _rec, _tot, _reason, _ok, _err, _id, _status, _ecode, _param, _t.
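A minimal sketch of parallel generative calls, assuming a standard future plan (setup_llm_parallel() is the package helper referenced above; the worker count here is illustrative):
library(future)
plan(multisession, workers = 4)  # assumption: four background R sessions; one request per row fans out across them
df |> llm_mutate(answer, prompt = "{question}", .config = cfg)
plan(sequential)                 # restore serial execution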
Shorthand
You can supply the output column and prompt in one argument:
df |> llm_mutate(answer = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer = c(system = "One word.", user = "{question}"), .config = cfg)

This is equivalent to:
df |> llm_mutate(answer, prompt = "{question} (hint: {hint})", .config = cfg)
df |> llm_mutate(answer, .messages = c(system = "One word.", user = "{question}"), .config = cfg)

Examples
if (FALSE) { # \dontrun{
library(dplyr)
df <- tibble::tibble(
id = 1:2,
question = c("Capital of France?", "Author of 1984?"),
hint = c("European city", "English novelist")
)
cfg <- llm_config("openai", "gpt-4o-mini",
temperature = 0)
# Generative: single-turn with multi-column injection
df |>
llm_mutate(
answer,
prompt = "{question} (hint: {hint})",
.config = cfg,
.system_prompt = "Respond in one word."
)
# Generative: multi-turn via .messages (system + user)
df |>
llm_mutate(
advice,
.messages = c(
system = "You are a helpful zoologist. Keep answers short.",
user = "What is a key fact about this? {question} (hint: {hint})"
),
.config = cfg
)
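# Duplicate role names (hedged sketch): the Details section permits
# repeated roles, e.g. two 'user' turns in a single request
df |>
  llm_mutate(
    follow_up,
    .messages = c(
      system = "Answer briefly.",
      user = "{question}",
      user = "Now give one supporting detail. (hint: {hint})"
    ),
    .config = cfg
  )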
# Multimodal: include an image path with role 'file'
pics <- tibble::tibble(
img = c("inst/extdata/cat.png", "inst/extdata/dog.jpg"),
prompt = c("Describe the image.", "Describe the image.")
)
pics |>
llm_mutate(
vision_desc,
.messages = c(user = "{prompt}", file = "{img}"),
.config = llm_config("openai","gpt-4.1-mini")
)
# Embeddings: output name becomes the prefix of embedding columns
emb_cfg <- llm_config("voyage", "voyage-3.5-lite",
embedding = TRUE)
df |>
llm_mutate(
vec,
prompt = "{question}",
.config = emb_cfg,
.after = id
)
# Structured output: using .structured = TRUE (equivalent to llm_mutate_structured)
schema <- list(
type = "object",
properties = list(
answer = list(type = "string"),
confidence = list(type = "number")
),
required = list("answer", "confidence")
)
df |>
llm_mutate(
result,
prompt = "{question}",
.config = cfg,
.structured = TRUE,
.schema = schema
)
# Structured with shorthand
df |>
llm_mutate(
result = "{question}",
.config = cfg,
.structured = TRUE,
.schema = schema
)
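# Field extraction (hedged sketch): .fields limits which parsed JSON
# properties become columns; the path must match your schema
df |>
  llm_mutate(
    result = "{question}",
    .config = cfg,
    .structured = TRUE,
    .schema = schema,
    .fields = c("answer")  # keep only the top-level 'answer' property
  )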
} # }