afnio.autodiff.evaluator

Classes

DeterministicEvaluator(*args, **kwargs)

Evaluates predictions deterministically using a user-defined evaluation function within the afnio framework, supporting automatic differentiation.

ExactMatchEvaluator(*args, **kwargs)

Evaluates predictions using exact matching within the afnio framework, supporting automatic differentiation.

LMJudgeEvaluator(*args, **kwargs)

Implements an evaluation of a model prediction using a language model (LM) as the judge within the afnio framework, supporting automatic differentiation.

class afnio.autodiff.evaluator.DeterministicEvaluator(*args, **kwargs)[source]

Bases: Function

Evaluates predictions deterministically using a user-defined evaluation function within the afnio framework, supporting automatic differentiation.

This class inherits from Function and requires both the forward and backward methods to be defined.

The DeterministicEvaluator function computes a score and an explanation based on the prediction and target inputs using a user-defined evaluation function (eval_fn). The evaluation function’s purpose is described by eval_fn_purpose. Outputs include a numerical or textual score and a textual explanation, both wrapped as Variable objects.

The prediction is a Variable. The target can be a string, a list of strings, or a Variable. Each Variable passed as an input argument can have either a scalar or a list .data field, supporting both individual samples and batch processing. For batch processing, the lengths of prediction and target must match.

The success_fn parameter is a user-defined function that returns True when all predictions evaluated by eval_fn are considered successful, and False otherwise. If success_fn returns True, the backward pass will skip gradient calculations and directly return an empty gradient, optimizing computational time.
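
A minimal sketch of a possible success_fn, assuming it is called with the list of per-sample scores produced by eval_fn (the exact callback signature is an assumption, not stated in this reference):

>>> def all_exact(scores):
...     # Hypothetical helper: the batch counts as successful only
...     # when every per-sample exact-match score equals 1.
...     return all(s == 1 for s in scores)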

The reduction_fn parameter specifies the aggregation function to use for scores across a batch of predictions and targets. When specified, the reduction function’s purpose is described using reduction_fn_purpose. If aggregation is not desired, set reduction_fn and reduction_fn_purpose to None.
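
If aggregation is not desired, a hedged sketch of the call might look as follows, reusing the names from the batched example below (keyword usage mirrors that example; per-sample scores are then left unaggregated):

>>> score, explanation = DeterministicEvaluator.apply(
...     prediction,
...     target,
...     exact_match_fn,
...     "exact match",
...     success_fn=None,
...     reduction_fn=None,
...     reduction_fn_purpose=None,
... )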

Example with scalar inputs:

>>> prediction = Variable(
...     data="green",
...     role="color prediction",
...     requires_grad=True
... )
>>> target = "red"
>>> def exact_match_fn(p: str, t: str) -> int:
...     return 1 if p == t else 0
>>> score, explanation = DeterministicEvaluator.apply(
...     prediction,
...     target,
...     exact_match_fn,
...     "exact match",
... )
>>> score.data
0
>>> explanation.data
"The evaluation function, designed for 'exact match', compared the <DATA> field of the predicted variable ('green') with the <DATA> field of the target variable ('red'), resulting in a score: 0."
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

Example with batched inputs:

>>> prediction = Variable(
...     data=["green", "blue"],
...     role="color prediction",
...     requires_grad=True
... )
>>> target = ["red", "blue"]
>>> def exact_match_fn(p: str, t: str) -> int:
...     return 1 if p == t else 0
>>> score, explanation = DeterministicEvaluator.apply(
...     prediction,
...     target,
...     exact_match_fn,
...     "exact match",
...     reduction_fn=sum,
...     reduction_fn_purpose="summation"
... )
>>> score.data
1
>>> explanation.data
"The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

classmethod apply(*args, **kwargs)

Applies the forward function of the custom Function class.

This method handles both conventions: setting up the ctx (context) object in a separate setup_context method, or within the forward method itself.

static backward(ctx, score_grad_output, explanation_grad_output)[source]

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-Variable outputs of the forward function), and it should return as many variables as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Variable, or is a Variable that does not require gradients, you can simply pass None as the gradient for that input.

The context can be used to retrieve variables saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
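
For illustration, a hedged sketch of a backward honoring this contract for a hypothetical two-input forward(prediction, target) (the feedback construction itself is elided):

@staticmethod
def backward(ctx, score_grad_output, explanation_grad_output):
    # Return one value per input to forward(), in order. Suppose
    # forward took (prediction, target): only `prediction` may need
    # a gradient, and `target` always gets None.
    pred_grad = None
    if ctx.needs_input_grad[0]:
        pred_grad = ...  # build the textual feedback for `prediction`
    return pred_grad, None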

static forward(ctx, prediction, target, eval_fn, eval_fn_purpose, success_fn, reduction_fn, reduction_fn_purpose)[source]

Define the forward of the custom autodiff Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass

  • It must accept a context ctx as the first argument, followed by any number of arguments (variables or other types).

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass

  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the afnio.autodiff.Function.setup_context() staticmethod to handle setting up the ctx object. Here, output is the output of the forward and inputs is a Tuple of the inputs to the forward.

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Variables should not be stored directly on ctx; instead, they should be saved with ctx.save_for_backward() if they are intended to be used in backward.
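
A hedged sketch of Usage 2, saving a Variable for the backward pass via setup_context (the names are illustrative, not part of this API):

@staticmethod
def forward(prediction, target):
    ...  # compute and return (score, explanation)

@staticmethod
def setup_context(ctx, inputs, output):
    # `inputs` is the tuple of arguments passed to forward.
    prediction, target = inputs
    # Save the Variable through the sanctioned channel rather than
    # storing it directly on ctx.
    ctx.save_for_backward(prediction)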

static setup_context(ctx, inputs, output)

There are two ways to define the forward pass of an autodiff.Function.

Either:

  1. Override forward with the signature forward(ctx, *args, **kwargs). setup_context is not overridden. Setting up the ctx for backward happens inside the forward.

  2. Override forward with the signature forward(*args, **kwargs) and override setup_context. Setting up the ctx for backward happens inside setup_context (as opposed to inside the forward).

class afnio.autodiff.evaluator.ExactMatchEvaluator(*args, **kwargs)[source]

Bases: Function

Evaluates predictions using exact matching within the afnio framework, supporting automatic differentiation.

This class inherits from Function and requires both the forward and backward methods to be defined.

The ExactMatchEvaluator function computes a score and an explanation by comparing the data fields of a prediction and a target for an exact match. For each sample:

  • A score of 1 is assigned for an exact match.

  • A score of 0 is assigned otherwise.

The prediction is a Variable. The target can be a string, a list of strings, or a Variable. Each Variable passed as an input argument can have either a scalar or a list .data field, supporting both individual samples and batch processing. For batch processing, the lengths of prediction and target must match.

If batched inputs are provided, the scores can be aggregated using an optional reduction_fn, such as sum. The purpose of the reduction is described using reduction_fn_purpose. If aggregation is not desired, set reduction_fn and reduction_fn_purpose to None.
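
Since the signature shown below defaults to reduction_fn=sum and reduction_fn_purpose='summation', here is a hedged sketch of opting out of aggregation, reusing the names from the batched example below:

>>> score, explanation = ExactMatchEvaluator.apply(
...     prediction,
...     target,
...     reduction_fn=None,
...     reduction_fn_purpose=None,
... )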

Example with scalar inputs:

>>> prediction = Variable(
...     data="green",
...     role="color prediction",
...     requires_grad=True
... )
>>> target = "red"
>>> score, explanation = ExactMatchEvaluator.apply(prediction, target)
>>> score.data
0
>>> explanation.data
"The evaluation function, designed for 'exact match', compared the <DATA> field of the predicted variable ('green') with the <DATA> field of the target variable ('red'), resulting in a score: 0."
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

Example with batched inputs:

>>> prediction = Variable(
...     data=["green", "blue"],
...     role="color prediction",
...     requires_grad=True
... )
>>> target = ["red", "blue"]
>>> score, explanation = ExactMatchEvaluator.apply(prediction, target)
>>> score.data
1
>>> explanation.data
"The evaluation function, designed for 'exact match', compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch, generating individual scores for each pair. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."
>>> explanation.backward()
>>> prediction.grad[0].data
"Reassess the criteria that led to the initial prediction of 'green'."

classmethod apply(*args, **kwargs)

Applies the forward function of the custom Function class.

This method handles both conventions: setting up the ctx (context) object in a separate setup_context method, or within the forward method itself.

static backward(ctx, score_grad_output, explanation_grad_output)[source]

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-Variable outputs of the forward function), and it should return as many variables as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Variable, or is a Variable that does not require gradients, you can simply pass None as the gradient for that input.

The context can be used to retrieve variables saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, prediction, target, reduction_fn=<built-in function sum>, reduction_fn_purpose='summation')[source]

Define the forward of the custom autodiff Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass

  • It must accept a context ctx as the first argument, followed by any number of arguments (variables or other types).

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass

  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the afnio.autodiff.Function.setup_context() staticmethod to handle setting up the ctx object. Here, output is the output of the forward and inputs is a Tuple of the inputs to the forward.

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Variables should not be stored directly on ctx; instead, they should be saved with ctx.save_for_backward() if they are intended to be used in backward.

static setup_context(ctx, inputs, output)

There are two ways to define the forward pass of an autodiff.Function.

Either:

  1. Override forward with the signature forward(ctx, *args, **kwargs). setup_context is not overridden. Setting up the ctx for backward happens inside the forward.

  2. Override forward with the signature forward(*args, **kwargs) and override setup_context. Setting up the ctx for backward happens inside setup_context (as opposed to inside the forward).

class afnio.autodiff.evaluator.LMJudgeEvaluator(*args, **kwargs)[source]

Bases: Function

Implements an evaluation of a model prediction using a language model (LM) as the judge within the afnio framework, supporting automatic differentiation.

This class inherits from Function and requires both the forward and backward methods to be defined.

This function returns a score and an explanation, both as Variable objects, by comparing a prediction against a target (when present) using a composite prompt. The prompt is constructed from a list of messages and optional inputs, which can dynamically populate placeholders in the message templates. The evaluation process leverages the specified forward_model_client to perform the LM-based assessment.

The prediction is a Variable. The target can be a string, a list of strings, or a Variable. Similarly, the inputs dictionary can include strings, lists of strings, or Variables. Each Variable passed as an input argument can have either a scalar or a list .data field, supporting both individual samples and batch processing. For batch processing, the lengths of prediction, target, and any batched inputs must match.
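
As a hedged illustration of the inputs mechanism, a placeholder such as {context} in a message template could be populated from the inputs dictionary (the placeholder name and values here are invented for illustration; model, task, format, and prediction follow the examples below):

>>> user = Variable(
...     "<CONTEXT>{context}</CONTEXT><PREDICTION>{prediction}</PREDICTION>",
...     role="user query"
... )
>>> messages = [
...     {"role": "system", "content": [task, format]},
...     {"role": "user", "content": [user]}
... ]
>>> score, explanation = LMJudgeEvaluator.apply(
...     model,
...     messages,
...     prediction,
...     inputs={"context": "Translate the English greeting into Italian."},
... )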

The success_fn parameter is a user-defined function that returns True when all predictions evaluated by the LM as Judge are considered successful, and False otherwise. If success_fn returns True, the backward pass will skip gradient calculations and directly return an empty gradient, optimizing computational time.
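
A minimal hedged sketch of a success_fn for boolean judge scores, again assuming the callback receives the list of per-sample scores:

>>> def all_passed(scores):
...     # Hypothetical: every LM-judge score must be truthy.
...     return all(scores)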

If you are processing a batch of predictions and targets, you can use the reduction_fn to aggregate individual scores (e.g., using sum to compute a total score). The reduction_fn_purpose parameter is a brief description of the aggregation’s purpose (e.g., “summation”). If you don’t want any aggregation, set both reduction_fn and reduction_fn_purpose to None.

The function operates in two modes controlled by eval_mode:

  • eval_mode=True (default) – Computes gradients for prediction only. Use it for direct feedback on predictions.

  • eval_mode=False – Computes gradients for messages and inputs. Use it to optimize the evaluator or align it with human evaluation datasets, as sketched below.
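
A hedged sketch of this second mode, where feedback flows to the prompt Variables rather than the prediction (names follow the examples below):

>>> score, explanation = LMJudgeEvaluator.apply(
...     model,
...     messages,
...     prediction,
...     target,
...     eval_mode=False,
... )
>>> explanation.backward()  # gradients now accumulate on `task` and other prompt Variables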

Additional model parameters, such as temperature, max tokens, or seed values, can be passed through completion_args to customize the LM's behavior.

Example with scalar inputs:

>>> task = Variable(
...     "Evaluate if the translation is accurate.",
...     role="evaluation task",
...     requires_grad=True
... )
>>> format = Variable(
...     "Provide 'score' (true/false) and 'explanation' in JSON.",
...     role="output format"
... )
>>> user = Variable(
...     "<PREDICTION>{prediction}</PREDICTION><TARGET>{target}</TARGET>",
...     role="user query"
... )
>>> prediction = Variable(
...     "Hola Mundo",
...     role="translated text",
...     requires_grad=True
... )
>>> target = Variable("Ciao Mondo", role="expected output")
>>> messages = [
...     {"role": "system", "content": [task, format]},
...     {"role": "user", "content": [user]}
... ]
>>> score, explanation = LMJudgeEvaluator.apply(
...     model,
...     messages,
...     prediction,
...     target,
...     temperature=0.5,
... )
>>> score.data
False
>>> explanation.data
'The translated text is in Spanish, but the expected is in Italian.'
>>> explanation.backward()
>>> prediction.grad[0].data
'The translated text should be in Italian.'

Example with batched inputs:

>>> task = Variable(
...     "Evaluate if the translation is accurate.",
...     role="evaluation task",
...     requires_grad=True
... )
>>> format = Variable(
...     "Provide 'score' (true/false) and 'explanation' in JSON.",
...     role="output format"
... )
>>> user = Variable(
...     "<PREDICTION>{prediction}</PREDICTION><TARGET>{target}</TARGET>",
...     role="user query"
... )
>>> prediction = Variable(
...     data=["Hola Mundo", "Salve a tutti"],
...     role="translated text",
...     requires_grad=True,
... )
>>> target = ["Ciao Mondo", "Salve a tutti"]
>>> score, explanation = LMJudgeEvaluator.apply(
...     model,
...     messages,
...     prediction,
...     target,
...     reduction_fn=sum,
...     reduction_fn_purpose="summation",
... )
>>> score.data
1
>>> explanation.data
"The evaluation function, designed using an LM as the judge, compared the <DATA> fields of the predicted variable and the target variable across all samples in the batch. These scores were then aggregated using the reduction function 'summation', resulting in a final aggregated score: 1."

classmethod apply(*args, **kwargs)

Applies the forward function of the custom Function class.

This method handles both conventions: setting up the ctx (context) object in a separate setup_context method, or within the forward method itself.

static backward(ctx, score_grad_output, explanation_grad_output)[source]

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned (None will be passed in for non-Variable outputs of the forward function), and it should return as many variables as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Variable, or is a Variable that does not require gradients, you can simply pass None as the gradient for that input.

The context can be used to retrieve variables saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, forward_model_client, messages, prediction, target=None, inputs=None, success_fn=None, reduction_fn=<built-in function sum>, reduction_fn_purpose='summation', eval_mode=True, **completion_args)[source]

Define the forward of the custom autodiff Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass

  • It must accept a context ctx as the first argument, followed by any number of arguments (variables or other types).

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass

  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the afnio.autodiff.Function.setup_context() staticmethod to handle setting up the ctx object. Here, output is the output of the forward and inputs is a Tuple of the inputs to the forward.

The context can be used to store arbitrary data that can then be retrieved during the backward pass. Variables should not be stored directly on ctx; instead, they should be saved with ctx.save_for_backward() if they are intended to be used in backward.

static setup_context(ctx, inputs, output)

There are two ways to define the forward pass of an autodiff.Function.

Either:

  1. Override forward with the signature forward(ctx, *args, **kwargs). setup_context is not overridden. Setting up the ctx for backward happens inside the forward.

  2. Override forward with the signature forward(*args, **kwargs) and override setup_context. Setting up the ctx for backward happens inside setup_context (as opposed to inside the forward).