afnio.optim.optimizer#

Classes

Optimizer(params, defaults)

Base class for all optimizers.

class afnio.optim.optimizer.Optimizer(params, defaults)[source]#

Bases: object

Base class for all optimizers.

Warning

Parameters need to be specified as collections that have a deterministic ordering that is consistent between runs. Examples of objects that don’t satisfy those properties are sets and iterators over values of dictionaries.

Parameters:
  • params (iterable) – An iterable of afnio.Variable objects or of dicts. Specifies which Variables should be optimized.

  • defaults (Dict[str, Any]) – A dict containing default values of optimization options (used when a parameter group doesn't specify them).
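The constructor contract described above (a bare iterable of parameters is treated as a single group, and group options missing from a dict are filled in from `defaults`) can be sketched with a minimal stand-in. This is a hypothetical toy, not the real afnio implementation; `ToyOptimizer` and its behavior are illustrative assumptions based on the documented semantics:

```python
from typing import Any, Dict, List


class ToyOptimizer:
    """Hypothetical stand-in mirroring the documented Optimizer contract."""

    def __init__(self, params, defaults: Dict[str, Any]):
        self.defaults = defaults
        self.param_groups: List[Dict[str, Any]] = []
        # Materialize the iterable: it must have a deterministic ordering
        # (hence the warning against sets and dict-value iterators).
        params = list(params)
        if not params:
            raise ValueError("optimizer got an empty parameter list")
        # A bare iterable of parameters becomes a single param group.
        if not isinstance(params[0], dict):
            params = [{"params": params}]
        for group in params:
            self.add_param_group(dict(group))

    def add_param_group(self, param_group: Dict[str, Any]) -> None:
        # Options the group does not specify are taken from the defaults.
        for name, default in self.defaults.items():
            param_group.setdefault(name, default)
        self.param_groups.append(param_group)


# Two groups: the first inherits momentum=2 from defaults, the second
# overrides it.
opt = ToyOptimizer(
    [{"params": ["p0"]}, {"params": ["p1"], "momentum": 5}],
    defaults={"momentum": 2},
)
print([g["momentum"] for g in opt.param_groups])  # [2, 5]
```

The strings `"p0"`/`"p1"` stand in for actual Variable objects; only the group/defaults mechanics are the point here.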

add_param_group(param_group)[source]#

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Parameters:

param_group (dict) – Specifies which Variables should be optimized, along with group-specific optimization options.

clear_grad()[source]#

Resets the gradients of all optimized Variables by setting the .grad attribute of each parameter to an empty list.
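Note that, unlike tensor-based optimizers that zero gradients, the reset here is to an empty list (gradients in this framework are lists of textual Variables). A minimal sketch with hypothetical toy types, assuming that list-of-gradients representation:

```python
class ToyParam:
    """Hypothetical parameter whose .grad is a list of textual gradients."""

    def __init__(self):
        self.grad = []


class ToyOptim:
    """Hypothetical optimizer illustrating the clear_grad contract."""

    def __init__(self, params):
        self.param_groups = [{"params": list(params)}]

    def clear_grad(self):
        for group in self.param_groups:
            for p in group["params"]:
                p.grad = []  # reset to an empty list, not None or zero


p = ToyParam()
p.grad = ["The system prompt should be more specific..."]
opt = ToyOptim([p])
opt.clear_grad()
print(p.grad)  # []
```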

defaults: Dict[str, Any] = {}#
load_state_dict(state_dict, model_clients=None)[source]#

Loads the optimizer state.

Parameters:
  • state_dict (dict) – Optimizer state. Should be an object returned from a call to state_dict().

  • model_clients (dict, optional) – A dictionary mapping model client keys (e.g., ‘fw_model_client’) to their respective instances of BaseModel. These instances will be used to reconstruct any model clients referenced within the optimizer state. If a required model client is missing, an error will be raised with instructions on how to provide the missing client.

Raises:
  • ValueError – If the provided state_dict is invalid, such as when the parameter groups or their sizes do not match the current optimizer configuration.

  • ValueError – If a required model client is missing from the model_clients dictionary, with details about the expected model client type and key.

Example

>>> openai_client = AsyncOpenAI()
>>> optimizer.load_state_dict(saved_state_dict, model_clients={
...     'model_client': openai_client
... })
optimizer_id: Optional[str]#
param_groups: List[Dict[str, Any]] = []#
state: DefaultDict[Variable, Any] = {}#
state_dict()[source]#

Returns the state of the optimizer as a dict.

It contains two entries:

  • state: a Dict holding current optimization state. Its content differs between optimizer classes, but some common characteristics hold. For example, state is saved per parameter, and the parameter itself is NOT saved. state is a Dictionary mapping parameter ids to a Dict with state corresponding to each parameter.

  • param_groups: a List containing all parameter groups, where each parameter group is a Dict. Each parameter group contains metadata specific to the optimizer, such as learning rate and momentum, as well as a List of the parameter IDs of the parameters in the group.

NOTE: The parameter IDs may look like indices, but they are just IDs associating state with param_group. When loading from a state_dict, the optimizer will zip the param_group params (int IDs) and the optimizer param_groups (actual Parameter objects) in order to match state WITHOUT additional verification.

A returned state dict might look something like:

{
    'state': {
        0: {
            'momentum_buffer': [
                (
                    Parameter(data='You are...', role='system prompt', requires_grad=True),
                    [Variable(data='The system prompt should...', role='gradient for system prompt')]
                )
            ]
        },
        1: {
            'momentum_buffer': [
                (
                    Parameter(data='Answer this...', role='instruction prompt', requires_grad=True),
                    [Variable(data='The instruction prompt must...', role='gradient to instruction prompt')]
                )
            ]
        }
    },
    'param_groups': [
        {
            'model_client': {'class_type': 'AsyncOpenAI'},
            'messages': [
                {
                    'role': 'system',
                    'content': [Variable(data='You are part of an optimization system...', role='optimizer system prompt', requires_grad=False)]
                },
                {
                    'role': 'user',
                    'content': [Variable(data='Here is the variable you need...', role='optimizer user prompt', requires_grad=False)]
                }
            ],
            'inputs': {},
            'constraints': [],
            'momentum': 2,
            'completion_args': {'model': 'gpt-4o'},
            'params': [0, 1]
        }
    ]
}
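The positional ID-matching behavior described in the NOTE above (saved integer IDs zipped against the live optimizer's param_groups, with no extra verification) can be sketched as follows. This is an illustrative assumption, not the actual load_state_dict implementation; `remap_state` and the string stand-ins for Parameter objects are hypothetical:

```python
def remap_state(saved_state, saved_groups, live_groups):
    """Map saved integer parameter IDs to live parameter objects.

    IDs are matched purely by position: each saved group's ID list is
    zipped against the corresponding live group's parameter list, with
    no verification that the pairing is meaningful.
    """
    id_to_param = {}
    for saved_g, live_g in zip(saved_groups, live_groups):
        id_to_param.update(zip(saved_g["params"], live_g["params"]))
    # Re-key the per-parameter state from integer IDs to live parameters.
    return {id_to_param[i]: s for i, s in saved_state.items()}


# Shapes mirror the state dict shown above; strings stand in for the
# actual Parameter objects.
saved_state = {0: {"momentum_buffer": ["g0"]}, 1: {"momentum_buffer": ["g1"]}}
saved_groups = [{"params": [0, 1]}]
live_groups = [{"params": ["system_prompt", "instruction_prompt"]}]

remapped = remap_state(saved_state, saved_groups, live_groups)
print(list(remapped))  # ['system_prompt', 'instruction_prompt']
```

Because the match is positional, loading a state dict into an optimizer whose parameters were registered in a different order silently associates state with the wrong parameters, which is why the docs stress deterministic parameter ordering.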
step(closure=None)[source]#

Performs a single optimization step (parameter update).

Parameters:

closure (Callable, optional) – A closure that reevaluates the model and returns the loss as a tuple containing a numerical score and a textual explanation. This closure is optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

Note

Some optimization algorithms need to reevaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, compute the loss, and return it.

Example:

for input, target in dataset:
    def closure():
        optimizer.clear_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss

    optimizer.step(closure)