afnio.autodiff#
Functions

- `backward` – Computes the sum of gradients of given variables with respect to graph leaves.
- afnio.autodiff.backward(variables, grad_variables=None, retain_graph=None, create_graph=False, inputs=None)[source]#
Computes the sum of gradients of given variables with respect to graph leaves.
The graph is differentiated using the chain rule. If any of `variables` are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product would be computed; in this case the function additionally requires specifying `grad_variables`. It should be a sequence of matching length that contains the "vector" in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding variables (`None` is an acceptable value for all variables that don't need gradient variables).

This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it.
Note

Using this method with `create_graph=True` will create a reference cycle between the parameter and its gradient, which can cause a memory leak. We recommend using `autodiff.grad` when creating the graph to avoid this. If you have to use this function, make sure to reset the `.grad` fields of your parameters to `None` after use to break the cycle and avoid the leak.
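A minimal sketch of that cleanup is shown below. It assumes `afnio` is imported under the alias `hf` used in the `no_grad` example further down, and that a `cognitive.Parameter` composes like any other `Variable` under addition; treat it as an illustration, not canonical usage.

    import afnio as hf                     # assumption: `hf` is the afnio top-level alias
    from afnio import autodiff

    w = hf.cognitive.Parameter("xyz")      # leaf parameter (requires_grad=True, see the no_grad example)
    loss = w + w                           # assumption: Parameters participate in the graph like Variables

    # Build the graph of the derivative so higher-order products are possible.
    autodiff.backward(loss, create_graph=True)

    # ... consume w.grad here ...

    # Reset the gradient to break the parameter <-> gradient reference cycle.
    w.grad = None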
Note

When `inputs` are provided, each input must be a leaf variable. If any input is not a leaf, a `RuntimeError` is raised.

- Parameters:
  - variables (Sequence[Variable] or Variable) – Variables of which the derivative will be computed.
  - grad_variables (Sequence[Variable or None] or Variable, optional) – The "vector" in the Jacobian-vector product, usually gradients w.r.t. each element of the corresponding variables. `None` values can be specified for scalar Variables or ones that don't require grad. If a `None` value would be acceptable for all grad_variables, then this argument is optional.
  - retain_graph (bool, optional) – If `False`, the graph used to compute the grads will be freed. Setting this to `True` retains the graph, allowing additional backward calls on the same graph, which is useful, for example, for multi-task learning where you have multiple losses. However, retaining the graph is not needed in nearly all cases and can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
  - create_graph (bool, optional) – If `True`, the graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Defaults to `False`.
  - inputs (Sequence[Variable] or Variable or Sequence[GradientEdge], optional) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Variables will be ignored. If not provided, the gradient is accumulated into all the leaf Variables that were used to compute the `variables`.
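A minimal usage sketch for `backward()` follows. It assumes `afnio` is imported under the alias `hf` used in the `no_grad` example below, that adding two `Variable`s builds a differentiable graph as in that example, and that the result counts as a single-element Variable so `grad_variables` can be omitted; the exact calls are illustrative rather than canonical.

    import afnio as hf                     # assumption: `hf` is the afnio top-level alias
    from afnio import autodiff

    # Leaf variable that should receive a gradient.
    x = hf.Variable("abc", role="variable", requires_grad=True)

    # Small graph; `loss` is assumed to be a single-element Variable.
    loss = x + x

    # First backward pass keeps the graph alive for a second call.
    autodiff.backward(loss, retain_graph=True)

    # Second pass accumulates only into the listed leaf variables.
    autodiff.backward(loss, inputs=[x])

    # Gradients accumulate in the leaf; reset before reuse if needed.
    print(x.grad)
    x.grad = None

Because gradients accumulate across calls, `x.grad` here holds the sum contributed by both passes, matching the "accumulates gradients in the leaves" behavior described above.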
- afnio.autodiff.no_grad()[source]#
Context manager that disables gradient calculation. All operations within this block will not track gradients, making them more memory-efficient.
Disabling gradient calculation is useful for inference, when you are sure that you will not call `Variable.backward()`. It will reduce memory consumption for computations that would otherwise have `requires_grad=True`.

In this mode, the result of every computation will have `requires_grad=False`, even when the inputs have `requires_grad=True`. There is one exception: factory functions, i.e. functions that create a new Variable and take a `requires_grad` kwarg, are NOT affected by this mode.
This context manager is thread local; it will not affect computation in other threads.
Also functions as a decorator.
- Example:

      >>> x = hf.Variable("abc", role="variable", requires_grad=True)
      >>> with hf.no_grad():
      ...     y = x + x
      >>> y.requires_grad
      False
      >>> @hf.no_grad()
      ... def doubler(x):
      ...     return x + x
      >>> z = doubler(x)
      >>> z.requires_grad
      False
      >>> @hf.no_grad
      ... def tripler(x):
      ...     return x + x + x
      >>> z = tripler(x)
      >>> z.requires_grad
      False
      >>> # factory function exception
      >>> with hf.no_grad():
      ...     a = hf.cognitive.Parameter("xyz")
      >>> a.requires_grad
      True
Modules