# Automatic Differentiation

- `Tensors.curl`
- `Tensors.divergence`
- `Tensors.gradient`
- `Tensors.hessian`
- `Tensors.laplace`
- `Tensors.@implement_gradient`
`Tensors` supports forward mode automatic differentiation (AD) of tensorial functions to compute first order derivatives (gradients) and second order derivatives (Hessians). It does this by exploiting the `Dual` number defined in `ForwardDiff.jl`. While `ForwardDiff.jl` can itself be used to differentiate tensor functions, it is a bit awkward, because `ForwardDiff.jl` is written to work with standard Julia `Array`s. One therefore has to pass the input argument as an `Array` to `ForwardDiff.jl`, convert it to a `Tensor`, and then convert the output `Array` back to a `Tensor`. This can also be inefficient, since these `Array`s are allocated on the heap, so one needs to preallocate, which can be annoying.

Instead, it is simpler to use `Tensors`' own AD API for the differentiation. This requires no conversions, and everything is stack allocated, so there is no need to preallocate.
The API for AD in `Tensors` is `gradient(f, A)` and `hessian(f, A)`, where `f` is a function and `A` is a first or second order tensor. For `gradient`, the function can return a scalar, a vector (if the input is a vector), or a second order tensor. For `hessian`, the function should return a scalar.

When evaluating the function with dual numbers, the lower order results (the value for `gradient`; the value and gradient for `hessian`) are obtained automatically, alongside the derivative. To obtain them, `gradient` and `hessian` accept a third argument, a `Symbol`. Note that the symbol is only used to dispatch to the correct method, so it can be any symbol. In the examples the symbol `:all` is used to obtain all the lower order derivatives and values.
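Since the third argument only selects the method, any symbol behaves like `:all`. A small sketch of this, assuming `Tensors` is loaded:

```julia
using Tensors

x = rand(Vec{2})

# `:all` returns the gradient together with the function value
∇f, f = gradient(norm, x, :all)

# The symbol is only used for dispatch, so any other symbol gives the same result
∇f2, f2 = gradient(norm, x, :anything)

∇f2 == ∇f && f2 == f  # true
```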
## `Tensors.gradient` — Function

```julia
gradient(f::Function, v::Union{SecondOrderTensor, Vec, Number})
gradient(f::Function, v::Union{SecondOrderTensor, Vec, Number}, :all)
```

Computes the gradient of the input function. If the (pseudo-)keyword `:all` is given, the value of the function is also returned as a second output argument.

**Examples**

```julia
julia> A = rand(SymmetricTensor{2, 2});

julia> ∇f = gradient(norm, A)
2×2 SymmetricTensor{2, 2, Float64, 3}:
 0.434906  0.56442
 0.56442   0.416793

julia> ∇f, f = gradient(norm, A, :all);
```
## `Tensors.hessian` — Function

```julia
hessian(f::Function, v::Union{SecondOrderTensor, Vec, Number})
hessian(f::Function, v::Union{SecondOrderTensor, Vec, Number}, :all)
```

Computes the Hessian of the input function. If the (pseudo-)keyword `:all` is given, the lower order results (gradient and value) of the function are also returned as a second and third output argument.

**Examples**

```julia
julia> A = rand(SymmetricTensor{2, 2});

julia> ∇∇f = hessian(norm, A)
2×2×2×2 SymmetricTensor{4, 2, Float64, 9}:
[:, :, 1, 1] =
  0.596851  -0.180684
 -0.180684  -0.133425

[:, :, 2, 1] =
 -0.180684   0.133546
  0.133546  -0.173159

[:, :, 1, 2] =
 -0.180684   0.133546
  0.133546  -0.173159

[:, :, 2, 2] =
 -0.133425  -0.173159
 -0.173159   0.608207

julia> ∇∇f, ∇f, f = hessian(norm, A, :all);
```
## `Tensors.divergence` — Function

```julia
divergence(f, x)
```

Calculates the divergence of the vector field `f` at the point `x`.

**Examples**

```julia
julia> f(x) = 2x;

julia> x = rand(Vec{3});

julia> divergence(f, x)
6.0
```
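The result can be checked against the identity that the divergence of a vector field equals the trace of its gradient; a brief sketch:

```julia
using Tensors

f(x) = 2x
x = rand(Vec{3})

# div f = tr(∇f); for f(x) = 2x the gradient is 2I, so the trace is 6
divergence(f, x) ≈ tr(gradient(f, x))  # true
```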
## `Tensors.curl` — Function

```julia
curl(f, x)
```

Calculates the curl of the vector field `f` at the point `x`.

**Examples**

```julia
julia> f(x) = Vec{3}((x[2], x[3], -x[1]));

julia> x = rand(Vec{3});

julia> curl(f, x)
3-element Vec{3, Float64}:
 -1.0
  1.0
 -1.0
```
## `Tensors.laplace` — Function

```julia
laplace(f, x)
```

Calculates the Laplacian of the field `f` at the point `x`. If `f` is a vector field, use broadcasting.

**Examples**

```julia
julia> x = rand(Vec{3});

julia> f(x) = norm(x);

julia> laplace(f, x)
1.7833701103136868

julia> g(x) = x * norm(x);

julia> laplace.(g, x)
3-element Vec{3, Float64}:
 2.107389336871036
 2.7349658311504834
 2.019621767876747
```
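For a scalar field, the Laplacian is the trace of the Hessian, $\Delta f = \mathrm{tr}(\nabla\nabla f)$, which gives a quick consistency check:

```julia
using Tensors

f(x) = norm(x)
x = rand(Vec{3})

# The Laplacian of a scalar field equals the trace of its Hessian
laplace(f, x) ≈ tr(hessian(f, x))  # true
```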
## Examples

We here give a few examples of differentiating various functions and compare with the analytical solutions.

### Norm of a vector

\[f(\mathbf{x}) = |\mathbf{x}| \quad \Rightarrow \quad \partial f / \partial \mathbf{x} = \mathbf{x} / |\mathbf{x}|\]

```julia
julia> x = rand(Vec{2});

julia> gradient(norm, x)
2-element Vec{2, Float64}:
 0.6103600560550116
 0.7921241076829584

julia> x / norm(x)
2-element Vec{2, Float64}:
 0.6103600560550116
 0.7921241076829584
```
### Determinant of a second order symmetric tensor

\[f(\mathbf{A}) = \det \mathbf{A} \quad \Rightarrow \quad \partial f / \partial \mathbf{A} = \mathbf{A}^{-\mathrm{T}} \det \mathbf{A}\]

```julia
julia> A = rand(SymmetricTensor{2,2});

julia> gradient(det, A)
2×2 SymmetricTensor{2, 2, Float64, 3}:
  0.566237  -0.766797
 -0.766797   0.590845

julia> inv(A)' * det(A)
2×2 SymmetricTensor{2, 2, Float64, 3}:
  0.566237  -0.766797
 -0.766797   0.590845
```
### Hessian of a quadratic potential

\[\psi(\mathbf{e}) = \frac{1}{2} \mathbf{e} : \mathsf{E} : \mathbf{e} \quad \Rightarrow \quad \partial \psi / (\partial \mathbf{e} \otimes \partial \mathbf{e}) = \mathsf{E}^\text{sym}\]

where $\mathsf{E}^\text{sym}$ is the major symmetric part of $\mathsf{E}$.

```julia
julia> E = rand(Tensor{4,2});

julia> ψ(ϵ) = 1/2 * ϵ ⊡ E ⊡ ϵ;

julia> E_sym = hessian(ψ, rand(Tensor{2,2}));

julia> norm(majorsymmetric(E) - E_sym)
0.0
```
## Inserting a known derivative

When conditionals are used in a function, automatic differentiation may yield the wrong result. Consider the simplified example `f(x) = iszero(x) ? zero(x) : sin(x)`. If evaluated at `x = 0`, returning `zero(x)` gives a zero derivative, because `zero(x)` is constant, while the correct value is 1. In such cases, it is possible to insert a known derivative for a function that is part of a larger function to be automatically differentiated.

Another use case is when the analytical derivative can be computed much more efficiently than the automatically differentiated one.
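The pitfall with the conditional above can be demonstrated directly; a short sketch, using the scalar form of `gradient`:

```julia
using Tensors

# At x = 0 the branch returns the constant zero(x), so the propagated
# derivative is 0, even though the correct derivative is cos(0) = 1
f(x) = iszero(x) ? zero(x) : sin(x)

gradient(f, 0.0)    # 0.0, but the true derivative is 1.0
gradient(sin, 0.0)  # 1.0
```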
## `Tensors.@implement_gradient` — Macro

```julia
@implement_gradient(f, f_dfdx)
```

This macro allows specifying a function `f_dfdx` that provides an analytical derivative of the function `f`, and is invoked when `f` is differentiated using automatic differentiation based on `ForwardDiff.jl` (e.g. when using `Tensors.jl`'s `gradient` or `hessian`, or one of `ForwardDiff.jl`'s API functions). The function `f_dfdx` must take the same argument as `f` and should return both the value of `f` and the gradient, i.e. `fval, dfdx_val = f_dfdx(x)`. The following combinations of input and output types are supported:

| `x` | `f(x)` | `dfdx` |
| --- | --- | --- |
| `Number` | `Number` | `Number` |
| `Number` | `Vec` | `Vec` |
| `Number` | `SecondOrderTensor` | `SecondOrderTensor` |
| `Vec` | `Number` | `Vec` |
| `Vec` | `Vec` | `Tensor{2}` |
| `SecondOrderTensor` | `Number` | `SecondOrderTensor` |
| `SecondOrderTensor` | `SecondOrderTensor` | `FourthOrderTensor` |

Note that if one tensor is of symmetric type, then all tensors must be of symmetric type.
**Example**

Let's consider the function $h(\mathbf{f}(\mathbf{g}(\mathbf{x})))$ where `h(x) = norm(x)`, `f(x) = x ⋅ x`, and `g(x) = dev(x)`. For `f(x)` we have the analytical derivative

\[\frac{\partial f_{ij}}{\partial x_{kl}} = \delta_{ik} x_{lj} + x_{ik} \delta_{jl}\]

which we can supply as a known derivative using the `@implement_gradient` macro. Below, we compare with the result when the full derivative is calculated using automatic differentiation.
```julia
using Tensors

# Define functions
h(x) = norm(x)
f1(x) = x ⋅ x
f2(x) = f1(x)
g(x) = dev(x)

# Define composed functions
cfun1(x) = h(f1(g(x)))
cfun2(x) = h(f2(g(x)))

# Define the known derivative
function df2dx(x::Tensor{2, dim}) where {dim}
    println("Hello from df2dx") # Show that df2dx is called
    fval = f2(x)
    I2 = one(Tensor{2, dim})
    dfdx_val = otimesu(I2, transpose(x)) + otimesu(x, I2)
    return fval, dfdx_val
end

# Implement the known derivative
@implement_gradient f2 df2dx

# Calculate gradients
x = rand(Tensor{2, 2})
gradient(cfun1, x) ≈ gradient(cfun2, x)

# output

Hello from df2dx
true
```