Automatic Differentiation

Tensors supports forward mode automatic differentiation (AD) of tensorial functions to compute first order derivatives (gradients) and second order derivatives (Hessians). It does this by exploiting the Dual number type defined in ForwardDiff.jl. While ForwardDiff.jl can itself be used to differentiate tensor functions, it is a bit awkward because ForwardDiff.jl is written to work with standard Julia Arrays. One therefore has to pass the input argument to ForwardDiff.jl as an Array, convert it to a Tensor, and then convert the output Array back to a Tensor. This can also be inefficient since these Arrays are allocated on the heap, so one needs to preallocate them, which can be annoying.

Instead, it is simpler to use Tensors' own AD API for the differentiation. This requires no conversions, and everything is stack allocated, so there is no need to preallocate.
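
For illustration, here is a minimal sketch of the two approaches, assuming ForwardDiff.jl is available in the environment (the explicit conversions shown are just one way to bridge the two APIs):

julia> using Tensors, LinearAlgebra; import ForwardDiff

julia> f(v::Vec{3}) = norm(dev(v ⊗ v));  # a function written for Tensors

julia> x = rand(Vec{3});

julia> ∇f = gradient(f, x);  # Tensors route: Tensor in, Tensor out

julia> f_array(a) = f(Vec{3}(Tuple(a)));  # ForwardDiff route: convert Array -> Tensor inside ...

julia> ∇f_fd = Vec{3}(Tuple(ForwardDiff.gradient(f_array, collect(x))));  # ... and Array -> Tensor for the result

julia> ∇f ≈ ∇f_fd
true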

The AD API in Tensors consists of gradient(f, A) and hessian(f, A), where f is a function and A is a first or second order tensor (or a number). For gradient the function can return a scalar, a vector (if the input is a vector), or a second order tensor. For hessian the function must return a scalar.

When the function is evaluated with dual numbers, the lower order results are obtained automatically along with the derivative: the value in the case of gradient, and the value and gradient in the case of hessian. To obtain these lower order results, gradient and hessian accept a third argument, a Symbol. Note that the symbol is only used to dispatch to the correct method, and thus it can be any symbol. In the examples the symbol :all is used to obtain all the lower order derivatives and values.
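
For example, any symbol gives the same result as :all:

julia> x = rand(Vec{2});

julia> ∇f, f = gradient(norm, x, :all);  # value returned alongside the gradient

julia> gradient(norm, x, :any_symbol) == (∇f, f)
true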

Tensors.gradient — Function
gradient(f::Function, v::Union{SecondOrderTensor, Vec, Number})
gradient(f::Function, v::Union{SecondOrderTensor, Vec, Number}, :all)

Computes the gradient of the input function. If the (pseudo)-keyword :all is given, the value of the function is also returned as a second output argument.

Examples

julia> A = rand(SymmetricTensor{2, 2});

julia> ∇f = gradient(norm, A)
2×2 SymmetricTensor{2, 2, Float64, 3}:
 0.434906  0.56442
 0.56442   0.416793

julia> ∇f, f = gradient(norm, A, :all);
Tensors.hessian — Function
hessian(f::Function, v::Union{SecondOrderTensor, Vec, Number})
hessian(f::Function, v::Union{SecondOrderTensor, Vec, Number}, :all)

Computes the hessian of the input function. If the (pseudo)-keyword :all is given, the lower order results (gradient and value) of the function are also returned as a second and third output argument.

Examples

julia> A = rand(SymmetricTensor{2, 2});

julia> ∇∇f = hessian(norm, A)
2×2×2×2 SymmetricTensor{4, 2, Float64, 9}:
[:, :, 1, 1] =
  0.596851  -0.180684
 -0.180684  -0.133425

[:, :, 2, 1] =
 -0.180684   0.133546
  0.133546  -0.173159

[:, :, 1, 2] =
 -0.180684   0.133546
  0.133546  -0.173159

[:, :, 2, 2] =
 -0.133425  -0.173159
 -0.173159   0.608207

julia> ∇∇f, ∇f, f = hessian(norm, A, :all);
Tensors.divergence — Function
divergence(f, x)

Calculate the divergence of the vector field f at the point x.

Examples

julia> f(x) = 2x;

julia> x = rand(Vec{3});

julia> divergence(f, x)
6.0
Tensors.curl — Function
curl(f, x)

Calculate the curl of the vector field f at the point x.

Examples

julia> f(x) = Vec{3}((x[2], x[3], -x[1]));

julia> x = rand(Vec{3});

julia> curl(f, x)
3-element Vec{3, Float64}:
 -1.0
  1.0
 -1.0
Tensors.laplace — Function
laplace(f, x)

Calculate the Laplacian of the field f at the point x. If f is a vector field, use broadcasting.

Examples

julia> x = rand(Vec{3});

julia> f(x) = norm(x);

julia> laplace(f, x)
1.7833701103136868

julia> g(x) = x*norm(x);

julia> laplace.(g, x)
3-element Vec{3, Float64}:
 2.107389336871036
 2.7349658311504834
 2.019621767876747
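
As a quick cross-check, these operators are consistent with gradient and hessian through the usual identities:

julia> using LinearAlgebra: tr

julia> x = rand(Vec{3});

julia> f(x) = 2x;  # vector field

julia> divergence(f, x) ≈ tr(gradient(f, x))  # divergence equals the trace of the gradient
true

julia> g(x) = norm(x);  # scalar field

julia> laplace(g, x) ≈ tr(hessian(g, x))  # the Laplacian equals the trace of the Hessian
true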

Examples

Here we give a few examples of differentiating various functions and compare with the analytical solutions.

Norm of a vector

\[f(\mathbf{x}) = |\mathbf{x}| \quad \Rightarrow \quad \partial f / \partial \mathbf{x} = \mathbf{x} / |\mathbf{x}|\]

julia> x = rand(Vec{2});

julia> gradient(norm, x)
2-element Vec{2, Float64}:
 0.6103600560550116
 0.7921241076829584

julia> x / norm(x)
2-element Vec{2, Float64}:
 0.6103600560550116
 0.7921241076829584

Determinant of a second order symmetric tensor

\[f(\mathbf{A}) = \det \mathbf{A} \quad \Rightarrow \quad \partial f / \partial \mathbf{A} = \mathbf{A}^{-T} \det \mathbf{A}\]

julia> A = rand(SymmetricTensor{2,2});

julia> gradient(det, A)
2×2 SymmetricTensor{2, 2, Float64, 3}:
  0.566237  -0.766797
 -0.766797   0.590845

julia> inv(A)' * det(A)
2×2 SymmetricTensor{2, 2, Float64, 3}:
  0.566237  -0.766797
 -0.766797   0.590845

Hessian of a quadratic potential

\[\psi(\mathbf{e}) = 1/2 \mathbf{e} : \mathsf{E} : \mathbf{e} \quad \Rightarrow \quad \partial \psi / (\partial \mathbf{e} \otimes \partial \mathbf{e}) = \mathsf{E}^\text{sym}\]

where $\mathsf{E}^\text{sym}$ is the major symmetric part of $\mathsf{E}$.

julia> E = rand(Tensor{4,2});

julia> ψ(ϵ) = 1/2 * ϵ ⊡ E ⊡ ϵ;

julia> E_sym = hessian(ψ, rand(Tensor{2,2}));

julia> norm(majorsymmetric(E) - E_sym)
0.0

Inserting a known derivative

When conditionals are used in a function evaluation, automatic differentiation may yield the wrong result. Consider the simplified example f(x) = iszero(x) ? zero(x) : sin(x). If evaluated at x = 0, the zero(x) branch is taken, which gives a zero derivative because zero(x) is constant, while the correct derivative is 1. In such cases, it is possible to insert a known derivative for a function that is part of a larger function being automatically differentiated.
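
A minimal illustration of this pitfall:

julia> f(x) = iszero(x) ? zero(x) : sin(x);

julia> gradient(f, 0.0)  # AD follows the constant zero(x) branch
0.0

julia> gradient(sin, 0.0)  # the correct derivative, cos(0)
1.0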

Another use case is when the analytical derivative can be computed much more efficiently than the automatically differentiated one.

Tensors.@implement_gradient — Macro
@implement_gradient(f, f_dfdx)

This macro allows specifying a function f_dfdx that provides an analytical derivative of the function f, and is invoked when f is differentiated using automatic differentiation based on ForwardDiff.jl (e.g. when using Tensors.jl's gradient or hessian, or one of ForwardDiff.jl's APIs). The function f_dfdx must take the same argument as f and must return both the value of f and the gradient, i.e. fval, dfdx_val = f_dfdx(x). The following combinations of input and output types are supported:

x                  f(x)               dfdx
Number             Number             Number
Number             Vec                Vec
Number             SecondOrderTensor  SecondOrderTensor
Vec                Number             Vec
Vec                Vec                Tensor{2}
SecondOrderTensor  Number             SecondOrderTensor
SecondOrderTensor  SecondOrderTensor  FourthOrderTensor

Note that if one tensor is of symmetric type, then all tensors must be of symmetric type.


Example

Let's consider the function $h(\mathbf{f}(\mathbf{g}(\mathbf{x})))$ where h(x) = norm(x), f(x) = x ⋅ x, and g(x) = dev(x). For f(x) we then have the analytical derivative

\[\frac{\partial f_{ij}}{\partial x_{kl}} = \delta_{ik} x_{lj} + x_{ik} \delta_{jl}\]

which we can supply as a known analytical derivative using the @implement_gradient macro. Below, we compare with the result when the full derivative is calculated using automatic differentiation.

# Define functions
h(x) = norm(x)
f1(x) = x ⋅ x
f2(x) = f1(x)
g(x) = dev(x)

# Define composed functions
cfun1(x) = h(f1(g(x)))
cfun2(x) = h(f2(g(x)))

# Define known derivative
function df2dx(x::Tensor{2,dim}) where {dim}
    println("Hello from df2dx") # Show that df2dx is called
    fval = f2(x)
    I2 = one(Tensor{2,dim})
    dfdx_val = otimesu(I2, transpose(x)) + otimesu(x, I2)
    return fval, dfdx_val
end

# Implement known derivative
@implement_gradient f2 df2dx

# Calculate gradients
x = rand(Tensor{2,2})

gradient(cfun1, x) ≈ gradient(cfun2, x)

# output
Hello from df2dx
true