Differentiation#
This page describes how gradients are computed through the simulation, enabling gradient-based optimization of material properties, initial conditions, and other parameters.
Automatic Differentiation Overview#
torch-diffsim uses automatic differentiation (autodiff) via PyTorch’s autograd system to compute gradients of outputs with respect to inputs through the simulation.
Problem Setup#
Given:

- Parameters \(\boldsymbol{\theta}\) (e.g., material properties \(E\), \(\nu\))
- Initial state \(\mathbf{x}_0, \mathbf{v}_0\)
- A loss function \(\mathcal{L}\) that depends on final or intermediate states
Goal: Compute \(\frac{\partial \mathcal{L}}{\partial \boldsymbol{\theta}}\)
This enables gradient-based optimization via gradient descent:

\[\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \alpha \frac{\partial \mathcal{L}}{\partial \boldsymbol{\theta}}\]

where \(\alpha\) is the learning rate.
Differentiable Simulation#
The simulation can be viewed as a computational graph from parameters and initial state through all intermediate states to the loss:

\[\boldsymbol{\theta}, \mathbf{x}_0, \mathbf{v}_0 \;\rightarrow\; \mathbf{x}_1, \mathbf{v}_1 \;\rightarrow\; \cdots \;\rightarrow\; \mathbf{x}_T, \mathbf{v}_T \;\rightarrow\; \mathcal{L}\]
For differentiation to work, every operation must:

- Be differentiable (smooth)
- Maintain gradient information (no `detach` operations)
- Allow a backward pass through PyTorch's autograd
Gradient Flow Through Time Steps#
Each simulation step involves a state update (semi-implicit Euler shown for concreteness):

\[\mathbf{v}_{t+1} = \mathbf{v}_t + \Delta t \, \mathbf{M}^{-1} \mathbf{f}(\mathbf{x}_t, \boldsymbol{\theta}), \qquad \mathbf{x}_{t+1} = \mathbf{x}_t + \Delta t \, \mathbf{v}_{t+1}\]
Backward Pass#
By the chain rule, gradients flow backward through the time steps:

\[\frac{\partial \mathcal{L}}{\partial \mathbf{x}_t} = \frac{\partial \mathcal{L}}{\partial \mathbf{x}_{t+1}} \frac{\partial \mathbf{x}_{t+1}}{\partial \mathbf{x}_t}, \qquad \frac{\partial \mathcal{L}}{\partial \boldsymbol{\theta}} = \sum_{t=0}^{T-1} \frac{\partial \mathcal{L}}{\partial \mathbf{x}_{t+1}} \frac{\partial \mathbf{x}_{t+1}}{\partial \boldsymbol{\theta}}\]
Computing Force Gradients#
The key computation is \(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\) (force Jacobian) and \(\frac{\partial \mathbf{f}}{\partial \boldsymbol{\theta}}\).
Since forces come from an energy potential:

\[\mathbf{f}(\mathbf{x}, \boldsymbol{\theta}) = -\frac{\partial E(\mathbf{x}, \boldsymbol{\theta})}{\partial \mathbf{x}}\]

The gradient is:

\[\frac{\partial \mathbf{f}}{\partial \mathbf{x}} = -\frac{\partial^2 E}{\partial \mathbf{x}^2} = -\mathbf{H}\]

where \(\mathbf{H}\) is the Hessian of the energy (the stiffness matrix in FEM terminology).
Energy-Based Force Computation#
In torch-diffsim, forces are computed as (pseudocode):

```python
# Compute elastic energy: per-element energy density times rest volume
E_elastic = sum(psi(F_e) * V_e for all elements)

# Forces as negative energy gradient
forces = -autograd.grad(E_elastic, positions, create_graph=True)[0]
```
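A minimal, self-contained sketch of this pattern, using a toy spring energy in place of the FEM elastic energy (`k` and `rest` are illustrative names, not part of the library API):

```python
import torch

# Toy energy: 0.5 * k * (x - rest)^2 per node, standing in for the
# per-element elastic energy density
k = torch.tensor(10.0, requires_grad=True)      # "material parameter"
positions = torch.tensor([0.1, 0.2, 0.3], requires_grad=True)
rest = torch.zeros(3)

# Total elastic energy
E_elastic = (0.5 * k * (positions - rest) ** 2).sum()

# Forces as negative energy gradient; create_graph=True keeps the
# graph so the forces themselves can be differentiated again
forces = -torch.autograd.grad(E_elastic, positions, create_graph=True)[0]

# Because the graph was kept, d(forces)/dk is also available
dfdk = torch.autograd.grad(forces.sum(), k)[0]
```

Here `forces = -k * (x - rest)`, and the second `grad` call gives the material derivative for free, which is exactly why `create_graph=True` matters.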
This approach:

- Automatically handles material gradients: \(\frac{\partial \mathbf{f}}{\partial \boldsymbol{\theta}}\) is computed by PyTorch
- Ensures correctness: forces are guaranteed to be energy-consistent
- Enables higher-order derivatives: `create_graph=True` maintains the graph
Material Parameter Gradients#
For learnable material parameters \(E\) and \(\nu\), the Lamé parameters are:

\[\mu = \frac{E}{2(1+\nu)}, \qquad \lambda = \frac{E\nu}{(1+\nu)(1-2\nu)}\]

The force derivative \(\frac{\partial \mathbf{f}}{\partial E}\) comes from the chain rule. Since \(\mu\) and \(\lambda\) depend on \(E\):

\[\frac{\partial \mathbf{f}}{\partial E} = \frac{\partial \mathbf{f}}{\partial \mu}\frac{\partial \mu}{\partial E} + \frac{\partial \mathbf{f}}{\partial \lambda}\frac{\partial \lambda}{\partial E}\]

PyTorch autograd handles this automatically when the parameters are `torch.nn.Parameter` objects.
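A short sketch of this: derive \(\mu\) and \(\lambda\) from learnable parameters inside the graph, and the gradients flow back automatically (the scalar loss here is a stand-in for the simulation loss):

```python
import torch

# Young's modulus and Poisson's ratio as learnable parameters
E = torch.nn.Parameter(torch.tensor(1.0e5))
nu = torch.nn.Parameter(torch.tensor(0.3))

# Lame parameters computed inside the autograd graph
mu = E / (2.0 * (1.0 + nu))
lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))

# Any loss built from mu/lam (stand-in scalar here) differentiates
# back to E and nu through the conversion formulas
loss = mu + lam
loss.backward()
```

After `backward()`, `E.grad` equals \(\frac{\partial \mu}{\partial E} + \frac{\partial \lambda}{\partial E}\), with no hand-derived chain rule required.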
Differentiable Contact#
Contact forces must also be differentiable. The log-barrier function (IPC-style, active for distances \(0 < d < \hat{d}\)):

\[b(d) = -(d - \hat{d})^2 \ln\!\left(\frac{d}{\hat{d}}\right)\]

has gradient:

\[\frac{\partial b}{\partial d} = -2(d - \hat{d})\ln\!\left(\frac{d}{\hat{d}}\right) - \frac{(d - \hat{d})^2}{d}\]

This is smooth (no discontinuities), enabling gradient flow even through contact events.
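A sketch of an IPC-style barrier and its autograd gradient (the function name `barrier` and the activation distance `dhat` are illustrative), checked against the analytic derivative:

```python
import torch

def barrier(d, dhat=0.01):
    # Log barrier: zero for d >= dhat, grows to infinity as d -> 0
    return torch.where(
        d < dhat,
        -(d - dhat) ** 2 * torch.log(d / dhat),
        torch.zeros_like(d),
    )

d = torch.tensor(0.005, requires_grad=True)
b = barrier(d)
(g,) = torch.autograd.grad(b, d)

# Analytic derivative for d < dhat, for comparison
dhat = 0.01
g_analytic = (-2.0 * (0.005 - dhat) * torch.log(torch.tensor(0.5))
              - (0.005 - dhat) ** 2 / 0.005)
```

Because the barrier and its derivative both vanish at \(d = \hat{d}\), contact activation does not introduce a kink that would break gradient flow.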
Smooth Operations for Differentiation#
Several operations are modified to maintain smoothness:
Velocity Clamping#
Instead of hard clamp:
Use smooth clamp via tanh:
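A sketch of the smooth clamp (`smooth_clamp` is an illustrative name):

```python
import torch

def smooth_clamp(v, v_max):
    # Differentiable alternative to torch.clamp: close to the identity
    # for |v| << v_max, saturating smoothly at +/- v_max
    return v_max * torch.tanh(v / v_max)

v = torch.tensor([-50.0, 0.5, 50.0], requires_grad=True)
out = smooth_clamp(v, v_max=10.0)
out.sum().backward()
# A hard clamp has zero gradient in the saturated regions; the smooth
# version keeps a small nonzero gradient everywhere
```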
Fixed Vertices#
Instead of a hard in-place assignment of zero velocity to fixed vertices, use masking:

\[\mathbf{v} \leftarrow (1 - \mathbf{m}) \odot \mathbf{v}\]

where \(\mathbf{m}\) is a binary mask (\(m_i = 1\) for fixed vertices).
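A sketch of the masking trick: the multiply keeps everything inside the autograd graph, whereas an in-place assignment would sever it:

```python
import torch

v = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], requires_grad=True)
mask = torch.tensor([[1.0], [0.0], [0.0]])  # vertex 0 is fixed

# Zero the velocities of fixed vertices without in-place assignment
v_masked = (1.0 - mask) * v

loss = v_masked.sum()
loss.backward()
# Fixed vertex receives zero gradient; free vertices receive 1
```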
Memory-Efficient Backpropagation#
For long simulations (many time steps), storing the entire computational graph is memory-intensive.
Gradient Checkpointing#
Gradient checkpointing trades computation for memory:

- Forward: store only every \(K\)-th intermediate state
- Backward: recompute intermediate states as needed
For a simulation with \(T\) steps:

- Without checkpointing: \(O(T)\) memory
- With checkpointing: \(O(\sqrt{T})\) memory (with optimal \(K\))
In torch-diffsim:

```python
from torch.utils.checkpoint import checkpoint

for i in range(num_steps):
    if i % checkpoint_every == 0:
        state = checkpoint(step_fn, state, use_reentrant=False)
    else:
        state = step_fn(state)
```
Implicit Differentiation#
For very long simulations or when only the final state matters, implicit differentiation can be used.
For an equilibrium problem \(\mathbf{f}(\mathbf{x}^*, \boldsymbol{\theta}) = 0\), the implicit function theorem gives the gradient:

\[\frac{\partial \mathbf{x}^*}{\partial \boldsymbol{\theta}} = -\left(\frac{\partial \mathbf{f}}{\partial \mathbf{x}}\right)^{-1} \frac{\partial \mathbf{f}}{\partial \boldsymbol{\theta}}\]
This requires only:

- Solving one linear system (CG or a direct solve)
- Computing \(\frac{\partial \mathbf{f}}{\partial \boldsymbol{\theta}}\) at the solution
- Benefit: \(O(1)\) memory regardless of simulation length
- Tradeoff: less accurate gradients; only works for steady-state problems
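A scalar sketch of the implicit gradient (the residual `g - theta * x` is an illustrative toy, a spring of stiffness `theta` under load `g`; in practice \(\partial \mathbf{f} / \partial \mathbf{x}\) is a matrix and the division becomes a linear solve, e.g. CG):

```python
import torch

g = 9.8  # constant load (illustrative)

def residual(x, theta):
    # f(x, theta) = 0 at equilibrium; here x* = g / theta analytically
    return g - theta * x

theta = torch.tensor(2.0, requires_grad=True)
# Pretend this equilibrium came from a (non-differentiated) solver
x_star = torch.tensor(g / 2.0, requires_grad=True)

f = residual(x_star, theta)
df_dx = torch.autograd.grad(f, x_star, retain_graph=True)[0]
df_dth = torch.autograd.grad(f, theta)[0]

# Implicit function theorem: dx*/dtheta = -(df/dx)^{-1} (df/dtheta)
dxstar_dtheta = -df_dth / df_dx
```

For this toy, the analytic answer is \(-g/\theta^2 = -2.45\), which the implicit formula reproduces without backpropagating through any solver iterations.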
Spatially Varying Materials#
For per-element material properties \(E_e\), the gradient is obtained by the chain rule:

\[\frac{\partial \mathcal{L}}{\partial E_e} = \frac{\partial \mathcal{L}}{\partial \mathbf{x}} \cdot \frac{\partial \mathbf{x}}{\partial E_e}\]
Since each element’s energy only depends on its own \(E_e\), gradients are computed independently per element.
Log-space parameterization: to ensure positivity, we parameterize:

\[E_e = \exp(\theta_e)\]

Then optimize \(\theta_e = \log E_e\) instead of \(E_e\) directly. The gradient transforms as:

\[\frac{\partial \mathcal{L}}{\partial \theta_e} = E_e \frac{\partial \mathcal{L}}{\partial E_e}\]
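A sketch of the log-space parameterization (the stand-in loss `2.0 * E` just makes \(\partial \mathcal{L} / \partial E = 2\) so the gradient transform is easy to see):

```python
import torch

# Optimize log E rather than E itself, so E = exp(log_E) stays positive
log_E = torch.nn.Parameter(torch.log(torch.tensor(1.0e5)))

E = torch.exp(log_E)
loss = 2.0 * E   # stand-in loss with dL/dE = 2
loss.backward()

# Chain rule: dL/d(log E) = E * dL/dE = 2 * E
```

No matter how far the optimizer moves `log_E`, the recovered `E` remains strictly positive, which a direct parameterization cannot guarantee.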
Optimization Example#
Material parameter optimization:
Algorithm:

1. Forward: run the simulation with the current \(E, \nu\) to get \(\mathbf{x}_T\)
2. Loss: compute \(\mathcal{L} = \|\mathbf{x}_T - \mathbf{x}_{\text{target}}\|^2\)
3. Backward: compute \(\frac{\partial \mathcal{L}}{\partial E}, \frac{\partial \mathcal{L}}{\partial \nu}\)
4. Update: \(E \leftarrow E - \alpha \frac{\partial \mathcal{L}}{\partial E}\)
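The four steps above can be sketched end to end. Here `simulate` is a hypothetical stand-in for the differentiable time stepper (a smooth map from normalized material parameters to a final state), and plain gradient descent implements the update rule of step 4:

```python
import torch

def simulate(E, nu, x0, num_steps=10, dt=0.01):
    # Stand-in for the real differentiable simulator
    x = x0
    for _ in range(num_steps):
        x = x + dt * (E - nu) * torch.ones_like(x)
    return x

E = torch.nn.Parameter(torch.tensor(1.0))   # normalized Young's modulus
nu = torch.nn.Parameter(torch.tensor(0.3))  # Poisson's ratio
x0 = torch.zeros(3)
x_target = torch.full((3,), 0.05)

opt = torch.optim.SGD([E, nu], lr=1.0)
losses = []
for _ in range(200):
    opt.zero_grad()
    x_T = simulate(E, nu, x0)                 # 1. forward
    loss = ((x_T - x_target) ** 2).sum()      # 2. loss
    loss.backward()                           # 3. backward
    opt.step()                                # 4. update
    losses.append(loss.item())
```

With the real simulator, parameters should be scaled (or log-parameterized) so a single learning rate works for both \(E\) and \(\nu\); here the toy parameters are already of order one.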
Practical Considerations#
Gradient Explosion
If gradients become too large:

- Reduce the learning rate
- Use gradient clipping: `torch.nn.utils.clip_grad_norm_(params, max_norm)`
- Increase damping in the simulation
- Reduce the time step
Gradient Vanishing
If gradients become too small:

- Increase the learning rate
- Normalize the loss by the number of steps
- Use adaptive optimizers (Adam, AdamW)
- Check for numerical instabilities
Numerical Stability
- Clamp \(J\) to prevent inversion: \(J \in [0.1, 5.0]\)
- Use stable material models (log terms instead of polynomials)
- Apply velocity clamping to prevent blow-up
- Use substepping for better stability
Verification#
To verify gradients are correct, use central finite differences:

\[\frac{\partial \mathcal{L}}{\partial \theta} \approx \frac{\mathcal{L}(\theta + \epsilon) - \mathcal{L}(\theta - \epsilon)}{2\epsilon}\]

Compare with the autograd gradients. With a step size \(\epsilon \sim 10^{-5}\), they should match to numerical precision.
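A minimal gradient check, with `loss_fn` as a hypothetical stand-in for "run simulation, compute loss" (double precision keeps the finite-difference cancellation error negligible):

```python
import torch

def loss_fn(theta):
    # Stand-in for the real simulation + loss pipeline
    x = theta * torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
    return (x ** 2).sum()

theta = torch.tensor(0.7, dtype=torch.float64, requires_grad=True)
analytic = torch.autograd.grad(loss_fn(theta), theta)[0]

# Central finite differences: (L(t + eps) - L(t - eps)) / (2 eps)
eps = 1e-5
with torch.no_grad():
    fd = (loss_fn(theta + eps) - loss_fn(theta - eps)) / (2 * eps)
```

For real simulations, compare relative error rather than absolute error, since gradient magnitudes vary over many orders of magnitude across parameters.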