arcopt

R-CMD-check Lifecycle: experimental License: MIT

Robust nonlinear optimization for R

When optim() fails on your ill-conditioned model, arcopt is designed to succeed. It uses Adaptive Regularization with Cubics (ARC) to handle indefinite Hessians and escape saddle points automatically.

Installation

# Install from GitHub
pak::pak("marcus-waldman/arcopt")

# Or with devtools
devtools::install_github("marcus-waldman/arcopt")

Quick Start

library(arcopt)

# Rosenbrock function - a classic difficult optimization problem
result <- arcopt(
  x0 = c(-1.2, 1),
  fn = function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2,
  gr = function(x) c(
    -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
    200 * (x[2] - x[1]^2)
  ),
  hess = function(x) matrix(c(
    1200 * x[1]^2 - 400 * x[2] + 2, -400 * x[1],
    -400 * x[1], 200
  ), 2, 2)
)

result$par
#> [1] 1 1
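For a point of comparison, the same problem can be run through base R's optim(). This 2-D instance is easy enough for BFGS; arcopt targets the ill-conditioned cases where quadratic-model methods stall.

```r
# Same Rosenbrock problem via base R's optim() with an analytic gradient.
fr <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
grr <- function(x) c(
  -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
  200 * (x[2] - x[1]^2)
)
res <- optim(c(-1.2, 1), fr, grr, method = "BFGS")
res$par  # approximately c(1, 1)
```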

Two Modes: Exact Hessian vs Quasi-Newton

arcopt offers two optimization strategies:

Mode 1: Exact Hessian (Default)

Best when you can compute the Hessian analytically or via automatic differentiation.

# Provide fn, gr, and hess
result <- arcopt(x0, fn, gr, hess)

Mode 2: Quasi-Newton (No Hessian Required)

Best when computing the Hessian is expensive or unavailable. Uses BFGS/SR1 approximations. arcopt seeds the initial approximation \(B_0\) with a one-time finite-difference Hessian computed from the supplied gradient (cost: \(2n\) gradient evaluations at startup), which dramatically improves saddle-escape behavior compared to the identity-initialization convention used in classical BFGS/SR1.

# Just provide fn and gr - no Hessian needed
result <- arcopt(x0, fn, gr, control = list(use_qn = TRUE))
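The one-time finite-difference seed described above can be sketched in a few lines of base R. fd_hessian() below is a hypothetical illustration of the idea, not a function exported by arcopt.

```r
# Central-difference Hessian from the gradient: 2n gradient evaluations,
# one column per coordinate perturbation, then symmetrize the result.
fd_hessian <- function(gr, x, h = 1e-5) {
  n <- length(x)
  H <- matrix(0, n, n)
  for (i in seq_len(n)) {
    e <- numeric(n)
    e[i] <- h
    H[, i] <- (gr(x + e) - gr(x - e)) / (2 * h)
  }
  (H + t(H)) / 2
}

# Seed B0 for the Rosenbrock gradient at the usual starting point.
gr_rosen <- function(x) c(
  -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
  200 * (x[2] - x[1]^2)
)
B0 <- fd_hessian(gr_rosen, c(-1.2, 1))
```

Unlike an identity seed, B0 carries the objective's true curvature signs at x0, which is what improves early saddle-escape behavior.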

Which Mode Should I Use?

| Scenario | Recommendation |
| --- | --- |
| Analytic Hessian available | Use exact Hessian (default) |
| Hessian via autodiff (torch, Stan) | Use exact Hessian |
| Hessian expensive (n > 100) | Use use_qn = TRUE |
| No Hessian function | Use use_qn = TRUE |

Benchmark Results

Tested on 8 standard functions with 10 starting points each (80 test cases):

| Method | Convergence | Avg Iterations | Hessian Evals |
| --- | --- | --- | --- |
| Exact Hessian | 100% | 22 | 1,443 |
| BFGS QN | 100% | 28 | 0 |
| SR1 QN | 99% | 61 | 0 |

BFGS Quasi-Newton achieves the same 100% convergence as the exact Hessian with only about 27% more iterations and zero Hessian evaluations.

On problems with symmetric saddle points (e.g., growth-mixture models, factor models with rotational symmetry), qn_method = "hybrid" combines SR1’s ability to represent indefinite curvature with BFGS’s fast local convergence and matches full-Hessian solvers’ saddle-escape reliability using only gradient information.

Box Constraints

# Optimize with bounds: 0 <= x <= 0.5
result <- arcopt(
  x0 = c(0.1, 0.1),
  fn = fn, gr = gr, hess = hess,
  lower = c(0, 0),
  upper = c(0.5, 0.5)
)
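For comparison, here is the equivalent bound-constrained call with base R's optim(method = "L-BFGS-B"), using the Rosenbrock objective as a stand-in for the fn/gr/hess above.

```r
# Bound-constrained Rosenbrock with base R's L-BFGS-B, for comparison.
fr <- function(x) (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
grr <- function(x) c(
  -2 * (1 - x[1]) - 400 * x[1] * (x[2] - x[1]^2),
  200 * (x[2] - x[1]^2)
)
res <- optim(c(0.1, 0.1), fr, grr, method = "L-BFGS-B",
             lower = c(0, 0), upper = c(0.5, 0.5))
res$par  # the constrained optimum sits on the x1 = 0.5 boundary
```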

Why ARC?

Standard optimizers struggle when the Hessian is indefinite or nearly singular, as is common in mixture models, factor models, and other complex likelihoods.

Cubic regularization automatically handles negative curvature without line searches or trust-region adjustments. It provably escapes saddle points and converges to local minima.

The Key Idea

Instead of the quadratic model used by Newton’s method:

m(s) = f + g's + (1/2) s'Hs

ARC adds a cubic term:

m(s) = f + g's + (1/2) s'Hs + (σ/3)||s||³

This cubic term prevents unbounded steps when the Hessian has negative eigenvalues.
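A one-dimensional sketch in base R (an illustration of the model, not arcopt internals) makes this concrete: with negative curvature the quadratic model decreases without bound, while the cubic model has a finite minimizer.

```r
# Quadratic vs cubic model along a negative-curvature direction.
# 1-D case with g = -1 and H = -2, so m_quad(t) = -t - t^2.
sigma <- 1
m_quad  <- function(t) -t - t^2
m_cubic <- function(t) m_quad(t) + (sigma / 3) * abs(t)^3

m_quad(10)   # -110, and it keeps decreasing as t grows
opt <- optimize(m_cubic, c(-5, 5))
opt$minimum  # finite step: 1 + sqrt(2) = 2.414...
```

The cubic penalty dominates for large steps, so the minimizer of m(s) always exists even when s'Hs can be made arbitrarily negative.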

Adaptive Solver-Mode Dispatch

arcopt() automatically adapts its subproblem solver to the local landscape:

  1. Cubic (default). Uses adaptive cubic regularization with a Newton-first fast path. Handles saddles and indefinite Hessians naturally.
  2. Trust-region (enabled by default). Fires a one-way switch from cubic when four runtime signals indicate flat-ridge or stuck-indefinite-saddle stagnation. Replaces the cubic penalty with a hard step-norm constraint so the step can walk off a near-singular ridge.
  3. QN-polish (opt-in). Fires a bidirectional switch from cubic when the iterate enters the quadratic attraction basin of a strict minimum. Replaces the cubic subproblem with Wolfe line-search BFGS warmstarted from the current exact Hessian; skips further hess() calls until convergence or revert, saving wall-clock time on expensive Hessians (Stan AD, finite differences).
result <- arcopt(x0, fn, gr, hess, control = list(
  tr_fallback_enabled = TRUE,    # cubic->TR on flat-ridge (default)
  qn_polish_enabled = TRUE       # cubic<->BFGS polish in basin (opt-in)
))

result$diagnostics$solver_mode_final   # "cubic", "tr", or "qn_polish"
result$diagnostics$ridge_switches      # count of cubic->TR transitions
result$diagnostics$qn_polish_switches  # count of cubic->polish transitions
result$diagnostics$qn_polish_reverts   # count of polish->cubic reversions

See design/design-principles.qmd §3a for the σ↔r duality framing and todo/tr-fallback-hybrid-briefing.md / todo/qn-polish-mode-briefing.md for the detector designs.

Common Options

arcopt(x0, fn, gr, hess,
  control = list(
    # Quasi-Newton mode (no exact Hessian needed)
    use_qn = TRUE,
    qn_method = "hybrid",    # recommended for saddle-prone problems;
                             # also "bfgs", "sr1"

    # Convergence tolerances
    gtol_abs = 1e-5,         # Gradient norm
    maxit = 1000,            # Max iterations

    # Hybrid solver-mode dispatch
    tr_fallback_enabled = TRUE,   # cubic->TR on stagnation
    qn_polish_enabled   = FALSE,  # cubic<->polish in quadratic basin

    # Output
    trace   = 1,             # Depth of saved trace data (0/1/2/3)
    verbose = FALSE          # Print one line per iteration to console
  )
)

The full set of advanced controls (cubic regularization tuning, TR-fallback thresholds, polish-mode thresholds, QN routing) lives under ?arcopt_advanced_controls.

Comparison with optim()

| Feature | arcopt | optim(BFGS) | optim(L-BFGS-B) |
| --- | --- | --- | --- |
| Handles indefinite Hessian | Yes | No | No |
| Escapes saddle points | Yes | No | No |
| Box constraints | Yes | No | Yes |
| Hessian-free mode | Yes | Yes | Yes |
| Best for | Complex likelihoods | Smooth convex | Large bound-constrained |

When NOT to Use arcopt

arcopt is a local optimizer. Use something else for global optimization of multimodal objectives, for derivative-free problems where no gradient is available, or for very large problems with thousands of parameters.

Best for: 2-500 parameters, nonconvex objectives, and settings that demand robust convergence.

Return Value

result <- arcopt(x0, fn, gr, hess)

result$par                 # Optimal parameters
result$value               # Optimal function value
result$gradient            # Gradient at solution
result$hessian             # Hessian at solution (exact in cubic/tr mode;
                           # BFGS approximation if solver_mode_final == "qn_polish")
result$converged           # TRUE if converged
result$iterations          # Number of iterations
result$evaluations         # List: fn, gr, hess counts
result$message             # Why it stopped

# Solver-mode diagnostics (nested sublist; most users do not inspect this).
# See ?arcopt_advanced_controls for the full schema.
result$diagnostics$solver_mode_final           # "cubic", "tr", or "qn_polish"
result$diagnostics$ridge_switches              # count of cubic->TR transitions
result$diagnostics$radius_final                # final TR radius (NA if no switch)
result$diagnostics$qn_polish_switches          # count of cubic->qn_polish transitions
result$diagnostics$qn_polish_reverts           # count of qn_polish->cubic reversions
result$diagnostics$hess_evals_at_polish_switch # baseline for Hessian-eval savings

References

Adaptive Regularization with Cubics (ARC)

Cartis, C., Gould, N. I. M., & Toint, P. L. (2011). Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2), 245-295.

Nesterov, Y., & Polyak, B. T. (2006). Cubic regularization of Newton method and its global performance. Mathematical Programming, 108(1), 177-205.

Quasi-Newton with Cubic Regularization

Kamzolov, D., Ziu, K., Agafonov, A., & Takáč, M. (2025). Accelerated Adaptive Cubic Regularized Quasi-Newton Methods. Journal of Optimization Theory and Applications, 208. https://doi.org/10.1007/s10957-025-02804-3

Kamzolov, D., Ziu, K., Agafonov, A., & Takáč, M. (2023). Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of O(k^-2) for Convex Functions. arXiv:2302.04987.

Benson, H. Y., & Shanno, D. F. (2018). Cubic regularization in symmetric rank-1 quasi-Newton methods. Mathematical Programming Computation, 10(4), 457-494.

License

MIT © 2025 Marcus Waldman

Contributing

Issues and PRs welcome at: https://github.com/marcus-waldman/arcopt/issues