(This discussion follows Chapter 15 of *Numerical Recipes in C++ 2nd Edition* by W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Cambridge University Press, 2002. )

Suppose we have $N$ data points of the form $x_i,y_i$ where the uncertainty in each $y_i$ is $\sigma_i$. If we believe this data can be modeled as a straight line, $f(x) = a + b(x)$, then how do we determine the best estimates for the fit parameters $a$ and $b$? In the method of least-squares, we say the best fit is the one which minimizes the sum of the squared, weighted residuals, also known as the chi-squared function,

$\chi^2 = \displaystyle\sum_{t=0}^{N-1}\left( \dfrac{y_i-(a + bx_i)}{\sigma_i}\right)^2$

At minimum, the derivatives with respect to $a$ and $b$ vanish giving $0 = \dfrac{\partial\chi^2}{\partial a} = -2 \displaystyle\sum_{t=0}^{N-1} \dfrac{y_i-a + bx_i}{\sigma_i^2}$

and $0 = \dfrac{\partial\chi^2}{\partial b} = -2 \displaystyle\sum_{t=0}^{N-1} \dfrac{x_i(y_i-a + bx_i)}{\sigma_i^2}$

If we establish some notation,

$S \equiv \displaystyle\sum_{t=0}^{N-1} \dfrac{1}{\sigma_i^2}$ | $S_x \equiv \displaystyle\sum_{t=0}^{N-1} \dfrac{x_i}{\sigma_i^2}$ | $S_y \equiv \displaystyle\sum_{t=0}^{N-1} \dfrac{y_i}{\sigma_i^2}$ |

$S_{xx} \equiv \displaystyle\sum_{t=0}^{N-1} \dfrac{x_i^2}{\sigma_i^2}$ | $S_{xy} \equiv \displaystyle\sum_{t=0}^{N-1} \dfrac{x_iy_i}{\sigma_i^2}$ | $\Delta = SS_{xx} - (S_x)^2$ |

we can solve the derivative equations for the fit parameters, giving $a = \dfrac{S_{xx}S_y - S_x S_{xy}}{\Delta}$ and $b = \dfrac{SS_{xy} - S_x S_{y}}{\Delta}$

Uncertainties in these values are derived from the definition of the variance in each parameter $f = \{a,b\}$ , $\sigma^2_f = \displaystyle\sum_{t=0}^{N-1} \sigma^2_i\left(\dfrac{\partial f}{\partial y_i}\right)^2$

where $\dfrac{\partial a}{\partial y_i} = \dfrac{S_{xx} - S_xx_i}{\sigma^2_i \Delta}$ and $\dfrac{\partial b}{\partial y_i} = \dfrac{Sx_i - S_x}{\sigma^2_i \Delta}$

This leads to $\sigma_a^2 = S_{xx}/\Delta$ and $\sigma_b^2 = S/\Delta$

In order to judge the quality of the fit, we need a quantitative measure of the goodness-of-fit. The most obvious choice (and the one we will concentrate on in this class) is the reduced chi-squared, $\chi^2_{red} = \dfrac{\chi^2}{\nu}$ where $\nu$ is number of degrees of freedom, or the total number of points minus the number of fit parameters. (Here, $\nu = N-2$). A good fit is one where the reduced chi-squared is close to one; much larger than one there is great discrepancy between the fit and the data points, while much less than one there is better agreement than is expected given the size of the uncertainties. (For more on chi-squared, see the “Goodness of Fit” Section above.)

Suppose, however, that we do not have independent estimates for the uncertainties in each data point. In that case, we CANNOT use the reduced chi-squared as a measure of goodness of fit, but may instead use it to estimate the mean deviation on each data point and the uncertainty on our fit parameters.

To proceed without uncertainties on each data point, start by *assuming* a uniform uncertainty on each point, $\sigma_i = 1$. Compute the fit and parameter uncertainties as normal. Now, the reduced chi-squared does not represent the goodness-of-fit, but instead the square of the mean separation (residual) between fit and data. In other words, the square root of the reduced chi-squared is an estimate of the uncertainty in each point if the fit function is indeed the correct model.
With this new reduced chi-squared in hand, we can rescale our initial guess for the uncertainty in each point

$\sigma_i^2 = \chi^2/\nu$ and rescale our estimates for the uncertainty in each fit parameter,

$\sigma^2_a = \dfrac{S_{xx}}{\Delta}\dfrac{\chi^2}{\nu}$ and $\sigma^2_b = \dfrac{S}{\Delta}\dfrac{\chi^2}{\nu}$

The analytic method described above yields exact fit parameters for data modeled by a linear function. However, we may not want to be limited to fitting only linear functions, so a more general technique is necessary.

The `scipy.optimize.leastsq()`

function uses a numerical algorithm to determine the best parameter values which minimize the sum of the squared-residuals for any generic fit function $f(x_i)$, $\chi^2 = \displaystyle \sum_{i=1}^{N-1}\left(\dfrac{y_i - f(x_i)}{\sigma_i}\right)^2$
The algorithm it uses is the so-called Levenberg-Marquardt method which is itself a mash-up of the Gauss-Newton and gradient descent methods. The algorithm requires an initial guess for the fit parameters $(a, b, c, \dots)$ and computes numerical partial derivatives to gradually modify these values until it arrives at the minimum of the $\chi^2$. The algorithm is robust (meaning it can often find a solution even when the initial guesses are not close to the minimum), but it is slower than many other numerical methods. (Slow here means seconds rather than milliseconds… so not slow enough to be a problem unless you're doing thousands of fits.)
Like most numerical optimization algorithms, this algorithm will find a local minimum, which may or may not be the true global minimum. It is good practice to run the algorithm several times with different initial guesses to make sure that the result is always the same.

Here is a PDF discussing the chi-squared goodness-of-fit parameter:

Here is a PDF discussing one method for handling uncertainties in both *x* and *y:*