Abstract

The stress loss function in metric multidimensional scaling is differentiable at local minima. In this note we generalize this result to more general functions of the distances.**Note:** This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory deleeuwpdx.net/pubfolders/distdiff has a pdf version, the bib file, and the complete Rmd file.

In multidimensional scaling we minimize the stress measure proposed by Kruskal (1964a), Kruskal (1964b) and defined as \[\begin{equation}
\sigma(x)=\sum_{k=1}^m w_k(\delta_k-\sqrt{x'A_kx})^2.
\end{equation}\] The \(w_k\) are known positive *weights*, the \(\delta_k\) are known positive *dissimilarities*. The \(A_k\) are known positive semi-definite matrices. In multidimensional scaling the \(A_k\) are defined in such a way that the \(x'A_kx\) are squared Euclidean distances between \(n\) points in \(\mathbb{R}^p\), but this results in this note are valid for any positive semie-definite \(A_k\).

Clearly stress is not differentiable at those \(x\) for which one or more of the \(x'A_kx\) are zero. This can create problem for gradient-based algorithms, because the relevant derivatives may not exist and/or they may not vanish at stationary points.

It is shown, however, in De Leeuw (1984) that the directional derivative \[\begin{equation} \mathbf{D}\sigma(x,y)=\lim_{\epsilon\downarrow 0}\frac{\sigma(x+\epsilon y)-\sigma(x)}{\epsilon} \end{equation}\] exists for all \(x\) and \(y\). At a local minimum, i.e. an \(x\) for which \(\mathbf{D}\sigma(x,y)\geq 0\) for each \(y\), De Leeuw (1984), and more generally De Leeuw (2018), shows that \(x'A_kx>0\), which implies that stress is differentiable at local minima.

We now generalize the result from De Leeuw (1984) to a much more general class of functions of the \(\sqrt{x'A_kx}\).

**Theorem 1: [Differentiability]** Suppose \(\sigma(x)=F(\sqrt{x'A_1x},\cdots,\sqrt{x'A_mx})\) with \(F\) continuously differentiable and \(\mathcal{D}_kF(z)<0\) if \(z_k=0\). Then \(\sigma\) is differentiable at local minima, and at local minima \(x'A_kx>0\).

**Proof:**

For the \(\sqrt{x'A_kx}\) we have

\[\begin{equation}\label{E:dist} \sqrt{(x+\epsilon y)'A_k(x+\epsilon y)}=\sqrt{x'A_kx}+\epsilon\left.\begin{cases}\frac{1}{\sqrt{x'A_kx}}x'A_ky& \text{ if }x'A_kx>0\\ \sqrt{y'A_ky}&\text{ if }x'A_kx=0\end{cases}\right\}+o(\epsilon). \end{equation}\]

Thus we find for the directional derivative of \(\sigma\)

\[\begin{multline}\label{E:dir} \mathbf{D}\sigma(x,y)= \sum_{x'A_kx>0}\mathcal{D}_kF(\sqrt{x'A_1x},\cdots,\sqrt{x'A_mx})\frac{1}{\sqrt{x'A_kx}}x'A_ky+\\\sum_{x'A_kx=0}\mathcal{D}_kF(\sqrt{x'A_1x},\cdots,\sqrt{x'A_mx})\sqrt{y'A_ky}. \end{multline}\]

If the assumption of the theorem is true there is an \(y\) such that the second term on the right in \(\eqref{E:dir}\) is negative. If for that \(y\) the first term is positive we change \(y\) to \(-y\). The first term becomes negative, the second term remains negative. Thus the derivative in the direction \(y\) is negative and \(x\) cannot be a local minimum. \(\blacksquare\)

**Corollary 1: [Separable]** Suppose \(\sigma(x)=\sum_{k=1}^m f_k(\sqrt{x'A_kx})\) with the \(f_k\) continuously differentiable and \(Df_k(0)<0\). Then \(\sigma\) is differentiable at local minima, and at local minima \(x'A_kx>0\).

**Proof:** In this case \[
\sum_{x'A_kx=0}\mathcal{D}_kF(\sqrt{x'A_1x},\cdots,\sqrt{x'A_mx})\sqrt{y'A_ky}=\sum_{x'A_kx=0}\mathcal{D}f_k(0)\sqrt{y'A_ky},
\] which can be made negative for some \(y\). \(\blacksquare\)

**Corollary 2: [Least Squares]** Suppose \(\sigma(x)=\sum_{k=1}^m w_k(\delta_k-\phi(\sqrt{x'A_kx}))^2\) with \(\phi\) continuously differentiable, \(\phi(0)=0\), and \(\mathcal{D}\phi(0)>0\). Then \(\sigma\) is differentiable at local minima, and at local minima \(x'A_kx>0\).

**Proof:** In this case \[
\sum_{x'A_kx=0}\mathcal{D}_kF(\sqrt{x'A_1x},\cdots,\sqrt{x'A_mx})\sqrt{y'A_ky}=-\sum_{x'A_kx=0}w_k\delta_k\mathcal{D}\phi(0)\sqrt{y'A_ky},
\] which can be made negative for some \(y\). \(\blacksquare\)

**Remark 1: [RStress]** The result of corollary 2 can be applied to rStress (De Leeuw, Groenen, and Mair (2016)), defined as \[
\sigma_r(x)=\sum_{k=1}^m w_k(\delta_k-(\sqrt{x'A_kx})^r)^2
\] with \(r>0\). Thus \(\phi(z)=z^r\) and \(\mathcal{D}\phi(z)=rz^{r-1}\). But if \(r<1\) then \(\phi\) is not differentiable at zero and our results do not apply. If \(r>1\) then \(\mathcal{D}\phi(0)=0\), rStress is differentiable everywhere, and again our results do not apply (although obviously if \(r>1\) then rStress is differentiable at local minima, even if \(x'A_kx=0\)). The only rStress covered by our result is Kruskal’s stress, with \(r=1\).

De Leeuw, J. 1984. “Differentiability of Kruskal’s Stress at a Local Minimum.” *Psychometrika* 49: 111–13. http://deleeuwpdx.net/janspubs/1984/articles/deleeuw_A_84f.pdf.

———. 2018. “Differentiability of Stress at Local Minima.” 2018. http://deleeuwpdx.net/pubfolders/zero/zero.pdf.

De Leeuw, J., P. Groenen, and P. Mair. 2016. “Differentiability of rStress at a Local Minimum.” 2016. http://deleeuwpdx.net/pubfolders/rStressDiff/rStressDiff.pdf.

Kruskal, J.B. 1964a. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis.” *Psychometrika* 29: 1–27.

———. 1964b. “Nonmetric Multidimensional Scaling: a Numerical Method.” *Psychometrika* 29: 115–29.