Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory has a pdf version, the bib file, and the complete Rmd file.

1 Introduction

Suppose \(\mathcal{E}\) is a finite-dimensional Euclidean space with inner product \(\langle\bullet,\bullet\rangle\) and corresponding norm \(\|\bullet\|\). Also \(\mathcal{S}\) is the unit sphere in \(\mathcal{E}\), and \(\mathcal{B}\) is the unit ball.

The Gifi framework for descriptive multivariate analysis (Gifi (1990), Michailidis and De Leeuw (1998), De Leeuw and Mair (2009)) covers both linear and bilinear nonmetric multivariate methods. In linear nonmetric techniques such as nonmetric regression, discriminant analysis, and additivity analysis we have to solve the problem of minimizing \(\sigma(y,x)=\|x-y\|^2\) over \(y\in\mathcal{L}\) and over \(x\in \mathcal{K}\cap\mathcal{S}\), where \(\mathcal{K}\subseteq\mathcal{E}\) is a closed and pointed polyhedral convex cone and \(\mathcal{L}\subseteq\mathcal{E}\) is a linear subspace. In bilinear nonmetric methods such as principal component and canonical analysis this minimization problems has to be solved many times.

The standard way to solve this minimization problem is to use alternating least squares. We alternate finding the optimal \(y\) for given \(x\) and the optimal \(x\) for given \(y\) until convergence. The optimal \(y\) for given \(x\) is just a simple linear least squares problem. The optimal \(x\) for given \(y\) is a normalized cone regression problem (De Leeuw (1975)). It can usually be solved by first projecting \(y\) on the cone \(\mathcal{K}\) and then normalizing the projection to unit length.

It has typically not been emphasized in the data analysis literature that normalized cone regression can go astray, especially in early iterations and in situations where the least squares fit is very bad. In these, admittedly rare, cases the projection of \(y\) on the cone \(\mathcal{K}\) is the zero vector, and we cannot normalize a zero vector.

2 History

Gifi (1990) briefly discusses the problem on pages 529-530, which happen to be the very last pages of the book, right before the references and the index. We give the relevant quotation, for those who do not want to spend the $ 330.95 to buy the book.

In the book we sometimed use normalized cone regression. This can be either one of tow things. In the first place minimization of \((y-x)'W(y-x)/x'Wx\) over \(x\) in a cone \(C\). This called implicit normalization. This name suggests there is also something like explicit normalization. This is minimization of \((y-x)'W(y-x)\) over all \(x\in C\) that satisfy in addition x’Wx=1. It is basic result of Kruskal and Carroll (1969) that in this simple case implicit normalization, explicit normalization, and no normalization all give essentially the same solution. All solutions are proportional to the projection of \(y\) on the cone, with only the proportionality constant different for the different problems. The result does not rely on convexity. There is one exception which should be noted. If \(y\) is in \(C^\ominus\), i.e \(y'Wz\leq 0\) for all \(z\in C\), then the origin is the projection of \(y\) on \(C\). In the normalized problems the solution of \(x\) is one of the extreme rays of \(C\), suitable normalized. An extreme ray is any ray in the cone that cannot be written as a nonnegative linear combination of two other rays in the cones. If \(C\) is a subspace and \(y\) is in the \(W\)-orthogonal complement, then the infimum in the implicitly normalized problem is one - not attained, but approached by letting \(x\rightarrow\infty\). The infimum in the explicitly normalized problem is attained for any \(x\in C\) with \(x'Wx=1\); it is equal to \(1+y'Wy\). If \(C\) is the cone used in monotone regression, then the extreme rays are the vectors with the first \(k\) element equal to zero and the last \(n-k\) elements equal to one \((k=1,\cdots,n-1)\) together with the vectors \(u\) and \(-u\) which span the intersection of \(C\) and \(C^\ominus\). Thus the cone is not pointed. The normalized regression problem must be solved by testing all these rays and by keeping the best one.

Example 5.5.2 in Lange (2016) discusses the special case in which \(K\) is the non-negative orthant. On page 142 we see

The constraint set \(C\) in question is the intersection of the unit sphere and the nonnegative orthant. Projection of an external point \(y\) onto \(C\) splits into several cases. When all components of \(y\) are negative, \(P_C(y)=e_i\), where \(y_i\) is the least negative component of \(y\), and \(e_i\) is the standard unit vector along coordinate direction \(i\). The origin \(\mathcal{O}\) is equidistant from all all points of \(C\). If any component of \(y\) is positive then the projection is constructed by setting the negative components of \(y\) equal to 0 and standardicing the truncated version of \(y\) to have norm 1.

Neither Gifi nor Lange provide an actual proof for their statements. Exercise 5.6.14 in Lange (2016) asks the reader to come up with a proof. All the machinery and proofs one can possibly want were recently made available in Bauschke, Bui, and Wang (2018). That article provides a very thorough and complete analysis of projection on the intersection of a cone and a sphere (or a ball). On page 2158 the authors say

Inspired by an example in the recent and charming book [12], our aim in this paper is to systematically study the case in which \(K\) is a closed convex cone and \(S\) is either the (convex) unit ball or (nonconvex) unit sphere centered at the origin.

The “recent and charming” book in the quotation is, of course, Lange (2016). The authors do not refer to Gifi, which is probably because Gifi’s book is neither recent nor charming.

3 Unnormalized Cone Projection

We review some basic facts about unnormalized cone projection problems. Any book on convex analysis (for example Rockafellar (1970)) will provide the necessary discussion and proofs.

Definition 1: [Projection] The projection \(P_A(x)\) of \(x\) on a set \(A\subseteq\mathcal{E}\) is the set of all \(y\in A\) such that \(\|x-y\|=\min_{z\in A}\|x-z\|\). Projections need not be singletons and can be empty.

Definition 2: [Polar Cone] The polar cone \(K^\ominus\) of a cone \(K\) is the set of all \(y\) such that \(\langle x,y\rangle\leq 0\) for all \(x\in K\).

Result 1: [Cone Projection]

  1. If \(K\) is convex the projection is a singleton. In that case we also use \(P_K(x)\) for the unique element of the projection.

  2. \(y=P_K(x)\) if and only if \(\langle y,x-y\rangle=0\) and \(\langle x,x-y\rangle\leq 0\) for all \(x\in K\). Thus \(y=P_K(x)\) if and only if \(x-y=P_{K^\ominus}(x)\).

  3. \(x=P_K(x)+P_{K^\ominus}(x)\) or \(P_{K^\ominus}(x)=x-P_K(x)\).

  4. \(\|x\|^2=\|P_K(x)\|^2+\|P_{K^\ominus}(x)\|^2\)

  5. The projection \(P_K(x)\) of \(x\) on \(K\) is zero if and only if \(x\in K^\ominus\).

Here are two more definitions for later use.

Definition 3: [Negative of a Cone] The negative \(K^-\) of a cone \(K\) is the set of all \(y\) such that \(y=-x\) for some \(x\in K\).

Definition 4: [Double of a Cone] The double \(K^\star\) of a cone \(K\) is the double cone \(K\cup K^-\).

In the figure below the cone is red, the polar is the union of green and blue, the negative is blue, and the double is the union of red and blue.