Key Points

- The paper examines the use of cosine similarity for quantifying semantic similarity between high-dimensional objects via learned low-dimensional feature embeddings. It highlights how cosine similarity can yield arbitrary and potentially meaningless similarities, unlike the unnormalized dot products between the embedded vectors.

- The study focuses on linear models because their closed-form solutions permit a theoretical understanding of the limitations of cosine similarity applied to learned embeddings. It specifically analyzes matrix factorization models and linear autoencoders.

- The paper explores the impact of regularization on the utility of cosine similarity, discussing two commonly used regularization schemes and their effects on the models. It illustrates how the choice of regularization can lead to arbitrary results and non-uniqueness in cosine similarities.

- Analytical solutions derived for linear matrix factorization models reveal that cosine similarities depend on an arbitrary diagonal rescaling of the learned embeddings, and that how (or whether) the embeddings are normalized determines the result, making the cosine similarities non-unique.

- The study experimentally demonstrates the variability and non-uniqueness of cosine similarities in linear matrix factorization models based on different modeling choices and regularization techniques, cautioning against blindly using cosine similarity.

- The paper proposes potential remedies and alternatives to mitigate the issues associated with cosine similarity, including training models directly with respect to cosine similarity, applying layer normalization, and avoiding the embedding space.

- Experimental results on simulated data illustrate the dependency of cosine similarities on the method and regularization technique, emphasizing the need for caution when using cosine similarity and suggesting approaches to address the limitations.

- The authors caution that while the paper focuses on linear models, similar problems may exist in deep models due to the combination of various regularization methods applied in deep learning, potentially leading to even more opaque effects on cosine similarities.

- The study highlights the importance of understanding the limitations and potential drawbacks of using cosine similarity for semantic similarity in learned embeddings, offering insights and cautions for future research and practical applications.
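The diagonal-rescaling degree of freedom described in the points above can be illustrated with a small sketch. The embeddings `A` and `B` below are stand-ins for the learned user and item factors of a matrix factorization (they are random here, not the paper's actual solutions): rescaling the embedding dimensions by any invertible diagonal matrix `D` leaves every dot product unchanged, yet changes the item-item cosine similarities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned embeddings of a matrix factorization X ≈ A @ B.T
A = rng.normal(size=(5, 3))   # e.g., user embeddings
B = rng.normal(size=(4, 3))   # e.g., item embeddings

# Rescale the embedding dimensions by an invertible diagonal matrix D.
# The product A2 @ B2.T = A @ D @ inv(D) @ B.T is unchanged.
D = np.diag([0.1, 1.0, 10.0])
A2 = A @ D
B2 = B @ np.linalg.inv(D)

def cosine(M):
    """Pairwise cosine similarities between the rows of M."""
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    return Mn @ Mn.T

# Dot products are identical, but the item-item cosine similarities differ:
print(np.allclose(A @ B.T, A2 @ B2.T))      # True
print(np.allclose(cosine(B), cosine(B2)))   # False
```

Since both factorizations fit the data equally well, nothing in the training objective pins down which set of cosine similarities is "the" right one, which is the non-uniqueness the paper formalizes.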

Summary

Application of Cosine-similarity in Quantifying Semantic Similarity
The paper addresses the use of cosine-similarity for quantifying semantic similarity between high-dimensional objects, particularly in the context of learned low-dimensional feature embeddings. It weighs the advantages and disadvantages of cosine-similarity compared to unnormalized dot-products, cautions against using cosine-similarity blindly, and suggests possible alternatives.

Impact of Regularized Linear Models on Meaningfulness of Similarities
The paper delves into the impact of regularized linear models on the meaningfulness of similarities. It discusses how cosine-similarity can yield arbitrary and meaningless similarities, especially in embeddings derived from regularized linear models. The authors highlight that for some linear models, the similarities are not even unique, while for others, they are implicitly controlled by the regularization. The study focuses on linear models due to their closed-form solutions and provides analytical insights into the limitations of the cosine-similarity metric applied to learned embeddings.

These findings caution against blindly using cosine-similarity: the learned embeddings have a degree of freedom that can yield arbitrary cosine-similarities, even though their unnormalized dot-products are well-defined and unique. The authors outline remedies and alternatives to improve the meaningfulness of similarities, such as training the model directly with respect to cosine-similarity and applying layer normalization, and they emphasize applying normalization before or during learning to obtain meaningful semantic similarities.
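One way to picture the normalization remedy mentioned above: if each embedding is projected onto the unit sphere as part of the model (during training, not only afterwards), then dot product and cosine similarity coincide, removing the arbitrary rescaling freedom. The snippet below is a minimal sketch of that idea on stand-in embeddings, not the paper's implementation; `normalize_rows` is a hypothetical helper.

```python
import numpy as np

def normalize_rows(M, eps=1e-12):
    """Project each row (embedding) onto the unit sphere; akin to layer
    normalization without the learned scale/shift parameters."""
    return M / (np.linalg.norm(M, axis=1, keepdims=True) + eps)

rng = np.random.default_rng(1)
items = rng.normal(size=(4, 3))   # stand-in learned item embeddings

unit = normalize_rows(items)

# On unit-norm embeddings, the dot product IS the cosine similarity,
# so an arbitrary diagonal rescaling can no longer change the result.
cos = unit @ unit.T
print(np.allclose(np.diag(cos), 1.0))   # True: each item is maximally similar to itself
```

Applying such a projection only after training does not fix the problem, since the arbitrary rescaling has already been baked into the learned embeddings; the paper's point is that normalization has to constrain the learning itself.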

The experimental findings on simulated data illustrate how item-item cosine similarities vary with different modeling choices and regularization techniques. The paper concludes by cautioning against blindly using cosine-similarity, particularly in deep models combining various regularization methods, as the implicit scaling of the learned embeddings can affect the resulting cosine-similarities in opaque ways.

Reference: https://arxiv.org/abs/2403.054...