The algorithm first converts high-dimensional Euclidean distances between data points into conditional probabilities that represent similarities. It then defines a similar probability distribution in a lower-dimensional space and minimizes the difference between these two distributions using gradient descent. This process emphasizes the local structure of the data, making clusters more apparent.