Machine learning/Classification algorithms

Classification is a subcategory of supervised learning problems.

k-nearest neighbor

a simple classification algorithm
Intuition: Find the majority vote in the training data
This is a discriminative model, meaning that there is no way to generate the training data points

Algorithm

Define some distance metric or similarity metric. The simplest case is Euclidean distance.
Given some input point $x$ , find the $k$ 'th nearest neighbors from the training set.
Do a majority vote between these nearest neighbor list and classify the input point as the category with highest number of vote.

Probabilistic interpretation

Consider the classification output as a random variable $y$ . Define probability of $y$ given input $x$ and training data $D$ is

$P(y|x,D)={\text{fraction of points }}x_{i}{\text{ in }}k{\text{-th nearest neighbor points to }}x{\text{ such that }}y_{i}=y$ The output of the classification is

${\hat {y}}=\arg \max _{y}P(y|x,D)$ Read more about probabilistic interpretation here:

https://www.cc.gatech.edu/~afb/classes/CS7616-Spring2014/slides/CS7616-13a-PKNN.pdf