Distance of a point from a plane. Half-spaces. Checking in which half-space of a hyperplane a point lies based on the sign of the distance. (~10 mins)
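A minimal sketch of the half-space check, assuming the hyperplane is written as w^T x + b = 0 (the names `signed_distance`, `half_space`, `w` and `b` are illustrative, not from the notes). The signed distance is (w^T p + b) / ||w||, and its sign says which side of the hyperplane p lies on.

```python
import numpy as np

def signed_distance(p, w, b=0.0):
    """Signed distance of point p from the hyperplane w^T x + b = 0.

    Positive: p is on the side that w points towards.
    Negative: p is on the opposite side. Zero: p lies on the hyperplane.
    """
    p = np.asarray(p, dtype=float)
    w = np.asarray(w, dtype=float)
    return (w @ p + b) / np.linalg.norm(w)

def half_space(p, w, b=0.0):
    """Return +1, -1 or 0 depending on which half-space p lies in."""
    return int(np.sign(signed_distance(p, w, b)))

# Plane x1 + x2 + x3 = 1, i.e. w = (1, 1, 1), b = -1.
w, b = [1.0, 1.0, 1.0], -1.0
print(signed_distance([1, 1, 1], w, b))  # 2 / sqrt(3), about 1.1547
print(half_space([0, 0, 0], w, b))       # -1: the origin is on the negative side
```

Dividing by ||w|| only rescales the value; the sign alone already decides the half-space.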

Vector addition. Dot product (we don't use the cross product much in ML). Transpose of column vectors (a.b = a^T * b). The dot product is a scalar quantity. Length of vector 'a' from the origin, written ||a||. Angle between two vectors: a.b = ||a|| * ||b|| * cos(theta), so if you are given the components of two vectors, you can easily find the angle between them. Orthogonal vectors (90 degree angle; cos(90) = 0): if the dot product of two non-zero vectors is 0, they are perpendicular to each other. Dot product of vector 'a' with itself: a.a = ||a||^2. (~13 mins)
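A short NumPy sketch of the dot-product facts above; the vectors are arbitrary examples. `angle_between` inverts a.b = ||a|| * ||b|| * cos(theta).

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([4.0, -3.0])

dot = a @ b                      # a.b = a^T * b, a scalar
norm_a = np.sqrt(a @ a)          # ||a|| = sqrt(a.a); same as np.linalg.norm(a)

def angle_between(u, v):
    """Angle in degrees, from a.b = ||a|| * ||b|| * cos(theta)."""
    cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # clip guards against tiny floating-point overshoot outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

print(dot)                        # 0.0 -> a and b are orthogonal
print(norm_a)                     # 5.0
print(angle_between(a, b))        # ~90 degrees
print(angle_between(a, 2 * a))    # 0.0 (same direction)
```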

Projection of vectors. Unit vectors. (~4 mins)

Vector projection
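A minimal sketch of projection and unit vectors (function names are illustrative). The length of the projection of a onto b is d = a.b / ||b|| = ||a|| cos(theta); multiplying d by the unit vector b / ||b|| gives the projection as a vector.

```python
import numpy as np

def unit_vector(v):
    """v / ||v||: same direction as v, length 1."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def scalar_projection(a, b):
    """Length of the projection of a onto b: d = a.b / ||b|| = ||a|| cos(theta)."""
    return np.dot(a, b) / np.linalg.norm(b)

def vector_projection(a, b):
    """Projection of a onto b as a vector: d times the unit vector along b."""
    return scalar_projection(a, b) * unit_vector(b)

a = np.array([2.0, 3.0])
b = np.array([4.0, 0.0])
print(scalar_projection(a, b))   # 2.0
print(vector_projection(a, b))   # [2. 0.]
```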

Column/feature standardization. (~15 mins)

Column standardization is used more often than column normalization. Each feature (column) is transformed so that its mean is 0 and its standard deviation is 1. Geometric interpretation of column standardization: move the mean of the data to the origin, then squish or expand the data along each axis so that its standard deviation becomes 1.
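A minimal sketch of column standardization with NumPy. It uses the population standard deviation (dividing by n), an assumption chosen to match the (1/n) covariance formulas later in these notes; the data values are made up.

```python
import numpy as np

def standardize_columns(X):
    """Column standardization: each column gets mean 0 and standard deviation 1.

    Subtracting the column mean moves the centre of the data to the origin;
    dividing by the column SD squishes/expands each axis so its spread is 1.
    Assumes no column is constant (an SD of 0 would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population SD (divides by n)
    return (X - mean) / std

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])
Z = standardize_columns(X)
print(Z.mean(axis=0))  # approximately [0, 0]
print(Z.std(axis=0))   # approximately [1, 1]
```

Note that both columns end up identical here even though their original scales differ by a factor of 200; standardization removes scale.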

Line, plane, hyperplane, vector notation of hyperplane, hyperplane passing through origin. Interpretation of vector 'w' in the hyperplane equation. (~22 mins)
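A small sketch of why 'w' is called the normal vector, assuming the hyperplane is written w^T x + b = 0 (b = 0 for one through the origin). The specific w and points are illustrative: the difference of any two points on the hyperplane is a direction lying in it, and that direction is orthogonal to w.

```python
import numpy as np

# Hyperplane through the origin in 3-D: w^T x = 0, with w = (1, 2, -1).
w = np.array([1.0, 2.0, -1.0])

def on_hyperplane(x, w, b=0.0, tol=1e-9):
    """True if x satisfies w^T x + b = 0 (b = 0 means it passes through the origin)."""
    return abs(np.dot(w, x) + b) < tol

p = np.array([1.0, 0.0, 1.0])    # 1 + 0 - 1 = 0
q = np.array([0.0, 1.0, 2.0])    # 0 + 2 - 2 = 0
print(on_hyperplane(p, w), on_hyperplane(q, w))   # True True

# Any direction lying inside the hyperplane is orthogonal to w.
print(np.dot(w, q - p))          # 0.0
```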

Concept of point / vector in linear algebra. Components of a vector. Distance of a point from the origin in an n-dimensional space. Distance between two points in n-dimensional space. Row and column vectors. (~13 mins)
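A minimal sketch of the two distance formulas in n dimensions (Euclidean distance assumed), plus the row and column shapes of the same point. The function names are illustrative.

```python
import numpy as np

def distance_from_origin(x):
    """||x|| = sqrt(x1^2 + x2^2 + ... + xn^2)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum(x ** 2))

def distance(p, q):
    """Euclidean distance between two n-dimensional points: ||p - q||."""
    return distance_from_origin(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))

print(distance_from_origin([1, 2, 2]))         # 3.0
print(distance([1, 1, 1, 1], [2, 2, 2, 2]))    # 2.0

# The same point as a row vector (1 x n) and as a column vector (n x 1)
x = np.array([1, 2, 2])
print(x.reshape(1, -1).shape, x.reshape(-1, 1).shape)   # (1, 3) (3, 1)
```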

Explanation of the MNIST dataset (~19 mins)

Useful blog post by Christopher Olah: http://colah.github.io/posts/2014-10-Visualizing-MNIST/

Each image of a digit is 28 x 28 pixels. There are 60K images in the training set and 10K in the test set. Each input image matrix (28 x 28) is flattened into a column vector (784 x 1).
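A minimal sketch of the flattening step. Random pixel values stand in for a real MNIST image here (an assumption for illustration); reshaping is lossless, so the 784 x 1 vector can be turned back into the 28 x 28 image.

```python
import numpy as np

# Stand-in for one MNIST image: a 28 x 28 matrix of pixel intensities (0-255).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))

# Flatten row by row into a 784 x 1 column vector ...
column = image.reshape(784, 1)
# ... and back again; no information is lost.
restored = column.reshape(28, 28)

print(column.shape)                       # (784, 1)
print(np.array_equal(restored, image))    # True
```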

Mean of data matrix (~6 mins)

Mean vector; geometrically, equivalent to a central data point.
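A small sketch of the mean vector of an (n x d) data matrix, with made-up values: it is the average of the rows, i.e. one point in the same d-dimensional space that sits at the centre of the data.

```python
import numpy as np

# Data matrix: n = 4 points (rows), d = 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 0.0],
              [3.0, 2.0]])

# Mean vector: (1/n) * sum of the row vectors, one entry per feature.
mean_vector = X.mean(axis=0)
print(mean_vector)   # [3. 2.]
```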

Dimensionality reduction and visualization - introduction (~2 mins)

Row and Column vector representation of an n-dimensional data point. (~4 mins)

Covariance of a data matrix. (~23 mins)

If the data matrix has dimension (n x d), where n is the number of data points and d is the number of features, then the covariance matrix has dimension (d x d).

Two important properties: cov(x, x) = variance(x), and, cov(x, y) = cov(y, x).

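If the features are standardized (mean 0, standard deviation 1), then the covariance of two features f1 and f2 is cov(f1, f2) = (1/n) * dot_product(f1, f2). Strictly, only the zero mean is needed for this formula; unit standard deviation additionally makes cov(f, f) = var(f) = 1.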

If the data matrix 'X' is column-standardized (mean 0 and SD 1 for each feature), then the covariance matrix of X can be computed as (1/n) * X^T * X.
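A sketch that checks (1/n) * X^T * X against NumPy's own `np.cov` on random, column-standardized data. The data are synthetic; `bias=True` makes `np.cov` divide by n rather than n - 1, and `rowvar=False` tells it that columns are the features.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))            # n = 500 points, d = 3 features
X[:, 2] = X[:, 0] + 0.5 * X[:, 2]        # make feature 3 correlated with feature 1

# Column-standardize: mean 0, SD 1 per feature (population SD, divides by n)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
n = Z.shape[0]

cov = (Z.T @ Z) / n                      # (1/n) * X^T * X, shape (d x d)
print(cov.shape)                          # (3, 3)
print(np.allclose(cov, np.cov(Z, rowvar=False, bias=True)))       # True

# Properties: cov(f, f) = var(f) = 1 on the diagonal; cov(x, y) = cov(y, x).
print(np.allclose(np.diag(cov), 1.0), np.allclose(cov, cov.T))    # True True
```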

Ellipse (2d), Ellipsoid (3d), Hyper-ellipsoid (n-dimension). (~5 mins)

Squares and Rectangles. Axis parallel rectangles. (~5 mins)

Cube, Cuboid, Hyper-cuboid. (~2 mins)
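The notes only name these shapes. One common use, assumed here for illustration, is testing whether a point lies inside an axis-parallel box: the same per-coordinate check works for a rectangle (2-D), a cuboid (3-D) and a hyper-cuboid (d-D). The function name is hypothetical.

```python
import numpy as np

def inside_axis_parallel_box(x, lower, upper):
    """True if lower_i <= x_i <= upper_i holds in every dimension i.

    With 2 coordinates this is an axis-parallel rectangle, with 3 a cuboid,
    and with d coordinates a hyper-cuboid: the test is the same.
    """
    x, lower, upper = (np.asarray(v, dtype=float) for v in (x, lower, upper))
    return bool(np.all((lower <= x) & (x <= upper)))

print(inside_axis_parallel_box([0.5, 0.2], [0, 0], [1, 1]))             # True
print(inside_axis_parallel_box([0.5, 1.2], [0, 0], [1, 1]))             # False
print(inside_axis_parallel_box([0.1, 0.9, 0.4, 0.7], [0]*4, [1]*4))     # True
```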

Data preprocessing - column normalization. (~20 mins)
Getting rid of scale; putting all values in the range between 0 and 1.
Squishing all of the data points into a unit hyper-cuboid.
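A minimal sketch of column normalization using the standard min-max formula x' = (x - min) / (max - min), applied per column. After it, every value is in [0, 1], so every data point lies inside the unit hyper-cuboid. The data values are made up.

```python
import numpy as np

def normalize_columns(X):
    """Column (min-max) normalization: x' = (x - min) / (max - min), per column.

    Assumes no column is constant (max == min would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

X = np.array([[160.0, 55.0],
              [175.0, 70.0],
              [190.0, 85.0]])
N = normalize_columns(X)
print(N)   # each column becomes [0, 0.5, 1]
```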

PCA - Principal Component Analysis (~5 mins)

Geometric interpretation of PCA (~14 mins)

If you want to get rid of a dimension from the data, choose the one with less variability and preserve the dimension with more spread, because spread (variance) is a measure of information.

Rotate the original axes so that one axis points in the direction of maximum variance of the data. Then project all the points onto this new axis and discard the other axes perpendicular to it. Basically, we want to find a direction 'd' such that the variance of the data points, projected onto the axis in this direction, is maximal.
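The notes do not yet say how to find 'd'. This sketch uses the standard result, stated here as background rather than derived, that for column-standardized data the best unit direction is the eigenvector of the covariance matrix with the largest eigenvalue. The toy data are synthetic, and the brute-force scan over angles is only a sanity check.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data stretched along the direction (1, 1)
t = rng.normal(size=300)
X = np.column_stack([t + 0.1 * rng.normal(size=300),
                     t + 0.1 * rng.normal(size=300)])

# Column-standardize, then form the covariance matrix (1/n) * X^T * X
Z = (X - X.mean(axis=0)) / X.std(axis=0)
S = (Z.T @ Z) / Z.shape[0]

# eigh is for symmetric matrices and returns eigenvalues in ascending order,
# so the last eigenvector is the direction of maximum variance.
eigvals, eigvecs = np.linalg.eigh(S)
d = eigvecs[:, -1]                     # unit vector, ||d|| = 1

def projected_variance(Z, u):
    """Variance of the points after projecting them onto the unit vector u."""
    return np.var(Z @ u)

print(d)                               # close to +/-(0.707, 0.707)
# No unit direction in a fine scan of angles gives a larger projected variance.
angles = np.linspace(0, np.pi, 181)
others = [projected_variance(Z, np.array([np.cos(a), np.sin(a)])) for a in angles]
print(projected_variance(Z, d) >= max(others) - 1e-12)   # True

X_1d = Z @ d                           # data reduced from 2 dimensions to 1
```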

Machine Learning - Part 1

Why use linear algebra in machine learning. An example dataset with two sets of features. How to generalize the categorization of features to 2D, 3D and higher dimensions. (~3 mins)

Circle in 2D. How to determine if a point p(x1, x2) lies inside the circle or outside the circle. Generalization to Sphere and Hypersphere. (~6 mins)
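A minimal sketch of the inside/outside test, generalized to n dimensions. The center defaults to the origin (an assumption; the notes do not fix it). Comparing squared distances with r^2 avoids taking a square root.

```python
import numpy as np

def position_wrt_hypersphere(p, radius, center=None):
    """Classify p relative to a circle (2-D), sphere (3-D) or hypersphere (n-D).

    p is inside if sum_i (p_i - c_i)^2 < r^2, on the boundary if equal,
    and outside if greater. The center defaults to the origin.
    """
    p = np.asarray(p, dtype=float)
    c = np.zeros_like(p) if center is None else np.asarray(center, dtype=float)
    sq_dist = np.sum((p - c) ** 2)
    if np.isclose(sq_dist, radius ** 2):
        return "on"
    return "inside" if sq_dist < radius ** 2 else "outside"

print(position_wrt_hypersphere([1, 1], radius=2))          # inside  (1 + 1 < 4)
print(position_wrt_hypersphere([3, 0], radius=2))          # outside (9 > 4)
print(position_wrt_hypersphere([1, 1, 1, 1], radius=2))    # on      (4 = 4)
```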

Representing a dataset as a set of points. (~3 mins)

Representing a dataset as a matrix. (~6 mins)

Fisher's Iris dataset Wikipedia page.
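A small sketch of a dataset as a matrix: each data point is a row vector and each feature is a column. The rows below are illustrative measurements in the style of the Iris data (sepal length, sepal width, petal length, petal width, in cm), not a loaded copy of the dataset.

```python
import numpy as np

# Each data point (one flower, 4 measurements) is one row vector.
points = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]
labels = ["setosa", "versicolor", "virginica"]   # class label for each row

X = np.array(points)     # data matrix: n rows (points) x d columns (features)
n, d = X.shape
print(n, d)              # 3 4
print(X[1])              # second data point (a row)
print(X[:, 2])           # third feature, petal length, for all points (a column)
```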

Python code to load the MNIST dataset (~11 mins)

Data: https://www.kaggle.com/c/digit-recognizer/data
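A minimal loading sketch, assuming the Kaggle `train.csv` layout: a header row, then one image per row, with the digit label in the first column followed by 784 pixel columns. The Kaggle `test.csv` has no label column, so this function applies to the training file only. The function name and file path are hypothetical.

```python
import numpy as np

def load_digit_recognizer_csv(path):
    """Load a digit-recognizer style train.csv into (labels, pixels).

    Assumed layout: header row; first column = digit label;
    remaining 784 columns = pixels of the flattened 28 x 28 image.
    """
    data = np.loadtxt(path, delimiter=",", skiprows=1, ndmin=2)
    labels = data[:, 0].astype(int)
    pixels = data[:, 1:]                 # shape (n, 784): one row vector per image
    return labels, pixels

# Usage (the path is an assumption about where the file was saved):
# labels, X = load_digit_recognizer_csv("train.csv")
# first_image = X[0].reshape(28, 28)
```

`np.loadtxt` is simple but slow on the full file; `pandas.read_csv` is a common, faster alternative.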

Mathematical Objective function of PCA (Principal Component Analysis) (~13 mins)
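A sketch of the objective in standard notation, assuming column-standardized points x_1, ..., x_n in R^d. The unit direction is written u here (the 'd' of the geometric interpretation, renamed to avoid clashing with the number of features). Because the data have mean 0, the projections u^T x_i also have mean 0, so their variance is just the mean of their squares.

```latex
% Variance of the projections u^T x_i, maximized over unit vectors u
u^{*} = \arg\max_{u} \; \frac{1}{n} \sum_{i=1}^{n} \left( u^{T} x_i \right)^{2}
\quad \text{subject to} \quad u^{T} u = \lVert u \rVert^{2} = 1

% Equivalently, with S = (1/n) X^{T} X the covariance matrix of the standardized data
u^{*} = \arg\max_{u} \; u^{T} S \, u
\quad \text{subject to} \quad u^{T} u = 1
```

The constraint u^T u = 1 matters: without it, the objective could be made arbitrarily large just by scaling u.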