Determinant (~10 mins)

  • The factor by which a linear transformation changes any area is called the Determinant of that transformation.
  • If the determinant is 0, the associated transformation has squished space into a lower dimension, so any area is scaled down to zero.
  • If the determinant is negative, it shows that the orientation of space gets flipped over.
  • For 3 dimensions, determinant gives you the factor by which the volume gets scaled.
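
A quick numpy sketch of these facts (assuming numpy is available; the matrices are made-up examples):

```python
import numpy as np

# Scaling x by 3 and y by 2 scales every area by 3 * 2 = 6.
A = np.array([[3.0, 0.0],
              [0.0, 2.0]])
det_A = np.linalg.det(A)        # 6.0

# A shear slides squares into parallelograms of the same area.
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
det_shear = np.linalg.det(shear)  # 1.0

# A reflection flips the orientation of space: negative determinant.
flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
det_flip = np.linalg.det(flip)    # -1.0
```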

Median, Percentiles, Quantiles (~9 mins)

  • 20th Percentile means there is roughly 20% of points less than this value and 80% of points greater than this value.
  • 50th percentile is same as median.
  • The 25th, 50th and 75th percentiles are called Quartiles.
  • 1st Quartile is the 25th percentile, 2nd Quartile is the 50th percentile (the median), and 3rd Quartile is the 75th percentile.
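
A minimal numpy sketch of these definitions (assuming numpy; the data is a made-up example):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Roughly 20% of points lie below the 20th percentile.
p20 = np.percentile(data, 20)

# The 25th, 50th and 75th percentiles are the quartiles.
q1, q2, q3 = np.percentile(data, [25, 50, 75])

# The 50th percentile is exactly the median.
median = np.median(data)
```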

Python code for finding percentiles, quantiles and MAD (Median Absolute Deviation) of iris dataset and other plots.

IQR (Inter Quartile Range) and MAD (Median Absolute Deviation) (~6 mins)
  • MAD (Median Absolute Deviation) is a robust measure of spread that plays a role similar to the Standard Deviation.
  • Some people use the Inter-Quartile Range as an approximation for standard deviation, but MAD is a better substitute for standard deviation.
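
A small sketch of both statistics (assuming numpy; the data with one large outlier is invented to show MAD's robustness):

```python
import numpy as np

data = np.array([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 100.0])  # one big outlier

# IQR: spread of the middle 50% of the data.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# MAD: median of the absolute deviations from the median.
med = np.median(data)
mad = np.median(np.abs(data - med))

# Scaled by ~1.4826, MAD estimates the standard deviation for Gaussian
# data, yet unlike np.std it is barely affected by the single outlier.
robust_std = 1.4826 * mad
```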

Median Absolute Deviation wiki page.
Pair plot is an example of bivariate analysis (looking at two variables and comparing them).

Box Plot (~9 mins)
Violin Plot (~4 mins)
Contour plot and multivariate prob. density (~9 mins)

What are vectors? (~10 mins)

  • Vector - think about it as an arrow that sits in a coordinate system with the tail at the origin.
  • From a physics student's perspective, you can have a vector anywhere in space, and two vectors are the same as long as their magnitude and direction are the same. In linear algebra, it's almost always the case that the vector will be rooted at the origin.
  • Scaling of vectors and scalars.
  • Almost all topics in linear algebra tend to revolve around two operations: vector addition and scalar multiplication.
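
The two fundamental operations, sketched in numpy (the vectors are arbitrary examples):

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, -1.0])

# Vector addition: add component-wise.
summed = v + w              # [4, 1]

# Scalar multiplication: stretch or squish the vector.
scaled = 2.0 * v            # [2, 4]

# Combining both gives a linear combination.
combo = 2.0 * v + 3.0 * w   # [11, 1]
```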

Cumulative Distribution Function (~15 mins)
Python code for generating cdfs for iris dataset.
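
A minimal sketch of an empirical CDF, assuming numpy (the data is a toy example, not the iris dataset):

```python
import numpy as np

data = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Empirical CDF: for each sorted value, the fraction of observations <= it.
xs = np.sort(data)
cdf = np.arange(1, len(xs) + 1) / len(xs)

# Reading the CDF at a point: the fraction of observations <= 4.0.
frac_le_4 = np.mean(data <= 4.0)   # 0.8
```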

Dot products and duality (~15 mins)

  • v dot w: (length of projected w on v) * (length of v).
  • When v and w are perpendicular, the length of projection of w on to v is 0 and so dot product is 0 * (length of v) = 0.
  • If you have some kind of linear transformation whose output space is the number line, there is going to be a unique vector corresponding to that transformation, in the sense that applying the transformation is the same thing as taking a dot product with that vector.
  • Dot product is useful to decide whether two vectors tend to point in the same direction: positive if they do, negative if they point apart, and zero if they are perpendicular.
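
These properties in a short numpy sketch (the vectors are chosen so the projection is easy to see):

```python
import numpy as np

v = np.array([2.0, 0.0])
w = np.array([1.0, 3.0])

# v . w = (length of w's projection onto v) * (length of v).
# Here w projects onto v with length 1, and |v| = 2.
dot = v @ w                        # 2.0

# Perpendicular vectors project to length 0, so the dot product is 0.
perp = np.array([0.0, 5.0])
dot_perp = v @ perp                # 0.0

# Duality: "dot with v" is the same linear map as the 1x2 matrix [2 0].
as_transform = np.array([[2.0, 0.0]]) @ w
```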

Gaussian/Normal Distribution (~26 mins)
CDF of Gaussian distribution (~11 mins)

  • 68-95-99.7 rule of the Gaussian distribution.
  • Normal distribution is important because of the Central Limit Theorem: take samples of some fixed size from a distribution of any type; compute the mean of each sample; plot each such mean as a point on another graph. Repeat this procedure many times, and the new graph will follow a normal distribution, with its mean close to the mean of the original arbitrary distribution that we sampled from.
  • Distribution sampling.
  • Standard normal distribution: mean as 0 and standard deviation as 1.
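
A small simulation sketch of the Central Limit Theorem and the 68-95-99.7 rule (assuming numpy; the sample size and sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 samples of size 30 from a very non-Gaussian (uniform) distribution.
sample_means = rng.uniform(0, 1, size=(10_000, 30)).mean(axis=1)

# CLT: the sample means cluster around the original distribution's
# mean (0.5) and are approximately normally distributed.
mean_of_means = sample_means.mean()

# Standardize: subtract the mean, divide by the standard deviation.
z = (sample_means - sample_means.mean()) / sample_means.std()

# 68-95-99.7 rule: about 68% of values fall within one standard deviation.
within_1sd = np.mean(np.abs(z) <= 1)
```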

Nonsquare matrices as transformations between dimensions (~5 mins)
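
A quick sketch of the idea (assuming numpy; the matrix is an arbitrary example): a 3x2 matrix takes 2D vectors to 3D vectors, and its two columns are where i_hat and j_hat land in 3D.

```python
import numpy as np

# Columns: where i_hat and j_hat land in three dimensions.
M = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

v2d = np.array([2.0, 3.0])
v3d = M @ v2d   # a 3D vector on the plane spanned by the columns
```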

Linear combinations, span, and basis vectors (~10 mins)

  • Think of each coordinate as a scalar that stretches or squishes the fundamental vectors i_hat along the x direction and j_hat along the y direction (the basis vectors).
  • Any time you describe vectors numerically, it depends on an implicit choice of what basis vectors you are using.
  • 'linear' combination: if you fix one of the scalars and let the other one change its value freely, the tip of the resulting vector draws a straight line.
  • If you let both scalars (in a*v_hat + b*w_hat) change freely, you can reach any point in the two-dimensional vector space (as long as the two vectors don't line up); the set of all reachable points is called the 'span' of those two vectors.
  • Span of two vectors is basically a way of asking what are all the possible vectors you can reach using the two fundamental operations of vector addition and scalar multiplication.
  • If you are dealing with a collection of vectors, it is convenient to think of them as just points.
  • Linearly dependent and linearly independent vectors.
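
One way to check this numerically, sketched with numpy (the vectors are made up):

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 1.0])

# Nonzero determinant of the matrix with v, w as columns means the
# vectors are linearly independent: their span is the whole 2D plane.
independent = np.linalg.det(np.column_stack([v, w])) != 0

# A scaled copy of v is linearly dependent on v; the span of the pair
# collapses to a line, which shows up as rank 1.
u = 2.0 * v
dependent_rank = np.linalg.matrix_rank(np.column_stack([v, u]))
```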

Note: This video is awesome. 

Mean, Variance and Standard Deviation (~15 mins)

  • Mean tells about the central tendency of the data
  • A single outlier can corrupt the mean value of the entire set of numbers.
  • Setosa petal-length spread (standard deviation) is much smaller than that of Versicolor and Virginica.
  • A good property of the median is that a small number of erroneous outliers won't corrupt it by a large amount.
  • Only if more than fifty percent of your observations get corrupted does your median get corrupted.

Median (~10 mins)
  • The median tends to be similar to the mean as a measure of a data set's central tendency.
  • Unlike Mean, Median doesn't change much in the presence of outliers.
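
A tiny sketch of this robustness, assuming numpy (the clean values and the outlier are invented):

```python
import numpy as np

data = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

mean_clean = data.mean()        # 12.0
median_clean = np.median(data)  # 12.0

# A single erroneous outlier corrupts the mean but barely moves the median.
corrupted = np.append(data, 1000.0)
mean_bad = corrupted.mean()        # ~176.7
median_bad = np.median(corrupted)  # 12.5
```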

Python code for finding mean, variance, standard deviation and median of iris dataset.

Matrix multiplication as composition (~10 mins)

  • Composition of two (or more) linear transformations.
  • Multiplying two matrices has the geometric meaning of applying one transformation followed by another: in the product AB, the right matrix B is applied first, then the left matrix A (read right to left, like function composition).
  • Order of matrix multiplication matters: in general, AB is not equal to BA.
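
A numpy sketch of composition and its order-dependence (a rotation and a shear, both arbitrary examples):

```python
import numpy as np

rot90 = np.array([[0.0, -1.0],
                  [1.0,  0.0]])   # rotate 90 degrees counterclockwise
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

v = np.array([1.0, 0.0])

# shear @ rot90 means: rotate first, then shear (read right to left).
composed = shear @ rot90
step_by_step = shear @ (rot90 @ v)   # same as composed @ v

# Order matters: shearing first, then rotating, is a different map.
other_order = rot90 @ shear
```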

Machine Learning - Part 3 

Entropy, cross-entropy, kl-divergence (~11 mins)

  • When you communicate a message, you want as much useful information as possible to pass through.
  • By Shannon's theory, to transmit one bit of information means to reduce the recipient's uncertainty by a factor of 2.
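
A minimal sketch of the three quantities (assuming numpy; the helper functions are my own and assume strictly positive probabilities):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: the average surprise of distribution p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """Average bits to encode samples from p with a code built for q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return -np.sum(p * np.log2(q))

def kl_divergence(p, q):
    """Extra bits paid for assuming q when the true distribution is p."""
    return cross_entropy(p, q) - entropy(p)

fair = [0.5, 0.5]
biased = [0.9, 0.1]

# One fair coin flip carries exactly one bit of information.
h_fair = entropy(fair)   # 1.0
```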

3D linear transformation (~4 mins)

Linear System of Equations - using matrices (~12 mins)

  • A linear system of equations, when converted to the form Ax = v, can be interpreted as: find the vector 'x' such that when 'x' gets transformed by the matrix A, it coincides with the vector 'v'.
  • When determinant is non-zero, there will be only one vector 'x' that lands on 'v' when transformed using 'A'.
  • A-inverse is the unique transformation with the property that when you first apply the transformation 'A' and then follow it with the transformation represented by A-inverse, you end up back where you started.
  • When the determinant is zero and the transformation associated with this system of equations squishes space into a smaller dimension, there is no inverse. You cannot unsquish a line to turn it into a plane.
  • When the output of a transformation is a line, meaning it is one-dimensional, we say that the transformation has a rank of 1.
  • If the output of the transformation is a two-dimensional plane, we say that the transformation has a rank of 2.
  • Rank is the number of dimensions in the column space.
  • For a 2x2 matrix, rank 2 is the highest it can have, and for a 3x3 matrix, rank 3 is the highest (called full rank).
  • When we say a 3x3 matrix has rank 2, it means the transformation collapses space onto a plane; that's less collapse than in a rank-1 situation, which squishes everything onto a line.
  • Null space / Kernel: the space of all vectors that land on the origin (get squished onto zero) after the transformation.
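
These ideas in a short numpy sketch (the systems are made-up examples):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
v = np.array([5.0, 10.0])

# det(A) != 0, so exactly one x lands on v under the transformation A.
x = np.linalg.solve(A, v)   # [1, 3]

# A-inverse undoes A: applying A and then A-inverse gets you back.
A_inv = np.linalg.inv(A)

# A singular matrix (det = 0) squishes the plane onto a line: rank 1.
# Its null space is the line of vectors that land on the origin.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
rank_S = np.linalg.matrix_rank(S)   # 1
```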

Linear Transformations and matrices (~10 mins)

  • Transformation -- fancy word for a function that takes in some vector and spits out some other vector. Imagine the input vector moving (transforming) over to the output vector.
  • Linear algebra limits itself to a special type of transformation - the linear transformations.
    • all lines must remain lines without getting curved after the transformation
    • the origin should remain fixed.
  • A linear transformation should keep the grid lines parallel and evenly spaced.
  • Find out where i_hat and j_hat land after the transformation; every other vector lands at the same linear combination of new_i_hat and new_j_hat that it originally was of i_hat and j_hat.
  • Multiplying a matrix by a vector is just applying the linear transformation encoded by that matrix to the vector.
  • Every time you see a matrix, you can interpret it as a certain transformation of space.
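
The column interpretation above, sketched in numpy (the landing spots for i_hat and j_hat are arbitrary):

```python
import numpy as np

# The columns of a matrix record where i_hat and j_hat land.
new_i_hat = np.array([1.0, 1.0])
new_j_hat = np.array([-1.0, 2.0])
M = np.column_stack([new_i_hat, new_j_hat])

# Any vector [x, y] lands at x * new_i_hat + y * new_j_hat.
v = np.array([3.0, 2.0])
by_matrix = M @ v
by_columns = 3.0 * new_i_hat + 2.0 * new_j_hat   # same point
```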