Overview, 'the sapience', interdisciplinary, Turing Test, the setbacks in AI, the conference at Dartmouth College (1956), an AI Koan, Arthur Samuel, Allen Newell, John McCarthy, Herbert Simon, AI Winter, expert systems, 'Can a machine be intelligent?', philosophy and ethics of AI, knowledge representation, planning, learning, NLP, perception, approaches, cybernetics and brain simulation, symbolic AI, cognitive simulation, logic-based, knowledge-based, sub-symbolic, neuroscience, neurons, Neats vs. Scruffies, soft computing, statistical, intelligent agent paradigm, search and optimization, logic, probabilistic methods for uncertain reasoning, Bayesian networks, probabilistic algorithms, classifiers and statistical learning methods, neural networks, perceptron, Paul Werbos and backpropagation of errors, progress in AI, CAPTCHA, spam fighting, robotics, machine translation, game playing, logistics planning, speech recognition, robotic vehicles.
Overview, pathfinding and graph traversal, Peter Hart, Nilsson, Raphael, description, cost function, heuristic, history, process, algorithm in detail, comparing Dijkstra's algorithm and BFS, A* algorithm, A*'s use of heuristics, speed or accuracy, scale, a pathfinding cat example, the open and closed lists, path scoring, more about g() and h(), algorithm description in words, the cat's path step-by-step, a non-visionary cat, the formal algorithm definition, example: route finding from Arad to Bucharest, straight-line-distance heuristic, conditions for optimality, admissible heuristic, consistency/monotonicity, special cases, complexity of A*, applications. You can find the code for the Adaptive A* here.
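The open/closed-list mechanics and the f(n) = g(n) + h(n) scoring described above can be sketched in a few lines of Python. The 3x3 grid, the wall cells, and the Manhattan-distance heuristic below are illustrative assumptions (not the cat example from the notes):

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal A*: expand the open-list node with the lowest f = g + h."""
    open_heap = [(h(start), 0, start, [start])]   # (f, g, node, path) -- the open list
    closed = set()                                # the closed list
    while open_heap:
        f, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for nxt, cost in neighbors(node):
            if nxt not in closed:
                heapq.heappush(open_heap,
                               (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None   # open list exhausted: no path exists

# 4-connected 3x3 grid with a wall blocking the direct route.
walls = {(1, 0), (1, 1)}
def neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy)
        if 0 <= q[0] < 3 and 0 <= q[1] < 3 and q not in walls:
            yield q, 1   # unit step cost

goal = (2, 0)
h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan: admissible here
path = a_star((0, 0), goal, neighbors, h)
```

With the direct column blocked, the search detours over the top row; Manhattan distance never overestimates on a unit-cost grid, so the returned path is optimal.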
Introduction - random process, memorylessness property, discrete-time Markov chain, Markov property, Drunkard's walk, 'the transition probabilities depend only on the current position, and not on the manner in which the position was reached', Markov chain example - dietary habits of creatures, Example 1 - Board games played with dice, Example 2 - A centre-biased random walk, Example 3 - A very simple weather model, Example 4 - Citation ranking - Google's page ranking algorithm, Example 5 - A hypothetical stock market, Example - Orange juice company, Regular Markov chains, Stationary matrices, Steady-state Markov chain, Example - a company's brands A & A' using an advertising campaign and market share, "Does every Markov chain have a unique stationary matrix?", Regular Markov chain, example; properties of regular Markov chains. Absorbing Markov chains, definition, recognizing absorbing Markov chains, standard forms, Limiting matrices for absorbing Markov chains; properties of the limiting matrix, Fundamental matrix; Example - 'A credit union classifies automobile loans...', Empirical probability, Law of Large Numbers (LLN); average dice value against number of rolls, Diffusion - LLN in chemistry; history of the LLN; Simulation illustrating the law of large numbers; Borel's LLN
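The "very simple weather model" and the stationary-matrix idea can be sketched together: iterate the transition matrix until the state distribution stops changing, then check it against the steady-state condition v = vP. The 0.9/0.1 and 0.5/0.5 transition probabilities below are the commonly used two-state example and are assumptions here:

```python
import numpy as np

# Two-state weather chain: rows are "today", columns "tomorrow".
#            Sunny  Rainy
P = np.array([[0.9, 0.1],    # Sunny today
              [0.5, 0.5]])   # Rainy today

state = np.array([1.0, 0.0])  # start on a sunny day
for _ in range(100):          # repeated multiplication converges for a regular chain
    state = state @ P

# The stationary distribution v satisfies v = v P; solving by hand for this
# matrix gives v = (5/6, 1/6), so long-run weather is sunny 5 days out of 6.
```

Because the chain is regular (some power of P has all positive entries), the limit is the same whatever the starting state.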
Monte Carlo - an approximation method based on sampling, Monte Carlo simulations; 'Monte Carlo is not a single method but a whole set of stochastic methods', Comparison with the traditional method and a method using probability distributions, Problem classes where the MC method is used - optimization, numerical integration & generation of draws from a probability distribution, space and oil exploration problems, physics-related problems; History of Monte Carlo, Buffon's needle experiment - geometric probability, Enrico Fermi - in neutron diffusion, Stanislaw Ulam - at Los Alamos National Laboratory; Monte Carlo Casino; Manhattan Project; John von Neumann, Pictures and Quotes; Solitaire and Monte Carlo; Klara von Neumann, Calculating the value of Pi using Monte Carlo (46 pages); instead of a whole circle, we may also use a quarter of a circle, Law of Large Numbers (LLN) (16 pages); Monte Carlo simulation for events having two outcomes - an example using coin tosses; 100 coin tosses, C++ program to emulate the coin toss experiment, Why is sampling important? Pros and cons of sampling, Monte Carlo and random numbers; pseudo-random sequences; middle-square method, 'Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin' -- J. von Neumann, Applications of the Monte Carlo class of algorithms - physical sciences, engineering, computational biology, computer graphics, applied statistics, artificial intelligence for games, design and visuals, finance and business, use in mathematics, Monte Carlo integration, curse of dimensionality, Markov chain Monte Carlo.
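The pi calculation above (using a quarter circle, as the notes suggest) takes only a few lines; this is a minimal Python sketch rather than the C++ coin-toss program referenced in the notes:

```python
import random

def estimate_pi(n, seed=0):
    """Sample n points in the unit square; the fraction landing inside the
    quarter circle x^2 + y^2 <= 1 approaches pi/4 by the law of large numbers."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    inside = sum(1 for _ in range(n)
                 if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

estimate = estimate_pi(100_000)
```

The error shrinks like 1/sqrt(n), so each extra decimal digit of pi costs roughly 100x more samples; this slow but dimension-independent convergence is exactly why Monte Carlo wins against grid methods under the curse of dimensionality.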
Introduction, History, Strictly Determined Games, Saddle Value, Problems on Strictly Determined Games, Non-strictly Determined Games, Finding the optimal strategies, Example, The Expected Value of a Game, Fundamental Theorem of Game Theory, Solution to a 2x2 non-strictly determined matrix game, Non-strictly determined matrix games - Recessive rows & Recessive columns.
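A matrix game is strictly determined exactly when it has a saddle point, i.e. when the maximin over rows equals the minimax over columns. A minimal check in Python (the example matrices in the test are made up for illustration):

```python
def saddle_point(M):
    """Return (row, col, value) if the payoff matrix M has a saddle point,
    i.e. max over rows of the row minimum equals min over columns of the
    column maximum; return None for a non-strictly determined game."""
    row_mins = [min(r) for r in M]
    col_maxs = [max(c) for c in zip(*M)]
    v_lower, v_upper = max(row_mins), min(col_maxs)   # maximin vs. minimax
    if v_lower != v_upper:
        return None           # no saddle point: mixed strategies are needed
    i = row_mins.index(v_lower)
    j = M[i].index(v_lower)
    return i, j, v_lower      # optimal pure strategies and the game's value
```

For the matching-pennies-style matrix [[1, -1], [-1, 1]] this returns None, which is the cue to move on to the mixed-strategy solution of a 2x2 non-strictly determined game.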
Overview, belief state, automated taxi system example, qualification problem, toothache example, uncertainty and rational decisions, utility theory, decision theory = probability theory + utility theory, basic probability notation, sample space, the basic axioms of probability, events, propositions, unconditional/prior probabilities, conditional/posterior probabilities, P(cavity|toothache), language of propositions in probability assertions, random variables, domains of random variables, connectives of propositional logic, probability distribution, probability density function, joint probability distribution, product rule, probability axioms and their reasonableness, inclusion-exclusion principle, Kolmogorov's axioms, where do probabilities come from?, frequentist interpretation, objectivist view, subjectivist view, reference class problem, "every single thing or event has an indefinite number of properties or attributes observable in it and might therefore be considered as belonging to an indefinite number of different classes of things...", principle of indifference (principle of insufficient reason), inference using full joint distributions, unconditional/marginal probability, marginalization rule, product rule, conditioning rule, normalization constant (alpha), table size of full joint distributions can get to 2^n (for boolean random variables), independence, marginal/absolute independence, Bayes' rule, P(Y|X) = P(X|Y)*P(Y)/P(X), applying Bayes' rule, P(cause|effect) = P(effect|cause)*P(cause)/P(effect), medical diagnosis example - meningitis and stiff neck, 'why might one have available the conditional probability in one direction but not the other?', another example for Bayes' rule - Weather-Rain-Dry, combining evidence, conditional independence between 'catch' and 'toothache' given 'cavity', tables being decomposed into smaller ones using conditional independence rules, naive Bayes model.
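The meningitis/stiff-neck diagnosis example illustrates P(cause|effect) = P(effect|cause)*P(cause)/P(effect): the doctor knows the causal direction (how often meningitis produces a stiff neck) and wants the diagnostic one. The specific numbers below are assumptions for illustration:

```python
# Assumed figures, chosen only to illustrate the direction of Bayes' rule:
p_s_given_m = 0.7        # P(stiff neck | meningitis) -- causal, known to the doctor
p_m = 1 / 50_000         # P(meningitis), the prior
p_s = 0.01               # P(stiff neck), the prior for the symptom

# Diagnostic direction via Bayes' rule:
p_m_given_s = p_s_given_m * p_m / p_s   # P(meningitis | stiff neck)
```

Even though meningitis causes a stiff neck 70% of the time, the posterior is only 0.0014: the tiny prior dominates, which is why having the conditional probability in the causal direction plus priors is the robust way to store medical knowledge.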
Definition, History, Derivation, Example 1: Red and blue balls in 3 bowls, A more real-life example: Applying Bayes' theorem & breast cancer, Example 2: Probability that the person on the train was speaking to a man or a woman, Example 3: A frequentist example, Example 4: Coin flip example, Example 5: Drug testing, Example 6: Archaeologist example (continuous random variable), Computer applications of Bayes' model, International Society for Bayesian Analysis, Bayesian search theory
Conditional independence, conditional probability distribution, example 1, example 2, applying the chain rule to the traffic example, use in Bayesian inference, probabilistic models, chain rule, Bayesian network, full joint distribution tables, example 1 - car insurance, example 2 - diagnosing a car not starting, compact representation, sparse set of interactions, graphical model notation, conditional independence, example, traffic example, alarm network, joint distribution for conditionally dependent variables, probabilities in a Bayesian network, probability distribution for the rain-traffic example, alarm network, traffic example again, 'a given joint distribution can be represented by more than one Bayes net', causal models, causality, 'what arrows really mean is conditional independence', example - rain-sprinkler-wetGrass, finding P(G, S, R), P(A_1, ..., A_n) = Π_i P(A_i | parents(A_i)), inference complexity and approximation algorithms, applications, history, Judea Pearl (Israeli American, MS from Rutgers :)), naive Bayes net.
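Finding P(G, S, R) in the rain-sprinkler-wetGrass network is just the product Π_i P(A_i | parents(A_i)) over the three nodes, and summing such products answers queries by enumeration. The CPT numbers below follow the commonly used textbook version of this example and should be treated as assumptions:

```python
from itertools import product

# CPTs for the rain -> sprinkler, (sprinkler, rain) -> wet-grass network.
P_R = {True: 0.2, False: 0.8}                      # P(Rain)
P_S_given_R = {True: 0.01, False: 0.4}             # P(Sprinkler=T | Rain)
P_G_given_SR = {(True, True): 0.99, (True, False): 0.9,
                (False, True): 0.8, (False, False): 0.0}  # P(Grass wet=T | S, R)

def joint(g, s, r):
    """P(G, S, R) = P(G | S, R) * P(S | R) * P(R) -- product over parents."""
    pg = P_G_given_SR[(s, r)] if g else 1 - P_G_given_SR[(s, r)]
    ps = P_S_given_R[r] if s else 1 - P_S_given_R[r]
    return pg * ps * P_R[r]

# Query by enumeration: P(Rain=T | Grass wet=T), summing out the sprinkler.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
p_rain_given_wet = num / den
```

With these CPTs the posterior comes out near 0.358: wet grass is evidence for rain, but the sprinkler "explains away" much of it.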
Alarm network, size of a Bayes net, exponential in the number of parents, conditional independence, example, causal chain configuration, common cause configuration, common effect configuration, general case, example 1, example 2, example 3, example 4, structure implications, d-separation algorithm, computing all independences, topology limits distributions, Bayesian inference, inference by enumeration - will contain 2^H summation terms (where H is the number of hidden variables), how to improve?, inference by variable elimination, example, joining factors, eliminate operation, multiple elimination, interleaving join and elimination operations, example, what if an evidence variable is included in the query?, the general variable elimination algorithm, example, the computational complexity of the variable elimination technique depends critically on the size (no. of rows) of the largest factor, example 1, example 2, Is it possible that the best ordering is still bad?, inference in Bayes nets is NP-complete.
A two-player adversarial game in which each player moves their token to an open adjacent space in either direction. If the opponent occupies an adjacent space, the player may jump over it to the next open space, if any.
Overview, 'despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations', probabilistic model, conditional probability, joint probability, derivation, constructing the classifier from the probability model, maximum a posteriori (MAP), log of probabilities, curse of dimensionality, example - sex classification into males and females, Gaussian naive Bayes, multinomial distribution, handwritten digit recognition, features - aspect ratio, number of non-zero pixels, percentage of pixels above the horizontal and vertical splits, row and pixel count feature, character profile feature, slope detail feature, number of breaks in horizontal and vertical spaces, face recognition and the features that can be used.
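The sex-classification example with Gaussian naive Bayes can be sketched directly from the MAP rule argmax_c P(c) Π_f P(f|c): one Gaussian per class per feature, multiplied as if the features were independent. The per-class means and variances below are made-up numbers, not the figures from the notes:

```python
import math

def gaussian(x, mu, var):
    """Univariate Gaussian density used as the per-feature likelihood."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Made-up per-class (mean, variance) for height in cm and weight in kg.
stats = {
    "male":   {"height": (178.0, 50.0), "weight": (80.0, 100.0)},
    "female": {"height": (165.0, 50.0), "weight": (62.0, 100.0)},
}
prior = {"male": 0.5, "female": 0.5}

def classify(sample):
    """MAP decision: argmax_c P(c) * prod_f P(f | c), the 'naive' factorisation."""
    def score(c):
        s = prior[c]
        for feat, value in sample.items():
            mu, var = stats[c][feat]
            s *= gaussian(value, mu, var)
        return s
    return max(prior, key=score)

label = classify({"height": 160.0, "weight": 60.0})
```

In practice one sums logs of these densities instead of multiplying, which is exactly the 'log of probabilities' point in the notes: products of many small likelihoods underflow floating point.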
The code for the naive Bayes handwritten digit classifier can be found here. The training image set can be found here; it contains 5000 training images in text file format. The testing image set can be found here; it contains 1000 digits in text file format. The validation image set (with 1000 digits) can be found here. The success rate was 85.6% (for 1000 test images).
Introduction, why neural networks, advantages, comparison with conventional computers, learning by example, background, adaptive weights, biological neural networks, neuroscience, Santiago Ramon y Cajal, remarkable capabilities of the brain, computational neuroscience, cognitive psychology, cognitive science, neuron, structure of a neuron, depolarization, refractory period, synapse, neurotransmitters, dendrites, axon, how the human brain learns, excitatory/inhibitory inputs, neural circuit, synapse, Hebbian theory, synaptic plasticity, 'Organization of Behavior' by Hebb, 'Cells that fire together, wire together', engram, Walter Pitts and Warren McCulloch model, neural summation, hyperpolarization, threshold potential, action potential, resting potential, recovery period, LTP (long-term potentiation), history of artificial neural networks, perceptron, backpropagation algorithm (Werbos in 1975), incapability of processing the XOR circuit, recurrent neural networks, long short-term memory (LSTM, Alex Graves), Yann LeCun, artificial neurons, a simple artificial neuron, weighted sum of inputs, y_i = f(Σ_j w_ij x_j), activation function, identity function, collective behavior, feedforward configuration, input-output-intermediate layers, classifier/prediction ANN, training and recognition phases, example, application of ANNs to memory design, firing rules, Hamming distance technique, example, pattern recognition example, MCP neuron, architecture of neural networks, network layers, perceptrons, mammalian visual system, learning process, associative mapping, fixed and adaptive networks, supervised and unsupervised learning, error convergence, transfer function, linear units, threshold units, sigmoid units, example, backpropagation algorithm, error derivative of the weights (EW), computing EA values, applications of neural networks
Neuron model, activation function, fitting a model to data - an example, weight of the car and fuel consumption, loss function, linear model, parabolic error surface, non-linear models, gradient descent, local minima, an analogy, steepness of the hill, 'it's a neural net', linear neural nets, learning as an optimization problem, activation function, sigmoid function, logistic function, transfer function, prediction problems fall into two categories: classification and regression, multilayer perceptrons (MLP), layered feedforward topology, number of input units, output units and hidden units, gathering data for neural nets, non-numeric data, nominal-valued variables, number of cases required, curse of dimensionality, noise tolerance of neural nets, training multilayer perceptrons, error functions, saddle point, global and local minimum/maximum, random initial configuration of the network, the backpropagation algorithm, overstepping the solution using a large learning rate, momentum, epochs, over-learning and generalization, over-fitting and over-learning, low-order and high-order polynomials, test set, selection error, training error, 'a simple model is always preferable to a complex model', validation set - used only once, re-sampling, Monte Carlo and bootstrap, data selection, 'garbage in, garbage out', the future is not the past, all eventualities must be covered, 'the network learns the easiest features it can', unbalanced data sets, false positives, insights into MLP training, sigmoid cliff response, steepness and orientation of the sigmoid cliff, steep slope corresponds to large weights, plateau structure, linear discriminant, confidence level for accept/reject, network output as probabilities, other MLP training algorithms (BFGS, scaled conjugate gradient), where do weights come from?, perceptron example, XOR network, example network to explain the backpropagation algorithm.
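The parabolic-error-surface picture above can be miniaturised: fitting a linear model y = w*x + b by gradient descent on squared error, the simplest case of the training loop an MLP uses. The data is a made-up exact linear relation (not the car-weight/fuel figures from the notes), and no momentum term is included:

```python
# Made-up data following y = 2x + 1 exactly, so the true minimum is known.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b, lr = 0.0, 0.0, 0.05          # random-ish start, modest learning rate
for _ in range(2000):
    # Gradients of mean squared error (1/n) * sum (w*x + b - y)^2 wrt w and b.
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Step downhill; too large an lr here would overstep the minimum.
    w, b = w - lr * dw, b - lr * db
```

Because the error surface of a linear model is a single paraboloid, descent cannot get trapped; local minima and saddle points only appear once non-linear hidden units are added.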
Introduction, vlfeat.org/matconvnet, installation and setup, sample program with pre-configured imagenet and googlenet, MatConvNet at a glance, CNN computational blocks, CNN wrappers, speed comparisons, CNN topologies - simple networks and DAGs, SimpleNN, pretrained models, difference between semantic segmentation and object recognition/detection, semantic segmentation, ILSVRC, cnn_train.m, training a network from scratch, convolution layer, maxpool layer, ReLU layer and fully connected layer, stride, filter depth, what is pooling?, average and max-pooling, sub-sampling ratio, softmax and loss layers, vl_simplenn_display(), 'in general, it is not always obvious what the right network architecture is', data depth, rf (receptive field) size, feed-forward neural net, tutorial, vl_nnconv, vl_imarraysc, viewing the filters created, downsampling with the stride option, padding the input, designing a filter by hand, vl_relu, output after the ReLU operation, max- and sum-pooling, pooling with a filter size of 15, vl_nnpool, normalization operation, Local Response Normalization (LRN) operator, vl_nnnormalize, back-propagation and derivatives, tutorial exercises, learning a tiny CNN, detecting blob-like structures, the training data and the labels, vl_imsmooth, image pre-processing steps, subtracting the median value from the image, 'centering the data usually makes learning problems much better conditioned', changing the learning rate and the momentum, learning a character CNN, training the network using input from the Google Fonts project, training the network with and without jitter, visualizing the learned filters, loading a pre-trained model, imagenet model - 37 layers, imagenet-vgg-verydeep-16.mat, TensorFlow, ReLU non-linearity, reducing overfitting, importance of batch size while training, epoch and batch size, top-1 and top-5 error rates, epoch -- 'randomly shuffles the training data and partitions it into mini-batches of appropriate size', spiking neural nets (SNN)
Introduction, minimax algorithm, types of games, policy, deterministic games, zero-sum games, general games, single-agent game, single-agent game tree, adversarial game trees, tic-tac-toe game tree, minimax search, assuming the opponent plays optimally, minimizing the maximum loss, Nim game, expectimax, tree pruning, formal definition of a game, ply, recursive definition of minimax, efficiency of minimax, depth-limited search, utility function, evaluation functions, starving of Pacman agents, minimax pruning, alpha-beta pruning, optimal decisions in multiplayer games, uncertainty and utility, worst case vs. average case, expectimax search, expectimax pruning, depth-limited expectimax, modeling assumptions, mixed layer-type games, backgammon, constraint satisfaction problem example.
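The recursive definition of minimax with a depth limit can be sketched compactly; the tiny nested-list game tree below is an assumption for illustration (its leaf utilities follow a common two-ply textbook example):

```python
def minimax(state, depth, maximizing, moves, evaluate):
    """Depth-limited minimax: MAX takes the largest child value, MIN the
    smallest; at the depth limit (or a leaf) the evaluation function is used."""
    children = moves(state)
    if depth == 0 or not children:
        return evaluate(state)
    values = [minimax(c, depth - 1, not maximizing, moves, evaluate)
              for c in children]
    return max(values) if maximizing else min(values)

# Two-ply tree encoded as nested lists; integers are leaf utilities for MAX.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
moves = lambda s: s if isinstance(s, list) else []   # children, or [] at a leaf
evaluate = lambda s: s                               # leaves evaluate to themselves
best = minimax(tree, 2, True, moves, evaluate)
```

MIN reduces the three subtrees to 3, 2, and 2, so MAX's value at the root is 3; alpha-beta pruning would reach the same value while skipping some leaves (here, the 14 and 5 need not be examined once a 2 is seen).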
Poisson distribution, derivation, classic example - the chance of a Prussian cavalryman being killed by the kick of a horse, common probability distributions - the data scientist's crib sheet (Sean Owen), Guinness beer and Student's t, the probable error of a mean, William Gosset, Weibull distribution, pdf, log-normal distribution, data transformations, gamma function and gamma distribution, Gamma(1, lambda) = Exponential(lambda), normal distribution, linear transformation of a normal random variable, exponential distribution -- time between events in a Poisson point process, central limit theorem, 3-parameter Weibull distribution - significance, identifying the distribution of data using Minitab.
Feedforward networks, logistic function, universal approximation theorem, gradient descent, representation theorem, no-free-lunch theorem, generalization error, overfitting, bias-variance tradeoff, cross-validation, Rprop - resilient backpropagation, an example of backpropagation for a specific feedforward network, momentum, softmax function, rectifier in neural nets, beginner's guide to activation functions, PReLU, RReLU, ELU, Scaled Exponential Linear Unit (SELU), SReLU, SoftPlus, Bent Identity, when does deep learning work better than SVMs?, problems with the sigmoid activation function, ReLU converges about six times faster than sigmoid, repeated matrix multiplications interwoven with activation functions, setting the number of layers and their sizes when designing NNs, article on 'why deep learning is suddenly changing your life', the cat experiment by Google, self-taught learning, the human brain has over 100 trillion connections.
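The activation functions listed above are one-liners; a small NumPy sketch makes the definitions concrete, including the max-subtraction trick that keeps softmax numerically stable:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes to (0, 1); saturates for |x| large,
    which is the vanishing-gradient problem mentioned in the notes."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Rectifier: identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def softmax(z):
    """Stable softmax: subtracting the max changes nothing mathematically
    but prevents overflow in exp for large logits."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)   # a probability vector over the three classes
```

Softmax generalises the logistic function to many classes: for two logits (z, 0) it reduces exactly to sigmoid(z).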
The loneliest neuron -- why every neuron doesn't know about everything; dopamine neurons, open-source tools for the data science workflow, TensorFlow intro, dataflow programming, installing TF, Anaconda, what is a tensor?, rank of a tensor, shape of a tensor, data types in TF, evaluating tensors in TF with tensor.eval(), printing tensors, tf.contrib.layers.flatten, tf.contrib.layers.fully_connected, tf.reduce_mean, meaning of the word 'logits' in TensorFlow, sparse_cross_entropy_with_logits, logit value, sparse_softmax_cross_entropy_with_logits, one-hot encoding, softmax vs. sigmoid function in a logistic classifier, multiclass vs. multilabel problems, cross-entropy loss for two-class neural networks with a single output, tf.argmax, the dimension argument in tf functions, tf.global_variables_initializer, tf.Session, the random.sample() method in Python, zipping lists in Python, numpy.concatenate, why normalize images before feeding them to a CNN, building a simple image recognition system with TensorFlow, the CIFAR-10 dataset, tf.Variable, making a fully connected layer with weight-matrix multiplication followed by bias addition, testing the model's accuracy on the training batch, overfitting, TensorFlow graph, Belgian traffic sign detection using a simple TensorFlow model (just a fully connected layer) -- 62 categories/classes of traffic signs, using the scikit-image package, visualizing input images using a histogram, matplotlib.figure, rescaling images, converting to grayscale, rgb2gray(), tf.contrib.layers.fully_connected, tf.train.AdamOptimizer.
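One-hot encoding and the softmax cross-entropy loss discussed above can be spelled out in plain NumPy so the numerics are explicit (this is an illustrative re-derivation of what a sparse softmax cross-entropy takes and returns, not TensorFlow's implementation; the logits used are made up):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i gets a 1 in column labels[i] -- the encoding that 'sparse'
    cross-entropy variants apply implicitly to integer class labels."""
    return np.eye(num_classes)[labels]

def softmax_cross_entropy(logits, labels):
    """Per-example cross-entropy between softmax(logits) and one-hot labels,
    computed in log space (shift by the row max for numerical stability)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot(labels, logits.shape[1]) * log_probs).sum(axis=1)

logits = np.array([[2.0, 0.5, 0.1]])            # raw, unnormalised scores
loss = softmax_cross_entropy(logits, np.array([0]))
```

This also shows why the 'sparse' variants exist: with integer labels the one-hot product just picks out one log-probability per row, so no dense label matrix is ever needed.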