The Clever Machine

Posts

p-Hacking 101: Data Peeking Dec 13, 2023 Dustin Stansbury
statistics hypothesis-testing ab-testing false-positive type-I-error p-hacking
“Data peeking” is the process of prematurely running statistical tests on your AB experiment data before data collection has reached the required sample size prescribed by power analysis. You may have heard of the dangers of data peeking, but may not have an intuition as to how dramatically it can inflate your False Positive rate, and thus mislead statistical inferences. In this post we’ll use simulations to demonstrate just how much data peeking can inflate false positives.
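To give a flavor of the kind of simulation the post describes, here is a minimal sketch (not the post's own code). It assumes a two-sided z-test with known unit variance, a true effect of zero, and a peek every 50 observations per arm; the names `n_sims`, `n_max`, and `peek_every` are illustrative choices, not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_max, peek_every = 2000, 500, 50  # illustrative settings
z_crit = 1.96                              # two-sided threshold for alpha = 0.05

# Both arms are drawn from the same distribution, so every "significant"
# result is a false positive.
a = rng.normal(size=(n_sims, n_max))
b = rng.normal(size=(n_sims, n_max))

# Running z statistic for the difference in means after n observations per arm:
# (mean_a - mean_b) / sqrt(2 / n) == (sum_a - sum_b) / sqrt(2 * n)
n = np.arange(1, n_max + 1)
z = (np.cumsum(a, axis=1) - np.cumsum(b, axis=1)) / np.sqrt(2 * n)

# Peeking: declare a win if ANY interim look crosses the threshold.
peeks = z[:, peek_every - 1::peek_every]
fpr_peeking = np.mean(np.any(np.abs(peeks) > z_crit, axis=1))

# No peeking: test once, at the planned sample size.
fpr_final = np.mean(np.abs(z[:, -1]) > z_crit)

print(f"false positive rate, single final test: {fpr_final:.3f}")
print(f"false positive rate, with peeking:      {fpr_peeking:.3f}")
```

Because the final look is one of the peeks, the peeking rate can never fall below the single-test rate in this setup; the simulation shows how far above it the rate climbs.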
p-Hacking 101: N Chasing Oct 4, 2020 Dustin Stansbury
statistics hypothesis-testing ab-testing false-positive type-I-error p-hacking
“\(N\) Chasing,” or adding new observations to an already-analyzed experiment, can increase your experiment’s false positive rate. As an experimenter or analyst, you may have heard of the dangers of \(N\) chasing, but may not have an intuition as to why or how it increases Type I Error. In this post we’ll demonstrate \(N\) chasing using some simulations, and show that, under certain settings, adding just a single data point to your experiment can dramatically increase false positives.
Who Needs Backpropagation? Computing Word Embeddings with Linear Algebra Sep 11, 2020 Dustin Stansbury
natural-language-processing word-embeddings information-theory pointwise-mutual-information linear-algebra singular-value-decomposition
Word embeddings provide numerical representations of words that carry useful semantic information about natural language. This has made word embeddings an integral part of modern Natural Language Processing (NLP) pipelines and language understanding models. Common methods used to compute word embeddings, like word2vec, employ predictive, neural network frameworks. However, as we’ll show in this post, we can also compute word embeddings using some basic frequency statistics, a little information theory, and our good old friend from linear algebra, Singular Value Decomposition.
SVD and Data Compression Using Low-rank Matrix Approximation Aug 16, 2020 Dustin Stansbury
linear-algebra singular-value-decomposition low-rank-approximation data-compression image-compression
In a previous post we introduced the Singular Value Decomposition (SVD) and its many advantages and applications. In this post, we’ll discuss one of my favorite applications of SVD: data compression using low-rank matrix approximation (LRA). We’ll start off with a quick introduction to LRA and how it relates to data compression. Then we’ll demonstrate how SVD provides a convenient and intuitive method for image compression using LRA.
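As a rough illustration of the idea (a sketch, not the post's code), a rank-\(k\) approximation can be built by keeping only the \(k\) largest singular values and their singular vectors. The function name `low_rank_approx` and the random test matrix are assumptions made for this example.

```python
import numpy as np

def low_rank_approx(M, k):
    """Return the rank-k approximation of M built from its truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
M = rng.normal(size=(40, 30))   # stand-in for an image matrix
k = 5
M_k = low_rank_approx(M, k)

# Storing the truncated factors takes k * (rows + cols + 1) numbers
# instead of rows * cols for the full matrix.
full_size = M.size
compressed_size = k * (M.shape[0] + M.shape[1] + 1)
print(full_size, compressed_size)
```

The Frobenius-norm error of this approximation equals the root sum of squares of the discarded singular values, so increasing `k` trades storage for accuracy.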
Singular Value Decomposition: The Swiss Army Knife of Linear Algebra Aug 10, 2020 Dustin Stansbury
linear-algebra matrix-diagonalization singular-value-decomposition
Linear algebra provides a number of powerful computational methods that are used throughout the sciences. However, I would say that hands-down the most versatile of these methods is singular value decomposition, or SVD. In this post we’ll dive into a little theory behind matrix diagonalization and show how SVD generalizes matrix diagonalization. Then we’ll go into a few of the properties of SVD and cover a few (of many!) cool and useful applications of SVD in the real world. In addition, each application will have its own dedicated post.
Efficient Matrix Power Calculation via Diagonalization Aug 8, 2020 Dustin Stansbury
linear-algebra matrix-diagonalization
Taking the power of a matrix is an important operation with applications in statistics, machine learning, and engineering. For example, solving linear ordinary differential equations, identifying the state of a Markov chain at time \(t\), or identifying the number of paths between nodes in a graph can all be solved using powers of matrices. In this quick post we’ll show how Matrix Diagonalization can be used to efficiently compute the power of a matrix.
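For a quick taste, here is a minimal sketch of the idea, assuming the matrix is diagonalizable: if \(A = V \Lambda V^{-1}\), then \(A^k = V \Lambda^k V^{-1}\), and raising the diagonal matrix \(\Lambda\) to a power only requires raising each eigenvalue to that power. The function name `matrix_power_eig` is made up for this example.

```python
import numpy as np

def matrix_power_eig(A, k):
    """Compute A**k via eigendecomposition (assumes A is diagonalizable)."""
    w, V = np.linalg.eig(A)
    # V * w**k scales each eigenvector column by its eigenvalue to the k-th
    # power, which is the same as V @ diag(w**k).
    return (V * w**k) @ np.linalg.inv(V)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
P = matrix_power_eig(A, 5)

# Compare against repeated multiplication.
print(np.allclose(P, np.linalg.matrix_power(A, 5)))
```

For matrices that are defective or have a badly conditioned eigenvector matrix, this shortcut can be inaccurate, so `np.linalg.matrix_power` remains the safer default.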
Common Linear Algebra Identities Aug 5, 2020 Dustin Stansbury
derivation linear-algebra matrix-identities
This post provides a convenient reference of Linear Algebra identities used in The Clever Machine Blog.
Derivation: Ordinary Least Squares Solution and the Normal Equations Jul 23, 2020 Dustin Stansbury
ordinary-least-squares derivation normal-equations
Have you ever performed linear regression involving multiple predictor variables and run into this expression \(\hat \beta = (X^TX)^{-1}X^Ty\)? It’s called the OLS solution via Normal Equations. To find out where it comes from, read on!
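As a small illustration (a sketch on synthetic data, not the post's code), the expression can be evaluated directly with NumPy. The variable names and the simulated data set are assumptions made for this example; solving the linear system \(X^TX \hat\beta = X^Ty\) avoids forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic design matrix: an intercept column plus two predictors.
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=50)

# Normal Equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_hat, beta_lstsq))
```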
Model Selection: Underfitting, Overfitting, and the Bias-Variance Tradeoff Jul 20, 2020 Dustin Stansbury
statistics classification regression bias-variance-tradeoff model-selection
In machine learning and pattern recognition, there are many ways (an infinite number, really) of solving any one problem. Thus it is important to have an objective criterion for assessing the accuracy of candidate approaches and for selecting the right model for a data set at hand. In this post we’ll discuss the concepts of under- and overfitting and how these phenomena are related to the statistical quantities bias and variance. Finally, we will discuss how these concepts can be applied to select a model that will accurately generalize to novel scenarios/data sets.
Supplemental Proof: The Expected Value of a Squared Random Variable Jul 19, 2020 Dustin Stansbury
statistics derivation expected-value
We want to show the following relationship:
A Gentle Introduction to Artificial Neural Networks Jul 13, 2020 Dustin Stansbury
neural-networks gradient-descent backpropagation classification regression deep-learning
Though many phenomena in the world can be well-modeled using basic linear regression or classification, there are also many interesting phenomena that are nonlinear in nature. To deal with nonlinear phenomena, a diversity of nonlinear models has been developed.
Cutting Your Losses: Loss Functions & the Sum of Squared Errors Loss Jun 30, 2020 Dustin Stansbury
statistics least-squares-regression loss-functions parameter-optimization r-squared
In this post we’ll introduce the notion of the loss function and its role in model parameter estimation. We’ll then focus in on a common loss function–the sum of squared errors (SSE) loss–and give some motivations and intuitions as to why this particular loss function works so well in practice.
Derivation: Derivatives for Common Neural Network Activation Functions Jun 29, 2020 Dustin Stansbury
neural-networks gradient-descent derivation
When constructing Artificial Neural Network (ANN) models, one of the primary considerations is choosing activation functions for hidden and output layers that are differentiable. This is because calculating the backpropagated error signal that is used to determine ANN parameter updates requires the gradient of the activation function. Three of the most commonly used activation functions in ANNs are the identity function, the logistic sigmoid function, and the hyperbolic tangent function. Examples of these functions and their associated gradients (derivatives in 1D) are plotted in Figure 1.
Derivation: Error Backpropagation & Gradient Descent for Neural Networks Jun 29, 2020 Dustin Stansbury
neural-networks gradient-descent derivation
Artificial neural networks (ANNs) are a powerful class of models used for nonlinear regression and classification tasks that are motivated by biological neural computation. The general idea behind ANNs is pretty straightforward: map some input onto a desired target value using a distributed cascade of nonlinear transformations (see Figure 1). However, for many, myself included, the learning algorithm used to train ANNs can be difficult to get your head around at first. In this post I give a step-by-step walkthrough of the derivation of the gradient descent algorithm commonly used to train ANNs–aka the “backpropagation” algorithm. Along the way, I’ll also try to provide some high-level insights into the computations being performed during learning¹.
1. Though, I guess these days with autograd, who really needs to understand how the calculus for gradient descent works, amiright? (hint: that is a joke) ↩
