Getting Started with Linear Regression in Matlab (From Scratch)
In this article, I illustrate how to use Matlab to implement a simple Linear Regression algorithm using a single variable—from scratch, without relying on built-in machine learning toolboxes.
Initialization
Let’s start with proper script initialization:
% Linear Regression with One Variable
% Clear environment
clear;
clc;
close all;
% Add library path for our custom functions
addpath('lib');
It’s always good practice to clear the workspace of any leftover variables, clear the command window, and close open figures before starting. The lib folder will contain our custom functions.
What is Linear Regression?
From Wikipedia:
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables).
Simply put, linear regression is used to model the relationship between two continuous variables. Often, the objective is to predict the value of an output variable (response) based on the value of an input (predictor) variable.
The Hypothesis Function
The idea is to fit a linear function to a given dataset. This function is called the Hypothesis:
$$h_\theta(x) = \theta_0 + \theta_1 x$$
Where:
- $\theta_0$ is the y-intercept (bias)
- $\theta_1$ is the slope
- $x$ is the input variable
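For concreteness, here is a minimal sketch of how the hypothesis is evaluated in Matlab (the values of $\theta$ and $x$ below are arbitrary, chosen only for illustration):
% Evaluate the hypothesis for a single input x (illustrative values)
theta = [0.5; 1.2];          % [theta_0; theta_1]
x = 3;
h = theta(1) + theta(2) * x; % h_theta(x) = theta_0 + theta_1 * x, here 4.1
% Equivalently, prepending a 1 for the bias term (the form used later):
h_vec = [1, x] * theta;      % same result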
The Cost Function
The cost function measures the average squared error over the $m$ samples in the data:
$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2$$
Our goal is to minimize the cost function, i.e. to find the parameters that give the best possible approximation to the $m$ samples in the dataset.
Implementing the Cost Function
Let’s implement the cost function and save it as lib/compute_cost.m:
function J = compute_cost(X, y, theta)
% COMPUTE_COST Compute cost for linear regression
% J = COMPUTE_COST(X, y, theta) computes the cost of using theta as the
% parameter for linear regression to fit the data points in X and y
m = length(y); % number of training examples
predictions = X * theta;
sqrErrors = (predictions - y).^2;
J = 1/(2*m) * sum(sqrErrors);
end
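As a quick sanity check (with made-up data, not the dataset used below), points that lie exactly on a line should produce a cost of zero:
% Sanity check: data generated exactly from y = 1 + 2x should give J = 0
X_test = [ones(3, 1), (1:3)'];  % design matrix with a column of ones
y_test = 1 + 2 * (1:3)';        % [3; 5; 7]
J_test = compute_cost(X_test, y_test, [1; 2])  % displays 0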
Visualizing the Data
Let’s create a basic dataset and visualize it:
% Sample dataset
X = [1; 2; 3; 4; 5];
y = [1; 2; 2.5; 4; 5];
% Plot data
figure;
plot(X, y, 'rx', 'MarkerSize', 10);
xlabel('x');
ylabel('y');
title('Training Data');
Testing Different Hypotheses
Let’s try a simple hypothesis: $f(x) = \frac{1}{2}x + \frac{1}{2}$ (i.e., $\theta_0 = 0.5$, $\theta_1 = 0.5$):
% Add column of ones for theta_0
X_with_ones = [ones(length(X), 1), X];
theta = [0.5; 0.5]; % theta_0 = 0.5, theta_1 = 0.5
cost = compute_cost(X_with_ones, y, theta);
% cost ≈ 0.6750
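We can verify this value by hand. The predictions are $[1, 1.5, 2, 2.5, 3]$, so:
$$J = \frac{1}{2 \cdot 5}\left[(1-1)^2 + (1.5-2)^2 + (2-2.5)^2 + (2.5-4)^2 + (3-5)^2\right] = \frac{6.75}{10} = 0.675$$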
This line roughly follows the trend of the data, but the error is still significant. Let’s try another hypothesis: $f(x) = x$ (i.e., $\theta_0 = 0$, $\theta_1 = 1$):
theta = [0; 1]; % theta_0 = 0, theta_1 = 1
cost = compute_cost(X_with_ones, y, theta);
% cost ≈ 0.0250
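Again, by hand: the predictions are now $[1, 2, 3, 4, 5]$, so only the third point contributes to the error:
$$J = \frac{1}{2 \cdot 5}\left[0^2 + 0^2 + (3-2.5)^2 + 0^2 + 0^2\right] = \frac{0.25}{10} = 0.025$$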
The cost is lower! This is exactly what minimization is about. But manually adjusting parameters is impractical—enter Gradient Descent.
Gradient Descent
From Wikipedia:
Gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the opposite direction of the gradient of the function at the current point.
The Update Rule
For each iteration, we update the parameters:
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$$
Where $\alpha$ is the learning rate.
The partial derivatives work out to:
$$\frac{\partial}{\partial \theta_0} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})$$
$$\frac{\partial}{\partial \theta_1} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)}$$
Implementing the Derivatives
Save this as lib/compute_derivatives.m:
function [d_theta0, d_theta1] = compute_derivatives(X, y, theta)
% COMPUTE_DERIVATIVES Compute partial derivatives for gradient descent
m = length(y);
predictions = X * theta;
errors = predictions - y;
d_theta0 = (1/m) * sum(errors);
d_theta1 = (1/m) * sum(errors .* X(:, 2));
end
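As an aside, and as a first step toward the vectorization mentioned in the conclusion, both partial derivatives can be computed with a single matrix expression. Here is a minimal sketch of such an alternative (the function name compute_gradient is mine, not part of the article’s lib folder):
function grad = compute_gradient(X, y, theta)
% COMPUTE_GRADIENT Vectorized gradient of the cost (illustrative alternative)
% Returns a 2x1 vector whose entries equal d_theta0 and d_theta1 above
m = length(y);
grad = (1/m) * X' * (X * theta - y);
end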
The Gradient Descent Function
Save this as lib/gradient_descent.m:
function [theta, J_history] = gradient_descent(X, y, theta, alpha, num_iters)
% GRADIENT_DESCENT Performs gradient descent to learn theta
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
[d_theta0, d_theta1] = compute_derivatives(X, y, theta);
theta(1) = theta(1) - alpha * d_theta0;
theta(2) = theta(2) - alpha * d_theta1;
J_history(iter) = compute_cost(X, y, theta);
end
end
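To build intuition about what a single iteration does, we can trace the first update by hand for the dataset above, starting from $\theta = [0; 0]$ and using the learning rate $\alpha = 0.01$ that the full script below uses. All predictions are initially zero, so the errors are simply $-y$:
$$\frac{\partial J}{\partial \theta_0} = \frac{1}{5}(-1 - 2 - 2.5 - 4 - 5) = -2.9, \qquad \frac{\partial J}{\partial \theta_1} = \frac{1}{5}(-1\cdot1 - 2\cdot2 - 2.5\cdot3 - 4\cdot4 - 5\cdot5) = -10.7$$
$$\theta_0 \leftarrow 0 - 0.01\cdot(-2.9) = 0.029, \qquad \theta_1 \leftarrow 0 - 0.01\cdot(-10.7) = 0.107$$
Repeating this many times drives the cost down toward its minimum.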
Putting It All Together
% Clear environment
clear; clc; close all;
addpath('lib');
% Dataset
X = [1; 2; 3; 4; 5];
y = [1; 2; 2.5; 4; 5];
% Add ones column
X_with_ones = [ones(length(X), 1), X];
% Initialize parameters
theta = [0; 0];
alpha = 0.01; % Learning rate
iterations = 1000;
% Run gradient descent
[theta, J_history] = gradient_descent(X_with_ones, y, theta, alpha, iterations);
% Display results
fprintf('Theta found: %f, %f\n', theta(1), theta(2));
fprintf('Final cost: %f\n', J_history(end));
% Plot results
figure;
subplot(1, 2, 1);
plot(X, y, 'rx', 'MarkerSize', 10);
hold on;
plot(X, X_with_ones * theta, 'b-', 'LineWidth', 2);
xlabel('x'); ylabel('y');
title('Linear Regression Fit');
legend('Training data', 'Linear regression');
subplot(1, 2, 2);
plot(1:iterations, J_history, 'b-', 'LineWidth', 2);
xlabel('Iterations'); ylabel('Cost J');
title('Cost Function Convergence');
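As an optional sanity check (not part of the script above), the result of gradient descent can be compared against the closed-form least-squares solution, which Matlab's backslash operator computes directly:
% Closed-form least-squares solution, for comparison with gradient descent
theta_exact = X_with_ones \ y;
fprintf('Closed-form theta: %f, %f\n', theta_exact(1), theta_exact(2));
% For this dataset, theta_exact is approximately [-0.1; 1]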
Results
Gradient descent steadily lowers the cost at each iteration and drives $\theta_0$ and $\theta_1$ toward the values that minimize it, producing a line that closely fits the training data; running more iterations (or tuning the learning rate) brings the parameters even closer to the optimum.
Conclusion
I hope this was helpful for anyone reading out of curiosity. There’s no doubt that these implementations can and should be further optimized—vectorization, adaptive learning rates, and regularization are natural next steps.
The beauty of implementing machine learning algorithms from scratch is understanding what’s happening “under the hood” before relying on high-level libraries.
Achraf SOLTANI — April 18, 2022
