Locally Weighted Linear Regression

Abhimanyu Saxena

Data Science

Regression techniques are widely used in data science to model relationships between variables. Traditional regression methods, such as linear regression, assume a global relationship across the entire dataset. While effective for linear patterns, they often fail to capture complex, non-linear relationships.

This is where non-parametric methods like Locally Weighted Linear Regression (LWLR) come into play. Unlike global models, LWLR adapts to local variations in the data, making it highly flexible and accurate for datasets with varying patterns. In this article, we’ll explore the fundamentals of LWLR, how it works, and when to use it.

What is Locally Weighted Linear Regression?

Locally Weighted Linear Regression (LWLR) is a non-parametric, memory-based algorithm designed to capture non-linear relationships in data. Unlike traditional regression models that fit a single global line across the dataset, LWLR creates localized models for subsets of data points near the query point. Each query point has its own regression line based on weighted contributions from nearby data points.

How Does It Work?

LWLR assigns weights to data points based on their proximity to the query point:

  • Points closer to the query point have higher weights.
  • Points farther away have lower weights.

This approach allows LWLR to adapt to local data structures, making it effective for modeling non-linear relationships.

Comparison with Global Linear Regression

  • Global Linear Regression: Fits a single line to the entire dataset, assuming a uniform relationship across all data points.
  • Locally Weighted Linear Regression: Fits multiple localized lines, adapting to variations in different parts of the dataset.

For instance, in predicting housing prices, LWLR can handle neighborhoods with distinct pricing trends better than global linear regression, which might oversimplify the relationship.

Example of Locally Weighted Linear Regression

Illustrative Scenario

Let’s consider a scenario where we want to predict housing prices based on the size of the house. In a dataset, the relationship between size and price might vary across different neighborhoods due to local factors like amenities or location.

  • Global Linear Regression: Fits a single line to the entire dataset, assuming a uniform relationship between size and price across all neighborhoods. This may lead to inaccurate predictions in areas where the relationship deviates.
  • Locally Weighted Linear Regression: Focuses on the specific neighborhood by giving more weight to houses closer in size to the query house. This results in a better prediction tailored to the local trends.

Visualization

Imagine plotting housing prices against house sizes:

  • A global regression line might cut straight through the dataset.
  • LWLR would fit smaller localized lines that closely follow the variations in data for each neighborhood.

This adaptability allows LWLR to model more complex relationships, such as sharp changes in housing prices in specific regions.

Steps Involved in Locally Weighted Linear Regression

The process of Locally Weighted Linear Regression involves several key steps, ensuring the model captures local patterns effectively.

1. Data Collection and Preparation

  • Gather a dataset with relevant features and a target variable.
  • Preprocess the data by handling missing values and normalizing features to ensure a consistent scale, which improves the weighting process.

2. Choose the Kernel and Bandwidth (Tau)

  • Kernel Function: Determines how weights are assigned to data points based on their distance from the query point. Common choices include:
    • Gaussian Kernel: Assigns weights using the formula:

$$w_i = e^{-\frac{(x – x_i)^2}{2\tau^2}}$$

  • Here, $w_i$​ is the weight for data point $x_i$​, and τ\tauτ (bandwidth) controls the rate of weight decay.
  • Bandwidth (Tau): A critical parameter that governs how localized the regression is:
    • Small $\tau$: Focuses on nearby points, capturing finer details but risks overfitting.
    • Large $\tau$: Includes more distant points, reducing variance but increasing bias.

3. Weight Calculation

For a given query point $x$, compute weights for all data points using the chosen kernel function. Points closer to $x$ will have higher weights.

4. Model Fitting

Using the computed weights, fit a weighted least squares regression to the data. The goal is to minimize the weighted sum of squared errors:

$$\text{Minimize } \sum_{i} w_i \left(y_i – (\theta_0 + \theta_1 x_i)\right)^2$$

Here:

  • $w_i$​: Weight for data point iii.
  • $y_i$​: Actual value for data point iii.
  • $\theta_0, \theta_1$: Regression coefficients.

5. Prediction

Once the localized model is fitted, use it to predict the target value for the query point.

Locally Weighted Linear Regression Using Python

Implementing Locally Weighted Linear Regression (LWLR) in Python provides a practical understanding of how the algorithm works. Below is a step-by-step guide with code examples.

1. Import Necessary Libraries

import numpy as np
import matplotlib.pyplot as plt

2. Define the Kernel Function

The kernel function computes weights for data points based on their distance from the query point.

def kernel(point, xmat, tau):
    m = xmat.shape[0]
    weights = np.eye(m)
    for i in range(m):
        diff = point - xmat[i]
        weights[i, i] = np.exp(-(diff @ diff.T) / (2 * tau ** 2))
    return weights

3. Fit the LWLR Model

This function calculates the parameters of the locally weighted linear model.

def lwlr(point, xmat, ymat, tau):
    xmat = np.mat(xmat)
    ymat = np.mat(ymat).T
    weights = kernel(point, xmat, tau)
    theta = (xmat.T * weights * xmat).I * (xmat.T * weights * ymat)
    return point * theta

4. Predict Using LWLR

Create a function to generate predictions for all query points.

def predict(xmat, ymat, query_points, tau):
    predictions = []
    for point in query_points:
        prediction = lwlr(point, xmat, ymat, tau)
        predictions.append(prediction)
    return np.array(predictions)

5. Example Application

Generate a dataset and visualize LWLR predictions.

# Create sample data
x = np.linspace(0, 10, 100)
y = 2 * x + np.random.normal(0, 1, 100)
xmat = np.c_[np.ones(x.shape[0]), x]
ymat = y

# Query points
query_points = np.c_[np.ones(100), x]

# Apply LWLR
tau = 0.5  # Bandwidth parameter
predictions = predict(xmat, ymat, query_points, tau)

# Plot results
plt.scatter(x, y, color='blue', label='Data')
plt.plot(x, predictions, color='red', label='LWLR Predictions')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()

Output

The plot shows how LWLR adapts to the local structure of the data, providing predictions tailored to specific regions.

When Should We Use Locally Weighted Linear Regression?

Locally Weighted Linear Regression (LWLR) is not always the best fit for every scenario. It is most effective in specific situations where its unique properties shine. Here’s when you should consider using LWLR:

1. Non-Linear Data Patterns

  • LWLR excels in datasets with non-linear relationships between variables.
  • Example: Predicting housing prices where local factors like neighborhoods heavily influence prices.

2. Local Variations Across the Dataset

  • If the relationship between variables varies significantly across subsets of the dataset, LWLR captures these localized patterns better than global models.
  • Example: Sales data where customer behavior differs across regions.

3. Small to Medium Datasets

  • Since LWLR requires fitting a model for each query point, it can become computationally expensive for large datasets.
  • Use it effectively on small to medium-sized datasets where computation time is manageable.

4. When Interpretability is Key

  • LWLR provides insight into how local data structures influence predictions, making it ideal for situations where understanding these relationships is important.

Comparison with Other Methods

  • LWLR vs Global Linear Regression: LWLR handles non-linear and locally varying data better.
  • LWLR vs k-Nearest Neighbors (k-NN): Both are non-parametric, but LWLR provides a continuous prediction by fitting a local regression line, while k-NN uses discrete averages or votes.

When Not to Use LWLR

  • For large datasets, due to computational overhead.
  • When a global model sufficiently captures the relationship, as simpler models are easier to deploy and maintain.

Effect of Tau/Bandwidth Parameter

The bandwidth parameter (τ\tauτ) in Locally Weighted Linear Regression (LWLR) plays a critical role in determining how much influence nearby data points have on the prediction. It directly impacts the weights assigned to data points and controls the balance between bias and variance.

Role of Tau

  • Small Tau ($\tau$):
    • Focuses on data points very close to the query point.
    • Produces a model that is highly sensitive to local variations.
    • Effect: Captures finer details but may lead to overfitting, where the model is too specific to the local data.
  • Large Tau ($\tau$):
    • Includes more distant data points in the model.
    • Reduces sensitivity to local variations, leading to a smoother curve.
    • Effect: Reduces overfitting but risks underfitting, as local nuances may be ignored.

Bias-Variance Trade-off

  • Small Tau: Low bias but high variance, as predictions vary significantly with changes in the data.
  • Large Tau: High bias but low variance, as the model generalizes more and misses local patterns.

Parameter Selection

Choosing the optimal value for τ\tauτ is essential for achieving the right balance between bias and variance. Common methods include:

  • Cross-Validation: Test different values of τ\tauτ and select the one that minimizes prediction error.
  • Grid Search: Systematically search through a range of τ\tauτ values to find the best fit.

Impact Analysis with Examples

  • Small Tau $(\tau = 0.1)$: The regression curve closely follows the data, capturing all local variations but may overfit noise.
  • Large Tau $(\tau = 2.0)$: The curve smooths out, generalizing the relationship but potentially missing important local trends.

Advantages and Disadvantages of Locally Weighted Linear Regression

Advantages

  1. Models Non-Linear Relationships
    • LWLR captures complex, non-linear patterns in data that global linear regression often misses.
    • Example: Predicting sales where seasonal trends create non-linear variations.
  2. Localized Adaptability
    • Fits models to subsets of data near the query point, making it highly flexible in handling datasets with varying relationships across different regions.
  3. No Fixed Global Formula
    • Unlike parametric models, LWLR does not assume a global form, making it robust for datasets where no clear global pattern exists.
  4. Intuitive Understanding of Local Effects
    • By focusing on local data points, LWLR provides insights into how individual data regions contribute to predictions.

Disadvantages

  1. Computationally Intensive
    • Requires fitting a model for each query point, leading to high computational costs, especially with large datasets.
    • Example: Running LWLR on a dataset with millions of data points can be time-consuming.
  2. Sensitivity to Bandwidth (τ\tauτ)
    • The performance heavily depends on the choice of the bandwidth parameter. Poor selection can result in either overfitting or underfitting.
  3. Requires Normalized Data
    • Data normalization is critical to ensure accurate weight calculation. Without it, features with larger scales may dominate the model.
  4. Not Scalable for Large Datasets
    • Due to its memory-based nature, LWLR struggles with scalability when applied to big data.

Conclusion

Locally Weighted Linear Regression (LWLR) is a powerful non-parametric regression technique that excels in capturing complex, non-linear relationships in data. By assigning weights to data points based on their proximity to the query point, LWLR adapts to local variations, making it a flexible and effective method for modeling datasets with varying patterns.

While LWLR offers significant advantages, such as localized adaptability and the ability to model non-linear relationships, it also has its limitations. Its computational intensity and sensitivity to the bandwidth parameter ($\tau$) make it less suited for large datasets or situations where global models suffice.

In summary, LWLR is an excellent choice for tasks requiring fine-grained, localized predictions, especially in smaller datasets with intricate patterns. Understanding its principles and limitations ensures that it is applied effectively, unlocking its full potential in data science applications.