Let me know if you find this valuable and I'll keep going!
One of the simplest yet powerful concepts is the linear model.
In ML, one of our primary goals is to make predictions based on data. The linear model is like the "Hello World" of machine learning - it's straightforward but forms the foundation for understanding more complex models.
Let's build a model to predict home prices. In this example, the output is the expected "home price", and your inputs will be things like "sqft", "num_bedrooms", etc...
def prediction(sqft, num_bedrooms, num_baths): weight_1, weight_2, weight_3 = .0, .0, .0 home_price = weight_1*sqft, weight_2*num_bedrooms, weight_3*num_baths return home_price
You'll notice a "weight" for each input. These weights are what create the magic behind the prediction. This example is boring as it will always output zero since the weights are zero.
So let's discover how we can find these weights.
The process for finding the weights is called "training" the model.
data = [ {"sqft": 1000, "bedrooms": 2, "baths": 1, "price": 200000}, {"sqft": 1500, "bedrooms": 3, "baths": 2, "price": 300000}, # ... more data points ... ]
home_price = prediction(1000, 2, 1) # our weights are currently zero, so this is zero actual_value = 200000 error = home_price - actual_value # 0 - 200000 we are way off. # let's square this value so we aren't dealing with negatives error = home_price**2
Now that we have a way to know how off (error) we are for one data point, we can calculate the average error across all of the data points. This is commonly referred to as the mean squared error.
We could, of course, choose random numbers and keep saving the best value as we go along- but that's inefficient. So let's explore a different method: gradient descent.
Gradient descent is an optimization algorithm used to find the best weights for our model.
The gradient is a vector that tells us how the error changes as we make small changes to each weight.
Sidebar intuition
Imagine standing on a hilly landscape, and your goal is to reach the lowest point (the minimum error). The gradient is like a compass that always points to the steepest ascent. By going against the direction of the gradient, we're taking steps towards the lowest point.
Here's how it works:
How do we calculate the gradient for each error?
One way to calculate the gradient is to make small shifts in the weight, see how that impacted our error, and see where we should move from there.
def calculate_gradient(weight, data, feature_index, step_size=1e-5): original_error = calculate_mean_squared_error(weight, data) # Slightly increase the weight weight[feature_index] += step_size new_error = calculate_mean_squared_error(weight, data) # Calculate the slope gradient = (new_error - original_error) / step_size # Reset the weight weight[feature_index] -= step_size return gradient
Step-by-Step Breakdown
Input Parameters:
Calculate Original Error:
original_error = calculate_mean_squared_error(weight, data)
We first calculate the mean squared error with our current weights. This gives us our starting point.
weight[feature_index] += step_size
We increase the weight by a tiny amount (step_size). This allows us to see how a small change in the weight affects our error.
new_error = calculate_mean_squared_error(weight, data)
We calculate the mean squared error again with the slightly increased weight.
gradient = (new_error - original_error) / step_size
This is the key step. We're asking: "How much did the error change when we slightly increased the weight?"
The magnitude tells us how sensitive the error is to changes in this weight.
weight[feature_index] -= step_size
We put the weight back to its original value since we were testing what would happen if we changed it.
return gradient
We return the calculated gradient for this weight.
This is called "numerical gradient calculation" or "finite difference method". We're approximating the gradient instead of calculating it analytically.
Now that we have our gradients, we can push our weights in the opposite direction of the gradient by subtracting the gradient.
weights[i] -= gradients[i]
If our gradient is too large, we could easily overshoot our minimum by updating our weight too much. To fix this, we can multiply the gradient by some small number:
learning_rate = 0.00001 weights[i] -= learning_rate*gradients[i]
And so here is how we do it for all of the weights:
def gradient_descent(data, learning_rate=0.00001, num_iterations=1000): weights = [0, 0, 0] # Start with zero weights for _ in range(num_iterations): gradients = [ calculate_gradient(weights, data, 0), # sqft calculate_gradient(weights, data, 1), # bedrooms calculate_gradient(weights, data, 2) # bathrooms ] # Update each weight for i in range(3): weights[i] -= learning_rate * gradients[i] if _ % 100 == 0: error = calculate_mean_squared_error(weights, data) print(f"Iteration {_}, Error: {error}, Weights: {weights}") return weights
Finally, we have our weights!
Once we have our trained weights, we can use them to interpret our model:
For example, if our trained weights are [100, 10000, 15000], it means:
Linear models, despite their simplicity, are powerful tools in machine learning. They provide a foundation for understanding more complex algorithms and offer interpretable insights into real-world problems.
The above is the detailed content of Machine Learning for Software Engineers. For more information, please follow other related articles on the PHP Chinese website!