Regression Analysis in Python
Regression analysis is one of the most widely used statistical techniques in data science and machine learning. It helps us understand relationships between variables and make predictions. In this post, we’ll walk through the basics of regression analysis using Python and its popular libraries.
🔍 What is Regression Analysis?
Regression analysis is a predictive modeling technique used to examine the relationship between a dependent (target) variable and one or more independent (predictor) variables. The most common type is linear regression, which models the target as a weighted sum of the predictors plus a random error term.
For example, predicting house prices based on area, location, and number of bedrooms is a regression problem.
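In symbols, linear regression with p predictors models the target y as shown below, where x_1 might be the area and x_2 the number of bedrooms in the house example, and ε is the error term. Ordinary least squares, which is what scikit-learn's LinearRegression fits, chooses the coefficients that minimize the sum of squared residuals over the n training rows (x_ij is the value of predictor j in row i):

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon

\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2
```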
🛠 Libraries Required
Before we start, install the required Python libraries:
pip install numpy pandas matplotlib seaborn scikit-learn
Now, let's import them:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
📊 Sample Example: Predicting House Prices
Let's create a simple regression model using a sample dataset:
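# Sample dataset (toy data: Price = 100 * Area + 100,000 exactly)
data = pd.DataFrame({
    'Area': [1000, 1500, 2000, 2500, 3000],
    'Price': [200000, 250000, 300000, 350000, 400000]
})
# Separate the feature matrix (X, must be 2-D) from the target (y)
X = data[['Area']]
y = data['Price']
# Split into train and test sets. test_size=0.4 keeps 2 of the 5 rows for testing;
# with only 1 test row (test_size=0.2), R² is undefined and scikit-learn returns nan
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
# Train the model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict on the held-out rows
y_pred = model.predict(X_test)
# Evaluate (on this perfectly linear toy data, expect MSE close to 0 and R² close to 1)
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))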
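To see what the model actually learned, here is a minimal sketch that refits on all five rows (an assumption made only so the numbers are easy to read), prints the slope and intercept, and cross-checks them against a hand-rolled least-squares solve with np.linalg.lstsq. It also computes MSE and R² directly from their definitions. Because the toy prices follow Price = 100 × Area + 100,000 exactly, the slope should come out as 100 and the intercept as 100,000.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Same toy dataset as above
data = pd.DataFrame({
    'Area': [1000, 1500, 2000, 2500, 3000],
    'Price': [200000, 250000, 300000, 350000, 400000]
})
X = data[['Area']]
y = data['Price']

# Fit on all five rows so the learned coefficients are easy to interpret
model = LinearRegression().fit(X, y)
print("Slope (price per unit of area):", model.coef_[0])
print("Intercept:", model.intercept_)

# Ordinary least squares by hand: a column of ones models the intercept
A = np.column_stack([np.ones(len(X)), X['Area'].to_numpy()])
beta, *_ = np.linalg.lstsq(A, y.to_numpy(), rcond=None)
print("lstsq [intercept, slope]:", beta)

# The evaluation metrics from their textbook definitions
y_hat = model.predict(X)
mse_manual = np.mean((y - y_hat) ** 2)
r2_manual = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print("Manual MSE:", mse_manual, "| scikit-learn MSE:", mean_squared_error(y, y_hat))
print("Manual R²:", r2_manual, "| scikit-learn R²:", r2_score(y, y_hat))
```

On real, noisy data the coefficients will not be round numbers, but LinearRegression and the lstsq solve should still agree up to floating-point error.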
📈 Visualizing the Regression Line
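# Plot the data points and the line the model fitted
plt.scatter(X['Area'], y, color='blue', label='Data')
plt.plot(X['Area'], model.predict(X), color='red', label='Fitted line')
plt.xlabel('Area')
plt.ylabel('Price')
plt.title('Linear Regression')
plt.legend()
plt.show()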
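A residual plot is a common follow-up check on the linearity assumption. The sketch below (again refitting on all five rows, an assumption made for simplicity) plots residuals, meaning actual minus fitted price, against the fitted values. If a straight line is appropriate, the points should scatter around zero with no visible pattern; a curved pattern hints at a non-linear relationship. Because the toy data lie exactly on a line, the residuals here are essentially zero.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same toy dataset as above
data = pd.DataFrame({
    'Area': [1000, 1500, 2000, 2500, 3000],
    'Price': [200000, 250000, 300000, 350000, 400000]
})
X = data[['Area']]
y = data['Price']

model = LinearRegression().fit(X, y)
fitted = model.predict(X)
residuals = y - fitted  # actual minus fitted price

# Residuals vs fitted values, with a reference line at zero
plt.scatter(fitted, residuals, color='blue')
plt.axhline(0, color='red', linestyle='--')
plt.xlabel('Fitted price')
plt.ylabel('Residual (actual - fitted)')
plt.title('Residuals vs Fitted Values')
plt.show()
```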
✅ Conclusion
Regression analysis in Python is straightforward and powerful. With just a few lines of code, you can build predictive models and gain insights into data. Whether you're analyzing business trends or forecasting future sales, regression is a must-have tool in any data scientist’s toolkit.
Start with simple linear regression and gradually explore more advanced techniques such as polynomial regression, which can capture curved relationships, and ridge and lasso regression, which penalize large coefficients to reduce overfitting.
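Here is a minimal sketch of those next steps, assuming hypothetical synthetic data with a curved (quadratic) trend; the data, the polynomial degree of 2, and the alpha values are illustrative choices, not tuned settings. Polynomial regression here is still ordinary linear regression, just fitted on the expanded features x and x². Ridge (an L2 penalty) and lasso (an L1 penalty) are placed after a StandardScaler because their penalties depend on the scale of the features.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical synthetic data: a quadratic trend plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    # Plain straight-line fit, for comparison
    "linear": LinearRegression(),
    # Polynomial regression: ordinary least squares on [x, x^2]
    "polynomial": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), LinearRegression()
    ),
    # Ridge and lasso penalize coefficient size, so standardize features first
    "ridge": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), Ridge(alpha=1.0)
    ),
    "lasso": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False), StandardScaler(), Lasso(alpha=0.01)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name:>10}: test R² = {scores[name]:.3f}")
```

On data like this, the three models that use the x² feature should reach a clearly higher test R² than the straight line. In practice, alpha is usually chosen by cross-validation, for example with scikit-learn's RidgeCV or LassoCV.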