Improving Car Price Predictions by Identifying Key Features
Abstract
Car price prediction, the estimation of a vehicle's market value from influencing factors such as make, model, mileage, and age, is a prime application of data science and machine learning. This study emphasizes feature selection and feature engineering as a means of improving the accuracy and efficiency of predictive models. The data are collected and preprocessed through cleaning, encoding, and scaling to ensure quality. Four feature selection techniques are then applied: Correlation Analysis, Recursive Feature Elimination (RFE), Random Forest Feature Importance, and Lasso Regression. Together they identify and retain the most important predictors while removing irrelevant and redundant attributes. The refined dataset is used to train a Random Forest Regressor, a strong ensemble learning model. The evaluation metrics, Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²), show a clear improvement: MAE decreased by 28.9%, MSE decreased by 29.2%, and R² increased by 10.3%. These findings indicate that feature selection reduces overfitting and, accordingly, improves model generalization while lowering the computational complexity of the algorithm. The study also highlights the potential of machine learning to address multicollinearity and imbalanced datasets, and the value of domain knowledge for further improving a model's predictive performance. Combining advanced deep learning techniques with domain-specific features and adaptive algorithms might further enhance the robustness and applicability of car price prediction models.
Keywords: Car Price Prediction, Feature Selection, Recursive Feature Elimination (RFE), Dimensionality Reduction, Multicollinearity Analysis
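For readers unfamiliar with the three evaluation metrics named in the abstract, the following minimal sketch shows how MAE, MSE, and R² can be computed and how a relative change between two models can be expressed as a percentage. The function names and the price lists are hypothetical and made up purely for illustration; they are not the study's data, and the resulting numbers are unrelated to the 28.9%, 29.2%, and 10.3% figures reported above.

```python
# Illustrative sketch only: the prices and predictions below are invented
# and do not come from the study's dataset.


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative means a decrease)."""
    return (after - before) / before * 100.0


# Hypothetical actual prices and predictions from two models:
# one trained on all features, one trained on a selected subset.
actual = [10000.0, 15000.0, 20000.0, 25000.0]
pred_all_features = [12000.0, 14000.0, 23000.0, 22000.0]
pred_selected = [11000.0, 14500.0, 21500.0, 24000.0]

mae_before = mean_absolute_error(actual, pred_all_features)
mae_after = mean_absolute_error(actual, pred_selected)
mse_before = mean_squared_error(actual, pred_all_features)
mse_after = mean_squared_error(actual, pred_selected)
r2_before = r_squared(actual, pred_all_features)
r2_after = r_squared(actual, pred_selected)

print(f"MAE change: {percent_change(mae_before, mae_after):.1f}%")
print(f"MSE change: {percent_change(mse_before, mse_after):.1f}%")
print(f"R2 change:  {percent_change(r2_before, r2_after):.1f}%")
```

Lower MAE and MSE indicate smaller prediction errors, whereas a higher R² indicates that the model explains a larger share of the variance in price, which is why an improvement appears as a decrease in the first two metrics and an increase in the third.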