The Taxi Fare Predictor project aims to predict the fare of a taxi ride in New York City based on various features such as pickup and dropoff locations, the number of passengers, and the date and time of the ride. The project uses a machine learning model to perform the predictions, with a focus on data preprocessing, feature engineering, and model training.
The dataset used in this project is TaxiFare.csv, which includes the following columns:
- unique_id: Unique identifier for each ride (dropped during preprocessing)
- amount: The fare amount for the ride
- date_time_of_pickup: The date and time when the ride was initiated
- longitude_of_pickup: The longitude of the pickup location
- latitude_of_pickup: The latitude of the pickup location
- longitude_of_dropoff: The longitude of the dropoff location
- latitude_of_dropoff: The latitude of the dropoff location
- no_of_passenger: The number of passengers in the ride
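As a rough illustration of the column layout above, the following sketch loads a CSV with pandas, parses the pickup timestamp, and drops unique_id. The two sample rows and the `load_rides` helper are made up for this example; in the notebook the data comes from TaxiFare.csv, and the exact timestamp format in that file is an assumption here.

```python
import io

import pandas as pd

# Hypothetical two-row sample that follows the column layout listed above.
# The values are invented for illustration and are not taken from TaxiFare.csv.
SAMPLE_CSV = """unique_id,amount,date_time_of_pickup,longitude_of_pickup,latitude_of_pickup,longitude_of_dropoff,latitude_of_dropoff,no_of_passenger
a1,4.5,2015-06-15 17:26:21,-73.9845,40.7213,-73.9816,40.7123,1
b2,16.9,2015-01-05 16:52:16,-74.0160,40.7113,-73.9793,40.7820,2
"""


def load_rides(source):
    """Read ride data, parse the pickup timestamp, and drop the identifier column."""
    df = pd.read_csv(source, parse_dates=["date_time_of_pickup"])
    # unique_id carries no predictive signal, so it is removed up front.
    return df.drop(columns=["unique_id"])


rides = load_rides(io.StringIO(SAMPLE_CSV))
print(rides.shape)
print(rides.dtypes)
```

To run this against the real file, `load_rides("TaxiFare.csv")` should work the same way, provided the file uses the column names listed above.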
The following Python libraries are required to run the notebook:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
You can install the necessary libraries using:
pip install pandas numpy matplotlib seaborn scikit-learn
The project is structured as follows:
- Data Loading and Initial Exploration:
  - Import necessary libraries.
  - Load the dataset and perform initial exploration.
- Data Cleaning and Preprocessing:
  - Drop unnecessary columns.
  - Check the dataset's shape and data types.
- Feature Engineering and Model Training:
  - Perform feature engineering to prepare data for model training.
  - Train a RandomForestRegressor model to predict taxi fares.
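The feature engineering and training steps above could look roughly like the sketch below. The specific features (pickup hour, day of week, month, and a haversine trip distance) are assumptions chosen for illustration, since the notebook's actual features are not listed here. The helper names `haversine_km` and `add_features`, the synthetic ride data, and the hyperparameters are likewise hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between points given in degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def add_features(df):
    """Derive time-of-ride and distance features (an assumed feature set)."""
    out = df.copy()
    pickup = pd.to_datetime(out["date_time_of_pickup"])
    out["hour"] = pickup.dt.hour
    out["day_of_week"] = pickup.dt.dayofweek
    out["month"] = pickup.dt.month
    out["distance_km"] = haversine_km(
        out["longitude_of_pickup"], out["latitude_of_pickup"],
        out["longitude_of_dropoff"], out["latitude_of_dropoff"],
    )
    # The raw timestamp is not numeric, so it is dropped once features exist.
    return out.drop(columns=["date_time_of_pickup"])


# Synthetic rides standing in for TaxiFare.csv (values are made up).
rng = np.random.default_rng(0)
n = 200
rides = pd.DataFrame({
    "date_time_of_pickup": pd.Timestamp("2015-01-01")
    + pd.to_timedelta(rng.integers(0, 365 * 24 * 60, n), unit="min"),
    "longitude_of_pickup": rng.uniform(-74.02, -73.93, n),
    "latitude_of_pickup": rng.uniform(40.70, 40.80, n),
    "longitude_of_dropoff": rng.uniform(-74.02, -73.93, n),
    "latitude_of_dropoff": rng.uniform(40.70, 40.80, n),
    "no_of_passenger": rng.integers(1, 5, n),
})
features = add_features(rides)
# Invented fare rule so the example has a target to learn.
features["amount"] = 2.5 + 1.8 * features["distance_km"] + rng.normal(0, 0.5, n)

X = features.drop(columns=["amount"])
y = features["amount"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Tree-based models such as random forests do not require feature scaling, which is one reason a sketch like this can feed raw coordinates and derived features to the model directly.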
- Clone the repository or download the TaxiFarePredictor.ipynb notebook.
- Ensure you have the required libraries installed.
- Run the notebook cells sequentially to load the data, preprocess it, and train the model.
- The model's performance metrics will be displayed towards the end of the notebook.
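Which performance metrics the notebook reports is not stated here. As an assumption, the sketch below computes three metrics commonly used for regression (MAE, RMSE, and R²) with scikit-learn. The `report_metrics` helper and the sample fare values are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def report_metrics(y_true, y_pred):
    """Return common regression metrics for actual and predicted fares."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        # RMSE is in the same units as the fare, which makes it easier to read.
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(y_true, y_pred),
    }


# Made-up actual and predicted fares, only to show the call.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
metrics = report_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a trained model, the same helper could be called as `report_metrics(y_test, model.predict(X_test))` on a held-out test split.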
This project demonstrates how to build a machine learning model to predict taxi fares using various features. It involves data cleaning, preprocessing, feature engineering, and model training using scikit-learn's RandomForestRegressor.
- Explore additional features that could improve the model's accuracy, such as weather conditions or traffic data.
- Experiment with other machine learning algorithms and compare their performance.
- Deploy the model as a web service for real-time fare prediction.
Abhishek Kumar