This project is an exploratory data analysis (EDA) of the Airbnb dataset for New York City.
The goal is to gain insights into the factors affecting rental prices and customer satisfaction, and to provide recommendations to Airbnb hosts on how to improve customer satisfaction and increase rental prices.
Tasks
- Import and clean the dataset: Handle missing values, remove duplicates, and perform necessary data transformations.
- Conduct descriptive statistics analysis: Calculate mean, median, mode, variance, and standard deviation for numerical variables.
- Visualize the data: Create appropriate visualizations (e.g., histograms, boxplots, bar charts) to analyze the distribution of numerical variables and the relationships between categorical and numerical variables.
- Analyze geographical data: Create a heatmap to visualize the density of listings across New York City neighborhoods. Identify areas with the highest concentration of Airbnb listings.
- Investigate the relationship between room type, neighborhood group, and price. Perform appropriate statistical tests (e.g., t-test, ANOVA) to determine if there are significant differences in rental prices based on room type and neighborhood group.
- Analyze the relationship between customer satisfaction (as measured by the number of reviews and reviews_per_month) and factors such as price, room type, and neighborhood group.
- Based on your findings, provide recommendations to Airbnb hosts on how to improve customer satisfaction and increase rental prices. Consider potential strategies such as offering different room types, targeting specific neighborhoods, or adjusting pricing based on demand and competition.
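To make the statistical-test task more concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand with only the standard library. The function name `one_way_anova_f` and the sample prices are hypothetical and made up for illustration; they are not taken from the dataset or from the project's notebooks.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a dict of {label: [values]}."""
    samples = [list(values) for values in groups.values()]
    k = len(samples)                       # number of groups
    n = sum(len(s) for s in samples)       # total number of observations
    grand_mean = sum(sum(s) for s in samples) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up nightly prices, for illustration only.
prices_by_room_type = {
    "Entire home/apt": [200, 180, 220, 210],
    "Private room": [90, 80, 100, 95],
    "Shared room": [50, 60, 55, 45],
}
f_stat = one_way_anova_f(prices_by_room_type)
print(f"F statistic: {f_stat:.2f}")
```

A large F value suggests that mean prices differ between groups more than the spread within each group would explain. This sketch stops at the F statistic; a library routine such as `scipy.stats.f_oneway` also returns the p-value needed to judge significance.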
Dataset
The dataset used for this analysis is the "New York City Airbnb Open Data" available on Kaggle.
It contains data on over 48,000 Airbnb listings in New York City, including information such as host ID, neighborhood, room type, price, availability, number of reviews, and reviews per month.
Project Structure
The project is structured as follows:
- The data directory contains the raw data file AB_NYC_2019.csv.
- The notebooks directory contains the Jupyter notebooks used for data cleaning, EDA, and analysis.
- The figures directory contains the visualizations generated during the analysis.
- The README.md file provides an overview of the project.
Getting Started
To run the code in this project, you will need Python 3 and Jupyter Notebook installed on your computer. You can download the dataset from Kaggle or get it from this project's GitHub repo.
Data Cleaning and EDA
The notebook airbnb.ipynb contains the code for cleaning the data, handling missing values, removing duplicates, and performing necessary data transformations.
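As a rough illustration of what such cleaning steps might look like, here is a minimal pandas sketch. It is not the notebook's actual code: the helper `clean_listings` is hypothetical, the column names (`price`, `reviews_per_month`, `last_review`) are assumed to match the Kaggle file, and the choices to fill missing review rates with 0 and to drop non-positive prices are assumptions.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a listings table (hypothetical helper)."""
    # Remove rows that are exact duplicates of an earlier row.
    cleaned = df.drop_duplicates().copy()

    # Assumption: a missing review rate means the listing has no reviews.
    if "reviews_per_month" in cleaned.columns:
        cleaned["reviews_per_month"] = cleaned["reviews_per_month"].fillna(0)

    # Parse review dates; values that cannot be parsed become NaT.
    if "last_review" in cleaned.columns:
        cleaned["last_review"] = pd.to_datetime(cleaned["last_review"], errors="coerce")

    # Assumption: a non-positive price is a data error and is dropped.
    if "price" in cleaned.columns:
        cleaned = cleaned[cleaned["price"] > 0]

    return cleaned.reset_index(drop=True)


# Tiny made-up frame standing in for the real file, e.g.
# df = pd.read_csv("data/AB_NYC_2019.csv")
sample = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "price": [100, 100, 0, 75],
    "reviews_per_month": [0.5, 0.5, None, None],
    "last_review": ["2019-05-01", "2019-05-01", None, None],
})
cleaned = clean_listings(sample)
print(cleaned)
```

Keeping the steps in one function makes it easy to apply the same cleaning to the full file and to small test frames like the one above.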
The notebook 02_EDA.ipynb contains the code for descriptive statistics analysis, data visualization, and analysis of the relationship between customer satisfaction and factors such as price, room type, and neighborhood group.
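The descriptive statistics mentioned above can be sketched with Python's built-in `statistics` module. The helper name `describe` and the price values are hypothetical and made up for illustration; the notebook itself may well use pandas instead.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }


# Made-up nightly prices, for illustration only.
prices = [100, 150, 150, 200, 400]
summary = describe(prices)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note how the single high price pulls the mean above the median; for skewed variables such as price, reporting both is usually more informative than the mean alone.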
Recommendations
Based on the findings of the analysis, the project provides recommendations to Airbnb hosts on how to improve customer satisfaction and increase rental prices.
These recommendations consider potential strategies such as offering different room types, targeting specific neighborhoods, or adjusting pricing based on demand and competition.
Author
This project was created by Tarakram. You can contact me at jtarakram6699@email.com.