Boston AirBnb Data Analysis
AirBnb has taken the world by storm, making it exciting and invigorating to plan a vacation in a new city. AirBnb has over 150 million users worldwide, and 6 guests check into an AirBnb every second. Isn't that amazing?
So, what better way to enhance this amazing service than by analyzing its data and key trends?
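CRISP-DM is a methodology that helps us analyze data more effectively. Now, you might ask: what is CRISP-DM?
CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It divides a data project into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The sections below follow the phases that apply to this exploratory analysis.
Let us dive in and understand the data.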
Motivation:
My motivation for analyzing this data is to dive deep into the data science field after working as a software engineer for around 2 years. The Data Science Nanodegree I am pursuing at Udacity has strengthened this interest further.
Data Understanding:
The data (available on Kaggle) consists of 3 files:
- listings: Full descriptions and average review scores
- calendar: Listing ID, plus the price and availability for each day
- reviews: Unique ID for each reviewer and detailed comments
The listings data is the primary dataset for my analysis.
It has 3585 rows and 95 columns.
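A minimal sketch of this first look with pandas, assuming the Kaggle download is saved as `listings.csv` (the file name and the column names below are assumptions based on the Kaggle files, not taken from the original code). So that it runs without the download, it reads a made-up, three-row stand-in from memory; with the real file, `pd.read_csv("listings.csv").shape` should report the 3585 rows and 95 columns mentioned above. It also checks whether `price` was loaded as a number, which motivates the cleaning step in Q1.

```python
import io

import pandas as pd


def load_listings(source) -> pd.DataFrame:
    """Read the listings CSV from a path or any file-like object."""
    return pd.read_csv(source)


# Made-up stand-in for listings.csv (the real file has 95 columns).
# Column names are assumed to match the Kaggle file.
sample_csv = io.StringIO(
    "id,neighbourhood_cleansed,property_type,room_type,price\n"
    "1,Hood A,Apartment,Entire home/apt,$250.00\n"
    "2,Hood B,House,Private room,$65.00\n"
    '3,Hood C,Apartment,Entire home/apt,"$1,250.00"\n'
)

listings = load_listings(sample_csv)
rows, cols = listings.shape
print(f"{rows} rows x {cols} columns")

# Prices arrive as strings such as "$1,250.00", so pandas does not treat them as numbers.
print("price is numeric:", pd.api.types.is_numeric_dtype(listings["price"]))
```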
Business Understanding:
These are the questions that anyone would most likely ask, whether as a new AirBnb host looking to make a profit or as a user of the service.
We will use them to guide the analysis:
- What are the most common price listings on AirBnb?
- What is the relation between price and property type?
- Which room types in each neighbourhood have high prices?
- What are the top 5 amenities?
Q1. What are the most common price listings?
Data preparation:
To prepare for the analysis, we cleaned the price field of the listings DataFrame, which is stored as text, and converted it to a float.
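One way to do this conversion, as a sketch: it assumes the raw values look like `"$1,250.00"`, i.e. a leading dollar sign and comma thousands separators, and the helper name `clean_price` is hypothetical rather than the author's code. Values in any other format (another currency symbol, for example) would need a wider pattern.

```python
import pandas as pd


def clean_price(prices: pd.Series) -> pd.Series:
    """Convert text prices such as '$1,250.00' into floats such as 1250.0.

    Assumes the only non-numeric characters are '$' and ',' thousands separators.
    Missing values stay missing (NaN).
    """
    # Drop every '$' and ',' character, then parse what remains as a float.
    return prices.str.replace(r"[$,]", "", regex=True).astype(float)


raw = pd.Series(["$250.00", "$65.00", "$1,250.00", "$4,000.00"])
price = clean_price(raw)
print(price.tolist())  # [250.0, 65.0, 1250.0, 4000.0]
```

Applied to the real data, this would be `listings["price"] = clean_price(listings["price"])`.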
As the plot shows, most AirBnbs cost less than 700 dollars, and the bulk of the data lies in the 0 to 500 dollar range.
The following plot shows that the highest price is 4000 dollars and that only a small number of premium AirBnbs are priced above 700 dollars.
The following plot shows that most of the AirBnbs are in the 50 to 200 dollar range. Thus, the most common price listings are in this range.
Results: This analysis shows the range of available price points: the highest is 4000 dollars, and the most common price listings fall in the 50 to 200 USD range.
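The counting behind these plots can be sketched with `pandas.cut`, which assigns each price to a band so that the bands can be counted. The band edges and prices here are made up for illustration; they are not the data or bins behind the original plots. In practice you would pass the cleaned `price` column instead.

```python
import pandas as pd

# Hypothetical cleaned prices; substitute the cleaned listings["price"] column.
prices = pd.Series([45.0, 80.0, 120.0, 150.0, 175.0, 199.0, 260.0, 750.0, 4000.0])

# Illustrative band edges. Bands are right-closed: a price of exactly 50 lands in "0-50".
bins = [0, 50, 200, 500, 700, float("inf")]
labels = ["0-50", "50-200", "200-500", "500-700", "700+"]
bands = pd.cut(prices, bins=bins, labels=labels)

# Count listings per band (empty bands are kept with a count of 0), in band order.
counts = bands.value_counts().sort_index()
print(counts)
print("Most common band:", counts.idxmax())
print("Highest price:", prices.max())
```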
Q2. What is the relation between price and property type?
The following heatmap shows the average prices for each property type and room type. Across room types, shared rooms are mostly the cheapest; across property types, bed and breakfasts are generally the cheapest.
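A table like the one behind the heatmap can be built with `DataFrame.pivot_table`, assuming the listings have `property_type`, `room_type` and an already-cleaned numeric `price` column. The rows below are made up. Property and room type combinations with no listings show up as `NaN`, which a heatmap leaves blank.

```python
import pandas as pd

# Made-up listings; in the real data these columns come from listings.csv
# after the price field has been converted to float.
listings = pd.DataFrame({
    "property_type": ["Apartment", "Apartment", "House", "House", "Bed & Breakfast", "Apartment"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Shared room", "Private room", "Shared room"],
    "price": [250.0, 90.0, 300.0, 40.0, 70.0, 55.0],
})

# Rows: property type; columns: room type; cells: mean price (NaN where no listing exists).
avg_price = listings.pivot_table(
    index="property_type", columns="room_type", values="price", aggfunc="mean"
)
print(avg_price.round(1))
```

If seaborn is installed, `sns.heatmap(avg_price, annot=True, fmt=".0f")` (after `import seaborn as sns`) draws a heatmap from this table.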
Q3. Which room types in each neighbourhood have high prices?
I calculated the mean price for each neighbourhood and sorted the results to find the most expensive neighbourhoods. South Boston Waterfront and Bay Village are the 2 most expensive neighbourhoods, while Mattapan and Dorchester are the cheapest.
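The ranking can be sketched as a group-by followed by a sort. The column name `neighbourhood_cleansed` is an assumption based on the Kaggle listings file, and the neighbourhood names and prices are placeholders, not the Boston results.

```python
import pandas as pd

# Placeholder neighbourhoods and prices, not real Boston figures.
listings = pd.DataFrame({
    "neighbourhood_cleansed": ["Hood A", "Hood A", "Hood B", "Hood B", "Hood C", "Hood D"],
    "price": [280.0, 260.0, 90.0, 100.0, 75.0, 310.0],
})

# Mean price per neighbourhood, most expensive first.
mean_by_hood = (
    listings.groupby("neighbourhood_cleansed")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_by_hood)
print("Two most expensive:", mean_by_hood.index[:2].tolist())
print("Two cheapest:", mean_by_hood.index[-2:].tolist())
```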
Now for the key question: which room types in each neighbourhood command a higher price? The following scatterplot helps answer it.
The general trend is that in each neighbourhood, private rooms are the cheapest, while entire apartments are the most expensive.
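A sketch of the per-neighbourhood comparison, under the same assumptions (column names from the Kaggle file, made-up rows): group by neighbourhood and room type, then `unstack` so that each row is a neighbourhood and each column a room type. `idxmax` and `idxmin` along each row then name the most and least expensive room type, skipping room types a neighbourhood does not have.

```python
import pandas as pd

# Made-up rows; "Hood A" has no shared rooms, so that cell will be NaN.
listings = pd.DataFrame({
    "neighbourhood_cleansed": ["Hood A", "Hood A", "Hood A", "Hood B", "Hood B", "Hood B"],
    "room_type": ["Entire home/apt", "Private room", "Private room",
                  "Entire home/apt", "Private room", "Shared room"],
    "price": [220.0, 80.0, 100.0, 150.0, 60.0, 70.0],
})

# One row per neighbourhood, one column per room type, mean price in each cell.
by_room = (
    listings.groupby(["neighbourhood_cleansed", "room_type"])["price"]
    .mean()
    .unstack("room_type")
)
print(by_room)

# Most and least expensive room type per neighbourhood (NaN cells are skipped).
print(by_room.idxmax(axis=1))
print(by_room.idxmin(axis=1))
```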
Q4. What are the top 5 amenities?
Data preparation:
I cleaned the amenities column so that each listing's amenities could be treated as a list of separate items.
I then used scikit-learn's MultiLabelBinarizer to find the 5 most frequent amenities.
The following graph shows that Wireless Internet, Heating, Kitchen, Essentials and Smoke Detector are the most common amenities across AirBnb listings.
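A sketch of the amenities step. It assumes each raw value looks like `{TV,"Wireless Internet",Kitchen}` (braces, comma-separated items, multi-word items in double quotes, no commas inside an item); `parse_amenities` is a hypothetical helper, and the sample rows are made up. `MultiLabelBinarizer` turns each list into one 0/1 column per amenity, so summing the columns counts how many listings offer each amenity.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


def parse_amenities(raw: str) -> list[str]:
    """Turn '{TV,"Wireless Internet",Kitchen}' into ['TV', 'Wireless Internet', 'Kitchen'].

    Assumes no amenity name contains a comma; an empty '{}' becomes [].
    """
    items = raw.strip("{}").replace('"', "").split(",")
    return [item.strip() for item in items if item.strip()]


# Made-up raw values in the assumed format.
raw_amenities = pd.Series([
    '{TV,"Wireless Internet",Kitchen,Heating}',
    '{"Wireless Internet",Kitchen,Essentials,"Smoke Detector"}',
    '{"Wireless Internet",Heating,Essentials}',
    "{}",
])

mlb = MultiLabelBinarizer()
# One row per listing, one 0/1 column per distinct amenity.
one_hot = pd.DataFrame(
    mlb.fit_transform(raw_amenities.map(parse_amenities)),
    columns=mlb.classes_,
)

# Number of listings offering each amenity; keep the 5 most frequent.
top5 = one_hot.sum().sort_values(ascending=False).head(5)
print(top5)
```

The sort does not guarantee any particular order among amenities with equal counts, so ties may appear in either order.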
Deployment
I used Google Colab to run my code; you can also use Jupyter Notebook. The libraries used and a detailed breakdown of the code are available on GitHub and explained in the README.
GitHub link: https://github.com/AkshayJaitly/Boston-AirBnb-Analysis
What's next:
- Use predictive analytics to forecast future prices.
- Determine seasonal effects on pricing.
- Learn more about Superhosts and what makes them special.
Feel free to follow me on GitHub.