A Python-based data analysis project exploring film industry trends through data cleaning, feature engineering, and interactive visualizations. This notebook uncovers insights on genre popularity, budget vs. revenue dynamics, and audience ratings using tools like Pandas, Seaborn, and Plotly.

salmamohammed11111/-Film-DataPrep-and-Visualization

🎬 Film DataPrep & Visualization
This project explores a movie dataset using Python to uncover insights through data cleaning, transformation, and interactive visualizations. It demonstrates practical skills in data wrangling, statistical analysis, and storytelling with visuals.

📌 Key Features

  • Data Cleaning: Handled missing values, corrected data types, and removed duplicates.

  • Exploratory Analysis: Investigated relationships between budget, revenue, genres, and ratings.

  • Visualizations: Created insightful plots using Seaborn and Plotly to highlight trends and outliers.

  • Interactivity: Enhanced user engagement with dynamic charts for deeper exploration.

🔍 What Was Done in Detail

  • Data Ingestion & Initial Assessment
    Imported the dataset using Pandas, followed by an initial structure audit. Identified key issues including null values, irrelevant columns, inconsistent data types, and unstructured categorical data.
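A minimal sketch of what such an initial audit might look like. The inline CSV, the `audit` helper, and the `title` and `release_date` columns are assumptions for illustration; the notebook's actual file and code may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the raw CSV; the real file name and columns may differ.
RAW_CSV = io.StringIO(
    "title,budget,revenue,vote_average,genres,release_date\n"
    "Film A,1000000,5000000,7.1,Action|Drama,2010-05-01\n"
    "Film B,0,0,6.3,Comedy,2012-07-15\n"
    "Film B,0,0,6.3,Comedy,2012-07-15\n"
    "Film C,250000,,8.0,,2015-11-20\n"
)


def audit(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, null count, and distinct-value count per column."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "nulls": df.isna().sum(),
            "unique": df.nunique(),
        }
    )


movies = pd.read_csv(RAW_CSV)
summary = audit(movies)
duplicate_rows = int(movies.duplicated().sum())

print(summary)
print("duplicate rows:", duplicate_rows)
```

A per-column summary like this tends to surface the issues listed above (nulls, wrong dtypes, duplicates) before any cleaning decisions are made.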

  • Data Cleaning & Transformation

      • Filled or removed missing values and dropped duplicate entries using targeted filters.

      • Converted data types (e.g., dates and numeric strings) to appropriate formats for analysis.

      • Standardized and simplified string formats for genres and cast names.

      • Filtered out records with clearly incomplete or unrealistic values (e.g., zero budgets and revenues).
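The cleaning steps above could be sketched roughly as follows. The `clean_movies` helper, the sample rows, the `|` genre separator, and the `title` and `release_date` columns are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicates, fix dtypes, tidy genre strings, remove unrealistic rows."""
    out = df.drop_duplicates().copy()

    # Numeric strings -> numbers; unparseable values become NaN.
    for col in ("budget", "revenue"):
        out[col] = pd.to_numeric(out[col], errors="coerce")

    # Date strings -> datetime; unparseable values become NaT.
    out["release_date"] = pd.to_datetime(out["release_date"], errors="coerce")

    # Standardize genres: "action| drama" -> "Action|Drama"; missing -> "Unknown".
    out["genres"] = (
        out["genres"]
        .fillna("Unknown")
        .str.split("|")
        .apply(lambda parts: "|".join(p.strip().title() for p in parts))
    )

    # Keep only rows with a positive budget and revenue (NaN rows are dropped too).
    out = out[(out["budget"] > 0) & (out["revenue"] > 0)]
    return out.reset_index(drop=True)


# Hypothetical sample rows, not taken from the real dataset.
raw = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film B", "Film C", "Film D"],
        "budget": ["1000000", "0", "0", "250000", "n/a"],
        "revenue": ["5000000", "0", "0", "900000", "100"],
        "genres": ["action| drama", "Comedy", "Comedy", None, "Horror"],
        "release_date": [
            "2010-05-01", "2012-07-15", "2012-07-15", "2015-11-20", "not a date",
        ],
    }
)

cleaned = clean_movies(raw)
print(cleaned)
```

Coercing bad values to NaN or NaT first, then filtering, keeps the "what counts as unrealistic" rule in one place.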

  • Feature Selection & Statistical Exploration
    Selected relevant features for analysis, such as budget, revenue, vote_average, and genres. Computed descriptive statistics and explored outliers, skewness, and inter-feature correlations using histograms, boxplots, and heatmaps.
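As a rough illustration of the numeric side of this step, the sketch below computes descriptive statistics, skewness, and a correlation matrix, and flags outliers with the common 1.5 × IQR rule. The sample values and the `iqr_outliers` helper are assumptions; the notebook may have used different thresholds or methods.

```python
import pandas as pd

# Hypothetical sample values, not taken from the real dataset.
movies = pd.DataFrame(
    {
        "budget": [1e6, 5e6, 2e7, 1e8, 2.5e8],
        "revenue": [4e6, 3e6, 9e7, 3e8, 1.2e9],
        "vote_average": [7.4, 6.1, 6.8, 7.0, 6.5],
    }
)
features = ["budget", "revenue", "vote_average"]

stats = movies[features].describe()   # count, mean, std, quartiles
skewness = movies[features].skew()    # > 0 means a long right tail
corr = movies[features].corr()        # Pearson correlation matrix


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


revenue_outliers = movies[iqr_outliers(movies["revenue"])]

print(stats)
print(skewness)
print(corr)
print(revenue_outliers)
```

A matrix like `corr` is the kind of input that `seaborn.heatmap` accepts directly, which is one way the heatmaps mentioned above could be produced.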

  • Exploratory Data Analysis (EDA)

      • Evaluated how genre popularity, revenue, and ratings change over time.

      • Assessed relationships between budget and financial return.

      • Analyzed how vote averages vary across genres and production scales.

    Insights were derived both numerically and visually, with clear annotations.
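One way these questions could be approached with Pandas is sketched below. The sample rows, the `|` genre separator, the `decade` and `roi` columns, and the definition of return as (revenue − budget) / budget are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real dataset.
movies = pd.DataFrame(
    {
        "title": ["Film A", "Film B", "Film C", "Film D"],
        "genres": ["Action|Drama", "Comedy", "Drama", "Action"],
        "release_date": pd.to_datetime(
            ["1995-06-01", "1999-03-12", "2008-09-30", "2014-01-17"]
        ),
        "budget": [10e6, 2e6, 5e6, 100e6],
        "revenue": [30e6, 20e6, 4e6, 250e6],
        "vote_average": [7.0, 7.8, 8.1, 6.2],
    }
)

# Derived features: release decade and return on investment.
movies["decade"] = movies["release_date"].dt.year // 10 * 10
movies["roi"] = (movies["revenue"] - movies["budget"]) / movies["budget"]

# One row per (film, genre) so multi-genre films count toward each genre.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")

# Film count and average rating per genre, best-rated first.
genre_ratings = (
    by_genre.groupby("genre")["vote_average"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)

# Average revenue and rating per decade.
decade_trends = movies.groupby("decade")[["revenue", "vote_average"]].mean()

print(genre_ratings)
print(decade_trends)
```

Exploding the genre column before grouping avoids treating "Action|Drama" as its own category, at the cost of counting multi-genre films more than once.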

  • Data Visualization
    Created a series of visualizations to communicate findings, including:

      • Bar charts for genre distribution and vote averages

      • Scatter plots comparing budget vs. revenue across decades

      • Boxplots illustrating rating dispersion by genre

      • Interactive visuals using Plotly for user engagement and drill-down capability

📊 Sample Visuals

  • Genre distribution bar chart

  • Budget vs. Revenue scatter plot

  • Top-rated films by popularity

🛠️ Tools & Libraries Used
This project used the following tools and libraries:

  • Python 3.x – Primary programming language for data manipulation, analysis, and visualization.

  • Pandas – For efficient data ingestion, cleaning, transformation, and tabular analysis.

  • NumPy – Supported numerical operations and array-based transformations.

  • Seaborn – Created attractive and informative static visualizations for statistical exploration.

  • Matplotlib – Provided foundational plotting capabilities and customization for visuals.

  • Plotly Express – Enabled interactive charts with drill-down capabilities for deeper storytelling.

  • Jupyter Notebook – Served as the interactive development environment for combining code, visuals, and narrative.

📈 Insights & Reflection
Through this project, I uncovered several key patterns within the film industry landscape:

  • Genre Dynamics: Action and Drama were the most prevalent genres, yet Comedy and Animation often showed higher average ratings, especially in recent years.

  • Budget vs. Revenue: While a general positive correlation exists, several outliers revealed that a large budget doesn’t guarantee success, and some low-budget films achieved remarkable returns.

  • Vote Average Trends: Genre and production size influence audience ratings, with niche or critically acclaimed films often outperforming big-budget blockbusters in average scores.

Beyond the data, this project strengthened my ability to:

  • Design a full analytical pipeline — from raw CSV to storytelling-ready visuals.

  • Balance static and interactive visualizations to suit different audience needs.

  • Troubleshoot real-world data issues like missing values, misclassified data types, and skewed distributions.

This analysis deepened my understanding of how visualizations can amplify insights rather than just display them, and reinforced the importance of curiosity-driven exploration alongside technical precision.

👩‍💻 About the Author

I'm Salma Mohammed, an aspiring data analyst passionate about transforming messy datasets into meaningful insights. With a strong foundation in SQL and Python, I specialize in data cleaning, exploratory analysis, and creating interactive visualizations that tell a compelling story. I'm proficient in tools across the data ecosystem, including Excel, Power BI, Tableau, MySQL, and a range of Python libraries (such as Pandas, NumPy, Seaborn, Matplotlib, and Plotly). My projects combine technical precision with creative storytelling, aimed at building workflows that are both insightful and actionable. I'm currently building a portfolio that blends practical data analytics with visual clarity, with a growing interest in domains like healthcare analytics and media trends.

📫 Feel free to connect with me on LinkedIn or explore more of my work on my GitHub profile.
