Data Manipulation in Python using Pandas

Last Updated : 12 Jul, 2025

In Machine Learning, a model needs a dataset to train and test on. But data rarely arrives fully prepared and ready to use. Many rows and columns may contain missing values such as NaN, Null or NA, and a dataset often includes rows and columns that the model does not need at all. In such cases the dataset has to be cleaned and modified before it becomes an efficient input for the model. This process is called Data Wrangling, and it is carried out before the data is given to the model.

In this article, we will look at some methods from Pandas, a widely used Python library for working with tabular data. Using it, we can make our data ready for training a model and, in turn, get useful insights from the results.

Installing Pandas

Before moving forward, ensure that Pandas is installed on your system. If it is not, you can install it with the following command:

pip install pandas

Creating DataFrame

Let’s dive into the programming part. Our first aim is to create a Pandas DataFrame in Python.
Code: 

Python3
# Importing the pandas library
import pandas as pd

# creating a dataframe object
student_register = pd.DataFrame()

# assigning values to the 
# rows and columns of the dataframe
student_register['Name'] = ['Abhijit','Smriti',
                            'Akash', 'Roshni']
student_register['Age'] = [20, 19, 20, 14]
student_register['Student'] = [False, True,
                               True, False]

print(student_register)

Output:

      Name  Age  Student
0  Abhijit   20    False
1   Smriti   19     True
2    Akash   20     True
3   Roshni   14    False

As you can see, the DataFrame has four rows [0, 1, 2, 3] and three columns [“Name”, “Age”, “Student”]. The leftmost column, which holds the values [0, 1, 2, 3], is the index; pandas creates it by default for every DataFrame. It can be changed when needed, for example by passing index= when creating the DataFrame or by calling .set_index() afterwards.
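As a minimal sketch of changing the default index, the same table can be built in one step from a dictionary and then re-indexed by the “Name” column. The variable names register and by_name are illustrative and are not used elsewhere in this article.

```python
import pandas as pd

# Build the same table in one step from a dictionary of columns
register = pd.DataFrame({
    'Name': ['Abhijit', 'Smriti', 'Akash', 'Roshni'],
    'Age': [20, 19, 20, 14],
    'Student': [False, True, True, False],
})

# Replace the default 0..3 index with the 'Name' column;
# set_index() returns a new DataFrame and leaves `register` untouched
by_name = register.set_index('Name')

# Rows can now be looked up by label instead of by position
print(by_name.loc['Akash'])
```

Because set_index() returns a new object, the original register keeps its default index unless the result is assigned back to it.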

Adding Data to a DataFrame using concat()

Next, suppose we want to add a new student to the DataFrame, i.e. add a new row to the existing table. This can be achieved with the following code snippet.
A useful concept here is that a single row of a DataFrame can be represented as a Series object whose index holds the column names. Adding a new row therefore means creating a new Series and attaching it to the DataFrame. Older versions of pandas offered DataFrame.append() for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippet uses pd.concat() instead. Note that pd.concat() returns a new DataFrame and does not modify student_register.
Code:

Python3
# creating a new pandas
# series object
new_person = pd.Series(['Mansi', 19, True], 
                       index = ['Name', 'Age', 
                                'Student'])

# DataFrame.append() was removed in pandas 2.0, so
# turn the series into a one-row dataframe and
# use pd.concat() to add that row
updated_register = pd.concat(
    [student_register, new_person.to_frame().T],
    ignore_index = True)
print(updated_register)

Output:

      Name Age Student
0  Abhijit  20   False
1   Smriti  19    True
2    Akash  20    True
3   Roshni  14   False
4    Mansi  19    True
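When several rows have to be added at once, one possible approach is to collect them in a second DataFrame and join the two with pd.concat(). The sketch below rebuilds the sample table so that it runs on its own; the row for 'Ravi' is a made-up example, and the names new_rows and combined are illustrative.

```python
import pandas as pd

student_register = pd.DataFrame({
    'Name': ['Abhijit', 'Smriti', 'Akash', 'Roshni'],
    'Age': [20, 19, 20, 14],
    'Student': [False, True, True, False],
})

# New rows written as a list of dictionaries, one dictionary per row
new_rows = pd.DataFrame([
    {'Name': 'Mansi', 'Age': 19, 'Student': True},
    {'Name': 'Ravi', 'Age': 21, 'Student': False},  # hypothetical row
])

# pd.concat() returns a new DataFrame;
# ignore_index=True renumbers the rows 0..5
combined = pd.concat([student_register, new_rows], ignore_index=True)

print(combined)
print(combined.dtypes)
```

Building the new rows as a DataFrame, rather than as a mixed-type Series, keeps “Age” as an integer column and “Student” as a boolean column in the result.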

Data Manipulation on Dataset

So far we have seen how to create a DataFrame and add data to it. The same operations are used when working with a big dataset. To keep the examples short, the following sections continue with the student_register DataFrame.

Getting Shape and information of the data

Let's extract information about each column, i.e. what type of values it stores and how many of them are non-null. Three helpers are used here: the .shape attribute and the .info() and .corr() methods, which give the shape of the table, information on rows and columns, and the correlation between numerical columns respectively.
Code: 

Python3
# dimension of the dataframe
print('Shape: ')
print(student_register.shape)
print('--------------------------------------')
# showing info about the data 
print('Info: ')
print(student_register.info())
print('--------------------------------------')
# correlation between numerical columns
# (numeric_only=True skips the text column 'Name')
print('Correlation: ')
print(student_register.corr(numeric_only=True))

Output:

Shape: 
(4, 3)
--------------------------------------
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Name     4 non-null      object
 1   Age      4 non-null      int64
 2   Student  4 non-null      bool
dtypes: bool(1), int64(1), object(1)
memory usage: 196.0+ bytes
None
--------------------------------------
Correlation:
              Age   Student
Age      1.000000  0.502519
Student  0.502519  1.000000

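In the above example, the .shape attribute gives (4, 3), i.e. 4 rows and 3 columns, which is the size of the created DataFrame.

The description of the output given by the .info() method is as follows (the trailing None is printed because .info() writes its report directly and returns None, which print() then displays): 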

  1. RangeIndex describes the index column, i.e. [0, 1, 2, 3] in our DataFrame. The number of entries equals the number of rows in the DataFrame.
  2. As the name suggests, Data columns gives the total number of columns.
  3. Name, Age and Student are the names of the columns in our data. The non-null count tells us how many entries in the corresponding column are not NA/NaN/None; here it is 4 for every column, so no values are missing. object, int64 and bool are the data types of the columns.
  4. dtypes gives an overview of how many columns of each data type are present in the DataFrame, which in turn simplifies the data cleaning process. 
    memory usage shows how much memory the DataFrame occupies, which matters when large datasets are fed to machine learning models.
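The individual pieces of the .info() report can also be queried directly, which may be handy on larger tables. The following sketch rebuilds the sample table so that it runs on its own; .dtypes, .isna() and .memory_usage() are standard pandas calls, while the variable name null_counts is illustrative.

```python
import pandas as pd

student_register = pd.DataFrame({
    'Name': ['Abhijit', 'Smriti', 'Akash', 'Roshni'],
    'Age': [20, 19, 20, 14],
    'Student': [False, True, True, False],
})

# Data type of each column (the 'Dtype' column of .info())
print(student_register.dtypes)

# Number of missing values per column (0 everywhere for this table)
null_counts = student_register.isna().sum()
print(null_counts)

# Per-column memory in bytes; deep=True also measures
# the string data stored in 'Name'
print(student_register.memory_usage(deep=True))
```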

Getting Statistical Analysis of Data

Before processing and wrangling any data, you need an overall view of it, which includes summary statistics such as the mean, the standard deviation (std) and the quartiles.

Python3
# for showing the statistical 
# info of the dataframe
print('Describe')
print(student_register.describe())

Output:

Describe
             Age
count   4.000000
mean   18.250000
std     2.872281
min    14.000000
25%    17.750000
50%    19.500000
75%    20.000000
max    20.000000

By default, .describe() summarizes only the numeric columns, which is why only “Age” appears. The description of the output given by the .describe() method is as follows: 

  1. count is the number of non-null entries in the “Age” column, which here equals the number of rows in the DataFrame.
  2. mean is the mean value of all the entries in the “Age” column.
  3. std is the standard deviation of the corresponding column.
  4. min and max are the minimum and maximum entries in the column respectively.
  5. 25%, 50% and 75% are the First Quartile, Second Quartile (Median) and Third Quartile respectively. They give important information about the distribution of the data, which helps when deciding how to prepare it for an ML model.
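To make the list above concrete, the same figures can be computed one at a time for the “Age” values. This is only a sketch for checking the numbers; the names ages and quartiles are illustrative.

```python
import pandas as pd

# The 'Age' column of the sample table
ages = pd.Series([20, 19, 20, 14], name='Age')

print(ages.count())   # number of non-null entries
print(ages.mean())    # arithmetic mean
print(ages.std())     # sample standard deviation (divides by n - 1)
print(ages.min(), ages.max())

# Quartiles, using linear interpolation between neighbouring values
quartiles = ages.quantile([0.25, 0.5, 0.75])
print(quartiles)
```

The standard deviation reported by pandas is the sample version (it divides by n - 1), which is why it is 2.872281 here.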

Dropping Columns from Data

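Let's drop a column from the data. We will use the .drop() method of the DataFrame with axis = 1, which tells pandas that the label refers to a column. The method returns a new DataFrame, so student_register itself is unchanged.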

Python3
students = student_register.drop('Age', axis=1)
print(students.head())

Output:

      Name  Student
0  Abhijit    False
1   Smriti     True
2    Akash     True
3   Roshni    False

From the output, we can see that the 'Age' column is dropped.
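Several columns can be dropped in one call as well. The sketch below uses the columns= keyword, which is an alternative to passing the labels together with axis=1; the name names_only is illustrative.

```python
import pandas as pd

student_register = pd.DataFrame({
    'Name': ['Abhijit', 'Smriti', 'Akash', 'Roshni'],
    'Age': [20, 19, 20, 14],
    'Student': [False, True, True, False],
})

# columns=[...] is equivalent to passing the labels with axis=1
names_only = student_register.drop(columns=['Age', 'Student'])
print(names_only)

# drop() returned a copy, so the original still has all three columns
print(student_register.columns.tolist())
```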

Dropping Rows from Data

Let's try dropping a row from the dataset. For this we use the .drop() method again, this time with axis=0 so that the label 2 refers to a row.

Python3
students = students.drop(2, axis=0)
print(students.head())

Output:

      Name  Student
0  Abhijit    False
1   Smriti     True
3   Roshni    False

In the output we can see that the row with index label 2 has been dropped. The remaining rows keep their original labels 0, 1 and 3.
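After a row is dropped, the index has a gap, and rows are often removed by condition instead of by label. The following sketch shows one way to handle both cases; it rebuilds the two-column table from the previous step so that it runs on its own, and the names renumbered and enrolled are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    'Name': ['Abhijit', 'Smriti', 'Akash', 'Roshni'],
    'Student': [False, True, True, False],
})

# Drop the row labelled 2, as in the step above
students = students.drop(2, axis=0)

# The remaining labels are 0, 1, 3;
# reset_index(drop=True) renumbers them 0..2
renumbered = students.reset_index(drop=True)

# Rows can also be removed by condition:
# keep only the rows where 'Student' is True
enrolled = students[students['Student']]

print(renumbered)
print(enrolled)
```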



AbhijitTripathy