使用Python进行数据分析和可视化|S2

先决条件：Python中的NumPy, 使用Python进行数据分析可视化|S1（http://www.srcmini02.com/68935.html）

1.以CSV格式存储DataFrame：

大熊猫提供to.csv(‘filename’, index =” False | True”)函数将DataFrame写入CSV文件。这里文档名称是你要创建的CSV文件的名称, 编号指示应覆盖或不覆盖DataFrame的索引(如果默认)。如果我们设置索引=错误则索引不会被覆盖。默认情况下, 索引的值为true然后索引被覆盖。

示例：

import pandas as pd
  
# assigning three series to s1, s2, s3
s1 = pd.Series([ 0 , 4 , 8 ])
s2 = pd.Series([ 1 , 5 , 9 ])
s3 = pd.Series([ 2 , 6 , 10 ])
  
# taking index and column values
dframe = pd.DataFrame([s1, s2, s3])
  
# assign column name
dframe.columns = [ 'Geeks' , 'For' , 'Geeks' ]
  
# write data to csv file
dframe.to_csv( 'srcmini.csv' , index = False )  
dframe.to_csv( 'srcmini1.csv' , index = True )

输出：

srcmini1.csv


srcmini2.csv

2.处理丢失的数据

数据分析阶段还具有处理我们数据集中缺失数据的能力, 因此, 熊猫也同样可以达到预期。这是哪里Dropna和/或Fillna方法发挥作用。处理丢失的数据时, 作为数据分析师, 你应该删除包含以下内容的列：NaN值(dropna方法)或用整个列条目的均值或众数填充缺失数据(fillna方法), 此决定具有重要意义, 取决于数据以及影响结果的因素。

删除丢失的数据：

考虑这是下面的代码生成的DataFrame：

import pandas as pd
  
# Create a DataFrame
dframe = pd.DataFrame({ 'Geeks' : [ 23 , 24 , 22 ], 'For' : [ 10 , 12 , np.nan], 'geeks' : [ 0 , np.nan, np.nan]}, columns = [ 'Geeks' , 'For' , 'geeks' ])
  
# This will remove all the
# rows with NAN values
  
# If axis is not defined then
# it is along rows i.e. axis = 0
dframe.dropna(inplace = True )
print (dframe)
  
# if axis is equal to 1
dframe.dropna(axis = 1 , inplace = True )
  
print (dframe)

输出：

axis=0
 

axis=1

填充缺失值：

现在，要用数据的平均值或模式替换任何NaN值，使用fillna，它可以根据需要替换特定列甚至整个数据帧中的所有NaN值。

import numpy as np
import pandas as pd
  
# Create a DataFrame
dframe = pd.DataFrame({ 'Geeks' : [ 23 , 24 , 22 ], 'For' : [ 10 , 12 , np.nan], 'geeks' : [ 0 , np.nan, np.nan]}, columns = [ 'Geeks' , 'For' , 'geeks' ])
  
# Use fillna of complete Dataframe 
  
# value function will be applied on every column
dframe.fillna(value = dframe.mean(), inplace = True )
print (dframe)
  
# filling value of one column
dframe[ 'For' ].fillna(value = dframe[ 'For' ].mean(), inplace = True )
print (dframe)

输出：

3. Groupby方法(聚合)：

groupby方法允许我们根据任何行或列将数据分组在一起, 因此我们可以进一步应用聚合函数来分析数据。使用映射器(dict或键函数, 将给定函数应用于组, 将结果作为序列返回)或按一系列列对系列进行分组。

考虑这是下面的代码生成的DataFrame：

import pandas as pd
import numpy as np
  
# create DataFrame
dframe = pd.DataFrame({ 'Geeks' : [ 23 , 24 , 22 , 22 , 23 , 24 ], 'For' : [ 10 , 12 , 13 , 14 , 15 , 16 ], 'geeks' : [ 122 , 142 , 112 , 122 , 114 , 112 ]}, columns = [ 'Geeks' , 'For' , 'geeks' ]) 
  
# Apply groupby and aggregate function
# max to find max value of column 
  
# &quot;For&quot; and column &quot;geeks&quot; for every
# different value of column &quot;Geeks&quot;.
  
print (dframe.groupby([ 'Geeks' ]). max ())

输出：

相关推荐

评论抢沙发

评论前必须登录！

猜你喜欢

热门标签

回顶部

相关推荐

评论 抢沙发

评论前必须登录！

猜你喜欢

热门标签

回顶部

评论抢沙发