pandas(大水文) | 忘生的blog

大水文罢了。😁😁😁😁😅

1	import pandas as pd

Series 操作

Series 创建

1 2	t=pd.Series([1,2,3,4,5])#Series是一维的数据，带标签的数组 t#第一列为索引第二列是我们的数据

0    1
1    2
2    3
3    4
4    5
dtype: int64

1 2	t2=pd.Series([1,2,3,4,5],index=['a','b','c','d','e']) t2#index里面为索引

a    1
b    2
c    3
d    4
e    5
dtype: int64

#通过字典来创建Series   
temp_dict={"name":"wangsheng",
            "age":100,
            "tel":10086}
t3=pd.Series(temp_dict)
t3

name    wangsheng
age           100
tel         10086
dtype: object

Series 切片和索引

t3

name    wangsheng
age           100
tel         10086
dtype: object

t3["age"]

t3['tel']

1 2	print(t3[0]) t3[2]

wangsheng





10086

t3[:2]#连续

name    wangsheng
age           100
dtype: object

1	t3[[1,2]]#不连续取值

age      100
tel    10086
dtype: object

1	t3[["age",'tel']]

age      100
tel    10086
dtype: object

0    1
1    2
2    3
3    4
4    5
dtype: int64

1	t[t>3]#找出t大于3的数据

3    4
4    5
dtype: int64

1	t3.index#获取索引

Index(['name', 'age', 'tel'], dtype='object')

1	t3.values#获取数据

array(['wangsheng', 100, 10086], dtype=object)

读取外部数据

df=pd.read_csv("./dogNames2.csv")
print(df.head())
row=df["Row_Labels"]
print(row.shape)

  Row_Labels  Count_AnimalName
0          1                 1
1          2                 2
2      40804                 1
3      90201                 1
4      90203                 1
(16220,)

DataFrame

1 2	import numpy as np pd.DataFrame(np.arange(12).reshape((3,-1)))

	0	1	2	3
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11

DataFrame对象既有行索引，又有列索引
行索引，表明不同行，横向索引，叫index，0轴，axis=0

列索引，表名不同列，纵向索引，叫columns，1轴，axis=1

1	pd.DataFrame(np.arange(12).reshape((3,-1)),index=['a','b','c'],columns=['A','B','C','D'])

	A	B	C	D
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11

1.DataFrame和Series有什么关系呢？
DataFrame每一行是一个Series

2.Series能够传入字典，那么DataFrame能够传入字典作为数据么？那么mongodb的数据是不是也可以这样传入呢？

d1={"name":["xiaoming","xiaogang"],
    "age":[28,19],
    "tel":[10086,10010]}
pd.DataFrame(d1)

	name	age	tel
0	xiaoming	28	10086
1	xiaogang	19	10010

d2=[{"name":"xiaoming","age":32,"tel":10086},
    {"name":"xiaogang","tel":78},
    {"name":"ss","age":10}]
a=pd.DataFrame(d2)#DataFrame可以有缺失值
a

	name	age	tel
0	xiaoming	32.0	10086.0
1	xiaogang	NaN	78.0
2	ss	10.0	NaN

DataFrame属性

属性	含义
df.shape	行数，列数
df.dtypes	列数据类型
df.ndim	数据维度
df.index	行索引
df.columns	列索引
df.values	对象值，是二维ndarray数组

DataFrame整体情况查询

函数	作用
df.head(3)	显示头部几行，默认五行
df.tail(2)	显示尾部几行，默认5行
df.info()	相关信息概述：行数，列数，列索引，列非空值个数，列类型，行类型，内存占用
df.describe()	快速综合统计结果：计数，均值，标准差，最大值，四分位数，最小值

1 2	df=pd.read_csv("./dogNames2.csv") df.head()

	Row_Labels	Count_AnimalName
0	1	1
1	2	2
2	40804	1
3	90201	1
4	90203	1

1	df.describe()

	Count_AnimalName
count	16220.000000
mean	6.558631
std	31.862511
min	1.000000
25%	1.000000
50%	1.000000
75%	3.000000
max	1195.000000

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16220 entries, 0 to 16219
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Row_Labels        16217 non-null  object
 1   Count_AnimalName  16220 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 253.6+ KB

1
2
3

#DataFrame排序方法
df=df.sort_values(by="Count_AnimalName")#默认从小到大排序
df.head()

	Row_Labels	Count_AnimalName
0	1	1
9383	MERINO	1
9384	MERISE	1
9386	MERLEDEZ	1
9389	MERLYN	1

1 2	df=df.sort_values(by="Count_AnimalName",ascending=False)#从大到小排序 df.head()

	Row_Labels	Count_AnimalName
1156	BELLA	1195
9140	MAX	1153
2660	CHARLIE	856
3251	COCO	852
12368	ROCKY	823

DataFrame取行或取列

1	df[:10]#取前二十行

	Row_Labels	Count_AnimalName
1156	BELLA	1195
9140	MAX	1153
2660	CHARLIE	856
3251	COCO	852
12368	ROCKY	823
8417	LOLA	795
8552	LUCKY	723
8560	LUCY	710
2032	BUDDY	677
3641	DAISY	649

1	df["Count_AnimalName"][:10]#取前20行Count_AnimalNames列

1156     1195
9140     1153
2660      856
3251      852
12368     823
8417      795
8552      723
8560      710
2032      677
3641      649
Name: Count_AnimalName, dtype: int64

pandas取行或取列注意点：
-方括号写数字，表示对行进行操作
- 方括号写字符串，表示对列进行操作

loc索引

df.loc 通过标签索引行数据
df.iloc 通过位置获取行数据

1 2	t=pd.DataFrame(np.arange(12).reshape((3,-1)),index=["a",'b','c'],columns=['W','X',"Y",'Z']) t

	W	X	Y	Z
a	0	1	2	3
b	4	5	6	7
c	8	9	10	11

1	t.loc['a']#取a行

W    0
X    1
Y    2
Z    3
Name: a, dtype: int32

1	t.loc['a','Z']#取a行z列

1	t.loc[['a','c'],['W','Z']]#取间隔的多行多列

	W	Z
a	0	3
c	8	11

1	t.loc['a':'c','W':'Y']#可以看出，冒号在loc中是闭合的

	W	X	Y
a	0	1	2
b	4	5	6
c	8	9	10

1	t.iloc[0]#按位置取第一行

W    0
X    1
Y    2
Z    3
Name: a, dtype: int32

1	t.iloc[:,2]#去第二列

a     2
b     6
c    10
Name: Y, dtype: int32

1	t.iloc[[0,2],[1,2]]

	X	Y
a	1	2
c	9	10

1	t.iloc[0:2]

	W	X	Y	Z
a	0	1	2	3
b	4	5	6	7

1 2	t.loc["a","Y"]=100 t#赋值改变数据

	W	X	Y	Z
a	0	1	100	3
b	4	5	6	7
c	8	9	10	11

bool索引

1	df[(df["Count_AnimalName"]>800)&(df["Count_AnimalName"]<1000)]#大于800小于100

	Row_Labels	Count_AnimalName
2660	CHARLIE	856
3251	COCO	852
12368	ROCKY	823

1 2	#假如我们想找到所有的使用次数超过700并且名字的字符串的长度大于4的狗的名字 df[(df["Row_Labels"].str.len()>4)&(df["Count_AnimalName"]>800)]

	Row_Labels	Count_AnimalName
1156	BELLA	1195
2660	CHARLIE	856
12368	ROCKY	823

缺失数据处理

	name	age	tel
0	xiaoming	32.0	10086.0
1	xiaogang	NaN	78.0
2	ss	10.0	NaN

1	pd.isnull(a)#判断是否为nan

	name	age	tel
0	False	False	False
1	False	True	False
2	False	False	True

1	pd.notnull(a)#判断是否不为nan

	name	age	tel
0	True	True	True
1	True	False	True
2	True	True	False

处理方式1：删除NaN所在的行列dropna (axis=0, how=‘any’, inplace=False)
处理方式2：填充数据，t.fillna(t.mean()),t.fiallna(t.median()),t.fillna(0)

处理为0的数据：t[t==0]=np.nan
当然并不是每次为0的数据都需要处理
计算平均值等情况，nan是不参与计算的，但是0会

1 2	#处理方式1 a.dropna(axis=0)#默认情况下将有nan的行全删了

	name	age	tel
0	xiaoming	32.0	10086.0

1	a.dropna(axis=0,how='all')#当一行全部为nan才会删除为“any’只要有就删除

	name	age	tel
0	xiaoming	32.0	10086.0
1	xiaogang	NaN	78.0
2	ss	10.0	NaN

1 2	### 方式2 a.fillna(a.mean())

: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError.  Select only valid columns before calling the reduction.
  a.fillna(a.mean())

	name	age	tel
0	xiaoming	32.0	10086.0
1	xiaogang	21.0	78.0
2	ss	10.0	5082.0

1 2	a['age']=a['age'].fillna(a["age"].mean()) a#将age列的nan替换为age列的平均值

	name	age	tel
0	xiaoming	32.0	10086.0
1	xiaogang	21.0	78.0
2	ss	10.0	NaN

pandas 数据合并与分组聚合

1
2
3

df=pd.read_csv("./IMDB-Movie-Data.csv")
print(df.head(1))
df.info()

   Rank                    Title                    Genre  \
0     1  Guardians of the Galaxy  Action,Adventure,Sci-Fi   

                                         Description    Director  \
0  A group of intergalactic criminals are forced ...  James Gunn   

                                              Actors  Year  Runtime (Minutes)  \
0  Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...  2014                121   

   Rating   Votes  Revenue (Millions)  Metascore  
0     8.1  757074              333.13       76.0  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  872 non-null    float64
 11  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(5)
memory usage: 93.9+ KB

统计电影分类情况
思路：重新构造一个全为0的数组，列名为分类，如果某一条数据中分类出现过，就让0变为1

1
2
3

#统计分类的列表
temp_list=df["Genre"].str.split(",").tolist()
print(temp_list[:5])

[['Action', 'Adventure', 'Sci-Fi'], ['Adventure', 'Mystery', 'Sci-Fi'], ['Horror', 'Thriller'], ['Animation', 'Comedy', 'Family'], ['Action', 'Adventure', 'Fantasy']]

1 2	genre_list=list(set([i for j in temp_list for i in j]))#set为集合可以去重 print(genre_list)#获得分类列表

['Horror', 'Animation', 'Action', 'Thriller', 'History', 'Adventure', 'Family', 'Mystery', 'War', 'Sport', 'Fantasy', 'Sci-Fi', 'Comedy', 'Crime', 'Biography', 'Music', 'Romance', 'Musical', 'Western', 'Drama']

1
2
3

#构造全为0的数组
zero_df=pd.DataFrame(np.zeros((df.shape[0],len(genre_list))),columns=genre_list)
zero_df.head()

	Horror	Animation	Action	Thriller	History	Adventure	Family	Mystery	War	Sport	Fantasy	Sci-Fi	Comedy	Crime	Biography	Music	Romance	Musical	Western	Drama
0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

#给出每个电影分类出现的位置赋值为1
for i in range(df.shape[0]):
    zero_df.loc[i,temp_list[i]]=1
    #zero_df.loc[i,['Action', 'Adventure', 'Sci-Fi']]
zero_df.head()

	Horror	Animation	Action	Thriller	Adventure	Family	Mystery	Fantasy	Sci-Fi	Comedy
0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0
1	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0
2	1.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0
4	0.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0

1
2
3

#统计每个分类电影的和
genre_count=zero_df.sum(axis=0)
genre_count

Horror       119.0
Animation     49.0
Action       303.0
Thriller     195.0
History       29.0
Adventure    259.0
Family        51.0
Mystery      106.0
War           13.0
Sport         18.0
Fantasy      101.0
Sci-Fi       120.0
Comedy       279.0
Crime        150.0
Biography     81.0
Music         16.0
Romance      141.0
Musical        5.0
Western        7.0
Drama        513.0
dtype: float64

1 2	#排序 genre_count.sort_values()

Musical        5.0
Western        7.0
War           13.0
Music         16.0
Sport         18.0
History       29.0
Animation     49.0
Family        51.0
Biography     81.0
Fantasy      101.0
Mystery      106.0
Horror       119.0
Sci-Fi       120.0
Romance      141.0
Crime        150.0
Thriller     195.0
Adventure    259.0
Comedy       279.0
Action       303.0
Drama        513.0
dtype: float64

数组合并之join

默认情况下将行索引相同的数据合并到一起

1 2	df1=pd.DataFrame(np.ones((2,4)),index=['A','B'],columns=['a','b','c','d']) df1

	a	b	c	d
A	1.0	1.0	1.0	1.0
B	1.0	1.0	1.0	1.0

1 2	df2=pd.DataFrame(np.zeros((3,3)),index=['A','B','C'],columns=['x','y','z']) df2

	x	y	z
A	0.0	0.0	0.0
B	0.0	0.0	0.0
C	0.0	0.0	0.0

1	df1.join(df2)

	a	b	c	d	x	y	z
A	1.0	1.0	1.0	1.0	0.0	0.0	0.0
B	1.0	1.0	1.0	1.0	0.0	0.0	0.0

1 2	df2=pd.DataFrame(np.zeros((3,3)),index=['A','B','C'],columns=['x','y','z']) df2

	x	y	z
A	0.0	0.0	0.0
B	0.0	0.0	0.0
C	0.0	0.0	0.0

1	df1.join(df2)

	a	b	c	d	x	y	z
A	1.0	1.0	1.0	1.0	0.0	0.0	0.0
B	1.0	1.0	1.0	1.0	0.0	0.0	0.0

1	df2.join(df1)

	a	b	c	d
A	1.0	1.0	1.0	1.0
B	1.0	1.0	1.0	1.0
C	NaN	NaN	NaN	NaN

数据合并之merge

merge:按照指定的列把数据按照一定的方式合并到一起

1 2	df3=pd.DataFrame(np.zeros((3,3)),columns=['f','a','x']) df3

	f	a	x
0	0.0	0.0	0.0
1	0.0	0.0	0.0
2	0.0	0.0	0.0

1	df1.merge(df3,on='a')#为空集，应为默认的合并方式为inner 交集

	a	b	c	d	f	x

1 2	df3.loc[1,'a']=1 df1.merge(df3,on='a')

	a	b	c	d	f	x
0	1.0	1.0	1.0	1.0	0.0	0.0
1	1.0	1.0	1.0	1.0	0.0	0.0

1
2
3

df3=pd.DataFrame(np.arange(9).reshape((3,3)),columns=['f','a','x'])
print(df3)
df1.merge(df3,on='a')

	a	b	c	d	f	x
0	1.0	1.0	1.0	1.0	0	2
1	1.0	1.0	1.0	1.0	0	2

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                                  'A': ['A0', 'A1', 'A2', 'A3'],
                                  'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                                    'C': ['C0', 'C1', 'C2', 'C3'],
                                    'D': ['D0', 'D1', 'D2', 'D3']})
left

	key	A	B
0	K0	A0	B0
1	K1	A1	B1
2	K2	A2	B2
3	K3	A3	B3

right

	key	C	D
0	K0	C0	D0
1	K1	C1	D1
2	K2	C2	D2
3	K3	C3	D3

1	left.merge(right,on='key')

	key	A	B	C	D
0	K0	A0	B0	C0	D0
1	K1	A1	B1	C1	D1
2	K2	A2	B2	C2	D2
3	K3	A3	B3	C3	D3

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                             'key2': ['K0', 'K1', 'K0', 'K1'],
                             'A': ['A0', 'A1', 'A2', 'A3'],
                             'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                              'key2': ['K0', 'K0', 'K0', 'K0'],
                              'C': ['C0', 'C1', 'C2', 'C3'],
                              'D': ['D0', 'D1', 'D2', 'D3']})
left

	key1	key2	A	B
0	K0	K0	A0	B0
1	K0	K1	A1	B1
2	K1	K0	A2	B2
3	K2	K1	A3	B3

right

	key1	key2	C	D
0	K0	K0	C0	D0
1	K1	K0	C1	D1
2	K1	K0	C2	D2
3	K2	K0	C3	D3

1 2	left.merge(right,on=['key1','key2'],how='outer') #how = ['left', 'right', 'outer', 'inner']

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K0	K1	A1	B1	NaN	NaN
2	K1	K0	A2	B2	C1	D1
3	K1	K0	A2	B2	C2	D2
4	K2	K1	A3	B3	NaN	NaN
5	K2	K0	NaN	NaN	C3	D3

1	left.merge(right,on=['key1','key2'],how='right')

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K1	K0	A2	B2	C1	D1
2	K1	K0	A2	B2	C2	D2
3	K2	K0	NaN	NaN	C3	D3

# handle overlapping
boys = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'age': [1, 2, 3]})
girls = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'age': [4, 5, 6]})
res = pd.merge(boys, girls, on='k', suffixes=['_boy', '_girl'], how='inner')
res

	k	age_boy	age_girl
0	K0	1	4
1	K0	1	5

数据分组聚合

如果我想知道美国的星巴克数量和中国的哪个多，或者我想知道中国每个省份星巴克的数量的情况，

1 2	df=pd.read_csv('.\starbucks_store_worldwide.csv') df.head(1)

	Brand	Store Number	Store Name	Ownership Type	Street Address	City	State/Province	Country	Postcode	Phone Number	Timezone	Longitude	Latitude
0	Starbucks	47370-257954	Meritxell, 96	Licensed	Av. Meritxell, 96	Andorra la Vella	7	AD	AD500	376818720	GMT+1:00 Europe/Andorra	1.53	42.51

1	df.info()#phone number,postcode等有缺失

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25600 entries, 0 to 25599
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Brand           25600 non-null  object 
 1   Store Number    25600 non-null  object 
 2   Store Name      25600 non-null  object 
 3   Ownership Type  25600 non-null  object 
 4   Street Address  25598 non-null  object 
 5   City            25585 non-null  object 
 6   State/Province  25600 non-null  object 
 7   Country         25600 non-null  object 
 8   Postcode        24078 non-null  object 
 9   Phone Number    18739 non-null  object 
 10  Timezone        25600 non-null  object 
 11  Longitude       25599 non-null  float64
 12  Latitude        25599 non-null  float64
dtypes: float64(2), object(11)
memory usage: 2.5+ MB

在pandas中类似的分组的操作我们有很简单的方式来完成

df.groupby(by=“columns_name”)

grouped=df.groupby(by="Country")
type(grouped)
#DataFrameGroupBy类型
    #可以进行遍历
#for i in grouped:
 #   print(i)#每个i是一个元组，第一个元素为国家，第二个是DataFrame
    #调用聚会方法
grouped.count()[:2]#默认对所有列进行统计

	Brand	Store Number	Store Name	Ownership Type	Street Address	City	State/Province	Postcode	Phone Number	Timezone	Longitude	Latitude
Country
AD	1	1	1	1	1	1	1	1	1	1	1	1
AE	144	144	144	144	144	144	144	24	78	144	144	144

1	grouped["Country"].count()[:3]#只对Country列进行统计

Country
AD      1
AE    144
AR    108
Name: Country, dtype: int64

1
2
3

country_count=grouped["Brand"].count()
print(country_count["US"])
country_count["CN"]

#统计中国每个省分国家店铺数量
china_data=df[df["Country"]=="CN"]
grouped=china_data.groupby(by="State/Province").count()["Brand"]
grouped[:4]

State/Province
11    236
12     58
13     24
14      8
Name: Brand, dtype: int64

1
2
3

# 数据按照多个条件进行分组
grouped = df["Brand"].groupby(by=[df["Country"],df["State/Province"]])
print(grouped)

<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001CA9031CF40>

# 数据按照多个条件进行分组后，任然返回DataFrame
t1 = df[["Country"]].groupby(by=[df["Country"],df["State/Province"]]).count()
t2 = df.groupby(by=["Country","State/Province"])[["Country"]].count()
print(type(t1),type(t2))
t1.head()

<class 'pandas.core.frame.DataFrame'> <class 'pandas.core.frame.DataFrame'>

		Country
Country	State/Province
AD	7	1
AE	AJ	2
	AZ	48
	DU	82
	FU	2

pandas时间序列

1	pd.date_range(start='20220101',end='20220227',freq='D')#按一天

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
               '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
               '2022-01-13', '2022-01-14', '2022-01-15', '2022-01-16',
               '2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20',
               '2022-01-21', '2022-01-22', '2022-01-23', '2022-01-24',
               '2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
               '2022-01-29', '2022-01-30', '2022-01-31', '2022-02-01',
               '2022-02-02', '2022-02-03', '2022-02-04', '2022-02-05',
               '2022-02-06', '2022-02-07', '2022-02-08', '2022-02-09',
               '2022-02-10', '2022-02-11', '2022-02-12', '2022-02-13',
               '2022-02-14', '2022-02-15', '2022-02-16', '2022-02-17',
               '2022-02-18', '2022-02-19', '2022-02-20', '2022-02-21',
               '2022-02-22', '2022-02-23', '2022-02-24', '2022-02-25',
               '2022-02-26', '2022-02-27'],
              dtype='datetime64[ns]', freq='D')

我们可以使用pandas提供的方法把时间字符串转化为时间序列

df[“timeStamp”] = pd.to_datetime(df[“timeStamp”],format=“”)

format参数大部分情况下可以不用写，但是对于pandas无法格式化的时间字符串，我们可以使用该参数，比如包含中文

1	pd.date_range(start='20220101',end='20220227',freq='10D')#按10天

DatetimeIndex(['2022-01-01', '2022-01-11', '2022-01-21', '2022-01-31',
               '2022-02-10', '2022-02-20'],
              dtype='datetime64[ns]', freq='10D')

	Horror	Animation	Action	Thriller	History	Adventure	Family	Mystery	War	Sport	Fantasy	Sci-Fi	Comedy	Crime	Biography	Music	Romance	Musical	Western	Drama
0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

	Horror	Animation	Action	Thriller	Adventure	Family	Mystery	Fantasy	Sci-Fi	Comedy
0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0
1	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0
2	1.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0
4	0.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0

	a	b	c	d
A	1.0	1.0	1.0	1.0
B	1.0	1.0	1.0	1.0
C	NaN	NaN	NaN	NaN

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K0	K1	A1	B1	NaN	NaN
2	K1	K0	A2	B2	C1	D1
3	K1	K0	A2	B2	C2	D2
4	K2	K1	A3	B3	NaN	NaN
5	K2	K0	NaN	NaN	C3	D3

	Horror	Animation	Action	Thriller	History	Adventure	Family	Mystery	War	Sport	Fantasy	Sci-Fi	Comedy	Crime	Biography	Music	Romance	Musical	Western	Drama
0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

	Horror	Animation	Action	Thriller	Adventure	Family	Mystery	Fantasy	Sci-Fi	Comedy
0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0
1	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0
2	1.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0
4	0.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0

	a	b	c	d
A	1.0	1.0	1.0	1.0
B	1.0	1.0	1.0	1.0
C	NaN	NaN	NaN	NaN

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K0	K1	A1	B1	NaN	NaN
2	K1	K0	A2	B2	C1	D1
3	K1	K0	A2	B2	C2	D2
4	K2	K1	A3	B3	NaN	NaN
5	K2	K0	NaN	NaN	C3	D3

	Horror	Animation	Action	Thriller	History	Adventure	Family	Mystery	War	Sport	Fantasy	Sci-Fi	Comedy	Crime	Biography	Music	Romance	Musical	Western	Drama
0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
2	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
4	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

	Horror	Animation	Action	Thriller	Adventure	Family	Mystery	Fantasy	Sci-Fi	Comedy
0	0.0	0.0	1.0	0.0	1.0	0.0	0.0	0.0	1.0	0.0
1	0.0	0.0	0.0	0.0	1.0	0.0	1.0	0.0	1.0	0.0
2	1.0	0.0	0.0	1.0	0.0	0.0	0.0	0.0	0.0	0.0
3	0.0	1.0	0.0	0.0	0.0	1.0	0.0	0.0	0.0	1.0
4	0.0	0.0	1.0	0.0	1.0	0.0	0.0	1.0	0.0	0.0

	a	b	c	d
A	1.0	1.0	1.0	1.0
B	1.0	1.0	1.0	1.0
C	NaN	NaN	NaN	NaN

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K0	K1	A1	B1	NaN	NaN
2	K1	K0	A2	B2	C1	D1
3	K1	K0	A2	B2	C2	D2
4	K2	K1	A3	B3	NaN	NaN
5	K2	K0	NaN	NaN	C3	D3