pandas 笔记

mac2026-02-28  8

写在前面:

dataframe 和 series 是 pandas 的数据类型list 和 dict 是 python 的数据类型ndarray 是 numpy的数据类型NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column.

创建对象

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8]) In [4]: s Out[4]: 0 1.0 1 3.0 2 5.0 3 NaN 4 6.0 5 8.0 dtype: float64

Creating a DataFrame by passing a dict of objects that can be converted to series-like.

In [9]: df2 = pd.DataFrame({'A': 1., ...: 'B': pd.Timestamp('20130102'), ...: 'C': pd.Series(1, index=list(range(4)), dtype='float32'), ...: 'D': np.array([3] * 4, dtype='int32'), ...: 'E': pd.Categorical(["test", "train", "test", "train"]), ...: 'F': 'foo'}) ...: In [10]: df2 Out[10]: A B C D E F 0 1.0 2013-01-02 1.0 3 test foo 1 1.0 2013-01-02 1.0 3 train foo 2 1.0 2013-01-02 1.0 3 test foo 3 1.0 2013-01-02 1.0 3 train foo

The columns of the resulting DataFrame have different dtypes.

In [11]: df2.dtypes Out[11]: A float64 B datetime64[ns] C float32 D int32 E category F object dtype: object

Viewing data

Display the index, columns:

In [15]: df.index Out[15]: DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06'], dtype='datetime64[ns]', freq='D') In [16]: df.columns Out[16]: Index(['A', 'B', 'C', 'D'], dtype='object')

Transposing your data:

In [20]: df.T Out[20]: 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06 A 0.469112 1.212112 -0.861849 0.721555 -0.424972 -0.673690 B -0.282863 -0.173215 -2.104569 -0.706771 0.567020 0.113648 C -1.509059 0.119209 -0.494929 -1.039575 0.276232 -1.478427 D -1.135632 -1.044236 1.071804 0.271860 -1.087401 0.524988

Sorting by an axis:

In [21]: df.sort_index(axis=1, ascending=False) Out[21]: D C B A 2013-01-01 -1.135632 -1.509059 -0.282863 0.469112 2013-01-02 -1.044236 0.119209 -0.173215 1.212112 2013-01-03 1.071804 -0.494929 -2.104569 -0.861849 2013-01-04 0.271860 -1.039575 -0.706771 0.721555 2013-01-05 -1.087401 0.276232 0.567020 -0.424972 2013-01-06 0.524988 -1.478427 0.113648 -0.673690

Sorting by values:

In [22]: df.sort_values(by='B') Out[22]: A B C D 2013-01-03 -0.861849 -2.104569 -0.494929 1.071804 2013-01-04 0.721555 -0.706771 -1.039575 0.271860 2013-01-01 0.469112 -0.282863 -1.509059 -1.135632 2013-01-02 1.212112 -0.173215 0.119209 -1.044236 2013-01-06 -0.673690 0.113648 -1.478427 0.524988 2013-01-05 -0.424972 0.567020 0.276232 -1.087401
最新回复(0)