任务:对金融图表进行检验——(线性回归)普通最小二乘回归(OLS)方法 EURO STOXX 50指数的历史每日收盘价的数据集的获取,有两种方法,推荐下面这种方法:
import pandas as pd es_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt' cols = ['SX5P', 'SX5E', 'SXXP', 'SXXE', 'SXXF', 'SXXA', 'DK5F', 'DKXF'] es = pd.read_csv(es_url, index_col=0, parse_dates=True, sep=';', dayfirst=True, header=None, skiprows=4, names=cols) es.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 7673 entries, 1986-12-31 to 2016-10-04 Data columns (total 8 columns): SX5P 7673 non-null float64 SX5E 7673 non-null float64 SXXP 7673 non-null float64 SXXE 7673 non-null float64 SXXF 7673 non-null float64 SXXA 7673 non-null float64 DK5F 7673 non-null float64 DKXF 7673 non-null float64 dtypes: float64(8) memory usage: 539.5 KB关于VSROXX的历史每日收盘价的数据集,类似的,可直接从网页上获取并处理
vs_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/h_vstoxx.txt' vs = pd.read_csv(vs_url, index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True) vs.info() <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 4357 entries, 1999-01-04 to 2016-02-12 Data columns (total 9 columns): V2TX 4357 non-null float64 V6I1 3906 non-null float64 V6I2 4357 non-null float64 V6I3 4296 non-null float64 V6I4 4357 non-null float64 V6I5 4357 non-null float64 V6I6 4340 non-null float64 V6I7 4357 non-null float64 V6I8 4343 non-null float64 dtypes: float64(9) memory usage: 340.4 KB观察下面的代码的输出结果,发现VSTOXX只从1999-01-01开始才有数据且到2016-02-12 就没有了
vs.head() Out[4]: V2TX V6I1 V6I2 ... V6I6 V6I7 V6I8 Date ... 1999-01-04 18.2033 21.2458 17.5555 ... 33.2232 31.8535 23.8209 1999-01-05 29.6912 36.6400 28.4274 ... 32.8457 32.2904 25.0532 1999-01-06 25.1670 25.4107 25.1351 ... 32.9260 33.2871 26.0107 1999-01-07 32.5205 35.4410 32.2004 ... 33.2843 33.7269 26.2205 1999-01-08 33.2296 35.8846 33.0020 ... 33.4837 33.9227 26.3672 [5 rows x 9 columns] es.head() Out[5]: SX5P SX5E SXXP SXXE SXXF SXXA DK5F DKXF 1986-12-31 775.00 900.82 82.76 98.58 98.06 69.06 645.26 65.56 1987-01-01 775.00 900.82 82.76 98.58 98.06 69.06 645.26 65.56 1987-01-02 770.89 891.78 82.57 97.80 97.43 69.37 647.62 65.81 1987-01-05 771.89 898.33 82.82 98.60 98.19 69.16 649.94 65.82 1987-01-06 775.92 902.32 83.28 99.19 98.83 69.50 652.49 66.06 es.tail() Out[6]: SX5P SX5E SXXP ... SXXA DK5F DKXF 2016-09-28 2846.55 2991.11 342.57 ... 350.45 9072.09 581.27 2016-09-29 2848.93 2991.58 342.72 ... 350.90 9112.09 582.60 2016-09-30 2843.17 3002.24 342.92 ... 350.09 9115.81 583.26 2016-10-03 2845.43 2998.50 343.23 ... 350.92 9131.24 584.32 2016-10-04 2871.06 3029.50 346.10 ... 353.92 9212.05 588.71 [5 rows x 8 columns] vs.tail() Out[7]: V2TX V6I1 V6I2 ... V6I6 V6I7 V6I8 Date ... 2016-02-08 33.3917 34.7251 33.2189 ... 30.3705 29.8478 29.2559 2016-02-09 33.9664 35.7884 33.7800 ... 31.1659 30.5339 29.9000 2016-02-10 33.4528 34.6020 33.3637 ... 30.9937 30.8174 30.2533 2016-02-11 38.3051 41.2095 38.1347 ... 33.4126 32.2234 31.2124 2016-02-12 35.6846 35.3849 35.6965 ... 32.8208 32.3189 31.8194 [5 rows x 9 columns]那么为了实现回归分析,我们只取es和vs在1999-01-01-2016-02-12的数据进行分析,通过新生产一个DataFrame对象来处理
import datetime as dt start = dt.datetime(1999,1,1) end = dt.datetime(2016,2,12) data = pd.DataFrame({'EUROSTOXX': es['SX5E'][es.index > start]}) data = pd.DataFrame({'EUROSTOXX': data['EUROSTOXX'][data.index < end]}) data1 = pd.DataFrame({'VSTOXX': vs['V2TX'][vs.index > start]}) data = data.join(pd.DataFrame({'VSTOXX': data1['VSTOXX'][data1.index < end]})) data.head() Out[34]: EUROSTOXX VSTOXX 1999-01-04 3543.10 18.2033 1999-01-05 3604.67 29.6912 1999-01-06 3685.36 25.1670 1999-01-07 3627.87 32.5205 1999-01-08 3616.57 33.2296 data.tail() Out[5]: EUROSTOXX VSTOXX 2016-02-05 2879.39 30.0529 2016-02-08 2785.17 33.3917 2016-02-09 2736.50 33.9664 2016-02-10 2789.05 33.4528 2016-02-11 2680.35 38.3051从data.tail()的输出可以看出,改DataFrame对象存在缺失值,我们可以使用时间序列中最后的可用值填补遗漏的数值,调用fillna函数,以ffill(用于向前填充)作为method参数的值,另一个选择是bfill(用于向后填充),但会造成“预见”问题:
data = data.fillna(method='ffill') data.tail() EUROSTOXX VSTOXX 2016-09-28 2991.11 35.6846 2016-09-29 2991.58 35.6846 2016-09-30 3002.24 35.6846 2016-10-03 2998.50 35.6846 2016-10-04 3029.50 35.6846绘制data数据集的两列的图
data.plot(subplots=1,grid=1,style='b',figsize=(18,6))从上面两个图可以看出,两个指数似乎存在某种负相关,为了验证,可以使用对数收益率
rets = np.log(data /data.shift(1)) rets.plot(subplots=1,grid=1,style='b',figsize=(8,6)) # 清除inf和nan rets[np.isnan(rets)] = 0 rets[np.isinf(rets)] = 0 #训练线性回归方程 X = sm.add_constant(rets['EUROSTOXX']) y = rets['VSTOXX'] result = sm.OLS(y,X).fit() result.summary() <class 'statsmodels.iolib.summary.Summary'> """ OLS Regression Results ============================================================================== Dep. Variable: VSTOXX R-squared: 0.543 Model: OLS Adj. R-squared: 0.543 Method: Least Squares F-statistic: 5209. Date: Fri, 01 Nov 2019 Prob (F-statistic): 0.00 Time: 23:43:05 Log-Likelihood: 7966.4 No. Observations: 4386 AIC: -1.593e+04 Df Residuals: 4384 BIC: -1.592e+04 Df Model: 1 Covariance Type: nonrobust ============================================================================== coef std err t P>|t| [0.025 0.975] ------------------------------------------------------------------------------ const -1.147e-05 0.001 -0.019 0.985 -0.001 0.001 EUROSTOXX -2.8463 0.039 -72.172 0.000 -2.924 -2.769 ============================================================================== Omnibus: 1316.192 Durbin-Watson: 2.093 Prob(Omnibus): 0.000 Jarque-Bera (JB): 24626.421 Skew: 0.951 Prob(JB): 0.00 Kurtosis: 14.451 Cond. No. 66.4 ============================================================================== Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. """显然,的确存在强负相关
result.params Out[6]: const -0.000011 EUROSTOXX -2.846299 dtype: float64将这个输入与原始对数收益率相结合,用于生成对数收益率散点图和回归线
plt.plot(rets['EUROSTOXX'], y, 'r.') ax = plt.axis() x = np.linspace(ax[0], ax[1] + 0.01) plt.plot(rets['EUROSTOXX'], result.params.const + result.params.EUROSTOXX*rets['EUROSTOXX'],'b',lw=2) plt.grid(True) plt.axis('tight') plt.xlabel('EURO STOXX 50 returns') plt.ylabel('VSTOXX returns') plt.show()作为交叉检验,可以直接计算他们的相关性
rets.corr() Out[18]: EUROSTOXX VSTOXX EUROSTOXX 1.000000 -0.736878 VSTOXX -0.736878 1.000000尽管在整个数据集上有强烈的负相关,但是不同时期之间有显著的不同,如下图
rets['EUROSTOXX'].rolling(window=252).corr(rets['VSTOXX']).plot(grid=True , style='b')