Deriving the Fully Connected Layer's Backpropagation Formulas with Computational Graphs


0. Preface

This post supplements *Deep Learning from Scratch: Theory and Implementation with Python* (《深度学习入门:基于Python的理论和实现》) by Koki Saito. In Chapter 5, Error Backpropagation, the book states the formulas for the fully connected layer and the Softmax layer but omits their derivations. This post addresses one question: how should the formulas below, as given in the book, be understood?

Assume the fully connected layer computes $Y = X \cdot W + B$. Then:

1. $\frac{\partial L}{\partial B} = \frac{\partial L}{\partial Y}$
2. $\frac{\partial L}{\partial W} = X^{T} \cdot \frac{\partial L}{\partial Y}$
3. $\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} \cdot W^{T}$
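
Before deriving them, a quick sanity check: the three formulas at least produce gradients whose shapes match the variables they update. A minimal NumPy sketch (the concrete values of $X$, $W$, and $B$ are made up for illustration):

```python
import numpy as np

X = np.array([[1.0, 2.0]])          # input,   shape (1, 2)
W = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6]])     # weights, shape (2, 3)
B = np.array([[0.01, 0.02, 0.03]])  # bias,    shape (1, 3)

Y = X @ W + B                       # forward pass, shape (1, 3)
dY = np.ones_like(Y)                # stand-in upstream gradient dL/dY

dB = dY                             # formula 1: dL/dB = dL/dY
dW = X.T @ dY                       # formula 2: dL/dW = X^T . dL/dY
dX = dY @ W.T                       # formula 3: dL/dX = dL/dY . W^T

# Each gradient matches the shape of the variable it updates.
assert dB.shape == B.shape and dW.shape == W.shape and dX.shape == X.shape
```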

PS:

This post is my personal reading notes and only supplements the book; for the full treatment, please refer to the book itself. Before reading on, you should read the following article and understand the conclusion that *when a value branches out in the forward pass, the gradients flowing back along those branches are summed*: Backpropagation Derivation of the Softmax Layer Based on Computational Graphs.

1. Computational Graph of the Fully Connected Layer

*(The original post shows a computational-graph figure here: $X$ of shape (1, 2) enters a matrix-product node together with $W$ of shape (2, 3), and the product $X \cdot W$ meets $B$ at an addition node to produce $Y$ of shape (1, 3).)*

2. Explanation of the Formulas

$$\frac{\partial L}{\partial B} = \frac{\partial L}{\partial Y}$$

As the computational graph shows, $B$ joins $X \cdot W$ at an addition node, and for an addition node the downstream gradient equals the upstream gradient: the node passes $\frac{\partial L}{\partial Y}$ through to $B$ unchanged.
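
This pass-through behaviour is easy to check numerically. A minimal sketch with made-up values: perturbing one component of $B$ shifts the matching component of $Y$ by the same amount, so any loss $L$ reacts to $b_j$ exactly as it reacts to $y_j$.

```python
import numpy as np

M = np.array([[0.5, -0.2, 0.9]])    # stand-in for M = X @ W, shape (1, 3)
B = np.array([[0.01, 0.02, 0.03]])  # bias, shape (1, 3)
eps = 1e-6

Y0 = M + B
B[0, 1] += eps                      # perturb a single bias component
Y1 = M + B

print((Y1 - Y0) / eps)              # ~ [[0, 1, 0]]: y_j moves one-to-one with b_j
```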

$$\frac{\partial L}{\partial W} = X^{T} \cdot \frac{\partial L}{\partial Y}$$

Take the matrix shapes from the figure: $W$ is (2, 3) and $X$ is (1, 2). Let $M = X \cdot W$, and let $w_{ij}$ denote the weight connecting input $x_i$ to output $m_j$ (the entry in row $i$, column $j$ of $W$). By the fully connected layer's formula: $m_j = \sum_i w_{ij} \, x_i$

So each $w_{ij}$ appears exactly once in the computation of the output, namely in $m_j$, and therefore: $\frac{\partial m_j}{\partial w_{ij}} = x_i$

Moreover, the upstream gradient arriving at $m_j$ is $\frac{\partial L}{\partial y_j}$, because the addition node passes $\frac{\partial L}{\partial Y}$ through unchanged (formula 1).

Using the figure as the example, build the matrix $U$ of derivatives of $L$ with respect to $W$, so as to implement the update rule $W \leftarrow W - \alpha \cdot U$. By the chain rule, entry $(i, j)$ is $\frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial y_j} \cdot x_i$, which gives:

$$\frac{\partial L}{\partial W} = \begin{bmatrix} \frac{\partial L}{\partial y_1} x_1 & \frac{\partial L}{\partial y_2} x_1 & \frac{\partial L}{\partial y_3} x_1 \\ \frac{\partial L}{\partial y_1} x_2 & \frac{\partial L}{\partial y_2} x_2 & \frac{\partial L}{\partial y_3} x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \begin{bmatrix} \frac{\partial L}{\partial y_1} & \frac{\partial L}{\partial y_2} & \frac{\partial L}{\partial y_3} \end{bmatrix} = X^{T} \cdot \frac{\partial L}{\partial Y}$$
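
The element-wise result can be verified with a finite-difference check. A sketch, assuming the made-up loss $L = \sum_j y_j^{2}$ so that $\frac{\partial L}{\partial Y} = 2Y$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1, 2))
W = rng.standard_normal((2, 3))
B = rng.standard_normal((1, 3))

def loss(W):
    Y = X @ W + B
    return np.sum(Y ** 2)           # made-up loss: L = sum of y_j^2

Y = X @ W + B
dY = 2 * Y                          # dL/dY for this loss
dW_analytic = X.T @ dY              # the formula derived above

# Numerical gradient: nudge each w_ij and watch the loss.
eps = 1e-6
dW_numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus = W.copy();  W_plus[i, j] += eps
        W_minus = W.copy(); W_minus[i, j] -= eps
        dW_numeric[i, j] = (loss(W_plus) - loss(W_minus)) / (2 * eps)

print(np.max(np.abs(dW_analytic - dW_numeric)))  # tiny (~1e-9): they agree
```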

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} \cdot W^{T}$$

Write $Y(x_1, x_2) = Y\big(u(x_1, x_2),\ f(x_1, x_2),\ \varphi(x_1, x_2)\big)$, where $u$, $f$, $\varphi$ correspond to the outputs $y_1$, $y_2$, $y_3$. In the forward pass $x_1$ branches into all three outputs, so in the backward pass the gradients along those branches are summed. Taking $x_1$ as an example:

$$\frac{\partial L}{\partial x_1} = \frac{\partial L}{\partial u} \frac{\partial u}{\partial x_1} + \frac{\partial L}{\partial f} \frac{\partial f}{\partial x_1} + \frac{\partial L}{\partial \varphi} \frac{\partial \varphi}{\partial x_1} = w_{11} \frac{\partial L}{\partial y_1} + w_{12} \frac{\partial L}{\partial y_2} + w_{13} \frac{\partial L}{\partial y_3} = \frac{\partial L}{\partial Y} \cdot (w_{11}, w_{12}, w_{13})^{T}$$

That is:

$$\frac{\partial L}{\partial x_1} = \frac{\partial L}{\partial Y} \cdot (w_{11}, w_{12}, w_{13})^{T}$$

$$\frac{\partial L}{\partial x_2} = \frac{\partial L}{\partial Y} \cdot (w_{21}, w_{22}, w_{23})^{T}$$

Therefore:

Stacking the two components into a row vector (note the matrix below is the (3, 2) transpose of $W$):

$$\frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y} \cdot \begin{bmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{bmatrix} = \frac{\partial L}{\partial Y} \cdot W^{T}$$
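
Putting the three formulas together gives the layer's full backward pass. Below is a minimal sketch in the spirit of the book's `Affine` class (this is my own illustration, not the book's exact code):

```python
import numpy as np

class Affine:
    """Minimal fully connected layer: Y = X . W + B."""

    def __init__(self, W, B):
        self.W = W          # weights, shape (in_dim, out_dim)
        self.B = B          # bias,    shape (1, out_dim)
        self.X = None       # input cached by forward() for backward()
        self.dW = None
        self.dB = None

    def forward(self, X):
        self.X = X
        return X @ self.W + self.B      # forward pass

    def backward(self, dY):
        self.dB = dY                    # formula 1 (single sample)
        self.dW = self.X.T @ dY         # formula 2
        return dY @ self.W.T            # formula 3: dL/dX, passed upstream
```

For a mini-batch $X$ of shape $(N, 2)$, formulas 2 and 3 hold unchanged, but formula 1 becomes `dB = dY.sum(axis=0)`: in the forward pass $B$ branches into all $N$ rows, so by the branch-summing rule the $N$ gradients add up. The derivation in this post covers the single-sample case $N = 1$.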
