Convolution (Principle and Implementation) and a Max Pooling Implementation

mac · 2025-05-15 · 39

Preface

I already know a fair amount about convolution, so this post only records the less common points.

Fast Convolution via Matrix Multiplication

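Overview: a convolution slides a window across the image and computes pixel by pixel, which spends a great deal of compute on addressing and modifying memory. In practice, convolution is therefore not carried out as the sliding window we usually picture. Instead, it is converted into a matrix multiplication, which computers execute quickly and which is easy to port to GPUs. In production, a convolution can be computed in two steps: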

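(1) Use the Image to Column (Im2col) algorithm to rearrange the input image and the convolution kernels into matrices with a fixed layout.
(2) Use GEMM (General Matrix Multiply) to multiply the two resulting matrices, which gives the convolution result. The Filter Matrix built from the kernels is multiplied by the transpose of the Feature Matrix built from the input image, producing an output matrix of size Count × (H × W); each of its Count rows is then reshaped into an H × W output feature map:

$\begin{aligned} Feature \space Map & = Filter \space Matrix \cdot Feature \space Matrix^T \\ & = [Count \times (C \times K \times K)] \times [(C \times K \times K) \times (H \times W)] \\ & = Count \times (H \times W) \Rightarrow Count \times H \times W \end{aligned}$

Parameters:
C: number of input channels
Count: number of kernels
K: kernel width (equal to its height)
H, W: number of positions the kernel can take along the columns and the rows of the input image, i.e. the height and width of each output feature map

Each row of the Feature Matrix is one K × K patch flattened across all C channels, so the Feature Matrix has H × W rows and C × K × K columns. Each row of the Filter Matrix is one kernel flattened in the same order.

Sample: with C = 1, K = 2 and stride 1, the 3 × 3 image below gives H = W = 2. Its four 2 × 2 patches, taken left to right and top to bottom, become the rows of the Feature Matrix:

$Image = \left[ \begin{matrix} 3 & 2 & 1 \\ 0 & 1 & 2 \\ 3 & 1 & 1 \end{matrix} \right] \Rightarrow Feature \space Matrix = \left[\begin{matrix} 3 & 2 & 0 & 1 \\ 2 & 1 & 1 & 2 \\ 0 & 1 & 3 & 1 \\ 1 & 2 & 1 & 1 \end{matrix} \right]$

Suppose there are two 2 × 2 kernels A and B (Count = 2, C = 1, K = 2), so the Filter Matrix has size 2 × (1 × 2 × 2):

$A = \left[ \begin{matrix} 0 & 1 \\ 1 & 2 \end{matrix} \right], B = \left[ \begin{matrix} 2 & 1 \\ 1 & 3 \end{matrix} \right] \Rightarrow Filter \space Matrix = \left[ \begin{matrix} 0 & 1 & 1 & 2 \\ 2 & 1 & 1 & 3 \end{matrix} \right]$

The output matrix is the Filter Matrix multiplied by Feature Matrix^T (in this sample the Feature Matrix happens to be symmetric, so its transpose looks identical):

$\begin{aligned} Output & = Filter \space Matrix \cdot Feature \space Matrix^T \\ & = \left[ \begin{matrix} 0 & 1 & 1 & 2 \\ 2 & 1 & 1 & 3 \end{matrix} \right] \times \left[\begin{matrix} 3 & 2 & 0 & 1 \\ 2 & 1 & 1 & 2 \\ 0 & 1 & 3 & 1 \\ 1 & 2 & 1 & 1 \end{matrix} \right] \\ & = \left[ \begin{matrix} 4 & 6 & 6 & 5 \\ 11 & 12 & 7 & 8 \end{matrix} \right] \\ & \Rightarrow \left[ \begin{matrix} 4 & 6 \\ 6 & 5 \end{matrix} \right], \left[ \begin{matrix} 11 & 12 \\ 7 & 8 \end{matrix} \right] \end{aligned}$

The two rows of Output, each reshaped to 2 × 2, are the two output feature maps, one per kernel. For example, the first entry is 0·3 + 1·2 + 1·0 + 2·1 = 4, the dot product of kernel A with the top-left patch. As is usual in deep learning, the kernel is not flipped, so strictly speaking this is cross-correlation.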
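The two steps can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not an optimized implementation: `im2col`, `conv_gemm` and `conv_naive` are hypothetical helper names, the input is assumed to be a single image of shape (C, H_in, W_in) with no padding, and `conv_naive` is a plain sliding-window version included only to check the GEMM result.

```python
import numpy as np


def im2col(image, k, stride=1):
    """Rearrange a (C, H_in, W_in) image into the Feature Matrix.

    Each row is one k x k patch flattened over all channels, with patches
    taken left to right, top to bottom. Returns (matrix, H, W), where H and W
    are the numbers of kernel positions vertically and horizontally.
    """
    _, h_in, w_in = image.shape
    h_out = (h_in - k) // stride + 1
    w_out = (w_in - k) // stride + 1
    rows = []
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            rows.append(image[:, top:top + k, left:left + k].reshape(-1))
    return np.array(rows), h_out, w_out


def conv_gemm(image, kernels, stride=1):
    """kernels has shape (Count, C, k, k); returns (Count, H, W)."""
    count, c, k, _ = kernels.shape
    feature, h_out, w_out = im2col(image, k, stride)
    filt = kernels.reshape(count, c * k * k)   # Filter Matrix: Count x (C*k*k)
    out = filt @ feature.T                     # GEMM: Count x (H*W)
    return out.reshape(count, h_out, w_out)    # one H x W map per kernel


def conv_naive(image, kernels, stride=1):
    """Sliding-window reference (no kernel flip), used only for checking."""
    count, _, k, _ = kernels.shape
    _, h_in, w_in = image.shape
    h_out = (h_in - k) // stride + 1
    w_out = (w_in - k) // stride + 1
    out = np.zeros((count, h_out, w_out), dtype=np.result_type(image, kernels))
    for n in range(count):
        for i in range(h_out):
            for j in range(w_out):
                patch = image[:, i * stride:i * stride + k, j * stride:j * stride + k]
                out[n, i, j] = np.sum(patch * kernels[n])
    return out


image = np.array([[[3, 2, 1],
                   [0, 1, 2],
                   [3, 1, 1]]])             # C = 1
kernels = np.array([[[[0, 1], [1, 2]]],     # kernel A
                    [[[2, 1], [1, 3]]]])    # kernel B -> Count = 2, C = 1, K = 2

feature, _, _ = im2col(image, 2)
print(feature)
print(conv_gemm(image, kernels))
print(np.array_equal(conv_gemm(image, kernels), conv_naive(image, kernels)))
```

On the sample, `im2col` reproduces the Feature Matrix above and `conv_gemm` returns the maps [[4, 6], [6, 5]] and [[11, 12], [7, 8]]. Because each patch and each kernel are flattened in the same (channel, row, column) order, the same code also covers C > 1. The Python loop inside `im2col` is kept for readability; an optimized version would build the patch matrix with strided views and hand the product to a BLAS GEMM routine.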

Max Pooling Implementation

```python
import numpy as np


def max_pool_forward(x, pool_param):
    (N, C, H, W) = x.shape
    height = pool_param['height']  # height of the pooling window
    width = pool_param['width']    # width of the pooling window
    stride = pool_param['stride']  # pooling stride
    H_prime = 1 + (H - height) // stride  # number of window positions vertically
    W_prime = 1 + (W - width) // stride   # number of window positions horizontally
    out = np.zeros((N, C, H_prime, W_prime))  # output tensor

    # Iterate over the batch
    for n in range(N):
        for h in range(H_prime):
            for w in range(W_prime):
                h1 = h * stride
                w1 = w * stride            # top-left corner of the window
                h2 = h * stride + height
                w2 = w * stride + width    # bottom-right corner (exclusive)
                window = x[n, :, h1:h2, w1:w2]
                # print('window:', window)
                win_1 = window.reshape((C, height * width))
                # Take the maximum of each channel separately and write it to
                # the corresponding position; pooling does not change the
                # number of channels
                out[n, :, h, w] = np.max(win_1, axis=1)
    return out


np.random.seed(8)
x = np.random.randint(5, size=(1, 1, 4, 4))  # random integer tensor of shape [1, 1, 4, 4]
print(x)
pool_param = {'height': 2, 'width': 2, 'stride': 2}
out = max_pool_forward(x, pool_param)
print(out)
```
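The triple loop above is easy to follow but slow in Python. In the same spirit as Im2col (gather every window first, then reduce them with one array operation), max pooling can be vectorized. The sketch below assumes NumPy 1.20 or newer for `numpy.lib.stride_tricks.sliding_window_view`; `max_pool_forward_vec` is a hypothetical name that takes the same `pool_param` dictionary as `max_pool_forward`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_forward_vec(x, pool_param):
    """Vectorized max pooling for x of shape (N, C, H, W)."""
    height = pool_param['height']  # pooling window height
    width = pool_param['width']    # pooling window width
    stride = pool_param['stride']  # pooling stride
    # Every height x width window at stride 1, as a read-only view:
    # shape (N, C, H - height + 1, W - width + 1, height, width)
    windows = sliding_window_view(x, (height, width), axis=(2, 3))
    # Keep only windows whose top-left corner lies on the stride grid
    windows = windows[:, :, ::stride, ::stride]
    # Maximum inside each window; the channel count is unchanged
    return windows.max(axis=(-2, -1))


x = np.arange(16).reshape(1, 1, 4, 4)
print(max_pool_forward_vec(x, {'height': 2, 'width': 2, 'stride': 2}))
```

For the 4 × 4 input `np.arange(16)` with a 2 × 2 window and stride 2, this returns the maxima 5, 7, 13 and 15. Slicing with `::stride` leaves floor((H - height) / stride) + 1 rows of windows, the same count as `H_prime` in the loop version, and no window data is copied until `max` performs the reduction.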
