Recurrent Convolutional Neural Networks for Text Classification

mac 2025-03-17 22

1.Abstract

Traditional approaches: text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases, and special tree kernels.

Proposed: a recurrent structure -> captures contextual information; a max-pooling layer -> captures the key components in texts.

Performance: the method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.

2.Introduction

Traditional approaches

Feature representation: bag-of-words, where unigrams, bigrams, n-grams, or some exquisitely designed patterns are typically extracted as features.
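A minimal Python sketch of bag-of-words n-gram features, assuming plain whitespace tokenization and lowercasing (the function name `ngram_features` is hypothetical, not from the paper). It counts every unigram and bigram as a feature:

```python
from collections import Counter


def ngram_features(text: str, max_n: int = 2) -> Counter:
    """Count all word n-grams with 1 <= n <= max_n (hypothetical helper)."""
    tokens = text.lower().split()  # assumption: whitespace tokenization
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


print(ngram_features("the movie was not good"))
```

The bigram "not good" preserves a little word order that unigrams alone lose, but the number of distinct features grows quickly with n, which is where the sparsity of higher-order n-grams comes from.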

Feature selection: several methods, such as frequency, MI (mutual information), pLSA, and LDA, are applied to select more discriminative features.
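As a sketch of MI-based selection, assuming MI is measured between "the term occurs in the document" and the class label (one common formulation; the note does not say which variant is meant), terms can be ranked by their MI score. The helper name and the toy data below are hypothetical:

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI in bits between term presence (True/False) and the class label."""
    n = len(docs)
    joint = Counter((term in tokens, y) for tokens, y in zip(docs, labels))
    presence = Counter(term in tokens for tokens in docs)
    classes = Counter(labels)
    mi = 0.0
    for (t, y), count in joint.items():  # zero-count cells contribute 0
        p_ty = count / n
        mi += p_ty * math.log2(p_ty / ((presence[t] / n) * (classes[y] / n)))
    return mi


# Hypothetical toy corpus: each document is a set of tokens
docs = [{"great", "movie"}, {"great", "plot"}, {"boring", "movie"}, {"boring", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
vocab = set().union(*docs)
ranked = sorted(vocab, key=lambda w: mutual_information(docs, labels, w), reverse=True)
print(ranked[:2])  # the two class-revealing words, "great" and "boring"
```

Here "movie" and "plot" appear equally in both classes and score 0, so a top-k cut keeps only the discriminative words.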

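Drawbacks: traditional feature representations often ignore contextual information, word order, and semantics. Higher-order n-grams and tree kernels have been applied to feature representation, but they also suffer from data sparsity, which hurts accuracy. Word embeddings such as word2vec capture more syntactic and semantic features.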

Improvements

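Recursive Neural Network (RecursiveNN). Advantage: captures contextual information. Drawbacks: its performance depends heavily on how the textual tree is constructed, and building the tree takes O(n^2) time. Moreover, the relationship between two sentences cannot be represented by a single tree. It is therefore unsuitable for long sentences or documents.

Recurrent Neural Network (RecurrentNN). Advantage: captures contextual information better. Drawback: it is a biased model, in which later words are more dominant than earlier ones. This is undesirable because any word may be the important one, and key words can appear anywhere in a text, not only at the end.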

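Convolutional Neural Network (CNN). Advantages: ① time complexity O(n); ② an unbiased model that can pick out the most important features through max pooling. Drawback: ③ the convolution window size is fixed: a window that is too small easily loses information, while one that is too large produces an enormous parameter space.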
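A minimal numpy sketch of the CNN behavior described above, not the paper's code: a convolution with a fixed window slides over the n words (so the number of window positions is linear in n), and max-over-time pooling keeps each filter's strongest response wherever it occurs. The function names, shapes, and the tanh non-linearity are illustrative assumptions:

```python
import numpy as np


def conv_maxpool(emb: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """Fixed-window 1-D convolution over words, then max-over-time pooling.

    emb:     (n_words, emb_dim) embeddings of one text (assumes n_words >= window)
    filters: (n_filters, window, emb_dim) convolution kernels
    returns: (n_filters,) one pooled value per filter
    """
    _, window, _ = filters.shape
    n_positions = emb.shape[0] - window + 1  # linear in text length: O(n)
    # Each row holds every filter's response at one window position.
    responses = np.stack([
        np.tensordot(filters, emb[i:i + window], axes=([1, 2], [0, 1]))
        for i in range(n_positions)
    ])  # (n_positions, n_filters)
    # Max pooling keeps each filter's strongest response, wherever it occurs.
    return np.tanh(responses).max(axis=0)


def num_filter_weights(n_filters: int, window: int, emb_dim: int) -> int:
    """Weights in the filter tensor; grows linearly with the window size."""
    return n_filters * window * emb_dim


rng = np.random.default_rng(0)
emb = rng.normal(size=(12, 50))        # hypothetical 12-word text, 50-dim embeddings
filters = rng.normal(size=(8, 3, 50))  # hypothetical 8 filters, window of 3 words
print(conv_maxpool(emb, filters).shape)  # (8,)
print([num_filter_weights(100, w, 300) for w in (3, 5, 10)])  # [90000, 150000, 300000]
```

Per filter, the weight count scales with window × embedding size, which is the trade-off in point ③: widening the window to avoid losing information multiplies the parameters.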

Proposed:

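Recurrent Convolutional Neural Network (RCNN): a bi-directional recurrent structure -> captures contextual information; a max-pooling layer -> extracts the most important features, i.e., automatically judges which words play key roles in classification (the paper's wording: which words play "key roles").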

3.Model

1.Word representation: concatenating the contexts

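For each word $w_i$, $c_l(w_i)$ is the vector of the context to the left of $w_i$ and $c_r(w_i)$ the vector of the context to its right; $e(w_i)$ is the word embedding of $w_i$ and $f$ is a non-linear activation function. The two context vectors are computed recursively by Eqs. (1) and (2):

$c_l(w_i) = f(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1}))$ (1)

$c_r(w_i) = f(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1}))$ (2)

The left-side context of the first word uses the same shared parameters $c_l(w_1)$ for every input (the paper: "The left-side context for the first word in any document uses the same shared parameters $c_l(w_1)$."); likewise, the right-side context of the last word shares $c_r(w_n)$. Each word's representation then concatenates the three vectors, Eq. (3):

$x_i = [c_l(w_i); e(w_i); c_r(w_i)]$ (3)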
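A minimal numpy sketch of Eqs. (1)-(3), not the authors' implementation: the left context runs forward over the text, the right context runs backward, and each word's row is the concatenation. Choosing tanh for $f$, the function name, and the argument shapes are assumptions for illustration:

```python
import numpy as np


def context_vectors(E_seq, W_l, W_sl, W_r, W_sr, cl_first, cr_last, f=np.tanh):
    """Return x_i = [c_l(w_i); e(w_i); c_r(w_i)] for every word, as matrix rows.

    E_seq:      (n, emb_dim) embeddings e(w_1), ..., e(w_n)
    W_l, W_r:   (ctx_dim, ctx_dim) context-to-context matrices
    W_sl, W_sr: (ctx_dim, emb_dim) word-to-context matrices
    cl_first:   (ctx_dim,) shared left context c_l(w_1)
    cr_last:    (ctx_dim,) shared right context c_r(w_n)
    f:          non-linear activation (tanh is an assumption here)
    """
    n = E_seq.shape[0]
    c_l = [cl_first]
    for i in range(1, n):
        # Eq. (1): c_l(w_i) = f(W^(l) c_l(w_{i-1}) + W^(sl) e(w_{i-1}))
        c_l.append(f(W_l @ c_l[i - 1] + W_sl @ E_seq[i - 1]))
    c_r = [cr_last]
    for i in range(n - 2, -1, -1):
        # Eq. (2): c_r(w_i) = f(W^(r) c_r(w_{i+1}) + W^(sr) e(w_{i+1}))
        c_r.insert(0, f(W_r @ c_r[0] + W_sr @ E_seq[i + 1]))
    # Eq. (3): left context, the word embedding itself, right context
    return np.stack([np.concatenate([c_l[i], E_seq[i], c_r[i]]) for i in range(n)])


rng = np.random.default_rng(0)
n, emb_dim, ctx_dim = 5, 4, 3  # hypothetical sizes
x = context_vectors(
    rng.normal(size=(n, emb_dim)),
    rng.normal(size=(ctx_dim, ctx_dim)), rng.normal(size=(ctx_dim, emb_dim)),
    rng.normal(size=(ctx_dim, ctx_dim)), rng.normal(size=(ctx_dim, emb_dim)),
    np.zeros(ctx_dim), np.zeros(ctx_dim),
)
print(x.shape)  # (5, 10): 2 * ctx_dim + emb_dim columns per word
```

Both passes are a single sweep over the words, so building all $x_i$ stays O(n), like the CNN.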

2.Compressing the concatenated vector

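For each word, the $x_i$ obtained from Eq. (3) is compressed by a linear transformation followed by tanh, which gives the latent semantic vector $y_i^{(2)}$ circled as 2 in the figure:

$y_i^{(2)} = \tanh(W^{(2)} x_i + b^{(2)})$ (4)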
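A short numpy sketch of Eq. (4) applied to all words at once, assuming the $x_i$ are stacked as rows of a matrix; the function name and the sizes are hypothetical:

```python
import numpy as np


def latent_semantic_vectors(X, W2, b2):
    """Eq. (4) for every word: y_i^(2) = tanh(W^(2) x_i + b^(2)).

    X:  (n, d_in) rows x_i from Eq. (3)
    W2: (H, d_in) projection matrix W^(2); H = number of latent features
    b2: (H,) bias b^(2)
    returns: (n, H), one latent semantic vector per word
    """
    return np.tanh(X @ W2.T + b2)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))   # e.g. the x_i rows of a 5-word text
W2 = rng.normal(size=(6, 10))  # hypothetical: 6 latent features
Y2 = latent_semantic_vectors(X, W2, np.zeros(6))
print(Y2.shape)  # (5, 6)
```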

3.Max-pooling layer

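In the figure above, take the maximum of each column; each column corresponds to one feature. These maxima form $y^{(3)}$, whose k-th element is the largest k-th element among all $y_i^{(2)}$.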

Why not average pooling? Because the goal is to find which word in the sentence best represents each feature, not an averaged feature value. From the paper: "We do not use average pooling here because only a few words and their combination are useful for capturing the meaning of the document. The max-pooling layer attempts to find the most important latent semantic factors in the document."

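Max-pooling formula, Eq. (5), where the max is taken element-wise over the n words:

$y^{(3)} = \max_{i=1}^{n} y_i^{(2)}$ (5)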
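The element-wise max of Eq. (5), and the contrast with average pooling, can be seen on a toy matrix. The numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical y_i^(2) for a 3-word text: rows are words i, columns are features k.
y2 = np.array([[0.1, -0.9,  0.2],
               [0.8,  0.0, -0.1],
               [-0.3, 0.1,  0.9]])

y3 = y2.max(axis=0)          # Eq. (5): k-th element = max over words of the k-th elements
winners = y2.argmax(axis=0)  # index of the word that dominates each feature
averaged = y2.mean(axis=0)   # average pooling, for comparison

print(y3)        # values 0.8, 0.1, 0.9
print(winners)   # words 1, 2, 2 (0-based)
print(averaged)  # roughly 0.2, -0.27, 0.33
```

Word 2 alone carries a strong 0.9 for the last feature; max pooling keeps it, while averaging dilutes it to about 0.33 because the other words are irrelevant to that feature.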

4.Feature weighting and classification

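The pooled vector is weighted by a linear output layer and then classified with softmax:

$y^{(4)} = W^{(4)} y^{(3)} + b^{(4)}$ (6)

$p_i = \frac{\exp(y_i^{(4)})}{\sum_{k} \exp(y_k^{(4)})}$ (7)

where $p_i$ is the probability of class $i$ and the sum runs over all classes.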
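A minimal numpy sketch of Eqs. (6)-(7); the function name and the toy weights are hypothetical. Subtracting the maximum before exponentiating is a standard numerical-stability trick that does not change the softmax result:

```python
import numpy as np


def class_probabilities(y3, W4, b4):
    """Eqs. (6)-(7): y^(4) = W^(4) y^(3) + b^(4), then softmax over the classes."""
    y4 = W4 @ y3 + b4
    z = np.exp(y4 - y4.max())  # shift by the max to avoid overflow; softmax unchanged
    return z / z.sum()


# Hypothetical 2-class output layer on a 3-dimensional pooled vector
p = class_probabilities(np.array([0.8, 0.1, 0.9]),
                        np.array([[1.0, 0.0, 1.0],
                                  [0.0, 1.0, 0.0]]),
                        np.zeros(2))
print(p, p.argmax())  # class 0 wins: y^(4) = [1.7, 0.1]
```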

5.Training

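All trainable parameters:

$\theta = \{E, b^{(2)}, b^{(4)}, c_l(w_1), c_r(w_n), W^{(2)}, W^{(4)}, W^{(l)}, W^{(r)}, W^{(sl)}, W^{(sr)}\}$

where E is the initial word embedding matrix. (Before this model is trained, word vectors are pre-trained with skip-gram, which is where E comes from.)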

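Training objective: maximize the log-likelihood with respect to $\theta$,

$\theta \mapsto \sum_{D \in \mathbb{D}} \log p(\mathrm{class}_D \mid D, \theta)$

where $\mathbb{D}$ is the set of training documents and $\mathrm{class}_D$ is the correct class of document $D$. The paper optimizes this objective with stochastic gradient descent.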
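A minimal numpy sketch of the objective, assuming the model's softmax outputs for a batch of documents are already available as a matrix; the function name and numbers are hypothetical. Maximizing this sum is equivalent to minimizing the usual cross-entropy loss:

```python
import numpy as np


def log_likelihood(probs, labels):
    """Sum over documents D of log p(class_D | D, theta).

    probs:  (num_docs, num_classes) softmax outputs of the model
    labels: (num_docs,) index of each document's correct class
    """
    picked = probs[np.arange(len(labels)), labels]  # p(class_D | D, theta) per document
    return float(np.sum(np.log(picked)))


# Hypothetical outputs for two documents
probs = np.array([[0.5, 0.5],
                  [0.25, 0.75]])
print(log_likelihood(probs, np.array([0, 1])))  # log 0.5 + log 0.75
```

An SGD step would move $\theta$ along the gradient of this quantity for one document, $\theta \leftarrow \theta + \alpha \, \partial \log p(\mathrm{class}_D \mid D, \theta) / \partial \theta$; computing that gradient needs backpropagation through Eqs. (1)-(7) and is not shown here.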

This appears to be a 2015 paper (AAAI 2015).
