
clxdsjyx/cnn_sentence_classifier


Convolutional Neural Network Sentence Classification (using Keras)

This repository contains Python code for sentence classification, inspired by this paper, which describes how to build a sentence classifier using a convolutional neural network. This article also explains in detail how it works.

Requirements

This code is built on Keras, so you need to install Keras before running it. It also uses code from another of my repositories, deeplearning_assistant, to replace the boilerplate parts of a typical deep learning training implementation, which is why you will see imports from that repository. You therefore need to pull that repository as well before using this one.

Differences from the paper

  • The network needs pre-trained word embedding vectors to represent a sentence as input, so you need to obtain them before running. I recommend the Gensim Word2vec library, which is one of the easiest and most reliable libraries for Word2vec.
  • Word vectors have 200 dimensions. You can of course add channels, or replace the Word2vec vectors with vectors obtained another way, such as GloVe.
  • The size and number of convolutional filters differ from the paper, and you can change them easily. You will find that you need to vary them depending on the corpus you classify.
  • Stride (subsampling) is applied to the filters that span 2 words, so we can extract tuples of words that are far apart in a sentence.
    • For example, given the sentence "I like red and fresh apple.", we can extract the tuple (like, apple) by using stride to skip "red and fresh". A longer filter might be able to extract a similar pattern, but that would likely be more difficult than this approach.
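
As a rough illustration of the first two points, the sketch below turns a tokenized sentence into a fixed-size matrix of 200-dimensional word vectors, which is the kind of input a convolutional sentence classifier consumes. The function name `sentence_to_matrix`, the plain dictionary lookup, the zero rows for unknown words, and the truncate-and-pad policy are all assumptions made for this example; they are not taken from the repository code.

```python
from typing import Dict, List, Sequence

EMBEDDING_DIM = 200  # matches the 200-dimensional vectors mentioned above


def sentence_to_matrix(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    max_len: int,
    dim: int = EMBEDDING_DIM,
) -> List[List[float]]:
    """Map tokens to a fixed-size (max_len x dim) matrix of word vectors.

    Unknown words and padding positions become all-zero rows, and sentences
    longer than max_len are truncated. Both choices are assumptions.
    """
    zero = [0.0] * dim
    # One row per token, looked up in the (hypothetical) embedding table.
    rows = [list(vectors.get(tok, zero)) for tok in tokens[:max_len]]
    # Pad short sentences with zero rows so every input has the same shape.
    rows.extend(list(zero) for _ in range(max_len - len(rows)))
    return rows


# Toy lookup table standing in for real pre-trained vectors.
toy_vectors = {
    "like": [0.1] * EMBEDDING_DIM,
    "apple": [0.2] * EMBEDDING_DIM,
}
tokens = "I like red and fresh apple".split()
matrix = sentence_to_matrix(tokens, toy_vectors, max_len=8)
print(len(matrix), len(matrix[0]))  # 8 200
```

In practice the nested list would usually be converted to an array before being fed to the network, and the toy dictionary would be replaced by vectors loaded from a trained Word2vec or GloVe model.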

Usage & evaluation result

After resolving the dependencies described above, you need to modify word_vector_iterator.py to read your data properly, unless you use the included example data. The example data was crawled from Clien, and the classification task is to predict the proper board topic from the title of a post. I got 78% accuracy for top-1 prediction over 10 classes (board topics). I also got over 90% accuracy on sentiment analysis (predicting 1 of 3 classes) of TMON product reviews, which are not included in this repository.
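
For readers who want to compute a comparable figure on their own data, here is a minimal sketch of top-k accuracy, where k=1 corresponds to the "best 1 class" number quoted above. The function name `top_k_accuracy` and the toy scores are hypothetical; this is not the evaluation code used to produce the reported results.

```python
from typing import Sequence


def top_k_accuracy(
    probabilities: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    if not labels:
        return 0.0
    hits = 0
    for scores, label in zip(probabilities, labels):
        # Class indices sorted from highest to lowest predicted score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy predictions over 3 classes for 4 samples (made-up numbers).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [0, 2, 0, 2]
print(top_k_accuracy(probs, labels, k=1))  # 0.5
print(top_k_accuracy(probs, labels, k=2))  # 1.0
```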

Any comments and suggestions will be appreciated!
