Sentiment Analysis with Recursive Neural Network

Note: This notebook is created from chainer/examples/sentiment. If you want to run it as a script, please refer to that example.

In this notebook, we will analyze the sentiment of documents by using a Recursive Neural Network.

First, execute the following cell to install “Chainer” and its GPU backend “CuPy”. If the “runtime type” of Colaboratory is GPU, Chainer can run with the GPU as its backend.

[1]:
!curl https://colab.chainer.org/install | sh -
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  libcusparse8.0 libnvrtc8.0 libnvtoolsext1
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.9 MB of archives.
After this operation, 71.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu artful/multiverse amd64 libcusparse8.0 amd64 8.0.61-1 [22.6 MB]
Get:2 http://archive.ubuntu.com/ubuntu artful/multiverse amd64 libnvrtc8.0 amd64 8.0.61-1 [6,225 kB]
Get:3 http://archive.ubuntu.com/ubuntu artful/multiverse amd64 libnvtoolsext1 amd64 8.0.61-1 [32.2 kB]
Fetched 28.9 MB in 2s (10.4 MB/s)

Selecting previously unselected package libcusparse8.0:amd64.
(Reading database ... 18298 files and directories currently installed.)
Preparing to unpack .../libcusparse8.0_8.0.61-1_amd64.deb ...
Unpacking libcusparse8.0:amd64 (8.0.61-1) ...
Selecting previously unselected package libnvrtc8.0:amd64.
Preparing to unpack .../libnvrtc8.0_8.0.61-1_amd64.deb ...
Unpacking libnvrtc8.0:amd64 (8.0.61-1) ...
Selecting previously unselected package libnvtoolsext1:amd64.
Preparing to unpack .../libnvtoolsext1_8.0.61-1_amd64.deb ...
Unpacking libnvtoolsext1:amd64 (8.0.61-1) ...
Setting up libnvtoolsext1:amd64 (8.0.61-1) ...
Setting up libcusparse8.0:amd64 (8.0.61-1) ...
Setting up libnvrtc8.0:amd64 (8.0.61-1) ...
Processing triggers for libc-bin (2.26-0ubuntu2.1) ...

Let’s import the necessary modules, then check the versions of Chainer, NumPy, CuPy, CUDA, and the rest of the execution environment.

[12]:
import collections
import numpy as np

import chainer
from chainer import cuda
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
from chainer import reporter


chainer.print_runtime_info()
Chainer: 4.1.0
NumPy: 1.14.3
CuPy:
  CuPy Version          : 4.1.0
  CUDA Root             : None
  CUDA Build Version    : 8000
  CUDA Driver Version   : 9000
  CUDA Runtime Version  : 8000
  cuDNN Build Version   : 7102
  cuDNN Version         : 7102
  NCCL Build Version    : 2104

1. Preparation of training data

In this notebook, we will use the training data which are preprocessed by chainer/examples/sentiment/download.py. Let’s run the following cells, download the necessary training data and unzip it.

[ ]:
# download.py
import os.path
from six.moves.urllib import request
import zipfile


request.urlretrieve(
    'https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip',
    'trainDevTestTrees_PTB.zip')
zf = zipfile.ZipFile('trainDevTestTrees_PTB.zip')
for name in zf.namelist():
    (dirname, filename) = os.path.split(name)
    if not filename == '':
        zf.extract(name, '.')

Let’s execute the following command and check if the training data have been prepared.

dev.txt  test.txt  train.txt

It will be OK if the above output is displayed.

[14]:
!ls trees
dev.txt  test.txt  train.txt

Let’s look at the first line of dev.txt and see how each sample is written.

[15]:
!head trees/dev.txt -n1
(3 (2 It) (4 (4 (2 's) (4 (3 (2 a) (4 (3 lovely) (2 film))) (3 (2 with) (4 (3 (3 lovely) (2 performances)) (2 (2 by) (2 (2 (2 Buy) (2 and)) (2 Accorsi))))))) (2 .)))

As displayed above, each sample is defined by a tree structure.

Each tree is defined recursively as (value, node), where value is the class label of node.

The class labels are 0 (really negative), 1 (negative), 2 (neutral), 3 (positive), and 4 (really positive).

The representation of the one sample is shown below.
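As a rough illustration of this structure, the sketch below parses a short string in the same format and prints it as an indented tree, one node per line. The helper names `parse_sexp` and `show`, and the shortened sample string, are assumptions made for this example and are not part of the original code.

```python
import re


def parse_sexp(line):
    """Parse '(label child ...)' text into nested lists of string tokens."""
    tokens = re.findall(r'\(|\)|[^\(\) ]+', line)
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != '(':
            return token
        children = []
        while tokens[pos] != ')':
            children.append(parse())
        pos += 1  # skip the closing ')'
        return children

    return parse()


def show(tree, depth=0):
    """Return one line per node: the label, plus the word for a leaf."""
    indent = '  ' * depth
    if len(tree) == 2:  # leaf: [label, word]
        return [indent + tree[0] + ' ' + tree[1]]
    label, left, right = tree  # internal node: [label, left, right]
    return [indent + label] + show(left, depth + 1) + show(right, depth + 1)


# Hypothetical, shortened sample in the same format as the dataset.
sample = '(4 (3 (2 a) (4 (3 lovely) (2 film))) (2 .))'
print('\n'.join(show(parse_sexp(sample))))
```

Each level of indentation corresponds to one level of nesting in the parentheses, so leaves carry a label and a word while internal nodes carry only a label.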

2. Setting parameters

Here we set the parameters for training.

  • n_epoch: Number of epochs, i.e. how many times we pass through the whole training data.
  • n_units: Number of units, i.e. the dimensionality of the hidden state vector held by each node of the Recursive Neural Network.
  • batchsize: Batch size, i.e. how many training samples are fed in as one block when updating the parameters.
  • n_label: Number of labels, i.e. the number of classes to identify. There are 5 labels in this dataset, so we set it to 5.
  • epoch_per_eval: How often to perform validation, in epochs.
  • is_test: If True, we use a small dataset.
  • gpu_id: The ID of the GPU to use. For Colaboratory, use 0.

[ ]:
# parameters
n_epoch = 100  # number of epochs
n_units = 30  # number of units per layer
batchsize = 25  # minibatch size
n_label = 5  # number of labels
epoch_per_eval = 5  # number of epochs per evaluation
is_test = True
gpu_id = 0

if is_test:
    max_size = 10
else:
    max_size = None

3. Preparing the iterator

Let’s read the datasets used for training, validation, and testing, and create the iterators.

First, we convert each sample from a str into tree-structured data represented by dictionaries.

read_corpus reads a file and parses each line into nested lists of tokens with the parser SexpParser. After that, convert_tree converts each parsed sample into tree-structured data. In the result, a label is an int, the children of an internal node are a two-element tuple, and every node is a dictionary, which makes the data easier to handle than the original string.

[ ]:
# data.py
import codecs
import re


class SexpParser(object):

    def __init__(self, line):
        self.tokens = re.findall(r'\(|\)|[^\(\) ]+', line)
        self.pos = 0

    def parse(self):
        assert self.pos < len(self.tokens)
        token = self.tokens[self.pos]
        assert token != ')'
        self.pos += 1

        if token == '(':
            children = []
            while True:
                assert self.pos < len(self.tokens)
                if self.tokens[self.pos] == ')':
                    self.pos += 1
                    break
                else:
                    children.append(self.parse())
            return children
        else:
            return token


def read_corpus(path, max_size):
    with codecs.open(path, encoding='utf-8') as f:
        trees = []
        for line in f:
            line = line.strip()
            tree = SexpParser(line).parse()
            trees.append(tree)
            if max_size and len(trees) >= max_size:
                break

    return trees


def convert_tree(vocab, exp):
    assert isinstance(exp, list) and (len(exp) == 2 or len(exp) == 3)

    if len(exp) == 2:
        label, leaf = exp
        if leaf not in vocab:
            vocab[leaf] = len(vocab)
        return {'label': int(label), 'node': vocab[leaf]}
    elif len(exp) == 3:
        label, left, right = exp
        node = (convert_tree(vocab, left), convert_tree(vocab, right))
        return {'label': int(label), 'node': node}

Let’s use read_corpus() and convert_tree() to create an iterator.

[ ]:
vocab = {}

train_data = [convert_tree(vocab, tree)
              for tree in read_corpus('trees/train.txt', max_size)]
train_iter = chainer.iterators.SerialIterator(train_data, batchsize)

validation_data = [convert_tree(vocab, tree)
                   for tree in read_corpus('trees/dev.txt', max_size)]
validation_iter = chainer.iterators.SerialIterator(
    validation_data, batchsize, repeat=False, shuffle=False)

test_data = [convert_tree(vocab, tree)
             for tree in read_corpus('trees/test.txt', max_size)]

Let’s display the first element of test_data. It is represented by the following tree structure: label is the score of that node, and the numerical value of a leaf node is the id of the word in the dictionary vocab.

[19]:
print(test_data[0])
{'label': 2, 'node': ({'label': 3, 'node': ({'label': 3, 'node': 252}, {'label': 2, 'node': 71})}, {'label': 1, 'node': ({'label': 1, 'node': 253}, {'label': 2, 'node': 254})})}
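To get a feel for this structure, here is a minimal sketch that walks the dictionary printed above. The helper names `leaf_ids` and `count_nodes` are assumptions introduced for this example; only the sample itself comes from the output above.

```python
def leaf_ids(tree):
    """Collect the word ids of the leaves from left to right."""
    node = tree['node']
    if isinstance(node, int):  # leaf: 'node' holds a word id
        return [node]
    left, right = node         # internal node: 'node' holds two subtrees
    return leaf_ids(left) + leaf_ids(right)


def count_nodes(tree):
    """Count leaves and internal nodes together."""
    node = tree['node']
    if isinstance(node, int):
        return 1
    left, right = node
    return 1 + count_nodes(left) + count_nodes(right)


sample = {'label': 2, 'node': (
    {'label': 3, 'node': ({'label': 3, 'node': 252}, {'label': 2, 'node': 71})},
    {'label': 1, 'node': ({'label': 1, 'node': 253}, {'label': 2, 'node': 254})})}

print(leaf_ids(sample))     # word ids of the leaves
print(count_nodes(sample))  # leaves plus internal nodes
```

A binary tree with I leaves always has I - 1 internal nodes, so this sample with four words has seven nodes, each of which carries its own label.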

4. Preparing the model

Let’s define the network.

We visit each node of the tree-structured data with traverse and calculate the loss of the whole tree. traverse is implemented as a recursive call that visits the child nodes in turn, which is a common way to handle tree-structured data.

First, we calculate the hidden state vector v. For a leaf node, the leaf method looks up the hidden state vector stored in embed for the word id word. For an internal node, the node method calculates the hidden state vector from the hidden state vectors left and right of the child nodes.

loss += F.softmax_cross_entropy(y, t) adds the loss of the current node to the losses of its child nodes, and return loss, v passes the result back to the parent node.

The lines after loss += F.softmax_cross_entropy(y, t) only log the accuracy and similar statistics. They are not needed for the model definition itself.
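The recursion can be sketched without Chainer as follows. This is a toy version under stated assumptions: the parameter values, the two-unit hidden state, and the two-class labels are made up, biases are omitted, and `matvec` and `softmax_cross_entropy` are small stand-ins written for this example rather than Chainer functions.

```python
import math


def softmax_cross_entropy(scores, target):
    """Cross entropy of softmax(scores) against an integer class index."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# Toy parameters (hypothetical values): 3 words, 2 hidden units, 2 classes.
EMBED = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W_NODE = [[0.5, -0.1, 0.2, 0.3], [-0.3, 0.4, 0.1, 0.2]]
W_LABEL = [[1.0, -1.0], [-0.5, 0.5]]


def traverse(node):
    """Return (summed loss of the subtree, hidden vector of its root)."""
    if isinstance(node['node'], int):
        loss = 0.0
        v = EMBED[node['node']]  # leaf: look up the embedding
    else:
        left_node, right_node = node['node']
        left_loss, left = traverse(left_node)
        right_loss, right = traverse(right_node)
        # internal node: combine the concatenated child vectors
        v = [math.tanh(a) for a in matvec(W_NODE, left + right)]
        loss = left_loss + right_loss
    # every node, leaf or internal, contributes its own classification loss
    loss += softmax_cross_entropy(matvec(W_LABEL, v), node['label'])
    return loss, v


tree = {'label': 1, 'node': ({'label': 0, 'node': 0},
                             {'label': 1, 'node': ({'label': 1, 'node': 1},
                                                   {'label': 0, 'node': 2})})}
loss, v = traverse(tree)
print(loss, v)
```

The returned loss is the sum over all five nodes of this toy tree, which mirrors how the loss of the child nodes is carried up to the parent.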

[ ]:
class RecursiveNet(chainer.Chain):

    def traverse(self, node, evaluate=None, root=True):
        if isinstance(node['node'], int):
            # leaf node
            word = self.xp.array([node['node']], np.int32)
            loss = 0
            v = self.leaf(word)
        else:
            # internal node
            left_node, right_node = node['node']
            left_loss, left = self.traverse(left_node, evaluate=evaluate, root=False)
            right_loss, right = self.traverse(right_node, evaluate=evaluate, root=False)
            v = self.node(left, right)
            loss = left_loss + right_loss

        y = self.label(v)

        label = self.xp.array([node['label']], np.int32)
        t = chainer.Variable(label)
        loss += F.softmax_cross_entropy(y, t)

        predict = cuda.to_cpu(y.data.argmax(1))
        if predict[0] == node['label']:
            evaluate['correct_node'] += 1
        evaluate['total_node'] += 1

        if root:
            if predict[0] == node['label']:
                evaluate['correct_root'] += 1
            evaluate['total_root'] += 1

        return loss, v

    def __init__(self, n_vocab, n_units):
        super(RecursiveNet, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l = L.Linear(n_units * 2, n_units)
            self.w = L.Linear(n_units, n_label)

    def leaf(self, x):
        return self.embed(x)

    def node(self, left, right):
        return F.tanh(self.l(F.concat((left, right))))

    def label(self, v):
        return self.w(v)

    def __call__(self, x):
        accum_loss = 0.0
        result = collections.defaultdict(lambda: 0)
        for tree in x:
            loss, _ = self.traverse(tree, evaluate=result)
            accum_loss += loss

        reporter.report({'loss': accum_loss}, self)
        reporter.report({'total': result['total_node']}, self)
        reporter.report({'correct': result['correct_node']}, self)
        return accum_loss

One point to note about the implementation of __call__.

The x passed to __call__ is a mini-batch of input data that contains samples s_n, as in [s_1, s_2, ..., s_N].

In a network such as a convolutional network used for image recognition, the whole mini-batch x can be computed in parallel. In a tree-structured network like this one, however, parallel computation is difficult for the following reasons.

  • Data length varies depending on samples.
  • The order of calculation for each sample is different.

So, this implementation computes each sample separately and sums the resulting losses at the end.

Note: A Recursive Neural Network can in fact compute a mini-batch in parallel by using a stack. This is covered in the (Advanced) part later in this notebook, so please refer to it.

[ ]:
model = RecursiveNet(len(vocab), n_units)

if gpu_id >= 0:
    model.to_gpu()

# Setup optimizer
optimizer = chainer.optimizers.AdaGrad(lr=0.1)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer_hooks.WeightDecay(0.0001))

5. Preparing the Updater and Trainer, and training

As usual, we define an updater and a trainer to train the model. This time we do not use L.Classifier and calculate the accuracy ourselves, which is easy to implement with extensions.MicroAverage. For details, please refer to chainer.training.extensions.MicroAverage.
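The arithmetic behind a micro average is to sum the correct counts and the total counts separately and divide once at the end, rather than averaging per-batch ratios. A minimal sketch with made-up counts follows; the helper name `micro_average` and the numbers are illustrative assumptions, not values from this run and not the extension's internal code.

```python
def micro_average(batches):
    """batches: list of (correct, total) pairs, one per mini-batch."""
    correct = sum(c for c, _ in batches)
    total = sum(t for _, t in batches)
    return correct / total


# Hypothetical counts: a large batch of nodes and a small one.
batches = [(30, 100), (9, 10)]
print(micro_average(batches))                         # pooled over all nodes
print(sum(c / t for c, t in batches) / len(batches))  # mean of ratios, for contrast
```

Because trees contain different numbers of nodes, pooling the counts weights every node equally, whereas the mean of ratios would give the small batch the same weight as the large one.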

[22]:
def _convert(batch, device):
    # Trees differ in shape, so pass the batch through unchanged
    return batch

updater = chainer.training.StandardUpdater(
    train_iter, optimizer, device=gpu_id, converter=_convert)

trainer = chainer.training.Trainer(updater, (n_epoch, 'epoch'))
trainer.extend(
        extensions.Evaluator(validation_iter, model, device=gpu_id, converter=_convert),
        trigger=(epoch_per_eval, 'epoch'))
trainer.extend(extensions.LogReport())

trainer.extend(extensions.MicroAverage(
        'main/correct', 'main/total', 'main/accuracy'))
trainer.extend(extensions.MicroAverage(
        'validation/main/correct', 'validation/main/total',
        'validation/main/accuracy'))

trainer.extend(extensions.PrintReport(
        ['epoch', 'main/loss', 'validation/main/loss',
          'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.run()
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           1707.8                            0.155405                                 14.6668
2           586.467     556.419               0.497748       0.396175                  17.3996
3           421.267                           0.657658                                 19.3942
4           320.414     628.025               0.772523       0.42623                   22.2462
5           399.621                           0.704955                                 24.208
6           318.544     595.03                0.786036       0.420765                  27.0585
7           231.529                           0.880631                                 29.0178
8           160.546     628.959               0.916667       0.431694                  31.7562
9           122.076                           0.957207                                 33.8269
10          93.6623     669.898               0.975225       0.445355                  36.5802
11          74.2366                           0.986486                                 38.5855
12          60.2297     701.062               0.990991       0.448087                  41.4308
13          49.7152                           0.997748                                 43.414
14          41.633      724.893               0.997748       0.453552                  46.1698
15          35.3564                           0.997748                                 48.1999
16          30.402      744.493               1              0.448087                  50.9842
17          26.4137                           1                                        53.0605
18          23.188      760.43                1              0.459016                  55.7924
19          20.5913                           1                                        57.7479
20          18.4666     773.808               1              0.461749                  60.5636
21          16.698                            1                                        62.52
22          15.2066     785.205               1              0.461749                  65.2603
23          13.9351                           1                                        67.3052
24          12.8404     794.963               1              0.461749                  70.0323
25          11.8897                           1                                        71.9788
26          11.0575     803.388               1              0.459016                  74.7653
27          10.3237                           1                                        76.7485
28          9.67249     810.727               1              0.472678                  79.4539
29          9.09113                           1                                        81.4813
30          8.56935     817.176               1              0.480874                  84.1942
31          8.09874                           1                                        86.2475
32          7.6724      822.889               1              0.480874                  88.956
33          7.2846                            1                                        90.9035
34          6.93052     827.989               1              0.480874                  93.6949
35          6.60615                           1                                        95.6557
36          6.30805     832.574               1              0.486339                  98.4233
37          6.03332                           1                                        100.456
38          5.77941     836.724               1              0.486339                  103.18
39          5.5442                            1                                        105.129
40          5.32575     840.507               1              0.486339                  107.954

6. Checking the performance with test data

[23]:
def evaluate(model, test_trees):
    result = collections.defaultdict(lambda: 0)
    with chainer.using_config('train', False), chainer.no_backprop_mode():
        for tree in test_trees:
            model.traverse(tree, evaluate=result)
    acc_node = 100.0 * result['correct_node'] / result['total_node']
    acc_root = 100.0 * result['correct_root'] / result['total_root']
    print(' Node accuracy: {0:.2f} % ({1:,d}/{2:,d})'.format(
        acc_node, result['correct_node'], result['total_node']))
    print(' Root accuracy: {0:.2f} % ({1:,d}/{2:,d})'.format(
        acc_root, result['correct_root'], result['total_root']))

print('Test evaluation')
evaluate(model, test_data)
Test evaluation
 Node accuracy: 54.49 % (170/312)
 Root accuracy: 50.00 % (5/10)

(Advanced) Mini-batching in Recursive Neural Network[1]

It is difficult for a Recursive Neural Network to compute mini-batched data in parallel for the following reasons.

  • Data length varies depending on samples.
  • The order of calculation for each sample is different.

However, by using a stack, a Recursive Neural Network can compute a mini-batch in parallel.

Preparation of Dataset, Iterator

First, we convert the recursive calculation of the Recursive Neural Network into a serial calculation using a stack.

Each node of the tree-structured dataset is assigned a number in “returning order”.

The returning order is a way of numbering the nodes of a tree so that every child node receives a smaller number than its parent node. If you process the nodes in ascending order of these numbers, every child node is visited before its parent node.
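One way to produce such a numbering is to number each node at the moment the recursion returns from it. The sketch below does this for a toy tree written as nested tuples; the function name `number_nodes` and the tuple encoding are assumptions for illustration. Note that linearize_tree in the next cell uses a different scheme (all leaves first, then the internal nodes), which also gives every child a smaller number than its parent.

```python
def number_nodes(tree):
    """Number nodes so that each child gets a smaller number than its parent.

    `tree` is a word (str) for a leaf, or a (left, right) tuple otherwise.
    Returns (order, root_number); order[k] describes the node numbered k.
    """
    order = []

    def visit(node):
        if isinstance(node, str):
            order.append(('leaf', node))
        else:
            left = visit(node[0])
            right = visit(node[1])
            # both children are already numbered when the parent is added
            order.append(('parent', left, right))
        return len(order) - 1

    root = visit(tree)
    return order, root


order, root = number_nodes((('a', 'b'), ('c', 'd')))
for number, entry in enumerate(order):
    print(number, entry)
```

Reading the printed list from top to bottom never meets a parent before its children, which is the property the stack-based calculation relies on.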

[ ]:
def linearize_tree(vocab, root, xp=np):
    # Left node indexes for all parent nodes
    lefts = []
    # Right node indexes for all parent nodes
    rights = []
    # Parent node indexes
    dests = []
    # All labels to predict for all parent nodes
    labels = []

    # All words of leaf nodes
    words = []
    # Leaf labels
    leaf_labels = []

    # Current leaf node index
    leaf_index = [0]

    def traverse_leaf(exp):
        if len(exp) == 2:
            label, leaf = exp
            if leaf not in vocab:
                vocab[leaf] = len(vocab)
            words.append(vocab[leaf])
            leaf_labels.append(int(label))
            leaf_index[0] += 1
        elif len(exp) == 3:
            _, left, right = exp
            traverse_leaf(left)
            traverse_leaf(right)

    traverse_leaf(root)

    # Current internal node index
    node_index = leaf_index
    leaf_index = [0]

    def traverse_node(exp):
        if len(exp) == 2:
            leaf_index[0] += 1
            return leaf_index[0] - 1
        elif len(exp) == 3:
            label, left, right = exp
            l = traverse_node(left)
            r = traverse_node(right)

            lefts.append(l)
            rights.append(r)
            dests.append(node_index[0])
            labels.append(int(label))

            node_index[0] += 1
            return node_index[0] - 1

    traverse_node(root)
    assert len(lefts) == len(words) - 1

    return {
        'lefts': xp.array(lefts, 'i'),
        'rights': xp.array(rights, 'i'),
        'dests': xp.array(dests, 'i'),
        'words': xp.array(words, 'i'),
        'labels': xp.array(labels, 'i'),
        'leaf_labels': xp.array(leaf_labels, 'i'),
    }
[ ]:
xp = cuda.cupy if gpu_id >= 0 else np

vocab = {}

train_data = [linearize_tree(vocab, t, xp)
                        for t in read_corpus('trees/train.txt', max_size)]
train_iter = chainer.iterators.SerialIterator(train_data, batchsize)

validation_data = [linearize_tree(vocab, t, xp)
                       for t in read_corpus('trees/dev.txt', max_size)]
validation_iter = chainer.iterators.SerialIterator(
    validation_data, batchsize, repeat=False, shuffle=False)

test_data = [linearize_tree(vocab, t, xp)
                       for t in read_corpus('trees/test.txt', max_size)]

Let’s try to display the first element of test_data.

lefts holds the index of the left child of each parent node in dests, rights holds the index of the right child of each parent node in dests, dests holds the indices of the parent nodes, words holds the word ids of the leaf nodes, labels holds the labels of the parent nodes, and leaf_labels holds the labels of the leaf nodes.

[26]:
print(test_data[0])
{'lefts': array([0, 2, 4], dtype=int32), 'rights': array([1, 3, 5], dtype=int32), 'dests': array([4, 5, 6], dtype=int32), 'words': array([252,  71, 253, 254], dtype=int32), 'labels': array([3, 1, 2], dtype=int32), 'leaf_labels': array([3, 2, 1, 2], dtype=int32)}

Definition of mini-batchable models

A Recursive Neural Network has two operations: operation A, which computes the embedding vector of a leaf node, and operation B, which computes the hidden state vector of a parent node from the hidden state vectors of its two child nodes.

For each sample, we assign an index to each node in returning order. If you visit the nodes in returning order, operation A is performed at the leaf nodes and operation B at all other nodes.

This procedure can also be regarded as scanning the tree structure with a stack. A stack is a last-in, first-out data structure with two operations: push, which adds data, and pop, which retrieves the most recently pushed data.

For operation A, push the calculation result onto the stack. For operation B, pop two items and push the new calculation result.

Parallelizing the recursive traversal directly is hard, because the tree structure differs for each sample and each tree has to be traversed separately to apply operations A and B in the right order. With the stack, trees of different shapes are all processed by the same simple loop, so parallelization becomes possible.
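The push and pop steps can be sketched with a plain Python list as the stack. In this toy version the "hidden state" is just a bracketed string, so the result shows which nodes were combined. The function name `evaluate_with_stack`, the step encoding, and the stand-in `leaf` and `node` functions are assumptions for illustration.

```python
def evaluate_with_stack(steps, leaf, node):
    """Run operations A and B over nodes listed in returning order.

    steps: list of ('A', word) entries for leaves and ('B',) for parents.
    """
    stack = []
    for step in steps:
        if step[0] == 'A':
            stack.append(leaf(step[1]))      # operation A: push
        else:
            right = stack.pop()              # operation B: pop two items ...
            left = stack.pop()
            stack.append(node(left, right))  # ... and push the result
    assert len(stack) == 1                   # only the root remains
    return stack[0]


# The tree ((a b) (c d)) listed in returning order.
steps = [('A', 'a'), ('A', 'b'), ('B',), ('A', 'c'), ('A', 'd'), ('B',), ('B',)]
result = evaluate_with_stack(
    steps,
    leaf=lambda w: w,
    node=lambda l, r: '(' + l + ' ' + r + ')')
print(result)
```

The loop body is the same for every tree shape; only the sequence of steps changes, which is what makes it possible to run many samples through the same loop.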

[ ]:
from chainer import cuda
from chainer.utils import type_check


class ThinStackSet(chainer.Function):
    """Set values to a thin stack."""

    def check_type_forward(self, in_types):
        type_check.expect(in_types.size() == 3)
        s_type, i_type, v_type = in_types
        type_check.expect(
            s_type.dtype.kind == 'f',
            i_type.dtype.kind == 'i',
            s_type.dtype == v_type.dtype,
            s_type.ndim == 3,
            i_type.ndim == 1,
            v_type.ndim == 2,
            s_type.shape[0] >= i_type.shape[0],
            i_type.shape[0] == v_type.shape[0],
            s_type.shape[2] == v_type.shape[1],
        )

    def forward(self, inputs):
        xp = cuda.get_array_module(*inputs)
        stack, indices, values = inputs
        stack[xp.arange(len(indices)), indices] = values
        return stack,

    def backward(self, inputs, grads):
        xp = cuda.get_array_module(*inputs)
        _, indices, _ = inputs
        g = grads[0]
        gv = g[xp.arange(len(indices)), indices]
        g[xp.arange(len(indices)), indices] = 0
        return g, None, gv


def thin_stack_set(s, i, x):
    return ThinStackSet()(s, i, x)

Here we use a thin stack[2] instead of a simple stack.

Let the sentence length be \(I\) and the number of dimensions of the hidden vector be \(D\). The thin stack uses memory efficiently by storing everything in a single \((2I-1) \times D\) matrix.

A normal stack requires \(O(I^2 D)\) space, whereas the thin stack requires \(O(ID)\).

The thin stack is realized by the push operation thin_stack_set and the pop operation thin_stack_get.

First, we define ThinStackSet and ThinStackGet, which inherit from chainer.Function.

ThinStackSet is, as its name says, a function that sets values on the thin stack.

The inputs argument of forward and backward can be unpacked as stack, indices, values = inputs.

stack is the thin stack itself. Because chainer.Function does not hold internal state, the stack is handled externally and shared between the functions by passing it as a function argument.
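The idea of a buffer with one slot per node can be sketched for a single sample with a plain list. The arrays below are the ones printed for test_data[0] earlier; the function name `run_thin_stack` and the string-valued stand-ins for the leaf and node operations are assumptions for illustration, and this sketch handles neither batches nor gradients.

```python
def run_thin_stack(words, lefts, rights, dests, leaf, node):
    """Fill a preallocated buffer with one slot per node (2I - 1 in total)."""
    n_leaves = len(words)
    buffer = [None] * (2 * n_leaves - 1)
    for i, word in enumerate(words):
        buffer[i] = leaf(word)  # "set" the leaf result at slot i
    for left, right, dest in zip(lefts, rights, dests):
        # two "get"s read the children, one "set" writes the parent
        buffer[dest] = node(buffer[left], buffer[right])
    return buffer


buffer = run_thin_stack(
    words=[252, 71, 253, 254],
    lefts=[0, 2, 4], rights=[1, 3, 5], dests=[4, 5, 6],
    leaf=lambda w: str(w),
    node=lambda l, r: '(' + l + ' ' + r + ')')
print(len(buffer))
print(buffer[-1])
```

Each slot is written exactly once and never moved, so the whole computation fits in a buffer whose size grows linearly with the sentence length.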

[ ]:
class ThinStackGet(chainer.Function):

    def check_type_forward(self, in_types):
        type_check.expect(in_types.size() == 2)
        s_type, i_type = in_types
        type_check.expect(
            s_type.dtype.kind == 'f',
            i_type.dtype.kind == 'i',
            s_type.ndim == 3,
            i_type.ndim == 1,
            s_type.shape[0] >= i_type.shape[0],
        )

    def forward(self, inputs):
        xp = cuda.get_array_module(*inputs)
        stack, indices = inputs
        return stack[xp.arange(len(indices)), indices], stack

    def backward(self, inputs, grads):
        xp = cuda.get_array_module(*inputs)
        stack, indices = inputs
        g, gs = grads
        if gs is None:
            gs = xp.zeros_like(stack)
        if g is not None:
            gs[xp.arange(len(indices)), indices] += g
        return gs, None


def thin_stack_get(s, i):
    return ThinStackGet()(s, i)

ThinStackGet is literally a function to retrieve values from the thin stack.

inputs in forward and backward can be broken down like stack, indices = inputs.

[ ]:
class ThinStackRecursiveNet(chainer.Chain):

    def __init__(self, n_vocab, n_units, n_label):
        super(ThinStackRecursiveNet, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            l=L.Linear(n_units * 2, n_units),
            w=L.Linear(n_units, n_label))
        self.n_units = n_units

    def leaf(self, x):
        return self.embed(x)

    def node(self, left, right):
        return F.tanh(self.l(F.concat((left, right))))

    def label(self, v):
        return self.w(v)

    def __call__(self, *inputs):
        batch = len(inputs) // 6
        lefts = inputs[0: batch]
        rights = inputs[batch: batch * 2]
        dests = inputs[batch * 2: batch * 3]
        labels = inputs[batch * 3: batch * 4]
        sequences = inputs[batch * 4: batch * 5]
        leaf_labels = inputs[batch * 5: batch * 6]

        inds = np.argsort([-len(l) for l in lefts])
        # Sort all arrays in descending order and transpose them
        lefts = F.transpose_sequence([lefts[i] for i in inds])
        rights = F.transpose_sequence([rights[i] for i in inds])
        dests = F.transpose_sequence([dests[i] for i in inds])
        labels = F.transpose_sequence([labels[i] for i in inds])
        sequences = F.transpose_sequence([sequences[i] for i in inds])
        leaf_labels = F.transpose_sequence([leaf_labels[i] for i in inds])

        batch = len(inds)
        maxlen = len(sequences)

        loss = 0
        count = 0
        correct = 0

        # thin stack
        stack = self.xp.zeros((batch, maxlen * 2, self.n_units), 'f')

        # Compute the hidden state vectors and the loss of the leaf nodes
        for i, (word, label) in enumerate(zip(sequences, leaf_labels)):
            batch = word.shape[0]
            es = self.leaf(word)
            ds = self.xp.full((batch,), i, 'i')
            y = self.label(es)
            loss += F.softmax_cross_entropy(y, label, normalize=False) * batch
            count += batch
            predict = self.xp.argmax(y.data, axis=1)
            correct += (predict == label.data).sum()

            stack = thin_stack_set(stack, ds, es)

        # Compute the hidden state vectors and the loss of the internal nodes
        for left, right, dest, label in zip(lefts, rights, dests, labels):
            l, stack = thin_stack_get(stack, left)
            r, stack = thin_stack_get(stack, right)
            o = self.node(l, r)
            y = self.label(o)
            batch = l.shape[0]
            loss += F.softmax_cross_entropy(y, label, normalize=False) * batch
            count += batch
            predict = self.xp.argmax(y.data, axis=1)
            correct += (predict == label.data).sum()

            stack = thin_stack_set(stack, dest, o)

        loss /= count
        reporter.report({'loss': loss}, self)
        reporter.report({'total': count}, self)
        reporter.report({'correct': correct}, self)
        return loss
[ ]:
model = ThinStackRecursiveNet(len(vocab), n_units, n_label)

if gpu_id >= 0:
    model.to_gpu()

optimizer = chainer.optimizers.AdaGrad(0.1)
optimizer.setup(model)
<chainer.optimizers.ada_grad.AdaGrad at 0x7f8a3c453710>

Preparing the Updater and Trainer, and running the training

Let’s train with the new model ThinStackRecursiveNet. Because mini-batches are now computed in parallel, training is faster, as the elapsed_time column shows.

[ ]:
def convert(batch, device):
    if device is None:
        def to_device(x):
            return x
    elif device < 0:
        to_device = cuda.to_cpu
    else:
        def to_device(x):
            return cuda.to_gpu(x, device, cuda.Stream.null)

    return tuple(
        [to_device(d['lefts']) for d in batch] +
        [to_device(d['rights']) for d in batch] +
        [to_device(d['dests']) for d in batch] +
        [to_device(d['labels']) for d in batch] +
        [to_device(d['words']) for d in batch] +
        [to_device(d['leaf_labels']) for d in batch]
    )


updater = chainer.training.StandardUpdater(
    train_iter, optimizer, device=None, converter=convert)
trainer = chainer.training.Trainer(updater, (n_epoch, 'epoch'))
trainer.extend(
    extensions.Evaluator(validation_iter, model, converter=convert, device=None),
    trigger=(epoch_per_eval, 'epoch'))
trainer.extend(extensions.LogReport())

trainer.extend(extensions.MicroAverage(
    'main/correct', 'main/total', 'main/accuracy'))
trainer.extend(extensions.MicroAverage(
    'validation/main/correct', 'validation/main/total',
    'validation/main/accuracy'))

trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/loss',
     'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))

trainer.run()
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           1.75582                           0.268018                                 0.772637
2           1.0503      1.52234               0.63964        0.448087                  1.74078
3           0.752925                          0.743243                                 2.52495
4           1.21727     1.46956               0.745495       0.456284                  3.49669
5           0.681582                          0.817568                                 4.24974
6           0.477964    1.5514                0.880631       0.480874                  5.22265
7           0.38437                           0.916667                                 5.98324
8           0.30405     1.68066               0.923423       0.469945                  6.94833
9           0.222884                          0.959459                                 7.69772
10          0.175159    1.79104               0.977477       0.478142                  8.67923
11          0.142888                          0.97973                                  9.43108
12          0.118272    1.87948               0.986486       0.47541                   10.4046
13          0.0991659                         0.997748                                 11.1994
14          0.0841932   1.95415               0.997748       0.478142                  12.1657
15          0.0723124                         0.997748                                 12.9141
16          0.0627568   2.01682               0.997748       0.480874                  13.8787
17          0.0549726                         1                                        14.6336
18          0.04857     2.07107               1              0.478142                  15.6061
19          0.0432675                         1                                        16.3584
20          0.0388425   2.1181                1              0.480874                  17.3297
21          0.035117                          1                                        18.0761
22          0.0319522   2.15905               1              0.478142                  19.0487
23          0.0292416                         1                                        19.8416
24          0.0269031   2.1951                1              0.480874                  20.8083
25          0.0248729                         1                                        21.5566
26          0.0231      2.22721               1              0.483607                  22.5304
27          0.0215427                         1                                        23.2878
28          0.0201669   2.25614               1              0.486339                  24.2565
29          0.018944                          1                                        25.0171
30          0.017851    2.28247               1              0.480874                  26.0063
31          0.0168687                         1                                        26.7633
32          0.0159814   2.30664               1              0.483607                  27.7331
33          0.0151763                         1                                        28.5342
34          0.0144427   2.32898               1              0.483607                  29.5039
35          0.0137716                         1                                        30.257
36          0.0131555   2.34976               1              0.483607                  31.2306
37          0.0125881                         1                                        31.9842
38          0.0120638   2.3692                1              0.483607                  32.9617
39          0.0115783                         1                                        33.7175
40          0.0111272   2.38747               1              0.483607                  34.6946

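It got much faster! All 40 epochs finish in about 35 seconds. Note, however, that the training accuracy reaches 1 while the validation accuracy stays around 0.48 and the validation loss keeps rising, so the model overfits the training data.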
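
The accuracy columns in the log come from `extensions.MicroAverage`, which accumulates the reported `correct` and `total` counts and divides their sums. As a rough sketch of what that means, the plain-Python example below contrasts this with averaging the per-batch ratios. Both function names and the counts are hypothetical and are not Chainer's implementation.

```python
def micro_average(records):
    """Sum numerators and denominators first, then divide once."""
    correct = sum(c for c, _ in records)
    total = sum(t for _, t in records)
    return correct / total


def mean_of_ratios(records):
    """Average the per-batch ratios, giving every batch equal weight."""
    return sum(c / t for c, t in records) / len(records)


# Hypothetical (correct, total) counts from two batches of unequal size.
records = [(90, 100), (1, 10)]

micro = micro_average(records)   # 91 / 110
ratio = mean_of_ratios(records)  # (0.9 + 0.1) / 2
```

The two values differ whenever batches contain different numbers of nodes, which is the usual case for trees of varying size, so the micro average is the one that matches per-node accuracy.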

References

[1] 深層学習による自然言語処理 (Natural Language Processing by Deep Learning), 機械学習プロフェッショナルシリーズ (Machine Learning Professional Series)

[2] [A Fast Unified Model for Parsing and Sentence Understanding](http://nlp.stanford.edu/pubs/bowman2016spinn.pdf)