Sentiment Analysis with Recursive Neural Network¶
Note: This notebook is based on chainer/examples/sentiment. If you want to run it as a script, please refer to that example.
In this notebook, we will analyze the sentiment of documents using a Recursive Neural Network.
First, we execute the following cell and install “Chainer” and its GPU back end “CuPy”. If the “runtime type” of Colaboratory is GPU, you can run Chainer with GPU as a backend.
[1]:
!curl https://colab.chainer.org/install | sh -
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
libcusparse8.0 libnvrtc8.0 libnvtoolsext1
0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.
Need to get 28.9 MB of archives.
After this operation, 71.6 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu artful/multiverse amd64 libcusparse8.0 amd64 8.0.61-1 [22.6 MB]
Get:2 http://archive.ubuntu.com/ubuntu artful/multiverse amd64 libnvrtc8.0 amd64 8.0.61-1 [6,225 kB]
Get:3 http://archive.ubuntu.com/ubuntu artful/multiverse amd64 libnvtoolsext1 amd64 8.0.61-1 [32.2 kB]
Fetched 28.9 MB in 2s (10.4 MB/s)
Selecting previously unselected package libcusparse8.0:amd64.
(Reading database ... 18298 files and directories currently installed.)
Preparing to unpack .../libcusparse8.0_8.0.61-1_amd64.deb ...
Unpacking libcusparse8.0:amd64 (8.0.61-1) ...
Selecting previously unselected package libnvrtc8.0:amd64.
Preparing to unpack .../libnvrtc8.0_8.0.61-1_amd64.deb ...
Unpacking libnvrtc8.0:amd64 (8.0.61-1) ...
Selecting previously unselected package libnvtoolsext1:amd64.
Preparing to unpack .../libnvtoolsext1_8.0.61-1_amd64.deb ...
Unpacking libnvtoolsext1:amd64 (8.0.61-1) ...
Setting up libnvtoolsext1:amd64 (8.0.61-1) ...
Setting up libcusparse8.0:amd64 (8.0.61-1) ...
Setting up libnvrtc8.0:amd64 (8.0.61-1) ...
Processing triggers for libc-bin (2.26-0ubuntu2.1) ...
Let’s import the necessary modules, then check the versions of Chainer, NumPy, CuPy, CUDA, and the rest of the execution environment.
[12]:
import collections
import numpy as np
import chainer
from chainer import cuda
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
from chainer import reporter
chainer.print_runtime_info()
Chainer: 4.1.0
NumPy: 1.14.3
CuPy:
CuPy Version : 4.1.0
CUDA Root : None
CUDA Build Version : 8000
CUDA Driver Version : 9000
CUDA Runtime Version : 8000
cuDNN Build Version : 7102
cuDNN Version : 7102
NCCL Build Version : 2104
1. Preparation of training data¶
In this notebook, we use the training data prepared by chainer/examples/sentiment/download.py. Let’s run the following cell to download the necessary training data and unzip it.
[ ]:
# download.py
import os.path
from six.moves.urllib import request
import zipfile
request.urlretrieve(
    'https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip',
    'trainDevTestTrees_PTB.zip')
zf = zipfile.ZipFile('trainDevTestTrees_PTB.zip')
for name in zf.namelist():
    (dirname, filename) = os.path.split(name)
    if not filename == '':
        zf.extract(name, '.')
Let’s execute the following command to check that the training data has been prepared. If the output lists dev.txt, test.txt, and train.txt, everything is in place.
[14]:
!ls trees
dev.txt test.txt train.txt
Let’s look at the first line of dev.txt and see how each sample is written.
[15]:
!head trees/dev.txt -n1
(3 (2 It) (4 (4 (2 's) (4 (3 (2 a) (4 (3 lovely) (2 film))) (3 (2 with) (4 (3 (3 lovely) (2 performances)) (2 (2 by) (2 (2 (2 Buy) (2 and)) (2 Accorsi))))))) (2 .)))
As displayed above, each sample is defined by a tree structure.
The tree structure is recursively defined as (value, node), and the class label for node is value.
The class labels 0, 1, 2, 3, and 4 represent really negative, negative, neutral, positive, and really positive, respectively.
(Figure: tree representation of one sample.)
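To make the format concrete, here is a minimal sketch (not part of the original example) that reads one such string and lists the label attached to every phrase. The helper names parse_sexp and phrases are hypothetical, and the sample string is a subtree copied from the dev.txt line shown above.

```python
import re


def parse_sexp(line):
    """Parse '(label child ...)' into nested lists of strings."""
    tokens = re.findall(r'\(|\)|[^() ]+', line)
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != '(':
            return token
        children = []
        while tokens[pos] != ')':
            children.append(parse())
        pos += 1  # skip the closing ')'
        return children

    return parse()


def phrases(exp, out):
    """Append (label, text) for every node to out; return this node's text."""
    if len(exp) == 2:          # leaf: [label, word]
        text = exp[1]
    else:                      # internal node: [label, left, right]
        text = phrases(exp[1], out) + ' ' + phrases(exp[2], out)
    out.append((int(exp[0]), text))
    return text


sample = '(3 (2 a) (4 (3 lovely) (2 film)))'
found = []
phrases(parse_sexp(sample), found)
for label, text in found:
    print(label, text)
```

Every phrase, from single words up to the whole string, carries its own label, which is why the network later predicts a label at every node and not only at the root.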
2. Setting parameters¶
Here we set the parameters for training.
- n_epoch: Number of epochs. How many times we pass through the whole training data.
- n_units: Number of units. The number of dimensions of the hidden state vector at each Recursive Neural Network node.
- batchsize: Batch size. How many training samples are processed as one block when updating the parameters.
- n_label: Number of labels. The number of classes to be identified. Since there are 5 labels this time, it is 5.
- epoch_per_eval: How often to perform validation, in epochs.
- is_test: If True, we use a small dataset.
- gpu_id: GPU ID. The ID of the GPU to use. For Colaboratory, 0 is a good choice.
[ ]:
# parameters
n_epoch = 100 # number of epochs
n_units = 30 # number of units per layer
batchsize = 25 # minibatch size
n_label = 5 # number of labels
epoch_per_eval = 5 # number of epochs per evaluation
is_test = True
gpu_id = 0
if is_test:
    max_size = 10
else:
    max_size = None
3. Preparing the iterator¶
Let’s read the datasets used for training, validation, and testing, and create an Iterator.
First, we convert each sample from its str representation into tree-structured data represented by a dictionary.
We parse each line with read_corpus, which uses the parser SexpParser. After that, we convert each parsed sample into tree-structured data with convert_tree. In this way, a label is expressed as an int, a node as a two-element tuple, and a tree structure as a dictionary, which is much easier to handle than the original string.
[ ]:
# data.py
import codecs
import re
class SexpParser(object):
    def __init__(self, line):
        self.tokens = re.findall(r'\(|\)|[^\(\) ]+', line)
        self.pos = 0
    def parse(self):
        assert self.pos < len(self.tokens)
        token = self.tokens[self.pos]
        assert token != ')'
        self.pos += 1
        if token == '(':
            children = []
            while True:
                assert self.pos < len(self.tokens)
                if self.tokens[self.pos] == ')':
                    self.pos += 1
                    break
                else:
                    children.append(self.parse())
            return children
        else:
            return token
def read_corpus(path, max_size):
    with codecs.open(path, encoding='utf-8') as f:
        trees = []
        for line in f:
            line = line.strip()
            tree = SexpParser(line).parse()
            trees.append(tree)
            if max_size and len(trees) >= max_size:
                break
    return trees
def convert_tree(vocab, exp):
    assert isinstance(exp, list) and (len(exp) == 2 or len(exp) == 3)
    if len(exp) == 2:
        # leaf: [label, word]
        label, leaf = exp
        if leaf not in vocab:
            vocab[leaf] = len(vocab)
        return {'label': int(label), 'node': vocab[leaf]}
    elif len(exp) == 3:
        # internal node: [label, left subtree, right subtree]
        label, left, right = exp
        node = (convert_tree(vocab, left), convert_tree(vocab, right))
        return {'label': int(label), 'node': node}
Let’s use read_corpus() and convert_tree() to create an iterator.
[ ]:
vocab = {}
train_data = [convert_tree(vocab, tree)
              for tree in read_corpus('trees/train.txt', max_size)]
train_iter = chainer.iterators.SerialIterator(train_data, batchsize)
validation_data = [convert_tree(vocab, tree)
                   for tree in read_corpus('trees/dev.txt', max_size)]
validation_iter = chainer.iterators.SerialIterator(validation_data, batchsize,
                                                   repeat=False, shuffle=False)
test_data = [convert_tree(vocab, tree)
             for tree in read_corpus('trees/test.txt', max_size)]
Let’s display the first element of test_data. It is represented by the following tree structure: label is the score of that node, and the numerical value stored in node for a leaf is the word ID in the dictionary vocab.
[19]:
print(test_data[0])
{'label': 2, 'node': ({'label': 3, 'node': ({'label': 3, 'node': 252}, {'label': 2, 'node': 71})}, {'label': 1, 'node': ({'label': 1, 'node': 253}, {'label': 2, 'node': 254})})}
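To get a feel for this data structure, the following minimal sketch walks the dictionary recursively. The helper names count_nodes and word_ids are hypothetical and not part of the original example; the sample is the output printed above, copied verbatim.

```python
def count_nodes(tree):
    """Return (number of leaves, number of internal nodes) of a converted tree."""
    child = tree['node']
    if isinstance(child, int):      # leaf: 'node' holds a word ID
        return 1, 0
    left, right = child             # internal node: 'node' holds two subtrees
    l_leaves, l_inner = count_nodes(left)
    r_leaves, r_inner = count_nodes(right)
    return l_leaves + r_leaves, l_inner + r_inner + 1


def word_ids(tree):
    """Return the word IDs of the leaves from left to right."""
    child = tree['node']
    if isinstance(child, int):
        return [child]
    return word_ids(child[0]) + word_ids(child[1])


sample = {'label': 2, 'node': (
    {'label': 3, 'node': ({'label': 3, 'node': 252}, {'label': 2, 'node': 71})},
    {'label': 1, 'node': ({'label': 1, 'node': 253}, {'label': 2, 'node': 254})})}

print(count_nodes(sample))  # (4, 3)
print(word_ids(sample))     # [252, 71, 253, 254]
```

A binary tree with I leaves always has I - 1 internal nodes, so this sample with 4 words has 7 labeled nodes in total.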
4. Preparing the model¶
Let’s define the network.
We traverse each node of the tree-structured data with traverse and calculate the loss loss of the whole tree. traverse is implemented as a recursive call that visits the child nodes in turn. (This is a common implementation when handling tree-structured data!)
First, we calculate the hidden state vector v. For a leaf node, the leaf method looks up the hidden state vector stored in embed for the word ID word. For an internal node, the node method computes the hidden state vector from the hidden state vectors left and right of the two child nodes.
loss += F.softmax_cross_entropy(y, t) adds the loss of the current node to the losses of its child nodes, and return loss, v passes the accumulated loss and the hidden state vector up to the parent node.
The lines after loss += F.softmax_cross_entropy(y, t) log the accuracy and related counts. They are not necessary for the model definition itself.
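The recursion described above can be sketched without Chainer. The following is only an illustration in plain Python, assuming toy sizes, random weights, and no bias terms (the L.Linear links in the notebook do have a bias); the names embed, w_node, and w_label are hypothetical stand-ins for the model's links.

```python
import math
import random

random.seed(0)
N_UNITS, N_LABEL, N_VOCAB = 3, 5, 4   # toy sizes, chosen only for this sketch


def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


embed = rand_matrix(N_VOCAB, N_UNITS)        # one hidden vector per word ID
w_node = rand_matrix(N_UNITS, 2 * N_UNITS)   # combines the concatenation [left; right]
w_label = rand_matrix(N_LABEL, N_UNITS)      # hidden vector -> label scores


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cross_entropy(scores, target):
    """Softmax cross-entropy of one score vector against one class index."""
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[target]


def traverse(node):
    """Return (summed loss of the subtree, hidden vector of its root)."""
    if isinstance(node['node'], int):                  # leaf node
        loss, v = 0.0, embed[node['node']]
    else:                                              # internal node
        left_loss, left = traverse(node['node'][0])
        right_loss, right = traverse(node['node'][1])
        v = [math.tanh(x) for x in matvec(w_node, left + right)]
        loss = left_loss + right_loss
    loss += cross_entropy(matvec(w_label, v), node['label'])
    return loss, v


tree = {'label': 3, 'node': ({'label': 2, 'node': 0},
                             {'label': 4, 'node': ({'label': 3, 'node': 1},
                                                   {'label': 2, 'node': 2})})}
loss, v = traverse(tree)
print(loss, len(v))
```

With weights this close to zero, every node predicts an almost uniform distribution, so the loss is close to 5 nodes times log(5).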
[ ]:
class RecursiveNet(chainer.Chain):
    def traverse(self, node, evaluate=None, root=True):
        if isinstance(node['node'], int):
            # leaf node
            word = self.xp.array([node['node']], np.int32)
            loss = 0
            v = self.leaf(word)
        else:
            # internal node
            left_node, right_node = node['node']
            left_loss, left = self.traverse(left_node, evaluate=evaluate, root=False)
            right_loss, right = self.traverse(right_node, evaluate=evaluate, root=False)
            v = self.node(left, right)
            loss = left_loss + right_loss
        y = self.label(v)
        label = self.xp.array([node['label']], np.int32)
        t = chainer.Variable(label)
        loss += F.softmax_cross_entropy(y, t)
        # bookkeeping for accuracy (not needed for the model itself)
        predict = cuda.to_cpu(y.data.argmax(1))
        if predict[0] == node['label']:
            evaluate['correct_node'] += 1
        evaluate['total_node'] += 1
        if root:
            if predict[0] == node['label']:
                evaluate['correct_root'] += 1
            evaluate['total_root'] += 1
        return loss, v
    def __init__(self, n_vocab, n_units):
        super(RecursiveNet, self).__init__()
        with self.init_scope():
            self.embed = L.EmbedID(n_vocab, n_units)
            self.l = L.Linear(n_units * 2, n_units)
            self.w = L.Linear(n_units, n_label)
    def leaf(self, x):
        return self.embed(x)
    def node(self, left, right):
        return F.tanh(self.l(F.concat((left, right))))
    def label(self, v):
        return self.w(v)
    def __call__(self, x):
        accum_loss = 0.0
        result = collections.defaultdict(lambda: 0)
        for tree in x:
            loss, _ = self.traverse(tree, evaluate=result)
            accum_loss += loss
        reporter.report({'loss': accum_loss}, self)
        reporter.report({'total': result['total_node']}, self)
        reporter.report({'correct': result['correct_node']}, self)
        return accum_loss
One point to note is the implementation of __call__.
The x passed to __call__ is a mini-batch of input data and contains samples s_n, as in [s_1, s_2, ..., s_N].
In a network such as a Convolutional Network used for image recognition, the whole mini-batch x can be computed in parallel at once. In a tree-structured network like this one, however, parallel computation is difficult for the following reasons.
- Data length varies depending on samples.
- The order of calculation for each sample is different.
So the implementation computes each sample separately and sums the results at the end.
Note: Actually, a Recursive Neural Network can process a mini-batch in parallel by using a stack. This is covered in the (Advanced) section in the latter part of this notebook, so please refer to it.
[ ]:
model = RecursiveNet(len(vocab), n_units)
if gpu_id >= 0:
    model.to_gpu()
# Setup optimizer
optimizer = chainer.optimizers.AdaGrad(lr=0.1)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer_hooks.WeightDecay(0.0001))
5. Preparing the Updater and Trainer, and training¶
As usual, we define an updater and a trainer to train the model. This time, we do not use L.Classifier and compute the accuracy ourselves. This is easy to implement with extensions.MicroAverage. For details, please refer to the documentation of chainer.training.extensions.MicroAverage.
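As a rough illustration of what a micro average means here (a sketch of the idea, not of Chainer's implementation): the model reports a correct count and a total count for every mini-batch, and the accuracy is the sum of the correct counts divided by the sum of the totals. The three batches below are made-up numbers used only for this example.

```python
# (correct, total) node counts for three hypothetical mini-batches
batches = [(30, 40), (5, 10), (90, 100)]

# Micro average: sum the counts first, then divide once.
micro = sum(c for c, _ in batches) / sum(t for _, t in batches)

# Macro average, for comparison: average the per-batch ratios.
macro = sum(c / t for c, t in batches) / len(batches)

print(round(micro, 4))   # 0.8333
print(round(macro, 4))   # 0.7167
```

Because trees differ in size, mini-batches contain different numbers of nodes, so the micro average is the one that equals the fraction of correctly labeled nodes.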
[22]:
def _convert(batch, device):
    return batch
updater = chainer.training.StandardUpdater(
    train_iter, optimizer, device=gpu_id, converter=_convert)
trainer = chainer.training.Trainer(updater, (n_epoch, 'epoch'))
trainer.extend(
    extensions.Evaluator(validation_iter, model, device=gpu_id, converter=_convert),
    trigger=(epoch_per_eval, 'epoch'))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.MicroAverage(
    'main/correct', 'main/total', 'main/accuracy'))
trainer.extend(extensions.MicroAverage(
    'validation/main/correct', 'validation/main/total',
    'validation/main/accuracy'))
trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/loss',
     'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.run()
epoch  main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1      1707.8                            0.155405                                 14.6668
2      586.467     556.419               0.497748       0.396175                  17.3996
3      421.267                           0.657658                                 19.3942
4      320.414     628.025               0.772523       0.42623                   22.2462
5      399.621                           0.704955                                 24.208
6      318.544     595.03                0.786036       0.420765                  27.0585
7      231.529                           0.880631                                 29.0178
8      160.546     628.959               0.916667       0.431694                  31.7562
9      122.076                           0.957207                                 33.8269
10     93.6623     669.898               0.975225       0.445355                  36.5802
11     74.2366                           0.986486                                 38.5855
12     60.2297     701.062               0.990991       0.448087                  41.4308
13     49.7152                           0.997748                                 43.414
14     41.633      724.893               0.997748       0.453552                  46.1698
15     35.3564                           0.997748                                 48.1999
16     30.402      744.493               1              0.448087                  50.9842
17     26.4137                           1                                        53.0605
18     23.188      760.43                1              0.459016                  55.7924
19     20.5913                           1                                        57.7479
20     18.4666     773.808               1              0.461749                  60.5636
21     16.698                            1                                        62.52
22     15.2066     785.205               1              0.461749                  65.2603
23     13.9351                           1                                        67.3052
24     12.8404     794.963               1              0.461749                  70.0323
25     11.8897                           1                                        71.9788
26     11.0575     803.388               1              0.459016                  74.7653
27     10.3237                           1                                        76.7485
28     9.67249     810.727               1              0.472678                  79.4539
29     9.09113                           1                                        81.4813
30     8.56935     817.176               1              0.480874                  84.1942
31     8.09874                           1                                        86.2475
32     7.6724      822.889               1              0.480874                  88.956
33     7.2846                            1                                        90.9035
34     6.93052     827.989               1              0.480874                  93.6949
35     6.60615                           1                                        95.6557
36     6.30805     832.574               1              0.486339                  98.4233
37     6.03332                           1                                        100.456
38     5.77941     836.724               1              0.486339                  103.18
39     5.5442                            1                                        105.129
40     5.32575     840.507               1              0.486339                  107.954
6. Checking the performance with test data¶
[23]:
def evaluate(model, test_trees):
    result = collections.defaultdict(lambda: 0)
    with chainer.using_config('train', False), chainer.no_backprop_mode():
        for tree in test_trees:
            model.traverse(tree, evaluate=result)
    acc_node = 100.0 * result['correct_node'] / result['total_node']
    acc_root = 100.0 * result['correct_root'] / result['total_root']
    # str.format does not treat '%' specially, so a single '%' is enough
    print(' Node accuracy: {0:.2f} % ({1:,d}/{2:,d})'.format(
        acc_node, result['correct_node'], result['total_node']))
    print(' Root accuracy: {0:.2f} % ({1:,d}/{2:,d})'.format(
        acc_root, result['correct_root'], result['total_root']))
print('Test evaluation')
evaluate(model, test_data)
Test evaluation
Node accuracy: 54.49 % (170/312)
Root accuracy: 50.00 % (5/10)
(Advanced) Mini-batching in Recursive Neural Network[1]¶
It is difficult for a Recursive Neural Network to process mini-batched data in parallel for the following reasons.
- Data length varies depending on samples.
- The order of calculation for each sample is different.
However, by using a stack, a Recursive Neural Network can process a mini-batch in parallel.
Preparation of Dataset, Iterator¶
First, we convert the recursive calculation of the Recursive Neural Network into a sequential calculation that uses a stack.
To do so, each node of every tree in the dataset is assigned a number in “returning order” (post-order).
Returning order is a way of numbering the nodes of a tree so that every child node receives a smaller number than its parent node. If you process the nodes in ascending order of these numbers, every child node is visited before its parent node.
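The numbering can be sketched on a toy tree as follows. This is a simplified illustration, assuming the same convention as the code below (leaves are numbered first, from left to right, and internal nodes afterwards in returning order). The function number_nodes and the nested-tuple tree format are hypothetical and used only for this example.

```python
def number_nodes(tree):
    """Number leaves 0..I-1 left to right, then internal nodes in post-order.

    `tree` is a nested 2-tuple with strings as leaves. Returns the leaves and
    a list of (left_index, right_index, parent_index) triples in the order in
    which the parents can be computed.
    """
    leaves = []

    def collect(t):
        if isinstance(t, str):
            leaves.append(t)
        else:
            collect(t[0])
            collect(t[1])

    collect(tree)
    next_leaf = 0
    next_node = len(leaves)     # internal nodes are numbered after all leaves
    steps = []

    def visit(t):
        nonlocal next_leaf, next_node
        if isinstance(t, str):
            next_leaf += 1
            return next_leaf - 1
        left = visit(t[0])
        right = visit(t[1])
        steps.append((left, right, next_node))
        next_node += 1
        return next_node - 1

    visit(tree)
    return leaves, steps


leaves, steps = number_nodes((('a', 'b'), ('c', 'd')))
print(leaves)  # ['a', 'b', 'c', 'd']
print(steps)   # [(0, 1, 4), (2, 3, 5), (4, 5, 6)]
```

In every triple, both child indices are smaller than the parent index, so by the time a parent is computed its children are already available.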
[ ]:
def linearize_tree(vocab, root, xp=np):
    # Left node indexes for all parent nodes
    lefts = []
    # Right node indexes for all parent nodes
    rights = []
    # Parent node indexes
    dests = []
    # All labels to predict for all parent nodes
    labels = []
    # All words of leaf nodes
    words = []
    # Leaf labels
    leaf_labels = []
    # Current leaf node index
    leaf_index = [0]
    def traverse_leaf(exp):
        if len(exp) == 2:
            label, leaf = exp
            if leaf not in vocab:
                vocab[leaf] = len(vocab)
            words.append(vocab[leaf])
            leaf_labels.append(int(label))
            leaf_index[0] += 1
        elif len(exp) == 3:
            _, left, right = exp
            traverse_leaf(left)
            traverse_leaf(right)
    traverse_leaf(root)
    # Current internal node index
    node_index = leaf_index
    leaf_index = [0]
    def traverse_node(exp):
        if len(exp) == 2:
            leaf_index[0] += 1
            return leaf_index[0] - 1
        elif len(exp) == 3:
            label, left, right = exp
            l = traverse_node(left)
            r = traverse_node(right)
            lefts.append(l)
            rights.append(r)
            dests.append(node_index[0])
            labels.append(int(label))
            node_index[0] += 1
            return node_index[0] - 1
    traverse_node(root)
    assert len(lefts) == len(words) - 1
    return {
        'lefts': xp.array(lefts, 'i'),
        'rights': xp.array(rights, 'i'),
        'dests': xp.array(dests, 'i'),
        'words': xp.array(words, 'i'),
        'labels': xp.array(labels, 'i'),
        'leaf_labels': xp.array(leaf_labels, 'i'),
    }
[ ]:
xp = cuda.cupy if gpu_id >= 0 else np
vocab = {}
train_data = [linearize_tree(vocab, t, xp)
              for t in read_corpus('trees/train.txt', max_size)]
train_iter = chainer.iterators.SerialIterator(train_data, batchsize)
validation_data = [linearize_tree(vocab, t, xp)
                   for t in read_corpus('trees/dev.txt', max_size)]
validation_iter = chainer.iterators.SerialIterator(
    validation_data, batchsize, repeat=False, shuffle=False)
test_data = [linearize_tree(vocab, t, xp)
             for t in read_corpus('trees/test.txt', max_size)]
Let’s display the first element of test_data.
- lefts: the index of the left child node for each parent node in dests
- rights: the index of the right child node for each parent node in dests
- dests: the index of each parent node
- words: the word IDs of the leaf nodes
- labels: the labels of the parent nodes
- leaf_labels: the labels of the leaf nodes
[26]:
print(test_data[0])
{'lefts': array([0, 2, 4], dtype=int32), 'rights': array([1, 3, 5], dtype=int32), 'dests': array([4, 5, 6], dtype=int32), 'words': array([252, 71, 253, 254], dtype=int32), 'labels': array([3, 1, 2], dtype=int32), 'leaf_labels': array([3, 2, 1, 2], dtype=int32)}
Definition of mini-batchable models¶
A Recursive Neural Network has two operations: operation A, which computes the embedding vector of a leaf node, and operation B, which computes the hidden state vector of a parent node from the hidden state vectors of its two child nodes.
For each sample, we assign an index to each node in returning order. If you traverse the nodes in that order, operation A is performed at the leaf nodes and operation B at all other nodes.
This procedure can also be regarded as scanning the tree structure with a stack. A stack is a last-in, first-out data structure that supports two operations: a push operation that adds data and a pop operation that retrieves the most recently pushed data.
For operation A, push the calculation result onto the stack. For operation B, pop two entries and push the new calculation result.
Because the tree structure differs for each sample, a naive parallel implementation would have to traverse every tree and decide node by node whether to perform operation A or operation B. With a stack, however, trees of different shapes can all be computed by the same simple repeated processing. Therefore, parallelization is possible.
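The push and pop procedure can be sketched with a plain Python list as the stack. This is an illustration only: run_stack and pair are hypothetical names, and strings stand in for hidden state vectors so that the result shows the tree shape.

```python
def run_stack(ops, leaves, combine):
    """Evaluate a tree given as a sequence of operations in returning order.

    'A' pushes the next leaf; 'B' pops two entries and pushes their combination.
    """
    stack = []
    leaf_iter = iter(leaves)
    for op in ops:
        if op == 'A':
            stack.append(next(leaf_iter))
        else:
            right = stack.pop()     # the right child was pushed last
            left = stack.pop()
            stack.append(combine(left, right))
    assert len(stack) == 1
    return stack[0]


def pair(left, right):
    return '(' + left + ' ' + right + ')'


# Two different tree shapes, handled by the same loop.
print(run_stack('AABAABB', 'abcd', pair))   # ((a b) (c d))
print(run_stack('AAABB', 'abc', pair))      # (a (b c))
```

The loop body never inspects the tree itself, only the next operation, which is what makes it possible to run many samples through the same steps side by side.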
[ ]:
from chainer import cuda
from chainer.utils import type_check
class ThinStackSet(chainer.Function):
    """Set values to a thin stack."""
    def check_type_forward(self, in_types):
        type_check.expect(in_types.size() == 3)
        s_type, i_type, v_type = in_types
        type_check.expect(
            s_type.dtype.kind == 'f',
            i_type.dtype.kind == 'i',
            s_type.dtype == v_type.dtype,
            s_type.ndim == 3,
            i_type.ndim == 1,
            v_type.ndim == 2,
            s_type.shape[0] >= i_type.shape[0],
            i_type.shape[0] == v_type.shape[0],
            s_type.shape[2] == v_type.shape[1],
        )
    def forward(self, inputs):
        xp = cuda.get_array_module(*inputs)
        stack, indices, values = inputs
        stack[xp.arange(len(indices)), indices] = values
        return stack,
    def backward(self, inputs, grads):
        xp = cuda.get_array_module(*inputs)
        _, indices, _ = inputs
        g = grads[0]
        gv = g[xp.arange(len(indices)), indices]
        g[xp.arange(len(indices)), indices] = 0
        return g, None, gv
def thin_stack_set(s, i, x):
    return ThinStackSet()(s, i, x)
In addition, we use a thin stack[2] here instead of a simple stack.
Let the sentence length be \(I\) and the number of dimensions of the hidden vector be \(D\). The thin stack uses memory efficiently by storing everything in a single \((2I-1) \times D\) matrix.
A normal stack requires \(O(I^2 D)\) space, whereas the thin stack requires only \(O(ID)\).
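The difference can be made concrete with a little arithmetic. The sketch below assumes, for illustration only, that the \(O(I^2 D)\) figure comes from keeping a snapshot of a stack with up to \(I\) rows at each of the \(2I-1\) steps; the function names are hypothetical, and D = 30 matches n_units in this notebook.

```python
def thin_stack_floats(sentence_length, n_units):
    """Entries in a thin stack: one D-dimensional row per tree node."""
    return (2 * sentence_length - 1) * n_units


def copied_stack_floats(sentence_length, n_units):
    """Upper bound if a full stack (at most I rows) were stored at every step.

    Assumption for this sketch only: one stack snapshot per step.
    """
    steps = 2 * sentence_length - 1
    return steps * sentence_length * n_units


for length in (10, 20, 40):
    print(length, thin_stack_floats(length, 30), copied_stack_floats(length, 30))
```

Doubling the sentence length roughly doubles the first figure but roughly quadruples the second.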
It is realized by the push operation thin_stack_set and the pop operation thin_stack_get.
First of all, we define ThinStackSet and ThinStackGet, which inherit from chainer.Function.
ThinStackSet is, as its name says, a function that sets values on the thin stack.
The inputs of forward and backward can be unpacked as stack, indices, values = inputs.
stack is the thin stack itself. It is shared between functions by passing it as a function argument.
Because chainer.Function does not hold internal state, stack is handled externally and passed in as a function argument.
[ ]:
class ThinStackGet(chainer.Function):
    def check_type_forward(self, in_types):
        type_check.expect(in_types.size() == 2)
        s_type, i_type = in_types
        type_check.expect(
            s_type.dtype.kind == 'f',
            i_type.dtype.kind == 'i',
            s_type.ndim == 3,
            i_type.ndim == 1,
            s_type.shape[0] >= i_type.shape[0],
        )
    def forward(self, inputs):
        xp = cuda.get_array_module(*inputs)
        stack, indices = inputs
        return stack[xp.arange(len(indices)), indices], stack
    def backward(self, inputs, grads):
        xp = cuda.get_array_module(*inputs)
        stack, indices = inputs
        g, gs = grads
        if gs is None:
            gs = xp.zeros_like(stack)
        if g is not None:
            gs[xp.arange(len(indices)), indices] += g
        return gs, None
def thin_stack_get(s, i):
    return ThinStackGet()(s, i)
ThinStackGet is, as its name says, a function that retrieves values from the thin stack.
The inputs of forward and backward can be unpacked as stack, indices = inputs.
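The indexing used by both functions, stack[xp.arange(len(indices)), indices], selects one slot per sample: row b of the batch is read or written at position indices[b]. The following plain-Python sketch mimics that behavior with nested lists. The names stack_set and stack_get are hypothetical, and strings stand in for hidden state vectors.

```python
def stack_set(stack, indices, values):
    """Write values[b] into row b of the stack at position indices[b]."""
    for b, (i, v) in enumerate(zip(indices, values)):
        stack[b][i] = v
    return stack


def stack_get(stack, indices):
    """Read, for each row b, the entry at position indices[b]."""
    return [stack[b][i] for b, i in enumerate(indices)]


# Two samples (rows) with three slots each.
stack = [[None] * 3 for _ in range(2)]
stack_set(stack, [0, 0], ['a0', 'b0'])     # first leaf of each sample
stack_set(stack, [1, 1], ['a1', 'b1'])     # second leaf of each sample
left = stack_get(stack, [0, 0])
right = stack_get(stack, [1, 1])
stack_set(stack, [2, 2], [l + '+' + r for l, r in zip(left, right)])
print(stack)
```

Note that reading does not remove anything: each node has its own slot, addressed by its index, so nothing is ever overwritten.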
[ ]:
class ThinStackRecursiveNet(chainer.Chain):
    def __init__(self, n_vocab, n_units, n_label):
        super(ThinStackRecursiveNet, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            l=L.Linear(n_units * 2, n_units),
            w=L.Linear(n_units, n_label))
        self.n_units = n_units
    def leaf(self, x):
        return self.embed(x)
    def node(self, left, right):
        return F.tanh(self.l(F.concat((left, right))))
    def label(self, v):
        return self.w(v)
    def __call__(self, *inputs):
        batch = len(inputs) // 6
        lefts = inputs[0: batch]
        rights = inputs[batch: batch * 2]
        dests = inputs[batch * 2: batch * 3]
        labels = inputs[batch * 3: batch * 4]
        sequences = inputs[batch * 4: batch * 5]
        leaf_labels = inputs[batch * 5: batch * 6]
        inds = np.argsort([-len(l) for l in lefts])
        # Sort all arrays in descending order and transpose them
        lefts = F.transpose_sequence([lefts[i] for i in inds])
        rights = F.transpose_sequence([rights[i] for i in inds])
        dests = F.transpose_sequence([dests[i] for i in inds])
        labels = F.transpose_sequence([labels[i] for i in inds])
        sequences = F.transpose_sequence([sequences[i] for i in inds])
        leaf_labels = F.transpose_sequence([leaf_labels[i] for i in inds])
        batch = len(inds)
        maxlen = len(sequences)
        loss = 0
        count = 0
        correct = 0
        # thin stack
        stack = self.xp.zeros((batch, maxlen * 2, self.n_units), 'f')
        # Compute the hidden state vectors and the loss of the leaf nodes
        for i, (word, label) in enumerate(zip(sequences, leaf_labels)):
            batch = word.shape[0]
            es = self.leaf(word)
            ds = self.xp.full((batch,), i, 'i')
            y = self.label(es)
            loss += F.softmax_cross_entropy(y, label, normalize=False) * batch
            count += batch
            predict = self.xp.argmax(y.data, axis=1)
            correct += (predict == label.data).sum()
            stack = thin_stack_set(stack, ds, es)
        # Compute the hidden state vectors and the loss of the internal nodes
        for left, right, dest, label in zip(lefts, rights, dests, labels):
            l, stack = thin_stack_get(stack, left)
            r, stack = thin_stack_get(stack, right)
            o = self.node(l, r)
            y = self.label(o)
            batch = l.shape[0]
            loss += F.softmax_cross_entropy(y, label, normalize=False) * batch
            count += batch
            predict = self.xp.argmax(y.data, axis=1)
            correct += (predict == label.data).sum()
            stack = thin_stack_set(stack, dest, o)
        loss /= count
        reporter.report({'loss': loss}, self)
        reporter.report({'total': count}, self)
        reporter.report({'correct': correct}, self)
        return loss
[ ]:
model = ThinStackRecursiveNet(len(vocab), n_units, n_label)
if gpu_id >= 0:
    model.to_gpu()
optimizer = chainer.optimizers.AdaGrad(0.1)
optimizer.setup(model)
<chainer.optimizers.ada_grad.AdaGrad at 0x7f8a3c453710>
Preparing the Updater and Trainer and running the training¶
Let’s train with the new model ThinStackRecursiveNet. Since mini-batches can now be computed in parallel, you can see that training is faster.
[ ]:
def convert(batch, device):
    if device is None:
        def to_device(x):
            return x
    elif device < 0:
        to_device = cuda.to_cpu
    else:
        def to_device(x):
            return cuda.to_gpu(x, device, cuda.Stream.null)
    return tuple(
        [to_device(d['lefts']) for d in batch] +
        [to_device(d['rights']) for d in batch] +
        [to_device(d['dests']) for d in batch] +
        [to_device(d['labels']) for d in batch] +
        [to_device(d['words']) for d in batch] +
        [to_device(d['leaf_labels']) for d in batch]
    )
updater = chainer.training.StandardUpdater(
    train_iter, optimizer, device=None, converter=convert)
trainer = chainer.training.Trainer(updater, (n_epoch, 'epoch'))
trainer.extend(
    extensions.Evaluator(validation_iter, model, converter=convert, device=None),
    trigger=(epoch_per_eval, 'epoch'))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.MicroAverage(
    'main/correct', 'main/total', 'main/accuracy'))
trainer.extend(extensions.MicroAverage(
    'validation/main/correct', 'validation/main/total',
    'validation/main/accuracy'))
trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/loss',
     'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))
trainer.run()
epoch  main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1      1.75582                           0.268018                                 0.772637
2      1.0503      1.52234               0.63964        0.448087                  1.74078
3      0.752925                          0.743243                                 2.52495
4      1.21727     1.46956               0.745495       0.456284                  3.49669
5      0.681582                          0.817568                                 4.24974
6      0.477964    1.5514                0.880631       0.480874                  5.22265
7      0.38437                           0.916667                                 5.98324
8      0.30405     1.68066               0.923423       0.469945                  6.94833
9      0.222884                          0.959459                                 7.69772
10     0.175159    1.79104               0.977477       0.478142                  8.67923
11     0.142888                          0.97973                                  9.43108
12     0.118272    1.87948               0.986486       0.47541                   10.4046
13     0.0991659                         0.997748                                 11.1994
14     0.0841932   1.95415               0.997748       0.478142                  12.1657
15     0.0723124                         0.997748                                 12.9141
16     0.0627568   2.01682               0.997748       0.480874                  13.8787
17     0.0549726                         1                                        14.6336
18     0.04857     2.07107               1              0.478142                  15.6061
19     0.0432675                         1                                        16.3584
20     0.0388425   2.1181                1              0.480874                  17.3297
21     0.035117                          1                                        18.0761
22     0.0319522   2.15905               1              0.478142                  19.0487
23     0.0292416                         1                                        19.8416
24     0.0269031   2.1951                1              0.480874                  20.8083
25     0.0248729                         1                                        21.5566
26     0.0231      2.22721               1              0.483607                  22.5304
27     0.0215427                         1                                        23.2878
28     0.0201669   2.25614               1              0.486339                  24.2565
29     0.018944                          1                                        25.0171
30     0.017851    2.28247               1              0.480874                  26.0063
31     0.0168687                         1                                        26.7633
32     0.0159814   2.30664               1              0.483607                  27.7331
33     0.0151763                         1                                        28.5342
34     0.0144427   2.32898               1              0.483607                  29.5039
35     0.0137716                         1                                        30.257
36     0.0131555   2.34976               1              0.483607                  31.2306
37     0.0125881                         1                                        31.9842
38     0.0120638   2.3692                1              0.483607                  32.9617
39     0.0115783                         1                                        33.7175
40     0.0111272   2.38747               1              0.483607                  34.6946
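It got much faster! The 40 epochs shown here took about 35 seconds, compared with about 108 seconds for the model that processes one sample at a time.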
Reference¶
[1] 深層学習による自然言語処理 (機械学習プロフェッショナルシリーズ) [Natural Language Processing by Deep Learning (Machine Learning Professional Series), in Japanese]
[2] [A Fast Unified Model for Parsing and Sentence Understanding](http://nlp.stanford.edu/pubs/bowman2016spinn.pdf)