Rotation and shear mapping in linear algebra

I created simple examples of rotation and shear mapping to help me understand linear algebra.

My examples are in this repository.

Visualizing these examples was very helpful for understanding eigenvectors and eigenvalues.


The rotation uses the rotation matrix below.

\[
A = \begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix}
\]

A rotated vector is represented like this:

\[
rotatedVector =
\begin{bmatrix}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{bmatrix} v
\]


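\[
\begin{bmatrix}
\cos90 & -\sin90 \\
\sin90 & \cos90
\end{bmatrix}
\begin{bmatrix}
1 \\
1
\end{bmatrix}
=
\begin{bmatrix}
0 & -1 \\
1 & 0
\end{bmatrix}
\begin{bmatrix}
1 \\
1
\end{bmatrix}
=
\begin{bmatrix}
-1 \\
1
\end{bmatrix}
\]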
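As a quick check of the 90-degree rotation, here is a minimal Python sketch that applies the rotation matrix to a vector. The function name `rotate`, the input vector (1, 1), and the rounding are choices made for this illustration; they are not taken from the example code in the repository.

```python
import math

def rotate(v, degrees):
    """Rotate the 2D vector v counterclockwise by the given angle."""
    theta = math.radians(degrees)
    x, y = v
    # Multiply [[cos, -sin], [sin, cos]] by the column vector (x, y)
    return (
        math.cos(theta) * x - math.sin(theta) * y,
        math.sin(theta) * x + math.cos(theta) * y,
    )

# Rotate (1, 1) by 90 degrees; rounding hides floating-point noise
rx, ry = rotate((1, 1), 90)
print(round(rx, 10), round(ry, 10))
```

The x component becomes the negated old y component and the y component becomes the old x component, which is what the matrix with rows (0, -1) and (1, 0) does.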


This is my example code.

Here is the original image.

original image

Here it is rotated by 90 degrees.


Shear mapping

The shear mapping uses the matrix below.

\[
A = \begin{bmatrix}
1 & 1 \\
0 & 1
\end{bmatrix}
\]

A sheared vector is represented like this:

\[
shearedVector =
\begin{bmatrix}
1 & 1 \\
0 & 1
\end{bmatrix} v
\]


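\[
\begin{bmatrix}
1 & 1 \\
0 & 1
\end{bmatrix}
\begin{bmatrix}
1 \\
1
\end{bmatrix}
=
\begin{bmatrix}
2 \\
1
\end{bmatrix}
\]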

The x component of the vector increases, but the y component does not change. Every other vector behaves the same way: only the x component changes.
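The same behaviour can be sketched in a few lines of Python. The helper name `shear` and the parameter `k` are assumptions for illustration; this post only uses the matrix with k = 1.

```python
def shear(v, k=1):
    """Apply the shear matrix [[1, k], [0, 1]] to the 2D vector v."""
    x, y = v
    # x picks up k times y, y is left untouched
    return (x + k * y, y)

for v in [(1, 1), (0, 2), (3, -1), (5, 0)]:
    print(v, "->", shear(v))
```

A vector on the x-axis such as (5, 0) comes back unchanged, which is why it is an eigenvector of this shear matrix with eigenvalue 1.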


This is my example code.

Sheared image.

sheared image

Traveling in Germany and the U.K.

I traveled to Germany and the U.K. from Jul 5 to Jul 27.

My things

  • Two wallets (one is a spare)
  • Clothes (three pairs of underwear, a pair of pants, a pair of shorts, a pair of shorts for sleeping)
  • A multiple power plug adapter
  • A hand towel and a face towel
  • Documents (flight booking information, insurance)
  • A pair of beach sandals
  • Wet tissues
  • A toothbrush
  • A shoulder bag
  • An umbrella
  • Some Ziploc bags
  • A foreign language textbook
  • A windbreaker
  • A portable battery and some cables
  • A three-pronged outlet
  • A string for my wallet
  • A camera
  • An iPhone
  • Earphones

These are the things I brought on my trip. I stuffed them all into my backpack.

Day 1 Berlin

I went to Berlin from Haneda airport via Munich.

First, I arrived at Berlin Central Station and then at a hostel in Berlin. I felt sleepy because of jet lag.

Day 2 Berlin

I visited Berlin first.

The Berlin Wall


Berlin Cathedral


I tried a Berliner Weisse (a Berlin-style beer). It tasted sour and was not the beer for me, and I could not finish it because I was full at the time.

Day 3 Berlin to Frankfurt

I moved to Frankfurt by train.
First I missed my train, and then I had to take another train at an additional cost of about 140 euros😭.

Day 4 Frankfurt

The area around the central station in Frankfurt was pretty dirty. I had a schnitzel with apple wine and beer for lunch. The apple wine was good; I would like to drink it in Japan. After lunch, I took a hop-on hop-off bus to tour the center of the city.

Day 5 Frankfurt to Stuttgart


Day 6 Stuttgart to Konstanz

I only visited the Mercedes-Benz Museum. There are many cars there.

Then I moved by train and stayed in a hotel on the Swiss side.

Day 7 Konstanz

Reichenau Island


Day 8 Konstanz to Garmisch-Partenkirchen

I took a Flixbus coach to Garmisch-Partenkirchen because it was cheaper than the DB train. But the coach had some problems (it was delayed, and the bus stop is about 2 km from Garmisch-Partenkirchen station), so I might not use Flixbus next time.

Day 9 Garmisch-Partenkirchen

I visited Mount Zugspitze and AlpspiX. This town was the best place I visited in Germany.

Day 10 Garmisch-Partenkirchen to Fussen

I went to a street that the woman at the hotel told me was beautiful. The street is 200 meters long and has wall paintings on both sides.

Day 11 Fussen

I went to Neuschwanstein Castle. I think the castle looks better in winter than in summer because its white walls match the snow.

Day 12 Fussen to Munich

Day 13 Munich

I walked around Marienplatz and the surrounding area. The New Town Hall and the Hofbräuhaus am Platzl were especially impressive.

Day 14 Munich to London, U.K.

I flew easyJet from Munich to London. I had assumed that the airline was strict about hand baggage, but nobody checked mine when I boarded.

Day 15 London

I went to the British Museum in the morning. Then I met my friend, and in the afternoon I went to 10 Downing Street, Buckingham Palace, and Japan Center. I had sushi at Japan Center; it was not good, and it was more expensive than in Japan.

Day 16 London

I spent the whole day visiting markets: Spitalfields Market, Covent Garden, and Borough Market. I bought a bracelet for my mother, and I got to eat jellied eels at Borough Market (this was on my to-do list for London!😀).

Day 17 London

I went on a day trip south to the Seven Sisters. It is a beautiful park in the U.K.

Day 18 London to Preston

I met my friend, who showed me around London. We saw the changing of the guard at Buckingham Palace and Tower Bridge (we also saw the bridge open, which is very rare!), and we visited a brewery and tasted some beers. Then we moved to Preston.

Day 19 Preston

The Morning Tea

My friend took me to Blackpool and Blackpool Tower. The tower was modeled on the Eiffel Tower, and I bought some Blackpool rock. I had fish and chips with mushy peas for lunch. It was the only British food I did not enjoy, because it was very oily😅. Then we went to authentic British pubs in Preston, and I stayed at my friend’s home.

Day 20 Preston to Windermere

I had a full English breakfast and moved to Windermere. I arrived at Windermere station in the evening and was pretty tired, so I decided to just stay at the hotel that day.

Day 21 Windermere

Orrest Head is near Windermere station. It takes 15 minutes on foot from the station.

I went to Keswick on the number 555 bus to see Castlerigg Stone Circle.

Then I went to Grasmere, bought some Sarah Nelson’s Grasmere Gingerbread, and hiked from Grasmere to Rydal Mount. It takes about two hours.

Then I moved from Rydal Mount to Waterhead pier and took a ferry to Bowness.

When I arrived in Bowness, I just went to Windermere station and a supermarket for dinner, and then went back to my hotel.

Day 22 Windermere to Heathrow airport

I just moved to Heathrow airport. I was in a hurry because the train I had reserved from Oxenholme Lake District to London was delayed by about 45 minutes. I had never seen that happen before. I was glad I had left enough time, about three hours, before my departure.

Day 23 Tokyo, Japan

It was hot and very humid💦.

Things I brought that turned out not to be useful

iPad

Basically, the things we can do on an iPhone and an iPad are the same. The iPad just has a bigger display. I thought it would be useful for reading books, but I ended up reading on my iPhone instead. Also, I did not have time to use the iPad in hotels or anywhere else. It only added weight to my baggage.

A pair of pants

Germany and the U.K. were hot! I did not need them in July.

Foreign language textbook

I could not speak or understand German after studying it for just a few days.

A three-pronged outlet

This outlet was only for Japan, and I did not notice that. For instance, it did not work at 240 V.

A string for wallet

It was just annoying. In the end, I did not use it.

About traveling for three weeks

I enjoyed the first week. But after two weeks, I was starting to get bored. So I think about 10 days is the right length of a trip for me.

Full-text search for Japanese with ngram full-text parser


Create an index with the ngram full-text parser.

CREATE FULLTEXT INDEX idx_message_log ON message_log (message) WITH PARSER ngram;

ngram Full-Text Parser

We want to run a full-text search over all of our text, which amounts to about 150k rows. The text is written in Japanese.

Our table looks like this:

create table message_log (
    id int(11) unsigned not null,
    message varchar(255) default '' not null,
    primary key (id)
);

If the message column were filled with English or another space-separated language, a regular full-text index would be enough.

CREATE FULLTEXT INDEX idx_message_log ON message_log (message);

However, our messages are in Japanese. In this case, the search misses messages, because Japanese text is not made up of space-separated words.

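For instance, Japanese text looks like this:

好きなメンバーとその理由を教えて下さい!

It is not space-separated. Written with spaces between the words, it would look like this:

好きな メンバー と その 理由 を 教えて 下さい!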

We assume we have these records.

id message
1 好きなメンバーとその理由を教えて下さい!
2 好きな メンバー と その 理由 を 教えて 下さい!

We find messages with the full-text search function.

SELECT * FROM message_log WHERE MATCH (message) AGAINST ('メンバー');

And then get this result.

id message
2 好きな メンバー と その 理由 を 教えて 下さい!

We expected to get both records. However, the result does not include the text 好きなメンバーとその理由を教えて下さい!.

So we create an index with the ngram full-text parser.

CREATE FULLTEXT INDEX idx_message_log ON message_log (message) WITH PARSER ngram;

Search for the messages again.

SELECT * FROM message_log WHERE MATCH (message) AGAINST ('メンバー');

This time we get IDs 1 and 2, as expected.

id message
1 好きなメンバーとその理由を教えて下さい!
2 好きな メンバー と その 理由 を 教えて 下さい!
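The reason the ngram parser finds both rows can be sketched in a few lines of Python. This is only an illustration of the idea, not MySQL's implementation: it assumes a token size of 2 (the default value of `ngram_token_size`), the helper name `ngrams` is hypothetical, and MySQL's real matching and ranking rules are more involved.

```python
def ngrams(text, n=2):
    """Split text into overlapping n-character tokens, skipping spaces."""
    tokens = []
    for chunk in text.split():
        # Chunks shorter than n produce no tokens
        tokens.extend(chunk[i:i + n] for i in range(len(chunk) - n + 1))
    return tokens

rows = {
    1: "好きなメンバーとその理由を教えて下さい!",
    2: "好きな メンバー と その 理由 を 教えて 下さい!",
}
query = ngrams("メンバー")

matches = {}
for row_id, message in rows.items():
    indexed = set(ngrams(message))
    # A row matches here when every token of the query is in its index
    matches[row_id] = all(token in indexed for token in query)
    print(row_id, matches[row_id])
```

Because the tokens are fixed-length character sequences, the index no longer depends on spaces between words, so both the unspaced and the spaced message contain the tokens of the query.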

Building a CI for Golang test

I built a CI environment with Jenkins for Go tests. We run go test in a Docker container, and Jenkins itself also runs in a Docker container.


├── docker
│   ├── dockerfiles # Dockerfiles for unit test
│   └── test
│       ├── # This initializes DB before testing
│       └── # Testing script
└── Jenkinsfile # The configuration for Jenkins pipeline

Environment of CI

Our Jenkins server is a t2.large EC2 instance. Jenkins runs in a Docker container on that server, and the unit tests also run in Docker containers, which the Jenkins container starts through the mounted /var/run/docker.sock.

Jenkins loads the Jenkinsfile and then executes it as a Jenkins pipeline.

How to build an execution environment

Create an AWS EC2 instance

We prepare an EC2 instance with Docker CE installed. Please see the Get Docker CE for CentOS installation guide.

Create a Docker image for golang unit test



FROM jenkins/jenkins:lts

# Switch to root user
USER root

# Install Docker
RUN apt-get update
RUN apt-get install -y \
     apt-transport-https \
     ca-certificates \
     curl \
     gnupg2 \

RUN curl -fsSL | apt-key add -
RUN add-apt-repository \
   "deb [arch=amd64] \
   $(lsb_release -cs) \
RUN apt-get update
RUN apt-get install -y docker-ce
RUN echo "jenkins ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Switch back to jenkins user
USER jenkins

# Set system timezone JST

ENV TZ Asia/Tokyo

Run on a Jenkins host.

$ docker build --rm --tag jenkins-docker:latest .



FROM circleci/golang:1.9

# Install goose
RUN curl | sh
RUN go get

# Set system timezone JST
ENV TZ Asia/Tokyo

Run on our Jenkins host.

$ docker build --rm --tag golang:latest .

Launch Jenkins

$ sudo docker run --env JAVA_OPTS=-Dorg.apache.commons.jelly.tags.fmt.timeZone=Asia/Tokyo -v /var/run/docker.sock:/var/run/docker.sock --name jenkins -d -p 80:8080 -p 50000:50000 -v jenkins_home:/var/jenkins_home jenkins-docker:latest

-v /var/run/docker.sock:/var/run/docker.sock lets Jenkins control the host’s Docker daemon, because we want to launch containers on the host side.

-v jenkins_home:/var/jenkins_home stores all of our Jenkins configuration and build results in a Docker volume on our host’s filesystem. If you move your Jenkins to another host or back up your Jenkins data, read this page.

Add a job to Jenkins

We need Jenkins to be triggered when a developer opens a Pull Request.

Configure the following settings in Jenkins.

Add a GitHub Enterprise credential

We do this because we use GitHub Enterprise for our development.

Credentials -> Jenkins -> Global credentials -> Add Credentials

key value
Kind Username with password
Scope Global
Username ci
Password ****

Add our GitHub enterprise

Configure System -> GitHub Enterprise Servers

key value
API endpoint http://***/api/v3
Name GitHub Enterprise

Create a job

New Item -> GitHub Organization -> OK

Configure the job’s settings

<Job> -> Configure -> Projects

key value
API endpoint GitHub Enterprise (http://***/api/v3)
Credentials ci/****
Owner some-repository
Script Path Jenkinsfile

<Job> -> Configure -> Projects -> Behaviours

key value
Filter by name (with regular expression) some-repository
Discover pull request from forks – Strategy Merging the pull request with the current target branch revision
Discover pull request from forks – Trust Everyone

Create a webhook in GitHub

To trigger Jenkins on a PR, we need to create a webhook in GitHub. Note that we must use a user who has the right permissions.

<your repository> -> Settings -> Hooks

key value
Payload URL http://***/github-webhook/
Content type application/json
Which events would you like to trigger this webhook? Send me everything
Active true

Disable Jenkins’ authentication

We can do this because our Jenkins runs in a secure network that accepts no incoming packets from the internet.

Manage Jenkins -> Configure Global Security -> Access Control -> Authorization -> check Anyone can do anything

Upgrade Jenkins

Since the jenkins_home Docker volume holds all of Jenkins’ settings, we only need to pull the latest Docker image and relaunch the container. That’s it!

$ sudo docker stop jenkins
$ sudo docker rm jenkins
$ sudo docker pull jenkins/jenkins:lts
$ cd app/docker/dockerfiles/jenkins
$ docker build --rm --tag jenkins-docker:latest .
$ sudo docker run --env JAVA_OPTS=-Dorg.apache.commons.jelly.tags.fmt.timeZone=Asia/Tokyo -v /var/run/docker.sock:/var/run/docker.sock --name jenkins -d -p 80:8080 -p 50000:50000 -v jenkins_home:/var/jenkins_home jenkins-docker:latest

Jenkinsfile template

The Jenkinsfile we use is almost the same as this:

pipeline {
    agent any

    stages {
        stage('Checkout') {
            steps {
                step($class: 'GitHubSetCommitStatusBuilder')
                checkout scm
            }
        }

        stage('Start up containers') {
            steps {
                sh "sudo docker network create ci${env.EXECUTOR_NUMBER}"

                sh "sudo docker run -d --name mysql${env.EXECUTOR_NUMBER} --network ci${env.EXECUTOR_NUMBER} -p 3306${env.EXECUTOR_NUMBER}:3306 circleci/mysql:5.7"
                sh "sudo docker run -d --name redis${env.EXECUTOR_NUMBER} --network ci${env.EXECUTOR_NUMBER} redis:4.0"

                script {
                    if (sh (
                            script: "sudo docker create --name golang${env.EXECUTOR_NUMBER} --network ci${env.EXECUTOR_NUMBER} golang:latest bash /go/src/app/docker/test/",
                            returnStatus: true
                    ) == 0) {
                        sh "sudo docker cp ${env.WORKSPACE} golang${env.EXECUTOR_NUMBER}:/go/src/leo-server"
                    }
                }
            }
        }

        stage('Initialize containers') {
            steps {
                // Initialize something like DB
            }
        }

        stage('Unit test') {
            steps {
                script {
                    if (sh (
                            script: "sudo docker start -a golang${env.EXECUTOR_NUMBER}",
                            returnStatus: true
                    ) != 0) {
                        currentBuild.result = 'FAILURE'
                    }
                }

                // Copy test report and convert it into junit xml report
                sh "sudo docker cp golang${env.EXECUTOR_NUMBER}:/go/src/app/report.xml ."

                step([$class: 'JUnitResultArchiver', testResults: 'report.xml'])
            }
        }
    }

    post {
        always {
            sh script: "sudo docker stop mysql${env.EXECUTOR_NUMBER}", returnStatus: true
            sh script: "sudo docker stop redis${env.EXECUTOR_NUMBER}", returnStatus: true

            sh script: "sudo docker rm mysql${env.EXECUTOR_NUMBER}", returnStatus: true
            sh script: "sudo docker rm redis${env.EXECUTOR_NUMBER}", returnStatus: true
            sh script: "sudo docker rm golang${env.EXECUTOR_NUMBER}", returnStatus: true

            sh script: "sudo docker network rm ci${env.EXECUTOR_NUMBER}", returnStatus: true
        }
    }
}
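The Jenkinsfile suffixes every container name, network name, and host port with EXECUTOR_NUMBER, presumably so that builds running on different executors at the same time do not collide. The following Python sketch only illustrates the resulting names; the helper `resources_for` is hypothetical and is not part of our pipeline.

```python
def resources_for(executor_number):
    """Build the per-executor names that the Jenkinsfile interpolates."""
    n = str(executor_number)
    return {
        "network": "ci" + n,
        "mysql": "mysql" + n,
        "redis": "redis" + n,
        "golang": "golang" + n,
        # "-p 3306${env.EXECUTOR_NUMBER}:3306" appends the number as text
        "mysql_host_port": int("3306" + n),
    }

for executor in (0, 1):
    print(resources_for(executor))
```

Because the number is appended as text, the host port is only a valid TCP port (below 65536) for single-digit executor numbers.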

Our test script looks like this:


sudo chown -R circleci:circleci /go/src

cd /go/src/leo-server

echo 'Installing go-packages...'
glide i

echo 'Migrating DBs...'
go get
goose -env=ci -path=database/user up

echo 'Installing testing libraries...'
go get -u

echo 'Testing...'
go test -v ./... > tmp 2>&1
status=$?
go-junit-report < tmp > report.xml

exit ${status}


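Go does not have a large assertion toolkit like other frameworks do. In Go, the methods of the testing.T object are used for testing.

  • T.Error(args ...interface{}) or T.Errorf(format string, args ...interface{}) takes a message and marks the test as failed
  • T.Fatal(args ...interface{}) or T.Fatalf(format string, args ...interface{}) is similar to T.Error(), but when it is called, the rest of that test function is not executed. If a failure would make everything after it in the test fail as well, you should use T.Fatal()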




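Go interfaces describe the behavior expected of methods. As an example, let's look at io.Writer.

type Writer interface {
    Write(p []byte) (n int, err error)
}

The io.Writer interface writes the byte slice it receives as an argument, and it is implemented by types such as os.File. In Go's type system, a type does not need to declare which interfaces it implements. By declaring an interface that matches the methods of an existing type, you can swap out the behavior of an external library.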



type Message struct {
     // ...
}

func (m *Message) Send(email, subject string, body []byte) error {
     // ...
     return nil
}

type Messager interface {
    Send(email, subject string, body []byte) error
}

Consider sending a message with an Alert function. Instead of passing the Message type directly, Alert receives a Messager argument and calls the interface's Send method.

func Alert(m Messager, problem []byte) error {
    return m.Send("", "Critical Error", problem)
}



package msg

import (
    "testing"
)

type MockMessage struct {
    email, subject string
    body           []byte
}

func (m *MockMessage) Send(email, subject string, body []byte) error {
    m.email = email
    m.subject = subject
    m.body = body
    return nil
}

func TestAlert(t *testing.T) {
    msgr := new(MockMessage) // Create a mock message
    body := []byte("Critical Error")

    Alert(msgr, body) // Run the Alert function

    if msgr.subject != "Critical Error" {
        t.Errorf("Expected 'Critical Error', Got '%s'", msgr.subject)
    }
}

We create the MockMessage type to implement the Messager interface. MockMessage implements the same Send() as Messager. Instead of actually sending a message, this Send() stores the data on the object, which makes testing easier.





type MyWriter struct{
     // ...
}

func (m *MyWriter) Write([]byte) error {
     // Write the data somewhere
     return nil
}

At first glance this looks like it implements io.Writer, but the correct signature is Write(p []byte) (n int, err error). So it does not implement io.Writer.

Next, let's write some code that uses a type assertion.

func main() {
    m := map[string]interface{}{
        "w": &MyWriter{},
    }
    doSomething(m)
}

func doSomething(m map[string]interface{}) {
    w := m["w"].(io.Writer) // panics at runtime
    _ = w
}


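To prevent this, add a canary test like the one below. (By the way, the name "canary test" apparently comes from "canary in the coal mine".)

func TestWriter(t *testing.T) {
    var _ io.Writer = &MyWriter{} // Let the compiler check that the type satisfies the interface
}

This test fails, of course: it does not even compile. By testing like this, you can check whether an interface is implemented correctly. You can also notice when the signatures of an external library change.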

Linear algebra for machine learning

I have been reviewing linear algebra with Mathematics for Machine Learning: Linear Algebra on Coursera. I finished the Week 2 module. The course has been easy to understand so far. Here I write down what I learned in the Week 1 and Week 2 modules.

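The three properties of the dot product

Commutative

With \(r = \begin{bmatrix}3\\2\end{bmatrix}\) and \(s = \begin{bmatrix}-1\\2\end{bmatrix}\):

\[
r \cdot s = r_i s_i + r_j s_j \\
= 3 \times (-1) + 2 \times 2 = 1 \\
= s \cdot r
\]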


Distributive over addition

\[
r \cdot (s + t) = r \cdot s + r \cdot t
\] \[
r =
\begin{bmatrix}
r_1 \\
r_2 \\
\vdots \\
r_n
\end{bmatrix},
\quad
s =
\begin{bmatrix}
s_1 \\
s_2 \\
\vdots \\
s_n
\end{bmatrix},
\quad
t =
\begin{bmatrix}
t_1 \\
t_2 \\
\vdots \\
t_n
\end{bmatrix} \\
r \cdot (s + t) = r_1(s_1 + t_1) + r_2(s_2 + t_2) + \cdots + r_n (s_n + t_n) \\
= r_1s_1 + r_1t_1 + r_2s_2 + r_2t_2 + \cdots + r_ns_n + r_nt_n \\
= r \cdot s + r \cdot t
\]

Associative over scalar multiplication

\[
r \cdot (as) = a(r \cdot s) \\
r_i(as_i) + r_j(a s_j) = a(r_is_i + r_js_j)
\]

And r dot r is equal to the size of r squared.

\[
r \cdot r = r_ir_i + r_jr_j \\
= r_i^2 + r_j^2 \\
r \cdot r = |r|^2
\]
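These properties are easy to check numerically. Below is a minimal Python sketch; r and s are the vectors from the commutative example, while the vector t, the scalar a, and the helper names are assumptions chosen just for this check.

```python
def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    """Component-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]

def scale(k, a):
    """Multiply every component of a by the scalar k."""
    return [k * x for x in a]

r, s, t = [3, 2], [-1, 2], [4, -5]
a = 3

assert dot(r, s) == dot(s, r) == 1                    # commutative
assert dot(r, add(s, t)) == dot(r, s) + dot(r, t)     # distributive over addition
assert dot(r, scale(a, s)) == a * dot(r, s)           # associative over scalar multiplication
assert dot(r, r) == 3 ** 2 + 2 ** 2                   # r . r = |r|^2
print("all properties hold for this example")
```

A check like this only confirms the properties for one example; the derivations above are what show they hold in general.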

Cosine and dot product

cosine rule

\[
c^2 = a^2 + b^2 - 2ab \cos\theta
\] \[
|r - s|^2 = |r|^2 + |s|^2 - 2|r||s|\cos\theta \\
(r-s) \cdot (r-s) = r \cdot r - s \cdot r - s \cdot r - s \cdot (-s) \\
= |r|^2 - 2s \cdot r + |s|^2 \\
-2s \cdot r = -2|r||s|\cos\theta \\
2s \cdot r = 2|r||s|\cos\theta \\
r \cdot s = |r||s|\cos\theta
\]

It takes the size of the two vectors and multiplies by cos of the angle between them. It tells us something about the extent to which the two vectors go in the same direction.

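When two vectors point in the same direction, \(\cos 0 = 1\) and \(r \cdot s = |r||s|\).
When two vectors are orthogonal to each other, \(\cos 90 = 0\) and \(r \cdot s = |r||s| \times 0 = 0\).
When two vectors point in opposite directions, \(\cos 180 = -1\) and \(r \cdot s = -|r||s|\).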
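The three cases can be reproduced with a short Python sketch that rearranges the formula into cos θ = (r · s) / (|r||s|). The example vectors and the helper names are assumptions for illustration.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of a vector, from r . r = |r|^2."""
    return math.sqrt(dot(a, a))

def cos_angle(r, s):
    """Cosine of the angle between r and s."""
    return dot(r, s) / (norm(r) * norm(s))

same = cos_angle([2, 0], [5, 0])         # same direction
orthogonal = cos_angle([2, 0], [0, 3])   # at right angles
opposite = cos_angle([2, 0], [-4, 0])    # opposite directions
print(same, orthogonal, opposite)
```

The sign and size of the result show how far the two vectors go in the same direction, independent of their lengths.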


Imagine a light shining down perpendicular to r. The shadow of s on r is called the projection.

\[
\cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}} = \frac{\text{adjacent}}{|s|} \\
r \cdot s = |r| \underbrace{|s| \cos \theta}_{\text{adjacent}} = |r| \times \text{projection}
\]

Scalar projection

\[
\frac {r \cdot s}{|r|} = |s| \cos \theta
\]

Vector projection

The vector projection is the scalar projection multiplied by a unit vector in the direction of r, so it also encodes the direction of r.

\[
\frac {r \cdot s}{|r||r|}r = \frac {r \cdot s}{r \cdot r}r
\]
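Both projections translate directly into code. This is a minimal Python sketch of the two formulas; the example vectors r = (3, 4) and s = (10, 5) and the function names are assumptions chosen so that the numbers come out clean.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def scalar_projection(r, s):
    """Length of the shadow of s on r: (r . s) / |r|."""
    return dot(r, s) / math.sqrt(dot(r, r))

def vector_projection(r, s):
    """The shadow of s on r as a vector: ((r . s) / (r . r)) r."""
    k = dot(r, s) / dot(r, r)
    return [k * x for x in r]

r, s = [3, 4], [10, 5]
print(scalar_projection(r, s))
print(vector_projection(r, s))
```

Here the vector projection has length 10, the same as the scalar projection, and points along r.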

Changing Basis

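To change basis using projections, the new basis vectors must be orthogonal to each other.

Convert the vector \(r_e = \begin{bmatrix}3\\4\end{bmatrix}\) from the e set of basis vectors to the b set of basis vectors, where \(b_1 = \begin{bmatrix}2\\1\end{bmatrix}\) and \(b_2 = \begin{bmatrix}-2\\4\end{bmatrix}\).

The projection onto \(b_1\) has a length of 2 times \(b_1\).

\[
\frac {r_e \cdot b_1}{|b_1|^2} = \frac {3 \times 2 + 4 \times 1}{2^2 + 1^2} = \frac {10}{5} = 2
\] \[
\frac {r_e \cdot b_1}{|b_1|^2} b_1 = 2 \begin{bmatrix}2\\1 \end{bmatrix} = \begin{bmatrix}4\\2 \end{bmatrix}
\]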

The projection onto \(b_2\) has a length of \(\frac{1}{2}\) times \(b_2\).

\[
\frac {r_e \cdot b_2}{|b_2|^2} = \frac {3 \times (-2) + 4 \times 4}{(-2)^2 + 4^2} = \frac {10}{20} = \frac {1}{2}
\] \[
\frac {r_e \cdot b_2}{|b_2|^2} b_2 = \frac {1}{2} \begin{bmatrix}-2\\4 \end{bmatrix} = \begin{bmatrix}-1\\2 \end{bmatrix}
\]

Adding the two projections gives back the original vector r.

\[
\begin{bmatrix}4\\2\end{bmatrix} + \begin{bmatrix}-1\\2\end{bmatrix} = \begin{bmatrix}3\\4\end{bmatrix}
\]

In the basis b, it is going to be

\[
r_b =
\begin{bmatrix}
2 \\
\frac{1}{2}
\end{bmatrix}
\]

We can redescribe a vector given in the original axes using some other axes, that is, some other basis vectors. The basis vectors are what we use to describe the space of the data.
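The change of basis for the vector (3, 4) can be reproduced with a short Python sketch. It assumes the new basis vectors are orthogonal, which is what makes the projection method valid; the function name `change_basis` is hypothetical, and `Fraction` is used only to keep the result exact.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def change_basis(r, basis):
    """Coordinates of r in an orthogonal basis, found by projection."""
    return [Fraction(dot(r, b), dot(b, b)) for b in basis]

r_e = [3, 4]
b1, b2 = [2, 1], [-2, 4]
assert dot(b1, b2) == 0  # the method only works for orthogonal basis vectors

r_b = change_basis(r_e, [b1, b2])
print(r_b)  # [Fraction(2, 1), Fraction(1, 2)]

# Rebuild the original vector from its coordinates in the basis b
rebuilt = [sum(c * b[i] for c, b in zip(r_b, [b1, b2])) for i in range(2)]
print(rebuilt == r_e)
```

If the basis vectors were not orthogonal, the projections would overlap and their sum would not give back the original vector, so a matrix-based change of basis would be needed instead.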

Basis, vector, and linear independence

A basis is a set of n vectors that:

  • are not linear combinations of each other (linearly independent)
  • span the space
  • The space is then n-dimensional

Applications of changing basis

We get the minimum possible number for the noisiness.

Proposal of CEDEC 2018

I submitted a proposal about the automatic reply system for our customer support to CEDEC 2018. Last week, the CEDEC 2018 committee announced which proposals were accepted. Unfortunately, my proposal was not accepted. I think the reason is that I had only built the automatic reply system. My proposal should have covered applying the system to our customer support operations and the feedback from our customer support team, but I have not finished those tasks yet. I will submit a proposal again next year that addresses these points!













Understanding sentence with LSTM

I am going to demonstrate that an LSTM can understand a sentence. The model I used is explained in this blog post.

The video below shows an example that classifies two questions: A is about payment and B is about an account. Both texts mention the two categories, and the order in which the categories appear is swapped between A and B.

A (the upper question) means in English “Thank you for helping me with a problem with my account. But today I have another problem, this time about payment. I am sad about this happening.”

B (the lower question) means in English “Thank you for helping me with a problem with payment. But today I have another problem, this time about my account. I am sad about this happening.”

The two examples swap their meanings with each other, and the model classified both A and B into the correct categories. The model is confident about a category when its score is higher than the other scores. Let’s look at the scores in the video. The predictions shown below the questions in the video are the scores; higher is better.
The 1st column (zero-based) represents the “other” category, the 2nd is “account,” and the 3rd is “payment.” The scores look like this:

Sentence A

etc, other, account, payment
0.0038606818, 0.036638796, 0.04247639, 0.46222764

Sentence B

etc, other, account, payment
0.0007114554, 0.04938373, 0.72704375, 0.0038164733

In A, the 3rd column is higher than the other columns. It means the model is confident that A belongs to the “payment” category. B works the same way; the model is confident about the “account” category.
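Reading the category off the scores is just a matter of picking the column with the highest value. Here is a minimal Python sketch using the scores listed above; the function name `predict_category` is hypothetical and is not part of the model code.

```python
CATEGORIES = ["etc", "other", "account", "payment"]

scores_a = [0.0038606818, 0.036638796, 0.04247639, 0.46222764]
scores_b = [0.0007114554, 0.04938373, 0.72704375, 0.0038164733]

def predict_category(scores):
    """Return the name of the category with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CATEGORIES[best]

print("A:", predict_category(scores_a))
print("B:", predict_category(scores_b))
```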

Thus, I found that this model, which uses an LSTM, may be able to understand the meaning of a sentence.

Future tasks

  • Use more samples
  • Use 1-D convolutional network instead of LSTM
  • Use pre-trained word embedding









id question answer category




import json
import numpy as np
import csv

issues = []

with open("data/issues.tsv", 'r', encoding="utf-8") as tsv:
    tsv = csv.reader(tsv, delimiter='\t')

    for row in tsv:
        r = []
        r.append(row[1]) # question
        r.append(row[2]) # answer
        r.append(row[3]) # category
        issues.append(r)






import re

filtered_text = []
text = ["お時間を頂戴しております。version 1.2.3 ----------------------------------------"]

for t in text:
    result = re.compile('-+').sub('', t)
    result = re.compile('[0-9]+').sub('0', result)
    result = re.compile(r'\s+').sub('', result)
    # ... several more substitutions like these are chained here

    # The question text can end up as an empty string, so skip those rows
    if len(result) > 0:
        filtered_text.append(result)

    print("text:%s" % result)
    # text:お時間を頂戴しております。



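labels = []
samples = []
threshold = 700
cnt1 = 0
cnt2 = 0
cnt3 = 0

# Cap each category at `threshold` samples so that the classes stay balanced
for i, row in enumerate(filtered_samples):
    if 'Account' in row[2]:
        if cnt2 < threshold:
            cnt2 += 1
            labels.append(2)
            samples.append(row[0])
    elif 'Payment' in row[2]:
        if cnt3 < threshold:
            cnt3 += 1
            labels.append(3)
            samples.append(row[0])
    else:
        if cnt1 < threshold:
            cnt1 += 1
            labels.append(1)
            samples.append(row[0])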






import MeCab
import re

def tokenize(text):
    wakati = MeCab.Tagger("-O wakati")
    words = wakati.parse(text)

    # Make word list
    if words[-1] == u"\n":
        words = words[:-1]

    return words

texts = [tokenize(a) for a in samples]


お 時間 を 頂戴 し て おり ます



from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np
from keras.utils.np_utils import to_categorical

maxlen = 1000
training_samples = 1600 # training data 80 : validation data 20
validation_samples = len(texts) - training_samples
max_words = 15000

# Build the word index
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)

word_index = tokenizer.word_index
print("Found {} unique tokens.".format(len(word_index)))

data = pad_sequences(sequences, maxlen=maxlen)

# Convert the labels into a binary (one-hot) matrix
categorical_labels = to_categorical(labels)
labels = np.asarray(categorical_labels)

print("Shape of data tensor:{}".format(data.shape))
print("Shape of label tensor:{}".format(labels.shape))

# Shuffle the rows randomly
indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]

x_train = data[:training_samples]
y_train = labels[:training_samples]
x_val = data[training_samples: training_samples + validation_samples]
y_val = labels[training_samples: training_samples + validation_samples]

data is now a sequence of integers like the following.

[0, 0, 0, 10, 5, 24]
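The leading zeros come from padding: with its default settings, pad_sequences pads at the front and also truncates from the front. The pure-Python sketch below imitates that default behaviour for a single sequence. The function name `pad_sequence` is hypothetical, the token IDs are just the ones from the example, and it assumes maxlen is at least 1.

```python
def pad_sequence(seq, maxlen, value=0):
    """Left-pad (or left-truncate) seq to exactly maxlen items."""
    # Keep only the last maxlen items, then fill the front with the pad value
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

print(pad_sequence([10, 5, 24], 6))
print(pad_sequence([7, 8, 10, 5, 24], 3))
```

Padding at the front keeps the real tokens at the end of the sequence, which is the last thing the LSTM reads before it produces its output.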



I use Keras for training. Keras provides LSTM and word embedding layers, so I use those. LSTMs are used for tasks such as classification and regression on time-series data.


from keras.models import Sequential
from keras.layers import Flatten, Dense, Embedding
from keras.layers import LSTM

model = Sequential()
model.add(Embedding(15000, 100, input_length=maxlen))
model.add(Dense(4, activation='sigmoid'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

In addition to training the LSTM, this model also learns the word embedding at the same time by using Embedding().


history = model.fit(x_train, y_train, epochs=15, batch_size=32, validation_split=0.2, validation_data=(x_val, y_val))


%matplotlib inline

import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

In the end, the validation accuracy reached about 90%.






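Before predicting a category, we need to build the word index. This word index is the same one that was used when the model was created.

from keras.models import load_model

# Load the pre-trained model
model = load_model('../pre_trained_model.h5')

# padded_seq must be passed as a 2-D matrix
result = model.predict([padded_seq])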





The book Deep Learning with Python was very helpful! It is written by Chollet, the author of Keras, so I highly recommend it.