How to find bottlenecks in an HTTP server and MySQL

I participated in ISUCON for the first time.

I picked up some techniques for finding bottlenecks in an HTTP server and MySQL. Our team used alp for HTTP response analysis and pt-query-digest for MySQL slow query analysis.

I'd like to introduce these tools, which were useful in our situation and might also help other teams (or other situations).

How to find bottleneck web APIs in a web application

To reduce the APIs' response times, we first needed to analyze the access log that Nginx generates.

So I used alp to analyze the access log.

Download the alp binary that matches your architecture (usually amd64 if you use Linux) from here

$ wget https://github.com/tkuchiki/alp/releases/download/v0.3.1/alp_linux_amd64.zip
$ unzip alp_linux_amd64.zip

Change the Nginx log format to LTSV

$ sudo vim /etc/nginx/nginx.conf
log_format ltsv "time:$time_local"
                "\thost:$remote_addr"
                "\tforwardedfor:$http_x_forwarded_for"
                "\treq:$request"
                "\tstatus:$status"
                "\tmethod:$request_method"
                "\turi:$request_uri"
                "\tsize:$body_bytes_sent"
                "\treferer:$http_referer"
                "\tua:$http_user_agent"
                "\treqtime:$request_time"
                "\tcache:$upstream_http_x_cache"
                "\truntime:$upstream_http_x_runtime"
                "\tapptime:$upstream_response_time"
                "\tvhost:$host";

access_log /var/log/nginx/access.log ltsv;
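Since each LTSV line is just tab-separated key:value pairs, it is also easy to post-process the log with a small script when alp alone is not enough. Here is a minimal sketch in Python; the sample line and field names follow the log_format above, but the snippet itself is an illustration, not part of alp:

```python
# Parse one LTSV-formatted access log line into a dict.
# LTSV = Labeled Tab-Separated Values: "key1:value1\tkey2:value2\t..."

def parse_ltsv(line):
    fields = {}
    for pair in line.rstrip("\n").split("\t"):
        # Split at the first colon only, since values (e.g. time) may contain colons
        key, _, value = pair.partition(":")
        fields[key] = value
    return fields

sample = "time:21/Oct/2017:14:54:35 +0900\tstatus:200\turi:/fetch\treqtime:1.507"
record = parse_ltsv(sample)
print(record["uri"], record["status"], record["reqtime"])  # → /fetch 200 1.507
```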

Analyze Nginx log

$ ./alp -f /var/log/nginx/access.log

The analysis output looks like this:

+-------+-------+-------+---------+-------+-------+-------+-------+--------+------------+------------+-------------+------------+--------+-----------------------------------------------------+
| COUNT |  MIN  |  MAX  |   SUM   |  AVG  |  P1   |  P50  |  P99  | STDDEV | MIN(BODY)  | MAX(BODY)  |  SUM(BODY)  | AVG(BODY)  | METHOD |                         URI                         |
+-------+-------+-------+---------+-------+-------+-------+-------+--------+------------+------------+-------------+------------+--------+-----------------------------------------------------+
|    12 | 0.000 | 0.000 |   0.000 | 0.000 | 0.000 | 0.000 | 0.000 |  0.000 |    318.000 |    318.000 |    3816.000 |    318.000 | GET    | /favicon.ico                                        |
|     1 | 0.000 | 0.000 |   0.000 | 0.000 | 0.000 | 0.000 | 0.000 |  0.000 |   4162.000 |   4162.000 |    4162.000 |   4162.000 | GET    | /js/chat.js                                         |
|     6 | 0.770 | 1.932 |   7.167 | 1.194 | 0.770 | 0.988 | 1.628 |  0.432 | 327999.000 | 327999.000 | 1967994.000 | 327999.000 | GET    | /icons/b1c8c5bc9b026507ae62e6bf62cf55f70f4ac3d8.png |
|     7 | 0.212 | 1.967 |   7.447 | 1.064 | 0.212 | 0.998 | 1.501 |  0.536 | 105173.000 | 105173.000 |  736211.000 | 105173.000 | GET    | /icons/e58ed7194ed4a99d9d6ada9e3383983edcbd1edc.png |
|   301 | 0.099 | 3.752 | 473.578 | 1.573 | 0.549 | 1.507 | 2.921 |  0.407 |      0.000 |   3005.000 |  885578.000 |   2942.120 | GET    | /fetch                                              |
+-------+-------+-------+---------+-------+-------+-------+-------+--------+------------+------------+-------------+------------+--------+-----------------------------------------------------+

This table lists the API endpoints in ascending order of total response time (the SUM column).

We started improving from the worst response times. In this case, we should start with /fetch and the /icons endpoints.

How to find bottleneck queries in MySQL

To find bottleneck SQL queries, we used pt-query-digest.

Install Percona Toolkit

See this document.

$ wget https://repo.percona.com/apt/percona-release_0.1-4.$(lsb_release -sc)_all.deb

$ sudo dpkg -i percona-release_0.1-4.$(lsb_release -sc)_all.deb

$ sudo apt-get update

$ sudo apt-cache search percona

$ sudo apt-get install percona-toolkit
$ sudo vim /etc/mysql/my.cnf
# Enable slow query log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow_query.log
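Note that by default MySQL only logs queries slower than long_query_time (10 seconds by default), so a benchmark's many fast-but-frequent queries would never appear in the log. A common addition for profiling, sketched here as an assumption about your setup, is to lower that threshold to zero (restart MySQL after editing my.cnf so the settings take effect):

```ini
# Log every query, not only ones slower than the default 10s threshold
long_query_time = 0
```

With this, every query is logged, which is what pt-query-digest's frequency-based ranking needs.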

Analyze slow query log

$ pt-query-digest /var/log/mysql/slow_query.log

The analysis output looks like this:

# 660ms user time, 30ms system time, 39.37M rss, 121.42M vsz
# Current date: Sat Oct 21 14:54:35 2017
# Hostname: app1053
# Files: /var/log/mysql/slow-query.log
# Overall: 1.50k total, 23 unique, 2.73 QPS, 0.01x concurrency ___________
# Time range: 2017-10-21T05:45:19 to 2017-10-21T05:54:29
# Attribute          total     min     max     avg     95%  stddev  median
# ============     ======= ======= ======= ======= ======= ======= =======
# Exec time             3s     1us   608ms     2ms     5ms    20ms   103us
# Lock time           78ms       0     7ms    51us   131us   174us    36us
# Rows sent          1.96k       0     100    1.33    0.99    8.72       0
# Rows examine       2.91M       0   9.77k   1.98k   9.33k   3.74k       0
# Query size        90.88k      20     202   61.88  136.99   25.68   62.76

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ==============
#    1 0x16A13978F3B7FCFA  1.7328 55.1%     6 0.2888  0.06 SELECT image
#    2 0x4866C96BA295C56E  0.7782 24.7%   301 0.0026  0.00 SELECT message
#    3 0xEBDB62606B914A34  0.4282 13.6%    90 0.0048  0.00 INSERT channel
#    4 0x91283C307570F439  0.0334  1.1%     4 0.0083  0.00 SELECT message
#    5 0x41883793B68D128E  0.0302  1.0%   100 0.0003  0.00 SELECT user
# MISC 0xMISC              0.1427  4.5%  1003 0.0001   0.0 <18 ITEMS>

# Query 1: 6 QPS, 1.73x concurrency, ID 0x16A13978F3B7FCFA at byte 190178
# Scores: V/M = 0.06
# Time range: 2017-10-21T05:46:17 to 2017-10-21T05:46:18
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count          0       6
# Exec time     55      2s   203ms   608ms   289ms   580ms   136ms   242ms
# Lock time      1   840us   109us   193us   140us   185us    29us   152us
# Rows sent      5     103      10      23   17.17   22.53    4.06   17.65
# Rows examine   0   5.87k    1001    1001    1001    1001       0    1001
# Query size     0     474      79      79      79      79       0      79
# String:
# Databases    isubata
# Hosts        app1051
# Users        isucon
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms  ################################################################
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `isubata` LIKE 'image'\G
#    SHOW CREATE TABLE `isubata`.`image`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT * FROM image WHERE name = '78a9228a393eb2621f346fc6a5e099d5bc373f76.png'\G

# Query 2: 1.08 QPS, 0.00x concurrency, ID 0x4866C96BA295C56E at byte 177211
# Scores: V/M = 0.00
# Time range: 2017-10-21T05:46:16 to 2017-10-21T05:50:54
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         20     301
# Exec time     24   778ms     2ms     9ms     3ms     5ms   966us     2ms
# Lock time     15    12ms    30us   208us    39us    66us    15us    33us
# Rows sent     15     301       1       1       1       1       0       1
# Rows examine  98   2.87M   9.77k   9.77k   9.77k   9.77k       0   9.77k
# Query size    19  17.58k      58      60   59.81   59.77    0.85   59.77
# String:
# Databases    isubata
# Hosts        app1052 (298/99%), app1051 (3/0%)
# Users        isucon
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms  ################################################################
#  10ms
# 100ms
#    1s
#  10s+
# Tables
#    SHOW TABLE STATUS FROM `isubata` LIKE 'message'\G
#    SHOW CREATE TABLE `isubata`.`message`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT COUNT(*) as cnt FROM message WHERE channel_id = '733'\G

The profile table lists the queries in descending order of total response time.

In this case, we should start by improving the slowest query: SELECT * FROM image WHERE name = '78a9228a393eb2621f346fc6a5e099d5bc373f76.png'. One way to improve it, for example, is to cache the images with Nginx.
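The per-query stats also hint at an in-database fix: Query 1 examines about 1,001 rows on every call even though it looks up a single name, which suggests a full table scan. A sketch of one possible remedy, assuming image.name is an indexable column (the index name idx_image_name is hypothetical):

```sql
-- Hypothetical: add a secondary index so the name lookup no longer scans the table
ALTER TABLE image ADD INDEX idx_image_name (name);

-- Then re-check the plan; a "ref" access type using idx_image_name means the index is used
EXPLAIN SELECT * FROM image WHERE name = '78a9228a393eb2621f346fc6a5e099d5bc373f76.png';
```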

I went to iOSDC 2017

I went to iOSDC 2017 on September 17th.
The conference, for iOS developers, was held from September 16th to 17th at Waseda University.

I heard these sessions:

飛び道具ではないMetal
This session showed how to use Metal to render an image on the display. The most interesting point was that drawing a picture without Metal can be faster than with it.

Apple TV – tvOS入門 –

React Native vs. iOSエンジニア

I didn't have time to listen to all the sessions, and I had no iOS development experience, so there were some sessions I couldn't make sense of. Now I'm developing an iOS app with Swift!

I participated in CEDEC 2017

I participated in CEDEC 2017, held in Yokohama from August 30th to September 1st. This was my first time at CEDEC, which as far as I know is held every year. The participants are engaged in game development.

I attended the sessions below.

Day 1

Shadowverseにおける「デッキのトレンド分析」を題材としたデータマイニング技術の活用手法紹介

They analyzed trends in the card decks that players use in the game, using a technique similar to edit distance to compare different decks.

アイドルマスター シンデレラガールズ ビューイングレボリューション制作事例 ~最高のVRライブ体験に必要となる要素とは~

They created a great live-concert experience in VR. The important elements were rendering the crowd in detail, showing the light of glow sticks reflecting on bodies in VR, and reproducing the crowd's cheers.


Day 2

人工知能(メタAI)を用いたゲームデザインの変革
[人工知能学会×CEDECコラボセッション] 人工知能と対話システム ~キャラクターとの自然な会話~
データ分析の固定観念を覆す -ユーザー体験を向上し続けるモンスターハンター エクスプロアのデータサイエンス-

Day 3

遺伝的アルゴリズムによる人工知能を用いたゲームバランス調整

They have been creating a player AI that emulates a human player. The motivation is to reduce quality assurance costs: game content is created continuously, so it also has to be checked continuously, and it is easy to imagine how this constant stream of new content increases the cost of quality checks.

キャラクターらしさ学習AI: 多数のキャラクターの個性や違いの可視化によるシナリオライティング支援システム事例

This system supports writers in writing scenarios for games, based on the scripts they write.
The AI learns the characteristics of each character in a game and classifies a script line among the characters. A writer can then fix the script using the scores the AI calculates. The score indicates how likely a line belongs to each character, like this:

Character Name  Score
A               10
B               5
C               2

This example indicates that the script line is probably character A's, because its score is higher than the others.
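The attribution step described above comes down to picking the character with the highest score. A tiny sketch, using the hypothetical names and scores from the example table:

```python
# Hypothetical per-character likelihood scores for one script line
scores = {"A": 10, "B": 5, "C": 2}

# Attribute the line to the highest-scoring character
likely_character = max(scores, key=scores.get)
print(likely_character)  # → A
```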

クラウドの分析システムを活用したアクセスログにもとづくリアルタイム不正ユーザBANシステム

They built a system that bans, in real time, users who make illegal requests to the game API servers.

I went to builderscon tokyo 2017

I went to builderscon tokyo 2017 from August 3rd to 5th.

The eve of the conference

I heard some fascinating talks, but I was told not to tell anyone about them, so unfortunately I can’t write about them.

I hope the organizers hold this event again next time.

Day 1

Recognizing Japanese idols with Deep Learning

@sugyan spoke about recognizing Japanese idols with Deep Learning. It helped me learn how Deep Learning recognizes images.

He gave me some hints on image recognition. Thank you @sugyan.

Building high performance push notification server in Go

@cubicdaiya (https://twitter.com/cubicdaiya) spoke about a push notification server.

I might be able to use Gaurun in my project. I'll look up detailed information about it.

After party

I went to the after party, which was held at the end of day 1. Almost all of the dishes were delicious.

Day 2

Factory Class

Jesse Vincent told an exciting story about developing keyboards from scratch.

He said it’s important to meet up with a supplier and keep a good relationship with them if you want to make a good product in China.

Finally

The name card the organizers made was useful for me, and the voting system for choosing the best speakers (it uses a QR code) was convenient for us.

There was only one thing I was dissatisfied with: the lunch session. The other audience members and I were crammed in, so I think the room was a little small.

But I had a totally enjoyable time at the conference, and I would love to go again!