High Performance Python: Prima Facie

Table of Contents


Once in a while memories of old pet projects (or imagined pet projects) bubble up again. This time it was fighty.

Many years ago friends and I played an online browser game called The Arena (which still exists). The game core was quite simple: a crowd of fighters enters an arena and hack and slay until the last one remains. The server calculated the fights automatically, we as players had then to distribute experience points, buy new weapons etc., and engange in some RPG in the forums.

The game developers programmed in PHP, and on the hardware of that time it took several hours to resolve one fight. A friend and I thought, we could do better ;) and hacked a simple engine in C. It only calculated the bare results, but we were astonished that it finished in minutes (the exact numbers I have forgotten).

Back to the present. Python seems perfect to easily formulate such a simple game engine, and a straightforward project like this seems a good candidate to learn about the several tools for profiling and speeding up Python programs.

Let us sketch the components of the final fighty program:

We will need a roster where all fighters participating at the current brawl are listet with all their attributes. These fighters will enter an arena, which is a geometric area where each fighter has a specific location. This location determines which other fighters may be opponents. We attach also the fight rules to the arena. These rules dictate the sequence in which the fighters can act, how attacks are performed and damage is dealt, and how fighters die.

These components describe the bare fight engine. In the long run, however, especially if the engine is embedded in a bigger game scenario, it will be interesting to analyse the fights in regards to questions like, how did a certain fighter perform?, is significantly raising a specific attribute, like the attack value, favorable to win the fight?, etc. In other words, we want to measure the game balance, and may present our players a backstory of what their fighters had encountered during that fight.

To do this, we need to gather detailed statistics during the fight. A logger, therefore, will be another component of fighty. Now, as I already have implemented and tested all components of fighty, let me state as a fact, not as a mere speculation, that these statistics quickly run into millions of records, and their memory consumption and the time it takes to persist these numbers into a database (including vacuuming!) will overshadow whatever run-time performance we could achieve in the fight engine.

But don’t get disappointed. This is a pet project, we have no tight schedule, no limited budget. Why shall we restrain ourselves to the rational things and optimize the database access only? No, we will do the fun stuff, too, and also profile the bare fight engine.

In a lecture, I firstly would present a simple implementation of the game engine in Python, show how to profile its run-time behaviour and memory consumption, and then present methods to optimize the code, be it Python-intrinsic, or by external means like compilers or libraries written in C.

Eventually I will cover all these topics. But here I’d like to follow more the course I took myself, with a rather extensive deviation to C++.

The story will unfold in a series of articles, for which this one (“prima facie”) will serve not only as an introduction but also as a table of contents [1].

Let me, as a teaser, show you some output of fighty’s current implementation:

$ time ./bin/fighty
(current locale: en_GB.UTF-8; cout: en_GB.UTF-8)
Roster has 1,000,000 fighters.
Roster has 1,000,000 fighters.
New fight: 16 Sat Sep  1 18:11:43 2012

WARNING in 16/204: Shadowboxing detected.

---[ TIMING in SECS ]-------
 0       Connecting to DB              0.00361            100.00000%
 2    Populating fighters              0.32210          8,924.31118%
 4               Fighting              2.20079         60,975.89939%
 6       Ranking fighters              0.11591          3,211.47163%
 8   Saving fight summary              0.00870            240.99590%
10 Saving round summaries              0.05021          1,390.99975%
12 Saving fighter results             28.90984        800,986.78678%
14    Saving fight lineup              0.04124          1,142.61110%
16         Saving attacks             41.51148      1,150,132.54662%
18          Saving reward            337.71233      9,356,783.44403%

Fighters:  1,000,000
Rounds:          215
Hits:      1,523,538
Misses:    1,419,985
Damage:    8,358,963
Deaths:      999,999

Hit/Miss ratio:         1.07293
Hit/Damage ratio:       0.18226
Hit/Deaths ratio:       1.52354
Damage/Deaths ratio:    8.35897

           What         Amount Size     Total Size
      Fighters:      1,000,000   96     96,000,000
    Fight Summ:              1   44             44
    Round Summ:            256   44         11,264
       Attacks:      4,194,304   44    184,549,376

   Fighter-ID Rank Kills Killed in   Hits sent Misses sent Damage sent   Hits rcvd Misses rcvd
      104,010    1    10       216          11         196          55           1         206
      770,436    2     4       215           6         190          28           1         142
      764,840    3    11       214          12         194          62           2         184
      805,327    4    32       213          44         163         219           2         198
      547,414    4    11       213          12         197          66           1         169
      436,166    5    10       212          14         193          64           2         201
      952,111    5    12       212          14         194          72           1         227
      288,828    6     9       211          14         192          71           2         178
      380,960    6     5       211           7         198          31           2         213
      914,644    6     8       211          11         193          49           2         202

real    6m51.450s
user    0m7.092s
sys    0m1.736s

Let me point out some key indicators here:

  • One million fighters participated in that single fight
  • The fight engine took about 2 seconds to calculate the whole fight including all statistics
  • Saving the statistics is what really takes time (as mentioned above)
  • Saving the summaries (~215 records) and attack stats (~4.2 million records) is comparably fast, because we use PostgreSQL’s bulk loading mechanism (copy from CSV file): ~29 seconds and ~42 seconds
  • Rewarding each fighter means, one million times we have to select a single fighter from the database and update its available experience points and gold pieces. This is slow!

And, yeah, I cheated ;) The implementation of fighty which produced above results contains not a single line of Python code, but is written entirely in C++. But, I had good intentions… I started with Python, and in the end will be there again. There are several methods to optimize the run-time behaviour of a Python program; like PyPy [4], a JIT-compiler; Cython [5], a compiler that translates Python code into C; but for now I was more interested to hand code an application in C++, and to design its API so that it can be called from Python (the roster will be that API). The other means I will evaluate too, in separate articles. And finally, there might be some kind of database shootout, to compare the bulk loading time between e.g. SQLite (which is perfectly suitable here since we have no concurrency), MySQL with its different table types, and PostgreSQL. Maybe we also find time to check out a different approach of persisting large amounts of data, HDF5 [7] and PyTables [6].

[1]So, just bookmark this page, and you will not miss a thing ;)