Note
This article is part of an ongoing series about profiling Python programs that starts here [4].
Note
This text is a draft and may be subject to changes.
There are two things we can profile, execution time and memory consumption.
Profiling Execution Time, Part I
From the Linux command line we can profile the execution time of any program with the time command:
$time ./main.py # This starts fighty real 0m14.098s user 0m13.937s sys 0m0.132s
To implement fighty, I first wrote the roster. It is a regular Python class which has a list fighters as member, and will gradually be filled with the API methods. One method we need here is populate(), which, ah, populates the list with fighters:
class Roster(object):
def __init__(self):
self.fighters = None
def populate(self, n):
self.fighters = []
for i in range(0, n):
self.fighters.append(Fighter())
As data structure for a fighter I chose a class. Why this choice is not so bad at all, even memory-wise, will become evident later on.
So, each fighter is an instance of class Fighter and its attributes are initialized with random values:
class FightData(object):
def __init__(self):
self.ini = None
self.pos = None
self.hp = None
self.killed_in_round = None
self.kills = None
self.rank = None
self.opp = None
self.hits_sent = None
self.misses_sent = None
self.dmg_sent = None
self.hits_rcvd = None
self.misses_rcvd = None
class Fighter(object):
ID_SEQ = 1
def __init__(self, **kw):
self.fight_data = FightData()
if not kw:
self.populate()
else:
for k, v in kw:
setattr(self, k, v)
if self.id > self.__class__.ID_SEQ:
self.__class__.ID_SEQ = self.id
def populate(self):
self.id = self.__class__.ID_SEQ
self.__class__.ID_SEQ += 1
self.hp = 5
self.ini0 = 3
self.ini1 = 6
self.att0 = 6
self.att1 = 9
self.dfn0 = 5
self.dfn1 = 8
self.dmg0 = 3
self.dmg1 = 6
ap = 10
while ap > 0:
r = randint(0, 8)
ap -= 1
if r == 0:
self.ini0 += 1
elif r == 1:
self.ini1 += 1
elif r == 2:
self.att0 += 1
elif r == 3:
self.att1 += 1
elif r == 4:
self.dfn0 += 1
elif r == 5:
self.dfn1 += 1
elif r == 6:
self.dmg0 += 1
elif r == 7:
self.dmg1 += 1
elif r == 8:
self.hp += 1
I was curious how Python would perform when I populate the roster with, say, 100,000 fighters. The result you see in the time example above, — and I must say, I am somewhat disappointed. Python took way longer as expected, and that only to populate the list in memory. Also my memory gauge was ascending noticably!
The latter phenomenon lead me to abandon profiling execution time at this point and rather look at the memory consumption. We will return to this topic later.
Profiling Memory Consumption
Memory profiling in Python in my opinion is a really rocky subject. The standard library does not contain any module to help here, there is only sys.getsizeof(), which is okay for simple data types, but will lie to you about more complex ones like dicts or classes.
More elaborate is Guppy [5], almost a suite, which contains a module heapy to measure the memory consumption. But documentation sucks. I was not able to figure out how to use it until I found this tutorial [5]. And its biggest issue is that Guppy seems not to be maintained any more, the latest package on PyPI dates from 2008 (Python 2.6). I was able to compile it for Python 2.7, but on Python 3.2 compilation fails.
Then there is Pympler [6], another module to analyse memory consumption. It seems rather modern, works with Python 3.2, but—documentation sucks. Anyways, we will use it here.
Here is a simple test script that, based on Pympler, prints out the used memory for several simple and more complex data types.
import sys
import random
from pympler.asizeof import asizeof, leng
class Fighter_double:
def __init__(self):
self.ini = random.randint(1, 10**7) * 1.0
self.att = random.randint(1, 10**7) * 1.0
self.dfn = random.randint(1, 10**7) * 1.0
self.dmg = random.randint(1, 10**7) * 1.0
self.hp = random.randint(1, 10**7) * 1.0
class Fighter(object):
def __init__(self):
self.ini = random.randint(1, 10**7)
self.att = random.randint(1, 10**7)
self.dfn = random.randint(1, 10**7)
self.dmg = random.randint(1, 10**7)
self.hp = random.randint(1, 10**7)
class Fighter_slots(object):
__slots__ = ['ini', 'att', 'dfn', 'dmg', 'hp']
def __init__(self):
self.ini = random.randint(1, 10**7)
self.att = random.randint(1, 10**7)
self.dfn = random.randint(1, 10**7)
self.dmg = random.randint(1, 10**7)
self.hp = random.randint(1, 10**7)
def p(s, v):
v1 = asizeof(v)
v1b = sys.getsizeof(v)
try:
v2 = len(v)
except TypeError:
v2 = '-'
v3 = leng(v)
if v3 is None:
v3 = '-'
print("{:25} | asizeof={:5n} | sizeof={:3n} | len={:>5} | leng={:>5}".format(s, v1, v1b, v2, v3))
def pe(s, v, e, v0):
l1 = len(v)
l2 = leng(v)
se = asizeof(e)
sv = asizeof(v)
sv0 = asizeof(v0)
s_per_e = int(0.5 + (sv - sv0) / l1)
e_diff = int(0.5 + s_per_e - se)
print(s)
print(" len: {:10n}".format(l1))
print(" leng: {:10n}".format(l2))
print(" Size total: {:10n}".format(sv))
print(" Size empty: {:10n}".format(sv0))
print(" Element size: {:10n}".format(se))
print(" Size per element: {:10n}".format(s_per_e))
print(" Element size diff: {:10n}".format(e_diff))
def create_fighters():
i = 12
d = 4.5
s0 = ""
s = "übung"
s1 = "1"
s2 = "ü"
di0 = {}
di = {'foo': 1}
di1 = {'foo': 1, 'bar': 2}
di2 = {'foobar': 1}
li0 = []
li1 = [1]
li2 = [1,2,3]
tp0 = tuple()
tp1 = (1,)
tp2 = (1,2,3)
f_double = Fighter_double()
f = Fighter()
f_slots = Fighter_slots()
p("INT", i)
p("DOUBLE", d)
p("STRING ('')", s0)
p("STRING ('übung')", s)
p("STRING ('1')", s1)
p("STRING ('ü')", s2)
p("DICT (empty)", di0)
p("DICT (1 INT)", di)
p("DICT (1 INT long key)", di2)
p("DICT (2 INTs)", di1)
p("LIST (empty)", li0)
p("LIST (1 INT)", li1)
p("LIST (3 INTs)", li2)
p("TUPLE (empty)", tp0)
p("TUPLE (1 INT)", tp1)
p("TUPLE (3 INTs)", tp2)
p("Fighter double", f_double)
p("Fighter", f)
p("Fighter slots", f_slots)
print()
n = 10**2
# Careful to not generate duplicate items in list. asizeof() will count
# references to the same item only once!
a = random.sample(range(n), n)
pe("LIST of INTs", a, a[0], [])
print()
b = tuple(a)
pe("TUPLE of INTs", b, b[0], tuple())
print()
a = [ Fighter() for i in range(n) ]
pe("LIST of Fighters", a, a[0], [])
print()
a = [ Fighter_slots() for i in range(n) ]
pe("LIST of Fighters w/ slots", a, a[0], [])
print()
a = [ [random.randint(1,10**7)
, random.randint(1,10**7)
, random.randint(1,10**7)
, random.randint(1,10**7)
, random.randint(1,10**7) ] for i in range(n) ]
pe("LIST of LIST w/ 5 INTs", a, a[0], [])
print()
a = tuple([ (random.randint(1,10**7)
, random.randint(1,10**7)
, random.randint(1,10**7)
, random.randint(1,10**7)
, random.randint(1,10**7) ) for i in range(n) ])
pe("TUPLE of TUPLE w/ 5 INTs", a, a[0], [])
if __name__ == '__main__':
import locale
locale.setlocale(locale.LC_ALL, "")
create_fighters()
We find that a simple integer in Python takes up 32 bytes, a double 24 bytes:
INT | asizeof= 32 | sizeof= 28 | len= - | leng= 1 DOUBLE | asizeof= 24 | sizeof= 24 | len= - | leng= -
And an instance of a class with a handful of integer attributes, like a fighter, uses 824 bytes. Each instance! (Using doubles instead of integers does not gain much.):
Fighter double | asizeof= 824 | sizeof= 64 | len= - | leng= - Fighter | asizeof= 864 | sizeof= 64 | len= - | leng= -
asizeof() calculates the size of the object including its items, where-as sys.getsizeof() only returns the size of the object itself. In a similar fashion does leng() count the real amount of items, even if they are not used, where-as len() only counts the used items.
‘Unused items’ result from the fact that for speed reasons Python allocates memory not for each single item as you add it, but fetches a bigger memory block that can contain several items. If you add less items than may fit into that block, the memory is still occupied:
LIST of INTs len: 100 leng: 118 Size total: 4,064 Size empty: 72 Element size: 32 Size per element: 40 Element size diff: 8
An empty list occupies 72 bytes, 118 ints (118 reserved, 100 used) need 3,776 bytes. The above result names as total size 4,064 bytes, which means 216 bytes are unaccounted for.
Let us now look at fighty again, how much memory the roster and its fighter list occupies:
$ time ./main.py Locale: ('en_GB', 'UTF-8') 100,000 fighters enlisted in roster. ============== Memory Profile ============== Roster | asizeof=226,426,808 | sizeof= 64 | len= - | leng= - Fighters len: 100,000 leng: 112,506 Size total: 226,426,368 Size empty: 72 Element size: 4,344 Size per element: 2,264 Element size diff: -2,079 One Fighter | asizeof=4,344 | sizeof= 64 | len= - | leng= -
One fighter uses 4,344 bytes, should not 100,000 of them occupy 434,400,000 bytes? Pympler only counts memory used by unique objects: maybe there are fighters which have the same attribute values… I do not know how to find that out reliably.
Anyways, the list of 100,000 fighters, remember, this is just the list, we have not calculated any fighting data yet, takes up 226 MB RAM. Ouch! Populating one million fighters I was too short tempered to wait for the results — my developer machine almost started to swap, and it has 16 GB RAM.
Slots to save the day…
…but the nights are still troubled…
We can reduce the needed memory for a fighter (and thus for any class) significantly, if we use slots [1] instead of regular attributes:
Roster | asizeof=32,025,224 | sizeof= 64 | len= - | leng= - Fighters len: 100,000 leng: 112,506 Size total: 32,024,784 Size empty: 72 Element size: 784 Size per element: 320 Element size diff: -463 One Fighter | asizeof= 784 | sizeof=136 | len= - | leng= -
And here is proof that a class with slots is really not a bad container for data. Compare its memory consumption with that of a list or tuple:
LIST of Fighters len: 100 leng: 118 Size total: 51,680 Size empty: 72 Element size: 864 Size per element: 516 Element size diff: -347 LIST of Fighters w/ slots len: 100 leng: 118 Size total: 25,816 Size empty: 72 Element size: 344 Size per element: 257 Element size diff: -86 LIST of LIST w/ 5 INTs len: 100 leng: 118 Size total: 28,120 Size empty: 72 Element size: 272 Size per element: 280 Element size diff: 8 TUPLE of TUPLE w/ 5 INTs len: 100 leng: 100 Size total: 26,456 Size empty: 72 Element size: 256 Size per element: 264 Element size diff: 8
What surprises me here is that the total size of a list of clots [2] is smaller than a list of lists or a tuple of tuples, even if the size of a single clot is larger than that of a similar list (344 bytes to 272 bytes). Like above, I am not quite sure if this is a result of Pympler only counting unique objects or, some hideous optimization of Python itself [3].
More about profiling execution time in a later article.
[1] | Keep in mind that slots are way less than a panacea, google has a lot of discussions about their shortcomings. |
[2] | Classes with slots |
[3] | The oncoming Python 3.3 will have key-sharing dictionaries [7] for classes. |
[4] | ../high-performance-python-prima-facie.html |
[5] | (1, 2) http://pypi.python.org/pypi/guppy/ |
[6] | http://packages.python.org/Pympler/ |
[7] | http://docs.python.org/dev/whatsnew/3.3.html#pep-412 |