Memory allocation, release to OS - Fragmentation - Python - Elixir - Tcl
This article explores the difference between developers, scriptwriters and true programmers (who should really be called software architects, alongside software engineers).
True PROgrammers understand their environment and how everything
works from low level to high level.
This way they know when it's futile to try something and when a workaround is needed. They are the ones who listen to your problem and
already have a solution in mind even if they haven’t written a single
line of code yet. They’ve tested this principle enough times to know
what works and what doesn’t.
Powerful software engineers are not afraid to test it again and, if the tests look grim, to change course.
Real Programmers will be saying: Thank you Reverse Engineering for teaching me how stuff actually works!
"People find thinking to be difficult because civilization has never made any attempt to make thinking simpler." Edward de Bono
Somewhere around November 2017 I saw a programmer offering a sum of money to any experienced developer who could create a long-running Python script, one that could run for months.
The script should allocate a list of 1000 integers and floats, deallocate the data and not keep using up memory. This should be done on a Raspberry Pi running Linux.
Note | I am not an expert when it comes to Python. The only reason I learned Python was to use the great penetration testing tools that were developed with it, together with the power Blender 3D offers when scripted in Python. My main choices for comparison are always C, Tcl and Elixir. Since I discovered Elixir I only use the Port module to access certain functionality not provided by Elixir or Erlang itself. This works great with Python for data science. None of this stopped me from thinking about the problem. |
Of course just reading the offer instantly made me realize that the
person in question was having problems with the script not releasing the
memory to the operating system.
This is true and happens even after you have freed up the memory and
even after the garbage collector did its job.
Developers and programmers began offering solutions instantly, without understanding what the person actually needed. The offer started out at 1000 EUR and ended at 5000 EUR.
I just posted a reply explaining the futility of the project: no solution would be good without modifications to the underlying system.
I asked what his problem really was, and suggested that the most viable solution would be a supervisor model with subprocesses.
In the end no one got the 5000 EUR prize. Yet I had seen a certain blindness caused by a sum of money offered for a seemingly simple thing.
How could those programmers be so ignorant of how the underlying system works? Even though I'm not an expert in Python, it wasn't hard for me to understand the problem he had, even if he didn't mention it explicitly.
What did he mean?
Can you read? Can you Think?
This reminded me of the one time I was the project manager for a project posted on various freelance websites.
I did the project management because I was too busy working on other projects and couldn't do the programming myself; the client needed it pretty fast and wanted me to control the quality of things.
After I had analyzed the problem and written a detailed two-page explanation of what needed to be done and WHAT was to be avoided, I posted it so others could bid.
What happened in the following days made me change my perception and understand that in order to be a true PROgrammer you need knowledge of the whole underlying system.
To my surprise NONE of the initial people who posted bids even read the
requirements.
They just posted bids without even understanding the problem and the
environment.
For me that was flabbergasting!
In all my time as a software PROgrammer (or better put, engineer) I've always approached this problem in a unique way, it seems!
I always analyze the requirements first. I do a quick simulation in my head of how a solution would look from a technical standpoint and how feasible it is.
Then I go deeper, asking questions until I get a firm grip on what is actually needed.
After that I start analyzing everything: I do some research and, depending on the project, I try to do a proof of concept, prototype or MVP in under a day.
If I can't do the MVP in two to three days, then either the project has a problem, or the customer needs something else or has a different problem.
Then I start thinking about the implications and searching for potential solutions, and only when I'm done with my research can I give an answer and an offer.
Nope, those people just said they could do it, like that. A clap of the hands. How the heck did they know they could do it so fast? They didn't even get to know what the user needed!
I started asking some questions about how they thought they would implement it.
Funnily enough, the responses included the very aspects the requirements marked as TO BE AVOIDED. It's as if they didn't even bother to read the description.
Fellow users, programmers and managers, this is how things really work. Even with the thousands of meetings you all hold each day, this is the end result.
Two disputes later we found our match: someone who had already done something similar in the past.
Understanding the problem and environment
First we need to understand how the memory model works.
Back to our Python script, which should run for a long time without eating up memory.
Whenever a new process is created, the operating system and underlying kernel use something called virtual memory mapping.
This gives EACH process a virtual address space of up to 4 GB on 32-bit systems and far more than that on 64-bit.
The OS/kernel handles the mapping from virtual to real memory.
Understand that it has to do this mapping for ALL running processes.
Inside each process, repeated allocation and deallocation in time leads to FRAGMENTATION of memory. C and C++ have these issues, so it is normal that dynamic languages have them too.
https://www.design-reuse.com/articles/25090/dynamic-memory-allocation-fragmentation-c.html
Facebook encountered a similar issue in the past; I'll talk about this later.
Besides this, the operating system has a few tricks up its sleeve to protect memory from exploitation. These mitigations were introduced over the past 20 years, so they may lead to even more fragmentation.
ASLR - Address Space Layout Randomization - randomizes the address space so that running the same program multiple times does not put things at the same memory locations, making exploitation harder.
DEP/NX - the No-eXecute bit, which segregates areas of memory into data and code, making sure data regions are NOT executable. This protects against executing injected code (and is the reason attackers moved on to techniques like ROP).
Stack canary - this involves modifying the prologue and epilogue of every function so as to make stack buffer overflow attacks harder to exploit.
There are various other protections as well, such as FORTIFY_SOURCE, RELRO and PIE.
Should a programmer know about these even if he or she is not into reverse engineering or cyber security? Yes; in theory it will help him or her become a better PROgrammer.
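You can even watch one of these mitigations from Python itself. The snippet below is my own illustration (not part of the original discussion) and assumes a Linux system that exposes /proc/sys/kernel/randomize_va_space; run it twice and, with ASLR enabled, the reported libc address changes between runs.

# Sketch: observe ASLR from Python on Linux (assumes /proc is available).
import ctypes

def aslr_setting():
    # 0 = ASLR off, 1 = conservative randomization, 2 = full randomization
    try:
        with open("/proc/sys/kernel/randomize_va_space") as f:
            return int(f.read().strip())
    except OSError:
        return None

print("ASLR setting:", aslr_setting())

# The address where libc is mapped differs between runs when ASLR is active.
libc = ctypes.CDLL("libc.so.6")
print("Address of malloc this run:", hex(ctypes.cast(libc.malloc, ctypes.c_void_p).value))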
All these protections make it much harder for bad things to happen in bad code, but at the same time they make the allocation and deallocation of memory a more difficult process.
Whenever we add things to the heap via pointers (strings, data objects)
a reference is created to the actual data. But both the data and the
reference occupy memory.
In a multitasking environment this happens for all running processes and threads.
Whenever we add 1-2 GB of random data to a process's memory in an unaligned way, the process's memory usage grows accordingly.
When we free the memory, delete everything and even let the GC do its work, we will still have a huge portion of memory that is NOT returned to the OS in certain environments.
WHY?
It's because of memory fragmentation, which takes place in the real memory.
Even after you've freed your memory, that 1 GB of data will probably have been fragmented all across your RAM, and unless you restart your app there is no way to release it.
For C and C++ there is a solution: it's called jemalloc.
Certain Operating Systems and software packages had this problem too.
Facebook started using a different memory allocator called jemalloc just
to try to fix this issue:
https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919/
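If you want to see whether a different allocator changes the picture for a Python script, one low-effort experiment is to preload jemalloc for a child process. This is only a sketch of the idea: the library path and the script name (fragmentation_test.py) are placeholders you would adapt to your own system.

# Sketch: run a Python script under jemalloc via LD_PRELOAD (Linux only).
# The .so path varies per distro; the one below is a Debian-style guess.
import os
import subprocess

env = dict(os.environ)
env["LD_PRELOAD"] = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"  # assumption: adjust for your system

# Compare the resident memory kept after freeing everything, with and without
# the preload. "fragmentation_test.py" is a placeholder for one of the examples below.
subprocess.run(["python3", "fragmentation_test.py"], env=env)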
Examples Please
Let me illustrate this with a few Python examples that you can run to
test it out yourself.
I’m using Debian 9 with Python 2.7.13 and Python 3.5.4. The results may
vary if you use different versions so beware.
I know from previous experience that scripting languages tend to have this issue. I've had the same problem in Tcl in the past, and the only solution was a restart.
First of all, Python 2.7 does have big issues on this point, while 3.5 has fewer. But we will see how even 3.5 succumbs when random data is involved.
Let's start with a simple example (note: it will take up to 1.7 GB of RAM on your PC!). I've done tests that took up to 20 GB of RAM; if you have the RAM, please run the more memory-intensive versions to see for yourself.
List = range(50*1024*1024)
del List
This process gets to around 1.6 GB. After we run del List, the process only goes back to 1.2 GB.
But why is there such a gap? Well, the allocation and pointers are
deleted but the fragmentation still exists as mentioned earlier. This is
the problem our developer had in the beginning.
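If you want to watch these numbers for yourself instead of eyeballing top, here is a small Linux-only helper I would use (my own sketch): it reads VmRSS from /proc/self/status. The list() wrapper is needed on Python 3, where range is lazy.

# Sketch: measure the process's resident set size around an allocation (Linux).
import gc

def rss_mb():
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # value is reported in kB
    return 0.0

print("baseline  :", rss_mb(), "MB")
data = list(range(50 * 1024 * 1024))   # list() because Python 3's range is lazy
print("allocated :", rss_mb(), "MB")
del data
gc.collect()
print("after del :", rss_mb(), "MB")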
Testing the same thing in Python 3.5 we can see that the memory is used, but after we use del we go back to 9-10 MB.
Does this mean that 3.5 solved this problem? For some instances YES. But
for random data the problem still remains.
In the following example we will use Python 3.5.
We will create 100,000 random strings of 1 KB (1024 bytes) each, with each string repeated a random number of times.
# NOTE: This can use up to a few GB of RAM.
import random
import string
import time
import gc

gc.enable()
# time.sleep(60)

def random_string(size=6, chars=string.ascii_uppercase + string.digits):
    return ''.join(random.choice(chars) for _ in range(size))

numbers = []   # renamed from `number`/`string` so we don't shadow the string module
strings = []
for x in range(0, 100000):
    if x % 20000 == 0:
        print("We're at ", x / 1000, " %")
    numbers.append(x * random.randrange(100, 10000) + x)
    strings.append(random_string(1024) * random.randrange(10, 50))
    # strings.append("Fragmentation of memory together with memory alignment and "
    #                "virtual memory mapping will for sure screw up stuff if you use random data "
    #                "and not something Python can control like pointers to the same data fragment"
    #                * random.randrange(1, 10))

print("Great, we're done!")
# NOTE: a sleep of 1 minute makes it evident
time.sleep(5)

del numbers
del strings
print("Deleted everything!")

# while True:
#     0

gc.enable()
gc.collect()
By running this example you will see memory usage increase to a few GB and then, when we free everything, only a small decrease.
Even though the data no longer exists after it is freed, the fragmentation seems to have taken its toll. (If we were to analyze the memory, the data could actually still be sitting there, since nothing overwrites it once there are no references to it anymore.)
Is the data NOT released to the operating system? Running this simple example brings us to a very important aspect.
Why do you think web servers start new threads/processes for every request?
Or, more commonly nowadays, keep a pool of processes and threads that get restarted once a certain number of connections have gone through (1,000-10,000 connections)?
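Gunicorn, a popular Python WSGI server, exposes this exact pattern as configuration. A minimal gunicorn.conf.py could look like the sketch below (the numbers are arbitrary; check the Gunicorn docs for your version):

# gunicorn.conf.py - recycle workers to shed fragmentation and slow leaks
workers = 4                # pool of worker processes
max_requests = 1000        # restart a worker after roughly this many requests
max_requests_jitter = 100  # randomize the limit so workers don't all restart at once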
We can observe that memory fragmentation caused by allocation/deallocation is one of the biggest issues in long-running systems.
Even if you optimize and review everything you will certainly require a
process restart.
I’ve even forced garbage collection and it still failed.
What solutions are there?
We certainly have multiple solutions for this problem.
Depending on the programming language and the context we can do one of
several things.
One of the best solutions is to have a supervisor that spawns a new process to run the memory-intensive work.
Whenever we reach a certain limit or a time has passed we can spawn a
new process, move the state through IPC (inter process communication)
and close the old process.
This way we can be sure we have a running system and we don’t lose the
state.
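In Python this idea can be sketched with nothing but the standard library's multiprocessing module. The names and limits below are my own illustration, not the solution the original poster asked for: the worker does the memory-hungry work, hands its state back over a Pipe, and the supervisor starts a fresh process with that state.

# Sketch: supervisor that periodically recycles a memory-hungry worker process.
import multiprocessing as mp

def worker(conn, state):
    # ... memory-intensive work that updates `state` goes here ...
    state = state + 1            # placeholder for real work
    conn.send(state)             # hand the state back over IPC before exiting
    conn.close()

def supervisor(rounds=5):
    state = 0
    for _ in range(rounds):
        parent_conn, child_conn = mp.Pipe()
        p = mp.Process(target=worker, args=(child_conn, state))
        p.start()
        state = parent_conn.recv()   # the state survives the restart
        p.join()                     # the old process dies, its memory goes back to the OS
        print("recycled worker, state is now", state)

if __name__ == "__main__":
    supervisor()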
If this is too difficult to implement, you can simply restart the process, accepting that you lose some data and wait some time.
Functional Versus Imperative
Scaling this solution in imperative programming languages (Tcl, C, Python, Perl, Java) proves to be extremely difficult even for the smallest tasks.
You will waste time writing code to handle the state transfer yourself, because that is not something the imperative paradigm gives you for free.
I've personally found that functional languages like Elixir and Erlang can solve this issue quickly and easily thanks to their high-concurrency model.
The Supervisor functions from Elixir actually make it pretty easy to do
the spawning.
It’s actually what happens whenever a process crashes and is one of the
great hallmarks of the Erlang BEAM!
Switching to a functional programming language means learning new patterns of thinking, but it is worth your time: you gain the power of concurrency and fault tolerance.
Tcl version
I did some similar tests with Tcl.
There is no automatic garbage collector available, so we're stuck with a big runtime footprint unless we explicitly free everything ourselves.
Using the C API of Tcl means we need to be watchful of everything.
A Tcl-only version grows the data to 2.6 GB, and when it is released we're left with 249 MB… still a lot, considering that a fresh Tcl process requires under 10 MB of RAM.
proc rnd {min max} {
    expr {int(($max - $min + 1) * rand()) + $min}
}

proc generateCode {length} {
    set string "azertyuiopqsdfghjklmwxcvbnAZERTYUIOPQSDFGHJKLMWXCVBN0123456789"
    set code ""
    set stringlength [expr {[string length $string]-1}]
    for {set i 0} {$i<$length} {incr i} {
        append code [string index $string [rnd 0 $stringlength]]
    }
    return $code
}

set numbers(0) 0
set max 1000000
for {set i 1} {$i<$max} {incr i} {
    if {$i % ($max/100) == 0} { puts "[expr {$i/($max/100)}] percent complete" }
    # set data($i)
    append data [string repeat "Intriguing I might add" [rnd 64 128]]
    set numbers($i) [expr {$i*$i+$i+[rnd 1000 1000000]}]
}
puts "OK done.."
after 5000
unset data
array unset numbers
puts "Deleted data"
C Version
A C version is futile since we know what free does. I'll leave the examples to you; try it and see what happens :)
Elixir
Let's see how Elixir and the BEAM handle things, shall we?
do_it = fn ->
  String.duplicate("Certainly, fascinating my dear fellow!", 1024*1024*512)
end
do_it.()
My system reports 19 GB used by the BEAM, and it is not going away even though the string was left unused.
Let’s see if the garbage collector can do something.
:erlang.garbage_collect(self())
:erlang.system_flag(:fullsweep_after, 10)
Let's start the built-in observer to see what's happening:
:observer.start()
It seems there are 19 GB of binaries.
Now let's do it the Elixir way, just to show off:
defmodule HeavyLifting do
  def heavy_stuff() do
    max = 1024*1024
    Enum.map(1..max, fn _x -> {:crypto.strong_rand_bytes(1024), :rand.uniform(100000)} end)
    |> Enum.max()
    {:ok, "Cool"}
  end
end

HeavyLifting.heavy_stuff()
The BEAM grows to a few GB (1.4 GB in this example).
Sometimes it decreases automatically; other times we need to run the garbage collector ourselves, after which it instantly drops and sits at around 90 MB. Given that a clean iex session uses around 40 MB, I'd say this is a win, since we don't have hundreds of MB lingering around. Keep in mind that we used REAL cryptographically random bytes, which couldn't be reused!
We can force the garbage collector and it will go down instantly
:erlang.garbage_collect()
Of course, using a functional programming language won't solve ALL your problems.
Erlang (and Elixir inherits this) has some issues with certain binaries, which won't go away as fast as you would like.
Conclusion
Bad programming styles eat up memory just as much as random data does.
You can also try a version where a static string is repeated multiple times.
Even after releasing it, we see that in certain cases 600 MB remain resident.
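For reference, that static-string variant is roughly the sketch below; only the repeat count is random, yet a surprising amount of memory can stay resident after the del.

# Sketch: same experiment with a static string, only the repeat count varies.
import gc
import random

strings = []
for _ in range(100000):
    strings.append("Fragmentation is not about the data itself " * random.randrange(10, 50))

del strings
gc.collect()
# Now compare the process RSS with the value you measured before the loop.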
Understanding what Linux does with caching and data:
https://www.linuxatemyram.com/
Details and more info
Wikipedia
Google
https://security.stackexchange.com/questions/20497/stack-overflows-defeating-canaries-aslr-dep-nx
Cover image by Markus Spiske from Pixino.com