ESET researchers bring to light unique obfuscation techniques discovered in the course of analyzing a new cryptomining module distributed by the Stantinko group’s botnet
In the new cryptomining module we discovered and described in our previous article, the cybercriminals behind the Stantinko botnet introduced several obfuscation techniques, some of which have not yet been publicly described. In this article, we dissect these techniques and describe possible countermeasures against some of them.
To thwart the analysis and avoid detection, Stantinko’s new module uses various obfuscation techniques:
Obfuscation of strings – meaningful strings are constructed and only present in memory when they are to be used Control-flow obfuscation – transformation of the control flow to a form that is hard to read and the execution order of basic blocks is unpredictable without extensive analysis Dead code – addition of code that is never executed; it also contains exports that are never called. Its purpose is to make the files look more legitimate to prevent detection Do-nothing code – addition of code that is executed, but that has no material effect on the overall functionality. It is meant to bypass behavioral detections Dead strings and resources – addition of resources and strings with no impact on the functionality
Out of these techniques, the most notable are obfuscation of strings and control-flow obfuscation; we will describe them in detail in the following sections.
All the strings embedded in the module are unrelated to the real functionality. Their source is unknown and they either serve as building blocks for constructing the strings that are actually used or they are not used at all.
The actual strings used by the malware are generated in memory in order to avoid file-based detection and thwart analysis. They are formed by rearranging bytes of the decoy strings – those embedded in the module – and using standard functions for string manipulation, such as strcpy(), strcat(), strncat(), strncpy(), sprintf(), memmove() and their Unicode versions.
Since all the strings to be used in a particular function are always assembled sequentially at the beginning of the function, one can emulate the entry points of the functions and extract the sequences of printable characters that arise to reveal the strings.
Control-flow flattening is an obfuscation technique used to thwart analysis and avoid detection.
Common control-flow flattening is achieved by splitting a single function into basic blocks. These blocks are then placed as dispatches into a switch statement inside of a loop (i.e. each dispatch consists of exactly one basic block). There is a control variable to determine which basic block should be executed in the switch statement; its initial value is assigned before the loop.
The basic blocks are all assigned an ID and the control variable always holds the ID of the basic block to be executed.
All the basic blocks set the value of the control variable to the ID of its successor (a basic block can have multiple possible successors; in that case the immediate successor can be chosen in a condition).
There are various approaches to resolving this obfuscation, such as using IDA’s microcode API. Rolf Rolles used this method to identify these loops heuristically, extract the control variable from each flattened block and rearrange them in accordance with the control variables.
This – and similar – approaches would not work on Stantinko’s obfuscation, because it has some unique features compared to common control-flow-flattening obfuscations:
Code is flattened on the source code level, which also means the compiler can introduce some anomalies into the resulting binary The control variable is incremented in a control block (to be explained later), not in basic blocks Dispatches contain multiple basic blocks (the division may be disjunctive, i.e. each basic block belongs to exactly one dispatch, but sometimes the dispatches intertwine, meaning that they share some basic blocks) Flattening loops can be nested and successive Multiple functions are merged
These features show that Stantinko has introduced new obstacles to this technique that must be overcome in order to analyze its final payload.
Control-flow flattening in Stantinko
In most of Stantinko’s functions, the code is split into several dispatches (described above) and two control blocks — a head and a tail — that control the flow of the function.
The head decides which dispatch should be executed by checking the control variable. The tail increases the control variable by a fixed constant and either goes back to the head or exits the flattening loop:
Stantinko appears to be flattening code of all functions and bodies of high-level constructs (such as a for loop), but sometimes it also tends to choose seemingly random blocks of code. Since it applies the control-flow-flattening loops on both functions and high-level constructs, they can be naturally nested and there happen to be multiple consecutive loops too.
When a control-flow-flattening loop is created by merging code of multiple functions, the control variable in the resulting merged function is initialized with different values, based on which of the original functions is called. The value of the control variable is passed to the resulting function as a parameter.
We overcame this obfuscation technique by rearranging the blocks in the binary; our approach is described in the next section.
It’s important to note that we observed multiple anomalies in some of the flattening loops that make it harder to automate the deobfuscation process. The majority of them seem to be generated by the compiler; this leads us to believe that the control-flow-flattening obfuscation is applied prior to compilation.
We witnessed the following anomalies; they can appear separately or in combination:
Some dispatches can be just dead code – they will never be executed. (Examples in the section “Dead code inside the control-flow-flattening loop” below.) Basic