Malware is harder to find when written in obscure languages • The Register


03/29/2025


Malware authors looking to evade analysis are turning to less popular programming languages like Delphi or Haskell.

Computer scientists affiliated with the University of Piraeus and Athena Research Center in Greece and Delft University of Technology in the Netherlands have taken a look at recent malware to better understand why some of it gets missed by static analysis – a software testing technique for understanding code without executing it.

The authors – Theodoros Apostolopoulos, Vasilios Koutsokostas, Nikolaos Totosis, Constantinos Patsakis, and Georgios Smaragdakis – describe their findings in a preprint paper titled “Coding Malware in Fancy Programming Languages for Fun and Profit.”

There is a lot of malware – almost 26 million new instances of malicious code so far in 2025, according to antivirus evaluators AV-TEST. And one of the main ways to identify bad code is static analysis.

Malware authors know this and many make an effort to obfuscate their code or to apply anti-sandboxing or anti-debugging techniques.

One way to do so is simply to use a programming language that’s not widely used for malware, which tends to be written in C or C++.

“For years, ransomware groups have been switching to newer, unconventional languages to make reverse engineering and detection more difficult,” the authors observe. “Moreover, various threat actors have used this approach, employing a wide range of programming languages and techniques to obfuscate their malicious code.”

As examples, they point to security researchers’ dislike of Visual Basic 6 binaries, which were complex to reverse engineer; the Lua obfuscation layer in the 2012 Flame malware; and the Grip virus, which contained a Brainfuck interpreter coded in Assembly to generate its keycodes.

“Even though malware written in C continues to be the most prevalent, malware operators, primarily known threat groups such as APT29, increasingly include non-typical malware programming languages in their arsenal,” they write.

“For instance, APT29 recently used Python in their Masepie malware against Ukraine, while in their Zebrocy malware, they used a mixture of Delphi, Python, C#, and Go. Likewise, Akira ransomware shifted from C++ to Rust, BlackByte ransomware shifted from C# to Go, and Hive was ported to Rust.”

To some extent, this is simply a variation on security through obscurity – when fewer people are familiar with a given language, less manual detection can be expected and automated tools will have fewer samples.

But automated detection mechanisms based on signatures of identified malware won’t work when the malware has been rewritten in a different language. And some languages like Haskell and Lisp, the researchers note, employ an execution model that differs from the one used by malware developed in C. Others, like Dart and Go, may add a large number of functions to the executable as part of their standard environment, making even simple programs complicated.

To better understand why certain languages resist analysis better than others, the authors examined a set of almost 400,000 Windows executables from Malware Bazaar.

They found not only that the programming language used affects the malware detection rate but also that the choice of compiler makes a difference. “While one would expect less used programming languages, e.g. Rust and Nim, to have worse detection rates because the sparsity of samples would not allow the creation of robust rules, the use of non-widely used compilers, e.g. Pelles C, Embarcadero Delphi, and Tiny C, has a more substantial impact on the detection rate,” they state.

After looking at a more limited dataset focused on APTs (advanced persistent threats), the researchers say it’s clear that over time, APT authors have diversified their choice of programming languages and compilers.

One of the ways the boffins examined malware differences across programming languages involved assessing how well the binaries resisted shellcode pattern matching – the process of looking for malicious sets of instructions.

The results showed significant variations across languages and underscored why malware is easier to find in more common languages. “Samples written in languages such as C and C++ retained, usually, all shellcode bytes in sequential order or had a fixed gap between the bytes, leading to relatively straightforward detection,” the authors say. “However, other languages demonstrated significant byte fragmentation and variations in memory layout, complicating static detection.”

They cite Rust, Phix, Lisp, and Haskell as languages that distribute shellcode bytes irregularly or in non-obvious ways.
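
To make that distinction concrete, here is a minimal Python sketch of the matching idea. It is not the researchers' tooling. It assumes a known byte pattern and checks whether those bytes survive in a blob as an unbroken run or at a constant spacing. The function names, the 0xDEADBEEF placeholder standing in for shellcode, the sample blobs, and the max_stride limit are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for a shellcode byte sequence.
PATTERN = bytes.fromhex("deadbeef")


def find_contiguous(blob: bytes, pattern: bytes) -> Optional[int]:
    """Return the offset where pattern appears as an unbroken run, else None."""
    idx = blob.find(pattern)
    return idx if idx >= 0 else None


def find_fixed_stride(
    blob: bytes, pattern: bytes, max_stride: int = 8
) -> Optional[Tuple[int, int]]:
    """Return (offset, stride) if the pattern bytes appear at a constant spacing.

    A stride of 1 is the contiguous case; a stride of 2 means one filler byte
    sits between each pair of pattern bytes, and so on.
    """
    for stride in range(1, max_stride + 1):
        # Number of blob bytes covered by the pattern at this stride.
        span = (len(pattern) - 1) * stride + 1
        for start in range(len(blob) - span + 1):
            if blob[start:start + span:stride] == pattern:
                return start, stride
    return None


def classify(blob: bytes, pattern: bytes) -> str:
    """Label how (or whether) the pattern was recovered from the blob."""
    if find_contiguous(blob, pattern) is not None:
        return "contiguous"
    if find_fixed_stride(blob, pattern) is not None:
        return "fixed-gap"
    return "not matched"


# Made-up sample blobs illustrating the three layouts.
contiguous = b"\x00\x01" + PATTERN + b"\x02"
strided = b"\x00" + b"".join(bytes([b, 0x90]) for b in PATTERN)
fragmented = bytes([0xDE, 0x11, 0xAD, 0x22, 0x33, 0x33, 0xBE, 0x44, 0xEF])

for name, blob in [("contiguous", contiguous),
                   ("strided", strided),
                   ("fragmented", fragmented)]:
    print(name, "->", classify(blob, PATTERN))
```

The irregularly spaced sample defeats both checks in this sketch, which is the situation the paper describes: once the bytes no longer sit at predictable offsets, fixed-offset matching is not enough.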

There are other reasons that less popular languages can make malware more difficult to identify, such as the complexity of the executed functions, the number of indirect calls and jumps executed, and the number of threads spawned.

“Malware is predominantly written in C/C++ and is compiled with Microsoft’s compiler,” the authors conclude. “However … our work practically shows that by shifting the codebase to another, less used programming language or compiler, malware authors can significantly decrease the detection rate of their binaries but simultaneously increase the reverse engineering effort of the malware analysts.”

Thus, the authors argue, code in less popular programming languages deserves more attention in the security community and more relevant detection tools. ®
