What is Stegomalware? Information hiding-capable malware and the European answer: the SIMARGL project

By Matteo Mauri, Igino Corona, Davide Ariu

Stegomalware (or stegware) is a particular and sophisticated type of malware (malicious / unauthorized software) that uses steganography to evade detection and secretly exchange information.

Steganography was already used in ancient Greece, and the term entered glossaries towards the end of the fifteenth century. In essence, it is a mechanism for secretly encoding information within any transmission medium. Both the encoding and the transmission medium are secret, that is, known only to the parties who intend to communicate covertly.

In this sense, it differs from cryptography, in which the encoding of information and the transmission medium are generally known (e.g., the HTTPS protocol used by this website). In this case, the encoding mechanism makes the extraction of clear-text data (extremely) difficult without the knowledge of additional information, known as encryption / decryption keys. These keys are known only to the authorized communication parties (for example, your browser and our web server).

In digital steganography the transmission medium is digital: for example, a file (image, video, document, etc.) or a communication protocol (network traffic, data exchanged inside a computer, etc.). Stegomalware uses digital steganography to hide confidential data and/or malicious code, then extracts them and (in the case of code) executes them dynamically. It is considered one of the most sophisticated and stealthy forms of obfuscation.
The size and characteristics of multimedia files are fertile ground for steganography: for instance, the pixels of an image or video can be slightly altered to embed secret information without any noticeable visual difference.
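
To make the idea of "slightly altered pixels" concrete, here is a minimal sketch of least-significant-bit (LSB) embedding, one common way of hiding data in images. It is an illustration under stated assumptions, not the method of any specific malware: the function names `embed_lsb` and `extract_lsb`, the 2-byte length prefix, and the use of a plain byte string as a stand-in for raw 8-bit pixel values are all hypothetical choices made for this example.

```python
def embed_lsb(pixels: bytes, secret: bytes) -> bytes:
    """Hide `secret` in the least significant bit of each pixel byte."""
    # Assumption: a 2-byte big-endian length prefix tells the extractor
    # how many payload bytes follow.
    payload = len(secret).to_bytes(2, "big") + secret
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("carrier too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # only the lowest bit can change
    return bytes(out)


def extract_lsb(pixels: bytes) -> bytes:
    """Recover a payload written by embed_lsb."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        result = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (pixels[start_bit + b * 8 + k] & 1)
            result.append(value)
        return bytes(result)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length)


# Stand-in for 512 raw 8-bit channel values of an image (hypothetical data).
carrier = bytes(range(256)) * 2
stego = embed_lsb(carrier, b"hi")
max_change = max(abs(a - b) for a, b in zip(carrier, stego))
recovered = extract_lsb(stego)
```

Each carrier value changes by at most 1 out of 255, which is why the alteration is visually imperceptible, and the carrier keeps exactly the same size.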

In academic research, steganography is often viewed as a subfield of information hiding, which aims to hide secret data within a suitable carrier. For simplicity, however, malware that uses any type of information concealment technique is referred to as stegomalware.
The term "stegomalware" was introduced by researchers in the context of mobile malware and presented at the Inscrypt conference (Beijing, China) in 2014. However, the fact that (mobile) malware could potentially use steganography had already been noted a few years earlier: steganography was initially applied to botnets communicating over probabilistically unobservable channels. The use of steganography to deliver malware on non-mobile devices dates back even further. But let us proceed in order.

Due to improvements in countering cybercrime, methods of hiding data have gradually gained increasing attention from actors such as cyber criminals, terrorists and cyber spies. They are also attracted by the fact that steganography and other information concealment techniques can serve as a genuine enhancement for delivering malware. The appeal grows considerably when steganography and cryptography are combined. Over the past decade, many threats and malware families have improved their ability to remain undetected at different stages of an attack, such as host infection, data exfiltration, or communication with a remote server through a Command & Control (C&C) channel.

But there is no need to resort to advanced obfuscation techniques. Basic techniques also work very well, according to the researchers of the SIMARGL project, a new instrument financed by the European Commission to combat stegomalware (we discuss it in the final part of this article).

The evolution of stegomalware can be briefly summarized as follows. Initially, information hiding techniques were implemented in Advanced Persistent Threats (APT) only. However, given their effectiveness, they are slowly becoming the de facto standard for "ordinary" malware as well. Today these techniques can be divided into five main groups:

•    Malware using modifications to digital media files
Currently, one of the most common ways of hiding data is to use digital media files as the secret carrier. The most common technique exploits digital images to: (i) conceal malware settings or a configuration file; (ii) provide the malware with a URL from which additional components can be downloaded; (iii) store the whole malicious code directly.
Often, these types of malware (e.g. Trojan.Downbot and Duqu) are spread through phishing campaigns. Once a victim's computer is infected, they sometimes open backdoors and then download executable code from remote servers, hidden in files that appear to be legitimate HTML pages or JPEG images. Other types of malware spread through websites using <iframe> elements, or even hide settings in favicons, the seemingly harmless images that appear in our browser tabs. Through sophisticated procedures, such malware uses some bits of a digital image to reconstruct a previously embedded URL, from which it downloads configuration files (for the malicious code) or additional software components.
E-commerce platforms have been attacked as well: certain types of malware collected payment and transaction details and hid them inside images of real products available on the infected e-commerce sites. By downloading the modified images, an attacker can easily exfiltrate the stolen data.

•    Malware posing as other legitimate applications or mimicking their traffic behaviour
In this case, the malware relies on mimicking legitimate programs and/or their communications. A paradigmatic example is a variant of Android/Twitoor.A, a piece of malware that spreads by SMS or via malicious URLs. It impersonates a porn player app or an MMS application without offering their functionality, tricking the user into installing it and spreading the infection.
Other applications have demonstrated the ability to record several seconds of normal, legitimate traffic and then use it as a smokescreen (i.e. malicious commands are masked using legitimate ones). This allows an attacker to modify a controlled process without generating security warnings in the user's system.

•    Information hiding usage in ransomware
From 2016 to the present day there have been several cases (TeslaCrypt, Cerber, SyncCrypt) of malware capable of exploiting vulnerabilities of systems (execution of downloaders) or of users (opening of infected e-mails) to download image files onto the target computer. Ransomware executables were cleverly hidden inside these innocent-looking images.

•    Information hiding in exploit kits
This is one of the most recent trends. Information hiding methods have become so popular among cybercriminals that they are already incorporated into exploit kits. This allows developers with little or no programming skills to create, customize, and distribute malvertising campaigns.
Put simply, the attack is spread through malicious code embedded within advertising banners by altering the color space of the PNG image used. The browser of a user who views an infected ad then processes JavaScript code that extracts the malicious code and redirects the user to the landing page of the exploit kit, where the infection with different types of malware takes place.

•    Malware injecting secret data into network traffic
By taking advantage of features of certain network protocols, some malware can act directly at the level of network traffic, sometimes issuing queries whose responses carry the information needed to retrieve further executable malware.
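
On the defensive side, data hidden in network queries often looks statistically different from ordinary traffic. The following is a rough sketch of one simple heuristic, assuming (the article does not name a protocol) that the queries are DNS-style hostnames: it flags hostnames whose longest label is both long and high in Shannon entropy. The function names and the thresholds `min_len=20` and `min_entropy=3.5` are illustrative assumptions, not values taken from SIMARGL or from any product, and a real detector would need far more context.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_suspicious(hostname: str, min_len: int = 20,
                     min_entropy: float = 3.5) -> bool:
    """Flag hostnames whose longest label is long and high-entropy.

    Both thresholds are illustrative assumptions and would need tuning
    against real traffic.
    """
    longest = max(hostname.split("."), key=len)
    return len(longest) >= min_len and shannon_entropy(longest) >= min_entropy


# Hypothetical examples: an ordinary hostname and one carrying an encoded label.
benign = looks_suspicious("www.example.com")
flagged = looks_suspicious("mzxw6ytboi4dsnrqgezdgnbvgy3tqojq.example.com")
```

A heuristic like this produces false positives (for instance, on content delivery hostnames with random-looking labels), so it is best treated as one weak signal among many.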

As highlighted above, information-hiding-capable threats and stegomalware could become a new trend that contributes to the creation of even more sophisticated malware, endangering a variety of environments.
Just think about the potential danger posed by the interconnected world and attacks on Internet of Things (IoT) nodes. From this perspective, the problem space to be addressed when dealing with novel steganographic malware is very broad and composite. To give an idea of the issues characterizing modern smart environments, we briefly highlight the most obvious security risks for the near future:

Buildings: with the advent of smart buildings (e.g., those deploying Heating, Ventilation and Air Conditioning systems accessed remotely through the Internet), the opportunities to exploit a variety of sensors, actuators, nodes and software frameworks for steganography have multiplied. The result is a playground that is difficult to control and inspect, where it will be hard to precisely track a data flow or an execution flow. Therefore, the nodes composing a smart building are candidates to become zombie nodes of a botnet or to offer a place to temporarily store stolen data before it is covertly exfiltrated.

Phones, gaming consoles, set-top boxes: many devices are connected to the network and equipped with sufficient processing and storage resources to make them interesting targets for an attacker, for example to orchestrate a botnet or conduct a DoS attack. All these devices offer many vectors in which to embed data (for example, actuators and sensors can be used to implement a hidden channel). They are also tempting targets because they contain large amounts of data that can be used for mass profiling campaigns or to develop scams based on social engineering.

Vehicles: modern vehicles (even in the mid-price range) offer a plethora of features for localization and route planning, support fleet management frameworks and intelligent/smart transportation services, connect remotely for telemetry and safety purposes, and connect with personal devices for entertainment and communication. Possible threats to consider are: i) malicious software injected via the onboard diagnostic port, firmware updates, embedded Web browsers, aftermarket devices, or ports for mass storage devices such as SD cards or USB drives; ii) low-level attacks used to breach the privacy of the driver or perform sabotage; iii) data-centric attacks targeting the machine-learning components of the vehicle (e.g., driver assistance), which can breach physical security by altering the vehicle's behavior, for instance by poisoning the data used to train algorithms or the surrounding information gathered by sensors.

New information hiding techniques will be introduced continuously, and their degree of sophistication will increase. Hence, future malware-related traffic could be even harder to detect. Information hiding-capable malware can remain cloaked for a long period of time while slowly but continuously leaking sensitive user data.
For this reason, the European Commission considers stegomalware a new advanced and persistent threat to be vigorously confronted. Hence, the EU has funded the SIMARGL project - Secure intelligent methods for advanced recognition of malware, stegomalware & information hiding methods (Grant Agreement n° 833042), with a total budget of 6 million Euros. In SIMARGL, Pluribus One cooperates with international actors such as Airbus, Siveco, Thales, Orange Cert, FernUniversität (project coordinator), and two other Italian partners: CNR, Genoa Unit, with its studies on Energy-Aware detection algorithms based on artificial intelligence; and Numera, a company operating in the ICT sector based in Sassari, with its systems for credit transactions.

Pluribus One contributes two solutions to the project. One is its flagship product for web application security: Attack Prophecy, an advanced system for detection of and protection against web attacks, based on (adversarial) Machine Learning algorithms. The second is AIsafe DNS, a comprehensive solution for the prevention and detection of endpoint threats, which offers coverage against a wide range of threats, from malware to phishing, and enables the mitigation of the associated risk.

Altogether, the SIMARGL consortium is made up of 14 international partners from 7 countries (Netzfactor, Itti, Warsaw University, IIR, RoEduNet, Stichting CUIng Foundation also participate in the partnership). They will provide artificial intelligence studies, sophisticated products already available, and machine learning algorithms, to propose an integrated solution capable of dealing with different scenarios and acting at different levels: from network traffic monitoring to the detection of altered bits within images. Examples of the know-how brought by the consortium include latest-generation web application firewalls based on machine learning algorithms, web gateways, concept drift detectors, advanced signal processing systems, lifelong learning intelligent systems (LLIS), and hybrid classifiers.
The platform will be tested on several use cases, including payment systems, online transactions, and network protocols of reputable international Internet and Mobile providers.
The journey of the SIMARGL project has just begun and will provide concrete answers to the challenges posed by stegomalware over the next two years (the project will end in April 2022). The research team is well aware that it is addressing a never-ending challenge, for which it is difficult to reach a definitive conclusion. The battle between cops and robbers follows a well-known pattern: improvements by the former drive improvements by the latter, and vice versa.



