29 - STUDENT - A Signature-free approach to malicious code detection by applying entropy analysis to network streams

Benjamin Jochheim (HAW Hamburg)

As modern software architectures grow more complex, so do the ways an attacker can target them. The trend toward distributing the services of a complex system across various hosts on a network makes such a system even more vulnerable to network attacks than a centralized one. Every system on a network has to be patched constantly so that known attack vectors are closed as soon as possible. Traditional virus scanners, which likewise rely on a signature database, need frequent updates to remain effective. However, there will always be a gap between the discovery of an attack vector and the time at which countermeasures take effect.
To protect distributed systems from network attacks, it is desirable to provide early-warning procedures that can efficiently detect unwanted code fragments in data, which frequently indicate network attacks. Such early warnings should not rely on signatures or prior knowledge, but should be sensitive enough to detect zero-day exploits. These early indicators could trigger further analysis of packets, either at end systems or at the network carrier, which may support its customers by adding a new layer of security: scanning network data streams in real time and detecting embedded shellcode attacks.
In this paper, we present early results of a statistical approach to lightweight, real-time scanning of data streams that is based on the Shannon entropy. The entropy function [4] can be used to differentiate between content types, as displayed in Figure 1.
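To make the idea of an entropy signal concrete, the following is a minimal sketch that computes the Shannon entropy of a byte window and slides that window over a stream. The function names and the `window` and `step` values are illustrative assumptions, not parameters taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # H = -sum(p * log2(p)) over the relative frequency p of each byte value
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def entropy_signal(stream: bytes, window: int = 64, step: int = 16) -> list:
    """Entropy of each sliding window over the stream.

    The window and step sizes are assumed values for illustration only.
    """
    if len(stream) < window:
        return [shannon_entropy(stream)] if stream else []
    return [
        shannon_entropy(stream[i:i + window])
        for i in range(0, len(stream) - window + 1, step)
    ]
```

Uniform data such as runs of a single byte yields values near 0, while compressed or encrypted data approaches 8 bits per byte. Text and machine code typically fall in between, which is what makes the signal usable for telling content types apart.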
The scheme uses time-frequency analysis to extract non-stationary properties of entropy signals. While previous work [2, 3] has used (stationary) entropy averages to roughly distinguish data types, our approach applies a window-based Short-Time Fourier Transform (STFT) to the entropy signal. Such an entropy spectrum can be seen in Figure 2.
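As a rough illustration of applying a windowed Fourier transform to an entropy signal, the sketch below computes a magnitude spectrogram with a naive DFT. The Hann window, the per-frame mean removal, and the `frame` and `hop` sizes are assumptions made for this example; the abstract does not state which choices the authors made.

```python
import cmath
import math


def stft_magnitude(signal, frame=32, hop=8):
    """Magnitude spectrogram of a real-valued signal.

    Returns one list of frame // 2 + 1 magnitudes per frame. Uses a naive
    O(frame^2) DFT, which is adequate for a sketch but not for production.
    """
    # Hann window to reduce spectral leakage (an assumed choice)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame - 1)) for n in range(frame)]
    spectra = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = signal[start:start + frame]
        mean = sum(seg) / frame
        # Remove the frame mean so the constant entropy level does not dominate
        x = [(s - mean) * w for s, w in zip(seg, win)]
        bins = []
        for k in range(frame // 2 + 1):
            acc = sum(
                x[n] * cmath.exp(-2j * math.pi * k * n / frame)
                for n in range(frame)
            )
            bins.append(abs(acc))
        spectra.append(bins)
    return spectra
```

A stationary entropy signal produces a nearly empty spectrum under these assumptions, whereas abrupt or periodic changes in entropy show up as energy in the non-zero frequency bins.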
Using threshold filters, we can differentiate between normal data and (possibly embedded) shellcode with reasonable accuracy. Figure 3 shows the result of applying our filter to binary ELF code: the binary code is successfully detected.
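One simple way to picture a threshold filter on such a spectrum is sketched below: frames whose largest spectral magnitude exceeds a threshold are flagged, and neighbouring flagged frames are merged into ranges for later inspection. Both helper functions and any threshold value are hypothetical; the abstract does not describe the authors' actual filter design.

```python
def flag_frames(spectra, threshold):
    """Indices of frames whose largest spectral magnitude exceeds threshold.

    `spectra` is a list of per-frame magnitude lists; `threshold` is a
    hypothetical tuning parameter that would need calibration on real data.
    """
    return [i for i, bins in enumerate(spectra) if bins and max(bins) > threshold]


def merge_runs(indices):
    """Collapse consecutive frame indices into (first, last) ranges."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)  # extend the current run
        else:
            runs.append((i, i))  # start a new run
    return runs
```

The resulting ranges only mark candidate regions. They would still need a second, more expensive analysis step to separate real code from false positives.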
Initially designed for mobile devices [5], the proposed scheme can efficiently filter large amounts of data and find code sequences that may contain malicious code. A shellcode sequence often appears as a peak in the entropy spectrum, as shown in Figure 4. Such code sequences can then be processed further to filter out false positives.
