16 C
New York
Thursday, April 3, 2025

Evaluating the Efficiency of Hashing Strategies for Related Perform Detection


Think about that you simply reverse engineered a bit of malware in painstaking element solely to search out that the malware writer created a barely modified model of the malware the subsequent day. You would not need to redo all of your exhausting work. One solution to keep away from beginning over from scratch is to make use of code comparability strategies to attempt to establish pairs of features within the outdated and new model which are “the identical.” (I’ve used quotes as a result of “similar” on this occasion is a little bit of a nebulous idea, as we’ll see).

A number of instruments may help in such conditions. A very fashionable business device is zynamics’ BinDiff, which is now owned by Google and is free. The SEI’s Pharos binary evaluation framework additionally features a code comparability utility known as fn2hash. On this SEI Weblog put up, the primary in a two-part sequence on hashing, I first clarify the challenges of code comparability and current our resolution to the issue. I then introduce fuzzy hashing, a brand new sort of hashing algorithm that may carry out inexact matching. Lastly, I evaluate the efficiency of fn2hash and a fuzzy hashing algorithm utilizing quite a few experiments.

Background

Precise Hashing

fn2hash employs a number of varieties of hashing. Essentially the most generally used known as place impartial code (PIC) hashing. To see why PIC hashing is necessary, we’ll begin by a naive precursor to PIC hashing: merely hashing the instruction bytes of a operate. We’ll name this actual hashing.

For instance, I compiled this straightforward program oo.cpp with g++. Determine 1 exhibits the start of the meeting code for the operate myfunc (full code):

Figure 1- Assembly code and bytes from oo.gcc

Determine 1: Meeting Code and Bytes from oo.gcc

Precise Bytes

Within the first highlighted line of Determine 1, you’ll be able to see that the primary instruction is a push r14, which is encoded by the hexadecimal instruction bytes 41 56. If we accumulate the encoded instruction bytes for each instruction within the operate, we get the next (Determine 2):

Figure 2-Exact bytes in oo.gcc

Determine 2: Precise Bytes in oo.gcc

We name this sequence the actual bytes of the operate. We will hash these bytes to get an actual hash, 62CE2E852A685A8971AF291244A1283A.

Shortcomings of Precise Hashing

The highlighted name at deal with 0x401210 is a relative name, which signifies that the goal is specified as an offset from the present instruction (technically, the subsequent instruction). For those who have a look at the instruction bytes for this instruction, it contains the bytes bb fe ff ff, which is 0xfffffebb in little endian kind. Interpreted as a signed integer worth, that is -325. If we take the deal with of the subsequent instruction (0x401210 + 5 == 0x401215) after which add -325 to it, we get 0x4010d0, which is the deal with of operator new, the goal of the decision. Now we all know that bb fe ff ff is an offset from the subsequent instruction. Such offsets are known as relative offsets as a result of they’re relative to the deal with of the subsequent instruction.

I created a barely modified program (oo2.gcc) by including an empty, unused operate earlier than myfunc (Determine 3). Yow will discover the disassembly of myfunc for this executable right here.

If we take the precise hash of myfunc on this new executable, we get 05718F65D9AA5176682C6C2D5404CA8D. You’ll discover that is totally different from the hash for myfunc within the first executable, 62CE2E852A685A8971AF291244A1283A. What occurred? Let’s take a look at the disassembly.

Figure 3 - Assembly code and bytes from oo2.gcc

Determine 3: Meeting Code and Bytes from oo2.gcc

Discover that myfunc moved from 0x401200 to 0x401210, which additionally moved the deal with of the decision instruction from 0x401210 to 0x401220. The decision goal is specified as an offset from the (subsequent) instruction’s deal with, which modified by 0x10 == 16, so the offset bytes for the decision modified from bb fe ff ff (-325) to ab fe ff ff (-341 == -325 – 16). These modifications modify the precise bytes as proven in Determine 4.

Figure 4 - Exact bytes in 002.gcc

Determine 4: Precise Bytes in oo2.gcc

Determine 5 presents a visible comparability. Crimson represents bytes which are solely in oo.gcc, and inexperienced represents bytes in oo2.gcc. The variations are small as a result of the offset is barely altering by 0x10, however this is sufficient to break actual hashing.

Figure 5- Difference between exact bytes in oo.gcc and oo2.gcc

Determine 5: Distinction Between Precise Bytes in oo.gcc and oo2.gcc

PIC Hashing

The aim of PIC hashing is to compute a hash or signature of code in a approach that preserves the hash when relocating the code. This requirement is necessary as a result of, as we simply noticed, modifying a program typically leads to small modifications to addresses and offsets, and we do not need these modifications to change the hash. The instinct behind PIC hashing is very straight-forward: Determine offsets and addresses which are more likely to change if this system is recompiled, akin to bb fe ff ff, and easily set them to zero earlier than hashing the bytes. That approach, if they modify as a result of the operate is relocated, the operate’s PIC hash will not change.

The next visible diff (Determine 6) exhibits the variations between the precise bytes and the PIC bytes on myfunc in oo.gcc. Crimson represents bytes which are solely within the PIC bytes, and inexperienced represents the precise bytes. As anticipated, the primary change we will see is the byte sequence bb fe ff ff, which is modified to zeros.

Figure 6- Byte Difference Between PIC Bytes (Red) and Exact Bytes (Green)

Determine 6: Byte Distinction Between PIC Bytes (Crimson) and Precise Bytes (Inexperienced)

If we hash the PIC bytes, we get the PIC hash EA4256ECB85EDCF3F1515EACFA734E17. And, as we’d hope, we get the similar PIC hash for myfunc within the barely modified oo2.gcc.

Evaluating the Accuracy of PIC Hashing

The first motivation behind PIC hashing is to detect similar code that’s moved to a special location. However what if two items of code are compiled with totally different compilers or totally different compiler flags? What if two features are very comparable, however one has a line of code eliminated? These modifications would modify the non-offset bytes which are used within the PIC hash, so it will change the PIC hash of the code. Since we all know that PIC hashing is not going to all the time work, on this part we focus on the way to measure the efficiency of PIC hashing and evaluate it to different code comparability strategies.

Earlier than we will outline the accuracy of any code comparability method, we want some floor fact that tells us which features are equal. For this weblog put up, we use compiler debug symbols to map operate addresses to their names. Doing so gives us with a floor fact set of features and their names. Additionally, for the needs of this weblog put up, we assume that if two features have the identical title, they’re “the identical.” (This, clearly, shouldn’t be true normally!)

Confusion Matrices

So, for instance we have now two comparable executables, and we need to consider how effectively PIC hashing can establish equal features throughout each executables. We begin by contemplating all attainable pairs of features, the place every pair incorporates a operate from every executable. This strategy known as the Cartesian product between the features within the first executable and the features within the second executable. For every operate pair, we use the bottom fact to find out whether or not the features are the identical by seeing if they’ve the identical title. We then use PIC hashing to foretell whether or not the features are the identical by computing their hashes and seeing if they’re similar. There are two outcomes for every willpower, so there are 4 prospects in complete:

  • True constructive (TP): PIC hashing appropriately predicted the features are equal.
  • True unfavorable (TN): PIC hashing appropriately predicted the features are totally different.
  • False constructive (FP): PIC hashing incorrectly predicted the features are equal, however they don’t seem to be.
  • False unfavorable (FN): PIC hashing incorrectly predicted the features are totally different, however they’re equal.

To make it a bit of simpler to interpret, we coloration the nice outcomes inexperienced and the unhealthy outcomes crimson. We will symbolize these in what known as a confusion matrix:

table1_04152024

For instance, here’s a confusion matrix from an experiment wherein I take advantage of PIC hashing to check openssl variations 1.1.1w and 1.1.1v when they’re each compiled in the identical approach. These two variations of openssl are fairly comparable, so we’d count on that PIC hashing would do effectively as a result of quite a lot of features will probably be similar however shifted to totally different addresses. And, certainly, it does:

04152024_table2

Metrics: Accuracy, Precision, and Recall

So, when does PIC hashing work effectively and when does it not? To reply these questions, we will want a neater solution to consider the standard of a confusion matrix as a single quantity. At first look, accuracy looks as if probably the most pure metric, which tells us: What number of pairs did hashing predict appropriately? This is the same as

For the above instance, PIC hashing achieved an accuracy of

You may assume 99.9 % is fairly good, however if you happen to look carefully, there’s a refined drawback: Most operate pairs are not equal. In line with the bottom fact, there are TP + FN = 344 + 1 = 345 equal operate pairs, and TN + FP = 118,602 + 78 = 118,680 nonequivalent operate pairs. So, if we simply guessed that every one operate pairs have been nonequivalent, we’d nonetheless be proper 118680 / (118680 + 345) = 99.9 % of the time. Since accuracy weights all operate pairs equally, it isn’t probably the most acceptable metric.

As a substitute, we would like a metric that emphasizes constructive outcomes, which on this case are equal operate pairs. This result’s in step with our aim in reverse engineering, as a result of realizing that two features are equal permits a reverse engineer to switch data from one executable to a different and save time.

Three metrics that focus extra on constructive circumstances (i.e., equal features) are precision, recall, and F1 rating:

  • Precision: Of the operate pairs hashing declared equal, what number of have been really equal? This is the same as TP / (TP + FP).
  • Recall: Of the equal operate pairs, what number of did hashing appropriately declare as equal? This is the same as TP / (TP + FN).
  • F1 rating: This can be a single metric that displays each the precision and recall. Particularly, it’s the harmonic imply of the precision and recall, or (2 ∗ Recall ∗ Precision) / (Recall + Precision). In comparison with the arithmetic imply, the harmonic imply is extra delicate to low values. Which means if both the precision or recall is low, the F1 rating will even be low.

So, trying on the instance above, we will compute the precision, recall, and F1 rating. The precision is 344 / (344 + 78) = 0.81, the recall is 344 / (344 + 1) = 0.997, and the F1 rating is 2 ∗ 0.81 ∗ 0.997 / (0.81 + 0.997) = 0.89. PIC hashing is ready to establish 81 % of equal operate pairs, and when it does declare a pair is equal it’s appropriate 99.7 % of the time. This corresponds to an F1 rating of 0.89 out of 1.0, which is fairly good.

Now, you is perhaps questioning how effectively PIC hashing performs when there are extra substantial variations between executables. Let’s have a look at one other experiment. On this one, I evaluate an openssl executable compiled with GCC to at least one compiled with Clang. As a result of GCC and Clang generate meeting code otherwise, we’d count on there to be much more variations.

Here’s a confusion matrix from this experiment:

table3_04152024

On this instance, PIC hashing achieved a recall of 23 / (23 + 301) = 0.07, and a precision of 23 / (23 + 31) = 0.43. So, PIC hashing can solely establish 7 % of equal operate pairs, however when it does declare a pair is equal it’s appropriate 43 % of the time. This corresponds to an F1 rating of 0.12 out of 1.0, which is fairly unhealthy. Think about that you simply spent hours reverse engineering the 324 features in one of many executables solely to search out that PIC hashing was solely capable of establish 23 of them within the different executable. So, you’d be compelled to needlessly reverse engineer the opposite features from scratch. Can we do higher?

The Nice Fuzzy Hashing Debate

Within the subsequent put up on this sequence, we’ll introduce a really totally different sort of hashing known as fuzzy hashing, and discover whether or not it could possibly yield higher efficiency than PIC hashing alone. As with PIC hashing, fuzzy hashing reads a sequence of bytes and produces a hash. Not like PIC hashing, nevertheless, if you evaluate two fuzzy hashes, you might be given a similarity worth between 0.0 and 1.0, the place 0.0 means utterly dissimilar, and 1.0 means utterly comparable. We are going to current the outcomes of a number of experiments that pit PIC hashing and a preferred fuzzy hashing algorithm towards one another and look at their strengths and weaknesses within the context of malware reverse-engineering.

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles