If there's only one copy of a program running, it won't matter - but if you are running hundreds of copies (even dockerized and such), you are likely better off NOT upxing.
I don't think people care about that nowadays, seeing how popular Docker containers are. I think Docker containers already make it so that you cannot share executable memory between different containers because each one runs in its private namespace.
More specifically, I wouldn't expect "free -m" to produce a different result depending on the namespace it's run in.
Why is this exactly?
It is rarely a big loss, because executables that are in use tend to remain in memory if the program is actually active. If you have a 300 MB daemon that sleeps, though, you will likely notice a swap out to magnetic disk.
That doesn't work for UPX because each execution decompresses anew, which makes it a "new executable" from the OS' point of view.
The only thing that would help is kernel same-page merging (KSM), but that's really only activated for some virtual machines.
And even there, it's a security threat, since it enables cross-VM timing and other attacks.
(a) Go compiles static binaries, which makes UPX especially effective, unlike e.g. common C and C++ projects with their hundreds of .so/.dll files; Delphi/FPC and Nim are two other ecosystems that share this trait, but neither is as common as Go.
(b) it's not good for non-static binaries; that is, C#, Java, and Python get no benefit from this.
(c) At the time I posted it, there were already 3 or 4 posts extolling the virtues of compressing Go executables.
A little walk down memory lane:
I once ran the exe mailing list for exe packers and protection tools. There was a whole scene of people in the 90s writing such tools and writing unpackers and removal tools for such things. UPX was one of the later ones that still existed when most of this scene vanished.
(Incidentally, these advanced packers also tend to frustrate RE to some extent, since the same tricks they use to increase compression ratios can often greatly confuse RE tools.)
I still have the malicious file on a VM to do some analysis on later. (if anyone would like it, feel free to contact me)
edit: added the contact me
You'd be surprised at how much of an ELF binary is all 0's.
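If you want to check that claim yourself, here is a minimal Go sketch that simply counts zero bytes in a binary; the path is just an example, point it at whatever you like.

    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        // Example path; point this at any ELF binary you want to inspect.
        data, err := os.ReadFile("/usr/bin/ls")
        if err != nil {
            panic(err)
        }
        zeros := 0
        for _, b := range data {
            if b == 0 {
                zeros++
            }
        }
        fmt.Printf("%d of %d bytes are zero (%.1f%%)\n",
            zeros, len(data), 100*float64(zeros)/float64(len(data)))
    }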
If you use an uncompressed (or transparently compressed by the filesystem) binary, your process mmaps the executable's pages, which can be discarded and then reloaded as needed.
If you use a self-extractor, your process has dirty pages that it wrote itself; those cannot be discarded, but must be moved to swap if needed.
The more processes you run from the same executable, the worse the effect gets: the read-only mmapped pages are shared among them all, while the written pages are private to each process.
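A rough way to see the difference on Linux is /proc/<pid>/smaps: file-backed executable pages show up under Shared_Clean/Private_Clean, while pages a self-extractor wrote are anonymous and show up as Private_Dirty. Here's a minimal Go sketch (assuming Linux and the standard smaps field names) that sums two of those counters for its own process:

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
    )

    func main() {
        // Sum two counters across all mappings of the current process:
        // discardable file-backed text shows up under Shared_Clean,
        // unpacked-in-place code would show up under Private_Dirty.
        f, err := os.Open("/proc/self/smaps")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        totals := map[string]int{"Shared_Clean": 0, "Private_Dirty": 0}
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            fields := strings.Fields(sc.Text())
            if len(fields) < 2 {
                continue
            }
            key := strings.TrimSuffix(fields[0], ":")
            if _, ok := totals[key]; ok {
                kb, _ := strconv.Atoi(fields[1])
                totals[key] += kb
            }
        }
        fmt.Printf("Shared_Clean: %d kB, Private_Dirty: %d kB\n",
            totals["Shared_Clean"], totals["Private_Dirty"])
    }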
I would be surprised to see practical performance degradation from uncompressing executable code before jumping to the program on today's machines. The largest binary in my /usr/bin/ is 50 megabytes. On the other hand, for very, very large binaries it's probably faster to decompress in memory rather than load all the bits from disk.
Further, most executables aren't static these days. (I often wish they were, though!). What type of binaries have you got, and are they really so big that it's worth the hassle to compress them just to save disk space?
The binaries are mostly stuff like pandoc and compiled statically so that I can run them anywhere. Nothing too special.
It's not technically needed, but it makes network transfer faster, and in general that's good enough. It's not really intended to reduce disk space; it's more a way to make things more manageable.
I just remembered: ProcDump32! Geez, that really blew my mind at the time and in a way still does.
Offering a sub-MB executable in the era of 100 MB Electron apps is totally pioneering :)
Of course to do it properly it'd need:
* A modified Clang (or other C++ compiler) that can use .ppu files with the necessary C++ language extensions for properties, callbacks, sets, enhanced RTTI, etc
* A C/C++ library that uses the Free Pascal RTL for all memory operations
* Lazarus' CodeTools to add C++ support for automatically creating missing event handler code (and removing unnecessary code), handling syntax completion, code completion for missing identifiers, property getters/setters and private fields, inherited fields, etc to the same standard as the Free Pascal code
* All the involved teams to agree to play nice with each other :-P
Also, if such a thing were done, judging from what most Lazarus and FPC devs have done so far, it'd probably be done in a way that is as compatible with C++ Builder as possible.
TBH I don't really hold my breath, but who knows, weird stuff has happened before in both FPC and Lazarus :-P
You'd think that after reporting a false positive once, an AV vendor would whitelist the hash of the binary, but no. Some of them were re-detecting malware time and time again. Until we stopped using UPX.
Then AV companies could see that and not flag it as malware unless they had additional reason to think it was.
That doesn't seem like it'd be terribly difficult but there's a good chance I'm missing something.
They know it very well, but adding code to do decompression while performing a scan is more complex and will surely reduce performance.
If the AV is already slow, they might decide to just flag any UPX binary, since (let's not lie) most malware will be compressed with UPX or other tools.
IMHO an AV that doesn't know how to unpack UPX is almost like an AV that doesn't know how to unpack ZIP or RAR... and yet they universally do the latter.
I have a feeling that your false positives are caused by the fact that UPX (and other compressors) naturally create very high-entropy files. AVs that do signature-type comparisons want to keep signatures as short as possible, so they also choose very high-entropy portions of malware to be as distinctive as possible while remaining short; but that also increases the chance of such sequences being found in other benign high-entropy files.
I'm almost willing to bet that your re-detections are not the same malware being detected over and over, but new signatures being added by the AV vendor, which coincidentally happen to match some other high-entropy portion of your binary.
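For anyone who wants to check the entropy argument, a quick Shannon-entropy measurement is enough to see that UPX-packed files sit close to 8 bits/byte while ordinary code sits well below that. A minimal Go sketch, nothing AV-specific about it:

    package main

    import (
        "fmt"
        "math"
        "os"
    )

    func main() {
        // Shannon entropy in bits per byte; packed or encrypted data
        // approaches 8.0, plain executable code is usually well below that.
        if len(os.Args) < 2 {
            fmt.Fprintln(os.Stderr, "usage: entropy <file>")
            os.Exit(1)
        }
        data, err := os.ReadFile(os.Args[1])
        if err != nil {
            panic(err)
        }
        var counts [256]float64
        for _, b := range data {
            counts[b]++
        }
        n := float64(len(data))
        entropy := 0.0
        for _, c := range counts {
            if c > 0 {
                p := c / n
                entropy -= p * math.Log2(p)
            }
        }
        fmt.Printf("%.3f bits/byte\n", entropy)
    }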
Then again, the quest for speed and high detection rates (while false positive rates seem to be less of a concern) among AV vendors has led to some massively embarrassing mistakes, like considering the mere existence of a path as detection of malware:
(The original article with the ridiculous claims has sadly vanished, but the Internet Archive remembers...)
1. I used UPX to compact my Delphi EXE file.
2. Then I opened up a hex editor
3. looked for the "UPX" string and changed it to "222x"
Doing this, the UPX unpack tool didn't work, and crackers could not easily see or edit my code in assembly (as UPX messes with everything!)
Thanks UPX :-)
It's been years since I unpacked a UPX'd binary manually, but I still remember what it looks like: a PUSHA at the start to save all the registers, a lot of decompression code, and finally a POPA and a JMP to the OEP. Incidentally, this general pattern is also shared by a bunch of other simple packers (more focused on compression than anti-RE), so unpacking them follows the same process.
It has been years, but I vaguely remember there is a general standard way of unpacking.
Using the debugger, you keep track of those jmp instructions until unpacking is done. And then dump the memory to a file.
How did the UPX loader manage to find the section in which the packed content is stored?
UPD. It's REALLY easy to "hack" this protection. You simply need to attach a debugger and you will see the unprotected exe file in memory. There are tools to convert a loaded, unprotected exe back into a regular exe file on disk. So... no one really tried to hack you. Sorry.
Well, it was another protection layer, like you said, to keep the bad kids away.
This was 10 years ago already.
You can inspect the code running on your machine. The machine code.
At what level should one expect a user to understand the code running on their machine? If I gave you the source to my application in Brainfuck, would that suffice?
We used to compress all our binaries (desktop software developers), but fighting false positives from antivirus vendors became an endless nightmare. We just gave up and stopped using binary compressors entirely.
It's simply nice to ship a fully working app, with SQLite* and everything, which will basically run anywhere with a Linux kernel, in a single executable far below 2 MB.
*) Yes, the vast majority of the world's websites need nothing fancier than SQLite to keep them happy. And manageable.
Years back I used gzexe, and also some pkzip-based thing on DOS. On a modern system, you're better off enabling filesystem-level compression, which also won't break OS paging if the executable is run more than once.
We use it heavily to compress some of our Docker image executables.
The exceptions are NSIS installers, self-extracting archives (exe RAR files), and files with IDL interfaces.
When an NSIS installer starts, it will try to open its own exe file and find the section in which its packed data is stored. But UPX removes those sections and creates its own UPX sections with the compressed data.
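As a rough illustration of the section renaming: on PE files you can see it directly in the section table, where UPX typically leaves sections named UPX0/UPX1 (exact names vary by version and settings). A small Go sketch using the standard debug/pe package:

    package main

    import (
        "debug/pe"
        "fmt"
        "os"
        "strings"
    )

    func main() {
        // List PE section names and flag the ones UPX typically creates.
        if len(os.Args) < 2 {
            fmt.Fprintln(os.Stderr, "usage: sections <file.exe>")
            os.Exit(1)
        }
        f, err := pe.Open(os.Args[1])
        if err != nil {
            panic(err)
        }
        defer f.Close()

        for _, s := range f.Sections {
            mark := ""
            if strings.HasPrefix(s.Name, "UPX") {
                mark = "  <- typical UPX section"
            }
            fmt.Printf("%-8s size=%d%s\n", s.Name, s.Size, mark)
        }
    }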
These days I struggle to fill my hard drives no matter how wasteful I am with downloading videos and not bothering to clean up afterwards... and the amount of hard drive space you can buy per dollar keeps growing faster than I can fill my disks.
Much trickier issues to tackle are speed (unless you go with SSD's, but then you run in to space issues again, and reliability issues), backups, and data integrity. All of these issues are made much harder by the sheer amounts of data we're storing these days. Executables usually account for only a relatively small fraction of that space.
and related https://news.ycombinator.com/item?id=7739599
How does UPX defend against reverse engineering? The binary literally contains the code to reverse the UPX compression (otherwise it couldn't run), and I'd expect all antiviruses to be able to unpack UPX executables.