hathawsh 128 days ago [-]
I wonder what the author means by "a lot" of RAM and storage. I tried it for fun. The git process pegged one CPU core and swelled to 26 GB of RAM over 8 minutes, after which I had to kill it.
wscott 128 days ago [-]
Yeah I tried it too. Killed at 65G. Disappointed that Linux killed Chrome first.

    Oct 12 15:47:52 x99 kernel: [552390.074468] Out of memory: Kill process 7898 (git) score 956 or sacrifice child
    Oct 12 15:47:52 x99 kernel: [552390.074471] Killed process 7898 (git) total-vm:65304212kB, anon-rss:63789568kB, file-rss:1384kB, shmem-rss:0kB

Interesting. Linux didn't kill Chrome, it died on its own.

    Oct 12 15:42:21 x99 kernel: [552060.423448] TaskSchedulerFo[8425]: segfault at 0 ip 000055618c430740 sp 00007f344cc093f0 error 6 in chrome[556188a1d000+55d1000]
    Oct 12 15:42:21 x99 kernel: [552060.439116] Core dump to |/usr/share/apport/apport 16093 11 0 16093 pipe failed
    Oct 12 15:42:21 x99 kernel: [552060.450561] traps: chrome[16409] trap invalid opcode ip:55af00f34b4c sp:7ffee985fb20 error:0
    Oct 12 15:42:21 x99 kernel: [552060.450564]  in chrome[55aeffb76000+55d1000]
    Oct 12 15:47:52 x99 kernel: [552390.074289] syncthing invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
Seems Chrome faulted first; it was probably capturing all signals but couldn't handle the OOM condition. Then syncthing tried to allocate memory and invoked the oom-killer, which correctly selected 'git' to kill.
Tharre 128 days ago [-]
> [..] and didn't handle OOM.

How would Chrome 'handle' an OOM anyway? As far as I'm aware, malloc doesn't return ENOMEM when the system runs out of memory, only when you hit RLIMIT_AS and the like.

exikyut 128 days ago [-]
Or when you hit 4G VIRT on 32-bit.

Took me a good day's worth of debugging before some bright spark piped up and said "wait, you said you were on x86-32...?"

...yeah, I use really old computers.

katastic 128 days ago [-]
I'm setting up my previous machine for my wife for gaming: Athlon X4 630 and 16 GB of RAM. I loaded Windows up and it said it had ~2 GB free, and I was like "oh crap, the RAM sticks must be dead" (because the last motherboard, which I had just replaced, broke some RAM slots).

I fixed my old video card, a GTX 560, and wanted to see what it could run. I loaded Steam and PUBG said "invalid platform error". It took me a moment. I hit Alt-PauseBreak and, presto: 32-bit Windows. Whoops.

Hadn't had that problem in a long time, except at clients running ancient Windows Server versions, complaining about why Exchange 2003 won't work with their iPhones anymore: "it used to work and we didn't change anything!" (Yeah... but the iPhone DID change, including banning your insecure Exchange 2003 protocols.)

gabesullice 128 days ago [-]
Humblebrag ;)
geezerjay 128 days ago [-]
Nowadays 32 GB of RAM goes for as little as $170. Some mid-tier graphics cards cost much more than that.
xorfish 127 days ago [-]
They went for around $100 during summer 2016; now the cheapest DDR4 is around $240.


hathawsh 127 days ago [-]
Wow, I didn't notice just how much fluctuation there has been in RAM prices. My Newegg order history shows I paid $65 for 16 GB of DDR3/1600 at the end of 2015. Now the exact same product is sold by Newegg for $122. Crazy!


smcl 128 days ago [-]
I sometimes forget that people use desktops, or other systems with the ability to add extra RAM.
porfirium 128 days ago [-]
If we all click "Download ZIP" on this repo we can crash GitHub together!

Just click here: https://codeload.github.com/Katee/git-bomb/zip/master

AceJohnny2 128 days ago [-]
I hope and expect that GitHub has the basic infrastructure to monitor excessive processes and kill them.
exikyut 128 days ago [-]
Scratches head

...I clicked Download a few seconds ago.

GitHub is still thinking. :/

Edit: After about a minute I got a pink unicorn.

abritinthebay 128 days ago [-]
Wouldn't that just do a `git fetch` and therefore not have the issue?
minitech 128 days ago [-]
"Download ZIP" downloads the repository’s files as a zip. No Git involved for the downloader.
chii 128 days ago [-]
i expect the download zip to be implemented as running 'git archive --format zip | write-http-response-stream'
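For reference, the archive half of that guess is a one-liner (writing to a file here instead of an HTTP response stream; whether GitHub actually does it this way is speculation):

```shell
# Build a zip of HEAD directly from the object database -- no working
# tree needed. A server would pipe this into the HTTP response instead.
git archive --format=zip HEAD > repo.zip
```

Note that `git archive` still has to walk the full tree to build the zip, so on git-bomb it would presumably blow up much like checkout does.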
mschuster91 128 days ago [-]
Hmm I'd hope they do a caching step in between ;)
timdorr 128 days ago [-]
I'm curious how this was uploaded to GitHub successfully. I guess they do less actual introspection on the repo's contents than I thought. Did it wreak havoc on any systems behind the scenes (similar to big repos like Homebrew's)?
stolee 128 days ago [-]
There isn't anything wrong with the objects. A 'fetch' succeeds but the 'checkout' is what blows up.
yes_or_gnome 128 days ago [-]
Good point. For those that are curious:

Clone (--no-checkout):

    $ git clone --no-checkout https://github.com/Katee/git-bomb.git
    Cloning into 'git-bomb'...
    remote: Counting objects: 18, done.
    remote: Compressing objects: 100% (6/6), done.
    remote: Total 18 (delta 2), reused 0 (delta 0), pack-reused 12
    Unpacking objects: 100% (18/18), done.
From there, you can do some operations, like `git log` and `git cat-file -p HEAD` (I use the "dump" alias[1]: `git config --global alias.dump "cat-file -p"`), but not others, like `git checkout` or `git status`.

[1] Thanks to Jim Weirich and Git-Immersion, http://gitimmersion.com/lab_23.html. I never knew the guy, but, ~~8yrs~~ (corrected below) 3.5yrs after his passing, I still go back to his presentations on Git and Ruby often.

Edit: And, to see the whole tree (following the first entry at each level; `git dump` is the alias above):

    NEXT_REF=HEAD
    while [ -n "$NEXT_REF" ]; do
      echo "$NEXT_REF"
      git dump "${NEXT_REF}"
      NEXT_REF=$(git dump "${NEXT_REF}"^{tree} 2>/dev/null | awk '{ if($4 == "d0" || $4 == "f0"){ print $3 } }')
    done
matthewrudy 128 days ago [-]
Sad one to nitpick, but Jim died in 2014. So ~3.5 years ago.

Had the pleasure of meeting him in Singapore in 2013.

Still so much great code of his we use all the time.

yes_or_gnome 128 days ago [-]
Thanks for the correction; he truly was a brilliant mind. One of my regrets is not being active and outgoing enough to have met him myself. I lived in the Cincinnati area from 2007-2012. I first got started with Ruby in 2009, and quickly became aware of who he was (Rake, Bundler, etc.) and that he lived/worked close by. But at the time I wasn't interested in conferences, meetups, or simply emailing someone to say thanks.
enzanki_ars 128 days ago [-]
I too was curious about this.

https://github.com/Katee/git-bomb/commit/45546f17e5801791d4b... shows:

"Sorry, this diff is taking too long to generate. It may be too large to display on GitHub."

...so they must have some kind of backend limits that may have prevented this from becoming an issue.

I wonder what would happen if it was hosted on a GitLab instance? Might have to try that sometime...

ballenf 128 days ago [-]
Since GitHub paid a bounty and OK'd the release, perhaps they've already patched some aspects of it. It might be impossible to recreate the issue now.

My naive question is whether CLI "git" would need or could benefit from a patch. Part of me thinks it doesn't, since there are legitimate reasons for each individual aspect of creating the problematic repo. But I probably don't understand god deeply enough to know for sure.

mnx 128 days ago [-]
is this a git->god typo, or a statement about your feelings towards Linus?
warent 128 days ago [-]
Please don't let Linus read this
ethomson 128 days ago [-]
Yes, hosting providers need rate limiting mitigations in place. GitHub's is called gitmon (at least unofficially), and you can learn more at https://m.youtube.com/watch?v=f7ecUqHxD7o

Visual Studio Team Services has a fundamentally different architecture, but we employ some similar mechanisms despite that. (I should do some talks about it, but it's always hard to know how much to say about your defenses lest it give attackers clever new ideas!)

corobo 128 days ago [-]
> how much to say about your defenses lest it give attackers clever new ideas

attackers will try clever new ideas anyway if their less clever old ideas don't work :P

Sean1708 128 days ago [-]
How does the saying go? Something like "security through obscurity isn't security"?
ethomson 125 days ago [-]
It's not security through obscurity. It's defense in depth.
deckar01 128 days ago [-]
GitLab uses a custom Git client called Gitaly [0].

> Project Goals

> Make the git data storage tier of large GitLab instances, and GitLab.com in particular, fast.

[0]: https://gitlab.com/gitlab-org/gitaly

Edit: It looks like Gitaly still spawns git for low level operations. It is probably affected.

jychang 128 days ago [-]
Spawning git doesn't mean that it can't just check for a timeout and stop the task with an error.

Someone will probably have to actually try an experiment with Gitlab.
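A crude version of that guard, assuming the spawning side can wrap each git call in coreutils `timeout` (the 10-second budget is an arbitrary example):

```shell
# Kill the spawned git if it exceeds its time budget; `timeout` exits
# with status 124 on expiry, so the wrapper can surface a clean error.
timeout --kill-after=5s 10s git rev-list --all >/dev/null \
  || echo "git exceeded its time budget"
```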

lloeki 128 days ago [-]
Tested locally on a GitLab instance: trying to push the repo results in a unicorn worker allocating ~3GB and pegging a core, then being killed on a timeout by the unicorn watchdog.

    Counting objects: 18, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (17/17), done.
    Writing objects: 100% (18/18), 2.13 KiB | 0 bytes/s, done.
    Total 18 (delta 3), reused 0 (delta 0)
    remote: GitLab: Failed to authorize your Git request: internal API unreachable
    To gitlab.example.com:lloeki/git-bomb.git
     ! [remote rejected] master -> master (pre-receive hook declined)
    error: failed to push some refs to 'git@gitlab.example.com:lloeki/git-bomb.git'
I had "Prevent committing secrets to Git" enabled, though. Disabling it makes the push work. The repo can then be browsed at the first level from the web UI, but clicking into any folder brings the whole thing down, with multiple git processes hanging on `git rev-list`.

EDIT: reported at https://gitlab.com/gitlab-org/gitlab-ce/issues/39093 (confidential).

shade23 128 days ago [-]
styfle 128 days ago [-]
Thanks. Here is the comment from a GitHub engineer addressing the root cause:


JoshMnem 128 days ago [-]
Because that page is AMP by default, it takes about 7 seconds to load on my laptop. AMP is really slow in some cases.

Edit: see my comment below before you downvote me.

katee 128 days ago [-]
Huh, I've tested on a bunch of devices/connections and haven't encountered that. Do you know what causes AMP to be that slow for you? I'll take a look at serving non-AMP pages by default. It will require tweaking how image inclusion works.
JoshMnem 128 days ago [-]
For people who use extensions or browsers that block third party JS, AMP pages will take many seconds to load in non-mobile Web browsers.

Here is information about some of the other problems with AMP:

xpaulbettsx 128 days ago [-]
Fix your browser /shrug
JoshMnem 128 days ago [-]
It isn't just my browser. AMP performs very badly in some non-mobile browsers (no extensions).
amigoingtodie 128 days ago [-]
Fix your website
Sir_Cmpwn 128 days ago [-]
Would you please remove amp entirely?
TeMPOraL 128 days ago [-]
Same here. The page just stays blank for a few seconds, and then pops into existence.

(I do use uMatrix to block 3rd party JS.)

pmoriarty 127 days ago [-]
Why not just always run git under memory limits?

For example:

  %  ulimit -a
  -t: cpu time (seconds)              unlimited
  -f: file size (blocks)              unlimited
  -d: data seg size (kbytes)          unlimited
  -s: stack size (kbytes)             8192
  -c: core file size (blocks)         0
  -m: resident set size (kbytes)      unlimited
  -u: processes                       30127
  -n: file descriptors                1024
  -l: locked-in-memory size (kbytes)  unlimited
  -v: address space (kbytes)          unlimited
  -x: file locks                      unlimited
  -i: pending signals                 30127
  -q: bytes in POSIX msg queues       819200
  -e: max nice                        30
  -r: max rt priority                 99
  -N 15:                              unlimited
  %  ulimit -d $((100 * 1024)) # 100 MB
  %  ulimit -m $((100 * 1024)) # 100 MB
  %  ulimit -l $((100 * 1024)) # 100 MB
  %  ulimit -v $((100 * 1024)) # 100 MB
  %  git clone https://github.com/Katee/git-bomb.git
  Cloning into 'git-bomb'...
  remote: Counting objects: 18, done.
  remote: Compressing objects: 100% (6/6), done.
  remote: Total 18 (delta 2), reused 0 (delta 0), pack-reused 12
  Unpacking objects: 100% (18/18), done.
  fatal: Out of memory, malloc failed (tried to allocate 118 bytes)
  warning: Clone succeeded, but checkout failed.
  You can inspect what was checked out with 'git status'
  and retry the checkout with 'git checkout -f HEAD'
ericfrederich 128 days ago [-]
Run this to create a ~40 KB file which expands to 1 GiB:

  yes | head -n536870912 | bzip2 -c > /tmp/foo.bz2
I would imagine you could do something really creative with ImageMagick to create a giant PNG file that'll make browsers, viewers, and editors crash as well.
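The expansion is easy to verify without ever materializing the gigabyte, by decompressing into a byte counter:

```shell
# 536870912 lines of "y\n" at 2 bytes each = 1073741824 bytes = 1 GiB.
yes | head -n536870912 | bzip2 -c > /tmp/foo.bz2
ls -l /tmp/foo.bz2           # a few tens of KB on disk
bzcat /tmp/foo.bz2 | wc -c   # 1073741824 -- counted in the pipe, nothing written
```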
tedunangst 128 days ago [-]
PNG has dimensions in the header so the decoder should know when it's decompressed enough.
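That check is cheap: width and height are big-endian 32-bit integers at fixed offsets inside the IHDR chunk (bytes 16-19 and 20-23 of the file), so a decoder can size-check before inflating anything. A sketch, assuming `xxd` is available and a file named `image.png`:

```shell
# Width and height live in IHDR, the first chunk after the 8-byte PNG
# signature: big-endian u32s at byte offsets 16 and 20 respectively.
png_dims() {
  w=$(printf '%d' "0x$(xxd -p -s 16 -l 4 "$1")")
  h=$(printf '%d' "0x$(xxd -p -s 20 -l 4 "$1")")
  echo "${w} x ${h}"
}
# usage: png_dims image.png
```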
Sean1708 128 days ago [-]
You can take it a step further using Zip Bombs[0].

[0]: https://en.wikipedia.org/wiki/Zip_bomb

Hupriene 128 days ago [-]
You can also make archives that contain themselves:


warent 128 days ago [-]
Odd. It's surprising to me that this example runs out of memory. What would be a possible solution?

Admittedly I don't know that much about the inner workings of git, but off the top of my head, perhaps something like traversing the tree depth-first and releasing resources as you hit the bottom?

ericfrederich 128 days ago [-]
You need a problem to have a solution to it. What do you consider to be the problem here?

This is essentially something that can be expressed in relatively few bytes that expands to something much larger.

Imagine I had a compressed file format for blank files ("0x00" the whole way through). It's implemented by writing, in ASCII, the size of the uncompressed file.

So the contents of a file called terabyte.blank would just be the ASCII "1000000000000"... and the contents of a file called petabyte.blank would be "1000000000000000".

I cannot decompress these files... what is the solution?
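That toy format fits in two shell functions (hypothetical names), which make the asymmetry obvious: "compression" is constant-size output, decompression is unbounded:

```shell
# "Compress": record only the original length, in ASCII.
blank_compress()   { wc -c < "$1" | tr -d ' ' > "$2"; }
# "Decompress": emit that many zero bytes. A 13-byte file containing
# "1000000000000" decompresses to a terabyte (or until the disk fills).
blank_decompress() { head -c "$(cat "$1")" /dev/zero > "$2"; }
```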

geezerjay 128 days ago [-]
> You need a problem to have a solution to it. What do you consider to be the problem here?
>
> This is essentially something that can be expressed in relatively few bytes that expands to something much larger.

That seems to be the problem. I mean, if an object expands to something so much larger that it crashes services by the sheer volume of resources it takes... that is pretty much the definition of a denial-of-service attack vector.

TeMPOraL 128 days ago [-]
There is a problem here, but it's not with the data. It's with the service.

Being able to express trees efficiently in a data format is a useful feature, but it requires the code processing it not to be lazy and assume people will never create pathological tree structures.

warent 128 days ago [-]
I'm not following; why can't you decompress it? Of course you can't decompress it into memory, but if it's trying to do that then there's a problem in the code (problem identified).

Naive solution: just append to the file and make sure you have enough disk. More sophisticated solution: shard the file across multiple disks.

Piskvorrr 127 days ago [-]
That's not a solution, that's sweeping the problem under the rug: "just have the OS provide storage, therefore it's not my problem any more; solved." (Never mind that with a few more layers, the tree would decompress into a structure larger than all the storage ever available to mankind.)
peff 128 days ago [-]
Git assumes it can keep a small struct in memory for each file in the repository (not the file contents, but a fixed per-file size). This repository just has a very large number of files.
glandium 128 days ago [-]
Large as in 10 billion. Even if git only needed 1 byte of memory per file, it would need 10 GB.
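To put numbers on it (the per-entry cost below is a rough assumption for illustration, not a measured figure):

```shell
files=10000000000   # git-bomb: 10 tree levels x 10 entries each = 10^10 paths
per_entry=100       # assumed in-memory bytes per path (rough guess)
echo "$(( files * per_entry / 1024 / 1024 / 1024 )) GiB"   # prints "931 GiB"
```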
koenigdavidmj 128 days ago [-]
One option is to modify each of the utilities so that it doesn't keep a full representation of the whole tree in memory. I doubt this is feasible in all cases, though for something like `git status` it should be doable.

If the tree object format were required to store its own path, then you wouldn't be able to repeat the same tree a bunch of times. The in-memory representation would be the same size, but you would now need that same number of objects in the repository. No more exponential fanout.

But that would kind of defeat the purpose of Git for real use cases (renaming a directory shouldn't make the size of your repo blow up).

TeMPOraL 128 days ago [-]
Have git (the client) monitor its own memory usage and abort if it gets above a set limit (say, default, 1GB), with a message that tells you how to change or disable the limit.
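Until git grows such a limit, a shell wrapper approximates it from the outside (1 GiB address-space cap here; adjust to taste):

```shell
# Run every git command under a 1 GiB address-space limit. The subshell
# body keeps the ulimit from leaking into the interactive session.
git() ( ulimit -v $((1024 * 1024)); command git "$@" )
```

With that in place, a git-bomb checkout dies with an out-of-memory error instead of taking the machine down, much like the ulimit transcript elsewhere in the thread.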
gwerbin 128 days ago [-]
Would this be possible with a patch-based version control system like Darcs or Pijul? Does patch-based version control have other analogous security risks, or is it "better" in this case?
fanf2 128 days ago [-]
If the patch language includes a recursive copy, then it's possible to reproduce this problem in that setting.
geezerjay 128 days ago [-]
If I understood correctly, this problem isn't caused by recursive copies but simply by expanding references. The example shows that reference expansion leads to an exponential increase in the resources required by the service.
TeMPOraL 128 days ago [-]
It means the same thing in this context: if git just expanded references one by one while walking the tree, this wouldn't happen. The bomb requires copies of the expanded references to be stored in memory.
TeMPOraL 128 days ago [-]
Going to the second level on GitHub breaks the commit name for me: it gets stuck on the "Fetching latest commit..." message. Curiously, go one level deeper and the commit message is correct again.


(INB4 The article suggests Github is aware of this repo, so I have no qualms posting this link here.)

emeraldd 128 days ago [-]
Bare for the win.

    git clone https://github.com/Katee/git-bomb.git --bare
infinity0 128 days ago [-]
Directory hard links would "fix" this issue, since `git checkout` could just create a directory hard link for each duplicated tree. I wonder why traditional UNIX doesn't support this for any filesystem.

(Yes, you would need to add a loop detector for paths and resolve ".." differently, but it's not as if doing this is conceptually hard.)

breakingcups 128 days ago [-]
Has anyone tried to see how well BitBucket and Gitlab handle this?
Retr0spectrum 128 days ago [-]
What happens if you try to make a recursive tree?
katee 128 days ago [-]
You can't make a valid recursive tree without a pre-image attack against SHA1. However, `git` doesn't actually verify the SHA1s for most commands. If you make a recursive tree and run `git status`, it will segfault because the directory walk gets stuck in infinite recursion.
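The chicken-and-egg is visible in the plumbing: `git mktree` needs every child's hash before it can compute the parent's, so a tree can never honestly list itself. A sketch (throwaway repo; the file name is arbitrary):

```shell
git init -q demo && cd demo
blob=$(git hash-object -w --stdin < /dev/null)   # the empty blob
# Children's SHA1s go in first; only then does the tree's own SHA1 exist.
tree=$(printf '100644 blob %s\tf\n' "$blob" | git mktree)
echo "$tree"
```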
ethomson 128 days ago [-]
As in a tree that points to itself? You cannot, since the tree would have to contain its own SHA1. So this would require you to know your own tree's SHA in order to embed it in the tree.
mv4 128 days ago [-]
Reminded me of the GIF that displays its own MD5 hash:


jwandborg 127 days ago [-]
So it's possible, but impractical?
mv4 127 days ago [-]
I think it's possible.
kowdermeister 128 days ago [-]
I thought it would self-destruct after cloning or forking, before clicking :)