"DevOps ... brings a lot of challenges: the efficiency of image distribution, especially when you have a lot of applications and require image distribution at the same time. Dragonfly works extremely well with both Docker and Pouch, and actually we are compatible with any other container technologies without any modifications of container engine."
FWIW, this is similar to a problem I tackled for the Gopher Gala hackathon in 2015: a custom BitTorrent-based Docker image registry POC.
Interestingly my problem statement was somewhat similar:
"Large scale deploys are going to choke your docker registry. Imagine pulling/deploying a 800mb base image (for example the official perl image) across 200 machines in one go. That's 800*200 = 160GB [EDIT: Correction thanks to kingbirdy] of data that's going to be deployed and it'll definitely choke your private docker registry (and will take a while pulling in from the public image)."
So if you own the private key you can write the payload, and just share your public-key address with the people you want to share the payload with. In the payload you can write a traditional immutable torrent manifest, for instance, which makes it in essence a public-key crypto based update system.
For a lot of cases I think it's a better approach than what IPFS and DAT provide, because you don't care about it being a global address. All you want is to share with a group of people, more in the p2p social/organic way.
I played with it once, using the libtorrent library and the main BitTorrent DHT, and it was a very nice experience. It finds the payload pretty fast considering it's a DHT and you are working in pure p2p fashion.
The only single point of failure here is the DHT bootstrap peer.
I'm planning to use this feature to distribute binary images to clients that have my public key.
Does a client just search the DHT for the public key? I thought torrent clients searched for the hash of the files.
If it's searching for the public key then how does one person upload multiple different torrents, or do they create a new public key for each torrent? How does a client know which is the latest version if it has been updated multiple times?
Are there any example projects using this?
The only project I know is gittorrent (https://blog.printf.net/articles/2015/05/29/announcing-gitto...), but it hasn't gone anywhere.
> but you mean it's kind of like mutable torrents that only the uploader can modify?
Yes, your public key (hash) is the DHT key that identifies the payload, and only the private key owner can modify the content in that particular slot.
That's why it's cool: you can have a p2p system that relies on trust between the parties, unlike the traditional torrent system.
Also, I'm not so sure, but lately using centralized trackers is discouraged, and I guess that magnet links must use something like a DHT to work the way they do.
> I thought torrent clients searched for the hash of the files.
You are correct, but the DHT is specified in a BEP and is more of an "on the corner" feature. It's there, though, and at least when I tried it, it was working great.
> If it's searching for the public key then how does one person upload multiple different torrents, or do they create a new public key for each torrent? How does a client know which is the latest version if it has been updated multiple times?
The rules are:
- You can create any slot you want; it's just a matter of generating the public/private key pair you want to use. (This would even allow you to use a forward-secrecy algorithm if you need one.)
- The byte payload/value must be small, so you should use it to hold a manifest of something or to point to something else. Let's not forget that Git just looks at the HEAD record, with only a hash to go on from there. So just point to something else that can be an immutable resource, like a traditional torrent.
- So you can point to a torrent, download it, and have a small list of anything you like, working as a catalog, and go from there. If you play the indirection game right, there's no limit to what you can do.
- If you want the payload to stay there, you need to keep writing to it from time to time (the same value if you want), or it will expire and other peers will not be able to locate it anymore.
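To make the slot idea a bit more concrete, here is a minimal sketch of how I read BEP 44 (mutable items): the DHT target is the SHA-1 of the 32-byte ed25519 public key plus an optional salt, and the owner signs a small buffer containing the salt, a sequence number and the value. The salt is what would let one key own several slots, and the sequence number is how readers would tell which version is newest. The function names are mine, the value is assumed to be a plain byte string, and the actual ed25519 signing is left out because it needs a third-party library.

```python
import hashlib

MAX_VALUE_BYTES = 1000  # BEP 44 limits the bencoded value to 1000 bytes


def bencode_bytes(data: bytes) -> bytes:
    """Bencode a byte string: <length>:<data>."""
    return str(len(data)).encode() + b":" + data


def mutable_target(public_key: bytes, salt: bytes = b"") -> bytes:
    """DHT key of a mutable slot: SHA-1 over public key + optional salt."""
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    return hashlib.sha1(public_key + salt).digest()


def signing_buffer(value: bytes, seq: int, salt: bytes = b"") -> bytes:
    """Bytes the key owner signs before writing to the slot (signing not shown)."""
    encoded_value = bencode_bytes(value)
    if len(encoded_value) > MAX_VALUE_BYTES:
        raise ValueError("value too large; point to a torrent instead")
    buf = b""
    if salt:
        buf += b"4:salt" + bencode_bytes(salt)
    buf += b"3:seqi" + str(seq).encode() + b"e1:v" + encoded_value
    return buf


# Hypothetical all-zero key, for illustration only.
demo_key = bytes(32)
print(mutable_target(demo_key, b"app-a").hex())
print(signing_buffer(b"Hello World!", 1, b"foobar"))
```

With different salts (say one per application) the same key pair maps to different targets, so you would not need a new key for every torrent.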
What would I do? I would use the payload to point to something else, like some torrent in the classic BitTorrent network (you can use just a magnet link), or expose the whole torrent header. You can also point to some HTTP resource or whatever.
Need something more? How about pointing to a torrent that downloads a bootstrap program that starts an RPC service over TCP, or over a simpler HTTP interface, and then does something else from there.
I was thinking about how I could use this to build an update mechanism using diff and patch, given that with a public key scheme you can create a trust relationship between parties and patch the binary with something coming from the 'mothership'.
> Are there any example projects using this?
I don't know any, but in my case I was playing with the libtorrent implementation of the DHT. Also, as far as I know, it is this kind of property in the DHT that allows projects like IPFS to exist.
The cool thing about using the torrent implementation is that the main DHT already has a lot of nodes, so you can find something pretty fast.
There is apparently a Node.js implementation of something that can publish a given torrent to a mutable address, and also retrieve a mutable torrent from a given address. I do not know how well this is supported in the usual clients, but if you want it in your Docker setup, you might want to talk to libtorrent directly; implementing BEP 46 yourself should not be hard with what the library has to offer. A benefit could be that, depending on how you handle it, you might be able to store the tarballs that Docker images seem to consist of in their unpacked form, and just keep some metadata about what the header(s) of the tarball were, along with some file offsets. This way you would be relieved of the unnecessary storage burden, and could possibly use many more of your servers to seed at least part of the images, e.g. only the parts that are not mutated while the software is running. That is: download once, unpack, and only offer to seed those files/pieces that did not get modified in the meantime, without trying to re-download the "broken" data.
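A rough sketch of the "only seed what did not get modified" part, assuming BitTorrent v1 style SHA-1 piece hashes and, for simplicity, the whole payload held in memory as one byte string. A real client would hash pieces across file boundaries on disk, and the helper names here are made up.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_len: int) -> List[bytes]:
    """SHA-1 of each fixed-size piece, the way a v1 torrent describes its payload."""
    return [
        hashlib.sha1(data[i:i + piece_len]).digest()
        for i in range(0, len(data), piece_len)
    ]


def seedable_pieces(current: bytes, expected: List[bytes], piece_len: int) -> List[int]:
    """Indices of pieces whose on-disk content still matches the torrent's hashes."""
    return [
        i
        for i, digest in enumerate(piece_hashes(current, piece_len))
        if i < len(expected) and digest == expected[i]
    ]


original = b"a" * 8 + b"b" * 8 + b"c" * 8
expected = piece_hashes(original, 8)

# Pretend the running software rewrote the middle piece after unpacking.
modified = b"a" * 8 + b"X" * 8 + b"c" * 8
print(seedable_pieces(modified, expected, 8))  # [0, 2]
```

The node would then advertise only those piece indices and simply not re-fetch the ones that no longer match.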
>At Alibaba, the system transfers 2 billion times and distributes 3.4PB of data every month, it has become one of the most important piece of infrastructure at Alibaba. The reliability is up to 99.9999%.
I took a look at their repo and it turns out there is a surprising amount of good stuff in it that never gets much spotlight or attention.
If you have 5000 nodes running this, that could be as small as distributing 23.8 gigs/day to each node over the course of the month.
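The per-node figure can be reproduced like this. It is a sketch that assumes binary units (1 PiB = 1024 * 1024 GiB), a 30-day month, and traffic spread evenly over all nodes; the 5000-node count is the hypothetical from the comment, not a published number.

```python
def per_node_gib_per_day(total_pib: float, nodes: int, days: int = 30) -> float:
    """Average daily volume per node if the monthly total is spread evenly."""
    total_gib = total_pib * 1024 * 1024  # PiB -> GiB
    return total_gib / nodes / days


print(round(per_node_gib_per_day(3.4, 5000), 1))  # 23.8
```

With decimal units (1 PB = 1,000,000 GB) the same calculation gives roughly 22.7 GB/day instead.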
Also keep in mind this is a data distribution system. In the case of large data pushes, new builds, etc., it's important that all peers get the new data in a timely and reliable manner.
I suspect you're underestimating the problem this project solves by focusing on a mostly irrelevant data rate stat.
"up to"... Certainly looks like a wrong wording.
If I write a shell script with an scp line right now, I can tell you it has 100% reliability. "Up to X% reliability" means "in a very nice and controlled environment, with our own devs attacking every production problem, trying to get 100% got us up to X%".
Docker themselves have discussed making the official registry extensible enough to support BitTorrent pulls, but I don't know if anything ever happened there.
Facebook has been using BitTorrent for deploys for something like 9 years now. They configured the tracker to prefer sharing peers with longer matching subnet prefixes, to keep bandwidth off the backbone as much as possible.
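The subnet-prefix preference could look roughly like this. To be clear, this is not Facebook's tracker code, just a minimal IPv4-only sketch of the idea with made-up function names: rank candidate peers by how many leading address bits they share with the requester.

```python
import ipaddress
from typing import List


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common (0 to 32)."""
    x = int(ipaddress.IPv4Address(a))
    y = int(ipaddress.IPv4Address(b))
    return 32 - (x ^ y).bit_length()


def rank_peers(requester: str, peers: List[str]) -> List[str]:
    """Peers sorted so that the ones 'closest' in address space come first."""
    return sorted(peers, key=lambda p: common_prefix_len(requester, p), reverse=True)


print(rank_peers("10.0.1.5", ["192.168.0.1", "10.0.2.7", "10.0.1.9"]))
# ['10.0.1.9', '10.0.2.7', '192.168.0.1']
```

A tracker doing this would hand out the head of the ranked list, so most transfers stay within the same rack or subnet, assuming the addressing plan actually mirrors the network topology.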
Honestly, most organizations don't have sophisticated enough networks that the benefits outweigh the complexity of p2p orchestration. This is why it's popular at Alibaba, Facebook, Twitter, but most people are still just using the OCI distribution protocol.
Feel free to contact me (Keybase is in my profile) if interested. I'd love to get more people on the path to p2p, but it's often a solution looking for a problem.
Interesting distribution of languages in what seems to be a somewhat self-contained project.
Oddly, https://github.com/alibaba/Dragonfly/tree/master/src only contains the getter and the supernode at the moment
The P2P slang is freaking everywhere here, though. There are P2P banks and P2P brand sausages.
They are different things with different names.
Side note: this also shows how simplistic Google search really is. There is no way to search for "Dragonfly /computers/" as opposed to "Dragonfly /nature/", with the terms in slashes denoting a concept or domain instead of a syntactic element.
Though I don't blame you for not knowing: out of 4-5 operator cheat sheets and guides, I only see mention of "-" to exclude terms.
- Easier to recognize
- More branding influence (see the g... stuff from Google)
- Easier to search
BTW: A windows edit called linux: not a bad idea!
(I still can't spell many of the Chinese companies' English names.)
The developer has since changed the name from Samba to Battlecry....