Nix × IPFS – Milestone 1 (blog.ipfs.io)
imhoguy 1304 days ago [-]
Open source plus distributed p2p file sharing is the killer combo. I don't get why public stuff like NPM, DEB or Docker registries hasn't switched to using it as the primary way of distribution.

P2P, such as IPFS and recently BitTorrent 2.0 with its per-file hash tree, is the only free (as in beer and speech) reliable way to host things online forever - at least as long as there is one last veteran who keeps a copy and seeds it.

solarkraft 1304 days ago [-]
> I don't get why public stuff like NPM, DEB or Docker registries haven't switched to use it as primary way of distribution

2 of them are run by commercial entities. I don't think those are all that interested in reducing dependency on them.

akerro 1304 days ago [-]
>2 of them are run by commercial entities. I don't think those are all that interested in reducing dependency on them.

Docker is also removing some old images and putting limits on existing images; it's too expensive for them to host.

m463 1302 days ago [-]
or it could be a strategy to move more people to a pay model.
na85 1304 days ago [-]
In a lot of the world, upload speeds are very limited and ISPs charge an arm and a leg if you exceed your upstream data caps.

On top of that, BitTorrent has a tendency to overwhelm routers and make them slow to a crawl.

Those are a lot of downsides to ask your users to accept just so you can cheap out on download servers.

takeda 1304 days ago [-]
Perhaps there are routers that don't handle large amounts of traffic well, but in my experience two things typically happen:

1. Fairness in TCP is done per TCP connection; bittorrent opens many connections, so they take over the whole bandwidth. You can limit the number of connections or throttle the bandwidth used by BT (of course you will get slower speeds; see the sketch below), and maybe there are some other ways.

2. When you maximize your throughput, ACKs might get dropped, which slows everything down (I suspect that might be the issue you're talking about). Again, you could throttle, or enable traffic shaping and give the highest priority to ACKs. This worked well for me.
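For point 1, with the python bindings for libtorrent - a sketch only; the exact setting names come from libtorrent's settings_pack (>= 1.1) and the values are just examples:

    import libtorrent as lt

    # Cap the number of peer connections and the upload rate so BitTorrent
    # stops starving other TCP flows on the same link.
    ses = lt.session()
    ses.apply_settings({
        'connections_limit': 60,           # fewer parallel TCP connections
        'upload_rate_limit': 512 * 1024,   # bytes/sec, leave headroom for ACKs
        'download_rate_limit': 0,          # 0 = unlimited
    })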

linsomniac 1304 days ago [-]
3. Bufferbloat. It's still a thing. dslreports has a bufferbloat score, one of my coworkers has been really impacted by it. http://www.dslreports.com/speedtest
takeda 1304 days ago [-]
I'd heard about bufferbloat, but I didn't think it was related to bittorrent maximizing throughput.
linsomniac 1304 days ago [-]
It depends: packet loss due to bittorrent is a problem, but low loss and high latency make it dramatically worse, and that is the bufferbloat case. If you have 1-30 second pings (or I've seen even worse) when doing torrents or other heavy outbound traffic, it falls apart.
Dagger2 1302 days ago [-]
Even 1 second is already pretty severe bufferbloat, let alone 30 seconds. Your latency should only increase by around 25-125ms even with upstream and downstream both running flat out.
imhoguy 1303 days ago [-]
But all of this can be adjusted with concepts like pinning, rate limiting, queuing, connection limits, even geo-fencing. I would gladly share a percentage of my server's unlimited transfer to give back the tiny range of Debian or DockerHub packages which I have downloaded and use. And I don't want to mirror entire distros, which is a condition of becoming an official distro mirror.
akerro 1304 days ago [-]
> In a lot of the world, upload speeds are very limited and ISPs charge an arm and a leg if you exceed your upstream data caps.

Not all 1st world countries have data caps. I pay ~$20 per month for 37 Mbps (15 Mbps upload) with no data caps (it's literally the cheapest plan I could find; if there was a cheaper one available, I would take it).

seniorivn 1303 days ago [-]
In Russia I have a ~$15 plan for 1 Gbps (~700 Mbps realistically) up/down. Unfortunately it's clear that ISPs are not going to be able to upgrade their hardware or keep prices low, due to government interventions, competition from government-controlled ISPs, and mandated DPI censorship (paid for by ISPs), so very soon the internet in Russia will be as shitty as in most western countries that lack competition.

P.S. Until 2009-2014 the Russian internet was completely out of the government's sight; an average internet user in a big city had a choice of 3-10 ISPs competing on price, etc.

Avamander 1304 days ago [-]
With LPD (local peer discovery), we could have two PCs upload to each other; it would even be beneficial to ISPs if everyone didn't have to download updates at a similar time from the furthest-away server. Especially in developing countries.
Avamander 1304 days ago [-]
I don't get why machines running the same Linux distribution couldn't actually do the same. BT2 would indeed work very well for this.
e12e 1304 days ago [-]
Nixos/guix might be a good fit for rootfs on ipfs? Comes with a local cache, and you could overlay local changes (eg: /etc/passwd)?
takeda 1304 days ago [-]
Actually not exactly rootfs on IPFS, but with NixOS it's fairly easy to make it wipe data on reboot; here's an example article[1]. The idea is to only keep the state we care about and rebuild the system on every boot.

[1] https://grahamc.com/blog/erase-your-darlings

shp0ngle 1304 days ago [-]
BitTorrent 2.0 is not recent, no.

The current BitTorrent company is interested in chasing ICOs; I don't think they work on new versions of the protocol anymore?

edit: yeah it’s from 2008

leppr 1304 days ago [-]
Going on a tangent but this article that came out today is a good read for anyone that didn't follow what happened to BitTorrent the company: https://www.theverge.com/21459906/bittorrent-tron-acquisitio...
capableweb 1304 days ago [-]
People are working on BitTorrent, yes! libtorrent's blog made a post quite recently (September 7th, 2020) - https://blog.libtorrent.org/2020/09/bittorrent-v2/
shp0ngle 1303 days ago [-]
That is libtorrent, unrelated to bittorrent, no? (organizationally)
capableweb 1302 days ago [-]
Yeah, maybe you were explicitly talking about the company BitTorrent Inc.

"work on new versions of the protocol anymore?" made me think of the BitTorrent protocol in general, sorry for the misunderstanding.

I think you're right that the BitTorrent company isn't directly involved in the protocol work; they seem to be focused on trying to push their plagiarized cryptocurrency project.

cpuguy83 1304 days ago [-]
Because changing things out is hard and the benefit of pulling in that much more code and complexity is dubious.

It seems like you could implement a p2p solution as a pull-through registry... as in, host a registry (even per node if you like) that Docker will use as a pull-through cache - it still speaks the same distribution protocol, but the registry uses p2p to get the content.

arianvanp 1304 days ago [-]
Does IPFS actually solve the problem they set out here though?

IPFS is a distributed CDN, but in my experience not very good for storing things persistently or reliably.

At the moment the NixOS cache is stored in durable and reliable S3 storage, with very high durability guarantees. Why is that not good enough?

Sure, it's centralised. But IPFS doesn't offer distributed durability; it offers a CDN. It doesn't seem (to me) to address the issue the authors claim it solves.

I still think it's _super cool_. And once builds are also content-addressed, and not just fixed-output derivations, having trustless content delivery is a valuable addition. But IPFS doesn't seem like a robust answer to the "what if we lose access to the source code" problem, given it's not a durable storage system.

In this case, projects like https://www.softwareheritage.org/ and https://sfconservancy.org/ seem like better bets to solve the source code access issue.

imhoguy 1304 days ago [-]
S3 durability works as long as somebody pays for it. When the bill is not paid, durability is 0.0000000 and the objects are gone forever.

With distributed storage like IPFS or BitTorrent, the availability of a resource is proportional to its popularity. So, as the article explains, initially there will be a group of seeders. I assume anyone who downloads and keeps a package on their system becomes part of the sharing swarm. This dramatically reduces the hosting burden on package creators, as the cost gets distributed among all available peers. As long as there is one peer who has the complete package, it will be available, even forever.

atoav 1304 days ago [-]
Also, not being subject to the whims of one huge megacorp makes a little bit of sense.
jFriedensreich 1304 days ago [-]
"Distributed CDN" sells ipfs a bit short. It has properties of a distributed CDN but usually people think of a CDN as not a good fit to store the primary assets, but ipfs can have pretty good persistence and reliability guarantees if setup accordingly.

Usually a project like NixOS would not use the simple standard IPFS daemon but something like cluster.ipfs.io, and I think I have also seen IPFS servers on top of MinIO or S3 for storing the data. So the durability and reliability of the actual data bits can be the same as or better than the S3 comparison.

A discussion like this is best split into the API and storage-system parts: S3 becomes the S3 REST API plus an S3 object store, and IPFS becomes the IPFS access protocol (and IPFS gateway REST API) plus whatever IPFS storage system a project chooses to use.

The cool thing about IPFS for package managers lies more in the protocol and architecture part than in the low-level data storage:

- Content-addressable storage as a core design principle solves, in a standard and documented way, a lot of problems that you would otherwise have to build manually on top of S3 if you wanted to build a CDN or package manager directly on it (a minimal sketch follows this list). I would argue that even without the p2p/distributed part of IPFS, it would be a great fit as a content-addressable storage system.

- "discovery": finding packages across different servers, this is such a game changer. Traditionally you would have 1 or 2 repositories configured and packages are fetched from those, if these servers do not have the right package or versions or are offline you are screwed (having to manually find alternative mirrors or finding copies of the package elsewhere). In ipfs, if your peers don't have a file, it can just ask the network for available "mirrors" and even users who have a copy and contribute to the network can provide you their copy automatically.

Taek 1304 days ago [-]
We built an alternative to IPFS called Skynet that attempts to solve a couple of the major issues with IPFS, the biggest one being data durability and uptime.

On Skynet, pinning doesn't mean hosting the file from your machine, it means paying a bunch of service providers to host the file for you. When you pin content to Skynet, you can turn off your computer 5 minutes later and the data will still be available globally. Just like IPFS, data is content-addressed and anyone can choose to re-pin the content.

Service providers are held to 95% uptime each, which means doing something like 10-of-30 erasure coding can get you 99.99% uptime on the file as a whole. The low uptime requirement for individual providers dramatically cuts costs and allows amateurs to be providers on the network. The erasure coding algorithms ensure data availability despite a relatively unreliable physical layer.
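The back-of-the-envelope math behind that, assuming independent provider failures (a simplification):

    from math import comb

    # The file is retrievable while at least k of the n erasure-coded
    # pieces are online; each provider is independently up with prob p.
    n, k, p = 30, 10, 0.95
    availability = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
    print(f"{availability:.12f}")  # comfortably above the quoted 99.99%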

The other problem Skynet set out to solve is performance. In our experience with IPFS, if you aren't talking directly to a node that's pinning the content, IPFS is very slow. We've heard stories of lookups taking greater than 10 minutes for data that isn't pinned on a major gateway.

Skynet uses a point-to-point protocol rather than a DHT, which not only makes it faster, it's also more robust to abuse. DHTs are pretty famously fragile to things like DDoS and active subversion, and Skynet has been designed to be robust and high performance even when malicious actors are trying to interrupt the network.

Other than that, we've tried to make sure Skynet has feature parity with IPFS: content-addressed data, support for building DAGs, support for running applications and static web pages. And then we've added a couple of elements of our own, such as APIs that allow applications and web pages to upload directly to Skynet from within the application.

https://siasky.net if you want to give it a try for yourself.

georgyo 1304 days ago [-]
Say what you will about IPFS and Protocol Labs, but I really admire that they made IPFS completely blockchain agnostic.

There is no requirement to buy or receive coins to use IPFS. And multiple different implementations of persistent stores can use the same IPFS network to ensure durability.

This means that FileCoin, other coins, static hosting providers, self-hosting can all coexist and strengthen the same network.

Every other distributed content-addressable file store seems to be interested in supporting only their own coin. For that reason alone, I find IPFS the most interesting.

Ericson2314 1304 days ago [-]
Yeah, the layering in Protocol Labs' work is really nice.

The tendency of funded / for-profit software is to eschew layering, so it's easy to try out but ultimately ill-fitting and inflexible. This is a core problem of capitalism with IT.

IPFS, libp2p, IPLD, Filecoin etc. all resist that temptation, and I think it will help them greatly in the end.

whyrusleeping 1304 days ago [-]
I really appreciate hearing things like this :) We put a ton of work into making the layering of our projects just right so that they can be reused over and over even if other things we're working on fail. It often does make it slower for us to make progress (could take soooo many shortcuts if we tried to collapse the stack) but I really do believe it's worth the effort.
brabel 1304 days ago [-]
This looks interesting. Why is the payment method based on a crypto coin? How does using a blockchain help make Skynet and Sia work?

I am wary of crypto coins as they tend to have wild swings in value over time as they are used primarily for speculation... can I be sure that if I upload something today worth $2/month, in 2 years I will still be paying this much or less?

hinkley 1304 days ago [-]
A common theme in distributed filesystem conversations is the idea of "socializing" the costs of intermittent loads. If I 'pay in' a little more than my mean traffic capacity, then my surplus during low traffic cancels out some of my peak traffic. Peak shaving and trough filling.

If you are doing business in India, you get paid in rupees. If the workers are in India, you just pay them in rupees. If you have to exchange currencies you end up with several types of friction that just create headaches and potential losses. If you periodically cash out or inject cash, it's easier to deal with than doing so on every transaction.

Denominating file replication services in a "coin of the realm" just seems like the same sort of rationale.

One of the problems with capacity planning is that you get punished for being wrong in either direction. You bought too much hardware or not enough, too soon or too late. With an IPFS or a Skynet, putting your hardware online two months before you need the capacity at least affords you some opportunity to make use of the hardware while your Development or PR team figures out how to cross the finish line.

kordlessagain 1304 days ago [-]
> Why is the payment method based on a crypto coin?

A blockchain gives you a way to make a payment and provides identity/encryption functions to keep the resource private while it is active.

Yes, one could create a system that attaches other authentication (user/pass or OAuth), but then one has to create/connect a payment system which uses that login information in conjunction with credit card information. To sell a product online taking credit cards requires in excess of 10 pieces of information that must be provided by the user.

In the case of compute resources, I may want to deal with 100s of providers to host my resources (blog, images, video, code) and I'd need to use an intermediary if I wanted to be efficient about it.

With something like Lightning payments, a system like this can provide resources without the need for a "signup" process or intermediary.

> can I be sure that if I upload something today worth $2/month, in 2 years I will still be paying this much or less?

What does the future value of a fiat currency have to do with the current rate of storage on something like AWS? Would those prices not go up if the currency was undergoing inflation? Would you not have paid fewer units before the inflation, and more units after? Where does something like AWS provide cost protection, other than spot instances?

If you create a crypto contract and put the funds in escrow, then there are ZERO ways for the cost to go up over the life of the contract. Other than a bad actor scenario, which is why having multiple providers is the way to go.

viraptor 1303 days ago [-]
> I may want to deal with 100s of providers to host my resources (blog, images, video, code) and I'd need to use an intermediary if I wanted to be efficient about it

Why multiple providers? With no-features data storage, S3 can easily provide that without an intermediary and in a pretty well automated way.

> Would those prices not go up if the currency was undergoing deflation?

Unless you're in a country with very unstable currency, the exchange rate change will be very slight. Sia changed ~60x over the last 2 years, which is significant.

The price of course won't change during the contract, but what happens after is not trivial to plan for.

Taek 1304 days ago [-]
Sia is a decentralization-first protocol. We believe strongly that the biggest advantage is immunity to de-platforming and a commitment to an uncompromisingly open protocol.

Crypto is the only means of payment I am aware of that does not have a centralized middleman with the power to deny a transaction.

Decentralization aside, there are efficiency gains as well. Every transaction on the Sia network is point-to-point, and in some cases we've had nodes that average more than 1 million discrete transactions per day for over a month. The total cost of doing that was something like $100 (including the cost of all the resources bought with those millions of transactions); I struggle to imagine a traditional payment system providing that kind of value.

There's also a lot more flexibility to innovate. For example, every single one of our payments is accompanied by a cryptographic proof that the accompanying storage or computation action (not many people know this, but the Sia network does support a limited form of computation) was completed correctly. The payment and computation are fundamentally/cryptographically tied together in a way that we could not reasonably achieve on a traditional payment system.

vin047 1304 days ago [-]
Interesting project, though to be clear, you're using the Sia network and not IPFS right?

Also do you have Swift bindings for use in an iOS app?

Taek 1304 days ago [-]
Yes Skynet leverages the Sia network for storage and retrieval.

We don't have swift bindings but other developers have been able to make iOS apps without much trouble - the API is pretty clean.

Example: https://github.com/tetek/skynet-ios

the_duke 1304 days ago [-]
I imagine you would ensure the persistence of the data you care about by either running your own IPFS nodes that pin the data, or by using a pinning service like Pinata [1].

[1] https://pinata.cloud/

TheOtherWW 1304 days ago [-]
The problem with Pinata is that they charge $150 per TB for their Individual plan. That's nearly 6.5x the cost of storing the data on S3. Sure it works, but that high barrier to entry pricing scares off a lot of people. Why not just use S3?

Meanwhile, we're still waiting for Filecoin to launch, and networks such as Sia have seized that opportunity and created great things like Skynet [1]. Skynet itself still has some overhead if you want to ensure data persistence and availability, but the cost is orders of magnitude lower. In addition, new layer 2 providers have emerged to address those gaps, such as Filebase [2]. They provide S3-compatible object storage that is backed by decentralized cloud storage. You get high availability (of the storage layer), geo-redundancy, and lower-than-S3 pricing out of the box.

It is this type of offering where we are going to see the most impact and adoption as the underlying technology not only makes things more efficient, but cheaper too.

[1] https://siasky.net/

[2] https://filebase.com/

akerro 1304 days ago [-]
I started moving the images of my wordpress blog to IPFS using the 3 most popular gateways. I'm moving slowly, image by image, but so far it has been quite a success. Since images expire from the gateways, I set up a super simple and cheap IPFS node - an unused raspberry pi! My main IPFS node is an RPi 0W (the wireless one). Overall it dropped my main page loading times, and it costs £5 (rpi) + £5 for a 32GB SD card. The first images - the smallest and most often loaded few - were migrated 7 months ago.

https://github.com/claudiobizzotto/ipfs-rpi
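For anyone curious, adding and pinning an image on a node like this looks roughly like the following - a sketch assuming the py-ipfs-http-client library and a local daemon on the default API port (not my exact setup, and the image path is just an example):

    import ipfshttpclient

    # Add an image to the local IPFS daemon and pin it, so the Pi keeps
    # serving it after public gateways drop it from their caches.
    with ipfshttpclient.connect('/ip4/127.0.0.1/tcp/5001') as client:
        res = client.add('images/header.png')   # hypothetical path
        cid = res['Hash']
        client.pin.add(cid)
        print(f"https://ipfs.io/ipfs/{cid}")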

Shared404 1304 days ago [-]
I don't know what your software config on the pi is, but if it's writing to the SD card a lot it'll wear out and fail pretty quickly.

Just a heads up from someone who's been bitten.

akerro 1304 days ago [-]
I hear this in every discussion, yet I'm still waiting for any of my SD cards to fail. I've had the same rpi0 for 3 years as CCTV with motiond (motion detector): recording all activity from my window, filling the card to 100% and removing all recorded data every 2 days - no errors, no failure. I no longer need that CCTV, so I'm using the rpi, with the same SD card, as an IPFS node.
bityard 1304 days ago [-]
IME, the people who complain about SD cards failing on the Raspberry Pi bought cheap cards that were intended for bulk storage, not frequent writes or running an OS. Regular SD cards are not SSDs.

On my network of random Pis and other stuff, I use only high-endurance SD cards, which can withstand lots of writes and have durability much closer to SSDs.

3np 1304 days ago [-]
Here's a data point from someone who only bought the supposedly most recommended ones and had several failures before switching to USB. Maybe roughly an average 10-20% failure per year of uptime for something that does heavy log writing?

A thing people tend to forget is temperature. Keep them cool and they have higher chances of surviving longer.

akerro 1304 days ago [-]
Yes, that's correct. I only buy original Samsung or SanDisk SD cards that come with a 5+ year warranty, and I keep the original packaging anyway.
Shared404 1304 days ago [-]
If it works it works :)

I just had it happen so I figured I'd bring it up just in case.

jacobush 1304 days ago [-]
Use an external USB SSD (what I did for a mail server). One could also be very aggressive with making the kernel lazy about flushing to the SD card. (It has downsides, but if you put a UPS shield on, it's a pretty neat hack.)
literallycancer 1304 days ago [-]
How were the images delivered before?
akerro 1304 days ago [-]
Hosted in my wordpress install on a dedicated OVH server.
Ericson2314 1304 days ago [-]
Author here!

I absolutely agree it's hard to go head-to-head with s3 in the short term --- this is why I am most excited about sharing sources. Once the ecosystem is bootstrapped, it will make more sense to use IPFS for binaries too. (e.g. if you wanted to build some fancy multiple build farms and reputation system.)

> https://www.softwareheritage.org/

I do really want to work with them!

- IPFS as CDN means software heritage can be "seeder of last resort"

- Original authors uploading to CDN, using IPNS or similar for git tags/versions, should make it easier for software heritage to archive the code in the first place.

canndrew2020 1304 days ago [-]
> Does IPFS actually solve the problem they set out here though?

No, none of these distributed P2P networks (that I've seen) do. The problem isn't just building a DHT (Kademlia-based networks have existed for years); the problem is incentivizing people to seed - ideally people with high bandwidth and massive amounts of storage who are seeding data that people want. In other words, you need to build an economy on top of the network.

Cryptocurrency could be used for this, so long as it's a cryptocurrency that supports instant micro-transactions (i.e. you don't want to be writing to a blockchain every time you download a 10KB gif). So maybe someone will get around to building an IPFS clone on top of bitcoin's lightning network or something like that. Clients would need a way to decide which peers to send requests to based on who's offering the best speed/reliability vs price. Servers would serve higher-paying requests with higher priority and would drop any requests that are too stingy to even cover bandwidth costs. Using the network wouldn't be free, but it would be extremely cheap and fast and reliable.

One problem this doesn't solve is getting other people to backup/seed your data for you. An idea I think would work for that would be a prediction-market-based reputation system for peers acting as storage hosts. That is, peers could advertise themselves as storage hosts, you could upload your data to them (for a fee), and they'd give you a cryptographic receipt promising that they'll still be able to deliver the data at some later date. People could then make bets on whether a host will fail to uphold any promises before a certain date, and the betting odds would be a measure of a host's reliability. Clients uploading their data to the network would take that reliability measure and the price into account when choosing hosts to upload to. At any point a client could publicly challenge a host to provide proof that they still have the data, and if the host fails to provide proof the bets would close in favor of the punters who bet against them. Otherwise, once the bets expire, they close the other way. This would all need to be built on top of a blockchain, though; you couldn't use bitcoin for this until/unless they add support for covenants or sidechains.
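A toy version of that challenge step (a salted spot check; real systems like Sia use Merkle-tree storage proofs, so treat this as illustrative only):

    import hashlib, os, random

    # Before uploading, the client precomputes secret challenges. Later it
    # reveals a salt and an offset; only a host that still holds those
    # bytes can answer with the matching digest.
    def make_challenge(data: bytes):
        offset = random.randrange(0, len(data) - 64)
        salt = os.urandom(16)
        expected = hashlib.sha256(salt + data[offset:offset + 64]).hexdigest()
        return offset, salt, expected

    def host_answer(stored: bytes, offset: int, salt: bytes) -> str:
        return hashlib.sha256(salt + stored[offset:offset + 64]).hexdigest()

    data = os.urandom(4096)
    offset, salt, expected = make_challenge(data)
    assert host_answer(data, offset, salt) == expected               # host kept the data
    assert host_answer(os.urandom(4096), offset, salt) != expected   # host lost/faked it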

vin047 1304 days ago [-]
Protocol Labs (the guys behind IPFS) have been working on https://filecoin.io to address this exact concern – incentivising people via micropayments in their cryptocurrency (filecoin) to pin and seed files.
Ericson2314 1304 days ago [-]
Yes it's really good to have the economics (mechanism design) and infrastructure in separate layers.

Also, I'd argue that more important than even having the hosters is having the content addresses. We need well-known immutable data for people to want in the first place. And traditional systems bury data under so much mutation/indirection that it's hard to know what that content is, or that content-addressing even exists.

I highly recommend https://www.softwareheritage.org/2020/07/09/intrinsic-vs-ext..., which is about software heritage trying to get the word out to the larger library/archival/standardization community that content addressing and other "intrinsic" identifiers are possible and desirable.

Git and torrents, I think, are the best counterexamples to the above, and there is probably more legally-kosher git and bittorrent usage, so I am especially bullish on Git hashing being the bridge to a more distributed/federated world.
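To make "Git hashing" concrete: a blob's ID is just a hash over a small header plus the file's bytes, so anyone holding a copy can recompute and verify it:

    import hashlib

    def git_blob_hash(contents: bytes) -> str:
        # Same ID that `git hash-object` prints: sha1("blob <size>\0" + contents)
        header = f"blob {len(contents)}\0".encode()
        return hashlib.sha1(header + contents).hexdigest()

    print(git_blob_hash(b"hello world\n"))
    # -> 3b18e512dba79e4c8300dd08aeb37f8e728b8dad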

jononor 1304 days ago [-]
As an org supporting a curated content set, one could implement IPFS backed by S3 persistence (or some other cloud bucket), possibly as an "open frontend" part of one's CDN infrastructure, where others can also freely "peer" and contribute CDN power. The benefits for package management, I believe, can be pretty good, given widespread deployment. At our office or hackerspace there are many computers which are likely to have the packages. Though until these things are enabled by default (can we ever get there?), I suspect only large IT departments or very interested people will set it up, unfortunately.
Proven 1304 days ago [-]
> sure it's centralised.

If I can access a copy - either on AWS S3 or anywhere else - and make a verbatim copy or cache it for myself, I wouldn't call that centralized.

Having one of publicly accessible copies on AWS S3 doesn't make the data centralized.

ingenieroariel 1304 days ago [-]
This is a big deal, congratulations on the milestone. Really looking forward to trying out Reflex FRP / Obelisk from IPFS, in particular the focus on making the source available because fetching Nix packages from semi-public caches and not being able to rebuild them is something that has bothered me for a while.
nmfisher 1304 days ago [-]
Interesting use case for IPFS, which has often felt like an (admittedly cool) solution in search of a problem.

Can someone enlighten me as to real-world examples where actually reproducible builds are critical?

arianvanp 1304 days ago [-]
For one, they allow distributed incremental builds, improving your developer productivity. It doesn't matter anymore where an object file is compiled. You just need to know its hash, ask if it exists, and otherwise compile it yourself.

One problem with NixOS currently is that certain dependencies in our package tree are very painful to change. If we touch glibc, we need to recompile 60,000 packages, and it takes A LOT of compute power to do that.

With reproducible, content-addressed builds we can do things like early-cutoff optimisations, making these changes less painful. Simple example: if somebody just changed a source code comment in glibc, then we get the same build artifact and can skip building the 60,000 packages.

If you do this at the source code level instead of the package level (like Bazel), then somebody can change how glibc does domain resolution, but packages that don't depend on that don't need to be recompiled either.
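A toy sketch of the early-cutoff idea (not Nix's actual machinery): each step is keyed by the content hashes of its inputs, so a comment-only change that produces a byte-identical artifact gives every downstream step a cache hit.

    import hashlib

    cache = {}  # (step, input hashes) -> content hash of the output

    def h(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def build(step: str, inputs: list, compile_fn) -> str:
        key = h(step.encode() + b"".join(h(i).encode() for i in inputs))
        if key not in cache:
            print("rebuilding", step)
            cache[key] = h(compile_fn(inputs))
        return cache[key]  # content address of the output

    strip_comments = lambda blobs: b"".join(b.split(b"#")[0] for b in blobs)
    link = lambda blobs: b"".join(blobs)

    glibc1 = build("glibc", [b"code #old comment"], strip_comments)
    build("app", [glibc1.encode()], link)      # rebuilt the first time

    glibc2 = build("glibc", [b"code #new comment"], strip_comments)
    assert glibc2 == glibc1                    # same artifact hash...
    build("app", [glibc2.encode()], link)      # ...so "app" is a cache hit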

teacpde 1304 days ago [-]
I understand why changing comments doesn't matter, but how do you decide which dependencies don't need to be rebuilt when a certain part of the source code is changed?
Ericson2314 1304 days ago [-]
Nix does this simply and stupidly --- if you can observe it, it's part of the cache key.

This is wonderful because it's totally sound. All efficiency methods (basically, hiding things, as the other comment says) are left to the build plan itself.

Switching to Nix for batch jobs is like switching from DOS to an OS with actual process isolation. A complete paradigm shift.

sterlind 1304 days ago [-]
It does still require trust, though. You can lie about the contents since the hash only covers the build inputs, so someone could break the sandboxing.

Of course, since builds are reproducible you can catch this if you build it yourself and compare the actual content-hashes.

Ericson2314 1304 days ago [-]
I would say the trust issues with build remotes and substitutors are fairly independent of the caching policy?

The caching model is about when the drv changes ((drv, output) is the cache key, basically); the trust model is about whether one can download something, avoiding the build, to fill in the value for the new cache key.

Note also that with the new floating content-addressed derivations, the trust stuff is much easier to think about.

kevincox 1304 days ago [-]
The most "proper" way to do this is splitting the result into interface + implementation. For example, if you are dynamically linking to a library, instead of being given access to the entire library you are given a simplified "signature" which has the minimal amount of info required for linking. This interface would change when you add or remove a function, but not when a function body changes.

The example was given for linking but this can be done for many steps of the build process.
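A toy illustration of that split (hypothetical "signature extraction", not what any real linker does): dependents hash only the interface, so a body-only change leaves their cache keys untouched.

    import hashlib, re

    def interface_hash(source: str) -> str:
        # Crude interface extraction: keep only the `def ...:` lines.
        sigs = sorted(re.findall(r"^def .*$", source, flags=re.MULTILINE))
        return hashlib.sha256("\n".join(sigs).encode()).hexdigest()

    v1 = "def resolve(name: str) -> str:\n    return dns_lookup(name)\n"
    v2 = "def resolve(name: str) -> str:\n    return cached_dns_lookup(name)\n"
    v3 = "def resolve(name: str, timeout: int) -> str:\n    return dns_lookup(name)\n"

    assert interface_hash(v1) == interface_hash(v2)  # body changed: dependents untouched
    assert interface_hash(v1) != interface_hash(v3)  # signature changed: rebuild dependents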

Taek 1304 days ago [-]
Aren't reproducible builds pretty well established as being important for security? How do you know that the developers are providing you with the binaries that they claim?

Most people do not compile their own software; they use signed binary distributions given to them. If you have reproducible builds, third parties can more easily verify independently that the code being distributed is the same as the code published in the open source repositories.

This doesn't just keep dev teams honest, it also provides defenses against situations like those where a hacker gains control of the website being used for distribution. I believe reproducible builds have helped to catch this real life issue at least once in the cryptocurrency community (with Monero).

To me, that's enough to justify their existence.

chpatrick 1304 days ago [-]
Our team uses Nix. It lets us share a binary cache of modified/own packages between the developers, CI, and our production machines.

If you push something to CI that needs some new dependencies, the CI machine will build them and when you want to get a local dev environment or deploy, the binaries will be fetched from the CI machine. We can also send the binaries between machines to save compile time. This is possible because we have confidence that it doesn't matter which machine built them because the environment is so strictly controlled.

zelly 1304 days ago [-]
There are many more real-world use cases for reproducible builds than for IPFS. They save time and money. It's not just academic.

Reproducible builds (as in bit-for-bit) mean I can trust a (secure) checksum of a binary I downloaded somewhere, let's say from IPFS. This is defense in depth, adding another layer of security to prevent sabotage.

Reproducibility means the whole system can reliably be built from source and I don't need to manually make sure the system has the correct constellation of implicit dependencies and their version numbers. This saves time and money in deployment and will pay off in a very big way 10 years down the line when that code is still in use somewhere.

Foxboron 1304 days ago [-]
Confusingly enough, when NixOS talks about reproducible builds they are not talking about https://reproducible-builds.org/. They are talking about reproducible systems.

NixOS doesn't solve reproducible builds as in "bit-for-bit identical binary distributed files", but it does solve "same input" -> "functionally same system". This is the core idea of NixOS and what makes the package manager and the spawned ecosystem (like Guix) quite neat. I also think this isn't any more controversial than, say, having your infrastructure reproducible with terraform and ansible/salt. It gives you machines as cattle, not pets, with the added flexibility that brings. NixOS is just a complete package, where in other systems there are several components accomplishing the goal.

As to why reproducible builds are important: they ensure a strong connection between the upstream source code and the distributed binary build. This gives you confidence that everything going into the build has been declared and that the binary can be reproduced bit-for-bit identical if you so wish.

This can be important in a supply-chain process. I recently saw that tailscale provided pre-compiled binaries without signatures. What do those binaries contain? They are not signed, so the releases could very well be unauthenticated and someone could have compromised the server distributing the binaries. So I tried reproducing the tailscale binaries, without much success, as the build process is proprietary.

https://github.com/tailscale/tailscale/issues/779

So what do the tailscale binaries contain? I can't get a bit-for-bit identical binary without putting in a great deal of effort, so for all intents and purposes they might as well be proprietary.

Now I don't think the tailscale people are malicious, nor that the binaries have been replaced. But it's a real-world example I recently went through.

solarkraft 1304 days ago [-]
Even when the binaries are signed, you'd still have to trust that Tailscale didn't modify the code before compilation, maybe putting in a simple back door. They probably don't, but you can't really know.

With binary-reproducible builds, such a thing would immediately be obvious, and hell would break loose if they attempted it.

Ericson2314 1304 days ago [-]
The floating content-addressed derivations we've worked on as part of this Nix × IPFS work get us a lot closer to reproducible builds by making non-deterministic builds far more noticeable.

See these:

https://github.com/NixOS/rfcs/pull/62

https://github.com/NixOS/nix/issues/4087

Foxboron 1304 days ago [-]
Making it noticeable isn't going to get you closer to reproducible builds though. There is still quite a lot of effort that needs to go into patching individual packages and fixing the non-determinism introduced by the compilation process, which NixOS can't solve by adding tricks.

And as an outsider I'm unable to read the RFC and understand what is going on and what tangible issues it solves for Nix.

Ericson2314 1304 days ago [-]
The Nix approach is that any package-specific tricks are left to the plan writer, and Nix's job is to sandbox things. We ought to lock things down further and further until build jobs can be deterministic by construction.

Until that ideal is reached, you could call "leaving it to the plan writer" a cop-out, but what else is there to do? It's the stop-gap.

The benefit of the new system is that all store paths can be content-addressed. That means the inputs of all build steps can be content-addressed, whereas before they might also be "input-addressed" paths whose contents are also a matter of trust. This means one takes "small step" rather than "big step" trust steps, and also that upstream non-determinism won't "pollute" downstream builds.
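A rough toy illustration of the difference (my own sketch, not Nix's store-path algorithm):

    import hashlib

    def sha(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()[:12]

    derivation = b"build glibc-2.32 with gcc 10, flags=-O2"  # the recipe
    run1 = b"ELF ... build-id 111"  # pretend the build is non-deterministic
    run2 = b"ELF ... build-id 222"

    # Input addressing: one path for both runs; whoever fills the cache decides
    # which bytes sit behind it, so consumers must trust the substituter.
    input_addressed = f"/store/{sha(derivation)}-glibc"

    # Content addressing: the path commits to the produced bytes, so any
    # non-determinism is immediately visible and verifiable downstream.
    content_addressed_1 = f"/store/{sha(run1)}-glibc"
    content_addressed_2 = f"/store/{sha(run2)}-glibc"

    print(input_addressed)
    print(content_addressed_1, content_addressed_2)  # differ between runs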

adsjhdashkj 1304 days ago [-]
Tangent: what's the best way to use Nix-like features in a mainstream distro? E.g., I've been evaluating PopOS recently because I want a Mac-equivalent OS, i.e. I don't want driver/config issues.

However, one problem I never see solved in any "normal" distro is reproducible systems. Hypothetically, if I used Nix as a desktop my config would be bulletproof... but then I'm going through a fair amount of work configuring everything when, as established, I want none of it.

So I (as a user) seem to want some middle ground between no-effort installation/configuration of my desktop and reproducible snapshots/states/configs/something.

You can use Nixpkgs on other distros/OSs, so maybe that is enough. But is there a better way?

MarcScott 1304 days ago [-]
I just bit the bullet and use NixOS as my Desktop. There's a bit of work after initial install, to setup your config files the way you want, but after that you're all set. I've had no problems transferring my configs when I change hardware, and end up with an identical environment that I had on my previous computer.
kevincox 1304 days ago [-]
I recently switched as well and am very happy. I don't think that re-configuring takes much longer than the average distro would, especially because for a lot of services you can just say `services.foobar.enable = true` and get a decent default config. You only need to spend your time configuring the things that you are picky about.

And yes, the fact that I can configure all of my systems with a shared set of configs and have consistent and reproducible environments is fantastic.

I wrote up my thoughts about the switch here: https://kevincox.ca/2020/09/06/switching-to-desktop-nixos/

adsjhdashkj 1302 days ago [-]
Side note: I'm taking the plunge into NixOS, but notably using Flakes.

While I've dabbled in Nix in the past, some things felt more odd to me than perhaps they should have. Primarily the fact that it felt like all these configs were spread out and I didn't understand how Nix wanted me to version them.

Flakes (from early tests) seem to make this very clear, as everything starts from a repo, so I'm attempting to make all my configs, including Home Manager (which also feels weird, lol), start from the Flakes installation.

Flakes also solve the reproducibility problem that I never understood why NixOS had in the first place. So far it's really neat.

wesnel 1304 days ago [-]
Personally, I use NixOS in conjunction with Home Manager[0] to more conveniently use Nix to manage my user-level config. The configuration that I manage with Home Manager includes my Emacs config, my Git config, and my Bash config. Additionally, almost all of the other programs I use on a daily basis are at least installed through it. Since it’s primarily only used for managing some packages, the configuration process was simple compared to setting up NixOS for me.

I would imagine that you could use Home Manager on a non-NixOS system to at least create reproducible configs for the programs you use, although the OS as a whole would of course still be non-reproducible. However, I do not know how well Home Manager works on non-NixOS systems.

As you mention, just using Nix itself can be sufficient to get a reproducible set of packages on your system. I recall reading a blog post about someone who does this on both Ubuntu and MacOS.[1] The way this person does it is interesting because it’s more sophisticated than spawning the occasional ‘nix-shell’ or something. For example, they get the benefit of Nix “generations,” with a new generation being created each time they modify their declarative config files.

[0] https://github.com/nix-community/home-manager [1] https://www.nmattia.com/posts/2018-03-21-nix-reproducible-se...

sterlind 1304 days ago [-]
I've found home-manager to be more trouble than it's worth for most things:

* Config files aren't always stored in text (e.g. KDE Plasma, Gnome)

* It's all-or-nothing usually, unless you can find a way to make the home-manager config your "base"

* Changing settings isn't integrated, and requires editing/rebuilding/reopening (e.g. I can't just hit Ctrl+ to increase font size in VS Code, I have to edit a config file, run home-manager switch, then usually reopen)

* Plugins, especially things like extension stores, are a hack and a half. You have to hunt for hashes, then change them every time your old version falls off the CDN.

It's a nice idea, in theory, but it only works in a vacuum.

yjftsjthsd-h 1304 days ago [-]
I've been using ansible to configure my laptops for years now; it automatically installs most of the packages that I'm going to want, clones things like my dotfiles into my home directory, etc.

Edit: Oh, and obviously I keep the playbooks+roles in version control.

Ericson2314 1304 days ago [-]
Just use NixOS. Our defaults are more like Debian/Fedora than Arch/Gentoo. You're not going to be struggling to hunt down drivers.
iElectric2 1304 days ago [-]
See also https://cachix.org for a hosted binary cache.
alexmingoia 1304 days ago [-]
One thing I've always wondered is how DHTs like BitTorrent's and IPFS's prevent Sybil attacks.

If anyone wanted to block a file in IPFS, could they generate node ids close to the filehash and return empty peer lists?
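For context: Kademlia-style DHTs route a lookup to the peers whose IDs are closest to the key under XOR distance, so IDs minted near a file's hash would sit on its lookup path. A toy sketch of the metric (illustrative only, not IPFS's exact routing):

    import hashlib

    def node_id(seed: bytes) -> int:
        return int.from_bytes(hashlib.sha256(seed).digest(), "big")

    key = node_id(b"some-file-or-provider-record")          # the content's DHT key
    peers = [node_id(f"peer-{i}".encode()) for i in range(1000)]
    closest = sorted(peers, key=lambda p: p ^ key)[:20]     # XOR distance
    # A lookup for `key` is answered by roughly these peers; Sybil IDs generated
    # to land in this set could answer with empty provider lists.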

whyrusleeping 1304 days ago [-]
Yeah, Sybil attacks are pretty hard to get around. IPFS itself doesn't entirely depend on a DHT, though; it uses it when it can, but also falls back to a more gossip-style approach of just asking peers you connect to for the data. So if you wait long enough, and the content is out there, you're very likely to end up finding it as your node makes and receives random connections throughout the network. This does still mean that the Sybil attacker can severely degrade the quality of service, though, so other solutions are getting looked into. The 'easiest' one that comes to mind is forming an incentivized DHT of some kind, with the simplest start to that being a requirement that all DHT server peers stake some funds. There are a lot of weird things in this problem space, and it's still pretty young IMO.
ArchD 1304 days ago [-]
IPFS is not censorship resistant. If people start using it to subvert tyranny, politically neutral things like Nix's use of IPFS can get collaterally damaged in their infancy by state power.
jFriedensreich 1304 days ago [-]
It's slightly more censorship resistant than S3, but yes, that seems not to have been on the list of problems they're trying to solve. On the other hand, the list of systems that solve these problems in a half-decent, performant manner and are also censorship resistant is empty, to my knowledge.
PureParadigm 1304 days ago [-]
I think censorship resistance should be handled at a different abstraction layer. IPFS can focus on peer-to-peer immutable file sharing and some other tool can tunnel your traffic to avoid censorship. You could run IPFS over a VPN or something like that.
vander_elst 1304 days ago [-]
Does anyone know if there are authz features planned for IPFS? Like having ACLs based on certificates or something...
e12e 1304 days ago [-]
As far as I'm aware, the "official" IPFS line is: IPFS hosts and shares data - if you want to protect it, encrypt it.
majewsky 1304 days ago [-]
Is it down for everyone or just me? Firefox is showing PR_CONNECT_ABORTED_ERROR when trying to navigate to the link.
Shared404 1304 days ago [-]
It works for me, but I'll drop this just to be safe:

https://outline.com/vWwGas
