What Colour are your bits? (2004) (ansuz.sooke.bc.ca)
dang 1264 days ago [-]
teach 1264 days ago [-]
And I submitted it in 2017 but it didn't get any traction. Glad to see it again!
jchw 1264 days ago [-]
This is actually very helpful for understanding why people who don’t think in terms of the logic that software works by don’t see eye to eye with us on these issues. I would never have imagined that nuance about copyright would somehow weave back around into opinions about obscenity law, and yet this is about as elegant an explanation of it as any.

This is my first time reading this, but I suspect it has become a new part of my mental modeling of the world for years to come.

not2b 1264 days ago [-]
I think it would be better to think in terms of information theory, rather than the hypothetical color of bits. If I have a legally obtained copy of a copyrighted song (that is not permissively licensed), then I am restricted in my right to share copies. Certainly I can create a one-time pad and communicate that to someone, and I can XOR the bits of the song recording with that one-time pad, and the author would argue that I now have colorless bits. But if I send it to you along with instructions for how to decrypt it, I'm using a mechanism to communicate a perfect copy. It isn't that the ciphertext is now "colored", it's that the net effect of my mechanism is that I've communicated the original song, with perfect fidelity. That's what matters: the net effect of the system as a whole. I will have created a communication channel (perfectly legal), and used that channel to share a copyrighted work without the creator's permission (not legal, depending on the details).
czzr 1264 days ago [-]
> I can XOR the bits of the song recording with that one-time pad, and the author would argue that I now have colorless bits.

No, the author would say the opposite. That’s the whole point - the process by which you get to the bits matters.

not2b 1264 days ago [-]
OK, you are right, he is using "color" in a funny way to reflect the idea that the violation is somehow in those bits. But in my opinion, the violation is in the channel as a whole (the ciphertext, the one-time pad, and the instructions on how to decrypt, shared for the purpose of distributing copies), not because of "colored bits".
jerf 1264 days ago [-]
Your opinion is an engineer's or computer scientist's opinion. I am not criticizing when I say that; I share it with you. But the opinion of most of the rest of the world is not with us, and this is a good essay explaining that opinion. It is important to understand it if you want to understand the world, make correct predictions about how most people will operate in these matters, or figure out how to best change people's minds. (I can definitely speak from experience that the direct approach is not very effective. Can't tell you what is, unfortunately...)
joshAg 1263 days ago [-]
on the one hand, yeah it's funny and arbitrary. On the other hand, it's how at least the US legal system understands, litigates, and enforces IP laws
harshreality 1264 days ago [-]
The moment you xor your one time pad (OTP) with the song to generate a ciphertext C, both OTP and C become, you could say, conditionally-colored.

You're not restricted in redistributing one, so long as you don't redistribute the other. If you redistribute both, even in different channels, logically that's a copyright violation unless you have sufficient controls or restrictions to prevent someone downstream from ending up with both and being able to recombine them.

Realistically, the only reason you'd generate C and distribute both OTP and C is so that someone eventually gets them both and can reconstruct the song. Trying to claim you didn't have that intent wouldn't work in a civil context, and might not even work in a criminal context.
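
A minimal Python sketch of the scheme being discussed, for anyone who wants to see it concretely. The `song` bytes are a hypothetical placeholder, and `xor_bytes` is a helper name made up for this illustration. It shows that the pad and the ciphertext are each just random-looking bytes on their own, and that whoever holds both recovers the original exactly:

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Placeholder standing in for the bits of a copyrighted recording.
song = b"stand-in bytes for a copyrighted recording"

otp = secrets.token_bytes(len(song))  # one-time pad: uniformly random bytes
c = xor_bytes(song, otp)              # ciphertext C

# Either file alone carries no information about the song,
# but anyone holding both gets a perfect copy back.
recovered = xor_bytes(c, otp)
```

Note the symmetry: `xor_bytes(c, song)` gives back `otp`, so nothing in the bytes themselves says which of the two files is "the key" and which is "the ciphertext".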

aftbit 1264 days ago [-]
Could we add [2004] please? This is a great article, always glad to see it pop up again!
wizzwizz4 1264 days ago [-]
Colour does exist. It's the Second Law of Thermodynamics: information cannot be created, only lost, and creating a faithful copy of document A requires you to discard an equivalent amount of information (usually in the form of “there's usable energy over here”, in a very lossy process, but physics doesn't require this) – which you can then recover by comparing your copy A with the original document A, and discarding one.

You're not going to get the Complete Works of Shakespeare by any means other than by copying an existing copy of the Complete Works of Shakespeare; the odds are astronomical. You might get a description of calculus by other means (expending the effort yourself to produce it), but it's not very likely you'd come up with it before somebody else made such a discovery public, unless:

• it's recently become obvious that such a thing would be useful; and

• it's easy enough to discover that you could figure it out in a year or two; and

• enough people were looking in that direction at the time (or one of them was a secretive sort of person).

That's not to say that authorial monopolies are necessarily a good thing. But they are a meaningful concept.

youzicha 1263 days ago [-]
The blog post writes:

> Suppose you publish an article that happens to contain a sentence identical to one from this article, like "The law sees Colour." That's just four words, all of them common, and it might well occur by random chance. Maybe you were thinking about similar ideas to mine and happened to put the words together in a similar way. If so, fine. But maybe you wrote "your" article by cutting and pasting from "mine" - in that case, the words have the Colour that obligates you to follow quotation procedures and worry about "derivative work" status under copyright law and so on.

There was a real court case in 2012 which I think is interesting because it's very similar to this example. A photographer was accused of "copying" the concept of taking a photo of a red bus in front of a grey Houses of Parliament. He defended himself by saying that those ideas are very common and should not be copyrightable---but failed:

https://youzicha.tumblr.com/post/162846191544/what-colour-ar...

goto11 1263 days ago [-]
If I understand the verdict correctly it says that it doesn't matter that the visual idea was trivial and that many other people have come up with the same idea independently. What matters is that this particular photographer deliberately wanted to copy a known image.

A very good example of "color", since the exact same photograph (same bits) would be non-infringing if the photographer had got the idea independently.

woliveirajr 1263 days ago [-]
I just read this about 5 years after it was published.

Having worked with computers my whole life and graduated in Law, I always had the problem of not understanding how my colleagues could think that the bits from one email were different from the bits of something else when you were doing a dd from one disk, or how reading one disk was "breaking" some secrecy of correspondence....

whitten 1264 days ago [-]
This is an elaboration of the idea that where data comes from is just as important as what the data happens to be. In expert systems this is related to the logical justification of a truth statement, as the statement is still true or false, but there is a Pedigree or Provenance for the data provided as metadata about the data stored.
skybrian 1263 days ago [-]
There is the thing itself and there is the history and context of the thing, and they are both important. We try to keep them together, but the connection is often fragile.

One example: a product and its price. The price roughly summarizes information about how it was made and how valuable people think it is, which are not attributes of the thing itself and often can’t be deduced from it. In some cases we physically attach a label, the price tag, to keep track. In other cases we attach a lot more info.

Another example: a photo and information about when and where it was taken. For a photo to serve as evidence we need a reliable history. In court, this is the chain of custody. Too often on the Internet, we pretend that having the photo is proof enough, but without knowing its history, it could be a fake.

goto11 1263 days ago [-]
This is a great metaphor because you will often see people with a technical background arguing about copyright law as if it were purely about the bits. Because for us, the bits are the only real thing. But the law does not see it like that.
kabdib 1264 days ago [-]
I've wondered how long it will be before input events are traceable to specific devices (e.g., you type 'A' on your keyboard, registered to you, and get a scancode . . . plus a few hundred bits timestamped and signed by the device before anyone can actually treat it as an 'A' keystroke). "Secure Unicode," anyone?

I figure this kind of dystopian mechanism for input provenance is at least 100 years out. Please don't anyone prove me wrong.
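
To make the idea concrete, here is a rough Python sketch of what a "signed keystroke" event could look like. Everything in it is an assumption for illustration: the event layout, the `DEVICE_KEY` secret, and the function names are invented, and an HMAC with a shared key stands in for the per-device public-key signature a real scheme would more plausibly use.

```python
import hashlib
import hmac
import struct

# Hypothetical secret provisioned into one specific keyboard.
DEVICE_KEY = b"hypothetical-per-device-secret"

# Assumed layout: 2-byte scancode + 8-byte timestamp (ns), big-endian.
PAYLOAD_FORMAT = ">HQ"
PAYLOAD_SIZE = struct.calcsize(PAYLOAD_FORMAT)


def sign_keystroke(scancode: int, timestamp_ns: int, key: bytes = DEVICE_KEY) -> bytes:
    """Return payload followed by an HMAC-SHA256 tag over the payload."""
    payload = struct.pack(PAYLOAD_FORMAT, scancode, timestamp_ns)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_keystroke(event: bytes, key: bytes = DEVICE_KEY):
    """Return (scancode, timestamp_ns) if the tag checks out, else None."""
    payload, tag = event[:PAYLOAD_SIZE], event[PAYLOAD_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return struct.unpack(PAYLOAD_FORMAT, payload)


event = sign_keystroke(0x1E, 1_700_000_000_000_000_000)
```

With this layout each event is 42 bytes, so a single keypress drags along a few hundred bits of provenance, and `verify_keystroke` refuses to treat the event as a keystroke at all if any byte has been altered.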

wmf 1264 days ago [-]
This is reminiscent of Vinge's Rainbows End where 90% of every chip is DRM and 10% is actual functionality.
Out_of_Characte 1264 days ago [-]
Your keystrokes are safely stored on Microsoft servers to improve Cortana and the advertisements you'll see
mindslight 1264 days ago [-]
Medium grey and opaque, thanks to cryptography.
troublesom 1263 days ago [-]
It'll be time to update to 2020!
jes5199 1264 days ago [-]
this is almost a description of a type system
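
A small Python sketch of that reading, with Colour modelled as a tag that travels alongside the bytes. The `Colour` and `Coloured` names are invented for this example and are not from the essay:

```python
from dataclasses import dataclass
from enum import Enum


class Colour(Enum):
    """Hypothetical provenance tags; not derivable from the bytes."""
    INDEPENDENT = "written independently"
    COPIED = "cut and pasted from someone else's article"


@dataclass(frozen=True)
class Coloured:
    bits: bytes
    colour: Colour


def same_bits(a: Coloured, b: Coloured) -> bool:
    """The computer scientist's comparison: only the bytes count."""
    return a.bits == b.bits


mine = Coloured(b"The law sees Colour.", Colour.INDEPENDENT)
pasted = Coloured(b"The law sees Colour.", Colour.COPIED)
```

Here `same_bits(mine, pasted)` holds while `mine == pasted` does not, because the dataclass comparison includes the tag. The catch, as in the essay, is that nothing in `bits` lets you reconstruct `colour` once the wrapper is thrown away.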
AnthonyMouse 1264 days ago [-]
This essay is written to create the impression that it's imparting something profound, but it's really just identifying the existence of side channels.

We encode bits in all kinds of things. You can store some bits on a flash drive. Then you can write some words on the outside of the flash drive. They're both just bits.

Whether you own a flash drive with some bits on it which is in a safety deposit box at a bank may depend on the bits in the bank's computer and not the bits on the flash drive, but it's still bits that it depends on.

The examples it uses aren't accurate:

> Maybe you were thinking about similar ideas to mine and happened to put the words together in a similar way. If so, fine. But maybe you wrote "your" article by cutting and pasting from "mine" - in that case, the words have the Colour that obligates you to follow quotation procedures and worry about "derivative work" status under copyright law and so on.

As it turns out copyright law doesn't really care about this, even if people might expect it to, because the law is too pragmatic for that. Proving that you came up with some particular phrasing is hard. Disproving it is hard too. So in practice the courts don't look at whether you actually copied the bits. They look at whether you could have copied them ("access"), which they then go on to assume is the case for anything widely disseminated (since establishing that would be hard too), and then whether the bits are similar ("substantial similarity"):

https://www.theiplawblog.com/2007/02/articles/copyright-law/...

Whether you actually copied the bits doesn't come into consideration, apparently.

Because the courts can only take into consideration the information they have available to them. Which is all bits, because all information is bits.

> You take a file to which someone claims copyright, mix it up with a public file, and then the result, which is mixed-up garbage supposedly containing no information, is supposedly free of copyright claims even though someone else can later undo the mixing operation and produce a copy of the copyright-encumbered file you started with.

There is still no Colour here, and the essay is missing a rather decent practical attack on the "Colour theory" version of the copyright system.

Suppose Alice publishes R1 and Bob publishes R2. R1 is Alice's message xor Alice's one time pad. R2 is Bob's message xor Bob's one time pad. The one time pads are "random".

Then it's discovered that R1 xor R2 generates a third party copyrighted work. Which is statistically impossible unless either Alice or Bob (but not necessarily both) chose their one-time pad specifically in order to cause this.

According to "Colour theory" the one who chose their one-time pad specifically in order to cause this has given their ciphertext the "Colour" of the copyrighted work. But that's a real problem in practice when there is no way to tell which one it was. The other person may not even be in on it. So which one do you haul into court when you don't know that? Which one do you take down?
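
The attack can be sketched in a few lines of Python. All the byte strings are hypothetical placeholders and `xor_bytes` is a helper made up for this illustration. Alice is the honest party here, and Bob derives his file from hers:

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


work = b"third-party copyrighted work (placeholder)"

# Alice honestly encrypts an unrelated message with a random pad.
alice_message = b"an innocent note".ljust(len(work), b" ")
alice_pad = secrets.token_bytes(len(work))
r1 = xor_bytes(alice_message, alice_pad)

# Bob builds his file from Alice's published file and the work.
r2 = xor_bytes(r1, work)

# Bob can still present r2 as "my message xor my pad": for any message
# he picks, a matching "pad" exists and looks just as random as Alice's.
bob_message = b"another innocent note".ljust(len(work), b" ")
bob_claimed_pad = xor_bytes(r2, bob_message)
```

Anyone holding `r1` and `r2` recovers `work`, yet `xor_bytes(r2, work)` equals `r1` just as `xor_bytes(r1, work)` equals `r2`, so the two published files alone do not reveal which one was derived from the other.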

In real life if something like that becomes popular what happens is not that they figure out who it really was that created the derivative work, it's that they come up with some kind of disgusting hack like the DMCA takedown process which imposes no practical consequences on fraudulent takedowns, and hope that the innocent victims of the collateral damage don't have enough political clout to do anything about it.

It seems like the same fallacy as the model of the law that we teach to high school students. Computer scientists understand that it's wrong. Lawyers understand that it's wrong. But certain people benefit from pretending that it isn't in front of the general population because the misleading abstraction is prettier than what actually happens under the hood, and a better understanding of the latter would make people upset.

goto11 1263 days ago [-]
> Suppose Alice publishes R1 and Bob publishes R2. R1 is Alice's message xor Alice's one time pad. R2 is Bob's message xor Bob's one time pad. The one time pads are "random". Then it's discovered that R1 xor R2 generates a third party copyrighted work.

It is common among computer people to think the law can be hacked like an algorithm. It does not work like that. If you xor two apparently random files and they surprisingly produce the full text of the Harry Potter series, you do not have plausible deniability if you start distributing it.

AnthonyMouse 1263 days ago [-]
You're missing the attack in exactly the same way as the author does.

The same person doesn't distribute both of the files. Two different people distribute two different files. One of them is totally innocent and the party distributing that file doesn't even have to be in on it or have any relationship with the other person, but there is no way to tell which one it is.

The legal system is forced into either punishing and taking down the innocent file or not doing so for the infringing one. There is no other option when you can't distinguish between them.

But it isn't supposed to do that to the one which is just an ordinary use of a one-time pad by an innocent independent third party who has e.g. posted it in a public place for the intended recipient of the non-infringing message to receive it without there being a direct one-to-one communication between sender and the recipient. Or because there are multiple intended recipients and only those with the correct pad can read the original message so it's safe to publish widely.

The fact that some totally different person has come along and used your published file to encode an infringing one is not supposed to affect your legal status. But if nobody can tell which one is the original, the legal system has to choose between punishing the innocent and not punishing the guilty.

It isn't an algorithmic problem, it's an evidentiary problem. There are two different sets of bits and one is supposed to have a different "Colour" but the legal system has no information as to which one it is.

It's like someone discovering that the flashlight on certain phones is bright enough to blind surveillance cameras, and when someone points out that criminals could use this to prevent surveillance cameras from capturing their faces while they're committing their crimes, you respond that the legal system doesn't work like that because having an effective way to avoid being identified doesn't make your conduct legal. But that wasn't the original claim.

goto11 1262 days ago [-]
Has this scenario actually played out in court, or are you just speculating about what would happen?
AnthonyMouse 1261 days ago [-]
Suppose that Alice is an innocent bystander who has done nothing more than publish some innocent data encrypted with a one-time pad and Bob is a pirate who xors a copyrighted work with Alice's data and publishes it. Or vice versa. Anybody who downloads both of them can xor them together and get the copyrighted work, but only one of them was actually derived from the copyrighted work, and you don't know which one.

There are only three things that can happen next, right? Either you punish both Alice and Bob even though one of them is innocent, or you let them both go even though one of them is guilty, or you punish only one of them arbitrarily and thereby, because they're indistinguishable, have a 50% chance of punishing the innocent person while the guilty one goes free.

Which one of those would you propose the legal system should do in that case, and why?

goto11 1261 days ago [-]
The legal system would of course prosecute Bob the pirate, and possibly also everyone who purchases or consumes infringing material distributed by Bob.

The whole xor scheme is irrelevant. If you give people a file and the information about what other file to xor it with to get the cleartext, that is just the same as giving them the cleartext straight away.

AnthonyMouse 1260 days ago [-]
But you still don't know who that is. How do you know the pirate is Bob? It could be Alice.

Nobody said you were getting the information on which two files to download from Alice or Bob. Those are just URLs, which could be hosted by a third party, and are tiny so much easier to host on a system which is extrajurisdictional or anonymous.

And if you don't know that it's Bob, under what justification would you punish people who download things from Bob?

I mean suppose Alice is Google and Bob is Dropbox and the two URLs are hosted on The Pirate Bay. Which service do you even propose to remove the file from? According to the rules the innocent one is supposed to stay up.

goto11 1260 days ago [-]
You suggest Bob can publish the "key" to some anonymous extrajurisdictional server so it can't be tracked to him. Well if this is possible, why wouldn't he just post the full unencrypted movies (or whatever) there instead of bothering with the xor'ing scheme? The xor'ing doesn't change the legality of anything.

> And if you don't know that it's Bob, under what justification would you punish people who download things from Bob?

Er...under the justification that they are downloading infringing material?

AnthonyMouse 1260 days ago [-]
> Well if this is possible, why wouldn't he just post the full unencrypted movies (or whatever) there instead of bothering with the xor'ing scheme?

The movie is 30GB. The URL is 30 bytes. It's like asking why The Pirate Bay uses BitTorrent instead of hosting the movies directly on their servers.

Or how about this. The full list of URL pairs is provided after the end of each movie, so if you get one pair you get all of them. And the same scheme is also used for all kinds of things that aren't allowed to be distributed everywhere, like public domain or permissively licensed works that are banned in some countries over content.

If someone openly posts the URL pair for one of those works, which is permissible to distribute in the US because it's not copyright infringement and the content is only proscribed in some other country, would you punish them for that just because at the end of the work they actually intended for people to watch, someone else had included the URLs for all the copyrighted films?

> Er...under the justification that they are downloading infringing material?

Not if Bob was the innocent party, which you still don't know.

goto11 1259 days ago [-]
The judicial system looks at which persons acted with the intent of committing a criminal act.

Are you suggesting the judicial system would treat Bob and Alice as equally guilty because the bits in the xor'ed infringing material are coming equally from both files? That is not how it works. One of them acted with criminal intent, and that is the one who will be prosecuted.

Of course you can't see from the bits themselves who the guilty party is. But in this hypothetical scenario you could just look at the timestamps on the files.

A person who purchases or downloads the material is also guilty of copyright infringement. And it doesn't matter if the system can figure out who distributed it in the first place - consuming it is an independent crime.

It doesn't matter if the material was distributed as one unencrypted file or as multiple fragments on different servers which have to be combined, or whether it was hidden among public domain material, or any other clever scheme.

ziml77 1263 days ago [-]
I see it all the time and it's quite frustrating to see people being so naive. The law is not purely mathematical and algorithmic. I think a good example that moves outside of IP law is murder vs manslaughter. Two identical killings could fall under different charges simply due to what the killer was thinking at the time. And we want it that way. It would be unfair and not accomplish anything good to treat an accidental and an intentional killer the same way.
goto11 1257 days ago [-]
I don't know why you were downvoted because you are exactly on point. In the GP example, two files can be xor'ed to yield some pirated document. The poster thinks this would require the judicial system to punish the creators of both files equally because both files contribute bits equally. But the judicial system looks at intent, and only one of the files was created with criminal intent.