I'm Open Sourcing the Have I Been Pwned Code Base (troyhunt.com)
rethab 1327 days ago [-]
For the lazy ones searching for a github link: It is not open source yet:

> HIBP isn't in a state to simply flick the visibility of it in GitHub, but it needs to get to that point. Instead, I need to choose the right parts of the project to open up in the right way at the right time.

and then further down:

> I don't have a timeline for each step along the way yet as HIBP remains something I do in my spare time and I've always got a bunch of other stuff on my plate, but the process has already begun and I'll be sharing more on that as soon as I can.

yodsanklai 1327 days ago [-]
Very naive question. Suppose I offer a service (e.g. Have I Been Pwned) and I want my users to trust my service not to store/share their data. I understand that open sourcing the code base is one step, but how can I know that the server running the service actually does run this codebase and not something else?
matsemann 1327 days ago [-]
Tangential to your question, but HIBP's password API is designed in such a way that you don't have to give them your password to find out whether it has been breached. You instead send HIBP only the first part of the password's hash, and the response lets you work out locally whether you may have been compromised. Take a look at the requests with devtools on the page https://haveibeenpwned.com/Passwords

So here it's taken one step further, you don't have to trust that it's the same code running on HIBP, because you don't have to trust them at all for the service to work.
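A minimal sketch of the range lookup described above, assuming the response body is lines of `SUFFIX:COUNT` like the `curl` output quoted further down this thread. The function names are made up for illustration, and the network call is left out so only the local hashing and matching are shown.

```python
import hashlib


def split_hash(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, range_body: str) -> int:
    """Search a range response for our suffix; 0 means no match in this bucket."""
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0


prefix, suffix = split_hash("password")
# Only `prefix` would leave the machine, e.g. as
# GET https://api.pwnedpasswords.com/range/<prefix>
# Two lines copied from the output quoted in this thread stand in for the response.
sample_body = (
    "1E4C9B93F3F0682250B6CF8331B7EE68FD8:3759315\n"
    "2648FB0B2EDA4FDFF99BF51E912CD95C023:7201"
)
print(prefix, count_in_range(suffix, sample_body))
```

The suffix comparison happens on the client, which is why the server never sees the full hash.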

chrisshroba 1327 days ago [-]
This seems like it would still leave a vulnerability: There are many passwords that are commonly used, e.g. "password". If I submit "password" to haveibeenpwned, it hashes and sends the range "5BAA6". It seems to me that now anyone in control of haveibeenpwned could reasonably guess that my password has a decent chance of being "password". Using this methodology for the top 100 or 1000 passwords would presumably uncover a lot of passwords.
snailmailman 1327 days ago [-]
HIBP (or an eavesdropping attacker) can’t really assume that your password is in the list. There’s an essentially infinite number of passwords that would result in a hash beginning with 5BAA6. “password” may be the most likely if it was in the list, but surely attackers would have been guessing commonly used passwords either way.

If your password is in the list, you should probably be changing it. It being bad is how it ended up in the list.

chrisshroba 1327 days ago [-]
> HIBP (or an eavesdropping attacker) can’t really assume that your password is in the list.

According to [1] by Troy Hunt, around 86% of passwords from one dump were already in the HIBP database, so attackers could assume your password is in the list with 86% certainty, extrapolating from that.

If we look at the returned data for a sample hash prefix (that of "password", which is 5BAA6), and sort them by count of password use, we get these password hashes with the most usages:

    ~ curl -s 'https://api.pwnedpasswords.com/range/5BAA6' | sort -t: -k2 -n | tail
    42BAADCD710F9EA7E62B60E01D05469AC64:14
    EA2008F79BE2B0E0C02A1642725433BBB2F:15
    3A8ADE4CF1DAD5342AF2F9FC9247EC21943:18
    5E2BCB2FEF09257B0306B4744418999611B:18
    A516C42C8CD4C7E7E328ABB90D002A9890E:29
    CF2F87E596758D031C0006D1827C9908E5C:34
    EF0E14CCB17E525D76050283148A57828F8:44
    8E0D5C9D144BACC76E52C44F5B61E8DF629:213
    2648FB0B2EDA4FDFF99BF51E912CD95C023:7201
    1E4C9B93F3F0682250B6CF8331B7EE68FD8:3759315

Summing the counts gives 3768295, so 3759315/3768295= about 99.7% of passwords with that hash are "password".

Going with a more obscure password, like say "obscure", you get 1801/4319 = about 42% of passwords being "obscure". This means that if I search for the password "obscure" on HIBP, HIBP can be about 40% sure that my password is "obscure".
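The arithmetic above can be reproduced with a short sketch. It assumes `SUFFIX:COUNT` lines as input and uses only the ten lines quoted in this comment, so the share it computes is over that tail rather than the full response (and comes out slightly higher than the 99.7% figure). `top_share` is a hypothetical helper name.

```python
def top_share(range_body: str) -> tuple[str, float]:
    """Return the most frequent suffix in a bucket and its share of all counts."""
    counts = {}
    for line in range_body.splitlines():
        suffix, _, count = line.strip().partition(":")
        if suffix and count:
            counts[suffix] = int(count)
    total = sum(counts.values())
    best = max(counts, key=counts.get)
    return best, counts[best] / total


# The ten lines quoted above for prefix 5BAA6 (not the full response).
TAIL = """42BAADCD710F9EA7E62B60E01D05469AC64:14
EA2008F79BE2B0E0C02A1642725433BBB2F:15
3A8ADE4CF1DAD5342AF2F9FC9247EC21943:18
5E2BCB2FEF09257B0306B4744418999611B:18
A516C42C8CD4C7E7E328ABB90D002A9890E:29
CF2F87E596758D031C0006D1827C9908E5C:34
EF0E14CCB17E525D76050283148A57828F8:44
8E0D5C9D144BACC76E52C44F5B61E8DF629:213
2648FB0B2EDA4FDFF99BF51E912CD95C023:7201
1E4C9B93F3F0682250B6CF8331B7EE68FD8:3759315"""

best, share = top_share(TAIL)
print(best, f"{share:.2%}")
```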

How is this not a huge security issue? I would agree that folks with unique passwords not appearing in the database are safe, but anyone using a common password can be identified with pretty high probability, and even someone with a password only occurring once in the database is at risk because a typical query returns about 500 results: low enough that a human could brute force a web input that didn't have rate limiting in an hour or two.

[1]: https://www.troyhunt.com/86-of-passwords-are-terrible-and-ot...

acqq 1327 days ago [-]
> a typical query returns about 500 results

If you personally want to check your password, wouldn't it help to curl additional queries for other prefixes as well? Probably also taking care not to make too many per unit of time?
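A rough sketch of that decoy idea, assuming a 5-character upper-case hex prefix as used by the range API. The helper name and the default of nine decoys are arbitrary choices for illustration, and this does nothing about the concern that the real bucket may be dominated by one common password.

```python
import secrets

HEX_DIGITS = "0123456789ABCDEF"


def prefixes_with_decoys(real_prefix: str, decoys: int = 9) -> list[str]:
    """Mix the real prefix with random ones so the order reveals nothing."""
    chosen = {real_prefix.upper()}
    while len(chosen) < decoys + 1:
        chosen.add("".join(secrets.choice(HEX_DIGITS) for _ in range(5)))
    ordered = list(chosen)
    secrets.SystemRandom().shuffle(ordered)
    return ordered


batch = prefixes_with_decoys("5BAA6")
print(batch)
```

Each prefix in `batch` would then be queried with a pause in between, and only the response for the real prefix inspected.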

chrisshroba 1327 days ago [-]
Even with 10 or 100 queries you still will get a probability distribution of (I'm guessing) 20-200 passwords that make up the most likely 50% of passwords, so I'm not sure how much good this would do if your password is relatively common. If it's very obscure this could help more, but you're still narrowing the password search space from 500M to 5k-50k.
acqq 1327 days ago [-]
Yes, if you suspect your password is common maybe you shouldn’t use that service at all, but just download some smaller list of “common passwords.”

I also find it problematic that, if I understood correctly, some browsers automatically query that service with the passwords the browser collected whenever you “saved” a login.

And they claim that is “for your security.”

jve 1327 days ago [-]
Wait, if you query HIBP, you are actually asking: is it safe to use this password?

If you have a match, you will certainly use a different password.

And, by the way, they have username/hashes before you even use their API.

chrisshroba 1327 days ago [-]
> Wait, if you query HIBP, you are actually asking: is it safe to use this password?

I agree - this is the main use of HIBP.

> If you have a match, you will certainly use a different password.

I doubt this is true in all cases. Sometimes people get lazy. But nonetheless, a malicious actor could still capitalize on the few minutes between a user checking a password on HIBP and that user changing their password. For example: for every password lookup, pick the most likely password with that hash, and if it has at least 90% share of the passwords with that hash, try to correlate an email address to their IP address (using leaked dumps), and then try to log into gmail with those credentials. Even if it works 1% of the time, a lot of harm could be done in a few minutes. (For example, US Social Security Numbers may appear in a years-old employment document in the email).

> And, by the way, they have username/hashes before you even use their API.

But they don't necessarily have yours. If I am somebody who uses a different password on every site, and I check a password for a site that has not had a leak, then they do not have an association between my email and my password before my query, and they do after my query.

Regardless of the particulars of possible exploits, I'm mainly claiming that the original comment above (by @matsemann) is not true, in that some level of trust in HIBP is necessary to input a sensitive password.

gnabgib 1327 days ago [-]
This isn't really the way HIBP is used. While you can query the HIBP endpoint, the hashes are also available as (large) torrent downloads. For many use-cases this means as the HIBP operator you won't see those requests since other service providers are likely using a local cache and not taxing Troy's infrastructure. Even when you are directly querying, the request is very likely proxied by the service operator so you won't be able to derive IPs specifically, just that someone using Okta tried the password `obscure` (but not that they set their password to this - whoever they were, note I'm not saying Okta directly queries - just an example). Further, security recommendations (e.g. OWASP) require that companies test potential passwords against known leaks and discourage or prevent users from using frequently found values (typically 100+ uses or leaks.. note this either means one user has unluckily been leaked 100 times, or 100 people have used the same password) - this again encourages both proxied and possibly cached lookups (vs trusting client side code which could be disabled). Note that the expected use case is at password creation or password change: the new candidate is tested then, not upon every login.

Ok so many use cases are removed - maybe your locally running password manager allows you to test a potential new password against leaks. Now we have a rare example of what you're hypothesizing, at this point you're trying to tie an IPv4 address to an identity. Many networks have either shared IPv4 addresses (NAT/CGNAT in major metros, corporations, public Wi-Fi), or dynamic IPv4 addresses (change daily/weekly/monthly). It's pretty hard (though not impossible) to link an IP to a person, and vice versa: between 3g, 4g, 5g, wifi, work, home, complimentary, wired and wireless connections a single person may appear across several addresses.

Finally the idea of accounts - knowing a specific IP just tested the password `password` and Aretha Franklin is most likely at that IP.. well, which of the hundreds of services might she be potentially considering the new password for? (again - testing, not setting the password to). If you could narrow it to one service, or brute force all of them, and assume the user ignores the dire warnings, you still need to know their user credential (be it email - of which they may have many, username - of which they may have many, or service-generated username which you'd need insider knowledge to obtain)

If you're worried about trust of your sensitive password, your password manager and the service you're using it on (if you reuse passwords) need far more.
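The local-cache approach mentioned at the top of this comment could look roughly like the sketch below. It assumes the downloaded data has been parsed into `HASH:COUNT` lines with full 40-character SHA-1 hashes; the two sample lines are built from values quoted earlier in the thread and the function names are hypothetical. A real multi-gigabyte file would call for an on-disk index rather than in-memory lists.

```python
import bisect
import hashlib


def load_sorted_hashes(lines):
    """Parse 'HASH:COUNT' lines into parallel lists sorted by hash."""
    pairs = sorted(line.strip().split(":") for line in lines if line.strip())
    return [h for h, _ in pairs], [int(c) for _, c in pairs]


def local_count(password, hashes, counts):
    """Binary-search the local copy; nothing is sent over the network."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    i = bisect.bisect_left(hashes, digest)
    if i < len(hashes) and hashes[i] == digest:
        return counts[i]
    return 0


sample = [
    "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8:3759315",
    "5BAA62648FB0B2EDA4FDFF99BF51E912CD95C023:7201",
]
hashes, counts = load_sorted_hashes(sample)
print(local_count("password", hashes, counts))
```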

faebi 1327 days ago [-]
But how many people will check in detail what’s submitted? And again, how do you prevent that different users get different functionality?
mcherm 1327 days ago [-]
> But how many people will check in detail what’s submitted?

That's a valid criticism. It only takes one person finding clear evidence of problematic behavior to advertise that fact to the entire community. So long as a small fraction of people do actually check, the whole community will be fine. But if only a negligible number of people check it, then perhaps no one will be checking when it is abused.

> And again, how do you prevent that different users get different functionality?

Well, it depends on how the system is discriminating among users. To a great extent, that kind of abuse is prevented by anonymous users. You don't have to log into HIBP in order to use it. The system could still be discriminating by IP address, or even by various kinds of browser fingerprinting (including just collecting the advertising company cookies that so effectively eliminate anonymity for much of the web). Occasionally using TOR won't help against this -- a malicious operator could simply provide "clean" functionality to all TOR users.

- - - -

On the whole, things like k-anonymous API design, making source code open, or having a small fraction of users checking for security issues all help a great deal and make it difficult for a malicious operator to abuse the system. But none of them are perfect. In the end it comes down to trust. Troy Hunt has EARNED that trust due to his openness, and in no small part by choosing to implement protections like these that he didn't need to implement, and I find that extremely persuasive. If you take steps to ensure that you CANNOT cheat your "customers", then that's pretty good evidence you aren't likely to be trying to cheat them.

kungato 1327 days ago [-]
How do we know if anyone is checking?
joshspankit 1327 days ago [-]
Another good question but since it’s trivial (opening the browser dev tools during a submission), we can assume a significant number. Doubly so because any service that says it’s secure gets a lot more scrutiny.
jhardy54 1327 days ago [-]
You can't verify. Even if you could, you can't ensure that the code won't be replaced the moment you turn around.

I investigated this thoroughly a few years ago, and the solution I came to is distributed computing. Join a peer-to-peer network and run the code directly on your own hardware or depend on someone that you actually trust. As far as I can tell there are no shortcuts that allow you to trust strangers with sensitive data.

prepend 1327 days ago [-]
Only a few need to check to see what’s submitted and serve as a canary for others.

It’s possible some bad actor could do dumb stuff, but SSL helps in that at least the server would have a unique id.

Personally the first time I used HIBP I checked the javascript that was running and read about what happened.

It’s possible that someone could hack their server and replace the javascript with new bad versions. I don’t check anymore. But it’s unlikely, and a shared, common risk of using any site, I think.

But the important thing with HIBP (and other sites) is that I never send them any sensitive information.

jhardy54 1327 days ago [-]
> Only a few need to check to see what’s submitted and serve as a canary for others.

I couldn't agree more. Even if you have root on their server and do a full audit, there's nothing to stop them from changing the code the moment you log out.

coddle-hark 1327 days ago [-]
Flipping the question around: How do YOU know that the server running the service is actually running the codebase and not something else? You might have been hacked. Someone at your hosting provider might be exfiltrating data. Intel might have a backdoor in their CPUs. It’s not practical for you to continuously verify all of these things, at the end of the day you have to trust that things do what they say they do until you have reason to believe otherwise.

Of course, if you can’t verify what’s running in your server then neither can your users. All you can do is be as transparent as possible about how you process their data and hope that’s enough to earn their trust.

As an aside, homomorphic encryption has been touted for a while now as a way for services to process encrypted user data without ever being able to decrypt it. Haven’t seen much use of it in the industry though, last I heard it was enormously impractical.

mihaifm 1327 days ago [-]
You're actually implying that you can't trust any computer with the code you are telling it to execute. This is an exaggeration IMO. Sure you can have backdoors and hacks, but with what probability?

On the other hand, users need to put their trust in humans, not computers.

xav0989 1327 days ago [-]
Ken Thompson wrote a paper about trust that was published in ACM in '84 entitled "Reflections on Trusting Trust". He makes the case for a trojan that could be built in such a way that it would infect a compiler, self-replicate into new versions of the compiler, and inject itself into specific binaries; all without appearing in the source code of the new versions of the compiler or the target binary.

Arguably, the probability of such an attack is extremely low--slim to none might be an overstatement--but it's an interesting thought exercise.

IggleSniggle 1327 days ago [-]
I remember reading about this attack being implemented and executed in a university system decades ago. Perhaps apocryphal, but it didn’t seem that way.
SifJar 1327 days ago [-]
IggleSniggle 1327 days ago [-]
That's the one!
coddle-hark 1327 days ago [-]
For what it’s worth you can detect a “trusting trust” attack using Diverse Double Compiling. David A. Wheeler implemented it as part of his dissertation back in 2009 and verified that gcc had not been infected.

https://dwheeler.com/trusting-trust/

im3w1l 1327 days ago [-]
I read the abstract and it seems kind of obvious: Yes if you have a trusted compiler you can bootstrap another.

But how can we know the "trusted compiler" is really to be trusted?

david_draco 1327 days ago [-]
All the godbolt users looking at their binaries from various compilers would quickly notice this, I think.
jhardy54 1327 days ago [-]
It doesn't need to go in every binary, the conventional attack only outputs the malware when it's compiling a compiler or sudo (or something).
munchbunny 1327 days ago [-]
This is the classic conundrum when thinking about supply chain attacks. It can happen, but at what probability threshold do you stop trying to take security measures?

> On the other hand, users need to put their trust in humans, not computers.

It’s still a human trust problem all the way down because you had to buy your computer from someone, and they had to buy the parts from someone, and so on.

There are some places where it’s less of an issue due to cryptographic magic, but those are few and far between.

marksomnian 1327 days ago [-]
This is a very difficult problem to solve. As far as I know (and am welcome to be corrected!), the current state of the art is remote attestation - Signal wrote a post about how they're doing it for their contact discovery service[0]. It's a very limited solution though.

[0]: https://signal.org/blog/private-contact-discovery/

mLuby 1327 days ago [-]
IDK, trusting a remote system just seems bad. Either give me the data to process locally, or I'll give you some data you couldn't possibly understand and you do some stuff to it and return the results, having never known the meaning of the data.

In Signal's case, they could just let you send messages that might never be received because the contact doesn't currently have the app. Also gives the receiver plausible deniability unless they respond. And could help get more people on Signal if they did the "Someone sent you a message, download the app now" trick.

cuu508 1327 days ago [-]
TL;DR they use Intel's SGX.

Intel's 20GB data leak was on HN yesterday, with a hint that more is coming. Will be interesting to watch ;-)

londons_explore 1327 days ago [-]
You make your webpage have all logic client-side, and use standardized server-side APIs (for example Google Datastore or Amazon S3).

Do not run any of your own application logic on the server - your javascript client should talk to the standardized datastore directly.

Then your users can see all the logic on their side, and inspect all the data, and if they want they can host the client side stuff themselves and run their own server.

bhandziuk 1327 days ago [-]
This doesn't seem sustainable either though. Isn't server-side code required to ensure the data you're getting from the client is legit? You can't do everything client side, especially because not everything can be done in javascript.
londons_explore 1327 days ago [-]
> Isn't server-side code required to ensure the data you're getting from the client is legit?

In this model, the client must validate the data before it puts it into the data store, and again when it gets it out of the data store (since the data could have been put there by a malicious/modified client)
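As a rough illustration of that model (written in Python rather than browser javascript to keep it short), the sketch below validates on both write and read. The record fields, the `DictStore` stand-in for a hosted datastore, and the validation rules are all invented for the example.

```python
import json


def validate(record: dict) -> dict:
    """Reject records that a well-behaved client would never have written."""
    email = record.get("email")
    age = record.get("age")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    if not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError("age must be an integer between 0 and 150")
    return record


class DictStore:
    """Stand-in for a dumb key-value datastore with no application logic."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def save(store, key, record):
    store.put(key, json.dumps(validate(record)))  # validate before writing


def load(store, key):
    return validate(json.loads(store.get(key)))  # validate again after reading


store = DictStore()
save(store, "u1", {"email": "a@example.com", "age": 30})
print(load(store, "u1"))
```

The second validation is what protects an honest client from data written by a modified one.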

jamesponddotco 1327 days ago [-]
For my projects, I provide read-only SSH access when asked, but that probably does not work for everyone.
pbhjpbhj 1327 days ago [-]
In theory you could maintain a separate system that the SSH accesses, changing the real system without revealing those changes.
jamesponddotco 1327 days ago [-]
True, and there is probably a way to prove that is not the case, but I have no idea what that way is right now.

On the other hand, if you are bothering to email me to ask for SSH access to ensure what I say is true, you probably have the knowledge to detect if I am lying from inside the server.

renewiltord 1327 days ago [-]
Why bother? Change your password to a randomly generated string then send over the old password. You've cut off the threat vector.
mirekrusin 1327 days ago [-]
Good question, built in content hash on those lambda function would be great to have.
deanclatworthy 1327 days ago [-]
Troy does talk a little bit about the data here, but the codebase is not entirely useful without the data. I don't doubt it can be improved from Troy's one-man efforts in that area, but the real value of haveibeenpwned is in the data - and the code is useless without a huge trove of breach data - which I guess is never going to be open.
1f60c 1327 days ago [-]
There’s no perfect solution here: publish the raw data, and people who haven’t changed their passwords get cracked, or publish hashed (in some way) passwords, and people will crack the passwords themselves. (That said, I guess most crackers will just get the data straight from the source, but still.)
speedgoose 1327 days ago [-]
He does publish the passwords hashed, with the number of times each password was leaked (without salt and the weak sha1).
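To make the point about unsalted SHA-1 concrete, here is a minimal sketch of a dictionary check. Because there is no salt, one pass over a wordlist tests every published hash at once. The wordlist and function name are made up; the hash is the one for "password" quoted earlier in this thread.

```python
import hashlib


def crack_unsalted_sha1(target_hashes, wordlist):
    """Map each published hash to the candidate word that produces it, if any."""
    targets = {h.upper() for h in target_hashes}
    found = {}
    for word in wordlist:
        digest = hashlib.sha1(word.encode("utf-8")).hexdigest().upper()
        if digest in targets:
            found[digest] = word
    return found


published = ["5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8"]
recovered = crack_unsalted_sha1(published, ["letmein", "password", "hunter2"])
print(recovered)
```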
nnt38 1325 days ago [-]
It is trivial to find (most) of the datasets HIBP uses
toomuchtodo 1327 days ago [-]
Incorporate a non profit and have it own the data and run the service? No different than Let's Encrypt.
sneak 1327 days ago [-]
Breach data is, by definition, open. You just have to dig a little, and sometimes pay criminals.

Troy got it; so can you.

Craighead 1327 days ago [-]
That implies the same data still exists, which I don't think is true for many breaches.
prepend 1327 days ago [-]
I think it’s safe to assume that all breached info is archived and available forever.

Both as a curiosity, a research tool, and a crime tool.

As a hobbyist, I used to have a folder of breached data (Ashley Madison was really fun). And if you add in groups like /r/datahoarders, the data will probably be kept by someone.

The dataset could be reconstructed if needed.

anaganisk 1327 days ago [-]
Yeah make the data available so that we can have AWS Elastic Pwned, and HIBP to be eclipsed by it
octorian 1327 days ago [-]
I hate to say it, but almost all of these sites/systems end up being far more annoying than useful. Why? Almost all the time, their alerts simply are not actionable. At best, they'll tell you that your username/email was included in a breach and often identify the breach itself by some nebulous data cache name that means nothing to you.

They almost never tell you which site was actually breached, nor do they ever give you any hints as to what password was actually compromised.

So really, when I show up in one of these alerts, I'm always asking myself:

- Was this a recent breach, or a redundant alert from something I dealt with months ago?

- What account do I actually need to update, if any, to be safe from this alert?

These questions almost never seem to be answered. As someone who uses a password manager and a different random password for every site, there's no way I'm going to proactively hunt down and change every single entry in its DB when I get the "alert of the week."

(FWIW, I once worked somewhere that somehow had access to a far better version of this data than they'll ever let the public get access to. That system actually did generate alerts that were actionable. I only wish I had a way to get useful alerts like that as a private individual.)

darekkay 1327 days ago [-]
> They almost never tell you which site was actually breached

That's probably true for most people. But that's what email aliases are for. If I notice (through 1Password alerts) that my me+facebook@mydomain.com email got leaked, I pretty much get an idea which site was breached. At least GMail and Fastmail support email aliasing.
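The aliasing scheme can be sketched in a few lines, assuming plus-addressing of the form `me+site@mydomain.com` as in the example above. Both helper names are hypothetical.

```python
from typing import Optional


def alias_for(site: str, local: str = "me", domain: str = "mydomain.com") -> str:
    """Build a per-site alias such as me+facebook@mydomain.com."""
    return f"{local}+{site.lower()}@{domain}"


def site_from_alias(address: str) -> Optional[str]:
    """Recover the site tag from a leaked address, or None if there is no tag."""
    local_part, _, _domain = address.partition("@")
    _base, plus, tag = local_part.partition("+")
    return tag if plus and tag else None


leaked = alias_for("Facebook")
print(leaked, site_from_alias(leaked))
```

If the `+tag` part has been stripped from a leaked address, `site_from_alias` returns `None` and the origin cannot be recovered this way.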

toyg 1327 days ago [-]
I keep using that scheme but it’s been clear for a while that most systems (and hence leakers) clean up those addresses. Which is to be expected for a feature that has been around for 15 years.
clarkdave 1327 days ago [-]
I use `facebook@mydomain.com` instead, which is easy enough to do with catch-all aliases.
quickthrower2 1326 days ago [-]
Or me+Facebook and block email to me.
mholt 1327 days ago [-]
This is the same problem we've had with Certificate Transparency.

In grad school we looked at integrating CT monitors into web servers that manage certificates for your sites (including monitoring lookalike or spoofy names), but then weren't sure what to do when a suspicious certificate appeared in the logs. Do you email the site owner? Then what? Sure, you can report to CAs and web hosts and all that, but who will actually go to that trouble? By the time you do that, the site will probably already be blocked by SafeBrowsing and whatever other blocklists.

TimTheTinker 1327 days ago [-]
1Password and other services plug into HIBP to alert you when an account has been pwned and the password should be changed.
blackearl 1327 days ago [-]
I have HIBP alerts set up for client domains and it quickly notifies me, shows me what user emails were in the breach, and a general idea of what data was in the breach (usernames, passwords, financial info). Maybe if you're reviewing old breaches it's pointless, and if you're using a pw manager and taking security seriously you don't need it. But most users will use the same password for work as they do with some stupid app so at least with these alerts I can go and remind them not to use work email to sign up for a free hamburger or whatever.
flingo 1327 days ago [-]
A lot of the "breaches" added to the site are from unknown origins. Like, a large mixed collection sold online that somehow found its way to Troy.

Last time I used HIBP, it told me what site the leak was from.

If your password shows up at all, you should change it wherever you use it.

newman8r 1327 days ago [-]
> FWIW, I once worked somewhere that somehow had access to a far better version of this data than they'll ever let the public get access to

If you really want to hunt it down, a lot of this data is available via torrent, although probably not too useful for generating alerts.

jacoblambda 1327 days ago [-]
If you use KeepassXC or another password manager of some kind, they are able to tell you which username/password combinations have been compromised.
jimhefferon 1327 days ago [-]
Can you expand on that for me, please?
abdullahkhalids 1327 days ago [-]
If your password is unique for each website because you use a randomly generated one, you can immediately [1] tell by looking at the password which website got compromised.

[1] keepassx does not give a way of searching passwords. You can manually look through the list of passwords though.

theshrike79 1327 days ago [-]
I give it 24 hours after release before someone rewrites it in Rust.

48 hours before it's on HN.

notmalc 1327 days ago [-]
This man knows
atxbcp 1327 days ago [-]
One of the tweets in the article says Github is open-source: it's not.
Techbrunch 1327 days ago [-]
True but FYI if you want to, you can reverse engineer the code of the GitHub enterprise version: https://blog.orange.tw/2017/01/bug-bounty-github-enterprise-...
johannes1234321 1327 days ago [-]
The fact that you can (probably in breach of license terms) get to the source doesn't make it open source as understood by the majority of the community (which is somewhere close to OSI's definition).
rvnx 1327 days ago [-]
Older versions of GH Enterprise don't even have ruby_concealer

You just have an open VM

nelsonic 1327 days ago [-]
Is there a reason (Security, Competitive, etc.) that the code wasn't open source from the start?
hannob 1327 days ago [-]
Very likely just boring reasons like not having thought about it, not having made the code in a way that's easy to publish.

Think something like he started the project at some point, never planning for it to become this big. It was probably some code written in a random directory, with no regard for whether test fragments are around, whether it contains test data that maybe shouldn't be public, different pieces that are closely tied to a server config that itself is very specific to what he already had running and would need some documentation to be useful etc. pp.

WorldMaker 1327 days ago [-]
Troy points out in the article that it started as, and has almost entirely been, a series of hacks done in free time, including a lot of it done with not enough sleep on long-distance plane flights. It sounds reasonable to assume that a lot of it is duct tape and straw wire and that the imposter syndrome "it's not good enough to open source" played some presumably large factor.
buster 1327 days ago [-]
Not wanting to have toxic user requests and issues open for something you do in your spare time could be one good reason.
oliwarner 1327 days ago [-]
I'm not sure where you get the impression those somehow don't exist if you don't have a bug tracker. They just go to email, or Twitter.

For something like this, keeping things private has allowed Troy to work out the best way to do things with complete autonomy. That's really useful if you have the time and resources to get to market on your own.

Now it's stable and best security practices are nailed down, he can open it up to a bit of scrutiny and feature bloom.

msla 1327 days ago [-]
It's entirely possible to run a Git server which is read-only (for the rest of the world, anyway) and doesn't allow anyone to bother you.
jcims 1327 days ago [-]
What's the point of doing that though? It's just chumming the water for unsolicited feedback.
majewsky 1327 days ago [-]
It allows others to easily mirror your code. The original code owner may disappear for any number of reasons (burnout, early retirement, being run over by a bus, etc.).
msla 1327 days ago [-]
> What's the point of doing that though? It's just chumming the water for unsolicited feedback.

You'd do it to prevent feedback.

jcims 1327 days ago [-]
People shit-talking your code on Twitter is still feedback.
msla 1325 days ago [-]
Nobody real monitors Twitter.
zo1 1327 days ago [-]
What would be the reason for it being open-source anyways? Unless it's in a state for someone else to use it in their own space/context with separate data, etc, then making it open source is just a marketing/promotion tactic with the side-benefit of maybe having your code audited on some level. The rest is just negatives, tbh.
nelsonic 1327 days ago [-]
Respectfully, I disagree with your premise/question. Open Source is about way more than marketing/promotion. It's a means to make the world a better place. Unless something needs to be private because it's a state secret (security through obscurity), all software should be open source by default. That way all of humanity can benefit from it and we can reduce duplication of effort.
hannasanarion 1327 days ago [-]
What are the negatives? The value of HIBP is in its data, not in its algorithm. They lose nothing by open-sourcing.
detaro 1327 days ago [-]
It's an interesting question: Could some open group etc replicate it entirely, e.g. something like Let's Encrypt (which is also a free service funded by various companies)? The data sources and import are the key bit where trust is an issue, and weirdly enough a single individual might have an easier time than a well-funded foundation etc.
anaganisk 1327 days ago [-]
I'm not sure those corporate institutions will allow it to notify breaches quicker if HIBP is backed by the likes of Facebook, Google etc. It's better for HIBP to stay away from them.
nickthemagicman 1327 days ago [-]
Isn't the data what's important?

Isn't the code just essentially a text input box that takes a string, hashes it, and runs it against hashed passwords in a database?

mooreds 1327 days ago [-]
Yup. Gathering the data is a large part of the value he provides. I wrote an article about how to implement this, and data gathering tasks are crucial: https://fusionauth.io/learn/expert-advice/security/breached-...

He touches on the non-technical difficulties as well with his comment: "We invite parties to form their own views on the legality of the data." So the fact he's gathered it all lets HIBP be a service that other companies can use without worrying about the thorny legal question.

But there's also his reputation as a steward of the system, which is valuable beyond the data itself.

Anyway, while he didn't actually open source anything yet, I'm glad he's committing to it, as hopefully that will allow this internet resource to continue.

andruby 1327 days ago [-]
Yes, and Troy mentions this in the comments of the blogpost.

He's open-sourcing it so the community can help him manage the project, which I think is a fair request/hope.

42droids 1327 days ago [-]
Wow, this is awesome news. I always wondered how the internals could work. Can’t wait to take a peek...
FlorianRappl 1327 days ago [-]
You will be very disappointed ...
mikorym 1327 days ago [-]
Didn't Troy want to sell to someone just a few years ago?
teh_klev 1327 days ago [-]
As mentioned in the article in the first paragraph:

"and it took a failed M&A process to get here"

Second paragraph:

"especially in the wake of the M&A process[0] that ended earlier this year right back where I'd started"

[0]: https://www.troyhunt.com/project-svalbard-have-i-been-pwned-...

snapetom 1327 days ago [-]
Yes. Just last year. Sounds like this action is a result of that failure.

https://threatpost.com/troy-hunt-sell-have-i-been-pwnd/14556...

aspenmayer 1327 days ago [-]
I asked in the comments on the post which open source license he picked. Will update if he replies.
tokai 1327 days ago [-]
Hopefully it'll be a proper free one.
aspenmayer 1327 days ago [-]
Hopefully it’ll actually be open source, as in free as in money AND free as in beer. That is, I hope the license Troy Hunt picks for Have I Been Pwned is OSI-approved:

https://opensource.org/licenses

aspenmayer 1327 days ago [-]
Needless to say, I hope it’s also free as in speech. Saved the best for last.
makach 1327 days ago [-]
It's good just for the sake of transparency. Maybe he will find someone who is interested in helping maintain and contribute.
jcun4128 1327 days ago [-]
I'm curious what the code would be/how far of an extent it goes. I mean from my experience using it it's a search box, type in email, get results... so is this going to show scrapers or something where the data comes from?

Maybe has efficient hash comparisons or something...

matsemann 1327 days ago [-]
He linked to lots of blogposts and also talked about the k-Anonymity API design that was invented for HIBP.
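
Roughly, the client side of that k-Anonymity idea can be sketched like this: hash locally, share only a short prefix of the hash, and compare the returned suffixes yourself. The function names, the 5-character prefix length, and the "SUFFIX:COUNT" line format of the response are assumptions for illustration; no network call is made, and the response body is passed in as a plain string.

```python
import hashlib
from typing import Tuple


def split_hash(password: str, prefix_len: int = 5) -> Tuple[str, str]:
    """SHA-1 the password locally and split the hex digest into (prefix, suffix).

    Only the prefix would ever be sent to the service.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def count_in_range(suffix: str, range_body: str) -> int:
    """Scan a range response (assumed one 'SUFFIX:COUNT' per line) for our suffix.

    Returns the reported count, or 0 if the suffix is not listed.
    """
    for line in range_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0
```

With `prefix, suffix = split_hash("password")`, the prefix is `5BAA6`, and the service only learns that the password hashes to something in that range. The match against the full hash happens on the client via `count_in_range(suffix, body)`.
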
jcun4128 1327 days ago [-]
Interesting, I have not heard of k-Anonymity before, will check that out, thanks.
throwaway77384 1327 days ago [-]
What is this M&A process he keeps referring to?
MattGaiser 1327 days ago [-]
throwaway77384 1327 days ago [-]
Thanks!
ragebol 1327 days ago [-]
Mergers & Acquisitions
throwaway77384 1327 days ago [-]
Thanks!
dustinmoris 1327 days ago [-]
TL;DR:

I tried to sell HIBP for a nice premium to make a profit off data breaches, but because the sale fell through and I’m not going to make as much money from HIBP as I’d like, I lost interest in maintaining it and therefore now rebrand it as if it was always a community project, so other people can work on it now.

rsa25519 1327 days ago [-]
Honestly, maintainer burnout sucks. People volunteer their time, become pillars of modern tech. Then they look around, and everything's just boring. Nobody wants to pay them, and nobody really wants to take on the work for themselves, which is understandable :-/
nickjj 1327 days ago [-]
This is why I think directly tying money into open source code* isn't a good idea. It breeds burnout.

Instead of creating something with the purest intent of doing it for sheer joy or scratching your own itch, suddenly everything becomes about money and now every second you spend on the project gets evaluated as a way to maximize monetary gains and if things don't work out to your internal expectations you're constantly comparing yourself to others or thinking negatively about how much time you've put into something and how little compensation you're getting from it, which is a horrible feeling.

* The code aspect is very important. I'm all for figuring out ways to generate income around open source projects, but if the code is the core of everything and your core is only moving forward because you want money, you're setting yourself up for failure, burnout or worse.

GordonS 1327 days ago [-]
I think the burnout is kind of related to money, but more about expectations - and demands - of your users as it grows.

Once a free or OSS project becomes really popular, you suddenly start getting a lot more people asking support questions, asking for new features, calling your project crap because it doesn't do X, etc.

I think that's the point where burnout starts to be an issue, because you have to spend so much time to satisfy your users, and because you invariably have to put up with crap from very rude users. At this point, you might start to think about monetising your project - after all, you're spending a lot of time on it. And of course, it's extremely likely you will fail here.

rvnx 1327 days ago [-]
Ah, ah, the "choosing beggars" are sometimes golden. Fortunately most users are very friendly and it balances out.

One bigger issue is that usually open-source projects are fighting against well-funded for-profit projects who will not hesitate to do everything they can to destroy your project.

It's not a money issue in this case, in front of you you have very determined people who will do anything legal or illegal to protect their cash-cow.

For example, OpenOffice, if not shielded by Oracle, would have been killed a long time ago by Microsoft's pressure.

ChrisSD 1327 days ago [-]
> the purest intent of doing it for sheer joy or scratching your own itch...

...is great for as long as the joy or itch lasts. But the same problem still occurs, at some point a single maintainer is going to lose enthusiasm or have other itches they want to scratch. This becomes especially true when maintenance starts to feel more like a chore.

And in open source projects it can get even worse. GitHub users can be especially demanding and there are often those who express their demands in very "confrontational" ways. Faced with that it can be difficult to remain invested.

rikroots 1327 days ago [-]
The good thing about open source (and GitHub) is that I can walk away from it when I lose interest in the project. I did this in 2015, went back to scratch the itch in 2017 and only returned properly to it (to rewrite it from scratch) in early 2019. Of course I was in a good position to do this because the project (at the time) had few stars and nobody was raising issues.
rsa25519 1327 days ago [-]
I dunno. I think I'd start maintaining again several of my popular-but-not-maintained projects even if they paid only $1/month each. There's something about being rewarded, materially, for your work that's personally motivating.

I'd love to be able to say "people appreciated all the work I did the past year, so they paid me $12 for this awesome annual meal"

pronoiac 1327 days ago [-]
There are a couple of points of view on that.

* yay! That's definitely a sign of appreciation.

* going by hourly rates, how much of your time should $12 get?

MattGaiser 1327 days ago [-]
I don’t see that as unfair, even if true. Vast amounts of the tech world are supported by the ceaseless uncompensated toil of many developers.

Why not allow others to help share the burden?

Jonnax 1327 days ago [-]
It's the dichotomy of the tech world.

Do things for free, even build open source libraries that are used by large companies. But get paid nothing, even perhaps not be able to pay rent.

Then praise all the large companies earning millions leveraging open source software.

rmrfstar 1327 days ago [-]
From [1], which is a fascinating piece:

Serge quickly discovered, to his surprise, that Goldman had a one-way relationship with open source. They took huge amounts of free software off the Web, but they did not return it after he had modified it, even when his modifications were very slight and of general rather than financial use. “Once I took some open-source components, repackaged them to come up with a component that was not even used at Goldman Sachs,” he says. “It was basically a way to make two computers look like one, so if one went down the other could jump in and perform the task.” He described the pleasure of his innovation this way: “It created something out of chaos. When you create something out of chaos, essentially, you reduce the entropy in the world.” He went to his boss, a fellow named Adam Schlesinger, and asked if he could release it back into open source, as was his inclination. “He said it was now Goldman’s property,” recalls Serge. “He was quite tense. When I mentioned it, it was very close to bonus time. And he didn’t want any disturbances.”

Open source was an idea that depended on collaboration and sharing, and Serge had a long history of contributing to it. He didn’t fully understand how Goldman could think it was O.K. to benefit so greatly from the work of others and then behave so selfishly toward them. “You don’t create intellectual property,” he said. “You create a program that does something.” But from then on, on instructions from Schlesinger, he treated everything on Goldman Sachs’s servers, even if it had just been transferred there from open source, as Goldman Sachs’s property.

(At Serge’s trial Kevin Marino, his lawyer, flashed two pages of computer code: the original, with its open-source license on top, and a replica, with the open-source license stripped off and replaced by the Goldman Sachs license.)

[1] https://www.vanityfair.com/news/2013/09/michael-lewis-goldma...

MattGaiser 1327 days ago [-]
I can see Goldman being picky about that, as for them, the code is actually the secret sauce of high frequency trading or algorithmic trading.
garbagetime 1327 days ago [-]
Indeed. In very deed.

I'd say it's a subset of a much more general situation. I don't know how best to analyze that situation. But some things seem clear: What we live in is far from a meritocracy. The current system does not incentivise moral behaviour. The current system does not incentivise productive behaviour. A lot of people will try their best to act morally and be productive anyway.

lowkeyokay 1327 days ago [-]
Seems fair enough
1vuio0pswjnm7 1327 days ago [-]
A friend of us all...
starfallg 1327 days ago [-]
Basically this, more or less. HIBP was rendered obsolete ever since Chrome had that feature built-in.

Edit: Specified Chrome instead of password managers in general. Chrome doesn't use HIBP as its source.

Xylakant 1327 days ago [-]
I wonder how HIBP has been rendered obsolete if many of the password managers rely on its API to do the actual check.

The value of HIBP is not the actual code, I’d assume that’s fairly boring. The value is the database of leaks and the credibility to be contacted when new data dumps show up. None of this can easily be replicated.

tootahe45 1327 days ago [-]
The leaks are trivial to compile, and the tech is very basic; anyone with enough scripting skills and alcohol can do it in under a month. The real value of HIBP is the owner's credibility in the security industry when it comes to offering a service that 'sounds like' it requires some trust (it really doesn't, because you can use the service with data anonymization features).

The problem is he has been a complete failure on the business side. If he marketed it as 'you can fire 50% of your customer support if you stop account cracking using our service', charged per request, and provided a bunch of code integrations, there's no reason this couldn't be used by thousands of businesses, including top companies which have serious account cracking problems.

Spooky23 1327 days ago [-]
The business model wasn't selling the service, but as with many security personalities, buying attention. People in a fairly wide circle know who Troy Hunt is. He can probably get a good honorarium, flight and free dinner for speaking to the Kentucky Association of Banking Compliance Officers or whatever.

It's probably reaching a point where the return on announcing breaches is declining, and the potential value of selling data of questionable origin to a legit entity is very challenging. The dataset gets less valuable every day because anyone can start collecting breaches today, and the value of old breaches goes down -- who cares about an Adobe account leak from 2012?

starfallg 1327 days ago [-]
Chrome's password manager doesn't use HIBP. Neither does Apple's implementation.

The barrier to collecting leaked dumps and compiling a database from them is not that high. Many of the security outfits and large tech vendors are doing it already.

Xylakant 1327 days ago [-]
Mozilla/Firefox uses HIBP (1), and so does gopass. So you're saying that because some browsers use their vendors' own databases, it suddenly becomes obsolete?

> Many of the security outfits and large tech vendors are doing it already.

Certainly, not doing so would be negligent. But that, too, doesn't make HIBP obsolete - there's value in having such a database that's openly queryable via an API and under independent and trusted stewardship.

HIBP also offers features that go beyond what a browser/password manager can do: It offers monitoring for entire domains that you manage. I have all our domains that we're using for email registered at HIBP.

(1) https://blog.mozilla.org/futurereleases/2018/06/25/testing-f...

detaro 1327 days ago [-]
... the password managers that query HIBP to check passwords?
mschuster91 1327 days ago [-]
> HIBP was rendered obsolete ever since Chrome had that feature built-in.

A feature that many developers have to disable because you can't make Chrome ignore certain entries for localhost. For local development I have stuff that spins up a server of, let's say a CMS, and it uses the usual default credential "admin/admin". Yes Chrome, I know that this is an insecure password that has been breached, but this is a freaking development system, leave me alone...

The only way to avoid these messages is to disable the feature globally and that option is hidden deep in the extended settings.

marksomnian 1327 days ago [-]
Where do you think the password managers get that data from?
nsarafa 1327 days ago [-]
I use Spark for Mac to make custom shortcuts. It's a must have for anybody who's aiming for optimal efficiency when in front of their MacBooks ~ https://www.macupdate.com/app/mac/14352/spark
samirillian 1327 days ago [-]
> I'm sure I speak for [Junade] as well when I say we couldn't be happier that other companies have taken the model we pioneered and applied it to their own services too because at the end of the day, that's in everyone's best interests.

Socialism for "other companies," capitalism for us. It's details like this that prove to me how purely ideological it is to claim that we need capitalism to "produce value." _We_ produce value already. Capitalism uses that value for free or a ridiculously low rate, then turns around and charges us for it.

bawana 1327 days ago [-]
Open sourcing code is one thing. Open sourcing the email addresses of vulnerable victims is something else. It’s like publishing the largest vulnerability of all time before it can be patched
BenjiWiebe 1327 days ago [-]
Am I missing something? I understand from the article that he is open sourcing the codebase, not the data...?
boomboomsubban 1327 days ago [-]
All the email addresses are pulled from public leaks, often from years-old leaks. Nothing like your analogy; if you're still vulnerable you should consider yourself lucky you haven't already been hacked.
dastx 1327 days ago [-]
All of which is readily available on Tor anyway. It's just now centralised.