Maybe this is just a joke about both having off-putting names that's going over my head?
I (spencer) chose the name CockroachDB. I guess we have questionable tastes in OSS project naming.
Also, does this mean that Pete's on deck for naming the next OSS project you work on?
What's the use of checking node name and ip with the certificates? I already have full control of all the nodes so a key/secret match inside a secure connection would work just as well, since that's how SQL clients connect anyway. It seems like this issue is why the kubernetes deployment also uses insecure mode?
It's true that setting up a secure cluster is kind of annoying right now. But the kubernetes templates (https://github.com/cockroachdb/cockroach/tree/master/cloud/k...) do support secure mode now, and the plan is to provide more like this so it's not something that everyone has to solve by hand.
If you know or can predict the addresses or hostnames you'll be using, then it's possible to generate one cert and reuse it for multiple nodes. This isn't ideal from a security perspective since you lose the ability to revoke individual certs, but since we don't (yet) support CRLs/OCSP it's not much of a loss.
Adding an option to skip hostname checks for node certs might make this less of a pain (it would then be trivial to share one cert for all nodes if that's what you want to do). We'll consider that and see if it compromises any important security properties.
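To make the trade-off above concrete, here is a minimal sketch of what a hostname check on a shared node cert amounts to. It is a simplified illustration, not CockroachDB's actual verification code. The function names, the `skip_check` flag, and the example addresses are all hypothetical.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Match one name listed in a certificate against a hostname (simplified)."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # A wildcard covers exactly one non-empty left-most label.
        label, _, rest = hostname.partition(".")
        return label != "" and rest == pattern[2:]
    return pattern == hostname


def node_cert_accepts(cert_names: list[str], address: str,
                      skip_check: bool = False) -> bool:
    """Return True if a cert listing `cert_names` is acceptable for `address`.

    `skip_check` models the hypothetical "skip hostname checks" option:
    any node holding the shared cert is accepted regardless of address.
    """
    if skip_check:
        return True
    return any(hostname_matches(name, address) for name in cert_names)


# One cert generated up front for every address you expect to use (assumed names).
shared_cert_names = ["node1.example.internal", "node2.example.internal", "10.0.0.3"]

print(node_cert_accepts(shared_cert_names, "node2.example.internal"))
print(node_cert_accepts(shared_cert_names, "node9.example.internal"))
print(node_cert_accepts(shared_cert_names, "node9.example.internal", skip_check=True))
```

With the check enabled, a node at an address you did not predict is rejected until the cert is regenerated; with it skipped, possession of the cert is the only thing being verified.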
Hostname checks make sense for websites, since user agents can't trust anything, but I don't see the advantage for managing a db cluster. When would you need to revoke an individual cert, and why wouldn't that be better handled by just shutting down the VM or container instead?
I'd prefer nodes using self-signed certs to securely connect, then user/pass or other secret to authenticate to cluster - but yes, if you remove the hostname check then the shared certs can do double duty as encryption/auth to the cluster, although this now brings up maintenance issues with rolling certs. Either way passing secrets (password or cert file) is easy when it's the same across the cluster.
It seems like CRDB could easily run in a simple replicaset with no maintenance, but that requires either running insecure or going through a rather convoluted manual process. Something in the middle would be much better.
You revoke a cert when it has somehow been compromised, meaning something other than the VM/container that's supposed to have it gets a copy of it.
Either way, I'd much prefer to make that decision as the admin rather than be forced into either extreme. Removing hostnames and having an easy way to roll certificates would go far towards operational simplicity and security.
Projects like cockroachdb are one of the key use cases we keep in mind when we look at how Kubernetes can evolve to make apps more secure by default.
Are these still the case:
Some of the features from those lists we have implemented include ALTER COLUMN SET DEFAULT, pg_table_is_visible(), UUID, extract(), and (some) schema changes in transactions. 1.2 will add (at least) INET types and sequences.
Supporting an ORM would boost usage, in my opinion. We're using Django with PostgreSQL for a social blogging platform that needs to hold posts, comments, likes/votes, recommendations, analytics, metrics, billing, payment transactions, and many other pieces of data, large and small. We're fine with PostgreSQL at the moment, and we're thinking of moving posts into a Cassandra database later in order to scale. We could use Django's multi-database support (routers) to keep that data in another database and still have all the beauty of the Django ORM.
If CockroachDB could start supporting Django ORM properly we'd start using it alongside our PostgreSQL database.
Can't wait to see CockroachDB fully supported by the Django ORM.
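For readers unfamiliar with the multi-database routing mentioned above, a Django router is a plain class exposing four methods. The following is a minimal sketch only: the `posts` app label and the `posts_store` alias are hypothetical, and it assumes a Django database backend actually exists for whatever store sits behind that alias.

```python
class PostsRouter:
    """Send models from one app to a separate database alias."""

    route_app_labels = {"posts"}  # hypothetical app label
    target_db = "posts_store"     # hypothetical alias defined in settings.DATABASES

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.target_db
        return None  # no opinion; Django falls back to the default database

    def db_for_write(self, model, **hints):
        if model._meta.app_label in self.route_app_labels:
            return self.target_db
        return None

    def allow_relation(self, obj1, obj2, **hints):
        # Disallow relations that would span the two databases.
        in_1 = obj1._meta.app_label in self.route_app_labels
        in_2 = obj2._meta.app_label in self.route_app_labels
        if in_1 or in_2:
            return in_1 and in_2
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Routed apps migrate only on the target database.
        if app_label in self.route_app_labels:
            return db == self.target_db
        return None
```

The router would then be listed in the `DATABASE_ROUTERS` setting. Note that cross-database relations are blocked here, so posts could not have real foreign keys to tables that stay in PostgreSQL.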
2) Is there a published benchmark on performance on join-heavy workloads?
If by multi tenancy you mean sharding by a particular column (or set of columns) to groups of nodes, interleaved tables are probably not what you’re looking for. Table partitioning is currently on the roadmap and you can find the RFC here https://github.com/cockroachdb/cockroach/blob/master/docs/RF....
I'm looking forward to the partitioning work being done right now since one could in theory have a top-level (or root) table (with some tenantID) and tenant-specific tables interleaved, then easily partition on the tenantID to have tenancy isolation on all tenant-specific data.
Does this mean CockroachDB is not the database that does 50M inserts daily? Is it instead used to automate the deployment of those apps, which actually run on a different database that's not CockroachDB?
In that case what are the reasons behind using CockroachDB for deployment but not for the apps themselves?
By the way, what kind of performance can I get from CockroachDB? I know it's going to be slower than PostgreSQL or MySQL running with replication. But how much slower?
Baidu is using it for a new application where previously they would have used sharded MySQL (https://www.cockroachlabs.com/customers/baidu/). Their dev team is fairly active on GitHub and Gitter if you want to ping them for more details.
Heroic Labs is using it for real-time, multi-player gaming development that can support a global userbase (https://www.cockroachlabs.com/customers/heroic-labs/). I'm not sure how gitter/github/HN active they are, but they work in open source, so they're probably pretty easy to connect with.
More in the FAQ: https://www.cockroachlabs.com/docs/stable/frequently-asked-q...
We want something with high uptime and resiliency that's going to be relatively easy for us to learn and deploy. We've used PostgreSQL and MySQL/MariaDB more in the past, but those were for smaller applications that didn't have significant uptime requirements.
We could figure things out with PostgreSQL and decide on a fault-tolerant setup, deal with sharding, etc. But it seems like CockroachDB will be an easier path forward, and also make scaling much simpler.
Cockroach was rather pleasant to work with, even in single-node setups, apart from not being compatible with ORMs.
Would CockroachDB be a good contender for the strong consistency case (and who would be good competitors and why)? Also, I've so far considered Cassandra and HBase for the eventual-consistency option; any recommendations there?
How easy is deployment/scaling/sharding with cockroach?
Aside from cost, what is the best use case for cockroach over anything else?
If you want a data warehouse to run SQL analytics over lots of data, stick with RedShift, BigQuery, Snowflake, MemSQL, or MapD, which use column stores, vectorized operations, compiled queries, and other features for fast performance. There are also options like Apache Drill, Dremio, SnappyData, and MapR DB if you want SQL over unstructured/random files stored somewhere.
For an operational database to hold your core active data and to do small/mid-data analysis, CRDB is an option along with PostgreSQL, MySQL, MariaDB, SQL Server, and others. Its advantage is a cloud-native, distributed architecture, so you get high availability, multi-master, and easy scaling out of the box, even across multiple data centers.
All the other traditional RDBMSs either don't support this or require 3rd-party software to come close, although some will never match the same level of distributed functionality. In exchange, CRDB doesn't offer the same performance, data types, and rich feature set as a single-node PostgreSQL, for example. If you can tolerate downtime, need complex SQL statements, or already have solid tooling with the existing systems, then stick with them.
If you're running a globally distributed app, or a new app that can work with simpler SQL statements, or need availability and scaling with less work, then CRDB is a good fit. If you're looking for something in the middle, then CitusDB (a PostgreSQL extension) allows for automatic sharding across multiple instances. You keep most of the PostgreSQL functionality and get single-region horizontal scaling, but with more work and complexity and without the distributed features of CRDB.
If you're referring to "Linearizability violations such as stale reads can occur when the clock offset exceeds CockroachDB’s threshold." from section 2.1 of Aphyr's analysis, this too is touched upon in the post above:
> loose clock synchronization is necessary [...] to guarantee serializability. CockroachDB servers monitor the clock offset between them [...] although this monitoring is imperfect and may not be able to react quickly enough to large clock jumps.
> All CockroachDB deployments should use NTP to synchronize their system clocks [...] the default of 500ms (increased since Kyle’s testing, when it was 250ms) to be reasonable in most environments, including virtualized cloud platforms.
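As a rough illustration of the threshold idea in the quotes above, the sketch below flags peers whose measured clock offset exceeds a maximum. It is a simplification under stated assumptions, not CockroachDB's actual monitoring logic. The function name, node names, and offset values are made up, and only the 500ms and 250ms figures come from the quoted text.

```python
MAX_OFFSET_MS = 500.0  # default quoted above (250ms at the time of Kyle's testing)


def nodes_exceeding_offset(offsets_ms: dict[str, float],
                           max_offset_ms: float = MAX_OFFSET_MS) -> list[str]:
    """Return the peers whose clock offset from this node exceeds the threshold.

    `offsets_ms` maps a peer name to its measured offset in milliseconds;
    the sign only indicates whether the peer's clock is ahead or behind.
    """
    return sorted(name for name, offset in offsets_ms.items()
                  if abs(offset) > max_offset_ms)


# Hypothetical measurements taken from one node's point of view.
measured = {"n2": 12.0, "n3": -730.0, "n4": 499.9}
print(nodes_exceeding_offset(measured))                     # ['n3']
print(nodes_exceeding_offset(measured, max_offset_ms=250))  # ['n3', 'n4']
```

A check like this only sees the offsets it has already measured, which is why, as the quote notes, monitoring may not react quickly enough to a large, sudden clock jump.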
I'm a lead developer from TiDB, and thank you for voicing your experience regarding reporting bugs in our community. I can assure you that our team members would never offend you, or any other developer contributing to TiDB, in such a way. We are grateful for your participation and support.
Could you send us a link to the issue you are referring to, so we can look into it more and make sure it's resolved? If you have any other issue, feel free to email me directly as well: firstname.lastname@example.org
Thank you for your feedback. TiDB's progress would not be possible without people like you.
They misunderstood your bug report, but they certainly didn't tell you to "go f* yourself". In fact, someone later relabeled the ticket as a compatibility issue with MySQL after you clarified the problem.