Forgive the sarcasm, but I'm really put off by the aggressive weak-to-strong generalizations that are going on here. I'm also very excited about AI, but I don't understand how lines like,
>> But at least within many current domains, more compute seems to lead predictably to better performance, and is often complementary to algorithmic advances.
can be extrapolated to anything more than a fun conversation to have over drinks, or the plot for a bad sci-fi movie about AI (which, to be fair, are also quite prevalent in the current zeitgeist). We're definitely at a new tier of "kinds of problems computers can solve", but surely experience and history in this space should tell us that we need to expect massive, seemingly insurmountable plateaus before we see the next tier of growth, and that that next tier will be much more a matter of paradigm shift than of growth on a line.
The systems on this graph all do different things in different ways. It's one thing to abstract over compute power via something like Moore's Law, or societal complexity via the Kardashev scale. But I think we need a much more nuanced set of metrics to provide any kind of insight into the various AI techniques, or an entirely different way of looking at 'intelligence'.
What I read from this article is that we're currently in a trend where the more compute you throw at machine learning, the better solutions you end up with. Nothing about general AI, it's just that you can train deeper, more complex neural networks that can handle broader and more complex versions of the problems they're designed to handle.
This can have implications down the line which still have nothing to do with AI becoming smarter (in the general-intelligence kind of way) than it is today. If all input/output problems out there have very deep neural networks behind them, and those neural networks are constantly training simultaneously, and that has a positive economic output, we'll see tremendous amounts of FLOPS that make cryptocurrency mining pale by comparison.
Just as an example, depending on how cloud solutions keep up, maybe startups on the cloud won't be so competitive anymore. It's an interesting trend to point out and keep track of. And yes, OpenAI's involvement will probably be to learn about how this can lead to unsafe use of AI, but again that's not what I saw the topic of this article being.
There are a limited number of paradigm shifts before AIs get "general" in their name. So the "it already happened before" argument has a natural end. What makes you think it is still applicable?
Some organizations have access to computational power approaching that of a human brain. We see commercial applications of systems which aren't programmed by hand like expert systems were, and which don't fail miserably like the speech recognition engines of yore. How often do you hear about the curse of dimensionality today?
Some things have changed from then to now; the question is whether they have changed enough.
From a physics standpoint it’s not equivalent power actually, it’s equivalent energy.
Power is work per unit time. Work is energy expended which causes displacement.
These “brain-equivalent” computers can’t do nearly the work of a human brain in the same amount of time. They use about the same amount of raw computational energy, but they don’t make nearly the same amount of structured waves in the information spaces around them. Their output can only even be seen in the absolute quietest of environments. They often run for long periods of time with no obviously informative output.
Human minds tuned for it are essentially incapable of not producing a constant stream of novel and disruptive insights (work), which is how you get large computational power from a roughly constant computational energy.
In the sixties, chess was thought to be a problem that required intelligence since it couldn't be brute-forced. We now have machines which can play it well, without brute-forcing, and yet it's seen as entirely procedural.
As for style transfer, that is a very specific skill of making the patterns of one style map to the patterns of another. I am not particularly well versed with art, but that process seems well defined to me.
Perhaps your issue is with my more generalized definition of "clear" and "well-defined". I meant to use these terms to distinguish between autonomous driving and being a successful human. I really don't think there is anywhere close to a consensus on the latter. To the extent that there is, then yes, AI should be able to do it.
Of course we don't have human-level AI right now, but if that's the only thing you're claiming it's pretty vacuous.
Note that we have great raster-based deep visual effects, but vector is... not there yet (not saying it won't be) - vector is less structured than raster, so the choice of algorithm is less obvious.
As for well-defined criteria, I don't think that's really quite the right standard; I think the correct standard is that there is a way of metrizing success on a totally ordered set (like the [0,1] interval), even if it's noisy.
The hull size argument is apt. I think it's a lot more obvious that throwing more compute power at a neural network isn't going to eventually make a general intelligence.
At the very least there will need to be significant breakthroughs in architecture design that will be at least as paradigm shifty as deep learning.
This is not true. The biggest battleship built was the Yamato:
> 256 m (839 ft 11 in) (waterline)
> 263 m (862 ft 10 in) (overall)
The biggest CV (aircraft carrier) right now should be the Nimitz class, at about 333 m (1,092 ft) overall, which seems to support your statement. But battleships are not versatile, and battleship development pretty much stopped after WWII. The Germans had proposals for the H-class battleships, with the H-44 design reaching an overall length of 345 m.
It is possible, it's just not strategically advantageous.
Just like throwing more compute at a neural network isn't going to make general AI. The diminishing returns come from how brittle their learning and representations are.
The issue with physical structures is that eventually the mass and stresses in the macro structure overcome the strength in the micro-structure: mass grows with the cube of linear size while structural strength grows only with the square, so stress rises with scale. That is why nature stops at, e.g., elephants.
It isn't obvious that intelligence suffers from such limits, as the only time the limit of intelligence has been tested (in terms of evolution) was when humans tried it. There is no evidence humans are pushing the limits of what intelligence can achieve. Quite the reverse, honestly, when you look at the performance of computers so far.
It isn't just the computer power that allows us to be intelligent. It's also the bandwidth (which is always an order of magnitude or more behind the computer power) and the algorithm (which we don't have a clue how to create).
State of the art visual processing that gets so touted in the press is brittle -- it has to see very similar examples or it will fail. Neural networks don't transfer to new problem domains well at all.
Neural networks have no sense of self or agency and they never will. There are key parts missing (like the ability to experiment with the environment). I'm not saying we will never have general intelligence, just that it's quite a ways away and the algorithms will be significantly different from the neural networks we use today. That said, there will probably be many recognizable components, like backpropagation, recurrent nodes, Bayesian estimations, etc.
The Japanese actually invested heavily in the biggest battleship in the world. It was soundly defeated at sea by an inferior force with an aircraft carrier at Leyte Gulf during World War II.
I would argue that if you want to identify the mechanism of military dominance that the US has used to assert itself in the world you should look to Trident and the Ohio class.
Take startups. Right now, many startups can compete on the same basis to hire talent as huge companies. But if companies with huge capital reserves can put their cash directly to work to train AI models, startups will be hard-pressed to compete with "smarter" products. Specialization will not even be much help.
Looking at Beating the Averages (http://www.paulgraham.com/avg.html), PG enthused that, since established companies are so behind the curve on software development technology, there is always a chance for higher-productivity techniques like more productive languages to give smaller teams a real shot at a huge market. Of course, that was in the era when Google was not creating new programming languages and there was no Facebook to widely deploy OCaml and Haskell. And now, AI looks to make the averages even harder to beat.
Even today, if you round up the smartest members of a CS grad class, it is going to be quite difficult to directly compete with a machine learning model with access to huge amounts of data and computing resources. Looking further forwards, if machine learning is able to provide "good enough" alternatives to most human-created software, the software startup narrative — that a few talented and determined people can beat billions in resources — may not even be so relevant anymore.
This is quite a bold claim, and one I'm not sure they're making. Their promo material suggests that it's limited to quite well-defined domains where conversations aren't really that open-ended, and we haven't seen how it'll perform in the real world.
Relatedly, I don't think headlines like "Google Duplex beat the Turing test: Are we doomed?"  are helpful at all. It's disappointingly low-effort clickbait where instead there's plenty of interesting discussion to be had (should machines have to identify themselves as such? What about their use of pauses and fillers?).
Is Google really saying that or just the more breathless commenters? I thought they were pretty good at making it clear that Duplex took a lot of work to do well in very constrained conversational situations.
Some prominent figures in AI/ML are saying we are due for another "AI winter" since it's being oversold again.
"Some say...". Name one.
We may have a Gartner style "trough of disillusionment", but a 1990's style AI Winter is unlikely. It works too well in too many valuable areas for the money to go away.
Technically, Google is kind of saying they can tentatively pass the Turing Test with phones.
Could you show us where they claim that? That goes well beyond any statement I've heard Google make, and into the kinds of breathless claims click-bait blogs have tried to make.
A car decked out with extra sensors and 360 LIDAR cannot detect a simple stop sign with mud on it.
Do you have a specific example of that? I did Google it, and I couldn't find anything.
Most examples I've seen handle occluded road signs pretty well. There are of course adversarial examples, which are an interesting case, but mud causing a failure like this is surprising to me.
There were plenty of interesting results in AI research before the last two AI winters.
Are there any examples where current ML has replaced human-created software, or reduced the demand for startups or software engineers?
Seems to me that ML so far has expanded our toolbox of what can be done with software, not replaced programmers, designers, engineers, or really much of anybody yet. All this worry about future automation is imagining that things are going to be different this time, because of recent success with ML in limited domains.
More relevantly, I would be surprised if the shift to AI techniques in fraud detection at places like PayPal is not already having an impact on the career paths of the engineers who were tasked with maintaining and tuning their pre-ML fraud system. At one point the top engineers of the original heuristic system could have been considered the most valuable non-management employees at the company. I'm sure they're not out on the streets or anything, but I also assume the next person to take their job will not be nearly as valued.
Also, ML will impact programmer demand in subtle ways. A lot of programming is refactoring, and there is reason to believe we can refactor code, especially in certain languages, automatically to make it more aesthetic. Realistically, that seems likely to decrease demand for programmer hours. Or an ML system that can run over someone's GitHub account or repo may be the new resume screen, and if one scores badly on it that may limit the demand for them personally.
Finally, I have to think that the overall march of software towards more complex integrated systems is already a major cause of the dearth of entry-level programming positions, and ML will accelerate that trend.
Doing well reduces the incentive to explore other ideas, especially when those ideas conflict with your proven business model.
This requires hardware to be miniaturized as non-ML compute has been, and when that does happen we'll have the learnings from the current edge computing push. In the meantime I'm excited to see what developments are made on both the hardware and software sides.
We've had 'compute shaders' e.g., https://msdn.microsoft.com/en-us/library/windows/desktop/ff4...
The purpose from this perspective has been to differentiate general purpose computation on GPUs from fixed-function pipelines and/or graphics-specific functionality. The history of using GPUs for general purpose computation involved a lot of hacking to abuse hardware designed for rasterization to do other kinds of calculations.
One keyword / search term you can use is "GPGPU" (which stands for general purpose GPU). Here's another article which might shed more light on the history: https://en.wikipedia.org/wiki/General-purpose_computing_on_g...
* Also found this possibly relevant note: "When it was first introduced by Nvidia, the name CUDA was an acronym for Compute Unified Device Architecture" (https://en.wikipedia.org/wiki/CUDA)
So it seems like a big part of how it's being used is to refer to a generalized computation service—some 'function' you're given access to which takes arbitrary programs as a parameter.
Seems like there's often the implication that how the computation is performed is abstracted over and that more or fewer resources could be applied to it—though that's not necessarily there (absent in the case of compute shaders for instance).
So is the OP:
"We’re releasing an analysis showing that since 2012, the amount of compute [amount of calculation, amount of computation] used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time "
Without loss of meaning, the title could be "AI and Calculation" or "AI and Computation".
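As an aside, that quoted doubling time compounds quickly. A back-of-the-envelope sketch in Python, using only the 3.5-month figure from the quote above:

    # A 3.5-month doubling time implies roughly an order of magnitude more compute per year.
    doubling_time_months = 3.5
    per_year = 2 ** (12 / doubling_time_months)
    print(f"growth per year: {per_year:.1f}x")            # ~10.8x
    print(f"growth over 5 years: {per_year ** 5:,.0f}x")  # ~145,000x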
I'd guess that having ten extra machines in the same rack would be more valuable than a thousand remote machines with limited network bandwidth.
You've just described a centralized system.
Centralization can happen at different layers - not all technical. The ultimate centralization is ownership, as defined legally.
Raised $2.7B in its ICO, currently trading at a market cap of $10B.
Raised $257M in its ICO.
Raised $232M in its ICO.
Those are the 3 largest ICOs of all time, so yes, there is definitely a market for renting part of Skynet.
The actual technology may or may not be vaporware or a scam. IMHO the way you build a decentralized P2P system is to give a single really smart programmer enough to live on for a couple years and see what he comes up with, not throw a billion dollars at a Cayman Islands corporation that may or may not use it for anything productive. Sorta like what Ethereum did.
I think Golem is closer: https://golem.network/ And some others: https://www.investinblockchain.com/distributed-computing-blo...
But I'm skeptical of distributed computing blockchains. I think a) it's unlikely a distributed compute network can compete with highly optimized datacenters running TPUs or whatever, b) people are unlikely to trust distributed compute networks with their proprietary data (maybe acceptable for CGI rendering and some other specific use-cases)
We have learned a lot using big compute, and that can still inform better efficiency of AI on smaller computing units. The Raspberry Pi is pretty good for this because it is quite limiting, but also quite capable.
I thought the point of parallelism is you can throw more chips at a problem and see improved performance. Single chips are limited by physics, but true parallelism scales linearly ad infinitum.
Can anyone with more knowledge than me speak to known limits of parallelism? I’d guess it’s not truly infinitely scalable.
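Not an expert, but one standard way to frame the limit is Amdahl's law: if any fraction of the work is inherently serial, the speedup from adding processors saturates no matter how many chips you throw at it. A minimal sketch in Python (the 95% parallel fraction is just an illustrative number):

    def amdahl_speedup(parallel_fraction, n_processors):
        # Amdahl's law: overall speedup when only part of the work parallelizes.
        serial = 1.0 - parallel_fraction
        return 1.0 / (serial + parallel_fraction / n_processors)

    # Even with 95% of the work perfectly parallel, speedup is capped at 20x.
    for n in (10, 100, 1000, 10**6):
        print(n, round(amdahl_speedup(0.95, n), 2))
    # 10 -> 6.9, 100 -> 16.81, 1000 -> 19.63, 1000000 -> ~20.0

In practice, communication bandwidth, synchronization, and heat (as the sibling comment notes) tend to push the effective serial fraction up as you scale out, so even "embarrassingly parallel" workloads stop scaling linearly at some point.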
This reminds me of a thought experiment I heard from -- if memory serves -- Scott Aaronson. The gist is that the fastest super-computer will be on the edge of a black hole. If you run any faster, there will be too much energy concentrated in a given area, thus creating a black hole. Similarly, when you run so many parallel devices (on GPU, CPU, etc) together, you will want to put the devices as close to each other as possible (speed of light limits the rate of communication). You then pump too much heat into a small area, and getting so much heat out is, among other things, a physics problem.
Also, if you don't squeeze as much as you can into a small space, you can scale sublinearly ad infinitum (in practical terms, which don't include heat death of the universe).
Are algorithmic innovations and improvements in data so difficult to track? Could they be measured by the cost of certain outputs? Or is it that the information about algorithms and data is not easily accessible?
Anyone working on chip architecture care to give their opinion on the next 10-20 years in chip design? It would really interest me to know if chip designers think Moore's law will continue, since that is probably going to be a big factor in the timeline for AGI.
1. Moore's Law is undoubtedly slowing, but in the foreseeable future it will likely continue. On the other hand, Dennard scaling, which is already basically dead, will be the crunch you will likely feel more. Exponential transistors aren't too useful if they still consume so much power. To mitigate leakage we moved to FinFETs... which actually made dynamic power worse.
2. You might be interested to know that data movement (predominantly memory access) costs orders of magnitude more than computation, especially relevant to AI compute which requires large amounts of access. These global wires already suck and don't seem to be getting any better in the foreseeable future.
3. Foundries have already been using (and thus expending) "scaling boosters" to reach their density goals. Most of these are one-time use effects that won't provide significant continuous scaling capability.
However, currently it does not make sense to build a specialized analog chip to run specific type of ML algorithms, because algorithms are still being actively developed. I don't see GPUs being replaced by ASICs any time soon. And before you point to something like Google's TPU, the line between such ASICs and latest GPUs such as V100 is blurred.
You may have confused me with the Isocline/Mythic guys or a red herring comment. Our approach to deep learning chips is very public and amongst the craziest...A̶n̶d̶ ̶e̶v̶e̶n̶ ̶I̶ ̶w̶o̶u̶l̶d̶n̶'̶t̶ ̶t̶o̶u̶c̶h̶ ̶a̶n̶a̶l̶o̶g̶ ̶c̶o̶m̶p̶u̶t̶a̶t̶i̶o̶n̶
To clarify: I'm always open to opposing evidence, but based on the data at the moment, I believe that analog computing buys you very little.
People seem to assume that analog intrinsically consumes less power, which due to bias and leakage currents isn't true in the general case.
It used to be that only “coding” could elicit this reaction - nevertheless I’m quite fascinated by this new development.
These words are all nouned verbs:
Chair, cup, divorce, drink, dress, fool, host, intern, lure, mail, medal, merge, model, mutter, pepper, salt, ship, sleep, strike, style, train, voice.
(according to this, anyway: https://www.grammarly.com/blog/the-basics-of-verbing-nouns/)
Shakespeare verbed nouns.
"Compute" as a noun is at least 20 years old, according to my memory, and there are several high profile products named this way that are more than 10 years old.
Besides, I really don't think all the stigma comes from the term "artificial intelligence". You don't have to ever mention the term to a child interacting with Alexa; they will nevertheless greatly overestimate "her" ability. I think that's because of the anthropomorphic nature of their interactions, and the black-box implementation that prevents you from knowing the boundaries of what is possible.
This is something that video game characters have played on since their conception, to make humans imagine much more complex intents and thoughts behind their "stupid" hard-coded behaviors. I'm okay with calling it AI even if it's not even close to on par with human intelligence. :)
The facts about hardware are hard numbers and difficult to argue with, at least in order-of-magnitude. I agree the implications for AI progress are very open to interpretation (and we acknowledge this in the post), but caution means we should think carefully about the case where the implications are big.
I'd say it all depends on the size of datasets - some domains (e.g. unlabeled image data) have "effectively infinite" datasets where the amount of data you can use is limited only by your computing power, but in many other use cases all the data you'll ever get can be processed by a single beefy workstation.
More available compute means that we tackle more difficult problems. However, for any single given task it's often not the case that the amount of compute grows. If anything, the graph is not showing the compute required for DL, but the compute available for DL - it gets used simply because it's there.
AlphaZero could not have been created without going through many, many iterations of AlphaGo, each of which cost several GPU-years, and calling AlphaZero cheap is serious moving of goalposts: it required thousands of TPUs for days, and Facebook's recent replication for chess also used thousands of GPUs for 3 weeks. (Zero is cheap only in comparison to the previous AlphaGos, which used weeks or months of hundreds/thousands of GPUs/TPUs.) Note that that is a log graph; flip it over to a linear scale and you get an idea of how extraordinarily expensive Zero is compared to everything not named 'AlphaGo'.
If anything, this observation implies that AI risk is more dangerous than thought because it implies a 'hardware overhang': it will take vast computational resources to create the first slow inefficient AI but it will then rapidly be optimized (either by itself or human researchers) and able to run far faster/more copies/on more devices/for less money, experiencing a sudden burst in capabilities. Like model compression/distillation where you can take the slow big model you normally train and then turn it into something which is 10x faster or 100x smaller or just plain performs better (see 'born again networks' or ensembling).
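For anyone who hasn't seen distillation: the usual trick is to train a small "student" network to match the softened output distribution of the big "teacher". A minimal sketch in Python/PyTorch, with hypothetical teacher/student models and an arbitrary temperature (not the exact recipe from the born-again-networks paper):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=4.0):
        # KL divergence between the teacher's softened predictions and the student's.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # Scaled by T^2 so gradient magnitudes stay comparable across temperatures.
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

    # Usage sketch: the big teacher is frozen, the small student learns to imitate it.
    # with torch.no_grad():
    #     teacher_logits = teacher_model(x)
    # loss = distillation_loss(student_model(x), teacher_logits)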
> simple hardware progress means that random gaming GPUs can handle datasets that were inconvenient a few years ago.
...which means using a lot more compute, yes.
On the broader point though, I agree with this. We say that compute and algorithms are complementary in the post. Much of the time, when you come up with an algorithm that allows you to do something that used to cost X compute in 0.2X compute instead, you can use the new algorithm to do something significantly more impressive with the full X compute.
For example, if a task is parameterized (by size or difficulty, say), then a better algorithm might change the asymptotic complexity from O(n^3) to O(n^2). A 2x compute increase for the old algorithm would take us from n -> 1.26n, but the new algorithm would go from n -> 1.41n.
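To spell out that arithmetic (a quick sketch; the only assumption is that the feasible problem size scales as compute^(1/k) for an O(n^k) algorithm):

    # With compute budget C, an O(n^k) algorithm can afford n proportional to C**(1/k),
    # so doubling compute multiplies the feasible problem size by 2**(1/k).
    for k in (3, 2):
        print(f"O(n^{k}): 2x compute -> {2 ** (1 / k):.2f}x larger n")
    # O(n^3): 2x compute -> 1.26x larger n
    # O(n^2): 2x compute -> 1.41x larger n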
To be taken with a grain of salt.
Innovations in algorithms will give us better prediction with less compute power.