The Hartmann Neuron took a similar approach with neural networks in 2003: https://www.soundonsound.com/reviews/hartmann-neuron
I mean, well done and everything, it's a good project, but "Synthesize Brand New Sounds In Ways Never Before Possible!" is a pitch that synth users hear year after year (pun intended). It turns out that musicians don't like black-box patching all that much; they prefer morphing things in parameter space because, being musicians, they want to interact with their instruments, whether timbrally, melodically, or harmonically.
Electronic musicians in particular don't need More Sounds or even More Oscillators and More Filters and More FX - sure, those are interesting, but honestly people are already spoiled for choice. What people like most is an instrument whose timbral range may be limited but which has a strong center: secondary characteristics remain largely consistent as primary variables are manipulated, so oscillators don't thin out at higher or lower ranges, filter Q (negative feedback gain) isn't damped so aggressively that it changes the gain structure, and so forth. The nicest thing an electronic musician can say about an instrument is not 'it can make so many sounds' but 'you can't get a bad sound out of it.'
One thing I thought would be pretty cool - I've got a friend who did a phd on physical modeling of sound in Edinburgh - http://www.ness.music.ed.ac.uk/. Often those physical models have hundreds of parameters and it's quite difficult to tune them to make meaningful/good sounds. Perhaps a neural network would be useful for tuning the parameters - you could use a real musician playing a real instrument with sensors, and use the generated sensor data and audio for your training set. And then the neural network could learn to match the physical model parameters to that.
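A minimal sketch of that idea: pair features extracted from real recordings (audio descriptors, sensor readings) with the physical-model parameters that best reproduce them, then fit a model mapping one to the other. Here a tiny numpy gradient-descent linear regressor stands in for a real neural network, and all of the data and dimensions are hypothetical placeholders.

```python
import numpy as np

# Hypothetical setup: each training example pairs a feature vector from a
# real recording (spectral centroid, attack time, sensor data, ...) with
# the physical-model parameters that reproduce it. Synthetic data here.
rng = np.random.default_rng(0)
n_examples, n_features, n_params = 200, 8, 5

true_map = rng.normal(size=(n_features, n_params))
features = rng.normal(size=(n_examples, n_features))
params = features @ true_map + 0.01 * rng.normal(size=(n_examples, n_params))

# A single linear layer trained by gradient descent stands in for the
# neural network; a real system would use a deeper model and real data.
W = np.zeros((n_features, n_params))
lr = 0.01
for _ in range(2000):
    pred = features @ W
    grad = features.T @ (pred - params) / n_examples
    W -= lr * grad

# The fitted map can now propose model parameters for an unseen recording.
test_feat = rng.normal(size=(1, n_features))
proposed_params = test_feat @ W
```

The point is only the shape of the pipeline: once you have (recording, parameters) pairs, tuning hundreds of parameters becomes a regression problem rather than manual trial and error.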
I'm not sure how to articulate this. I have synths that I can program in my head because I know the architecture really well; when I imagine a sound I can walk over, dial it in from the front panel, and get more or less what I expected, plus I can then tweak the results with abandon and for musical satisfaction. Then I have others (sometimes from the same manufacturer) which emit all manner of nice sounds but are far harder to program and easily veer off into sonic mush - technically impressive but not really fun to play.
I also think WaveNet's sample-by-sample generation and interpolation between latent features doesn't sound that exciting, as cool as it technically is.
We'll find some place to use machine learning in music/audio eventually :) I think perhaps more natural sounding pitch shifting could be one area (since you could learn the structure of sound of different instruments at various pitches), reverb removal, denoising, polyphonic audio to midi - things like that, where you have obvious training data.
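The denoising case illustrates what "obvious training data" means: you can manufacture supervised pairs by corrupting clean audio yourself, so the target comes for free. A hedged toy sketch, with arrays standing in for real recordings and the noise model kept deliberately simple:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_denoising_pair(clean, noise_level=0.1):
    """Corrupt a clean signal to create an (input, target) training pair.

    The clean recording is the target; the synthetically noised version
    is the model input. Real systems would use realistic noise profiles.
    """
    noisy = clean + noise_level * rng.normal(size=clean.shape)
    return noisy, clean

# A placeholder "recording": one second of a 5 Hz sine at 1 kHz sample rate.
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
noisy, target = make_denoising_pair(clean)
```

Reverb removal works the same way (convolve clean audio with impulse responses), which is why those tasks are more tractable than "generate interesting new sounds".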
Well, they are probably not tuning 100 instrument parameters but only the playing characteristics like accent, velocity, and so on.
This is two parts: a high-end computer that analyses (with ML and neural magic!) some source waves and outputs blended samples that you can place on a 2D grid, and, for these generated waves, a simple sample player (made with openFrameworks) running on an RPi 3 that mixes the waves depending on your XY position.
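The player side of that description can be sketched in a few lines: pre-rendered waves sit at the corners of the grid and the touch position crossfades between them with bilinear weights. This is an assumption about the blending scheme, not the project's actual code, and all names here are made up:

```python
import numpy as np

def blend_xy(corner_waves, x, y):
    """Bilinearly mix four equal-length waves by position (x, y) in [0, 1]^2.

    corner_waves: dict with keys 'tl', 'tr', 'bl', 'br' (numpy arrays).
    At a corner the output is exactly that corner's wave; in the centre
    it is an equal mix of all four.
    """
    w_tl = (1 - x) * (1 - y)
    w_tr = x * (1 - y)
    w_bl = (1 - x) * y
    w_br = x * y
    return (w_tl * corner_waves['tl'] + w_tr * corner_waves['tr']
            + w_bl * corner_waves['bl'] + w_br * corner_waves['br'])

# Placeholder waves: four sines at different frequencies.
waves = {k: np.sin(2 * np.pi * f * np.linspace(0, 1, 100))
         for k, f in zip(['tl', 'tr', 'bl', 'br'], [1, 2, 3, 4])}
out = blend_xy(waves, 0.5, 0.5)
```

The "mushy" criticism below follows naturally: averaging waveforms (or latents) tends to smear out exactly the transients and formants that give each source its character.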
However, it doesn't sound interesting or good in what they show; they probably need a better demo without any Roland classics. Their bass/piano mix sounds mushy and is essentially the most boring average synth sound I could imagine. The most interesting thing is the flute/snare crossover, which is buried in the overloaded promotional fluff video.
Would be nice to hear a demo that really puts out the 'new' neural sounds.
the essential 15 seconds of the video here:
(IMO it's depressing that Google don't appear to know this.)
Musically the sound is a lot less interesting than the engineering is. In fact it's a perfect demonstration of why you can't just throw NNs at a problem and expect to get something useful out.
Musical sounds - even synthesised musical sounds - tend to cluster around certain perceptual parameter sets. If you don't know what those parameter sets are - and they're not just frequency distributions, or envelope shapes, or waveform sequences - your model will tend to generate sounds that are perceived as musically trivial and/or uninteresting.
By a strange coincidence, this was the problem with the Hartmann Neuron. There was some very clever technology inside the box, but the sounds had none of the quality that made it a must-have for musical production. It shipped a few hundred units and then disappeared.
That quality is a very elusive thing. Some synth companies, like Roland, have been very good at capturing it. But if you ask their designers what they're aiming for, it's unlikely they'll be able to tell you. Even more strangely, that quality sometimes appeared in products apparently by accident, when they were abused to make sounds that were an accidental twist on their original design.
...Which would be a convincing argument for cultural preference if it weren't for the fact that many of the classic products that were abused in this way were made by Roland.
All we know is that musicians respond to that quality when they hear it. Unfortunately for engineers, sounds that have that quality can have very little in common with each other. So there's unlikely to be a statistical process that can engineer "good" sounds with a high hit rate.
If someone is interested in machine learning and music, I'd send them to http://wekinator.org, which is actually a research project rather than a marketing-campaign one-off, and can be set up, run, and played with in a matter of minutes.
I love the idea of using neural networks to find new sounds and possibilities, but for some reason the NSynth project just doesn't hit it for me. Would love to be convinced otherwise.
My guess is there are not a lot of people who could both a) build this in a short amount of time and b) find practical uses for it.
Definitely underestimating the market if that is the case. My Gear Acquisition Syndrome is already triggered.
Not that I mind having this sort of technology promoted by the likes of Google (casts glance at two 19" racks full of synthesisers), but I think I'd prefer whatever Mr. D. James comes up with over the corporate bread-maker path...
I assume that is sarcasm