Standard formats and interop will help fix that.
What exactly is so bad about competition?
For example, a big reason why a lot of computer vision research was built on Caffe (and, because of momentum, sort of still is) was its pre-existing model zoos.
A big reason why people choose TF (despite its lack of dynamic graphs) is simply the existing community.
Requirements for both papers and industry will continue to evolve, and each framework will have its own trade-offs.
I think there's always a trade-off between innovation and stability that people should be thinking about here.
Granted, things like the model formats should help long term, but for now we're going to be dealing with a ton of API churn.
I'm sure another thing like dynamic graphs will come along and we'll need to update the APIs.
I suspect Keras will respond to this at some point by adding primitives for eager mode and the like.
I know data scientists who need more advanced models, and others who prefer the Keras API for just building off-the-shelf models.
Personally, I would love for MS to release or support a .NET-based ML toolkit. There is open-source stuff like http://accord-framework.net, but I would assume it isn't as large or complete as a framework backed by a major corporation.
> Python's Buffer Protocol: The #1 Reason Python Is The Fastest Growing Programming Language Today
> The buffer protocol was (and still is) an extremely low-level API for direct manipulation of memory buffers by other libraries. These are buffers created and used by the interpreter to store certain types of data (initially, primarily "array-like" structures where the type and size of data was known ahead of time) in contiguous memory.
> The primary motivation for providing such an API is to eliminate the need to copy data when only reading, clarify ownership semantics of the buffer, and to store the data in contiguous memory (even in the case of multi-dimensional data structures), where read access is extremely fast. Those "other libraries" that would make use of the API would almost certainly be written in C and highly performance sensitive. The new protocol meant that if I create a NumPy array of ints, other libraries can directly access the underlying memory buffer rather than requiring indirection or, worse, copying of that data before it can be used.
(The italic emphasis was copied from the original article.)
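To make the zero-copy idea from the quote concrete, here's a minimal standard-library sketch (my example, not the article's): `array.array` exposes its contiguous memory via the buffer protocol, so a `memoryview` can read and even write that memory without any copy.

```python
from array import array

ints = array('i', [1, 2, 3, 4])   # C ints stored in contiguous memory
view = memoryview(ints)           # zero-copy view over the same buffer

view[0] = 42                      # writes through to the underlying array
print(ints[0])                    # -> 42
print(view.nbytes == ints.itemsize * len(ints))  # -> True: same raw buffer
```

NumPy arrays expose the same protocol, which is what lets C extensions operate on their data in place.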
Is there perhaps another factor, such as an existing ecosystem or that it’s widely used in the academic field?
Python just has momentum and a fairly easy-to-use FFI.
Yes, it's widely used in science in general. Don't underestimate the learning curves of other languages when your audience is scientists and mathematicians. Python is incredibly easy to use, even when using numpy and other scientific tools.
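As a tiny illustration of how low that FFI barrier is, the stdlib's `ctypes` can call into a C shared library with no wrapper code at all. A sketch, assuming a POSIX-style system where the C math library can be located (the library name varies by platform):

```python
import ctypes
import ctypes.util

# Locate and load the C math library; fall back to the common glibc
# soname if find_library comes up empty on a minimal system.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

libm.sqrt.restype = ctypes.c_double      # declare the C signature
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))                    # calls C's sqrt() directly
```

Compare that with writing JNI bindings or a full native extension in most other managed languages.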
Have you tried Kotlin?
Without a strict, static type system it becomes quite problematic to ensure new code keeps the API contract, unless you have unit tests for every possible value.
A good type system speeds up your coding compared to writing equivalent unit tests, and it improves quality compared to no testing.
Coding speed is the least of my concerns for an ML project, to be honest. And unit tests aren't that useful either, since ML is by and large not deterministic. A lot of what you said is true for web applications, but it doesn't really apply to an ML project.
this is debatable
And compared to that, the type system is certainly faster.
Well, considering this is a realistic scenario for fallible humans, it's still decent advice to keep your exploratory Python projects small to avoid ridiculous tech debt. It's not quite as bad as with Ruby, but it's close.
Many languages make it hard to keep code actually bug-free and maintainable; Python, and especially Ruby, are problematic there, while Java and Kotlin, and even C++ (with a strict style guide), are a lot nicer to work with at scale.
If you want to keep consistent APIs between modules, strict types and checked exceptions are very helpful, while in Python one typo can silently create a new attribute instead of updating the one you meant, which is why so many people now use __slots__, type annotations, and typed-Python checkers. But if I do all that, I might as well use Java or Kotlin and get a better IDE.
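To illustrate the typo hazard with a toy example of my own (hypothetical class and attribute names): without `__slots__`, a misspelled attribute write silently succeeds and the bug hides; with `__slots__`, it fails at the assignment site.

```python
class Loose:
    def __init__(self):
        self.learning_rate = 0.01

class Strict:
    __slots__ = ("learning_rate",)   # only this attribute may ever be set
    def __init__(self):
        self.learning_rate = 0.01

a = Loose()
a.learning_rte = 0.1          # typo silently creates a *new* attribute
print(a.learning_rate)        # -> 0.01; the intended update never happened

b = Strict()
try:
    b.learning_rte = 0.1      # same typo now raises AttributeError
except AttributeError:
    print("typo caught at the assignment site")
```

A static checker like mypy catches the same class of mistake at analysis time rather than at runtime.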
Compared to unit tests, strict static types are faster; compared to no testing, they are safer.
In addition, notebook apps like Jupyter fit well with the experimental nature of scientific code. I have a colleague who was attempting to do some stuff in Ruby (to fit with our application stack) who would leave IRB sessions open for weeks at a time. He recently switched to Zeppelin for notebook stuff, and it has been a huge productivity boost for him.
The rest of the ML world is in that exact situation, but on Python. They aren't going to throw away their familiar tools unless everyone else does too.
However, if you're more algorithmically focused, Python is a great DSL.
Outside of speed, I've read very few valid criticisms. What other languages are there that are cross-platform, have great library support, and delegate easily to lower-level libraries for performance?
(FWIW, any MS-based language is probably excluded from consideration depending on its cross-platform ability. Many data people -- like me! -- won't use an MS-based OS.)
The DyNet paper is still the best source for background on the relative advantages of "define-by-run" networks:
DyNet: The Dynamic Neural Network Toolkit
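A toy sketch of the define-by-run idea (my illustration, no framework involved): the "graph" is just whatever Python code actually runs, so its structure can differ per input -- here, a variable-length sequence needs no padding or static unrolling.

```python
def encode(sequence, state=0.0):
    # The loop length -- and hence the computation graph -- is decided
    # by the input itself, at run time. The arithmetic is a stand-in
    # for a real recurrent cell.
    for token in sequence:
        state = 0.5 * state + token
    return state

print(encode([1.0, 2.0]))        # -> 2.5  (a 2-step graph)
print(encode([1.0, 2.0, 3.0]))   # -> 4.25 (a 3-step graph, same code)
```

In a static-graph framework, handling both inputs would require padding or explicitly unrolled graphs; in a define-by-run one, ordinary control flow is the graph.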
Now I just need to get to where scaling to 1000 GPUs is a problem I actually have ;)
What do you mean by collusion here? Seems more like an attempt at competition than collusion.
Just like databases, we'll support a wide range of engines on AWS: some of our own, like Gluon, alongside others from the community, like PyTorch and TensorFlow. They're all first-class citizens.
We even fund separable (competing!) teams internally to focus on making sure AWS is the best place to run each of these popular engines.
I know my counterparts at AWS as a result, and as we are friends, I push for collaboration whenever opportunities arise. At least on the MS side of the house, these sorts of outreach and collaborative projects are a ground-up push.