(Just kidding of course).
When we went looking at what we should use to implement Wallaroo, one of the things that appealed to us about Pony was that it was a high-performance actor-based runtime that we could help mold. We considered writing our own implementation in C, but if we had, we'd still be working on that rather than talking here on HN now.
Pony has gotten us quite a bit. The runtime was used for a number of production projects at one of the major banks, so we knew that, while there might be some bugs (what code doesn't have them?), we could use it to jumpstart our process.
I gave a talk about Pony and its relation to Wallaroo where I put out the figure of 18 months. I think that as a wild ass guess, using the Pony runtime saved us about 18 months of work to build the foundation we want/need for Wallaroo.
Speaking as a member of the Wallaroo Labs team, it's really nice to have a community-based project that you can help mold and grow with. It's been a boon to us as a small development shop in a way that either writing from scratch ourselves or using an existing, more widely used runtime wouldn't be.
Speaking as a Pony core team member, I encourage other companies who think they could benefit from a high-performance actor-based runtime to have a look at Pony. You could have a large hand in shaping it into the runtime that you need.
I'd also like to note that the statement in that post that Sylvan is no longer involved in the Pony community is way off base. He's a founding member of the Pony core team and is still actively involved.
Regarding 'name' being a function: a class attribute might indeed be more correct, but a function allows for dynamically computed names where it's applicable.
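A minimal sketch of what that buys you (the class and method names here are made up for illustration, not Wallaroo's actual API): a name() method can fold instance state into the name, which a fixed class attribute can't.

```python
# Hypothetical computation class: name() is computed from instance state.
class WordSplitter(object):
    def __init__(self, language):
        self.language = language

    def name(self):
        # Dynamically computed; a plain class attribute couldn't do this.
        return "split into words (%s)" % self.language

splitter = WordSplitter("en")
print(splitter.name())  # -> split into words (en)
```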
Re the API: if you want to take part in helping shape the next version of the API, please reach out to me personally (via the email address in my HN profile) or via:
* #wallaroo on freenode
* our user mailing list: https://groups.io/g/wallaroo
We did the initial version of the API based on feedback from a couple clients we were working with. We're actively looking for feedback from a larger group of folks as we move forward.
The "why not" really varies from company to company. None have been because we didn't pass the POC. Most so far have been that it works well, and now it's a matter of slotting into the roadmap at those companies.
Accessing a simple attribute on the instance means it can be defined as any of: a class attribute (the instance will delegate to the class), an instance attribute (computed/set in init), or a property (computed every time it's accessed). That's convenient, especially when the primary use case will be setting the name on the class directly.
In fact, you could even fall back on the class name.
__SOME_VAR = "foo"
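To illustrate (a small sketch; the attribute value is arbitrary): a plain `obj.name` access is satisfied by all three definitions, and a property can even fall back on the class name.

```python
class ViaClassAttr(object):
    name = "split into words"           # class attribute

class ViaInstanceAttr(object):
    def __init__(self):
        self.name = "split into words"  # instance attribute, set in init

class ViaProperty(object):
    @property
    def name(self):                     # computed on every access
        return "split into words"

class Unnamed(object):
    @property
    def name(self):
        # Fall back on the class name when none was given.
        return type(self).__name__

# The caller just reads obj.name; it never needs to know which kind it is.
for obj in (ViaClassAttr(), ViaInstanceAttr(), ViaProperty()):
    print(obj.name)       # -> split into words (three times)
print(Unnamed().name)     # -> Unnamed
```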
With Apache Spark, the Word count example is a lot shorter: https://github.com/apache/spark/blob/master/examples/src/mai...
We are looking for folks to help drive the next version of the API. The first one was done with feedback from a couple of clients who were interested in using Python and in many ways reflects their tastes. Feedback from a wider range of Python users is something we are actively soliciting at this point.
disclosure: I work at Wallaroo Labs, creators of Wallaroo.
(Besides, Python has been popular longer than Ruby -- which only got big along with Rails' emergence).
While Python found an extra niche with scientific libraries (and not just/especially machine learning) after NumPy and co., it was already very popular by then, having taken the throne from Perl.
Ruby at the time (and until Rails) was seen as an interesting contender, but nobody doubted Python's top popularity as far as scripting goes.
P.S. What about "the comments I've been churning"? If you mean regarding Python 3 adoption, then we're still quite a long way away. Not even 50%, and well after 10+ years. But that's beside the point. That doesn't mean I don't like Python (including 3).
Quickly followed by pointless Factory classes and classes implementing an interface with one method.
I guess the reason is that classes are picklable and functions, by default, are not? This is supposed to be a distributed system after all.
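A quick sketch of why that matters (plain Python, no Wallaroo specifics): instances of module-level classes round-trip through pickle with their state intact, while a lambda fails by default.

```python
import pickle

class Computation(object):
    """A picklable computation: the class is found by name, state travels."""
    def __init__(self, n):
        self.n = n

    def compute(self, x):
        return x + self.n

# Instances pickle fine: the class is referenced by module-qualified name,
# and the instance state (self.n) is serialized alongside it.
clone = pickle.loads(pickle.dumps(Computation(1)))
print(clone.compute(2))  # -> 3

# A lambda, by contrast, cannot be pickled by default.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_picklable = False
print(lambda_picklable)  # -> False
```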
return "split into words"
In my opinion, as much of this stuff as possible should be done declaratively.
Likewise for the builder: just ask for a callable; the smallest "builder" is the state class itself.
Computations seem similar: the API docs state they must provide a compute() method, but the example shows only a compute_multi. Is there a use case for having both on the same object? If there isn't, just ask for a reduction function and provide a decorator for one of the use cases. Or don't, and always ask for a list of results.
Or handle the "multi" case via a generator, I guess (call the reducer; if it returns a generator, run it and add all its items to the processing queue, otherwise add the one item).
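A minimal sketch of that dispatch (all names are made up for illustration): generators fan out into many queued items, plain return values queue one.

```python
from collections import deque
from types import GeneratorType

def run(compute, message, queue):
    """Call the computation; fan out generators, enqueue single results."""
    result = compute(message)
    if isinstance(result, GeneratorType):
        queue.extend(result)   # the "multi" case: one item per yield
    else:
        queue.append(result)   # the single-result case

def split_words(line):         # "multi": a generator of words
    for word in line.split():
        yield word

def shout(word):               # single result
    return word.upper()

q = deque()
run(split_words, "hello wallaroo", q)
run(shout, "hi", q)
print(list(q))  # -> ['hello', 'wallaroo', 'HI']
```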
How do you see this comparing to something like Dask?
Would it compete with Dask, or be able to somehow work together with it?
Dask seems to let you write idiomatic Python code and not even think about splitting, joining, etc... and it builds the pipeline automagically by introspecting the AST.
Dask is very "batch" oriented; batch support, as I said above, is something we are in the process of adding to Wallaroo. Wallaroo is very stream-processing oriented. Wallaroo's strength is working with stateful, event-by-event applications.
If you were to take word count as an example: Dask would be great if you had a body of text in files or whatnot whose words you needed to count. There's a beginning and an end to that task: count the words in this text. Wallaroo would shine if you had a never-ending stream of text, like Twitter's trending topics.
That's a very coarse outline of a couple of differences. While we are working with clients to help them move off of Dask by adding that batch functionality to Wallaroo, I also think that if you wanted to, you could use Wallaroo alongside a more batch-oriented system like Dask. Stream processing and batch processing are complementary. A number of technologies (us included) are looking to unify them. Why? Well, there's a lot of operational overhead to running both a batch system and a streaming system. A lot of folks would like to run a single system that works well for both.
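A toy sketch of the difference in plain Python (no Dask or Wallaroo specifics): batch counting runs once over a finite input, while streaming counting updates state event by event and can be read out at any moment.

```python
from collections import Counter

def batch_word_count(lines):
    """Batch: the input has an end, so count once over the whole body."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

class StreamingWordCount(object):
    """Streaming: the input never ends; state is updated per event."""
    def __init__(self):
        self.counts = Counter()

    def on_event(self, line):
        self.counts.update(line.split())
        # Current top words can be read at any point in the stream.
        return self.counts.most_common(3)

batch = batch_word_count(["to be", "or not to be"])

stream = StreamingWordCount()
stream.on_event("to be")
top = stream.on_event("or not to be")
# After the same input, both hold the same counts; only the shape of
# the computation (finite pass vs. per-event updates) differs.
```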
I hope that answers your question.
I've never actually played with Dask at all, even for batch processing, let alone streaming. If I ever get some free time, I'll try to implement word count in Dask and see how the two implementations compare.
Must be complementary, I think.
I mainly used their Java libraries, but the Python bindings have been coming along.
You can take a look at and run the streaming wordcount example in Beam: https://github.com/apache/beam/blob/master/sdks/python/apach...
In addition to the available local execution, Google also offers running Beam pipelines as a managed service in Cloud Dataflow (https://cloud.google.com/dataflow/). Python streaming is in private alpha--contact us at email@example.com if you'd like to try it out.
Note: I work for Google on Apache Beam and Cloud Dataflow.
In general, our roadmap is determined by what we think is important but also is heavily influenced by the needs of folks we are working closely with.
There's an almost infinite number of things we could work on, so we like to drive our direction based on the needs of the folks we are working with. In the case of Python, the early users were all on Python 2.7, and thus we focused there. We've recently started working with folks who are looking for Python 3 support (in particular, 3.6), so we are going to be adding it.
If anyone is interested in adding features, language support etc to Wallaroo, we'd love to help. You can find us on freenode in the #wallaroo channel or stop by our user mailing list (https://groups.io/g/wallaroo) and we can help you out.
Can confirm that we moved all the things to Python 3 (and it was easier than expected). Especially all the data processing pipelines.
No Python 3 is a deal breaker in 2017.
It all depends on how they wrote their Python 2.7 code. You can write it oblivious to Python 3 and screw things up, or you can write it in a way that you know you'll eventually support Python 3. In the latter case, you might as well just support both right away, though.
Comparisons can be really hard. What's right for one application or project isn't right for another. I'd be happy to chat over email with anyone interested in stream processing about the types of applications they are looking to build, the requirements they have, etc.
I get nice use cases and information we can use at Wallaroo Labs to help drive our product. In return, I will give unbiased feedback on what you should be looking for to solve a given problem.
My personal email is in my HN profile.
In case it isn't obvious, I work at Wallaroo Labs, the makers of Wallaroo. I'm also one of the authors of Storm Applied, Manning's book on Apache Storm.
Not sure how Wallaroo compares though.
If the 4 terminals required were instead just:
`docker run -d --name wallaroo -v ~/wallaroo-tutorial/celsius:/srv/application-module -p 0.0.0.0:4000:4000 wallaroolabs/wallaroo-quickstart`
Or something similar, I think a lot more people in this thread would be trying it out right now.
It looks really promising! If I get the spare time I'm definitely interested enough to give it a whirl.
It's one of the options we are looking at to make it easier to get up and running.
Any particular reason that docker appeals to you?
We're not sure at this point in time what the best means is.
If there was a docker based QuickStart, what would you expect to be able to do with it?
Run the first example app? Something more?