jonaf 368 days ago
Something else people should know: AWS ES is exposed on the public Internet. You can't deploy it into a VPC yet, and you can only lock it down using IAM, which may or may not be good enough for your use case.

For those who prefer the convenience of the AWS ES service, consider Elastic Cloud, which offers most of the same capabilities but is run by Elastic itself (it was previously known as Found, which Elastic acquired a few years ago). There's also an Enterprise offering. If you're looking for a hosted Elasticsearch solution, it's probably better than what AWS is offering. Side note: Elastic Cloud updates about as often as Elastic releases, whereas AWS ES is consistently behind.

edaemon 368 days ago
As Daniel Parker mentioned in the article comments, they went with Algolia [1] despite being AWS experts. He cited the uncertainty and complexity around AWS ES as the problem.


closeparen 368 days ago
An attacker is only ever one RCE (remote code execution) on a single server away from being on your VPC subnet. You're going to have to set up authentication for internal applications anyway, although I suppose vulnerabilities in the login process are harder to exploit if you can't even reach it.
jchw 368 days ago
I'm curious about how the pricing compares. I'm not very satisfied with AWS ES, but managing ES manually doesn't seem like the most fun either. (In fairness it looks like there's not too many knobs to turn, but it's still another concern to have to personally deal with.)
andrewvc 368 days ago
Elastic Cloud doesn't require any more management than AWS. You just need to click some buttons to add capacity.
luhn 368 days ago
The IAM authentication is really annoying. It's not supported by many client libraries, nor have I found an easy way to make arbitrary HTTP calls with signature v4.

The only other options are a completely public endpoint or an IP-based whitelist, the latter of which is untenable in most cloud environments.
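For making arbitrary HTTP calls, the Signature Version 4 process itself is only a few hashing and HMAC steps. Below is a minimal sketch using only the Python standard library, assuming permanent credentials (no session token), a request path made only of unreserved characters, simple string query parameters, and the `es` service name. The helper name `sigv4_headers` and the example host are made up; a maintained signer is the safer choice for real use.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, host, path, query, body, region,
                  access_key, secret_key, service="es", now=None):
    """Hypothetical helper: return Host, X-Amz-Date and Authorization headers.

    Assumes permanent credentials (no X-Amz-Security-Token), a path made of
    unreserved characters, and a dict of plain string query parameters.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # 1. Canonical request: method, URI, sorted query, headers, payload hash.
    safe = "-_.~"
    canonical_query = "&".join(
        f"{quote(k, safe=safe)}={quote(v, safe=safe)}"
        for k, v in sorted(query.items())
    )
    headers = {"host": host, "x-amz-date": amz_date}
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in sorted(headers))
    signed_headers = ";".join(sorted(headers))
    canonical_request = "\n".join([
        method, quote(path, safe="/" + safe), canonical_query,
        canonical_headers, signed_headers, hashlib.sha256(body).hexdigest(),
    ])

    # 2. String to sign, scoped to date/region/service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Derive the signing key with chained HMACs, then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
```

The returned dict can be passed as request headers, e.g. `urllib.request.Request(url, headers=hdrs)`; the body sent must be exactly the bytes that were signed.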

ecnahc515 368 days ago
You can also use a signing proxy.
luhn 368 days ago
I wasn't aware of that option. I'll look into it.
Leon 367 days ago
A simple solution in this vein is to whitelist the EIP addresses of your NAT. This gives access to all resources in a private subnet (useful for Lambdas running in subnets).
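Whitelisting NAT addresses amounts to an IP-conditioned access policy on the domain. A minimal sketch that builds such a policy document in Python, assuming each NAT EIP is written as a /32 CIDR; the domain ARN, account ID, and addresses are placeholders, and the resulting JSON would still have to be applied to the domain separately.

```python
import ipaddress
import json


def ip_allow_policy(domain_arn, cidrs):
    """Build an access policy that allows es:* only from the given CIDRs.

    `domain_arn` and the addresses used below are hypothetical placeholders.
    """
    # Normalise and validate each entry; raises ValueError on malformed input
    # or on a CIDR with host bits set (e.g. "10.0.0.1/24").
    networks = [str(ipaddress.ip_network(c, strict=True)) for c in cidrs]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "es:*",
            "Resource": f"{domain_arn}/*",
            "Condition": {"IpAddress": {"aws:SourceIp": networks}},
        }],
    }


policy = ip_allow_policy(
    "arn:aws:es:us-east-1:123456789012:domain/logs",  # hypothetical domain
    ["203.0.113.10/32", "203.0.113.11/32"],            # hypothetical NAT EIPs
)
print(json.dumps(policy, indent=2))
```

Validating the CIDRs up front is a small guard against locking yourself out (or opening up too much) with a typo.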
Sir_Substance 367 days ago
>nor have I found an easy way to make arbitrary HTTP calls with signature v4.

okigan 365 days ago
Yep, that's precisely why I made awscurl "easy way to make calls to AWS".

It can be easily tested with AWS Elasticsearch.

Sir_Substance 363 days ago
It's a great tool man, I use it tonnes, thanks for making it!
justonepost 368 days ago
This sounds good. Any feedback on cost? How is plugin support / security? Integration with IAM?
brasetvik 368 days ago
(Disclaimer: I work on Elastic's Cloud team)

While AWS ES can be cheaper in some configurations, Elastic Cloud is actually quite competitive in pricing for larger clusters when compared to AWS's ES service. This post compares the two services, and there's an example price comparison at the end of the post:

We support most official plugins, and if you get a gold or platinum subscription you can upload your own plugins. Elastic's X-Pack is included in every cluster, which includes security features like role-based access control.

It's not possible for external service providers to integrate with IAM at this point.

bpicolo 368 days ago
One issue I've found with Elastic Cloud is that there doesn't seem to be a horizontal scale-out option other than multi-DC or getting bigger boxes. Is horizontal scaling in the works? Easy horizontal scaling seems like one of the better benefits of ES.

Or, alternatively, am I mistaken about how configuration works?

brasetvik 368 days ago
Per availability zone, Elastic Cloud currently scales vertically (in power-of-two increments) until a cluster hits 64GiB memory, at which point multiple 64GiB nodes are added. While you can run Elasticsearch with e.g. two 8GiB nodes per zone, we prefer a single 16GiB node as there are fewer things that can go wrong. (If you want the second 8GiB node for redundancy, then that's exactly what our multi-zone HA configurations are for, and we encourage HA setups by making them less than twice (or thrice) the price, throwing in additional master-only nodes for free.)
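The per-zone sizing rule described above can be modeled in a few lines. This is a toy sketch of the behavior as stated in the comment (power-of-two steps up to 64 GiB per node, then additional 64 GiB nodes), not Elastic's implementation; the function names and zone labels are made up, and the free master-only nodes in HA setups are not modeled.

```python
import math

MAX_NODE_GIB = 64  # per the comment, nodes grow vertically up to 64 GiB


def nodes_per_zone(requested_gib):
    """Toy model of the sizing rule described above, not Elastic's code.

    Up to 64 GiB: a single node, rounded up to the next power of two.
    Beyond that: as many 64 GiB nodes as needed to cover the request.
    """
    if requested_gib <= 0:
        raise ValueError("requested_gib must be positive")
    if requested_gib <= MAX_NODE_GIB:
        size = 1
        while size < requested_gib:
            size *= 2  # next power-of-two step
        return [size]
    return [MAX_NODE_GIB] * math.ceil(requested_gib / MAX_NODE_GIB)


def cluster_layout(requested_gib, zones=1):
    """Repeat the same per-zone node list for each availability zone (HA)."""
    return {f"zone-{i + 1}": nodes_per_zone(requested_gib) for i in range(zones)}
```

Under this model a 16 GiB request becomes one 16 GiB node per zone rather than two 8 GiB nodes, which matches the preference stated in the comment.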

(A bit of history: When Found (the company Elastic acquired and which is now Elastic Cloud) was in private beta in early 2012, we actually did allow custom cluster topologies. We ultimately disabled that as it was overwhelmingly used to make sub-optimal cluster configurations, such as 5 x 1GiB memory nodes)

randerson 368 days ago
Any idea if/when it will support multiple user accounts and 2FA in the management portal? Not having those was a deal-breaker for us when we evaluated it a while back, and was the sole reason we went with the less stable AWS service.
brasetvik 368 days ago
This is (understandably!) a common request, and both are on our roadmap.
boyter 368 days ago
My understanding is that this is going to change, and soon. Amazon has been poaching key Elasticsearch employees, presumably to bring the service up to par or better.
canhazelastic 367 days ago
Elastic employee here, and this is the first I'm hearing of that. I don't know a single person that has left for Amazon, from any team. Certainly no "key elastic search employees".

Got a source?

cavisne 368 days ago
Article is a bit naive about what it takes to run a shared service. Any API that AWS ES exposes has to be there forever, clearly pending_tasks had some risk of leaking internal implementation details that either couldn't be exposed, or that they didn't want customers building a dependency on.

Likewise with the doubling of nodes: this is obviously a blue-green style deployment. In-place updates would be quicker, but ES can get into all sorts of weird states that require manual debugging to fix; with blue-green, for most of the deployment you can simply flip back.

I've been pretty impressed with AWS ES compared to running it myself (other than the poor fit of IAM auth)

dmix 367 days ago
> Any API that AWS ES exposes has to be there forever, clearly pending_tasks had some risk of leaking internal implementation details that either couldn't be exposed, or that they didn't want customers building a dependency on.

If this is a reality of using cloud based ES then clearly it's something to seriously consider before using it - which is all the author is saying. The article is titled 'things to consider' not 'things AWS needs to fix'.

ES is a big, complex beast of a Java app. This is good advice regardless, from someone who has used both approaches (self-hosted vs. AWS) in production.

I did not get the impression that he's saying AWS can resolve this easily.

jack9 367 days ago
> Article is a bit naive about what it takes to run a shared service.

This is a bad assumption. Loggly is a shared service.

> Any API that AWS ES exposes has to be there forever

This is a bad assumption. No API is forever. Maybe you meant a different timescale. AWS has removed and made breaking changes to APIs over the years (e.g. random breaking change:

Andys 368 days ago
Worth mentioning that themselves run a hosted service on AWS that is of a high quality and has none of these flaws.
jknoepfler 368 days ago
I can confirm the author's frustrations with AWS ES, having both set up clusters on my own (on EC2 hosts) and used the service. The latter is expensive, inefficient, behind on features, hard to integrate with, and generally just a really crappy piece of work (like almost every peripheral AWS service, i.e. anything but EC2, S3, and DynamoDB).

Elasticsearch is honestly pretty simple to set up; save yourself money and trouble and just do it.

theparanoid 368 days ago
I had the same experience with their code hosting product (CodeCommit). It was better to just set up an EC2 instance and manage it myself.
cheald 367 days ago
Seconded. ES is honestly one of the lowest-maintenance products I've ever deployed. It has a few quirks, but for the most part, it Just Works.
rpedela 368 days ago
This appears to be somewhat out of date. If you use version 5.x then pending_tasks is available.
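Where `_cluster/pending_tasks` is available, its JSON response lists queued cluster-state updates under a `tasks` key, each with fields such as `priority`, `source`, and `time_in_queue_millis`. A minimal sketch that summarises such a response body; the sample payload is invented, and fetching it (including any request signing the domain requires) is left out.

```python
import json
from collections import Counter


def summarize_pending_tasks(payload):
    """Summarise a `_cluster/pending_tasks` response body (a JSON string)."""
    tasks = json.loads(payload).get("tasks", [])
    by_priority = Counter(t.get("priority", "UNKNOWN") for t in tasks)
    # The task that has waited longest is usually the interesting one.
    oldest = max(tasks, key=lambda t: t.get("time_in_queue_millis", 0), default=None)
    return {
        "count": len(tasks),
        "by_priority": dict(by_priority),
        "oldest_source": oldest.get("source") if oldest else None,
        "oldest_ms": oldest.get("time_in_queue_millis", 0) if oldest else 0,
    }


# Invented example body in the shape the API returns.
sample = json.dumps({"tasks": [
    {"insert_order": 101, "priority": "URGENT",
     "source": "create-index [logs-2017.06.01]", "time_in_queue_millis": 86},
    {"insert_order": 102, "priority": "HIGH",
     "source": "shard-started", "time_in_queue_millis": 1200},
]})
print(summarize_pending_tasks(sample))
```

A steadily growing `count` or `oldest_ms` is the kind of signal that is hard to get on versions where the endpoint is blocked.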

good_regex 368 days ago
Good to know, I didn't realize 5.x has that API available. Why it's only available in 5.x makes no sense, since ES has had the API since at least 1.x.
Roritharr 368 days ago
With a DevOps engineer who wants us to go the AWS-dedicated-everything route, I need articles like this to explain to him my fear that by going that route our problems will just change, not go away, while we add a fat layer of dependency.
meddlepal 368 days ago
Complexity never goes away... it just shifts. I dunno if that is a common saying or not but a former coworker of mine once said it and it's very true IMO.

I do infrastructure engineering for a small startup and really I think with any of these managed systems you need to step back and evaluate them within the context of TCO, lock-in, security, reliability, performance and flexibility/customizability. I've heard ES isn't that much of a PITA to manage on its own, but on the flip-side I'd never sign up a small team to run PgSQL at scale.

vacri 368 days ago
> I've heard ES isn't that much of a PITA to manage on its own

I just run ES for my Logstash setup, and ES is lovely and rock-solid... except when it isn't. For example, ES decided to just silently refuse input when its disk was 90% full, and that was a bit hard to find when it happened. ES looked alive, but hunting down the reason why it stopped wasn't trivial. I've had a couple of similar but lesser gotchas as well.
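One way to catch this earlier is to alert on disk usage before it reaches Elasticsearch's disk-based allocation watermarks (the defaults for `cluster.routing.allocation.disk.watermark.low` and `.high` are 85% and 90%). A minimal sketch of such a check with the standard library; the data path and running it from cron are assumptions, and it only inspects one local filesystem rather than asking the cluster.

```python
import shutil

# Defaults of cluster.routing.allocation.disk.watermark.low / .high, as fractions.
LOW_WATERMARK = 0.85
HIGH_WATERMARK = 0.90


def disk_state(used_bytes, total_bytes, low=LOW_WATERMARK, high=HIGH_WATERMARK):
    """Classify disk usage relative to the low/high watermarks."""
    ratio = used_bytes / total_bytes
    if ratio >= high:
        return "high"
    if ratio >= low:
        return "low"
    return "ok"


def check_path(path="/"):
    """Hypothetical cron check: pass the Elasticsearch data directory as `path`."""
    usage = shutil.disk_usage(path)
    return disk_state(usage.used, usage.total)


print(check_path("."))
```

Alerting at the low watermark leaves some headroom before ES starts moving shards away or refusing writes.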

I guess you could say of my experience that it's not that much of a PITA (as you say), but it is still a bit of a PITA.

Disclaimer: if these things weren't a bit of a PITA, there'd be no need for us sysadmins, so I should be grateful...

subway 368 days ago
Also fun is being an engineer who constantly has to explain this fear to VPs with a "Nobody ever got fired for using Amazon" mindset.
scaryclam 367 days ago
You (and other respondents to your comment) are right in that the problems will change rather than go away.

AWS has many cool toys and I use a subset of them every day. However, there's no way in hell I'd entertain the idea of going in fully for everything we do. Not only are there a bunch of inadequate services, they can also be nasty to debug and cause more problems than they're worth.

It sounds like you may have an inexperienced guy getting overly enthusiastic about what he could achieve instead of focusing on what's required (I don't mean to insult him; it just sounds like he may not know enough about infrastructure to be making these decisions properly). Being provider-agnostic (at least as much as you can be) is increasingly how I see companies leverage the great tools cloud providers have while staying free to chop and change as their needs evolve.

Maybe point him towards things like Terraform and get him looking at what Google Cloud and Azure can do as well as AWS?

pg_is_a_butt 368 days ago
yeah, i mean, why research yourself.... it's not like it's your job. just decide to follow your fears, and search out the fake news that will feed your desires.

you're completely pathetic.

you're all idiots.

hendzen 368 days ago
Not all AWS services are created equal... Some are rock solid and others (cough Data Pipeline cough Kinesis Firehose cough) must have been written by interns.
piggybox 368 days ago
I don't use data pipeline anymore, but 2 years ago when I was using it in my previous company there were a few moments that drove me nuts. I vaguely remember one day AWS upgraded data pipeline to a newer version and broke hundreds of our pipelines that write data to Redshift. We contacted AWS and they rolled that update back...
aianus 368 days ago
What issues did you have with Kinesis Firehose? I just deployed a couple of those and would like to know what to watch out for.
hendzen 367 days ago
Lots of downtime. Probably 3-4 serious incidents within a 6 month period. We also had a high number of transient errors that AWS Support considered 'expected'.
bpicolo 368 days ago
One issue we've hit is that logrotate with copytruncate enabled breaks Firehose, but afaik it's mostly been good.
otterley 368 days ago
Using copytruncate breaks a lot of software, not just Firehose. I generally discourage its use in favor of addressing whatever root cause is making you want to use it.
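The usual failure mode is that a tailing agent remembers a byte offset, and copytruncate shrinks the file underneath it; logrotate's own documentation also warns that lines written between the copy and the truncate can be lost. A minimal sketch of a tailer that at least detects truncation by comparing the file size with its saved offset. It is not how the Kinesis agent works internally, the function name is hypothetical, and a size check cannot notice a truncation followed by enough new writes to pass the old offset before the next poll.

```python
import os


def read_new_lines(path, offset):
    """Return (new_lines, new_offset) for a file being tailed.

    If the file is now shorter than the saved offset, assume it was truncated
    in place (e.g. by logrotate's copytruncate) and start again from byte 0.
    """
    size = os.path.getsize(path)
    if size < offset:
        offset = 0  # truncation detected
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Only consume complete lines; a partial trailing line is re-read next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Rotating by rename plus a signal to the writer (the non-copytruncate default) avoids the problem instead of papering over it.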
vacri 368 days ago
In my experience, any 'improve logging' ticket goes straight to the back of the dev's backlog. Followed the next week by complaints about the logging system not being all that good... :)
zodiac 368 days ago
Why the hate for interns :'(
Etheryte 367 days ago
Regardless of how you look at it, writing software is a hard problem and takes experience to do reasonably well. There's nothing wrong with interns themselves, rather the practice that many companies seem to follow where unsupervised or poorly supervised intern code goes right into big products.
sugaraplha 368 days ago
What issues did you have with AWS Data Pipelines?
hendzen 367 days ago
Pipelines randomly failing with inscrutable error messages. High error rates.
kiernanmcgowan 368 days ago
Another point of frustration is that because these endpoints are locked down[0], you cannot fully use management tools like Curator -


jbyers 368 days ago
Curator support got better in the AWS ES 5.3 release a few weeks ago:

"AWS ES 5.3 officially supports Curator now. Documentation has been updated to reflect this."

yissachar 368 days ago
You still can't use Curator to take snapshots, because AWS ES doesn't expose the `/_snapshot/_status` endpoint.

coredog64 368 days ago
If you are stuck on 5.1.2, the Talend fork of Curator works.

The change is trivial, so I get the sense that Elastic is just fucking with Amazon.

skywhopper 367 days ago
Like almost everything else you can build on AWS managed services (RDS, ElastiCache, API Gateway/Lambda, Kinesis, etc.), if it's truly critical to your application's uptime, you should be managing it yourself.

But if your need for ES is to support a backend system that would make your life inconvenient for a while if there are problems, is relatively small and won't grow too fast, but isn't business-critical, then the AWS managed service is fine.

justonepost 368 days ago
Good warning. Yeah, beta software released to production.
StreamBright 368 days ago
There is that, and also the fragility of ES itself. I was wondering what alternatives to ES are out there. I only know of Solr.
justonepost 368 days ago
Solr has the added complexity of ZooKeeper. ES isn't bad, but in a multi-tenant (MT) context you really have to layer a lot on top for security and configurability.

It's possible it's not MT and they just didn't write the facade APIs. That'd be pretty crazy.

My biggest complaint would be lack of plugin support.

bpicolo 368 days ago
To be fair, ZK is great at its job, and that responsibility is something ES has had a lot of trouble replicating.
StreamBright 367 days ago
If you want consistency in a distributed system, you need something like ZK. If ES doesn't use ZK, it surely has something else, probably with different trade-offs.
rpedela 368 days ago
They have fixed most of the stability issues in 5.x. I suspect some of the problems people have had with AWS ES is actually using a pre-5.x version.
coredog64 368 days ago
Our 2.x cluster is much more stable than our 5.1.2 cluster, despite our 5.x cluster being significantly smaller (in node count) than its older brother.

Also of note: Amazon's documentation on HTTP limits is wrong. Some instance types are listed as having a 100MB max payload but actually allow only 10MB. We found that out when Logstash recorded a crapload of errors hitting the 10MB limit on what was allegedly a 100MB instance type.
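Whatever the documented number, a defensive option is to cap each `_bulk` request body below the limit actually observed. A minimal sketch that packs action/source line pairs into NDJSON bodies under a byte budget; the 10MB default simply mirrors the limit reported above rather than any official figure, the helper name is hypothetical, and depending on the Elasticsearch version the action line may also need a `_type`.

```python
import json

# Observed limit from the comment above, not an official AWS figure.
MAX_BODY_BYTES = 10 * 1024 * 1024


def bulk_bodies(docs, index, max_bytes=MAX_BODY_BYTES):
    """Yield NDJSON `_bulk` bodies (bytes), each no larger than max_bytes."""
    chunk, size = [], 0
    for doc in docs:
        # One action line plus one source line, each newline-terminated.
        pair = (json.dumps({"index": {"_index": index}}) + "\n"
                + json.dumps(doc) + "\n").encode("utf-8")
        if len(pair) > max_bytes:
            raise ValueError("single document exceeds max_bytes")
        if size + len(pair) > max_bytes:
            yield b"".join(chunk)  # flush before this pair would overflow
            chunk, size = [], 0
        chunk.append(pair)
        size += len(pair)
    if chunk:
        yield b"".join(chunk)
```

Keeping each action/source pair together means no request ever ends with a dangling action line.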

StreamBright 367 days ago
Well, after talking to some of their tech guys, we determined that we have different definitions for the word "fixed".
rpedela 367 days ago
Such as?
StreamBright 367 days ago
Such as disallowing configuration they do not like, having memory/resource leaks they cannot find the root cause for, and a few other things I've already forgotten. G1GC is disallowed because they had a data corruption bug with it. This was a few months back; they might have changed some of it already. The question for me is: what do I get by using ES over Solr? If Solr's features are enough for our use cases, should I even try ES?
rpedela 367 days ago
Here is their resiliency status:

It depends on your use case. If you are already familiar with Solr and it is good enough for your use case, then use it. Solr and ES are about the same feature-wise. Scaling is easier for ES because it is built-in. Here is a good comparison of their APIs.

part 1:

part 2:

trengrj 368 days ago
I looked at using AWS Elasticsearch Service for a project but had to back out due to the lack of plugin support. Running elasticsearch yourself, even in a HA setup is actually fairly easy.
CD1212 367 days ago
Does anyone have any experience of's hosted Elasticsearch offering?

I have been using it on a new project the last couple of weeks and it seems to be working well.

petethepig 368 days ago
Had a very similar experience with Redis on ElastiCache. When things go south, it's really hard to debug. You don't get access to logs, you don't get to change a lot of config parameters.

Had to provision our own EC2 instances.

It was 2 years ago though, things might be different now.

jakozaur 368 days ago
It gets even worse if you use AWS Elasticsearch for logs. Logs are usually high-volume, and it can quickly become a nightmare.
coredog64 368 days ago
It used to be worse. The max EBS volume size was 512GB (with 15% reserved for Amazon) and the max cluster size was 20 nodes.
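For a sense of scale, the old ceiling works out as below, taking the comment's figures (20 nodes, 512GB EBS each, 15% reserved) at face value. The replica count is an assumption on my part, since each replica copy multiplies the storage needed; the function name is made up.

```python
def usable_primary_gb(nodes, volume_gb, reserved_fraction, replicas):
    """Raw usable storage, divided across the primary plus replica copies."""
    usable = nodes * volume_gb * (1 - reserved_fraction)
    return usable / (1 + replicas)


print(usable_primary_gb(20, 512, 0.15, replicas=0))  # 8704.0
print(usable_primary_gb(20, 512, 0.15, replicas=1))  # 4352.0
```

So with one replica, roughly 4.3TB of primary log data was the hard ceiling, before leaving any free-space headroom.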

We hit that limit and had to ruthlessly prune live data.

You can now add 1.5TB per node (with very large and expensive instance types) as well as scale past 20. Requesting the limit increase was a lot more difficult than most other limit increases.
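Pruning to stay under a storage cap is what tools like Curator automate. A minimal sketch of just the selection step, assuming daily indices named like Logstash's default `logstash-YYYY.MM.DD` pattern and a fixed retention window; the function name is hypothetical, and actually deleting the selected indices is left out.

```python
import datetime
import re

# Assumes daily indices such as "logstash-2017.06.01".
_DAILY = re.compile(r"^(?P<prefix>.+)-(?P<date>\d{4}\.\d{2}\.\d{2})$")


def indices_to_delete(index_names, today, keep_days):
    """Return dated daily indices older than `keep_days`, oldest first."""
    cutoff = today - datetime.timedelta(days=keep_days)
    expired = []
    for name in index_names:
        m = _DAILY.match(name)
        if not m:
            continue  # leave anything that isn't a dated index alone
        day = datetime.datetime.strptime(m.group("date"), "%Y.%m.%d").date()
        if day < cutoff:
            expired.append((day, name))
    return [name for _, name in sorted(expired)]
```

Skipping names that don't match the pattern keeps system indices such as `.kibana` out of the deletion list.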

MightySCollins 368 days ago
I just want to get rid of the stupid proxy I have to run just to make it work... Amazon, just let me put it in a VPC!
manigandham 367 days ago
Side note: reading anything on is frustrating, such a slow and janky site for a glorified blog.
ianamartin 368 days ago
Bookmarked to look at the next time my boss wants to lock us into yet another AWS service. Thank you.
jdc0589 367 days ago
there's also the bit about how adding a whitelisted IP for access takes like 20 FREAKING MINUTES to take effect.
whatnotests 368 days ago
Wow WTH Amazon? Take the training wheels off already or fix the defaults for this service.
moonka 368 days ago
Preferably both.
xchaotic 368 days ago
Brutal plug, but MarkLogic is a good alternative if you want a good search solution that runs and scales on AWS (and you can migrate to another cloud or on prem)