Happy to answer any questions.
I guess the uses are endless. Debug some webhook endpoints by duplicating the request to a private requestbin - granted this isn't the right level to do this but sometimes we're working with legacy setups. Add missing json fields that an API broke for some client - again not ideally level to handle but sometimes we bandaid it. Send an API auth spammer gigs of random garbage because chances are their shitty client doesn't have any limitations - granted after 14.9 seconds you'd need to subrequest to your own garbage generating origin - or a 10gb garbage file somewhere
My uses would mainly be live debugging where we don't have a perfect stack (i.e. nearly all the time).
Beyond that, we support the WebCrypto API, so you are welcome to generate a near infinite amount of cryptographically secure random garbage, and we'll even help out with some from our lava lamps .
If anyone at Cloudflare cares to take a look, here's what I get:
> Server Error
> Our network is literally on fire
> Ray ID: 3fb0bf524995797f
> Your IP address: 126.96.36.199
> Error reference number: 502
> Cloudflare Location: Seattle
2nd try had an Error reference number of 524, with everything else being the same.
Shenanigans like this are why I stopped using Cloudflare on my websites, better to have something I can debug and avoid a party MitM all my connections, than to gain a fleeting bit of "reliability" or "security" that Cloudflare supposedly offers.
Also, it says 50ms of cpu time, 15 seconds of real time as limits. When developing, are there easy ways to get measurements of how long things are taking? I can imagine 50ms on my MBP may not be the same as in production - either faster or slower - but I wouldnt want to get to production to find out.
Oops! That's outdated. Let me go fix that right now...
Workers are already being used in production by customers large and small today.
Also, regarding resource usage, is this memory usage exposed at all to the worker via an API? I'm thinking there may be applications where it would be useful to cache fetched resources but without hitting the memory ceiling and having the requests die with the worker.
We haven't exposed memory information to the process yet for several reasons, one being it's rare to see a Worker that actually needs anything close to the memory limit (128mb). When people do run into that limit it tends to be in error. We would love to see what you can build that does do creative state management such that it would need that information though!
As with any anti-abuse measures, though, it's best if we don't describe in detail what steps we've taken to prevent this sort of attack.
Can you talk about topology relative to ingress and origin? Eg. is execution pinned in select PoPs per ‘region’ or all? Unrelated, I also wonder about retention of edge cache, is it purged via LRU or similar?
Personal note, I was excited to get the early access email, feels like an excellent offering; I’ll be flexing it soon, have some ideas—been getting an HLS system online in the past month using RasPi+CF without a hiccup (720p@4500kbps) and it’s open source/libre, comparable to YT/Twitch—info incl. design/ lectures in profile.
Also read above, congrats on what must’ve been a good year! Speaks well of your org that you’ve been able to deliver and talk freely.
I am his manager and I agree with him
If I understand correctly, you're worrying about false positives, i.e. your site accidentally gets shut down because we incorrectly believe you're launching a DDoS. This is not a plausible outcome of the measures we've currently implemented.
As a CF partner, you should feel free to arrange a private conversation if you are still concerned.
Nope. Every Worker is distributed to every location. Requests are handled at whatever PoP they arrive at.
> I also wonder about retention of edge cache, is it purged via LRU or similar?
TBH I'm not that familiar with the inner workings of our HTTP cache, but I would assume so?
CORS is designed to protect two things:
- The user's cookies for other origins, which the browser will normally send on any request to those origins.
- Behind-the-firewall servers that might be accessible from the user's browser but not from the public internet.
Neither of these things apply to Workers: a Worker obviously has no access to the browser's cookie jar, nor does it have the ability to see behind-the-firewall services since it's not behind-the-firewall.
CORS does NOT protect against DDoS: Typically, CORS does not prevent you from sending requests; it prevents you from seeing the content of the responses. Any web site in a browser can use an <img> tag or submit an invisible <form> to cause a cross-origin request, CORS or not -- but it can't read what comes back.
So, Cloudflare Workers do not enforce CORS. We've implemented different measures specifically to mitigate DDoS.
What are my options if I build something complicated using these workers and for whatever reason I need to stop using Cloudflare? Or if I want to write something open-source that people can deploy either on Cloudflare or on-premise? Is there a reasonable way to emulate this functionality (possibly with less sandboxing) on a self-hosted web server?
Also BTW you should update https://cloudflareworkers.com to note that it's live :)
Great question. We chose to implement the W3C standard Service Workers API in part to give people some flexibility here. E.g. depending on what your Worker does and who your clients are, it might be possible to push your same Worker code to the browser as a regular Service Worker. My hope is that other services choose to implement W3C standard APIs in the future, rather than everyone doing something bespoke as they do today.
I believe there are also some Node.js-based Service Worker harnesses designed for server-side use, though admittedly I haven't tried any myself. It would be cool to see such things developed further.
> Also BTW you should update https://cloudflareworkers.com to note that it's live :)
Happening as we speak. ;)
We're seeing average cold start times under 5ms.
I'd be glad to connect over these features on twitter (same username). I implemented them.
Edit: The question was misunderstood. I asked because cloudflare ToS seems to be against serving just assets and they want to serve entire sites
Asked a sales rep and got told to contact support, which is pretty much impossible as I don't know who are affected until they reported the problem to us.
A few times, out of desperation, I have also tried whitelisting entire countries in Firewall > IP Firewall.
It would be incredibly useful if there was an edge-point service that could return linux epoch time in milliseconds (Thats accuraet to withing a 1ms of UTC time). I've been working on a Live broadcasting syncing system and there really wouldn't be anyone in a better position than CDN's with lambda-like functionality.
Then again being accurate to 1ms worldwide sounds like one of those problems that runs up against the speed of light? We don't have atomic clocks like Google does...
Without a synchronizing service, time can't be trusted to be accurate. Time should always (try) be absolute to a global time, and not individual clocks a devop manually put some numbers in for.
Meanwhile now that the base product is out we will be spending more time improving the tooling, and integrating with Cloudflare Apps so that you can package (and sell) Workers for other people to use on their own sites.
The long answer is webpack, rollup, or browserify :)
We've exposed an API for controlling Cloudflare features from a Worker, including many things commonly controlled via page rules:
We'll be adding more in the future.
Would that still be true if you limited the regex support to DFAs instead of NFAs? I can’t think of a single use-case for backreferences in routing patterns. :)
NFAs and DFAs have equivalent expressive power. For this application, you probably want a determinized DFA (one without epsilon transitions); that's constant time on the input string. (It's still not as fast as Boyer-Moore.)
Backrefs let you write PDAs, which are more powerful than either NFAs or DFAs.
PCRE2 implements DFA-based matching. The cited page says it's slower than the NFA algorithm (although presumably a lower complexity order); I think because of lookaheads and capturing.
 Michael Sipser, _Introduction to the Theory of Computation_.
Yep, that's the thing I was thinking of. Thanks!
Question, since you seem like you might know: is there an alternative regex library—or a plain Yacc parsing grammar that you could stick as a validator in front of PCRE's parser—that would allow through only the regexes which parse out to constant-time (determinized) DFAs?
Tests ive run using CFW as a memory cache have been very very fast.
I can't promise we can raise it, but please feel free to reach out to us (firstname.lastname@example.org) and we'll be happy to weigh in.
I think you just launched the next serverless resolution ;)
Heck yeah. We almost get it for free with V8, except that the default WebAssembly API is designed around dynamically loading code at runtime, whereas we want to make sure that all code we run is uploaded strictly through the config UI/API and is available for forensics. So basically we just need to build a way to upload WebAssembly blobs and import them.
Will there be some kind of session storage or is it completly stateless?
The problem is if we left the WebAssembly API enabled as-is, then people could fetch code from the internet and execute it. We already disable JS eval() so we had to disable the WebAssembly API too.
> Will there be some kind of session storage or is it completly stateless?
We're working on storage. It's a complicated problem. :) (See elsewhere in this thread.)
In the meantime, for many use cases you can use the HTTP cache as a sort of storage. Also, each worker instance has global variables which are writable -- there's no guarantee we won't reset your worker at any time, but generally one worker instance will handle multiple requests so can do some in-memory caching in globals.
The good news is that the Service Worker API is more modern than Node, e.g. using Promises instead of callbacks, which IMO makes it a better experience to work with.
Enterprise customers are allowed to have multiple scripts mapped to different URL routes. This is mainly so that different teams owning different parts of the site don't step on each other. However, generally if you stuff your logic into one script rather than multiple, it will perform better, since that one script is more likely to be "hot" when needed.
Keep in mind that you can write your code as a bunch of modules and then use a tool like webpack to bundle it into one script, before you upload it to Cloudflare.
> for testing/development purposes
I agree with this, we definitely plan to add better ways to test script changes. Currently there is the online preview that shows next to the code editor, but it's true that there are some limits to what you can test with it.
Joking aside, we don't know yet what is and isn't possible. Please build this and other things. If you need help or more resources, including CPU time, please reach out (email@example.com).
I don’t know the details of your environment or timing precision, but I’m quite sure that a single core on a modern server CPU can resize and JPEG compress more than 20 reasonably-sized images per second. :-)
Odd statement and it’s not true.
As a matter of fact the JVM beats the shit out of V8 in everything but startup time.
And this is not an educated guess, I have the same code, some cross-compiled via a compiler that can target both (plenty of such compilers these days, including Scala, Clojure and Kotlin) and some code hand optimized for each specific platform and the difference is huge in both cases. And let’s be clear, this is not code that handles numeric calculations, for which JS would be at a big disadvantage.
Imagine that I’m not running the same unit tests, for JS I have to do much less interations in property based testing, because Travis-ci chokes on the JS tests. So it’s a constant pain that I feel weekly.
And actually anybody that ever worked with both can attest to that. The performance of V8 is rather poor, except for startup time where the situation is reversed, V8 being amongst the best and the JVM amongst the worst if startup time matters.
But because startup time and RAM usage are so important to our use case, Java has never been in the running as a plausible implementation choice, so to be honest I sort of forgot about it. :/
Plus indeed people will be able to use WebAssembly and compile languages like Rust to it.
I think the big things coming down the pipe for the performance of each is: Value Types for java. A lot of what is needed can be done with sun.misc.unsafe already but it's an awkward way to have to program, I haven't kept up with it but hopefully it supports being dropped right onto memory mapped i/o for stuff like CapNProto style parsing.
I think WebAssembly is going to be great, but it's effectively a sandbox for native code, so it targets languages like C++ and Rust, or in other words languages that have a very light runtime and no garbage collector.
I don't know what hooks WebAssembly will be able to provide, but consider that at this point most high level languages would also have to ship at least a garbage collector with the compiled binary, because WebAssembly does not give you one, see open issue: https://github.com/WebAssembly/design/issues/1079
That said the thought of targeting browsers and Node with binaries built out of Rust fills me with joy.
Heh, I was using Scala.js when AWS lambda was first announced I enthusiastically posted to the mailing list at the time that Scala.js would be great with this new paradigm of programming. I think the JVM is going to just be dead in the water on that front. So much of the design of v8 is around beign tossed a bunch of code and being told to start running it at full performance ASAP, whereas the JVM has been optimized under a very different set of constraints (e.g. long running server processes which it pivoted to 15 years ago after the failure of applets).
> I think WebAssembly is going to be great, but it's effectively a sandbox for native code, so it targets languages like C++ and Rust, or in other words languages that have a very light runtime and no garbage collector.
And, In the meantime, Two stories I came across recently, indicate enhancements and polyfills and moving rapidly to bridge the gap:
https://news.ycombinator.com/item?id=16585315 (Making WebAssembly better for Rust and for all languages)
https://github.com/alexcrichton/wasm-bindgen (A project for facilitating high-level interactions between wasm modules and JS.)
> I don't know what hooks WebAssembly will be able to provide, but consider that at this point most high level languages would also have to ship at least a garbage collector with the compiled binary, because WebAssembly does not give you one, see open issue: https://github.com/WebAssembly/design/issues/1079
I could see there being some sort of jvm bytecode interpreter written in wasm happening some handy wavy time in the "future". My hunch is that wasm going to become the universal low level bytecode, so I think there will be more and more projects emitting wasm as compilation target in addition to then eventually surpassing x86 / arm etc (obviously this won't happen overnight).
That does still leave the area of server processes with long uptimes that benefit from JIT performance optimizations and which are ok with garbage collection overheads.
> That said the thought of targeting browsers and Node with binaries built out of Rust fills me with joy.
With all great power ... And a bit of fear of having articles & guis rendered to canvas removing the ability for users to control their interaction with the web.
Maybe once that happens you'll start seeing your node servers stop falling over.
1: e.g. https://github.com/WebAssembly/host-bindings (Host Bindings Proposal for WebAssembly:
This repository is a clone of github.com/WebAssembly/spec/. It is meant for discussion, prototype specification and implementation of a proposal to add host object bindings (including JS + DOM) support to WebAssembly.)
Not a criticism, this it looks like an excellent product for those who can benefit from it, just calling it out for others that read the comments here before the original content.
On the other hand CloudFlare Workers looks more distributed, but suitable just for 50ms CPU time, 15s wall time and 128 MB. This is enough for redirects, A/B testing, but often not enough for writing serverless applications or any kind of rendering.
I wonder whether CloudFlare wants to get into serverless business and this is first iteration or if it's just a CDN which is more customisable by allowing code to run there.
Which would be interesting for cheap hosting while also being on cloudflare CDN.
Are you going to publish more info about that HTTP server? Like the technology used to write it.
For now, here's the source code for the HTTP implementation. :)
That's from the KJ C++ toolkit library, which is part of the Cap'n Proto project, which is my own work that predates my time at Cloudflare. We're using this in the Workers engine and updating the code as needed.
Yes, I know, NIH syndrome. But it has been pretty handy to be able to jump in and make changes to the underlying HTTP code whenever we need to.
I hope to spend some time writing better docs for KJ, then write a blog post all about it.
The only major dependencies of the core Workers runtime are V8, KJ, and BoringSSL.
I believe it would really be a game changer if we could open and maintain an open connection with a database.
Not sure how Cloudflare workers behaves here, but from their docs they recommend global variables a way to persist state, so perhaps an outbound TCP connection will stay alive across invocations of the same V8 "process".
My use case was to contact Redis and I wasn't able to maintain a connection open.
PS. If you're a distributed storage expert and want to work on this, we're hiring!
But I don't believe >40 or so nodes has never been Erlang's strong point.
Edit: with the conclusion being of course that you wouldn't want to strongly connect the edge nodes, so instead it'd be something more traditional, where if people pay for a storage add-on, you folks are automating the schlep of spinning them up a long-lived storage cluster, and networking it so that only their edge nodes can access it. At which point you could be building on anything, maybe even something that already exists like Redis or etcd. Hmm, though for hello-world purposes I might still see what could be put together in beam-land, where everything is more at your fingertips ...
You definitely know this world better than myself, but as first approximation I wouldn't try to provide storage, which we know how messy it can be.
Just keep a TCP connection open can go a very long way and it will get costumer to hook to redis or basically any other database.
2. Sorry, couldn't find the link. IIRC Fastly does this with its in-built logging feature.
Then there are things people are just starting to think about, like doing a/b testing by serving different variants from the edge and building their API gateway into the edge.
Finally there are things people will only start dreaming of now that the tech is available, like filtering the massive stream of data coming in from IoT at the edge, or powering interactive experiences that require compute which individual machines don't have, but speed which centralization can't provide.
These numbers 50 ms of compute time and 15 s idle are interesting in the serverless space. Now I'm waiting for sane performance suites to figure out what suites you, I'm guessin this solution will kill in these strange latency test for AWS lambda from the other day: https://news.ycombinator.com/item?id=16542286
FWIW that's something I'm working on. It's tricky because it's not actually in the Service Workers standard. Currently, we support WebSocket pass-through (so if you do something like rewrite a request and return the response, and it happens to be a WebSocket handshake, it will "just work"), but haven't yet added support for terminating a WebSocket directly in a Worker (either as client or server).
Getting it to work with Cloudflare Workers is a little more involved since our Node libraries don't run in their Service Worker environment, but if you implement the negotiations manually it does work.
We made an intentional decision early on to avoid providing any precise timers in Workers -- even Date.now() only returns the time of last I/O (so it doesn't advance during a tight loop). This proved to be a really good idea when Spectre hit. (But we also shipped V8's Spectre mitigations pretty much immediately when they appeared in git -- well before they landed in Chrome.)
Do workers play nicely with managed CNAMEs?
Though 0.50 per million is only true if you have more than 10 million requests per month, otherwise the price will be higher since there is 5$ minimum.
2. The performance of the code will be much higher than in the browser because of the server resources available and also because and subrequests will happen across Cloudflare's fast/reliable links and not whatever the end user is connected to.
4. Security: you can include things like API keys.
5. Script starts executing earlier.
6. Conserves bandwidth/battery life of mobile users.
- When you need to work with older browsers or non-browser clients (e.g. API clients!) that don't support service workers.
- When it would be a security problem if the user can bypass or interfere with the worker (e.g. you can implement authentication in a CF Worker).
- When you specifically want to optimize your use of the shared HTTP cache at the Cloudflare PoP.
- When startup time of a service worker would be problematic.
- When CORS would prevent you from making the requests you need from the browser side. (CORS doesn't apply to CF Workers, since the things CORS needs to protect against are inherent to the client-side environment.)
There's a lot of interesting stuff you can do with edge applications. Image optimization/resizing, content rewrites, pre-rendering, API gateway, etc, etc, etc. These are all things you want to do once for many visitors.
There's a lot you _can't_ do in a browser because browsers are untrusted. Edge applications can run with a different level of trust.
: Cloudflare now implements Privacy Pass which means Tor users mostly don't see captchas anymore.
: Please read: https://blog.cloudflare.com/why-we-terminated-daily-stormer/
You missed what I think is the most important thing: Cloudflare currently entails correlated risk, for lack of a better term. A government intrusion into CF represents access to thousands and thousands of sites' decrypted streams. This is a huge target for the US, Russian, and other spy agencies, to the extent that I cannot believe you're not already compromised.
All those small customers who are using you for free TLS should be using Let's Encrypt so they can get end-to-end encryption, necessitating individual, active attacks (I suppose on DNS) rather than sweeping, passive attacks.
I think there are some cool and good things that Cloudflare does, but it's irresponsible to minimize the threat it presents to privacy in today's internet.
[Edit: Also, if you don't want to respond to this thread, I will totally understand, and think that's reasonable. I don't want to shit on your cake!]
Do you use any cloud providers?
And yes, if AWS were compromised, that would suck. But right now a lot of CloudFlare sites are backed by AWS. So now their traffic is at risk in two places, not just one.
I don't tend to use cloud providers, no. I self-host some stuff out of my house, with reliance on DNS and CAs being the major points of "correlated risk". I use S3 for serving some public files.
Why is this different from a bunch of people running a LAMP monoculture on their own individual servers?
If anything, Cloudflare can use economies of scale to staff a dedicated incident response team, assuming that at all times they are already compromised and trying to stop each attacker. They can invest in systemic least-privilege isolation. They can test the latest upstream versions of software in CI and deploy patches quickly and have 24/7 on-call staff to manage those deployments. I can't do any of that on the Raspberry Pi in my bedroom. If an intelligence agency or even a not-that-intelligent agency decides they want in, they just need to wait for the next zero-day in L, A, M, or P, and bet correctly that I'm not going to patch and restart my server until at least when I get home from work. Scaling this to everyone like me is just a matter of putting their exploit in a for loop.
And I do server maintenance as my day job. I've maintained a many-thousands-of-users shared web host that has been broken into. I certainly don't expect myself as a hobbyist to do a good job of maintaining my systems; what about the person who just wants to run a website and has zero professional experience being a sysadmin?
2. If you're running behind Cloudflare, one pretty straightforward and common thing is to configure your web server to only respond to requests from Cloudflare. Since Cloudflare has its own WAF that's updated by a skilled security team, this decreases your exposure - something like Shellshock or the Rails mass assignment vulnerability would get dropped at the Cloudflare level before it makes it to your origin server, and nobody else can send you HTTP requests.
(At that point you can configure your machine for SSH keys only and reduce your attack surface to pre-authentication OpenSSH vulnerabilities.)
So I don't think you have two problems if you use Cloudflare. You are trading off one problem for another, yes, but for most people that's the right tradeoff.
However, the risk I was talking about was not things like CVEs that random people are scanning for, but the spectre of state actors (or similar) compromising an entire provider. That applies to both AWS and Cloudflare, so if you use both, your risk is higher. (Or perhaps more importantly, the risk for your users is higher.)
The cost to a state actor to mass-exploit a random 0-day on hobbyist targets, set up a persistent back door, and leave is very low. The benefit is low, too, but there's no real reason why they shouldn't do it just in case they end up needing it ever. And the risk is low because they look just like "random people."
Even if Cloudflare is a big champion of better encryption and is currently not doing anything shady with this ability, it's a concerning power for one organization to have.
(Of course, if Cloudflare doesn't see the plaintext, then disregard)
Part of the move to the cloud was the decision that well organized companies with large security teams can do a better job protecting internet resources than the vast majority of individuals. Cloudflare is just that, for cache/firewall/etc. appliances, I don't see the difference.
Didn't really think about how many services do this: AWS' ELB, any serverless service, Heroku and other PaaS services, etc.
Even a plain VM is easily observable for whoever is hosting it. At the end of the day you have to either trust your service providers or do it yourself, whether that's securing your network infrastructure or emptying the trash can next to your desk.
With let's encrypt SSL certs are free and easier to use than ever.
>: Cloudflare now implements Privacy Pass which means Tor users mostly don't see captchas anymore.
Yeah, at the expense of deanomymizing them and only if they install your addon in their web browser!
Yes, you're missing the whole cryptographic underpinnings of Privacy Pass which make it impossible to de-anonymize the user. I know, it sounds like impossible magic at first, but read the papers -- it actually works.
By different users do you mean that it demonstrates to the server that multiple instances of the plugin are behind the same Tor circuit? If it's using different tokens, I don't think the server gets to learn that; it could be multiple instances of the plugin accessing 30 pages each, or one instance accessing 60 pages.
No, you don't. You proxy their traffic and encrypt it on the way out, undermining the security of the Web by creating a single point of failure that you've proven unable to defend.
That's, at least, a massively better position on censorship than many other web companies, like Google, who explicitly believes they should remove websites they don't like from the Internet.
If you want to rag on Cloudflare as a shabby internet citizen, ask why they provide DDoS protection services while also hosting most DDoS-for-hire sites: https://www.google.com/search?q=booter
If you have a problem with specific orders of our government, the correct avenue of upset is with them, or, should they be unwilling to change their position, by voting.