NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Show HN: Fast Rolling Quantiles for Python (github.com)
alexhutcheson 1123 days ago [-]
This is pretty cool. The title would be a bit more descriptive if it were “Fast Rolling Quantile Filters for Python”, since the high-pass/low-pass filter functionality seems to be the focus.

The README mentions it uses binary heaps - if you’re willing to accept some (bounded) approximation, then it should be possible to reduce memory usage and somewhat reduce runtime by using a sketching data structure like Dunning’s t-digest: https://github.com/tdunning/t-digest/blob/main/docs/t-digest....

There is an open source Python implementation, although I haven’t used it and can’t vouch for its quality: https://github.com/CamDavidsonPilon/tdigest

cb321 1123 days ago [-]
You can also just get/use exact answers with the `accumulation_tree` package that Python `tdigest` impl uses, though for large windows a B-tree is faster than a red-black ranked/instrumented binary tree. Either tree also scales better than heap pairs to multiple quantiles per moving window (like 5%, 50%, 95%). Python's `blist` package may not have the exact API you need to build from, but it is very close. (`blist` also does not optimize for the "histogramming case" where there can be many duplicate values, IIRC. Instead each value gets its own independent slot in the data structure.)
myrl 1123 days ago [-]
Thanks, I'll be sure to check these out! It's a really neat idea behind that paper.
aldanor 1123 days ago [-]
Might be worth mentioning: `bottleneck` package, implemented in C, with various moving window functions - https://bottleneck.readthedocs.io/en/latest/reference.html#m...
SatvikBeri 1123 days ago [-]
Awesome! Rolling quantiles were a major bottleneck in my work a couple years ago, so much so that I ended up removing them and using approximations instead. It'll be good to try this if I ever go back to that code.
1123 days ago [-]
AyrtonB 1123 days ago [-]
What's the advantage of this over Pandas' standard rolling quantiles?

import numpy as np

import pandas as pd

s_input = pd.Series(np.random.randn(1000))

s_p10 = s_input.rolling(10).quantile(0.1)

nvilcins 1123 days ago [-]
xiphias2 1123 days ago [-]
Wouldn’t it make more sense to speed up the current APIs used by lots of programs over introducing a new API?
myrl 1123 days ago [-]
Perhaps, but as far as I'm aware, none of them are as flexible as my interface (to support e.g. streaming, pipelining.) I have not looked at all of pandas' time-series functionality that closely.
xiphias2 1123 days ago [-]
I see, it's interesting that ypu wrote it all in C, pandas in Python
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 08:52:26 GMT+0000 (Coordinated Universal Time) with Vercel.