nine_k 289 days ago [-]
I prefer to prepare the datasets first, and not depend on the order of evaluation (which you may fail to predict correctly):

  select * from some_table
  join (select rand() as random_id) on
    some_table.the_id = random_id
This way it is obvious that rand() will be evaluated only once.
mjevans 289 days ago [-]
I like your query as a portable version of the written function, but I also think that it could fail to yield the results probably intended before that mess was written.

If the intent is actually to select one, and only one, random row from the table than this Stack Overflow entry has a result that is in most cases in the direction I was thinking (LIMIT 1, and have some way of selecting a random entry, remembering that the ID field might be monotonic, but isn't assured to be continuous (have no gaps))

Unfortunately it seems that it depends on language extensions that are not portable.


It unfortunately also appears that the easiest query scales poorly to large tables as it works by assigning EACH row a random ID and then limiting it to one of those by some order (probably lowest assigned row).

A more effective approach might be a function or procedure based around a pre-recorded 'maximum' ID for the column (assuming it's vaguely monotonic and not a GUID), picking a random result between base and that ID, Verifying it exists, and if it doesn't trying again up to N times before using a fallback procedure. (E.G. after 100 random stabs that fail selecting ~500 records in the range of lastRND +/- 250 and using the ORDER BY rand() or equivalent on that sub-set). If even /that/ fails pick the next or previous record by which is closest.

nine_k 289 days ago [-]
I think that something like

  select * from the_table 
  offset rand(select count(*) from the_table)
  limit 1;
would do the trick with least bias, though not necessarily very fast.
mjevans 289 days ago [-]
The offered example query is could return no results in the presence of record gaps / deleted records; also the OFFSET value should probably be computed within a sub-query, not exposed.

A corrected version based on being >= or <= the random number (picking the nearest) COULD work (in fact, that's the way my suggest final fallback works), but would still be biased.

That was the reason I suggested taking a limited number of stabs at random numbers within the allowed range; my example still has some slight bias (a little in the 'best case' fallback, and a lot in the absolute fallback), but it tries to deliver a truly random result first, many times.

wolf550e 289 days ago [-]
Why do people use "i.e." instead of "e.g."?

EDIT:" For people who think I don't know what they mean, I do. I meant that the author of TFA obviously meant to write "Nondeterministic Functions in MySQL (e.g. Rand)", because the entire article is about all non-deterministic functions and just uses rand as an example.

lotyrin 289 days ago [-]
Why do people use the wrong there/their/they're?

Why do people match salmon colored items with rose ones?

Why do people sing off key?

They aren't themselves sensitive to the difference and don't think about it.

thisacctforreal 289 days ago [-]
"i.e." – for "id est" in latin – means "that is", or "in other words", "in essence".

"e.g." is "exempli gratia" in latin; just meaning "for example".

qubex 289 days ago [-]
Because they mean absolutely different things: ”e.g.” stands for “exempli gratia” (“for example”) while “i.e.” stands for “id est” (”in other words”).
optimuspaul 289 days ago [-]
well in this case they only really talk about the one function so i.e. is not incorrect.
paulddraper 289 days ago [-]
They say "Functions" though.
abundant 289 days ago [-]
Here one would read “that is” as something like “specifically”.
Zalastax 289 days ago [-]
But that's not how one should read "that is". The author should have used "e.g." here, no question. Using "i.e." would imply that there exists only one nondeterministic function in MySQL, which is not the case.
paulddraper 289 days ago [-]
Yeah, or I'd say "namely"
cryptonector 289 days ago [-]
As to SQLite3, the author should have indicated what version they tested, and they should have tested more versions. SQLite3 3.8.3 introduced a flag for marking functions as deterministic (vs. not), and in the version I just tried (3.11.0) it works as expected.

The rule I expect for where clause expression evaluation is: standard short-circuit semantics. That calls to deterministic functions get memoized, or that constant sub-expressions get hoisted out, is a plus -- I do expect GCSE and optimization of deterministic functions from modern RDBMSes. I also expect that functions meant to be used in queries have no side effects -- that they insert/update/delete no rows on any tables.

jcriddle4 289 days ago [-]
For SQL Server it looks like an index seek on the following so it is evaluated once.

SELECT * FROM SomeTable WHERE ID = CAST(Rand() * 1000 as int)

In addition this query: SELECT TOP(100) rand() AS RandData from SomeTable

Returns: 0.940284064056996 0.940284064056996 0.940284064056996 . . . .

A old trick to actually get random data on SQL Server that I think has been replaced with newer functionality is:


However if we move the NEWID.... stuff into the WHERE clause we end up switching to an index seek so back to being evaluated once.

alikrubin 285 days ago [-]
My name is Alex, I'm the author of the original article on Percona Blog. I wanted to thank everyone for the great comments and discussion! Additional SQL examples are great as well.
WhyNotHugo 289 days ago [-]
FWIW, the proper what to get a random row is actually:

    SELECT column FROM table
    LIMIT 1