Make invalid states unrepresentable (2023) (geeklaunch.io)
hyperman1 6 days ago [-]
I once got a program that was a never-ending source of bugs. It had huge forms, with fields becoming valid or invalid depending on the state of other fields. Each field had listeners to activate/deactivate controls as needed, but the interactions had become so complex that they were full of tiny mistakes. Each tiny mistake slightly corrupted the state, triggering other mistakes, until the whole thing snowballed out of control. Costly business mistakes followed.

After a while it occurred to me that all state was duplicate/triplicate/..., e.g. a checkbox on screen and a boolean field. Most bugs amounted to inconsistencies between the duplicates. So we wrote for each form 2 big methods: One copied the records to the UI, the other the UI to the record. All listeners became a 3 step process: Copy complete UI to record, do the change only in the record, copy complete record to UI.

In a way this was wasteful: Typing 3 characters would enable or disable most GUI elements 3 times. But computers had become fast enough that this was unnoticeable. The dynamic of the program changed: instead of tiny mistakes in the code spiraling out of control, they would disappear as the next user action would most probably fix them. Bugs basically dried up overnight after what amounted to a small code change.

In this case, it was too late to make invalid states unrepresentable, but we managed to declare 1 part of the state correct, and derive all the other state from it. I learned a lot about state management from that experience.
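
A minimal sketch of that "one canonical record, derive everything else" shape with the 3-step listener (the field names here are made up, not from the actual project):

  // The record is the single source of truth; every widget state is derived from it.
  interface OrderRecord { express: boolean; giftMessage: string; }
  interface OrderUi     { expressChecked: boolean; giftText: string; giftEnabled: boolean; }

  function recordToUi(r: OrderRecord): OrderUi {
    return { expressChecked: r.express, giftText: r.giftMessage, giftEnabled: r.express };
  }

  function uiToRecord(ui: OrderUi): OrderRecord {
    return { express: ui.expressChecked, giftMessage: ui.giftText };
  }

  // Every listener becomes: copy UI to record, change only the record, copy record back to UI.
  function onExpressToggled(ui: OrderUi): OrderUi {
    const r = uiToRecord(ui);
    r.express = !r.express;
    return recordToUi(r);
  }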

pjc50 6 days ago [-]
> but we managed to declare 1 part of the state correct, and derive all the other state from it

Isn't this the essential philosophy of React?

The other approach is INotifyPropertyChanged-style: an event for every change, which is supposed to propagate to all listeners, but only propagate if the value is different. It sounds like this is the one that had failed in the project you were given.

> Typing 3 characters would enable or disable most GUI elements 3 times

I don't think this matters so long as you can coalesce all the redraws. See also "immediate mode" GUIs: if you always redraw everything from the canonical state, which modern computers are extremely fast at, a lot of complexity goes away.

hyperman1 6 days ago [-]
We're talking a Windows GUI application, 20 years ago. Compute was not as fast as today, so at the time the speed concern was more relevant.

In a way, the core change in mindset was realizing that computers had become fast enough for this architecture to be reasonable. Some of the devs had worked with a machine having 1 MB of memory shared by everyone, and snapping out of that is hard. I remember someone on that project lamenting that we would be wasting whole kilobytes of memory. Funny even then, but today I recoil from Electron for much the same reason.

Today, you'd be absolutely right.

4ad 6 days ago [-]
> it was too late to make invalid states unrepresentable

It wasn't, that's exactly what you did, albeit not with types, but by removing redundancies from the state. It's the same reason why normalizing relational databases is a good idea.

chii 6 days ago [-]
> It wasn't, that's exactly what you did

the thing with using types to make invalid states unrepresentable is that it ensures the compiler is the one doing the checking.

By doing it "manually" this way, you're not saving much effort. You still had to do the analysis of which states are valid, and to code it up (hopefully without a mistake). The state of the program can also temporarily be invalid - it just happens too fast for the user to notice (thus "fixing" the bug).

It's probably the best that could've been done other than a rewrite, but make no mistake - it's not ideal.
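
For example (a hedged sketch with made-up fields), a discriminated union lets the compiler reject field combinations that listeners would otherwise have to keep consistent by hand:

  type Shipping =
    | { kind: "pickup" }                                   // no address fields exist at all
    | { kind: "delivery"; address: string; express: boolean };

  function shippingCost(s: Shipping): number {
    switch (s.kind) {
      case "pickup":   return 0;
      case "delivery": return s.express ? 15 : 5;          // address/express only reachable here
    }
  }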

barfbagginus 6 days ago [-]
Very nice!

This is the essence of model, view, controller, and also one of the very first use cases for it!

masklinn 7 days ago [-]
A companion piece to this idea (previously covered in Oleb's "Making illegal states unrepresentable": https://oleb.net/blog/2018/03/making-illegal-states-unrepres...) is "parse, don't validate": https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

A third piece / concept I often circle back to is a lot more subtle and difficult to grok: “Names are not type safety” (https://lexi-lambda.github.io/blog/2020/11/01/names-are-not-...)

weinzierl 6 days ago [-]
I think it is fair to say that integral types are pretty common in software across the board. Pascal let you declare them with explicit ranges (subrange types), like this:

  month: 1..12;
You could also use this nomenclature for characters:

  letter: 'A'..'Z';
When I first learned C, I could not believe that it did not support that. I wouldn't have dreamed that almost half a century later I still don't have this feature back.

To be clear:

- Many things have improved and I'm glad I don't have to write my software in Pascal[1] today.

- I don't really miss the generalization of this feature. Things like refinement types would be nice, but what Pascal provides would be enough to make life so much better.

It also works for arrays, by the way, which very elegantly sidesteps the 0-based/1-based controversy.

  FooPerMonth: array[1..12] of Foo;

  LandingsPerRouletteWheelPos: array[0..36] of Landings;
[1] The old one, I don't know much about Delphi and later developments.
int_19h 6 days ago [-]
Take a look at Ada sometime, it really takes that to eleven.

   type Digit is range 0 .. 9;

   type Unsigned_Byte is mod 2 ** 8;

   type Binary_Floating_Point is digits 15 range -1.0 .. 1.0;

   type Binary_Fixed_Point is delta 2.0 ** (-32) * Pi range (-Pi / 2.0) .. (Pi / 2.0); 

   type Decimal_Fixed_Point is delta 0.01 digits 5;
nrr 6 days ago [-]
Conspicuously missing are Ada's arrays, which I feel properly take Pascal's to 11: describing their index bounds need not also prescribe that the array statically allocates that much storage, as in `type Unsigned_Byte_Array is array (Positive range <>) of Unsigned_Byte`, where `subtype Positive is Integer range 1 .. Integer'Last`. Substitute `Positive` with `Natural` where you like.
LorenPechtel 4 days ago [-]
Yes, this was a really nice advantage Pascal had that I'm amazed hasn't been picked up by things like C#. It's not just zero- or one-based.

(It's been long enough that I'm not completely sure of the syntax:)

  Time: array[2000..2100, 1..12, 1..31] of OneDayInfo;

This avoids allocating the 2000 years in the past while still allowing direct indexing with the year field.

And you could use enum types as array indexes. In C# you're always casting them to int whenever an enum is an array index, and every cast is a case where the compiler won't have your back in hunting for errors.

And it did *not* use duck typing.

  Month: 1..12;
  DaysOfChristmas: 1..12;

You couldn't assign a day of Christmas to a month.

The stricter the compiler, the fewer bugs will even compile, and the faster you'll find the problem.

As for his color assignment issue--you still need to be able to set it to "purple", but representing it his way pushes this back to whatever routine parsed the configuration. There's only one place that needs to make the check and anything bogus will be caught during startup.
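
A hedged sketch of that idea (the color set here is a placeholder, not necessarily the article's): parse the raw config value into a closed type once at startup, so only the parser ever needs to make the check:

  type Color = "red" | "green" | "blue";

  function parseColor(raw: string): Color {
    if (raw === "red" || raw === "green" || raw === "blue") return raw;
    throw new Error(`invalid color in configuration: ${raw}`);  // caught during startup
  }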

1899-12-30 5 days ago [-]
One really nice thing about Object Pascal (and Nim, I think) is that you can declare enums as array indexes.

  type
    TMonth = (mJan = 1, etc);

    TFooPerMonth = array[TMonth] of integer;
psd1 3 days ago [-]
That is a great feature.

You can have those semantics in .net, because any hashable can be a dictionary key. It will have a small performance penalty, which won't matter in 99% of cases. You could roll your own collection and get O(1).

F# lets union cases carry records, which is the most elegant solution IMO when the number of cases and fields is manageable.
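
A TypeScript analogue (hedged; not the .NET approach described above, and the Month type is made up): a Record keyed by a closed union behaves like an enum-indexed total array, with the compiler rejecting bad keys and demanding every entry:

  type Month = "Jan" | "Feb" | "Mar" | "Apr" | "May" | "Jun"
             | "Jul" | "Aug" | "Sep" | "Oct" | "Nov" | "Dec";

  // Missing any month is a compile error.
  const daysPerMonth: Record<Month, number> = {
    Jan: 31, Feb: 28, Mar: 31, Apr: 30, May: 31, Jun: 30,   // (ignoring leap years)
    Jul: 31, Aug: 31, Sep: 30, Oct: 31, Nov: 30, Dec: 31,
  };

  const janDays = daysPerMonth["Jan"];   // ok
  // daysPerMonth["Foo"];                // compile error: "Foo" is not a Month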

darthrupert 6 days ago [-]
Nim smells like a modern Pascal sometimes, and it has this feature as well.
patrick451 4 days ago [-]
If I could just have an enum restricted to the enumerated values, I'd be happy. It's pure insanity that even a C++ enum class can take on arbitrary integer values.
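
For comparison, a hedged TypeScript sketch (the Suit type is made up): a union of literal values is closed, so nothing outside the listed members typechecks without deliberately lying to the compiler:

  type Suit = "hearts" | "diamonds" | "clubs" | "spades";

  let s: Suit = "hearts";    // ok
  // s = "jokers";           // compile error: not a Suit
  // s = 7 as any as Suit;   // only possible by deliberately defeating the type system
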
FrankWilhoit 7 days ago [-]
Strong typing is like violence: if it isn't solving all your problems, you must just not be using enough of it.
hi-v-rocknroll 7 days ago [-]
Forth is tic-tac-toe and dependent typing is global thermonuclear war.
actionfromafar 7 days ago [-]
It would then seem, the only winning move, is not to play.
hi-v-rocknroll 7 days ago [-]
How about a nice game of Go?
0atman 6 days ago [-]
10/10, no notes XD
Leo_Germond 7 days ago [-]
I love that, going to reuse that one
FrankWilhoit 6 days ago [-]
Don't credit me, it's not original, but I don't know the actual source.
another2another 6 days ago [-]
First time I heard it, it was about XML. So, quite a while ago.
vsnf 7 days ago [-]
I'm sympathetic to this concept, and it's one I employ from time to time in the various codebases I own, but the type of person to advocate for this kind of thing is also the kind of person to write stuff like

> Types delineate the set of legally representable states |ℝ| in your application

and

> |ℝ|≥|ℙ|

Which sets off my "math" alarm. The less out-and-out math used to make a point about programming, the better, I think. This article is actually fairly good and doesn't get very mathy beyond the intro, but it still jumps out at me and made me wary to continue.

oytis 6 days ago [-]
Why should math trigger an alarm in an engineer's mind? The only thing that triggered my alarm is that it's pretty bad math - why think in terms of cardinalities when we are really talking about subsets? And why use ℝ - which is the agreed symbol for the set of real numbers - for anything other than that?
vsnf 6 days ago [-]
It's a fair question, and in this particular case the math is both light and irrelevant. But sometimes I see useful topics being discussed that are, for whatever reason, marred by mathematics. Often these topics aren't even particularly complicated, but the author's desire to give it (or themselves?) more credibility leads them to make it mathematical when it didn't need to be. See also, every discussion involving Haskell. A monad is just a Monoid in the category of Endofunctors, after all.

In most cases, especially cases of software engineering as opposed to computer science, the math gets in the way of the point. It was never needed, it just served to obscure the topic. Plus, I'm not an engineer, and I'm not doing mathematics in my daily life. I don't think like a mathematician, I don't work with cardinalities, or set theory, or integrals. At most I deal with some multiplication and powers of two. Maybe a ratio here and there. If I were a graphics programmer, which I thankfully am not, I'd toss in some matrix algebra maybe. But my career doesn't involve anything requiring the use of ℝ. What it does involve is thinking about composability of systems, debuggability, simplification of processes, and otherwise making sure things work and can be understood by future maintainers. The article's topic is useful, but making things mathy for the sake of it is a navel-gazing distraction.

chii 6 days ago [-]
> But sometimes I see useful topics being discussed that are for whatever reason, marred by mathematics.

it's not marred, but formalized by mathematics.

Maths is very unambiguous. It makes it so that you cannot interpret it wrong, as long as you learn the meaning of the symbols. The transformations of these symbols are logical operations, each following from the previous ones.

By describing processes or thoughts this way, it ensures that what you say is formal - i.e., someone else can follow the logic _exactly_ from the assumptions/axioms.

It also allows you to overlay proven theorems from other fields of maths and apply them to your current situation. By doing so, you can transform your problem into a known solved problem, and therefore have a solution. This solution might be complicated and require knowledge from a field unrelated to your problem, but I don't think that's a problem with maths itself - it's a sign of your own deficiency.

Finally, maths forces you to think systematically. It forces your brain to adopt a style of thinking that most people find difficult, but it is what it takes to solve problems holistically.

vsnf 6 days ago [-]
You'll find no argument from me on any of your points.

But I think it is often counterproductive when evangelizing a concept or explaining something to software developers.

appplication 6 days ago [-]
Two main reasons: 1) I will never be able to easily type the fancy R. Any time I need to type it I'll need to google it or find a previous reference to copy-paste. It presents bad UX for continued communication, and plain English would suffice given the math isn't the point. Similarly, it would be inappropriate to use overly grandiloquent language if the writing wasn't the point.

And 2), most engineers are not super comfortable with even relatively basic mathematical notation like this. If your audience is software engineers (and specifically not mathematicians), it’s better to say what you mean, in plain terms. While 80% of folks might know what you mean, it’s not worth losing the other 20%.

On the other hand, if the math is the point, then it is more than appropriate.

oytis 6 days ago [-]
> most engineers are not super comfortable with even relatively basic mathematical notation like this.

I might be a bit out of date here. I can imagine how it could be true for bootcamp developers who never had relevant formal education, but I don't think they are a majority. Most engineers go through some kind of higher education program, be it CS or CE, and it normally includes a significant amount of math. How can you get around getting comfortable with it?

flufluflufluffy 4 days ago [-]
are we really at a point where “software developers should be competent in math” is a hot-take?!

If most engineers truly are not familiar with basic math notation, the solution should be to teach more math to engineers, not purposefully explain things in a less formalized way.

~~I KNOW I used the terms developer/engineer interchangeably, don’t kill me~~

encody 4 days ago [-]
Author here, sorry I suck. Thanks for the feedback, I've edited the article to (hopefully) use less incendiary notation.
jamil7 6 days ago [-]
We use math in all kinds of professions and in everyday life, it's a perfectly valid way of explaining the concept I think.
lelanthran 6 days ago [-]
To other mathematicians, sure. To programmers, definitely not.
jamil7 6 days ago [-]
So only mathematicians use math to explain concepts?

The actual example in the article isn’t that great but my point is generally it’s a valid way to explain or demonstrate something. That’s part of the reason we learn it.

lelanthran 6 days ago [-]
> So only mathematicians use math to explain concepts?

No, only an audience of mathematicians grasp concepts explained exclusively with maths.

Why do you find this surprising? It's no more surprising than "Only an audience of carpenters grasp concepts explained exclusively with joinery terms".

Would you, with a straight face, make the claim that explaining something using terms like "Cheek", "Mortise & Tenon", "long grain", "Dado" and "Birdsmouth" is a valid way to explain something unrelated to carpentry?

Why then claim that using mathematics terms to explain something unrelated to that specific maths is a good idea?

I mean, this article is a good example: not only does the article get the maths wrong, the maths involved is unrelated to what the article is trying to explain.

jamil7 4 days ago [-]
Carpentry isn’t woven into everyday life and professions in the same way maths is, so I’m not sure I agree. But yes, the article itself does a bad job.
chriswarbo 6 days ago [-]
> The less out-and-out math used to make a point about programming, the better, I think.

Curry and Howard have some bad news for you https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...

hnbad 7 days ago [-]
I second this. I find that the most mathematically pure solution is usually also the one completely inadequate at handling the complexities of real-world applications, and it ends up having various extremely impure additions bolted onto it (as shell scripts or Excel spreadsheets on a shared network drive, if necessary) because refactoring it into an equally pure solution for the real-world case would take too long.
frandroid 6 days ago [-]
Okay but the solution here is to identify and parcel your cases into discrete entities. The article doesn't say "don't accept anything odd", it says "clearly identify what you accept". If you have to accept odd cases, identify them so it's clear what's happening.

So this isn't about purity, it's about being declarative, i.e. make your code say what it accepts, instead of writing broad/implicit acceptable inputs that inevitably forget cases and crash.

If you limit what you accept as inputs, then you can stop worrying about downstream error handling and debugging.
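
A hedged sketch of that (the branded Email type is illustrative, not from the article): accept a narrow, already-parsed type instead of a broad one, and the downstream code no longer needs its own error handling:

  type Email = string & { readonly __brand: "Email" };

  function parseEmail(raw: string): Email {
    if (!raw.includes("@")) throw new Error(`not an email: ${raw}`);  // checked once, at the boundary
    return raw as Email;
  }

  function sendWelcome(to: Email): void {
    console.log("welcome sent to", to);   // an Email is always well-formed here
  }

  sendWelcome(parseEmail("user@example.com"));   // ok
  // sendWelcome("not-an-email");                // compile error: string is not Email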

4ad 6 days ago [-]
> The less out-and-out math used to make a point about programming, the better, I think.

Imagine if an electrical engineer said this. Or an aerospace engineer. Or any real engineer.

pjc50 6 days ago [-]
I think in many cases it's "pseudomath": all the information is carried in the explanation, and the notation isn't doing any actual work, so you could drop the decorative notation and just leave the explanation in English.
vsnf 6 days ago [-]
Sure is a good thing I'm a software developer, not an engineer, then.
psd1 3 days ago [-]
Those professions are going to be using calculus and trig and algebra.

Lambda calculus and number theory are significantly hairier.

piva00 6 days ago [-]
Yeah, yeah, yeah, software engineering is not Real-Hard-True engineering, we get it.

With that out of the way, isn't it better if software development concepts can be explained with less math-heavy notation, keeping the math notation for the cases where precision is needed?

I for sure remember a dwindling amount of the math from my statistics bachelor's after 15+ years. I'm able to recollect it after re-reading some material, and things usually click back into place, but I won't be doing that unless it's very necessary.

1000100_1000101 5 days ago [-]
Don't reject the not real engineering folk. I'm fairly certain the only reason companies are so eager to label all software developers as engineers is because there are often loopholes in overtime laws specifically for engineers, with the logic that the engineer was in charge of the project and the schedule, and if there are problems, that's on them.

I've yet to see a software project where a software engineer was given much leeway in how a project was run or scoped, or where any of their concerns about the scope, schedule, or lack of an actual plan were taken seriously... I'll gladly trade being falsely labelled an engineer for overtime pay to clean up the mess on schedule yet again.

int_19h 6 days ago [-]
(Most of) software engineering is not engineering, it's a trade.

So?

klysm 7 days ago [-]
This principle has served me well for a long time, but it’s definitely a trade-off space. We want to enforce as many invariants as possible through the type system, but enforcing more invariants requires more complexity. Eventually you run into dependent types etc.
Leo_Germond 7 days ago [-]
I would say it's a tool with an optimal point that is located along the "heavy use" side. I think it is interesting to think of types as solidifying your specification. As such, if your spec is still changing or unclear (e.g. first impl draft, example code...), you should use some lightweight types, whereas a public API should have types that encode basically everything your comments can say about the values, operations, and memory representation of the parameters. That would be the point where I would consider that defining my types is "done" and I would consider switching to e.g. moving the functions around instead (there is a lot of low-hanging fruit in safe-by-construction approaches that might not even require types - you can't shoot yourself in the foot if the footgun is removed entirely).
4ad 7 days ago [-]
Invariants need to be encoded one way or another anyway, you can't escape that. If your type system is not sophisticated enough this gets tedious and awkward to do in the type system, so it's easier to do in (dynamic) code. But dependent types simplify this, they make it much easier to express yourself in types compared to, say, all the various fancy extensions to System F. Computing types becomes just like computing terms.

That said, there is a vast design space between Javascript/Python and dependent types. Plain old ADT suffice in 95% of cases, yet the only mainstream language with ADTs is Rust. This is a shame.

DylanSp 4 days ago [-]
I definitely agree that ADTs are a big step forward; when I learned about them from Haskell, I was kind of baffled that they weren't more common, because the concept of "this value is always either type A or type B" is pretty simple and occurs a lot. I really wish Go had included them, because it doesn't seem like a terribly complex concept, the type switch syntax would work decently for basic pattern matching, and having ADTs built into the ecosystem from the start would be much smoother.
yen223 6 days ago [-]
(Assuming ADT here refers to Algebraic Data Types, as opposed to abstract data types)

Swift, Kotlin, Scala and Typescript all support forms of ADTs, just with different names.

- Swift calls them "enums" (like Rust)

- Kotlin and Scala have "sealed classes"

- Typescript has "discriminated unions"

4ad 6 days ago [-]
I forgot about Swift, that's true (and great).

Kotlin (and even Java 15+ nowadays) can emulate ADTs with sealed classes, but the ergonomics are incredibly bad and the ecosystem is not built around the concept.

Scala has ADTs, but I would put Scala in the same category as Haskell or OCaml. A niche language.

Typescript does not have ADTs, it has union types. You can build a discriminated union out of union types, but you have to do this manually. This misses out on the ergonomics of ADTs, plus people do it in different ways.

ADTs in one way or another are becoming more mainstream, but they are very far from being accepted by default.

yen223 5 days ago [-]
Agree with you on the ergonomics of sealed classes in kotlin/scala/java. It works, but it's clunky.

I quite like Typescript's approach. You do get exhaustive switch-case matching, so that's like 80% of what I want out of sum types. Typescript also lets you enforce that a type is one of the variants of a sum type, something which e.g. Haskell doesn't let you do. I assume this is a function of Typescript doing union types, but it's pretty convenient.

Definitely see ADTs becoming mainstream, because they are genuinely useful. Biggest gap to adoption imo is that no SQL database supports sum types or anything equivalent.
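
A hedged sketch of the variant-extraction point above, using TypeScript's built-in Extract on a made-up Shape union:

  type Shape =
    | { kind: "circle"; radius: number }
    | { kind: "square"; side: number };

  type Circle = Extract<Shape, { kind: "circle" }>;     // only the circle variant

  const c: Circle = { kind: "circle", radius: 1 };      // ok
  // const notC: Circle = { kind: "square", side: 1 };  // compile error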

DylanSp 4 days ago [-]
Typescript's exhaustiveness checking can be kind of clunky sometimes, particularly if you have a switch statement that's just causing side effects and not returning a value. Last time I looked at it, I think you had to add a default case with some sort of dummy statement assigning a value to a variable with type `never`; while that's doable, the ergonomics are a bit annoying.
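
A hedged sketch of the workaround described above, with a made-up Msg union: a default branch assigning to a `never`-typed variable makes even a side-effect-only switch exhaustiveness-checked:

  type Msg =
    | { kind: "open"; url: string }
    | { kind: "close" };

  function handle(msg: Msg): void {
    switch (msg.kind) {
      case "open":  console.log(msg.url); break;
      case "close": console.log("closing"); break;
      default: {
        const unreachable: never = msg;   // fails to compile if a variant is unhandled
        void unreachable;
      }
    }
  }
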
yen223 4 days ago [-]
I recall having to add that dummy 'never' branch in the past, but I haven't had to do that in newer projects on recent versions of Typescript.
DylanSp 4 days ago [-]
I just tried it in the online playground. I didn't get any sort of error with one of the cases unhandled. https://www.typescriptlang.org/play/?noFallthroughCasesInSwi...
yen223 3 days ago [-]
Hmm. I think this is because in that situation, all cases (including the unhandled one) are "correctly" returning undefined.

I can see that if I make the branch return a string, Typescript will correctly show the "Not all code paths return a value" error.

DylanSp 3 days ago [-]
Yep, or if you declare an explicit return type for the function, TS correctly gives a "Function lacks ending return statement and return type does not include 'undefined'." error. But if you have a function that's purely side-effecting, I think you still have to do something manual.
thom 7 days ago [-]
There’s no shame in just having complex constructors to check some invariants in non-structural ways. At least you still capture and enforce the transition.
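
A hedged sketch of that constructor approach (the Percentage type is a made-up example): a private constructor plus a validating factory, so every instance in the program is already in range:

  class Percentage {
    private constructor(readonly value: number) {}

    static of(value: number): Percentage {
      if (!Number.isFinite(value) || value < 0 || value > 100) {
        throw new RangeError(`not a percentage: ${value}`);   // rejected at construction time
      }
      return new Percentage(value);
    }
  }

  Percentage.of(42);      // ok
  // Percentage.of(250);  // throws at the boundary, so no invalid Percentage ever escapes
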
4ad 6 days ago [-]
That only works if your language is value-oriented, but most existing languages use references and mutation extensively.
int_19h 6 days ago [-]
I think that's too broad and somewhat outdated. C++ has const, for example, and you're expected to use it. Java and C# give you final/readonly, and not only that, but immutability is becoming more idiomatic, with the language increasingly encouraging it. For example, in modern C#,

   record Point(double X, double Y);
is immutable, whereas the mutable equivalent is the much more verbose:

   record Point {
      public required double X { get; set; }
      public required double Y { get; set; }
   }
Java goes even further by not even having syntax for mutable records - if you want something like that, you have to write it out as a regular class.
4ad 6 days ago [-]
To reason about programs one can't consider only the good features and forget about the bad ones.

But yes, it's good that languages are moving in this direction.

psd1 3 days ago [-]
I've come out of a job of spaghetti and idiots.

My team lead used to shoot down ideas if they were "too complex". Correct and wise, right?

Except that this numskull created reams of functions that took untyped python dictionaries, did something, and passed them to other functions.

Early on, he bolted a giant mess of validation onto the perimeter. This did add value... but most data invalidity arose inside the fucking app!

He had a great eye for the complexity of abstractions, but was completely blind to the complexity of doing simple things in convoluted ways.

It was miserable. Any random PR could pass all the integration tests we had and still blow up prod, because there were orders of magnitude more code paths than lines of code.

Animats 7 days ago [-]
Classically, this is a problem in hardware logic design. It's desirable that logic circuits not be able to lock up in an invalid state. So having a state machine where invalid states cannot be represented is useful. Hardware without this property tends to need reset buttons.
YorkshireSeason 6 days ago [-]
The abstract concept here is liveness, aka "eventually something good will happen". In practice, liveness alone is insufficient because a locked-up program/processor is indistinguishable from a very slow program/processor. So you really want bounded liveness, along the lines of "within X time units, something good will happen".
ykonstant 6 days ago [-]
Make invalid states unrepresentable, but take care not to make valid states a huge pain in the ass to represent!

I program in Lean, a language that makes it extremely easy to state requirements and constraints. Using dependent types, you can even reach down into the implementations of function arguments, demand specific ranges for numerics, etc. But this imposes a correspondingly large burden on the function caller! If you have a function

    /-- pass : short for pain in the ass -/
    def pass (n : Nat) (h : 0 < n ∧ n < 5) := n^2
the caller must prove that the first argument is in the stated range and provide the proof in the second argument. You can `sorry` out, but then you are not programming with constraints. So you need to balance representability with convenience.

One way to deal with this without utter insanity is to create a certification monad and make these functions monadic, adding certificates to the return type:

    def myfun (args: _) (reqs: _) : (returns : _) (certificates : _) := sorry
and having the monad try to reconcile certificates with requirements, or open a proof block to prove the consistency manually. I want to implement something like this when I get the time, but in any case, be wary of extremes!
nraynaud 6 days ago [-]
Funny, I recently had a client telling me his system was non-refundable, so I didn't handle cancellations in the database. Later I discovered that the sales channel actually tends to manually refund orders if the request is less than 1h old.

Humans are not ready for perfect information representation.

pjc50 6 days ago [-]
This sinks so many "business transformation" projects, especially in the public sector. An organization contains explicit knowledge, which is written down in its processes, and tacit knowledge, which isn't, and may not even be known to managers! But when converting a business to software, the tacit knowledge, critical to the functioning of the business, gets lost.
nraynaud 6 days ago [-]
It’s also because developing cancellations costs money, and they don’t like spending, so they “forget” some details.
4ad 6 days ago [-]
I would argue that languages are not ready for perfect information representation. Programs explode when presented with minor inconsistencies. And changing the representation in a minor way requires rewriting the program.

With CUE, we are trying to solve this exact problem.

hi-v-rocknroll 7 days ago [-]
One of the grossest large-codebase anti-patterns is god state objects. One of the worst I've seen had to do with the state of a conference call. Depending on the state, zillions of properties were or weren't valid. It was a huge fucking mess that often allowed invalid states to creep in via edge cases that led to unrecoverable states.
wdroz 5 days ago [-]
Similar (identical?) concept to the typestate pattern [0]

[0] -- https://zerotomastery.io/blog/rust-typestate-patterns/
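
A hedged, TypeScript-flavoured sketch of the idea (the linked article uses Rust; ClosedConnection/OpenConnection are made up): operations only exist on the states where they're legal, so "send before open" doesn't compile:

  class ClosedConnection {
    open(): OpenConnection { return new OpenConnection(); }
  }

  class OpenConnection {
    send(data: string): void { console.log("sending", data); }
    close(): ClosedConnection { return new ClosedConnection(); }
  }

  new ClosedConnection().open().send("hello");   // ok
  // new ClosedConnection().send("hello");       // compile error: no `send` on ClosedConnection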

orwin 7 days ago [-]
I have a question for the author or anyone smart enough to understand the whole post:

what's the difference here between state and data? Isn't data more generic, and couldn't we have the title 'make invalid data unrepresentable'?

Because I have this instinct to do what I call, in my mind and to my friends, 'ddd', for 'data-driven programming'. It translates to first thinking about my global inputs, outputs, and transformations, then writing the data structures (I learned with C), then prototyping the core functions (basically I write the .h before the .c).

Is it the same thing? Let the data write the code (or the state - are they the same thing here?), or did I miss something important?

lmm 6 days ago [-]
Do you think "state machines" should be called "data machines"? Why or why not?

IMO "state" is a better fit for the concept that this is getting at. The point is to stop the program getting into an invalid state, not to stop the program operating on invalid data, because a program might need to process data that's invalid in some sense, and because the program state is something a bit more active than "data" (e.g. having the right data at the wrong stage of execution is also an invalid state, even though it isn't exactly "invalid data").

orwin 6 days ago [-]
Very insightful (you and your siblings), thank you all. I think I have a fundamental misunderstanding of what a state is, maybe because I mostly practice responding to external state changes? I don't know, I'll think on that. Thank you again.
orangeboats 7 days ago [-]
The way I understand it, "state" is whatever the program holds at any given point in time during its execution.

A computer program can be split into 3 large parts: input, process, output. And invalid data, in my opinion, only concerns the first part, whereas invalid states can be found throughout all parts of the program due to e.g. bugs.

That makes "make invalid states unrepresentable" more generic than "make invalid data unrepresentable". Or to put it in another way: invalid data can lead to invalid states, but not all invalid states are due to invalid data.

lelanthran 6 days ago [-]
State is a collection of values. Data is a single value.

A checkbox represents a single value. A collection of checkboxes represents something different, the state.

aeonik 6 days ago [-]
A datum is a single data point, data is the plural, and state is data that depend on time.
zamalek 6 days ago [-]
In information science, correct; in common parlance, incorrect.
pjc50 6 days ago [-]
"State" alludes to "state machine". It's natural to refer to inputs and outputs as "data", and they are definitely not "state", but the thing in the middle while processing the data? That's state, including things like the file pointer and the lexer state machine which are not part of the "data".
Hackbraten 4 days ago [-]
I’m not proficient in Haskell, but I feel the article stops short of making its point.

So I’ve learned that the constructive approach is awkward to work with in practice, and the newtype-wrapper approach is not type safe. What would a proper, type-safe implementation of the `OneToFive` example look like, then?

zamalek 6 days ago [-]
Ever since coming across this idea I have wondered what a (likely relational) database which deeply embedded these concepts would look like.
another2another 6 days ago [-]
You can actually get quite some way by adding constraints to columns. NOT NULL, default values, and a CHECK constraint on a column can keep random invalid values from bleeding into the db.

So for example, if I ever see a CHAR(1) column that looks like it's meant to hold a boolean of some sort, the best thing is to immediately lock it down to a strict subset (preferably not NULL) of 2 states like ['Y'|'N'].

I still have nightmares about a db schema that grew a fungus of boolean representations, and massive amounts of code checking for y,n,Y,N,0,1,T,F,t,f all of which were used by different teams at different times. Constraints would have stopped all that.

Here's the docs for Postgres: https://www.postgresql.org/docs/current/ddl-constraints.html

DylanSp 4 days ago [-]
As the sibling post says, constraints can cover a lot of this already. What I'm not sure about is how applications would interact with a database that provides a lot of useful invariants natively; if the application language's type system can't express those invariants, I think you'd end up with impedance mismatches that could cause a lot of frustration.
DylanSp 5 days ago [-]
I've had some idle thoughts about creating a Postgres extension that would make some of these concepts easy to implement and work with. I modeled a sum type/discriminated union a while back; I was able to make it work with constraints, but it was fairly verbose.
pdimitar 4 days ago [-]
Which part was verbose?

I feel such an extension will immediately find its audience.

DylanSp 4 days ago [-]
Writing out the constraints. The starting point was an admin_notes table where each note initially had the same fields; we later added different categories of notes, where two of the categories had fields that were only meaningful for that category. What I did was add nullable columns for all of the category-specific fields to the table, with constraints to make the category-specific columns only have values for rows of the appropriate category. The type looked roughly like this (using Rust syntax) [1]:

  struct AdminNote {
      id: String,
      text: String,
      category: CategorySpecificData,
  }

  enum CategorySpecificData {
      GeneralRequest,
      InitialRequestForm {
          appliesToBasicRequestDetails: bool,
          appliesToSubjectAreas: bool,
          appliesToAttendees: bool,
      },
      SupportingDocuments,
      ConsultSession,
      AdviceLetter {
          appliesToMeetingSummary: bool,
          appliesToNextSteps: bool,
      },
  }
Setting up the constraints took about 30 lines of SQL just for those couple of boolean fields on some of the categories; you can see the details here [2].

As far as making this a Postgres extension - I'm not sure how useful it would be when the application language doesn't have a notion of sum types. Thinking about it, what might be more useful would be a language-specific library for data validation/constraints that sets up the database constraints as well. I'm not sure, though.

[1] We weren't using Rust (this was in Go), but I figured this was the most succinct way to summarize it. Our GraphQL schema for this type was basically the equivalent of this.

[2] https://github.com/CMSgov/easi-app/blob/main/migrations/V164...

pdimitar 7 hours ago [-]
Nice, thank you. This is informative.

Though I really have to wonder if there couldn't be a way to generate the SQL constraints out of the data declaration as well. Golang's struct tags are likely not going far enough though.

terminatornet 6 days ago [-]
one of my favorite talks on this topic, "Making Impossible States Impossible" by Richard Feldman:

https://www.youtube.com/watch?v=IcgmSRJHu_8

the talk is Elm lang focused, but the concepts still apply to other languages.

johann8384 7 days ago [-]
I thought this was going to be some political thing about states and congress.