yeldarb 10 days ago [-]
Neat walkthrough!

Last year I actually made an applied-CoreML app to solve sudoku puzzles where MNIST came in very handy.

I wrote about it here: https://blog.prototypr.io/behind-the-magic-how-we-built-the-...

nothis 10 days ago [-]
>After I scanned a wide variety of puzzles from each book, my server had stored about 600,000 images

600,000?!? Even divided by 81 that's over 7000! How long did this take?

yeldarb 10 days ago [-]
A couple of afternoons.

I just hacked into my app's flow to upload a "scan" of the isolated puzzle to my server instead of slicing it and sending the component images to CoreML.

Then I sat there and flipped through page after page of Sudoku puzzles and scanned them from a few different angles each, sliced them in bulk on the server, and voila: data!

dangero 10 days ago [-]
Sorry I’m still confused. You took roughly 7000 pictures in two afternoons? What do you mean by sliced them in bulk? If you took them from different angles how do you slice them in bulk?
yeldarb 10 days ago [-]
Correct.

The app already had the code for "isolate the puzzle and do perspective correction" so the uploaded images all looked something like this: https://magicsudoku.com/example-uploaded-image.png

By "slicing in bulk" I mean the server was the one that split that out into 81 smaller images rather than the app doing the slicing and uploading 81 small images.

Taking them from different angles was done because the perspective correction adds distortions that I didn't want my model to be sensitive to.
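
For the curious, the server-side slicing is conceptually just cutting the corrected, roughly square puzzle image into a 9x9 grid. A rough sketch of the idea (not the actual server code; assumes Pillow):

    from PIL import Image

    def slice_puzzle(path, out_dir):
        # Assumes a perspective-corrected, roughly square puzzle image
        img = Image.open(path)
        w, h = img.size
        cell_w, cell_h = w // 9, h // 9
        for row in range(9):
            for col in range(9):
                box = (col * cell_w, row * cell_h,
                       (col + 1) * cell_w, (row + 1) * cell_h)
                img.crop(box).save(f"{out_dir}/cell_{row}_{col}.png")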

bigmit37 10 days ago [-]
Interesting stuff! I’m also a little confused as to how you took so many pictures in only a couple of afternoons.
jononor 9 days ago [-]
7000 pictures at 5 seconds per picture is "only" 10 hours of work. Possibly per-picture time can be lower than that too. Seems quite doable over 2-4 afternoons.

Props for doing the project end2end, including the non-trivial (and typically skipped) part of collecting training data.

rahimnathwani 10 days ago [-]
"Apple ... provides a ... helper library called coremltools that we can use to ... convert scikit-learn models, Keras and XGBoost models to CoreML"

Awesome.
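
For anyone curious, a minimal sketch of what that conversion looks like with coremltools (the model choice and feature names here are illustrative, not from the article):

    import coremltools
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression

    # Train a toy digit classifier on sklearn's built-in 8x8 digits set
    digits = load_digits()
    clf = LogisticRegression(max_iter=1000)
    clf.fit(digits.data, digits.target)

    # Convert the fitted model to a .mlmodel file that Xcode can import
    mlmodel = coremltools.converters.sklearn.convert(
        clf, input_features="pixels", output_feature_names="digit")
    mlmodel.save("DigitClassifier.mlmodel")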

a_c 10 days ago [-]
As someone without much experience in ML, how do you handle detecting whether a number is present or not?
ericjang 10 days ago [-]
Great question! This is actually a surprisingly deep problem in ML, known as "anomaly detection" or "out-of-distribution" (OoD) detection.

Another way to formulate this question: "given training data that only tells you about digits, how do you know whether something is a digit or not?" Given that the training data never actually defines what isn't a digit, how can we ensure that the model actually sees a digit at test time? If we cannot ensure this (e.g. an adversary or the real world supplies inputs), how can we "filter out" bad inputs?

A quick hack solution that works well in practice is to examine the "predictive distribution" across digit classes. Researchers have empirically found that entropy tends to be higher (i.e. the distribution is flatter) when the model sees an OoD input. However, the OoD problem is not fully solved.
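
A tiny sketch of that heuristic (numpy only; the threshold value is made up and would need tuning on held-out data):

    import numpy as np

    def predictive_entropy(probs):
        # probs: softmax output over the digit classes
        probs = np.clip(probs, 1e-12, 1.0)
        return -np.sum(probs * np.log(probs))

    # Flag inputs whose predictive distribution is too flat (high entropy)
    # as likely out-of-distribution. The threshold is illustrative.
    def looks_in_distribution(probs, threshold=1.0):
        return predictive_entropy(probs) < threshold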

Here's a nice survey paper on the topic: https://arxiv.org/abs/1809.04729

Note that methods that tie OoD to the task at hand (classification) are not actually solving OoD; they are solving "predictive uncertainty" for the task.

jononor 10 days ago [-]
You mean to get either 0-9 or 'no number'? Here are two approaches:

1) Integrated. Represent 'no number' as class number 11 in the original model. Retrain it with this additional class (needs additional training data).

2) Cascading. Train a dedicated model for 'number' versus 'no number' (binary classifier), and use that in front of the original model (see the sketch below).

Note that the MNIST data comes already extracted from the original images, centered in fixed-size 28x28-pixel images. In a practical ML application these steps would also need to be done before classification can be performed.
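
A minimal sketch of the cascading option (2), assuming two Keras-style models and already-preprocessed cells (names are illustrative):

    import numpy as np

    # Hypothetical cascade: a binary "is there a digit?" model in front of
    # the 10-class digit classifier. Both are assumed to be Keras models
    # taking preprocessed 28x28 grayscale patches of shape (1, 28, 28, 1).
    def classify_cell(cell, digit_present_model, digit_model):
        p_digit = float(digit_present_model.predict(cell)[0, 0])
        if p_digit < 0.5:
            return None  # treat as an empty cell
        probs = digit_model.predict(cell)[0]
        return int(np.argmax(probs))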

jononor 10 days ago [-]
In the work shown in the article, the segmentation and centering of digits looks to be done by the user holding the camera. Which can be workable for some applications!
lozenge 10 days ago [-]
The predictions variable has a confidence value for each digit. You can put a cutoff and say if none is above a certain confidence, assume there's no number at all.
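
Something like this, roughly (the cutoff is illustrative and would need tuning):

    import numpy as np

    # Cutoff on the top class probability; 0.8 is illustrative and would
    # need tuning against real puzzle images.
    def digit_or_none(probs, cutoff=0.8):
        best = int(np.argmax(probs))
        return best if probs[best] >= cutoff else None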
jefft255 10 days ago [-]
This could work, but it is important to note that a lot of ML algorithms trained in a closed domain (no "other" class) will be pretty bad at knowing what they don't know. This is an open problem in ML.
jononor 9 days ago [-]
Choosing the threshold will be hard. And (as mentioned by another poster) the model is unlikely to generalize well to classes of data it has not seen. I suspect this approach will often get things that look similar to numbers wrong, like handwritten characters (a, b, c). Including these in the training set is much more likely to yield a model that successfully discriminates them.
gunzor 10 days ago [-]
You can use a threshold value to detect whether there is no number: if the prediction confidence is below the threshold, you can treat it as no number.
zackmorris 10 days ago [-]
The scrollbar distance confirms a suspicion that I've held for some time: that writing a machine learning algorithm is of similar complexity to developing an iOS app in Xcode!
saagarjha 10 days ago [-]
What scrollbar distance are you talking about?
zackmorris 9 days ago [-]
It was a joke - the Xcode section starts about halfway down the page. I was just illustrating that the friction we deal with today is of comparable complexity to what might be thought of as advanced programming (AI, VR, AR, physics, etc etc).