lecture 3
introduction to
supervised learning,
training your first AI !!!
for construction part:
emoji overlaying and barcode detection revisited
SeAts APp SEAtS ApP SEaTS APP
GAME TIME!
the 12
change of flow, before noodling...
let's look at this COOL AI project borrowed from Murad's slides
In the artwork Pareidolia*, facial detection is applied to grains of sand. A fully automated robot search engine examines the grains of sand in situ. When the machine finds a face in one of the grains, the portrait is recorded.
while we are learning the high-level as well as technical details about AI,
it is important for you to get familiarised with this tool,
BUT...
don't forget it is all about creativity! every single part of this unit can be challenged and modified with no limits
modify the use case, challenge the ethics in AI, come up with unconventional tasks and data, you name it
Recap{
understanding vs. hard memorising
- generalisability
- simple is beautiful: explanation in simple words
data types
function as a mathematical embodiment of "input, process and output"
function types, aka the scaffold;
one function type corresponds to a characteristic shape
parameters in a function (weights and biases), aka the muscle (see the code sketch right after this recap)
weaving and overshooting
/*end of recap*/
}
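a quick sketch of scaffold vs. muscle in code (a hypothetical toy example, assuming a straight-line function type; the names here are illustrative, not from the Colab):

# scaffold: the function type (here, a straight line)
# muscle: the parameters w (weight) and b (bias)
def line(x, w, b):
    return w * x + b

print(line(1.0, w=2.0, b=1.0))  # 3.0
print(line(1.0, w=0.5, b=0.0))  # 0.5: same scaffold, different muscle, different line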
notation is scary
introducing "repeated exposure": the amazingness of our built-in perceptual adaptation effect
introducing repeated exposure index: 小 index
sorry to add another instance of notation suffering, but this index represents how many weeks i spent understanding this notion/math idea/whatever
for example, my 小 index for "data types" is 3 (weeks), meaning 3 weeks of repeated exposure to data types
my 小 index for "weaving" is at least 8 (weeks)
my 小 index for "overshooting"? have a guess...
52 weeks! a year !
congrats if you have already grasped some sense of it, even just the slightest, because you have not seen any concrete AI example...yet
repeated exposure
we are going to see dots getting connected starting from today!!!
introduction to supervised learning{
here is a "formal" definition of machine learning
“A computer program is said to learn from experience E,
with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom Mitchell
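(e.g. for the Fashion task later today: T = classifying clothing images into categories, P = how often the model's guess is right, E = the labelled images it trains on)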
supervised learning (SL) is a subcategory of machine learning
before diving into what SL is...
SL tackles two types of tasks:
- classification
- regression
- classification: tasks where the output is a categorical value
- examples?
- sentiment analysis (does this sentence carry a happy, sad or angry emotion?)
- is this image AI-generated or real?
- who is the singer of this song?
- regression: tasks where the output is a numeric value
- examples?
- predict tomorrow's bitcoin price 🤑
- how human-like is my dog?
- how much will london rent be in 5 years? 🤫
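a tiny sketch of the difference in code (classify_sentiment and predict_rent are made-up stubs, just to show the two output types):

# classification: the output is a categorical value (a label from a fixed set)
def classify_sentiment(sentence):
    return "happy"  # one of {"happy", "sad", "angry"}

# regression: the output is a numeric value (a number on a continuous scale)
def predict_rent(years_from_now):
    return 2450.0  # pounds per month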
小 index for distinguishing between classification and regression... 1.5 weeks!
ATTENTION 😈
introducing the training process -
aka how do we train a model? or how does a model learn?
using a regression task as an example
hi im a happy caveman,
i eat potatoes 🥔,
and i collect apples 🍎
i ate 0.5🥔 and collected 2🍎 on Monday
i ate 2 🥔 and collected 5🍎 on Tuesday
i ate 1 🥔 and collected 3🍎 on Wednesday
today i ate 1.5🥔, guess how many 🍎 i can collect today? 👑
numbers alone give me a headache, so i want to plot them and look at the numeric relation between 🥔 and 🍎 visually
easy task: if i ate 1🥔 today, what is an educated guess for how many 🍎 i can collect?
back to the original problem: unfortunately we don't have a point that corresponds to 1.5🥔
what if... there were a continuous line going through all the points, so that i could look up the point corresponding to 1.5🥔 on this line?
this continuous line is the "ideal" model we want (in the form of a function)
it takes # of 🥔 as input and gives # of 🍎 as output
put differently, it maps # of 🥔 to # of 🍎
"weaving" through all the points means this line perfectly captures the relation between # of 🥔 and # of 🍎 in my existing experience (data)
an example of a "bad" line that does not weave through all points: what happens if i want to infer # of 🍎 given 1🥔?
the output does not agree with my experience (data) ☹️
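here is the "ideal line lookup" as a minimal code sketch (assuming numpy; for this toy data the perfect weave happens to be 🍎 = 2 × 🥔 + 1):

import numpy as np

potatoes = np.array([0.5, 2.0, 1.0])  # Monday, Tuesday, Wednesday
apples   = np.array([2.0, 5.0, 3.0])

# find the straight line that weaves through all three points
w, b = np.polyfit(potatoes, apples, deg=1)  # w = 2.0, b = 1.0

# look up the point on the line corresponding to 1.5 potatoes
print(w * 1.5 + b)  # 4.0 apples today 👑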
now let's look beyond this example
caution: this is a simplified example where it seems trivial to find a straight line with the perfect shape and the perfect location to put it
in real life, we have no clue where the perfect weaving is,
99.9999...9% of the time we can only start with a "bad" line and gradually move it towards the (nearly) perfect weaving position
in real life, data is a lot messier and does not fall onto a straight line
in other words, for real life data,
to find the best-fit model that goes through the data points,
we can't just look at the data and draw a nice weaving curve,
we can only start with an initially guessed "bad" curve and move+rotate it till it is in a good weaving position.
making connection:
- a model that weaves through all points: the best-fit model that goes through all data points
- a model that goes through all data points: one that perfectly captures the input-output numerical relation in the dataset (side note: we'll see this is not always good)
- such a model is hard to achieve with real life noisy data, we can only approximate it
an example of real life data...
formal term for this "gradually-moving-the-line-to-good-weaving-state" process: fit / train
- we train the model to fit the data
or
- we fit the model to data
the training process (a summary of what we just talked about)
1. have an initially guessed model (often random and imperfect)
->
2. feed data into the (imperfect) model
->
3. get the (imperfect) model output
->
4. measure how wrong this output is compared to the correct answer
->
5. use the measurement to update the model
->
back to step 2 and repeat
have an initially guessed model:
- it is a function, so it has a guessed scaffold (function type, or the basic shape if you think visually) and a guessed initial muscle
use the measurement to update the model:
- which part of the function gets updated/trained? the muscle, aka parameters, aka weights and biases
let me demonstrate the training process for the naive caveman 🍎 regression task in action
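a minimal code sketch of that demo (plain gradient descent on the caveman data; the step numbers match the summary above; this is an illustrative stand-in, not the exact in-class demo):

import numpy as np

potatoes = np.array([0.5, 2.0, 1.0])
apples   = np.array([2.0, 5.0, 3.0])

# step 1: an initially guessed model (imperfect muscle: w and b start at 0)
w, b = 0.0, 0.0
lr = 0.1  # how far each update moves the line

for step in range(1000):
    # steps 2+3: feed data into the (imperfect) model, get its output
    guess = w * potatoes + b
    # step 4: measure how wrong the output is (mean squared error)
    error = guess - apples
    loss = (error ** 2).mean()
    # step 5: use the measurement to update the muscle (move+rotate the line a little)
    w -= lr * 2 * (error * potatoes).mean()
    b -= lr * 2 * error.mean()
    # back to step 2 and repeat

print(w, b)  # approaches w = 2.0, b = 1.0, the perfect weaving position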
phew, that is what "learning" in SL is about
just another 小 index of 52 weeks ☺️
what about "supervised" in SL then?
easy, it means learning/training using data that have "correct answers", aka labels
wait, can learning be done without "correct answers"?
introducing unsupervised learning (the one that learns from unlabelled data, stay tuned!) and its close relative self-supervised learning (the term the cool kids use)
think about how human and animal babies learn, it is amazing...
/*end of introduction to supervised learning*/
}
train our first (classification) AI on a (not so) Fashion dataset
everything is prepared here
recap overfitting, the testing dataset and the training process after seeing the example (with ML terminology):
1. overfitting:
- the model hard-memorises features from the training dataset that are not essential/relevant to the task, and hence cannot generalise (recall "generalisability") what it has learned to unseen examples
2. what numbers to look at to tell if the model is overfitting:
- when the model performance (measured by a predefined metric) on the testing dataset is worse than that on the training dataset 😵
3. that is why it is important to hold out some data from the initial dataset as a "testing dataset"
4. to prepare the dataset:
- divide the initial dataset into a training dataset and a testing dataset
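a minimal sketch of the hold-out idea (made-up arrays, assuming an 80/20 split):

import numpy as np

# pretend this is the initial dataset: 100 labelled examples
data   = np.random.rand(100, 4)
labels = np.random.randint(0, 10, size=100)

# shuffle, then hold out the last 20% as the testing dataset
idx = np.random.permutation(100)
train_data, train_labels = data[idx[:80]], labels[idx[:80]]
test_data,  test_labels  = data[idx[80:]], labels[idx[80:]]
# the model only ever trains on train_data;
# if its metric on test_data is much worse, that smells like overfitting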
connect the training process with the stuff we have just seen in the code on colab:
1. have an initially guessed model with a scaffold ("build the model")
->
2. feed data into the (imperfect) model (an "epoch" means the model has seen every training data point once)
->
3. get the (imperfect) model output
->
4. measure how wrong this output is compared to the correct answer ("loss")
->
5. use the measurement to update the model (the "optimizer" defines the rule for how to update the model)
->
back to step 2 and repeat
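here is a minimal sketch of what the prepared colab roughly does (assuming Keras and the Fashion-MNIST dataset; the actual notebook's architecture and settings may differ):

import tensorflow as tf

# the (not so) Fashion dataset comes pre-divided into training and testing sets
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.fashion_mnist.load_data()
train_x, test_x = train_x / 255.0, test_x / 255.0  # scale pixels to 0..1

# step 1: "build the model", the scaffold plus randomly initialised muscle
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),  # 10 clothing categories
])

# the "loss" (step 4) and the "optimizer" (step 5's update rule)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])

# steps 2-5, repeated: each epoch = the model has seen every training point once
model.fit(train_x, train_y, epochs=5)

# check for overfitting: compare the metric on the held-out testing dataset
model.evaluate(test_x, test_y)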
construction time🔧