AlphaGo and Deep Learning

Brais Martinez talks about AlphaGo and Deep Learning on March 18, 2016.

 

Brais Martinez is a Research Fellow & Deep Learning expert at the University of Nottingham.

Transcript of Deep Learning expert Brais Martinez talking about AlphaGo, March 2016

So AlphaGo has been using deep learning to make this happen, is that right?

Yes, yes. I mean, the company that did that, well, now it's Google, right? But it was the DeepMind company, which was a start-up in London, and Google acquired them at some point before anyone actually knew what they were doing. So they sensed they were onto something nice. And the best thing they are doing is deep learning with reinforcement learning.
Right, so reinforcement learning is the way you try to teach, and deep learning is the type of algorithm that you use to try to learn that. Reinforcement learning is exactly this idea of: I'm not going to tell you exactly what the output is; I'm just going to tell you what the task is, and whether you're doing the task well or badly, and it's going to be your job, as an algorithm, to find the best way of doing that task.
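To make that reward-only idea concrete, here is a minimal sketch in Python. The three-action "task", the payoff numbers and the update rule are all illustrative assumptions for this post, not anything from DeepMind's actual system:

```python
import random

# Toy reinforcement-learning loop: the agent is never shown the "correct"
# answer, only a reward signal saying how well it did on the task.
true_payoffs = [0.2, 0.5, 0.8]   # hidden quality of each action (unknown to the agent)
estimates = [0.0, 0.0, 0.0]      # the agent's learned value for each action
counts = [0, 0, 0]

for step in range(10000):
    # Mostly pick the best-looking action, sometimes explore at random.
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = max(range(3), key=lambda a: estimates[a])

    # The environment only says "good" or "bad" (a reward), never the answer.
    reward = 1.0 if random.random() < true_payoffs[action] else 0.0

    # Update the running estimate of that action's value.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the agent ends up preferring the best action on its own
```

The point of the sketch is only that nobody tells the algorithm which action is correct; it has to discover that from the reward alone.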
The first idea we have to understand is what we're trying to do with it. Machine learning is basically trying to automate tasks that we could, in principle, do by counting. We can do basic statistics, but if we have a really large number of values we end up relying on whatever heuristics we can come up with. We might instead want to use computers to actually learn from these statistics in an optimal manner, to say: the best way of extracting this information out of the data you have is to follow these rules. The computer works this out for you; that is the basic concept of machine learning.
One typical example is if you want to invest in either an exchange market or the stock market, something like that. The people who invest have these rules, right? If the value has been going up for the last five weeks, then overall you expect that this week it will go down, because it's been reaching a maximum; or this kind of idea: you compute the average of the last ten values, and this gives you an idea of the real value, and if the price is above that it will go down, and so forth. But these are just heuristics that we think might work, and through experience we can even come to think that some are better than others. Now, we'll have something like fifty values for the different stocks in the market, and we have the currency exchange, and we have many other factors. How many rules can we find there, how many heuristics? It's an almost infinite number. So what you would like, ideally, is a way of choosing the best ones, and the computer has the capacity to check them all. You could do that with pen and paper, but you'd spend five lifetimes doing it, and maybe by the time you finished there would be no stock market. The computer does it all for you very quickly. The only thing is, it will not do something better than you could do given enough time; it just does it so much quicker that you would never get there.
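As a concrete illustration of the kind of heuristic described above (the "average of the last ten values" rule), here is a small Python sketch; the price series and the buy/sell decision are made up for the example and are not investment advice or anything from the interview:

```python
# Toy version of the "average of the last ten values" heuristic: if the
# current price is above its recent average, expect a drop; if below, a rise.
prices = [101, 103, 102, 105, 107, 106, 108, 110, 109, 111, 115]

def signal(prices, window=10):
    recent = prices[-window:]
    average = sum(recent) / len(recent)
    current = prices[-1]
    if current > average:
        return "sell"   # price above its recent average: heuristic expects a fall
    elif current < average:
        return "buy"    # price below its recent average: heuristic expects a rise
    return "hold"

print(signal(prices))
```

Machine learning, in the sense described above, is about letting the computer search over a huge space of rules like this one and pick the ones that actually work on the data.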
So now we're beginning to see cases in which machine learning is actually doing something better than humans, especially on very specific tasks, and maybe better than humans who are not experts in that task. But historically, machine learning was always worse than an expert human doing the same thing.
One of those cases is the now-famous AlphaGo. This is one of the few cases in which machine learning has managed to beat an expert, I mean the world expert, on one specific problem. And this is why everyone is talking about how this is different from chess. The main difference is that in chess you have a limited number of options; once you move one piece, then you have, I don't know how many options, but again a limited number. So you can organise that in a tree and more or less check it exhaustively. You tell the program: this is what you want to do, check all the options, and just do whatever is best according to what I told you is good. That is just sheer computational power used to check the options. It tells you our computers are really powerful, but it doesn't say anything about the algorithm itself.
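A bare-bones sketch of that exhaustive game-tree idea, in Python. The `moves`, `apply` and `score` functions are hypothetical stand-ins for real chess logic (they are where the "bit of domain knowledge" mentioned later would live), so this is an illustration of the search pattern, not a chess engine:

```python
# Exhaustive (depth-limited) game-tree search: enumerate every move, recurse
# on the resulting positions, and pick the move whose worst case is best.
# `moves(position)`, `apply(position, move)` and `score(position)` are
# hypothetical stand-ins for real game rules and evaluation.
def search(position, depth, maximizing, moves, apply, score):
    if depth == 0 or not moves(position):
        return score(position), None

    best_move = None
    if maximizing:
        best_value = float("-inf")
        for move in moves(position):
            value, _ = search(apply(position, move), depth - 1, False, moves, apply, score)
            if value > best_value:
                best_value, best_move = value, move
    else:
        best_value = float("inf")
        for move in moves(position):
            value, _ = search(apply(position, move), depth - 1, True, moves, apply, score)
            if value < best_value:
                best_value, best_move = value, move
    return best_value, best_move
```

For chess, branching factors are small enough that this kind of brute force (plus a good `score` function) goes a very long way; for Go, as the next passage explains, it does not.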
With Go it's totally different, because there are so many options that you cannot really check them; a lot of people are repeating the fact that there are more Go positions than particles in the universe. The people at DeepMind, the people who did this, have said that they really don't know what kind of tactic the computer is following; they didn't hard-code any tactic. What they did is create this machine learning algorithm that would play Go, and it would play terribly at first, just with some essentially random rules; they didn't really have a clue how to implement that, so they started with a random algorithm that played Go terribly. The point is that they pitted this algorithm against another one with different parameters, which would therefore play Go differently, but still without hard-coding anything: you just have some parameters that define how it plays, and you really don't know what the meaning of those parameters is. They would play against each other, and at some point, after say a hundred matches, one would win, and you say, OK, this one is better. Then you keep those parameters, you change them somehow, and you keep on doing that for a very, very long time. It's a bit like evolution.
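Here is a toy sketch of that keep-the-winner, perturb-the-parameters loop in Python. The "match" is a trivial stand-in (whichever parameter vector is closer to a hidden target wins), so this illustrates the general idea only, and is emphatically not DeepMind's actual training procedure:

```python
import random

# Toy self-improvement loop: two parameter vectors "play" each other, the
# winner is kept, and its parameters are perturbed to make a new challenger.
# The match itself is a stand-in: closer to a hidden target means stronger.
hidden_target = [0.7, -0.2, 0.4]

def play_match(params_a, params_b):
    def strength(p):
        return -sum((x - t) ** 2 for x, t in zip(p, hidden_target))
    return params_a if strength(params_a) > strength(params_b) else params_b

champion = [random.uniform(-1, 1) for _ in range(3)]   # starts out playing "terribly"

for generation in range(5000):
    challenger = [x + random.gauss(0, 0.05) for x in champion]  # change the parameters somehow
    champion = play_match(champion, challenger)                 # keep whichever set wins

print(champion)  # drifts toward strong parameters without anyone hard-coding a tactic
```

Nobody ever tells the loop what a good move looks like; the parameters improve only because losers are discarded.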
There is a part of computer science called evolutionary computing; that is a slightly different thing, but the concept is similar. You have a parameter space, and this describes all the possible ways of playing Go that are available to your machine learning algorithm. It will have certain possibilities that are specific to that algorithm, but you have a lot of parameters, and each parameter can take a real value, so you have infinite possibilities, and the question is how to find the best parameters for playing Go. Most machine learning algorithms have you search for the optimal parameters in some other way; this was a specific type of problem called reinforcement learning, where you basically tell the algorithm what you want it to do, but you are not sure what the correct way of doing it is. That's why DeepMind says they don't really know: they didn't give the machine examples to imitate, they said, OK, try to learn by yourself; this is what we want you to do, just learn how to do it. Every time the machine learning decides a move, that is what the algorithm is learning: we have this configuration, what is my next move? You are not specifying "the best move is this one", because no one knows. I mean, the point is that they beat the world champion, so maybe the algorithm was able to figure out which next move was better, better even than the world champion. So there's no gold standard to imitate there.
The difference in essence, I suppose, and I don't really understand the game of Go, so that doesn't help much, but I suppose the biggest thing here, and feel free to correct me, is that in chess you can brute-force it, and in Go you can't?

Yeah, I think that's very fair. I mean, you still have to do it smartly to brute-force chess, because you have to find a way to tell the computer whether a move is a smart move or not down the line, but you still check all the consequences of your move. That's kind of the standard approach: you put in a bit of domain knowledge and you put in brute force. Go is a totally different game, so that's a really fair assessment of it.
So basically there were two sets of algorithms working against each other, and every time one did a bit better than the other, they thought, yeah, something about that one is better?

Yes. There were two different sets of parameters of the same algorithm, so the decisions they would take would be different, but the algorithm was the same, a neural network. Basically, the input you give it is the current board configuration, and then you have a certain structure that decides on the move; but that structure needs some parameters, and those parameters are what, in the end, decide.
To see the difference, for example, you can think of a function that takes the input and decides on the output. You can decide that this is a linear function, so it's just a line, or a plane; it's the traditional thing of regressing a line. When you modify the parameters of the line, the model stays more or less the same: you change the slope of the line, you change the height of the line, and this gives you different lines. This is exactly the same idea, but instead of a line it was a much more complex function.

So that would be in two dimensions, but here there are just so many more dimensions?

It was not only about the dimensions; it's also about the way the parameters are structured, and this is why we use the word "deep". A line is what we call a shallow structure over the variables, because you just have some inputs, you multiply them by some values, and you get the output.
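That shallow, line-fitting picture in code: a tiny Python sketch with made-up numbers, just to show that the whole model is a couple of parameters, a slope and a height (intercept), between input and output:

```python
# A "shallow" model: the output is just the input multiplied by some values.
# For one input this is a line; changing the two parameters changes the line.
slope = 1.5       # how steep the line is
height = -2.0     # where the line sits (the intercept)

def predict(x):
    return slope * x + height

print(predict(4.0))   # one multiplication, one addition: input straight to output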
The difference with deep learning, or deep algorithms in general, is that you have a certain input, and then you get some intermediate variables by doing the same kind of thing; it's as if you fit a line, and then another line, and another line, and another line, so that when you input some values this gives you a set of values, predictions. But these predictions are not the ultimate thing you need; they are intermediate values that are, at the same time, the input to another layer of prediction, and you keep going like that. So the input to one layer is the output of the previous layer, and this hierarchical structure is much more powerful in the sense of what kinds of functions, what flexibility of functions, it can model: you can model much more complex things with many fewer variables, and this is a magnificent trade-off that gives it a lot of power.
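A minimal stacked-layer sketch in Python (using NumPy), just to show the output of one layer becoming the input of the next; the layer sizes and random weights are arbitrary assumptions for illustration, not AlphaGo's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Several "layers of lines": each layer multiplies its input by some values,
# adds an offset, squashes the result, and feeds it to the next layer.
def layer(x, weights, bias):
    return np.tanh(weights @ x + bias)

x = rng.normal(size=5)                                          # the input (e.g. a board description)
h1 = layer(x, rng.normal(size=(8, 5)), rng.normal(size=8))      # intermediate values
h2 = layer(h1, rng.normal(size=(8, 8)), rng.normal(size=8))     # this layer's input is the previous output
out = rng.normal(size=(3, 8)) @ h2                              # final prediction (e.g. scores for moves)

print(out)
```

The hierarchy, not the raw number of parameters, is what the "deep" in deep learning refers to here.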
You can go back and find historical examples, endless historical examples, of people claiming that something fantastic is right around the corner.

We should get a show, okay... and we did, okay. So that's a good start, right? We know our program works.