Title: Principal Component Analysis from Scratch
Duration: 01:17:32
(00:00:03)
all right um YouTube tells me we should
(00:00:05)
be live by now so um you're still seeing
(00:00:09)
the intro screen which is
(00:00:11)
fine um sound check if anyone in chat
(00:00:14)
can hear me um would be nice to just
(00:00:18)
throw in a message saying we can hear
(00:00:20)
you um but uh let me just put myself up
(00:00:24)
as well so that people can see me
(00:00:26)
there's a little bit of a delay at least
(00:00:28)
that's what I noticed and uh there might
(00:00:31)
also be a commercial playing at least
(00:00:34)
I'm getting a commercial so I'm just
(00:00:36)
going to going to skip that on my phone
(00:00:38)
just using my phone to watch it so I
(00:00:41)
hope people can hear me um otherwise
(00:00:45)
this is going to be a very interesting
(00:00:47)
stream with me talking and no one being
(00:00:50)
able to hear me but uh should be fine
(00:00:52)
all right so for today principal
(00:00:54)
component analysis it's one of the most
(00:00:57)
used techniques in molecular biology not
(00:01:00)
just in molecular biology but also in
(00:01:02)
bioinformatics and well I think any field
(00:01:06)
where people look at data and do data
(00:01:08)
analysis um they use principal
(00:01:11)
component
(00:01:12)
analysis so without further Ado um all
(00:01:17)
right cool chat says that they can hear
(00:01:19)
me perfect all right so um then we'll
(00:01:22)
just jump into it so this is what we
(00:01:26)
want to talk about today or this is what
(00:01:28)
I want to talk about today because I
(00:01:29)
think that you you need to understand
(00:01:31)
all of these different subsections
(00:01:33)
before you can start understanding
(00:01:35)
principal component analysis um so we'll
(00:01:37)
be starting off with uh
(00:01:40)
autoscaling um it's one of these things
(00:01:42)
that you have to do with your data um
(00:01:44)
and then uh we'll be talking about the
(00:01:46)
covariance matrix, eigenvectors and
(00:01:49)
eigenvalues uh variance
(00:01:53)
explained little bit of a frog in my
(00:01:55)
throat um and then we'll talk about PCA
(00:01:58)
projections um because PCA is nothing
(00:02:00)
more than a reprojection of your
(00:02:02)
original data um and then we'll compare
(00:02:04)
back to uh prcomp um hello Arno
(00:02:08)
thank you for joining the chat um for
(00:02:11)
some reason I can't see the chat
(00:02:12)
messages on my phone so I have to look
(00:02:14)
at the screen there but uh it's uh it
(00:02:18)
it'll work out and then some something
(00:02:20)
about interpretation um so have
(00:02:23)
basically principal component analysis
(00:02:25)
we'll we'll start off by talking a
(00:02:27)
little bit about PCA uh what it means
(00:02:29)
how you can use it um how many principal
(00:02:32)
components you should take um and then
(00:02:35)
we we just go through so uh if anyone
(00:02:38)
has questions during the live stream um
(00:02:41)
just throw your questions in chat and I
(00:02:43)
will get to them um if I see them
(00:02:46)
because um like I said can't see it on
(00:02:49)
my phone but can see it over there all
(00:02:51)
right so first off what is principal
(00:02:54)
component analysis so principal
(00:02:55)
component analysis is a dimensionality
(00:02:57)
reduction method um so instead of
(00:03:01)
having to look at 50 different features
(00:03:04)
on 500 different um samples for example
(00:03:09)
uh what we can do using PCA is reduce the
(00:03:11)
dimensionality so look at the the two
(00:03:14)
main components against each other right
(00:03:17)
so this will allow us to find like large
(00:03:20)
scale um clustering in data or at least
(00:03:23)
that's generally what we look at um so
(00:03:26)
it is also used to compress and reduce
(00:03:28)
the dimensionality of large data so this
(00:03:30)
compression is part of the PCA method um
(00:03:36)
so um so it works uh very basically so
(00:03:40)
compression and reduction of data right
(00:03:42)
so you can use for example uh a couple
(00:03:44)
of principal components to
(00:03:46)
represent kind of the major vectors in
(00:03:48)
your data see and now I can't read chat
(00:03:51)
properly because it's so freaking small
(00:03:54)
"autoscaling works like normally or we
(00:03:56)
have to separately work on the data?" so
(00:03:57)
that uh yeah no, if you do it
(00:04:00)
from scratch you need to autoscale your
(00:04:01)
data first um because otherwise the
(00:04:04)
covariance matrix goes
(00:04:06)
haywire um but like I said it's a way to
(00:04:09)
reduce dimensionality so it allows you
(00:04:11)
to in a single plot look at the things
(00:04:14)
which have like the major effect on your
(00:04:16)
data not only that but if you take the
(00:04:18)
first principal component and you
(00:04:20)
correlate it back to the features that
(00:04:22)
you have um you can also find which
(00:04:24)
features are contributing to one or two
(00:04:27)
or to the first or to the second
(00:04:28)
principal component so so it's a very
(00:04:30)
old methodology um it was invented in
(00:04:33)
1901 by Karl Pearson um the guy that
(00:04:36)
also brought us correlation, covariance
(00:04:39)
and most of our linear modeling toolkit
(00:04:42)
um so it's a very old method 123 years
(00:04:45)
now um but it's still used every day um
(00:04:49)
although there's newer methods nowadays
(00:04:50)
like t-SNE um which are different types of
(00:04:54)
method uh but the nice thing I like
(00:04:56)
about PCA is that PCA draws from your
(00:04:59)
data so it's it's based on your data um
(00:05:02)
and it doesn't use any of the labels
(00:05:04)
that you assign to the different samples
(00:05:07)
um so basically I have a little example
(00:05:09)
here so here we we kind of see a normal
(00:05:11)
axis right so we have like a variable
(00:05:14)
which is plotted on the x-axis we have
(00:05:16)
another variable plotted on the y-axis um
(00:05:19)
and then we want to know which are the
(00:05:20)
main components of the data um so the
(00:05:23)
main components of the data is of course
(00:05:25)
the first component you can you can draw
(00:05:27)
a line the red line here so this red
(00:05:29)
line here um this
(00:05:34)
gives you the most variance uh captures
(00:05:36)
the most variance in your data right
(00:05:38)
because the the the variance in the
(00:05:40)
x-axis so in this direction is larger
(00:05:43)
than the variance we see like this um so
(00:05:46)
the second principal component is always
(00:05:49)
going to be um orthogonal so it's always
(00:05:52)
going to be perpendicular to the first
(00:05:55)
principal component so principal
(00:05:58)
component analysis with two variables is
(00:06:00)
nothing more um than rotating the axes
(00:06:04)
that you have right so if we would take
(00:06:06)
the x-axis and we would start rotating
(00:06:08)
the x-axis then the y- axis would rotate
(00:06:10)
as well so at some point once the
(00:06:13)
x-axis goes through most of the
(00:06:16)
data points right so that the sum of
(00:06:18)
squared distances of the data points to the x-axis
(00:06:21)
is minimized um then those are the
(00:06:23)
principal components so when we do
(00:06:24)
principal component analysis with just
(00:06:26)
two features It's relatively easy um
(00:06:29)
because it's nothing more than just
(00:06:31)
rotating the axis um to fit the data
(00:06:35)
best so principal components, PCs, they are
(00:06:39)
linear combinations of the original data
(00:06:42)
right so we have the original data and
(00:06:44)
then when we do PCA analysis the first
(00:06:47)
principal component is just a linear
(00:06:49)
combination of the original features uh that
(00:06:51)
means that if we take all of the
(00:06:53)
principal components that we have which
(00:06:55)
is generally the same number as the
(00:06:57)
number of features then the data that we
(00:06:59)
describe is exactly identical to the
(00:07:01)
original data that we have so there is
(00:07:04)
there is no loss in that sense right so
(00:07:07)
there's no information being lost by
(00:07:09)
doing PCA um but we can look at subsets
(00:07:13)
of the data so if we look at the first
(00:07:15)
five principal components in a system
(00:07:17)
where we have 20 features um then of
(00:07:20)
course we are only capturing a certain
(00:07:22)
amount of variance but if we then would
(00:07:24)
look at all 20 components themselves
(00:07:27)
then we would capture the whole original
(00:07:29)
data set
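The "no loss with all components" point can be checked numerically. The sketch below is an editorial illustration, not code from the stream: it assumes base R's `prcomp()` as a stand-in for the from-scratch version built later, and the names `Z`, `full`, `partial` and `captured` are made up for this example.

```r
# Assumption: prcomp() from base R (stats package) stands in for the
# from-scratch PCA that the stream builds later on.
X <- as.matrix(iris[, 1:4])
pca <- prcomp(X, center = TRUE, scale. = TRUE)

Z <- scale(X)  # autoscaled data: each column has mean 0 and sd 1

# All 4 components: scores times the transposed loadings gives Z back
full <- pca$x %*% t(pca$rotation)
max_err_full <- max(abs(full - Z))        # numerically zero

# Only the first 2 components: an approximation of Z, not an exact copy
k <- 2
partial <- pca$x[, 1:k] %*% t(pca$rotation[, 1:k])
max_err_partial <- max(abs(partial - Z))  # clearly larger than zero

# Share of the total variance captured by the first k components
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
captured <- sum(var_explained[1:k])
```

Keeping every component is a lossless change of coordinates; dropping components trades accuracy for a simpler picture, and `captured` tells you how much variance survives.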
(00:07:30)
so PCs are orthogonal so orthogonal
(00:07:34)
means that they are at a 90° angle
(00:07:37)
relative to each other um like the
(00:07:39)
x-axis and the y-axis in a normal plot
(00:07:42)
and this is important because this will
(00:07:43)
come back and this also helps us to
(00:07:46)
interpret our data right because
(00:07:47)
generally when we have our data and our
(00:07:50)
data consists of um 60 70 different
(00:07:53)
features um then all of these features
(00:07:55)
are collinear to each other um in some
(00:07:58)
way it's very very common to have
(00:08:00)
features um for example the the size of
(00:08:03)
an animal and the length of the tail
(00:08:05)
of an animal uh to be correlated to
(00:08:08)
each other so the thing that principal
(00:08:10)
component analysis does is it gives us
(00:08:12)
vectors representing the original data
(00:08:15)
but each of these vectors
(00:08:18)
are uncorrelated to each other so the
(00:08:21)
correlation between any two principal
(00:08:23)
components is always going to be
(00:08:26)
zero so the variance explained by
(00:08:29)
principal components decreases the first
(00:08:31)
principal component explains the most
(00:08:33)
variance the second a little bit less
(00:08:35)
the third even less and so on until you
(00:08:38)
hit the last principal component and at
(00:08:40)
that point if you would sum up the
(00:08:41)
variance explained across all of your
(00:08:43)
components you would have 100% of the
(00:08:46)
variance explained right because the
(00:08:48)
original data is represented by a linear
(00:08:52)
combination of all principal
(00:08:54)
components so what we can do is is of
(00:08:57)
course the number of principal
(00:08:58)
components that we can compute is normally
(00:09:01)
equal to the number of features that we
(00:09:03)
have um but the number of principal
(00:09:05)
components can also be smaller right if
(00:09:07)
we have two features which are exactly
(00:09:09)
collinear with each other um so there's
(00:09:11)
there's no difference between them for
(00:09:13)
example if we would calculate the the
(00:09:16)
body weight um in kilograms and we would
(00:09:19)
calculate or measure the body weight in
(00:09:21)
in in grams then these two things would
(00:09:23)
be the same right so at that point we
(00:09:26)
will end up with one less principal
(00:09:28)
component compared to the original
(00:09:31)
data so people always ask me how many
(00:09:33)
principal components should I keep and
(00:09:35)
the question or the answer to that is
(00:09:38)
that it depends on your application
(00:09:40)
right so uh depending on what you want
(00:09:42)
to do you might want to um describe your
(00:09:45)
data so you want to get the main two
(00:09:47)
directions in your data then if you want
(00:09:49)
to only get like the main two components
(00:09:51)
in your data then of course you only
(00:09:53)
have to use two principal components um
(00:09:56)
but you can use a lot more um hey Dion
(00:09:58)
welcome to the Stream
(00:10:00)
um so the number of principal
(00:10:02)
components that you have and the number
(00:10:03)
of principal components that you are
(00:10:05)
going to use later on uh varies or
(00:10:08)
depends a lot on on what you want to do
(00:10:10)
so generally if you think about QTL
(00:10:12)
mapping or genome-wide association uh you
(00:10:15)
would take one two or three principal
(00:10:18)
components to describe kind of the
(00:10:19)
overall population structure um and you
(00:10:22)
would regress those out of your
(00:10:25)
phenotype generally uh but it really
(00:10:27)
depends on your application there's
(00:10:29)
nothing really to say how many you
(00:10:31)
should use right if you want to um just
(00:10:34)
get your data to be completely
(00:10:37)
orthogonal um then of course you want to
(00:10:39)
keep all of them because otherwise you
(00:10:40)
start losing information so the more um
(00:10:44)
the more principal components you
(00:10:46)
include the better you capture the
(00:10:48)
original data um but the the harder it
(00:10:51)
again becomes to interpret it so the it
(00:10:54)
it really
(00:10:56)
depends all right so for today we're
(00:10:58)
going to do um from scratch so we're
(00:11:00)
going to write our own principal
(00:11:02)
component analysis um and we're not
(00:11:05)
going to use any libraries we're not
(00:11:06)
going to use any packages um so we're
(00:11:08)
just going to use R basic R
(00:11:12)
um all right question in chat um I did
(00:11:15)
not understand what you meant by
(00:11:16)
collinearity of the data um if those two
(00:11:18)
features have different levels of
(00:11:20)
measurement the PCA can be applied to
(00:11:22)
them height and weight yeah so height
(00:11:23)
and weight they are collinear right
(00:11:26)
because generally the taller a
(00:11:29)
person is the more they will weigh but
(00:11:32)
this is not a perfect collinearity right
(00:11:34)
so there is still um if you would take
(00:11:37)
both of the measurements then still some
(00:11:39)
people who are a little bit shorter than
(00:11:41)
other ones will still, um,
(00:11:44)
or rather they will
(00:11:46)
still weigh a little bit more right so
(00:11:48)
collinearity is when there is a
(00:11:52)
nonzero correlation between two
(00:11:54)
things right but of course if we measure
(00:11:57)
body weight in kilograms and we measure
(00:11:59)
body weight in grams then these two
(00:12:01)
things are identical right so their
(00:12:04)
correlation is going to be exactly one
(00:12:07)
so at that point um a single principal
(00:12:10)
component can more or less describe both
(00:12:12)
of these features so in that sense we
(00:12:15)
will end up with the number of features
(00:12:17)
that we have minus
(00:12:19)
one all right so from scratch um I think
(00:12:23)
that yeah okay perfect so from scratch
(00:12:27)
so we're going to use a data set so this
(00:12:29)
is the data set that everyone uses in R
(00:12:32)
um I like the data set it's Edgar
(00:12:34)
Anderson's iris data set um and it's a
(00:12:37)
very very commonly used data set in
(00:12:40)
R it has uh 150 measurements um so it
(00:12:44)
measured 50 different flowers from each of three
(00:12:46)
species of irises um so you have the
(00:12:49)
setosa, the versicolor and the
(00:12:51)
virginica and just to make things clear I
(00:12:54)
plotted them here so the setosa is this
(00:12:56)
blue one uh the versicolor, like the
(00:12:59)
name says, can be many many different colors
(00:13:01)
but I took a like nice purple one um
(00:13:04)
which has these like yellow and white
(00:13:06)
accents on them um and then you have the
(00:13:08)
virginica and the virginica is a is a is
(00:13:10)
a white one with a little bit of a
(00:13:12)
bluish hue on top of it so very beautiful
(00:13:15)
flowers um very interesting flowers um
(00:13:19)
and this is the standard data set that
(00:13:21)
almost everyone uses when they do
(00:13:23)
something in R so it's just available
(00:13:25)
you can type data(iris) um and you'll
(00:13:28)
have the data available in R so it has
(00:13:32)
four features um so they have measured
(00:13:35)
from each of these 50 flowers that they
(00:13:38)
had they measured uh the petal length
(00:13:41)
and width and the sepal length and width
(00:13:44)
right so basically if you have a flower
(00:13:47)
um then the petals are the the the
(00:13:49)
leaves that surround the flower which
(00:13:51)
are the nice colorful ones um and the
(00:13:54)
sepal is where the stem um transforms into
(00:13:58)
the flower and these are generally green
(00:14:00)
leaves underneath the flower to support
(00:14:03)
the the more or less the flower shape um
(00:14:06)
so again like petal and sepal um they are
(00:14:10)
relatively collinear with each other um
(00:14:12)
and of course the length and the width
(00:14:14)
of the petal and the sepal are also
(00:14:16)
collinear with each other a little bit
(00:14:19)
right but but in the end it's just 150
(00:14:22)
flowers of three different species um
(00:14:24)
and they measured four features so
(00:14:26)
that's the data set very basically um
(00:14:28)
and of course when we load the data in R
(00:14:31)
we can type data(iris) um and we have to
(00:14:33)
get our values from it um so I want to
(00:14:36)
transform these as a matrix so I take
(00:14:38)
the first four columns which are sepal
(00:14:41)
length, sepal width, petal length and petal
(00:14:43)
width um transform it into a numeric
(00:14:46)
Matrix and store this as values um and
(00:14:49)
then there's the fifth column of the
(00:14:50)
Matrix which contains the name of the
(00:14:52)
species so I'm just going to say
(00:14:54)
as.factor because it's a factor so
(00:14:56)
a categorical variable with three levels
(00:15:00)
um be it setosa, versicolor and
(00:15:03)
virginica
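The loading step just described can be sketched in R as follows. This is an editorial reconstruction, not a copy of the slide: `values` is the name used in the stream, while `species` is an assumed name for the factor.

```r
data(iris)  # Edgar Anderson's iris data, shipped with base R

# Columns 1 to 4 hold the four measurements; keep them as a numeric matrix
values <- as.matrix(iris[, 1:4])

# Column 5 holds the species: a categorical variable with three levels
species <- as.factor(iris[, 5])

dim(values)      # 150 flowers (rows) by 4 features (columns)
levels(species)  # the three species names
```

Keeping the labels in a separate object makes it explicit that PCA itself only ever sees `values`; the species are used afterwards, for example to color a plot.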
(00:15:05)
right so once we have loaded our data we
(00:15:08)
need to autoscale our data so
(00:15:10)
autoscaling in PCA is done to remove the
(00:15:14)
effect of uh scale right you can imagine
(00:15:17)
that if petals are between 50 cm and 1
(00:15:23)
meter um there's a lot of variance in
(00:15:25)
the size of the petals right but if
(00:15:27)
another variable has a small range so
(00:15:31)
for example from 3 cm to 4.5 cm
(00:15:35)
then of course the first one is going to
(00:15:38)
show a lot more variance than the second
(00:15:40)
one so to remove that we need to more or
(00:15:44)
less standardize our data and in PCA
(00:15:47)
standardization is always done uh using
(00:15:51)
uh autoscaling um this is also called
(00:15:53)
unit variance scaling right so basically
(00:15:56)
it works like this um you center each
(00:15:58)
column around zero so you take a column
(00:16:01)
of the Matrix and you subtract the mean
(00:16:04)
of the column and then you normalize
(00:16:08)
your variance so normalizing your
(00:16:10)
variance is done by dividing by the
(00:16:13)
standard deviation right so if you now
(00:16:15)
have a phenotype which has a large amount
(00:16:17)
of variance it will have a large
(00:16:19)
standard deviation so dividing by it
(00:16:21)
will mean that um you end up with values
(00:16:24)
which have a standard deviation of one, so
(00:16:27)
most values fall roughly between -3 and +3 right
(00:16:30)
so and of course since you centered your
(00:16:33)
data um all the features end up on the same
(00:16:36)
scale, in units of standard deviations, more or less
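The two steps above (center each column, then divide by its standard deviation) can be sketched as a small R function. The name `autoscale` follows the stream; the body is one possible implementation under that description, and `Z` is an assumed name for the result.

```r
autoscale <- function(X) {
  # For every column: subtract the column mean, then divide by the column sd
  apply(X, 2, function(x) (x - mean(x)) / sd(x))
}

Z <- autoscale(as.matrix(iris[, 1:4]))

round(colMeans(Z), 10)  # every column mean is now 0
apply(Z, 2, sd)         # every column standard deviation is now 1
```

Base R's `scale()` performs the same centering and scaling by default, so it makes a convenient cross-check for a hand-written version.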
(00:16:40)
right so this is the function that you
(00:16:41)
can use to do autoscaling um so the
(00:16:43)
autoscale function is is relatively
(00:16:46)
simple so it's a function um it it has a
(00:16:49)
capital x as input so Capital X's so
(00:16:53)
generally in R when you use small
(00:16:55)
letters like this small letter X it
(00:16:58)
means that you are looking at a vector
(00:17:01)
and uh capital letters um
(00:17:05)
denote matrices right so it gets a
(00:17:08)
matrix as input this Matrix is called x
(00:17:11)
uh what are we going to do to it well
(00:17:12)
we're going to apply to x to the columns
(00:17:16)
a new function so this is an anonymous
(00:17:18)
function we're not going to name the
(00:17:20)
function um but what this function will
(00:17:22)
do it will take the column that we're
(00:17:24)
currently looking at subtract the mean
(00:17:27)
of the column and then divide by the
(00:17:29)
standard deviation of the column and
(00:17:31)
then we're going to return this whole
(00:17:33)
thing back to the caller right so that's
(00:17:36)
how this function works so we apply to
(00:17:39)
the Matrix to the columns this function
(00:17:41)
to standardize every column by
(00:17:44)
subtracting the mean and dividing by
(00:17:46)
the standard
(00:17:49)
deviation so the next step is going to
(00:17:52)
be computing the covariance matrix so
(00:17:56)
covariance means uh or the idea behind
(00:17:59)
covariances is that if you have two
(00:18:02)
features right if one feature is higher
(00:18:04)
than the average and the
(00:18:07)
corresponding feature or the
(00:18:09)
corresponding value of of the other
(00:18:11)
feature is also above the average then
(00:18:14)
that means that these two things are
(00:18:16)
more or less similar right because they
(00:18:18)
are above the mean um same thing for
(00:18:20)
stuff which is below the mean right so
(00:18:23)
so the idea is is that if you if you
(00:18:25)
look at a vector of values for one
(00:18:29)
feature and you have another vector of
(00:18:30)
values for the other feature after
(00:18:32)
you've autoscaled them um values above zero
(00:18:36)
mean that you are larger than average
(00:18:38)
values below zero mean that you're smaller
(00:18:40)
than average so covariance works in
(00:18:43)
this way by just saying well okay so we
(00:18:45)
have the mean of the column so the mean
(00:18:48)
of the column in our case is always
(00:18:49)
going to be zero because we autoscaled
(00:18:51)
it so the mean of X is going to be zero
(00:18:54)
the mean of Y is going to be zero so
(00:18:56)
what are we going to do well we're going
(00:18:58)
to take the values of X then subtract
(00:19:01)
the mean of X and then multiply this by
(00:19:04)
y minus the mean of Y right so we have
(00:19:07)
two different vectors for example the
(00:19:10)
sepal length and the sepal width um and now
(00:19:12)
we want to see how they co-vary so if
(00:19:15)
one of them is high and the other one is
(00:19:17)
high then this positively contributes to
(00:19:20)
the co-variance while if one is low so
(00:19:23)
below the average it's a negative value
(00:19:25)
and the other one is also below the the
(00:19:28)
the mean uh it also is a negative value
(00:19:31)
so you're multiplying two negative
(00:19:32)
values together which again becomes a
(00:19:34)
positive value right so the only way
(00:19:37)
that the covariance cancels out is if
(00:19:39)
one of the measurements of a plant is
(00:19:42)
above and the other measurement is below
(00:19:44)
because then you get a negative value
(00:19:46)
which then subtracts from the covariance
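Written out, the quantity being described is `cov(x, y) = sum((x - mean(x)) * (y - mean(y))) / (n - 1)`, where `n` is the number of flowers. A minimal R sketch of a function that fills a feature-by-feature matrix with these values is shown below. It follows the description given in the stream, but it is an editorial reconstruction rather than the exact slide code, and the names `covariance`, `out`, `vx` and `vy` are assumptions.

```r
covariance <- function(X) {
  # Output matrix filled with NA, feature names on both rows and columns
  out <- matrix(NA, ncol(X), ncol(X),
                dimnames = list(colnames(X), colnames(X)))
  for (x in colnames(X)) {
    for (y in colnames(X)) {
      vx <- X[, x] - mean(X[, x])  # deviations from the mean of column x
      vy <- X[, y] - mean(X[, y])  # deviations from the mean of column y
      # Same-sign pairs add to the sum, opposite-sign pairs subtract from it
      out[x, y] <- sum(vx * vy) / (nrow(X) - 1)
    }
  }
  out
}

X <- as.matrix(iris[, 1:4])
C <- covariance(X)
all.equal(C, cov(X))  # compare against R's built-in cov()
```

The double loop visits all 16 column pairs, including each column with itself, so the diagonal of `C` holds the plain variances.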
(00:19:49)
right so the covariance function is here
(00:19:51)
um implementing it in R you can do it
(00:19:54)
like this um so what we say again we
(00:19:56)
have covariance it takes a matrix as
(00:19:59)
input um and the first thing that we do
(00:20:01)
is we build a new Matrix filled with
(00:20:04)
NAs um and this is because we want to
(00:20:07)
have a matrix to store our covariance
(00:20:10)
values in right so we have four features
(00:20:13)
so the features are in the columns so
(00:20:15)
that means that we have a matrix which
(00:20:18)
is 4 x 4 so it's number of columns
(00:20:22)
of X by number of columns of X so it's a
(00:20:24)
4x4 Matrix and I'm going to give the
(00:20:27)
names on the rows and I'm going to give
(00:20:29)
the names on the column and that is just
(00:20:31)
going to be sepal length, petal length, sepal
(00:20:34)
width so those are the column names of X
(00:20:37)
um I repeat them twice because they are
(00:20:39)
both the row names as well as the column
(00:20:41)
names right so in the first line I'm
(00:20:43)
making just an output Matrix a 4x4
(00:20:46)
output Matrix which has the proper names
(00:20:49)
on the rows proper names on the column
(00:20:52)
all right and then I'm going to go
(00:20:53)
through the column names of X as x I'm going
(00:20:55)
to go through the column names again as y um
(00:20:57)
and of course um then I need to compute
(00:21:01)
this term so I'm going to call this VX
(00:21:05)
right so it is from Matrix X take column
(00:21:10)
number X and subtract the mean of this
(00:21:14)
column I'm going to do the same thing
(00:21:16)
for VY so I'm going to take the Y column
(00:21:19)
the Y column from Matrix X and subtract
(00:21:22)
the mean of the Y column and then I'm
(00:21:25)
going to just multiply vx and vy
(00:21:28)
together and then sum all of the values
(00:21:31)
up and then I'm going to divide by the
(00:21:33)
number of rows of x minus one so the
(00:21:36)
dividing here is standardizing based on
(00:21:38)
the number of values that you have um it
(00:21:41)
is minus one um because you lose one
(00:21:43)
degree of Freedom so that's that's kind
(00:21:46)
of You could argue about why not divide
(00:21:49)
by n um it doesn't matter too much but
(00:21:51)
the covariance definition is to divide
(00:21:53)
by n minus one because of the fact that
(00:21:56)
you lose one degree of freedom because
(00:21:58)
of the mean
(00:22:00)
calculations so we're going to do this
(00:22:02)
for all of the columns so I'm going to
(00:22:04)
compare column one to column one column
(00:22:06)
one to column two column one to three
(00:22:09)
and so on so I'm going to just iterate
(00:22:11)
through all four possible or through all
(00:22:13)
16 possible combinations I'm going to
(00:22:16)
compute the co-variance and then I'm
(00:22:18)
just going to return the covariance
(00:22:20)
Matrix once I'm done right so it's a a
(00:22:23)
very basic function um to compute
(00:22:26)
covariance um there's a function in R
(00:22:29)
that does it as well um but since we're
(00:22:31)
building it from scratch I thought why
(00:22:32)
not just show you the formula explain
(00:22:34)
you how the formula works right so if
(00:22:36)
you have above average and above average
(00:22:39)
it adds to the co-variance if one of
(00:22:41)
them is above average the other one is
(00:22:42)
below average um it takes away
(00:22:46)
from the co-variance value um and then
(00:22:48)
in the end we divide by the number of
(00:22:50)
comparisons that we did minus
(00:22:54)
one all right so this is going to be the
(00:22:58)
hardest part because we need a third
(00:23:01)
function so the third function is going
(00:23:04)
to be the Matrix decomposition function
(00:23:08)
because PCs are orthogonal right so we
(00:23:12)
first have our data we autoscale our
(00:23:15)
data and then we need to compute the
(00:23:18)
orthogonal basis so the eigenvalues and
(00:23:21)
the eigenvectors of our principal
(00:23:24)
component analysis right so we need to
(00:23:27)
transform our covariance matrix into
(00:23:30)
an orthogonal matrix because our
(00:23:32)
covariance matrix still is not
(00:23:35)
uncorrelated right so we can do that by
(00:23:38)
decomposition of a matrix into a product
(00:23:40)
of matrices right so we have Matrix a
(00:23:44)
which is our co-variance Matrix and now
(00:23:46)
we want to define a into two separate
(00:23:49)
matrices so we want to split it into
(00:23:51)
Matrix Q which is orthogonal and then we
(00:23:55)
have Matrix R which is the upper
(00:23:57)
triangular Matrix so by multiplying q
(00:24:00)
and R together I get back my co-variance
(00:24:03)
matrix but I can then use this matrix Q
(00:24:07)
to then um multiply my
(00:24:13)
data with right because matrix Q is
(00:24:16)
orthogonal so that means that if I would
(00:24:19)
take my autoscaled data multiply it by
(00:24:22)
the Q Matrix then now I end up with a
(00:24:25)
score Matrix and the score Matrix is the
(00:24:28)
the principal component Matrix because
(00:24:31)
the first principal component is
(00:24:33)
in the first column the second principal
(00:24:35)
component is in the second column right
(00:24:37)
because Q is a big matrix it will have
(00:24:42)
the eigenvectors as its columns, that is how we computed it
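The projection step (autoscaled data times the eigenvector matrix gives the score matrix) can be tried out before the from-scratch eigenvector routine exists. The sketch below is an editorial illustration that assumes base R's `eigen()`, `cov()` and `scale()` as stand-ins for the hand-written functions; `Q`, `scores`, `orth_err`, `offdiag` and `explained` are names chosen for this example.

```r
# Assumption: eigen() from base R supplies the eigenvectors here, in place
# of the QR-based function that the stream builds.
Z <- scale(as.matrix(iris[, 1:4]))  # autoscaled data
A <- cov(Z)                         # 4 x 4 covariance matrix
e <- eigen(A)
Q <- e$vectors                      # one eigenvector per column

scores <- Z %*% Q                   # column 1 is PC1, column 2 is PC2, ...

# Q is orthogonal: t(Q) %*% Q is the identity matrix
orth_err <- max(abs(crossprod(Q) - diag(4)))

# The principal components are uncorrelated with each other
pc_cor <- cor(scores)
offdiag <- max(abs(pc_cor[upper.tri(pc_cor)]))

# The variance of each component equals its eigenvalue, largest first
pc_var <- apply(scores, 2, var)
explained <- pc_var / sum(pc_var)   # fractions that add up to 1
```

Note that the sign of each eigenvector is arbitrary, so scores from different implementations can differ by a factor of -1 per column without changing the analysis.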
(00:24:45)
um so we can take the autoscaled data and
(00:24:48)
then we can project it using the Q
(00:24:50)
matrix and now we have our principal
(00:24:53)
components right so the eigenvectors in
(00:24:55)
the Q matrix you can think about them as
(00:25:00)
the axis rotations right so we have if
(00:25:04)
we have two orthogonal axes and we have
(00:25:08)
two features um then the Q Matrix is
(00:25:11)
nothing more than a rotation of the two
(00:25:15)
axes so a rotation of the first axis to
(00:25:18)
more or less change the the the
(00:25:22)
perspective at which we are looking at
(00:25:24)
the
(00:25:25)
data so that is how principal components
(00:25:28)
analysis works it is nothing more than
(00:25:31)
modifying and tweaking the axes from which
(00:25:34)
we are looking at our data so orthogonal
(00:25:38)
means that everything is uncorrelated with
(00:25:41)
each other so like I said we need to
(00:25:43)
decompose our Matrix into a product of
(00:25:45)
matrices so we take our covariance
(00:25:48)
Matrix we then compute an orthogonal
(00:25:51)
matrix which multiplied together with this
(00:25:54)
upper triangular Matrix um causes
(00:25:58)
these two matrices combined to be
(00:26:01)
identical to A we then take our Q matrix
(00:26:05)
which now has the eigenvectors so the
(00:26:07)
different rotational vectors for the
(00:26:10)
different axes if we then multiply that
(00:26:13)
with our autoscaled data we end up with a
(00:26:17)
score Matrix of which each individual
(00:26:20)
column of the score Matrix is going to
(00:26:22)
be a principal component so this looks
(00:26:27)
like a lot of magic um and it kind of is
(00:26:30)
because linear algebra
(00:26:33)
or Matrix multiplications generally are
(00:26:36)
relatively magic um but this is
(00:26:39)
something that you just have to well I
(00:26:43)
would say deal with now it's not deal
(00:26:45)
with because you can learn how it works
(00:26:47)
but it takes a lot of time to read up on
(00:26:50)
how to do decomposition right because in
(00:26:53)
the end um it is something which is um
(00:26:57)
very similar to kind of projection
(00:26:59)
matrices in computer Graphics right so
(00:27:01)
if you do computer Graphics um you also
(00:27:04)
have like a a triangle which you then
(00:27:07)
multiply the vector or the
(00:27:09)
vertices of the triangle so each
(00:27:12)
individual point you multiply that by a
(00:27:14)
projection Matrix uh to put it somewhere
(00:27:17)
in the 3D world that you are creating so
(00:27:21)
PCA is the same um but it just doesn't
(00:27:23)
do it in
(00:27:26)
three dimensions it does it in x
(00:27:28)
Dimensions where X is the number of
(00:27:30)
features that you are looking
(00:27:34)
at so we compute the eigenvectors here
(00:27:37)
through the Gram-Schmidt process because
(00:27:40)
eigenvectors cannot be, well, for a 2x2
(00:27:43)
and a 3x3 you can still write down
(00:27:46)
an exact answer by hand um however for
(00:27:51)
larger problems in general you can only numerically
(00:27:54)
approximate the eigenvectors so that is
(00:27:58)
the way that it works so we need to
(00:28:00)
compute uh Q the eigenvectors through
(00:28:03)
something which is called the Gram-Schmidt
(00:28:04)
process so you take an initial
(00:28:06)
estimate and then you refine the
(00:28:09)
estimate the further you go
(00:28:11)
along so how do we compute that so well
(00:28:15)
we compute the eigenvectors through QR
(00:28:18)
decomposition using the Gram-Schmidt
(00:28:20)
process so what we do is we take a
(00:28:22)
function called eigenvectors we define it
(00:28:25)
as a function which as an input takes a
(00:28:28)
Matrix X so in this case X is not our
(00:28:32)
data matrix, it is our covariance matrix
(00:28:35)
so it takes the 4 by 4 covariance matrix
(00:28:38)
and then we set a number of iterations
(00:28:41)
right so the more iterations we do the
(00:28:44)
the better our estimation of the of the
(00:28:48)
proper eigenvector
(00:28:52)
Matrix but if we do too many iterations
(00:28:55)
this process will slow down quite
(00:28:57)
tremendously so so there is actually a
(00:28:59)
way to make it stop because at some
(00:29:02)
point um it won't improve further right
(00:29:06)
so at some point we have the best basis
(00:29:08)
the the most accurate representation of
(00:29:11)
the, well, not of the principal component
(00:29:13)
but of the eigenvectors and from that
(00:29:15)
point on the matrix won't improve
(00:29:17)
anymore um so in theory you could have
(00:29:19)
an if statement here which
(00:29:22)
quits um after the Matrix doesn't
(00:29:25)
improve anymore so what are we going to
(00:29:27)
do we're going to set PQ as the
(00:29:30)
identity matrix so we take a diagonal
(00:29:32)
matrix um which means that it's just a
(00:29:35)
matrix which is composed of all zeros but
(00:29:38)
the diagonal is just ones right so this
(00:29:41)
is going to be um our Q Matrix
(00:29:46)
eventually right so we start off with a
(00:29:48)
Q Matrix which is a diagonal matrix um
(00:29:52)
and then we are going to compute both
(00:29:54)
the Q Matrix and the r Matrix based on
(00:29:57)
our input value X so how are we
(00:30:00)
going to do that well we're going to go
(00:30:02)
and we are going to iterate in this case
(00:30:04)
100 times so we're going to make a 100
(00:30:07)
updates every time we're going to
(00:30:09)
compute the QR decomposition so we're
(00:30:12)
going to decompose Matrix X into a q
(00:30:16)
Matrix and an R Matrix right so a q
(00:30:20)
Matrix and an R Matrix so we take X we
(00:30:24)
compute the QR
(00:30:27)
decomposition we're going to take the Q
(00:30:29)
Matrix and call it q and we're going to
(00:30:32)
multiply this Q Matrix created from X to
(00:30:36)
our identity Matrix PQ right so PQ is
(00:30:40)
the Matrix that will be updated time and
(00:30:43)
time again so we are going to Matrix
(00:30:45)
multiply PQ * Q so we have q and then
(00:30:50)
what are we going to do well we're now
(00:30:51)
going to reconstruct what is left of X
(00:30:55)
so what are we going to do we're going
(00:30:57)
to take the r Matrix of the original QR
(00:31:00)
decomposition and we're going to
(00:31:01)
multiply that by the Q that we also
(00:31:04)
decomposed right so it's
(00:31:06)
basically nothing more than multiplying
(00:31:09)
the two factors together in reverse order, R times Q, to
(00:31:11)
build our next X matrix and then we're
(00:31:14)
going to do this again so the only thing
(00:31:17)
which is really or the only thing that
(00:31:19)
is really changing is the X and the
(00:31:21)
PQ um so the PQ is going to
(00:31:25)
contain our eigenvectors while the X
(00:31:28)
Matrix after we're done is going to
(00:31:31)
contain our diagonal so this is going to
(00:31:35)
be the r Matrix eventually right so once
(00:31:38)
we are done we take the diagonal of X
(00:31:42)
which are our eigenvalues
(00:31:44)
and then the eigenvectors are the
(00:31:47)
columns of PQ which we can now use
(00:31:50)
to multiply with our standardized
(00:31:55)
data. All right, so three functions.
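The iteration just described can be sketched in a few lines of R. This is a minimal sketch, not the code from the stream: the name `eigen_vectors_sketch` is hypothetical, and base R's `qr()` is swapped in for a hand-written Gram-Schmidt step. The built-in `cor()` supplies the 4 by 4 input, on the assumption that the covariance of autoscaled data equals the correlation matrix.

```r
# Sketch of the unshifted QR algorithm.
# Assumption: base R's qr() stands in for the hand-written Gram-Schmidt step.
eigen_vectors_sketch <- function(X, n_iter = 100) {
  PQ <- diag(nrow(X))            # start from the identity matrix
  for (i in seq_len(n_iter)) {
    decomp <- qr(X)              # decompose X into Q and R
    Q <- qr.Q(decomp)
    R <- qr.R(decomp)
    PQ <- PQ %*% Q               # accumulate the Q matrices
    X <- R %*% Q                 # multiply in reverse order for the next round
  }
  list(values = diag(X), vectors = PQ)
}

S <- cor(iris[, 1:4])            # 4 x 4 matrix for the four iris features
ev <- eigen_vectors_sketch(S)
ev$values                        # diagonal of X after the last round
ev$vectors                       # columns are the eigenvector estimates
```

Each round replaces X by R times Q, which has the same eigenvalues as X but smaller off-diagonal entries, so the diagonal drifts toward the eigenvalues. Columns may differ in sign from what the built-in `eigen()` returns; both versions are valid eigenvectors.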
(00:31:58)
hope it's clear how we kind of build
(00:32:00)
these functions why we need this
(00:32:02)
function so the autoscaling is there to
(00:32:04)
make sure that all of the data is
(00:32:06)
normalized so that we can compute our
(00:32:09)
covariance matrix. Our covariance matrix
(00:32:12)
tells us how features are related to
(00:32:15)
each other if they are if one of the
(00:32:17)
features is high and the other one is
(00:32:19)
high as well it contributes if they are
(00:32:21)
low then it also contributes but if they
(00:32:23)
are unequal then it contributes negatively
(00:32:25)
so it gives us a a measurement of how
(00:32:28)
related the different features are right
(00:32:31)
so a high number for covariance means
(00:32:33)
that two things are very similar when it
(00:32:36)
comes to um looking at them while then
(00:32:40)
the QR decomposition will give us more
(00:32:43)
or less the basis for computing our
(00:32:47)
principal
(00:32:50)
components all right so let's put it all
(00:32:52)
together so I'm going to switch to R and
(00:32:55)
just Chuck in the um different um
(00:32:58)
different matrices so let me open up
(00:33:00)
notepad++ um so here we have the three
(00:33:02)
functions uh we have the autoscale
(00:33:05)
function we have the covariance function
(00:33:08)
and we have the eigenvector
(00:33:10)
decomposition function uh to compute the
(00:33:12)
eigenvectors and the eigenvalues all
(00:33:16)
right so I'm going to copy this and I'm
(00:33:18)
going to go and show you guys R so I'm
(00:33:21)
just going to copy paste this in right
(00:33:25)
so very basically we have now an
(00:33:27)
autoscale function right so if we
(00:33:29)
have a vector so let's just do a vector
(00:33:32)
5 6 8 52 32
(00:33:37)
926 um oh those are not commas right so
(00:33:41)
I'm just going to make a little
(00:33:43)
Vector right so what this is going to
(00:33:46)
do I'm going to call this X and I'm
(00:33:50)
going to say
(00:33:55)
autoscale as Matrix
(00:33:59)
X and then I need to transpose it
(00:34:02)
because I think if I do as
(00:34:04)
Matrix no other way around so it does
(00:34:08)
create a matrix the way that I want
(00:34:09)
right so here you see how autoscaling
(00:34:11)
works so it tells you that well okay so
(00:34:15)
this is the value these are the numbers
(00:34:17)
that we inputed so it's going to be the
(00:34:19)
column of our Matrix so what I'm doing
(00:34:21)
is or what it does it computes the mean
(00:34:24)
and then it computes the standard
(00:34:26)
deviation and then it subtracts the mean and divides by that so it
(00:34:28)
means that the original number five is
(00:34:31)
negative 0.7 standard deviations away
(00:34:34)
from the mean value while this large
(00:34:37)
value 95 is 1.93 standard deviations
(00:34:42)
above the average of the vector right so
(00:34:45)
that is how autoscaling works.
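Autoscaling as described here can be sketched as follows. This is a minimal version under assumptions: the name `autoscale_sketch` and the three example values are hypothetical and are not the function or the vector typed in the stream.

```r
# Column-wise autoscaling: subtract the mean, divide by the standard deviation.
autoscale_sketch <- function(X) {
  X <- as.matrix(X)
  centered <- sweep(X, 2, colMeans(X), "-")   # subtract each column's mean
  sweep(centered, 2, apply(X, 2, sd), "/")    # divide by each column's sd
}

x <- c(2, 4, 6)                  # hypothetical values with mean 4 and sd 2
z <- autoscale_sketch(x)         # one column holding -1, 0, 1
z
```

A value of 0 sits exactly on the mean, and every other value reads as a number of standard deviations above or below it. The built-in `scale()` performs the same computation on a matrix.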
(00:34:48)
covariance so let's load the data set
(00:34:50)
of Iris so that we can actually do that
(00:34:53)
so I'm going to load data Iris and I'm
(00:34:55)
going to make sure that we have our
(00:34:58)
values so let's look at the first 10
(00:35:00)
right so 1 to 10 so first 10 rows so
(00:35:03)
this is how the values look like so here
(00:35:05)
you can see the four vectors are the
(00:35:08)
four columns that we have so the four
(00:35:10)
features that were measured on these
(00:35:12)
iris plants so we have sepal length, sepal
(00:35:14)
width petal length and petal width um
(00:35:17)
and then we can just um standardize them
(00:35:20)
right so I could take the sepal
(00:35:22)
length column and then say autoscale
(00:35:26)
right so again it would transform
(00:35:30)
oh um
(00:35:31)
sorry we need to do at least two columns
(00:35:34)
in this case right so I'm taking the
(00:35:36)
first two
(00:35:38)
columns so the first two columns are 5.1
(00:35:41)
3.5 so if I'm autoscaling it it tells me
(00:35:43)
that 5.1 for sepal length is actually
(00:35:47)
below zero, or 0.9 standard deviations
(00:35:51)
below the mean while a sepal width of
(00:35:54)
three and a half is actually one
(00:35:56)
standard deviation above the mean right
(00:35:58)
so a value of zero means that it's
(00:36:01)
exactly equal to the mean and any
(00:36:03)
positive value means that it's that many
(00:36:06)
standard deviations above the mean and
(00:36:08)
negative values that many standard
(00:36:09)
deviations underneath the mean um I have
(00:36:12)
my values right so let's look at values
(00:36:15)
1 to 10 again so that's how it looks and
(00:36:18)
then I also have my labels um I think I
(00:36:21)
called it labels um so let's look at the
(00:36:24)
first 10 labels um so the first 10 are
(00:36:27)
all setosa plants so these 10 measurements
(00:36:30)
all come from
(00:36:33)
setosas all right so we have our data
(00:36:35)
loaded um so then the next step is going
(00:36:38)
to be to um look at the or do the
(00:36:42)
computations right so I'm going to take
(00:36:44)
my values I'm going to autoscale them
(00:36:46)
and I'm going to call this STD so for
(00:36:49)
standardized values um then I'm going to
(00:36:52)
compute the covariance matrix um and
(00:36:54)
then I'm going to compute the eigenvectors
(00:36:56)
and the eigenvalues um using
(00:36:59)
our eigenvectors function so like I said we're
(00:37:02)
doing it from scratch so it's good to
(00:37:04)
see all of the functions and and how
(00:37:06)
they exactly tie into each other um but
(00:37:08)
all of these functions have built in
(00:37:10)
functions in R um so covariance um you
(00:37:13)
can use the cov function which is the
(00:37:16)
built-in covariance function and eigenvectors
(00:37:18)
um they have uh the eigen
(00:37:21)
function so you can use eigen as well but
(00:37:23)
we're going to use eigenvectors, our
(00:37:25)
own all right so let's check back into
(00:37:28)
to R so I'm going to just do the
(00:37:30)
computation right so I'm going to load
(00:37:32)
our data I'm going to subset it and then
(00:37:35)
I'm going to compute the STD so these
(00:37:38)
are the standardized values which we
(00:37:39)
already saw right so first 10 Again
(00:37:42)
minus 0.9 standard deviation one
(00:37:45)
standard deviation above the mean under
(00:37:48)
the mean under the mean right so if I
(00:37:50)
look at my covariance matrix um so if
(00:37:53)
I look at my covariance matrix the
(00:37:56)
covariance of a feature with itself
(00:37:58)
will always be one right because sepal
(00:38:00)
length and sepal length uh they are
(00:38:04)
100% the same thing so their covariance
(00:38:07)
is going to be one um of course it
(00:38:09)
doesn't matter which direction I take so
(00:38:12)
the upper triangle of the Matrix is
(00:38:14)
going to be equal to the lower triangle
(00:38:16)
right because it doesn't matter if I
(00:38:18)
compute the sepal length versus sepal width
(00:38:21)
covariance it's going to be the same
(00:38:23)
when I do the reverse or when I do the
(00:38:25)
sepal width computation
(00:38:28)
against the sepal length computation right
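The two properties just mentioned, ones on the diagonal and a mirrored upper and lower triangle, can be checked with a short sketch. It assumes the built-in `scale()` as a stand-in for the autoscale step; the variable names are hypothetical.

```r
# Covariance of autoscaled data, computed from its definition.
std <- scale(as.matrix(iris[, 1:4]))   # scale() stands in for the autoscale step
n <- nrow(std)
covmat <- crossprod(std) / (n - 1)     # t(std) %*% std / (n - 1); columns already have mean 0

diag(covmat)                           # each feature against itself
isSymmetric(covmat)                    # upper triangle mirrors the lower triangle
covmat["Sepal.Length", "Sepal.Width"]  # one off-diagonal entry
```

Because the columns were standardized first, this matrix is the same as the one returned by the built-in `cor()` on the raw measurements.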
(00:38:31)
so here we see that sepal width and sepal
(00:38:33)
length are actually negatively co-
(00:38:35)
varying with each other so that means
(00:38:38)
that the larger the uh sepal the smaller
(00:38:42)
the uh width of so the larger or the
(00:38:45)
larger the length of the sepal the smaller
(00:38:48)
the width of the sepal so that kind of
(00:38:51)
that if you have a flower um you can
(00:38:53)
think about well you can make a very
(00:38:55)
long petal or sepal and then you will
(00:38:58)
have a very, very small width of
(00:39:01)
them um while the sepal length and the
(00:39:04)
petal length they are really correlated
(00:39:07)
with each other or I'm saying correlated
(00:39:09)
but they co-vary with each other quite a
(00:39:11)
lot right so um 0.87 that's pretty high
(00:39:15)
um so that means that the longer the sepal
(00:39:18)
of a flower the longer the petal of a
(00:39:20)
flower and the funny thing is that
(00:39:23)
you can see that the sepal length is also
(00:39:25)
positively co-varying with the petal
(00:39:28)
width um so it just means that the
(00:39:30)
longer your sepal is the larger the
(00:39:33)
petal that it can support on the
(00:39:36)
on the
(00:39:37)
flower um petal length and petal width
(00:39:40)
are even higher correlated to each other
(00:39:42)
um so they are very very similar so but
(00:39:45)
you can see that for sepals it doesn't hold
(00:39:48)
so for sepals the length of the sepal
(00:39:50)
is not related to the width of the sepal
(00:39:53)
but when you look at petals then the
(00:39:55)
petal width and the petal length are
(00:39:57)
highly correlated to each
(00:39:59)
other all right so then when we look at
(00:40:01)
our eigenvectors I call those
(00:40:04)
evf so if I look at my eigenvectors then
(00:40:07)
it computes here the eigenvalues so those
(00:40:10)
are the diagonal of the R matrix after
(00:40:13)
decomposition and this is my eigenvector
(00:40:16)
matrix so the first eigenvector
(00:40:19)
is here so it assigns a value of 0.52
(00:40:24)
to the sepal length um it assigns a value
(00:40:27)
of -
(00:40:28)
0.26 to the sepal width, 0.58 for um
(00:40:34)
petal length right so
(00:40:36)
the columns here are unitless um but
(00:40:39)
they do kind of relate to how you should
(00:40:42)
rotate it right so the nice thing about
(00:40:44)
eigenvectors, which I told you, so if I
(00:40:47)
would take the vectors out and I would
(00:40:49)
just compute the correlation of them um
(00:40:52)
then the correlation of them um should
(00:40:56)
be zero which it is not which is strange
(00:41:00)
no let me see I'm going to need to
(00:41:02)
transpose
(00:41:04)
those so you can see that they're not
(00:41:06)
exactly zero which is oh no sorry sorry
(00:41:09)
sorry, it's not the eigenvectors
(00:41:13)
that are uncorrelated it is the
(00:41:15)
principal components that are
(00:41:16)
uncorrelated so I'm I'm messing up
(00:41:18)
myself here I'm confusing eigenvectors
(00:41:21)
for principal
(00:41:23)
components anyway let's go back to the
(00:41:26)
presentation
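The mix-up corrected here can be made concrete with a small sketch. It assumes the built-in `scale()`, `cov()` and `eigen()` as stand-ins for the from-scratch functions; the variable names are hypothetical.

```r
std <- scale(as.matrix(iris[, 1:4]))
W <- eigen(cov(std))$vectors     # built-in eigen() stands in for the from-scratch function

round(crossprod(W), 10)          # t(W) %*% W: the eigenvectors are orthonormal
round(cor(W), 2)                 # correlating the columns of W: off-diagonals are generally not zero
round(cor(std %*% W), 10)        # correlating the projected data: off-diagonals vanish
```

The eigenvectors are perpendicular to each other, which is a statement about their dot products. Correlation subtracts each column's mean first, so it does not measure that. The projected data, the principal components, is what ends up uncorrelated.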
(00:41:28)
right so we can autoscale ourselves,
(00:41:31)
compute the covariance and then compute
(00:41:34)
the eigenvectors um and we can use the
(00:41:36)
eigenvectors to make the data
(00:41:39)
uncorrelated into principal
(00:41:43)
components all right so if we want to
(00:41:46)
know how much variance can be explained
(00:41:49)
by each of the principal components that
(00:41:51)
we're going to calculate we can actually
(00:41:53)
use the eigenvalues um so the variance
(00:41:56)
explained by the individual principal
(00:41:58)
components can be computed by the nth
(00:42:01)
eigenvalue divided by the sum of all of
(00:42:04)
the eigenvalues so we can compute the
(00:42:07)
variance explained like this so we can
(00:42:09)
say I take my eigenvalues and I divide
(00:42:12)
those by the sum of the eigenvalues
(00:42:15)
right so then I get a vector um where
(00:42:18)
each of the values is divided by the sum
(00:42:21)
um I multiply it with 100 and I round
(00:42:24)
to one digit behind the comma if I
(00:42:26)
then want to see the cumulative sum so
(00:42:28)
the cumulative sum should always be
(00:42:31)
one, or 100%, after the number of components right
(00:42:34)
so in this case we have four features we
(00:42:37)
have four components four vectors um so
(00:42:41)
if I sum them up um they should be 100
(00:42:45)
um and then I can visualize it it leads
(00:42:47)
to a plot which looks like this but
(00:42:49)
let's very quickly do that in R right so
(00:42:52)
if I compute the variance explained so
(00:42:56)
I'm going to just say VAR explained
(00:42:58)
right so I round so I take the eigenvalues
(00:43:00)
so that's 2.9 divided by the sum
(00:43:04)
of this so that's 2.9, 3.8, 4,
(00:43:10)
so it's like uh 2.9 divided by 4
(00:43:15)
uh you see that the first principal
(00:43:16)
component is going to explain 73% of the
(00:43:20)
variance um the second principal component
(00:43:23)
is going to explain
(00:43:25)
22.9% uh the next one 3.7 and the last
(00:43:28)
one half a percent of variance explained
(00:43:32)
so using the cumsum function we can do
(00:43:34)
the cumulative sum so if we do the cumulative sum
(00:43:38)
we see that it doesn't add up to
(00:43:41)
100 it adds up to 100.1 but that is because
(00:43:44)
we rounded them right if we would not
(00:43:46)
round the variance explained so if we
(00:43:48)
would just say don't round them at all
(00:43:50)
just multiply them by 100 right so
(00:43:53)
variance explained, do the cumulative sum um
(00:43:56)
then now you see that it will add up
(00:43:58)
exactly to 100 so by rounding it we kind
(00:44:01)
of introduce a little bit of a of a skew
(00:44:03)
um but that's how it works.
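The variance-explained arithmetic from this passage can be sketched as below. It assumes the built-in `eigen()` on the correlation matrix in place of the from-scratch eigenvalues; `var_explained` is a hypothetical variable name.

```r
ev <- eigen(cor(iris[, 1:4]))$values     # eigenvalues for the four standardized features
var_explained <- ev / sum(ev) * 100      # percent of variance per component

round(var_explained, 1)                  # rounded for display
sum(round(var_explained, 1))             # rounding can push the total slightly off 100
cumsum(var_explained)                    # unrounded running total, ending at 100
```

Rounding is best left to the very last step, when the numbers are printed or put on a plot, so the running total stays exact.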
(00:44:06)
we can basically just plot the
(00:44:08)
cumulative variance explained and then
(00:44:10)
we get our graph which looks like this
(00:44:12)
so we can see that the first component
(00:44:14)
explains around
(00:44:15)
73% uh the next one so the first two
(00:44:18)
components combined explain
(00:44:21)
95.8% of the variance seen in the data
(00:44:25)
and then the third component combined we
(00:44:27)
are already up to
(00:44:29)
99.5% of the variance explained right so
(00:44:32)
the variance explained in this case
(00:44:34)
tells us how many components we should
(00:44:37)
or can take uh to represent our data so
(00:44:41)
in this case you would say well with two
(00:44:42)
principal components I'm catching 95% of
(00:44:46)
the variance that is in my data set um
(00:44:49)
so the two principal components should
(00:44:51)
be enough to accurately represent the
(00:44:54)
data that has been measured
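Picking the number of components from the cumulative variance can be sketched like this. The 95% cutoff is a hypothetical threshold chosen for illustration, and the built-in `eigen()` again stands in for the from-scratch eigenvalues.

```r
ev <- eigen(cor(iris[, 1:4]))$values
cum_var <- cumsum(ev / sum(ev) * 100)    # cumulative percent of variance explained

threshold <- 95                          # hypothetical cutoff, chosen for illustration
k <- which(cum_var >= threshold)[1]      # smallest number of components reaching the cutoff

plot(cum_var, type = "b", ylim = c(0, 100),
     xlab = "Number of components",
     ylab = "Cumulative variance explained (%)")
abline(h = threshold, lty = 2)           # dashed line at the cutoff
```

There is no universally right cutoff; it depends on how much of the remaining variance is signal rather than noise in a given data set.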
(00:44:58)
all right so variance
(00:45:01)
explained so the next thing what we want
(00:45:04)
to do when we want to compute our own
(00:45:05)
principal components is to do the PCA
(00:45:08)
projection so it reconstructs the iris
(00:45:11)
data as a linear combination of the
(00:45:13)
original data right so that is what PCA
(00:45:15)
does so how can we do that well we can
(00:45:18)
take our standardized data and then multiply
(00:45:21)
it by the projection Matrix um so here
(00:45:25)
we can compute the projection Matrix
(00:45:27)
which is the principal component Matrix
(00:45:29)
um so we can take the first we can take
(00:45:32)
all of the eigenvectors we call that W
(00:45:36)
and then what we do is we multiply our
(00:45:39)
standardized data together with W right
(00:45:43)
so if we do that um we can put the
(00:45:45)
column names on so now P will be
(00:45:48)
principal component one in the First
(00:45:49)
Column principal component two in the
(00:45:51)
second
(00:45:52)
column all right so let's do that in R
(00:45:56)
so that you guys can see it as well so P
(00:46:01)
right so this is now the principal
(00:46:04)
component Matrix for our data so if we
(00:46:09)
now would calculate the correlation of P
(00:46:11)
um then now all of them should be more
(00:46:14)
or less zero which you can see that that
(00:46:16)
is the case: it is 1.4 * 10^-16.
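The projection step and the correlation check can be sketched as follows, assuming the built-in `scale()`, `cov()` and `eigen()` in place of the from-scratch functions. The names `std`, `W` and `P` follow the stream; `C` is a hypothetical helper variable.

```r
std <- scale(as.matrix(iris[, 1:4]))     # standardized data
W <- eigen(cov(std))$vectors             # eigenvectors as columns, the projection matrix
P <- std %*% W                           # principal component matrix
colnames(P) <- paste0("PC", 1:4)

C <- cor(P)
max(abs(C[upper.tri(C)]))                # largest off-diagonal correlation, zero up to rounding error
```

Values on the order of 10^-16 are floating-point noise, so they count as zero here.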
(00:46:20)
of course every principal component is
(00:46:22)
100% correlated to itself but it is
(00:46:25)
uncorrelated to all of the the other
(00:46:27)
principal components right so that now
(00:46:29)
means that each of these vectors of data
(00:46:32)
is independent of each other is catching
(00:46:35)
one unique axis of data right so if you
(00:46:38)
would think about it in a 2d plane uh
(00:46:41)
what we did is if we have the sepal width
(00:46:44)
and the sepal length just two vectors
(00:46:46)
right what we just did is we just
(00:46:48)
rotated the axis so that the axis the
(00:46:51)
first axis catches the most variance and
(00:46:53)
the second axis catches the other
(00:46:56)
remaining variance in the
(00:46:59)
data all right so we have our projection
(00:47:02)
Matrix so we have computed our own
(00:47:04)
principal components so that is more or
(00:47:06)
less what we set out to do um so next
(00:47:10)
step um is also we can do a partial
(00:47:13)
reconstruction right so if we are
(00:47:15)
thinking about compression um then what
(00:47:18)
we could do instead of storing the
(00:47:20)
original Matrix we can actually store
(00:47:24)
part of the principal component Matrix
(00:47:26)
so if we want to get a scale down kind
(00:47:28)
of image or a scale down Matrix of the
(00:47:32)
iris data set um this would mean that we
(00:47:35)
would be able to store just the first
(00:47:37)
two principal components because with
(00:47:39)
just the first two principal components
(00:47:41)
we can reconstruct 95% of the variance
(00:47:44)
in the data um so that means that we
(00:47:47)
could compress this data set by around
(00:47:50)
half so that means that instead of
(00:47:52)
storing the four features we could store
(00:47:55)
the first two principal components and
(00:47:57)
the first two principal components would
(00:47:59)
still be good enough to reconstruct most
(00:48:01)
of the variance right so if we would do
(00:48:04)
that um let me show you guys how we can
(00:48:06)
do that so I'm just going to go to
(00:48:08)
notepad right so I'm going to take the
(00:48:11)
first
(00:48:12)
two columns of the W Matrix I'm going to
(00:48:17)
multiply them together and then I'm
(00:48:19)
going to do p right so now we see that P
(00:48:23)
is a two column Matrix um and it's still
(00:48:27)
there it's still an uncorrelated
(00:48:30)
Matrix so if we calculate the
(00:48:32)
correlation of P we can still see that
(00:48:35)
these are two things which are
(00:48:37)
independent of each other and if we
(00:48:39)
would plot Matrix P then it shows us
(00:48:42)
that this is kind of the structure in
(00:48:45)
the data right so it will tell us the
(00:48:47)
loading on pc1 and the loading on PC2 um
(00:48:51)
but let's go back and let's take all of
(00:48:54)
the principal, or let's take all of the
(00:48:56)
eigenvectors to P and then we would plot
(00:48:59)
P um and then we see that there's almost
(00:49:02)
no difference but that is because the
(00:49:04)
first two principal components already
(00:49:06)
caught 95% of the variance um so I realize
(00:49:12)
that you didn't see R all right so let's
(00:49:15)
do it again right so let's go up
(00:49:18)
so what we want to do is we can do so
(00:49:23)
this is the when I look at the first two
(00:49:27)
principal components of P after I've
(00:49:30)
used all of the eigenvectors right so it
(00:49:34)
looks like this so if I only take the
(00:49:38)
first two
(00:49:41)
eigenvectors
(00:49:44)
then you can see it looks almost
(00:49:47)
identical almost none of the uh plots or
(00:49:51)
almost none of the points have moved but
(00:49:53)
that's because the first two principal
(00:49:55)
components contain more or less the same
(00:49:58)
amount of information or 95% of the
(00:50:01)
information relative to the whole
(00:50:03)
principal component matrix.
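The compression idea can be taken one step further by mapping the two stored columns back to the four original features. This is a sketch under assumptions: built-in functions stand in for the from-scratch ones, and `P2`, `approx_std` and `retained` are hypothetical names.

```r
std <- scale(as.matrix(iris[, 1:4]))
W <- eigen(cov(std))$vectors

P2 <- std %*% W[, 1:2]                   # 150 x 2 instead of 150 x 4
approx_std <- P2 %*% t(W[, 1:2])         # map the two components back to four features

# Share of the total variance that survives the round trip
retained <- sum(approx_std^2) / sum(std^2)
retained
```

To undo the compression in practice, the two eigenvectors and the column means and standard deviations from the autoscaling step would have to be stored as well. For a data set this small that overhead matters, so the saving of around half is an upper bound.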
(00:50:06)
instead of storing a matrix which has
(00:50:08)
150 entries four columns I could just
(00:50:11)
store two columns 150 entries so
(00:50:16)
virtually reducing the data set um by
(00:50:18)
around
(00:50:20)
50% anyway I'm just going to switch back
(00:50:23)
to notepad right and I'm going to make
(00:50:25)
sure that I use all of the principal
(00:50:28)
components uh all of the eigenvectors to
(00:50:31)
compute the principal
(00:50:32)
components so let's do that and then do
(00:50:36)
a little plot to make sure that it's
(00:50:38)
there so plot and you can see that they
(00:50:41)
barely move the points um so barely move
(00:50:46)
again I realize you guys are not looking
(00:50:48)
at R so they barely move right almost
(00:50:53)
identical all right so let's start
(00:50:56)
visualizing in it right because this is
(00:50:59)
uh one of the visualizations that we can
(00:51:01)
use um principal component one versus
(00:51:03)
principal component two that's generally
(00:51:05)
the two that catch the most variance um
(00:51:08)
so what if we compare our from scratch
(00:51:12)
PCA to the built-in prcomp function
(00:51:16)
right it should be very very similar
(00:51:18)
should be identical so the way that I'm
(00:51:20)
going to do this is by saying I'm going
(00:51:23)
to make a plot which has two uh windows
(00:51:26)
so there's two plots inside a single window
(00:51:28)
so I can do that by setting the
(00:51:30)
parameter mfrow to say I want to have
(00:51:33)
one row two columns right so it's a 1
(00:51:36)
by 2 plot so one row two columns so
(00:51:39)
I do the first plot so I'm plotting the
(00:51:41)
principal components from scratch in the
(00:51:44)
first so on the left side and then on
(00:51:46)
the right side I'm going to plot the
(00:51:48)
built-in prcomp function right so our
(00:51:51)
principal component matrix is called P
(00:51:54)
so from P I'm going to take PC1 and
(00:51:57)
PC2 I'm going to color by the labels I'm
(00:51:59)
going to give it a main so we know which
(00:52:01)
one is which um and I'm going to do pch
(00:52:04)
is 19 to have filled circles um and then
(00:52:07)
I'm going to put a legend there um which
(00:52:09)
is going to be on the bottom right um
(00:52:12)
the levels are going to be the labels so
(00:52:14)
that I'm going to take the levels of the
(00:52:17)
labels so that's going to be setosa
(00:52:19)
versicolor and virginica and the colors are
(00:52:22)
going to be the unique labels because
(00:52:25)
here I use the labels
(00:52:27)
as the colors um same pch so in R
(00:52:31)
doing principal component analysis it's
(00:52:33)
just a single call right so you can do
(00:52:36)
prcomp on your values, scale is equal to
(00:52:39)
TRUE right so here there is no
(00:52:43)
autoscaling that you need to do no you
(00:52:45)
can just give the prcomp function the
(00:52:48)
values and then you can say well I want
(00:52:50)
to autoscale them and then from this
(00:52:54)
extract matrix x so matrix small x in
(00:52:58)
the return of the prcomp function that
(00:53:01)
is the principal component matrix.
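The comparison between the built-in call and the from-scratch route can be sketched numerically as well as visually. This sketch assumes `scale()`, `cov()` and `eigen()` as stand-ins for the from-scratch functions; the argument is spelled `scale.` in the `prcomp` documentation.

```r
values <- as.matrix(iris[, 1:4])

pc <- prcomp(values, scale. = TRUE)$x    # built-in PCA; $x holds the principal component matrix

std <- scale(values)
P <- std %*% eigen(cov(std))$vectors     # from-scratch style projection

# A column may come out multiplied by -1, so compare absolute values
max(abs(abs(P) - abs(pc)))
```

A flipped sign mirrors the plot along that axis but does not change the structure in the data, so the two results can look mirrored and still agree.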
(00:53:04)
this is called PC so um I can then
(00:53:07)
plot the PCA again I'm going to take the
(00:53:09)
first two columns I'm going to take the
(00:53:11)
labels um make sure that it that I know
(00:53:14)
which one is which all right so let's
(00:53:17)
check to
(00:53:18)
R, make sure that we can see the R window so
(00:53:21)
let's do the little plot so that we can
(00:53:23)
compare both of them together um so I'm
(00:53:26)
going to make sure that my windows a
(00:53:27)
little bit broader like this so we can
(00:53:30)
fit all
(00:53:31)
two right so and this is what we see we
(00:53:34)
see our from scratch principal component
(00:53:36)
analysis um and then the function
(00:53:39)
prcomp all right so first question could
(00:53:42)
you elaborate more on the value of PC1
(00:53:44)
and PC2 and how they explain the data
(00:53:48)
repartition yeah so PC1 and PC2 are so
(00:53:53)
the data is so the data in the PCA
(00:53:58)
Matrix is exactly the same as the data
(00:54:00)
in the original Matrix right if I use
(00:54:03)
all principal components it is just that
(00:54:06)
it's a different projection right so
(00:54:09)
instead of having one axis which is
(00:54:12)
petal length another axis which is petal
(00:54:14)
width and then a third axis which is sepal
(00:54:17)
length and another fourth axis which is
(00:54:19)
sepal width I now have four
(00:54:24)
axes which are not related to the
(00:54:27)
original
(00:54:28)
measurements but they are capturing the
(00:54:31)
exact same pattern right if we could
(00:54:33)
make a four dimensional Cube then in the
(00:54:37)
four-dimensional cube the data has not
(00:54:40)
changed the only thing which has changed
(00:54:42)
is the axis system that we are using to
(00:54:45)
project the data so instead of having a
(00:54:47)
data axis which we can understand as
(00:54:50)
being sepal length we now have a first x
(00:54:54)
axis and the property of this axis is it
(00:54:58)
explains the most variance in the data
(00:55:01)
the second axis, the y axis,
(00:55:05)
used to be sepal width but now it is the
(00:55:07)
axis which explains the most variance
(00:55:11)
except for the other axis right so it
(00:55:13)
explains the second most amount of
(00:55:16)
variance so the data doesn't change
(00:55:18)
right the data is still exactly the same
(00:55:21)
as it was it is just that we move the
(00:55:25)
axis system so instead of having four axes which we
(00:55:29)
have measured, petal length and width, sepal
(00:55:32)
length and sepal width, we now have four
(00:55:34)
different axes through our data but the
(00:55:37)
data is still exactly the
(00:55:39)
same also what does this plot exactly
(00:55:42)
mean pc1 versus PC2 how do we analyze PC
(00:55:45)
in the context of the iris data yeah so
(00:55:47)
here in the context of the iris data we
(00:55:50)
might want to know are these two
(00:55:53)
different plants or are these three
(00:55:55)
different species right so if we look at
(00:55:58)
the first principal component we see
(00:56:00)
that the data kind of splits out into
(00:56:03)
one very clearly distinct group right so
(00:56:05)
we can see that the setosa is on the
(00:56:09)
first principal component axis I
(00:56:11)
can very basically I could just say well
(00:56:15)
if the value if the loading of the plant
(00:56:18)
or of the measurements is below minus
(00:56:21)
one it is going to be
(00:56:23)
setosa right but for the other two species
(00:56:26)
it's not that clear we can see that the
(00:56:29)
versicolor and the virginica they overlap each
(00:56:32)
other in the middle here right so that
(00:56:34)
means that these two species on the
(00:56:38)
first principal component axis they are
(00:56:40)
not separated right so it means that
(00:56:43)
there's no clear phenotypic difference
(00:56:46)
between these two plants when we just
(00:56:48)
measure these four variables so from
(00:56:51)
these four variables we can uniquely
(00:56:53)
identify the setosa right so the setosa is
(00:56:57)
clearly different from all of the other
(00:57:00)
plants but the versicolor and the virginica
(00:57:04)
they are not uniquely separable on the
(00:57:06)
first principal component the second
(00:57:09)
principal component in this case doesn't
(00:57:11)
add anything it doesn't allow us to
(00:57:14)
distinguish um better right we could say
(00:57:17)
well um if you are high on the second
(00:57:21)
principal component axis right above one
(00:57:25)
then you are either a
(00:57:28)
virginica or you are a setosa but the main
(00:57:32)
difference between these plants like 75%
(00:57:35)
of the difference in these measurements
(00:57:37)
is just captured in the first principal
(00:57:40)
component and it allows us to
(00:57:42)
distinguish setosas from the other two
(00:57:46)
species very
(00:57:47)
clearly we can see that the virginica on
(00:57:50)
the first axis has slightly higher values
(00:57:53)
than the versicolor but they are not
(00:57:55)
separating so they are not separating
(00:57:58)
out of each other so it still means that
(00:58:01)
if you would want to make a
(00:58:03)
determination saying are these really
(00:58:05)
three different species or are they just
(00:58:08)
two different species the answer here
(00:58:10)
would be is that well there's some
(00:58:12)
evidence to say that they are two
(00:58:14)
species and that the versicolor and the
(00:58:17)
virginica are more or less similar to
(00:58:20)
each other still based on just having
(00:58:23)
these four measurements on the sepals
(00:58:25)
and the petals right because we only
(00:58:28)
have four measurements to start off with
(00:58:30)
we don't have the color or other things
(00:58:32)
that we look at right so the idea is is
(00:58:34)
that the loading on the first principal
(00:58:36)
component tells us um if we can separate
(00:58:40)
out the groups based on the most or the
(00:58:43)
axis with the most variance so if we want
(00:58:46)
to know right so if we look at um the so
(00:58:50)
very basically if we have our Matrix P
(00:58:53)
right and we look at pc1 right so let's
(00:58:57)
look at
(00:58:58)
pc1 right so these are our values so if
(00:59:01)
I correlate this to the original values
(00:59:05)
that we had right you can see that petal
(00:59:09)
length and petal width are highly loaded
(00:59:13)
on the first principal component as well
(00:59:16)
as sepal length right so you can see that
(00:59:18)
the correlation is almost 0.99 so that
(00:59:22)
means that the first principal component
(00:59:24)
axis is a combination
(00:59:27)
or is mostly looking at the petal
(00:59:30)
length right so it's looking at the
(00:59:32)
petal length and it's including a little
(00:59:35)
bit or 0.9 of the sepal length right so it's
(00:59:38)
it's a combination axis so instead of
(00:59:40)
having one axis which is sepal length this
(00:59:43)
axis is catching variance mostly from
(00:59:47)
petal length a lot from petal width but
(00:59:50)
these two things are relatively
(00:59:52)
correlated to each other and also a
(00:59:54)
little bit of the sepal length if we look
(00:59:57)
at the second principal component right
(00:59:59)
so we can just say the correlation of
(01:00:01)
PC2 to all of our values then we see
(01:00:04)
that the second principal component
(01:00:06)
actually catches the variance which is
(01:00:08)
in the sepal
(01:00:10)
width right so if we want to annotate
(01:00:12)
our axis then the principal component
(01:00:14)
one axis is the sepal and petal length axis
(01:00:20)
together with the petal width axis the
(01:00:22)
second principal component is the axis
(01:00:24)
which catches the variation of the sepal
(01:00:27)
width um and a little bit of the sepal
(01:00:30)
length right so so this is how we can
(01:00:33)
take PCs and kind of deconstruct our
(01:00:36)
data into um unique vectors but these
(01:00:40)
vectors are uncorrelated to each other
(01:00:42)
and that is the advantage because they
(01:00:44)
are uncorrelated to each other um it
(01:00:46)
means that we can look at one axis and
(01:00:48)
then look at another axis and put them
(01:00:51)
perpendicular to each other right but
(01:00:53)
because if we would do that for sepal
(01:00:55)
length and sepal width we would not clearly
(01:00:58)
see this
(01:00:59)
difference all right so um let's go
(01:01:03)
quickly back right so I also put the
(01:01:05)
plot in the presentation so I put H here
(01:01:08)
right because if we look at our from
(01:01:10)
scratch principal component
(01:01:13)
analysis it looks slightly different
(01:01:15)
than when we did the
(01:01:17)
prcomp right so we can see that there are
(01:01:21)
slight differences so what are those
(01:01:24)
differences well there are no
(01:01:28)
differences because PCs are linear
(01:01:31)
combinations right so a principal
(01:01:34)
component is just
(01:01:37)
inverted in our case but that doesn't
(01:01:40)
matter because it doesn't matter if you
(01:01:42)
if you have the x-axis on the or if you
(01:01:44)
have the y-axis going from minus one to
(01:01:47)
one or if you have it go from minus one to
(01:01:50)
one the other way around right so in the
(01:01:52)
end it's the same thing right minus one
(01:01:56)
and minus one is the same as one and one it's
(01:02:00)
just in the opposite direction right so
(01:02:02)
since they are linear combinations they
(01:02:04)
can be orthogonal and of course being
(01:02:07)
orthogonal in the perpendicular
(01:02:09)
direction is the same as being
(01:02:11)
orthogonal in the other direction right
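The sign ambiguity described here can be checked numerically. The following is a minimal sketch in plain Python, assuming a hypothetical 2x2 covariance matrix and a hand-picked eigenvector (these are illustrative values, not the iris results from the stream). It shows that both v and -v satisfy the eigenvector equation, and that projecting onto -v only flips the sign of the score.

```python
import math

# Hypothetical 2x2 covariance matrix (illustrative values only).
C = [[2.0, 1.0],
     [1.0, 2.0]]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def is_eigenvector(matrix, vector, eigenvalue, tol=1e-12):
    """Check matrix * vector == eigenvalue * vector, element by element."""
    product = matvec(matrix, vector)
    return all(abs(p - eigenvalue * v) <= tol for p, v in zip(product, vector))

# v is a unit eigenvector of C with eigenvalue 3, and so is -v.
v = [1 / math.sqrt(2), 1 / math.sqrt(2)]
neg_v = [-x for x in v]

# Projecting a (centered) sample onto v or -v gives the same score
# with the opposite sign, which mirrors the axis in the plot.
sample = [0.5, -1.5]
score = sum(a * b for a, b in zip(sample, v))
flipped_score = sum(a * b for a, b in zip(sample, neg_v))

print(is_eigenvector(C, v, 3.0), is_eigenvector(C, neg_v, 3.0))
print(score, flipped_score)
```

Which sign a given routine returns is an implementation detail, so two correct PCA implementations can produce mirrored plots.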
(01:02:13)
so so how do we fix this well we just we
(01:02:15)
just flip it around right so we just
(01:02:18)
flip it around um and then you can see
(01:02:21)
that they are exactly the same PCA plot
(01:02:23)
right so that's the fix um of course we
(01:02:26)
can do this in R as well right so in R
(01:02:29)
when we have our our plot here um so the
(01:02:32)
only thing which I have to do when I do
(01:02:33)
my plot um so let me do my plot again
(01:02:37)
let me get my code for the plot let's go
(01:02:39)
back to notepad so and the only thing
(01:02:42)
that we can do is if we want to do this
(01:02:44)
then I know now that on the x-axis I
(01:02:48)
want to have PC1 right not just PC1 and
(01:02:51)
PC2 and I can say on the y axis I want
(01:02:55)
to have
(01:02:57)
PC2 but now take the negative value of
(01:03:00)
PC2 right so I'm just going to say
(01:03:03)
negate the PC2 value and on the x-axis
(01:03:07)
put the PC1 value so if I do it
(01:03:10)
like this right so now um when we look
(01:03:14)
at it um we can see that they are
(01:03:16)
exactly identical um so had the the
(01:03:18)
value here is exactly like that oh crap
(01:03:21)
you guys can see that
(01:03:23)
again I'm not paying attention to which
(01:03:25)
window I
(01:03:27)
right so if I put it in and I say on the
(01:03:29)
X plot PC1 on the Y plot the negative of
(01:03:33)
PC2 um then now they look exactly
(01:03:35)
identical and of course now I need to
(01:03:38)
move the legend up to here as well so
(01:03:40)
that it doesn't overlap the the points
(01:03:42)
here uh but you can see that they are
(01:03:44)
exactly
(01:03:47)
identical all right so let me go back to
(01:03:51)
the presentation so we can just fix it
(01:03:52)
by flipping it around no no issue
(01:03:55)
whatsoever um so we can flip the whole
(01:03:57)
figure or we can just flip the PC2 axis
(01:04:01)
right so that happens often in principal
(01:04:04)
components because it's the same thing
(01:04:06)
they are linear combinations so it
(01:04:08)
doesn't matter if you project it in the
(01:04:10)
positive direction or if you project it
(01:04:12)
in the negative direction uh like it's
(01:04:14)
doing
(01:04:17)
here all right so that's actually
(01:04:19)
everything that I wanted to say for
(01:04:21)
today um so principal component analysis
(01:04:24)
it depends on you doing autoscaling of
(01:04:26)
your data you then compute a covariance
(01:04:29)
matrix you then compute the eigenvectors
(01:04:32)
and the eigenvalues using uh the
(01:04:35)
Gram-Schmidt
(01:04:37)
process you then compute your variance
(01:04:39)
explained based on the eigenvalues and
(01:04:43)
then you can do a PCA projection which
(01:04:45)
means taking your standardized data
(01:04:47)
matrix multiplying it with the
(01:04:50)
eigenvectors and then you have a
(01:04:53)
re-representation of your data in in
(01:04:56)
principal component space so in
(01:04:59)
orthogonal principal
(01:05:01)
components um and then we compared it
(01:05:03)
back to prcomp um and then we talked a
(01:05:06)
little bit about the interpretation um
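The steps summarized here can be sketched end to end on a tiny example. This is a minimal sketch in plain Python, assuming hypothetical toy data with only two features. For a 2x2 symmetric matrix the eigenvalues and eigenvectors have a closed form, so that formula is swapped in for the Gram-Schmidt based computation mentioned in the stream. All names are illustrative.

```python
import math
from statistics import mean, stdev

# Hypothetical toy data: two correlated features, five samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

def autoscale(values):
    """Center to mean 0 and scale to standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def covariance(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

# Step 1: autoscale each feature.
zx, zy = autoscale(x), autoscale(y)

# Step 2: covariance matrix [[a, b], [b, d]] of the scaled data.
a, b, d = covariance(zx, zx), covariance(zx, zy), covariance(zy, zy)

# Step 3: eigenvalues and eigenvectors (closed form for a symmetric 2x2).
half_trace = (a + d) / 2
root = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenvalues = [half_trace + root, half_trace - root]  # largest first

def unit_eigenvector(eigenvalue):
    """Eigenvector (b, eigenvalue - a), normalized; valid when b != 0."""
    vx, vy = b, eigenvalue - a
    norm = math.hypot(vx, vy)
    return [vx / norm, vy / norm]

eigenvectors = [unit_eigenvector(lam) for lam in eigenvalues]

# Step 4: variance explained comes from the eigenvalues.
total = sum(eigenvalues)
variance_explained = [lam / total for lam in eigenvalues]

# Step 5: project the scaled data onto the eigenvectors (PC scores).
scores = [[sx * v[0] + sy * v[1] for v in eigenvectors]
          for sx, sy in zip(zx, zy)]

print(variance_explained)
```

With more than two features the closed form no longer applies and an iterative eigen decomposition is needed, but the surrounding steps stay the same.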
(01:05:08)
so principal component analysis it's
(01:05:10)
used a lot to find groups in data to see
(01:05:13)
how well things separate from each other
(01:05:16)
um but also to see how reproducibility
(01:05:19)
is in in experiments right because the
(01:05:22)
closer things Cluster on the first two
(01:05:24)
principal components the
(01:05:26)
the less variance there is um so
(01:05:30)
if we would look at this and we can see
(01:05:33)
that these um the setosas they cluster
(01:05:37)
really really well on the first
(01:05:38)
principal component so there's no real
(01:05:41)
variance in the setosas when it comes to
(01:05:45)
the sepal length um and the petal length
(01:05:49)
and we can see that because of the fact
(01:05:52)
that these two are highly correlated to
(01:05:54)
the principal component one so that
(01:05:56)
means that when we look at setosas they
(01:05:59)
don't have that much difference in
(01:06:02)
sepal or in petal lengths um if we look at
(01:06:05)
the versicolors and the virginicas right
(01:06:08)
so these have much much more spread so
(01:06:11)
hey if we would if we would look at the
(01:06:13)
data and we would for example look at
(01:06:16)
the petal length right then I would
(01:06:18)
assume that if I make a histogram of uh
(01:06:22)
the
(01:06:23)
values right so I'm just going to take
(01:06:25)
value
(01:06:26)
I'm going to say
(01:06:28)
which
(01:06:30)
labels is
(01:06:34)
setosa and then I'm going to look at
(01:06:41)
the Petal.Length right so this is
(01:06:45)
a histogram of the setosas' petal
(01:06:48)
length um if I would do the same thing
(01:06:51)
for the versicolors versicolor
(01:06:56)
then you can see that had these vary
(01:06:59)
from like 1 cm to 1.8 cm but these vary
(01:07:04)
from 3 to 5 and a half cm right so here
(01:07:07)
there's only a 0.8 spread and here
(01:07:11)
there is a 2 and a half spread right
(01:07:13)
so that is caught in the principal
(01:07:16)
component plot as well um because that's
(01:07:20)
what we see here so there's less
(01:07:22)
variance in
(01:07:24)
the petal length
(01:07:27)
of setosas relative to the other two
(01:07:30)
species right so we can use it to
(01:07:33)
interpret our data um and we can kind of
(01:07:37)
reason about what we see um same thing
(01:07:40)
would be if we look at the second
(01:07:41)
principal component axis which is the
(01:07:43)
sepal width axis right because it's highly
(01:07:46)
correlated to the sepal width uh we would
(01:07:49)
assume that if we would look at the sepal
(01:07:51)
width from uh setosa that there's more
(01:07:55)
spread in the setosa sepal width than there
(01:07:58)
is in the versicolor right so let's check
(01:08:01)
that right so if we do the same one um
(01:08:05)
again um but now we do the setosa petal
(01:08:12)
width oh uh Petal.Width probably with a
(01:08:15)
capital and we do the versicolor petal
(01:08:21)
width right we would
(01:08:24)
see that this one varies
(01:08:27)
0.6 this one varies 0.8 so not entirely
(01:08:33)
but that's also because the correlation
(01:08:35)
is not it's not 100% but hey we can see
(01:08:38)
that there's at least the same kind of
(01:08:40)
the same amount of variance in the in
(01:08:43)
the petal width between the setosas and
(01:08:46)
the versicolors which is kind of what the
(01:08:49)
second PC tells us a little bit as
(01:08:53)
well because eh I would have expected it
(01:08:56)
to be a little bit less but there's a
(01:08:59)
massive outlier here in the setosa um so
(01:09:02)
and the setosas range from like minus 3
(01:09:05)
to one so it's four and these ones go
(01:09:08)
from minus one to two something so I
(01:09:11)
would have expected the um sepal width uh
(01:09:15)
to be a little bit less
(01:09:18)
variable all right
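The sanity check done here with histograms (comparing how far a measurement spreads within each group) can also be expressed as numbers. This is a minimal sketch in plain Python, assuming made-up measurements and the placeholder group labels "A" and "B". These are not the iris values quoted in the stream.

```python
from statistics import stdev

# Hypothetical measurements (cm) for two groups; NOT the real iris data.
measurements = [1.2, 1.4, 1.3, 1.6, 1.5, 3.1, 4.7, 5.2, 3.9, 4.4]
labels = ["A"] * 5 + ["B"] * 5

def spread_by_group(values, groups):
    """Return the range (max - min) and standard deviation per group."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return {
        group: {"range": max(vals) - min(vals), "sd": stdev(vals)}
        for group, vals in by_group.items()
    }

spread = spread_by_group(measurements, labels)
for group, stats in spread.items():
    print(group, round(stats["range"], 2), round(stats["sd"], 2))
```

Note that the range is not the same thing as the variance. The standard deviation is the closer match to what a principal component captures, and the range is much more sensitive to a single outlier.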
(01:09:21)
so are there any questions so far
(01:09:28)
I understand that it's difficult like um
(01:09:32)
if we would if I would have to explain
(01:09:35)
eigenvectors and eigenvalues all the way
(01:09:38)
from zero it would involve doing a whole
(01:09:43)
linear algebra lecture right so all of
(01:09:46)
these things um you can easily find very
(01:09:49)
good starting points on things like
(01:09:51)
Wikipedia right so if you want to learn
(01:09:53)
more about eigenvectors and eigenvalues
(01:09:55)
definitely take a look at Wikipedia
(01:09:59)
um they have very good citations to
(01:10:01)
original literature that you can read um
(01:10:04)
there's very much or there's probably
(01:10:07)
like other YouTubers who do like a whole
(01:10:09)
linear algebra lecture from scratch
(01:10:12)
right so because you want to start off
(01:10:14)
with what is a vector multiple vectors
(01:10:17)
together they form a matrix you can
(01:10:21)
transform vectors um so how do you do
(01:10:24)
that so in this case because I just
(01:10:27)
wanted to give you a high-level
(01:10:28)
overview of PCA so what is needed
(01:10:31)
autoscaling covariance eigenvectors and
(01:10:34)
eigenvalues normally people just use the
(01:10:37)
prcomp function in R right so they just
(01:10:40)
use prcomp they don't think about what
(01:10:42)
is happening a lot of people actually
(01:10:44)
forget to set scale to TRUE right
(01:10:47)
because you do need to do
(01:10:49)
autoscaling always um so the default is
(01:10:54)
FALSE in prcomp um and this is not due
(01:10:58)
to scaling not being needed but this is
(01:11:01)
because of backward
(01:11:03)
compatibility um so these are the steps
(01:11:05)
to do your own PCA normally people just
(01:11:08)
use the prcomp function and that's it
(01:11:11)
and then they plot the first two
(01:11:12)
principal components they look at it and
(01:11:14)
then they try to interpret what's going
(01:11:16)
on um but you can do much more with it
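Why the scaling step matters can be seen from the variances alone. This is a minimal sketch in plain Python, assuming two hypothetical features recorded in very different units (the names and values are made up). Without autoscaling, the feature with the larger numbers owns almost all of the total variance, so the first principal component would mostly follow that feature. After autoscaling, both contribute equally.

```python
from statistics import mean, stdev, variance

# Hypothetical features in different units (illustrative values only).
height_mm = [1500.0, 1620.0, 1710.0, 1580.0, 1690.0]
width_m = [0.40, 0.55, 0.43, 0.61, 0.48]

def autoscale(values):
    """Center to mean 0 and scale to standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def share_of_total_variance(first, second):
    """Fraction of the summed variance that belongs to the first feature."""
    return variance(first) / (variance(first) + variance(second))

raw_share = share_of_total_variance(height_mm, width_m)
scaled_share = share_of_total_variance(autoscale(height_mm), autoscale(width_m))

print(raw_share)     # very close to 1: height_mm dominates
print(scaled_share)  # close to 0.5: both features weigh the same
```

In R, the corresponding switch is the `scale.` argument of `prcomp`, which is the setting the stream refers to.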
(01:11:20)
um so you can hey you can you can modify
(01:11:22)
so instead of covariance you could look
(01:11:24)
at the uh the eigenvectors and
(01:11:27)
eigenvalues based on the correlation as well
(01:11:29)
um so it allows you more flexibility to
(01:11:32)
know how principal component analysis is
(01:11:35)
um is
(01:11:37)
working all right so if there's no
(01:11:39)
further questions for today um then
(01:11:42)
that's that's what I wanted to talk
(01:11:44)
about um 1 hour 11 minutes not too bad
(01:11:47)
not too bad um I've gotten some
(01:11:49)
complaints that my videos are too long
(01:11:52)
um which I can understand like I tend to
(01:11:54)
ramble on about things that don't really
(01:11:56)
matter too much um but
(01:11:59)
uh it it's the way that it is so um I
(01:12:03)
might go back to uh streaming on Twitch
(01:12:06)
and then cutting it up um so that you
(01:12:08)
guys get like bite-sized videos um but I
(01:12:12)
I I do like doing the streams so I might
(01:12:14)
just keep them on
(01:12:16)
YouTube all right so no further
(01:12:19)
questions then um I'm wishing you all a
(01:12:23)
very happy Sunday um
(01:12:28)
uh thanks for this lesson if total PC1
(01:12:30)
and PC2 accounts for less than 50%
(01:12:33)
should we present more components yes
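The question of how many components to present can be made concrete by accumulating the variance explained until a target is reached. This is a minimal sketch in plain Python, assuming hypothetical eigenvalues and a target of 80% (the lower end of the 80% to 85% range mentioned in the stream). It also lists every pair of retained components, which are the scatter plots one would then inspect.

```python
from itertools import accumulate, combinations

# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [4.2, 2.9, 1.6, 1.1, 0.7, 0.5]
target = 0.80  # assumed threshold for "enough" variance explained

total = sum(eigenvalues)
cumulative = [running / total for running in accumulate(eigenvalues)]

# Smallest number of components whose cumulative share reaches the target.
n_components = next(
    count for count, share in enumerate(cumulative, start=1) if share >= target
)

# All pairs of retained components to plot against each other.
pc_pairs = list(combinations(range(1, n_components + 1), 2))

print(n_components)
print(pc_pairs)
```

The threshold itself is a judgment call that depends on the data and the goal, so the 0.80 here is only a placeholder.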
(01:12:36)
yeah well you always like principal
(01:12:38)
components they are related to the
(01:12:41)
original data sources that you had right
(01:12:44)
so if you have 60 different features of
(01:12:47)
course the first two principal
(01:12:49)
components are not going to catch all of
(01:12:52)
your variation right because if you
(01:12:54)
start off with 60 features you probably
(01:12:57)
need like five or six or seven uh to
(01:13:00)
explain a reasonable amount of variance
(01:13:02)
but also what is a reasonable amount of
(01:13:04)
variance is very dependent on what
(01:13:06)
you're doing right so one of the things
(01:13:09)
that I always do when I do PCA is do the
(01:13:12)
correlation of the PCs back to the raw
(01:13:16)
unscaled data that I had to see what is
(01:13:20)
causing or what is how how are these
(01:13:22)
original phenotypes loaded onto the PC
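Correlating the PC scores back to the original measurements can be sketched as follows. This is a minimal sketch in plain Python, assuming hypothetical PC1 scores and two made-up feature columns (the names `feature_a` and `feature_b` are placeholders). Because the Pearson correlation is unaffected by rescaling a variable, using the raw unscaled data gives the same result as using the autoscaled data.

```python
import math
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cross = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    ss_a = sum((p - ma) ** 2 for p in a)
    ss_b = sum((q - mb) ** 2 for q in b)
    return cross / math.sqrt(ss_a * ss_b)

# Hypothetical PC1 scores and raw feature columns for six samples.
pc1 = [-2.1, -1.4, -0.3, 0.4, 1.3, 2.1]
features = {
    "feature_a": [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],  # tracks PC1 closely
    "feature_b": [3.0, 2.1, 3.4, 2.2, 3.1, 2.5],  # mostly unrelated to PC1
}

# Correlation of each original feature with the PC1 scores.
pc1_correlations = {name: pearson(pc1, column) for name, column in features.items()}

for name, r in pc1_correlations.items():
    print(name, round(r, 3))
```

A feature with a correlation near 1 or -1 is strongly represented on that component, while a correlation near 0 means the component says little about it.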
(01:13:27)
um but yeah no generally you want to
(01:13:30)
have two or more
(01:13:34)
components you want to or uh if you have
(01:13:36)
60 features but you want to end up with
(01:13:39)
like 80% to 85% variance explained um
(01:13:44)
because those are kind of the main
(01:13:46)
directions in your data right so if you
(01:13:48)
need four or five components to hit this
(01:13:51)
80% explained you would plot PC 1
(01:13:55)
versus 2 1 versus 3 2 versus 3 1 versus 4
(01:14:00)
2 versus 4 3 versus 4 and you would look
(01:14:03)
at all of them to see if you see any
(01:14:06)
clear grouping right because this clear
(01:14:08)
grouping will allow you to do
(01:14:10)
predictions as well right because if we
(01:14:12)
now measure a new
(01:14:14)
setosa we then do the computation right
(01:14:16)
so we multiply the values that we obtain
(01:14:19)
with the projection Matrix and it ends
(01:14:22)
up being right so it ends up being a
(01:14:26)
negative scoring one so we would know
(01:14:29)
that the measurements come from a setosa
(01:14:32)
right so we can take four
(01:14:33)
measurements of the plant measure the
(01:14:36)
four
(01:14:37)
things multiply this with our
(01:14:41)
eigenvector matrix and then we will get four
(01:14:44)
new values which would then allow us to
(01:14:47)
determine which plant it is without
(01:14:49)
knowing it right so PCA can also be
(01:14:52)
used to predict um what predict what you
(01:14:56)
are
(01:14:58)
seeing based on the phenotypic
(01:15:01)
measurements that you
(01:15:02)
have all right so thank you guys for
(01:15:05)
spending your Sunday with me um please
(01:15:09)
like the video stream and subscribe
(01:15:12)
to my YouTube channel if you want to see
(01:15:13)
more um you would apply electrom
(01:15:16)
metabolomic data analysis like the one
(01:15:18)
on RNA from scratch I am working on that
(01:15:20)
Dion I am working on that um I've been
(01:15:23)
doing a lot of metabolomics
(01:15:26)
recently um
(01:15:28)
and I I almost have a working pipeline
(01:15:32)
um very similar to things like uh
(01:15:35)
MetaboAnalyst uh MS-DIAL um which goes from
(01:15:39)
raw machine output through all of the
(01:15:43)
different steps that you normally need
(01:15:45)
to do like um like scaling the the ma uh
(01:15:50)
scaling the profiles um and then
(01:15:53)
determining features and then doing
(01:15:54)
feature annotation um so I am planning
(01:15:57)
on doing one of those in the future um
(01:16:00)
there's also a QTL lecture that I'm
(01:16:03)
currently preparing based on a request
(01:16:06)
from last no not last week but the week
(01:16:09)
before that so from the last stream um
(01:16:11)
someone asked me could you do a lecture
(01:16:13)
about QTL um so there will definitely be
(01:16:15)
a QTL lecture and there will definitely
(01:16:17)
be a metabolomic lecture um once I get
(01:16:22)
the pre-print out for the QTL mapping
(01:16:25)
work that we've been doing on the um
(01:16:28)
head 3 um there will also be a video
(01:16:33)
about how to do longevity analysis on
(01:16:37)
the um 3 m that we're currently been
(01:16:39)
doing so there's there's a lot of things
(01:16:41)
in the pipeline um it's just finding the
(01:16:44)
time to make them um but metabolomics is
(01:16:47)
definitely on my list um like I said
(01:16:50)
I've almost got a fully working pipeline
(01:16:52)
and then we will do metabolomics from
(01:16:56)
scratch um and then uh it's going to be
(01:16:59)
fun all right thanks so much guys um
(01:17:03)
enjoy the rest of your Sunday um I hope
(01:17:05)
you have better weather than me like
(01:17:07)
here it's been gray and raining the
(01:17:09)
whole weekend well mostly the whole week
(01:17:13)
um so yeah it's been a been a poor
(01:17:15)
summer here so I hope that you guys have
(01:17:17)
nice beautiful weather um and that you
(01:17:19)
can spend some time
(01:17:21)
outside all right then see you guys next
(01:17:24)
time ch
