A DataFrame is a 2-D labeled table; a Series is a 1-D labeled array.
Example 1:
democrat = (gss['partyid'] <= 1)
gss is a DataFrame loaded from a CSV file. gss['partyid'] selects the partyid column of gss, and every column of a DataFrame is a Series. So gss['partyid'] is a Series, and gss['partyid'] <= 1 is also a Series, whose dtype is bool.
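A minimal sketch of the point above, to check the types directly. The tiny gss table here is made-up toy data standing in for the real CSV, which these notes do not include.

```python
import pandas as pd

# Hypothetical toy data; the real gss DataFrame is loaded from a CSV file.
gss = pd.DataFrame({'partyid': [0, 1, 3, 6, 1]})

# Comparing a Series with a scalar is done element-wise,
# so the result is a boolean Series with one entry per row.
democrat = (gss['partyid'] <= 1)

print(type(gss['partyid']))        # a pandas Series
print(democrat.dtype)              # bool
print(len(democrat) == len(gss))   # True: one value per row of gss
```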
Example 2:
selected = democrat[liberal]
selected, democrat, and liberal are all Series.
Q: Why do democrat and liberal have the same length, while selected is shorter?
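A: Because democrat and liberal each have the same length as the number of rows in gss. democrat records, for every respondent in gss, whether that respondent is a Democrat; the value is True if so. So the length of democrat equals the number of rows in gss, and the same holds for liberal.
democrat[liberal] uses liberal as a boolean mask: out of the 49290 respondents, it keeps only the entries of democrat for those who are liberal. So selected has length 13493, which means 13493 of the respondents are liberal.
In other words, a conditional probability is still a fraction of a finite sample space; the sample space has just shrunk from 49290 to 13493.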
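Written as a formula, the idea above looks roughly like this. The notation is my own shorthand, not taken from the course material; # means "number of respondents".

```latex
P(\text{democrat} \mid \text{liberal})
  = \frac{\#\{\text{democrat and liberal}\}}{\#\{\text{liberal}\}}
```

In this dataset the denominator is 13493, not 49290.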
Every respondent in the selected Series is liberal, so selected.sum() gives the number of Democrats among the liberals, and selected.mean() gives the probability that a respondent is a Democrat given that they are liberal.
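A small sketch of the whole calculation on made-up data. The polviews column and the rule polviews <= 3 for liberal are assumptions for illustration only; these notes do not say how liberal was actually defined.

```python
import pandas as pd

# Hypothetical toy data; the column values are invented for illustration.
gss = pd.DataFrame({
    'partyid':  [0, 1, 3, 6, 1, 5],
    'polviews': [2, 5, 1, 6, 3, 2],
})

democrat = (gss['partyid'] <= 1)
# Assumed definition of liberal (not given in the notes).
liberal = (gss['polviews'] <= 3)

# Boolean indexing: keep democrat only for rows where liberal is True.
selected = democrat[liberal]

print(len(democrat), len(liberal))  # both equal the number of rows in gss
print(len(selected))                # number of liberal respondents
print(selected.sum())               # Democrats among the liberals
print(selected.mean())              # fraction of liberals who are Democrats
```

On this toy data, 4 of the 6 respondents are liberal and 2 of those 4 are Democrats, so selected.mean() is 0.5.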