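Goal: build a four-wave panel dataset in which each wave combines the family file and the person file.
I. Extracting variables
The 2014 wave is used as the example; the 2016, 2018 and 2020 waves are processed the same way, so those steps are omitted.
1. Process the person file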
keep fid14 pid provcd14 urban14 cfps2014_age cfps_gender qea0 qp201 cfps2014eduy_im qz207 ku802
Recode the negative missing-value codes to missing
for var _all: replace X =. if inlist(X, -10, -9, -8, -2, -1)
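The `for` command still runs but has not been an official part of Stata since version 8. A minimal sketch of the same recoding with `foreach`, run on made-up toy data (the values and the string variable `tag` are hypothetical, not from CFPS), skipping string variables so that `replace` does not stop with a type mismatch:

```stata
* Toy data: run in a fresh session, because -clear- drops the data in memory
clear
input fid14 qp201 str4 tag
101  3 "a"
102 -8 "b"
103 -1 "c"
104  5 "d"
end

* Loop over every variable and recode only the numeric ones
foreach v of varlist _all {
    capture confirm numeric variable `v'
    if _rc == 0 {
        replace `v' = . if inlist(`v', -10, -9, -8, -2, -1)
    }
}
list
```

Note that -8 ("not applicable") is recoded together with the refusal and don't-know codes, so any question that is skipped for part of the sample becomes missing for those respondents.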
Marital status
codebook qea0
recode qea0 (2 = 1 "Married") (1 3 4 5 = 0 "Not married"), gen(qea0_1)
drop qea0
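One way to confirm the recode did what was intended is to cross-tabulate the old and new variables before the original is dropped. A small sketch on toy data (the six rows are made up for illustration):

```stata
* Toy data: one row per marital-status code plus one missing value
clear
input qea0
1
2
3
4
5
.
end

recode qea0 (2 = 1 "Married") (1 3 4 5 = 0 "Not married"), gen(qea0_1)

* Every original code should land in exactly one column; missing stays missing
tab qea0 qea0_1, missing
```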
2. Process the family file
keep fresp1pid fid14 provcd14 urban14 finc familysize fwage_1 foperate_1 fproperty_1 ftransfer_1 fu201 fo1
Recode the negative missing-value codes to missing
for var _all: replace X =. if inlist(X, -10, -9, -8, -2, -1)
Take logs of income and gift expenditure
* log() returns missing for zero or negative values
foreach var of varlist finc fwage_1 foperate_1 fproperty_1 ftransfer_1 fu201 {
    gen log`var' = log(`var')
}
drop finc fwage_1 foperate_1 fproperty_1 ftransfer_1 fu201
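Because `log(0)` evaluates to missing in Stata, a household with no income of a given type ends up with a missing log value for that component. The sketch below shows this on toy data, together with two common workarounds; the variable names `log1p_fwage` and `has_wage` are illustrative assumptions, not part of the original workflow.

```stata
* Toy data: three hypothetical households, one with zero wage income
clear
input fid14 fwage_1
1 25000
2 0
3 .
end

* Plain log: the zero-income household becomes missing
gen logfwage_1 = log(fwage_1)

* Alternative 1: log(x + 1) keeps zero-income households in the sample
gen log1p_fwage = log(fwage_1 + 1)

* Alternative 2: an indicator for having any wage income at all
gen byte has_wage = fwage_1 > 0 if !missing(fwage_1)

list
```

Whether either alternative suits the analysis is a modelling decision; the point here is only that the plain log turns every zero into a missing value.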
3. Merge the person and family files
use "D:\2014famecon.dta"
rename fresp1pid pid
merge 1:1 fid14 pid using "D:\2014person.dta"
keep if _merge==3
drop _merge
gen year=2014
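Since `fresp1pid` is renamed to `pid`, a 1:1 merge on `fid14 pid` can match at most one person per household (the financial respondent). Tabulating `_merge` before keeping only the matches shows how many records fall out at this step. A sketch on toy data, with made-up identifiers and tempfiles standing in for the saved files:

```stata
* Toy person file: three individuals in three households
clear
input fid14 pid
101 1011
102 1021
103 1032
end
tempfile person
save `person'

* Toy family file: fresp1pid identifies the financial respondent
clear
input fid14 fresp1pid
101 1011
102 1022
104 1041
end
rename fresp1pid pid

merge 1:1 fid14 pid using `person'

* Inspect the match result before dropping anything
tab _merge
```

In this toy case only one pair matches; two family records and two person records are unmatched and would be discarded by `keep if _merge==3`.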
4. Data check
egen miss = rowmiss(fo101 familysize logfinc logfwage_1 logfoperate_1 logftransfer_1 logfu201 age age2 gender qp201 eduy_im qu802 qz207 qea0_1)
tab miss
keep if miss == 0
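`keep if miss == 0` drops an observation if any single variable in the list is missing, so it helps to see which variables account for most of the loss before dropping. A sketch on toy data (values are made up; only four of the variables are used to keep it short):

```stata
* Toy data: four households with scattered missing values
clear
input fid14 logfinc logfwage_1 logfoperate_1 qp201
1 10.2  9.8   .  3
2 11.0    .   .  4
3 10.5 10.1 8.7  2
4  9.9  9.5   .  .
end

* Count missing values variable by variable
foreach v of varlist logfinc logfwage_1 logfoperate_1 qp201 {
    quietly count if missing(`v')
    display "`v': " r(N) " missing of " _N
}

* The same information in one table
misstable summarize logfinc logfwage_1 logfoperate_1 qp201

* Complete cases that would survive the listwise deletion
egen miss = rowmiss(logfinc logfwage_1 logfoperate_1 qp201)
tab miss
quietly count if miss == 0
display "complete cases: " r(N)
```

Here a single variable (`logfoperate_1`) is missing for three of the four households, which is enough to reduce the sample to one observation.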
All four waves (2014, 2016, 2018, 2020) are processed this way and saved.
5. Append the four waves
use "D:\2014.dta", clear
append using "D:\2016.dta"
append using "D:\2018.dta"
append using "D:\2020.dta"
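After appending, it is worth checking that the family identifier used later is actually filled in for every wave. The sketch below assumes, purely for illustration, that a later wave's file carries a wave-specific id (`fid16`, a hypothetical layout) instead of `fid14`; in that case `fid14` is missing for those rows, and `duplicates tag` treats rows with the same missing id and year as duplicates of each other.

```stata
* Toy 2014 wave with fid14
clear
input fid14 year
101 2014
102 2014
end
tempfile w2014
save `w2014'

* Toy 2016 wave that only has a wave-specific id (assumed layout)
clear
input fid16 year
101 2016
103 2016
end
tempfile w2016
save `w2016'

use `w2014', clear
append using `w2016'

* Check observations per wave and how many lack the panel identifier
tab year
count if missing(fid14)

* Rows with a missing fid14 in the same year are tagged as duplicates
duplicates tag fid14 year, gen(num)
tab year num
```

If the real files show a pattern like this, a `keep if num == 0` step would remove the affected waves entirely.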
6. Build the unbalanced panel
duplicates tag fid14 year, gen(num)
tab num
keep if num == 0
order fid14 year
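Once the family-year pairs are unique, the panel structure can be declared and inspected. A minimal sketch on toy data, assuming `fid14` is the panel identifier and the waves are two years apart:

```stata
* Toy unbalanced panel: households observed in different numbers of waves
clear
input fid14 year logfinc
101 2014 10.1
101 2016 10.3
101 2018 10.4
102 2014  9.8
102 2020 10.0
103 2016 11.2
end

* Stops with an error if fid14 and year do not uniquely identify rows
isid fid14 year

* Declare the panel; delta(2) reflects the biennial survey waves
xtset fid14 year, delta(2)

* Participation patterns across waves
xtdescribe

* Number of waves in which each household appears
bysort fid14: gen n_waves = _N
tab n_waves
```

`xtdescribe` and the `n_waves` tabulation show how many households are observed in one, two, three or four waves.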
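7. That completes the cleaning. However, fewer than 2,000 observations remain afterwards, so something in the cleaning process is clearly wrong. Corrections are welcome.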
From: https://blog.csdn.net/jk182/article/details/143375642