
Time Series Segmentation and Forecasting

Date: 2024-10-27 19:20:27
Tags: matrix, plt, segmentation, fix, cluster, series, ax, data, forecasting

 

Overview

Data overview
Loading library and data
Segmentation
    Pre-processing data for segmentation
    Determining the number of clusters
    Clustering
    Visualization of segments
    Analyzing each segment
Forecasting Single time-series
    Preprocessing data for forecasting
    Forecasting electricity usage of each segment using Fbprophet
    Forecasting electricity usage of all customers using Fbprophet

Loading the data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
pd.set_option('display.max_columns', 500)

data_ori = pd.read_csv('../input/daily_electricity_usage.csv')
data_ori['date'] = pd.to_datetime(data_ori['date'])

data_ori.head()

   Meter ID       date  total daily KW
0      1000 2009-07-14          11.203
1      1000 2009-07-15           8.403
2      1000 2009-07-16           7.225
3      1000 2009-07-17          11.338
4      1000 2009-07-18          11.306

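The dataset covers 6,445 meter IDs, from ID 1000 to ID 7444, and provides daily electricity usage from 14 July 2009 to 31 December 2010. The customers include businesses as well as households.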

1. Data preprocessing

Splitting the data by customer

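# One row per calendar day of the observation period
data = pd.DataFrame({'date': pd.date_range('2009-07-14', periods=536, freq='D')})
for i in range(1000, 7445):
    # Readings of meter i, renamed so that each merge adds a uniquely named column
    S = data_ori[data_ori['Meter ID'] == i][['date', 'total daily KW']]
    S = S.rename(columns={'total daily KW': 'ID' + str(i)})
    data = pd.merge(data, S, how='left', on='date')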
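The loop above performs one merge per meter, which can be slow for 6,445 IDs. As a rough sketch of an alternative (not the article's method), the same wide layout can be produced with `DataFrame.pivot` followed by a reindex. The toy frame, the shortened ID range and the three-day calendar below are assumptions made only for illustration, and `pivot` requires at most one reading per meter and date.

```python
import pandas as pd

# Toy long-format data with the same column names as the article's CSV.
long_df = pd.DataFrame({
    "Meter ID": [1000, 1000, 1001, 1001, 1002],
    "date": pd.to_datetime(["2009-07-14", "2009-07-15",
                            "2009-07-14", "2009-07-15", "2009-07-15"]),
    "total daily KW": [11.203, 8.403, 6.744, 6.949, 8.972],
})

# One row per date, one column per meter; missing readings become NaN.
wide = long_df.pivot(index="date", columns="Meter ID", values="total daily KW")

# Reindex so that every ID and every calendar day is present,
# including meters that never reported (hypothetical small ranges).
all_ids = range(1000, 1004)
all_days = pd.date_range("2009-07-14", periods=3, freq="D")
wide = wide.reindex(index=all_days, columns=all_ids)

# Match the article's column naming: ID1000, ID1001, ...
wide.columns = ["ID" + str(c) for c in wide.columns]
wide = wide.rename_axis("date").reset_index()
print(wide)
```

Meters with no readings (here `ID1003`) end up as all-NaN columns, which mirrors what the left merges produce for unobserved IDs.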

data.head()

        date  ID1000  ID1001  ID1002  ID1003  ID1004  ...  ID7444
0 2009-07-14  11.203   6.744   6.355  24.183  50.057  ...  52.940
1 2009-07-15   8.403   6.949   8.972  26.659  48.813  ...  35.582
2 2009-07-16   7.225   7.255   8.794  32.017  32.555  ...  29.307
3 2009-07-17  11.338   7.190   8.306  33.032  46.727  ...  40.986
4 2009-07-18  11.306   6.805  10.119  31.238  35.215  ...  40.270

(Output truncated: the frame has one date column plus one column per meter, ID1000 to ID7444. Several columns contain NaN.)

Imputing or dropping missing data

data.isnull().sum().sum()

163262

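# Fill each meter's missing days with that meter's mean; the date column is skipped
data = data.fillna(data.mean(numeric_only=True))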

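The dataset contains 163,262 missing values. Some arise because new customers joined during the observation period; in addition, some meter IDs between 1000 and 7444 were never observed. Missing values are filled with the mean of the corresponding meter's column.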
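To make the effect of mean imputation concrete, here is a minimal sketch on a made-up frame (the values and the three meters are assumptions, not data from the article). It also shows a caveat: a meter that was never observed has no mean, so its column stays NaN after `fillna`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "date": pd.date_range("2009-07-14", periods=4, freq="D"),
    "ID1000": [10.0, np.nan, 14.0, 12.0],   # one missing reading
    "ID1001": [np.nan, np.nan, 6.0, 8.0],   # customer who joined later
    "ID1002": [np.nan] * 4,                 # meter that was never observed
})

# Total number of missing cells, as in data.isnull().sum().sum()
n_missing = int(toy.isnull().sum().sum())

# Fill each numeric column with its own mean; the date column is skipped.
filled = toy.fillna(toy.mean(numeric_only=True))
print(n_missing)
print(filled)
```

Columns that remain entirely NaN need separate handling, for example dropping them or filling the derived features with a constant.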

2. Segmentation

Building the segmentation features

data.date = pd.to_datetime(data.date)
data['day'] = data['date'].apply(lambda x: x.weekday())  # 0 = Monday ... 6 = Sunday
x_call = data.columns[1:-1]  # the meter columns, excluding 'date' and 'day'

data_fix = pd.DataFrame({'Meter ID': range(1000, 7445, 1), 'total KW': data[x_call].sum().values})
data_fix['average per day'] = data[x_call].mean().values
# Share of each meter's total usage that falls on each day of the week
data_fix['% Monday'] = data[data['day'] == 0][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% Tuesday'] = data[data['day'] == 1][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% Wednesday'] = data[data['day'] == 2][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% Thursday'] = data[data['day'] == 3][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% Friday'] = data[data['day'] == 4][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% Saturday'] = data[data['day'] == 5][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% Sunday'] = data[data['day'] == 6][x_call].sum().values / data_fix['total KW'] * 100
# Share of usage on weekdays (Monday to Friday) and at weekends
data_fix['% weekday'] = data[(data['day'] != 5) & (data['day'] != 6)][x_call].sum().values / data_fix['total KW'] * 100
data_fix['% weekend'] = data[(data['day'] == 5) | (data['day'] == 6)][x_call].sum().values / data_fix['total KW'] * 100

data_fix=data_fix.fillna(0)
data_fix.head()

   Meter ID   total KW  average per day   % Monday  % Tuesday  % Wednesday  \
0      1000   5515.675        10.290438  13.818961  14.649395    14.792587   
1      1001   5090.375         9.496968  14.126091  14.361830    14.289969   
2      1002   5352.830         9.986623  15.714585  14.486150    16.015827   
3      1003  16305.581        30.420860  14.545051  14.048454    14.216507   
4      1004  25326.442        47.250825  14.630796  14.177041    14.400305   

   % Thursday   % Friday  % Saturday   % Sunday  % weekday  % weekend  
0   12.848944  13.900039   15.455497  14.534576  70.009926  29.990074  
1   14.611595  14.851637   13.891904  13.866974  72.241122  27.758878  
2   15.183408  13.964968   12.886305  11.748757  75.364938  24.635062  
3   14.363180  14.000151   14.630714  14.195943  71.173342  28.826658  
4   13.674076  12.893055   14.671563  15.553164  69.775273  30.224727  

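Eleven variables are built to characterize each customer's consumption behavior:

  • total electricity usage over the observation period (total KW);
  • average daily electricity usage (average per day);
  • share of total usage that falls on Mondays (% Monday);
  • share of total usage that falls on Tuesdays (% Tuesday);
  • share of total usage that falls on Wednesdays (% Wednesday);
  • share of total usage that falls on Thursdays (% Thursday);
  • share of total usage that falls on Fridays (% Friday);
  • share of total usage that falls on Saturdays (% Saturday);
  • share of total usage that falls on Sundays (% Sunday);
  • share of total usage that falls on weekdays (% weekday); and
  • share of total usage that falls on weekends (% weekend).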
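The eleven features listed above can be sketched on a small example. The two meters and their usage profiles below are hypothetical, chosen so that the expected shares are easy to check by hand: a flat profile should give about 71.4% weekday usage (5 of 7 days), and a weekday-only profile should give 100%.

```python
import pandas as pd

# Hypothetical usage for two meters over two full weeks, starting on a Monday.
dates = pd.date_range("2009-07-13", periods=14, freq="D")
usage = pd.DataFrame({
    "ID1": [10.0] * 14,                                     # flat profile
    "ID2": [20.0, 20.0, 20.0, 20.0, 20.0, 0.0, 0.0] * 2,    # weekday-only profile
}, index=dates)

day = pd.Series(dates.weekday, index=dates)   # 0 = Monday ... 6 = Sunday
total = usage.sum()

features = pd.DataFrame({"total KW": total, "average per day": usage.mean()})
names = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
for d, name in enumerate(names):
    # Share of each meter's total usage that falls on this day of the week
    features["% " + name] = usage[day == d].sum() / total * 100
features["% weekday"] = usage[day < 5].sum() / total * 100
features["% weekend"] = usage[day >= 5].sum() / total * 100
print(features.round(2))
```

By construction the seven daily shares sum to 100 for each meter, and so do the weekday and weekend shares, so the features are not independent of one another.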

Standardization

from sklearn.preprocessing import StandardScaler
x_calls = data_fix.columns[1:]
scaller = StandardScaler()
matrix = pd.DataFrame(scaller.fit_transform(data_fix[x_calls]),columns=x_calls)
matrix['Meter ID'] = data_fix['Meter ID']
print(matrix.head())
   total KW  average per day  % Monday  % Tuesday  % Wednesday  % Thursday  \
0 -0.462901        -0.462901 -0.248425   0.150228     0.333406   -0.956438   
1 -0.477627        -0.477627 -0.026894  -0.042109    -0.010012    0.279580   
2 -0.468539        -0.468539  1.118883   0.041043     1.169195    0.680550   
3 -0.089300        -0.089300  0.275301  -0.251709    -0.060206    0.105385   
4  0.223048         0.223048  0.337149  -0.165704     0.065376   -0.377833   

   % Friday  % Saturday  % Sunday  % weekday  % weekend  Meter ID  
0 -0.251452    0.500570  0.118593  -0.253682   0.324755      1000  
1  0.452542   -0.181690 -0.111800   0.170082  -0.161274      1001  
2 -0.203417   -0.620474 -0.842809   0.763378  -0.841745      1002  
3 -0.177389    0.140683  0.001729  -0.032718   0.071324      1003  
4 -0.996421    0.158507  0.470114  -0.298249   0.375870      1004  
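As a small check of what `StandardScaler` does, the sketch below standardizes two of the feature columns using only the five rows printed earlier, so the resulting numbers differ from the article's output, which is based on all meters. It compares the scaler with the manual formula (x - mean) / std, where std is the population standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# First five rows of two feature columns from the data_fix table.
toy = pd.DataFrame({
    "total KW": [5515.675, 5090.375, 5352.830, 16305.581, 25326.442],
    "average per day": [10.290438, 9.496968, 9.986623, 30.420860, 47.250825],
})

scaled = pd.DataFrame(StandardScaler().fit_transform(toy), columns=toy.columns)

# StandardScaler divides by the population standard deviation (ddof=0).
manual = (toy - toy.mean()) / toy.std(ddof=0)
print(np.allclose(scaled, manual))
print(scaled.round(4))
```

Because "average per day" is "total KW" divided by the number of days, the two standardized columns come out essentially identical, which matches the identical values in the first two columns of the printed matrix.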

Outliers are retained so that very large business customers and very small households are not excluded.

Correlation

corr = matrix[x_calls].corr()
fig, ax = plt.subplots(figsize=(8, 6))
# Draw the matrix once, with the color scale fixed to [-1, 1]
cax = ax.matshow(corr, vmin=-1, vmax=1)
plt.xticks(range(len(corr.columns)), corr.columns)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.xticks(rotation=90)
plt.colorbar(cax)

Number of clusters

def plot_BIC(matrix, x_calls, K):
    from sklearn import mixture
    BIC = []
    for k in K:
        # Fit a Gaussian mixture with k components, initialized with k-means
        model = mixture.GaussianMixture(n_components=k, init_params='kmeans')
        model.fit(matrix[x_calls])
        BIC.append(model.bic(matrix[x_calls]))
    fig, ax = plt.subplots(figsize=(8, 6))
    plt.plot(K, BIC, '-cx')
    plt.ylabel("BIC score")
    plt.xlabel("k")
    plt.title("BIC scoring for K-means cell's behaviour")
    return(BIC)


K = range(2,31)
BIC = plot_BIC(matrix,x_calls,K)

Based on the Bayesian information criterion (BIC), the customers are divided into five clusters.
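For readers unfamiliar with BIC-based selection, the following sketch runs the same idea on synthetic data: fit a Gaussian mixture for each candidate k and prefer the k with the lowest BIC. The three blobs, the candidate range and the fixed seeds are assumptions for illustration only, and the article's own choice of five clusters comes from its BIC plot, not from this example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: three well-separated 2-D blobs (illustrative assumption).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(100, 2)) for c in centers])

K = range(1, 7)
bic_scores = []
for k in K:
    model = GaussianMixture(n_components=k, init_params="kmeans", random_state=0)
    model.fit(X)
    bic_scores.append(model.bic(X))   # lower BIC is better

# Candidate with the lowest BIC
best_k = K[int(np.argmin(bic_scores))]
print(best_k)
```

On real data the BIC curve is often flat near its minimum, so in practice the choice may also weigh interpretability of the resulting segments.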

Clustering

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D
cluster = KMeans(n_clusters=5, random_state=217)
matrix['cluster'] = cluster.fit_predict(matrix[x_calls])
print(matrix.cluster.value_counts())

1    3747
3    2208
4     385
0      95
2      10
Name: cluster, dtype: int64

# Number of customers per cluster (index = cluster label, values = counts)
d = matrix.cluster.value_counts()
fig, ax = plt.subplots(figsize=(8, 6))
plt.bar(d.index, d.values, align='center', alpha=0.5)
plt.xlabel('Cluster')
plt.ylabel('number of data')
plt.title('Cluster of Data')

Text(0.5,1,'Cluster of Data')

from sklearn.metrics.pairwise import euclidean_distances
# Pairwise Euclidean distances between the five cluster centers
distance = euclidean_distances(cluster.cluster_centers_, cluster.cluster_centers_)
print(distance)

[[ 0.          9.35063261 30.46869987  9.84791523  9.82435185]
 [ 9.35063261  0.         28.24977455  2.02708658  6.67472145]
 [30.46869987 28.24977455  0.         27.44631089 31.68601779]
 [ 9.84791523  2.02708658 27.44631089  0.          8.67886855]
 [ 9.82435185  6.67472145 31.68601779  8.67886855  0.        ]]

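The first segment (Cluster 0) contains 95 customers, the second (Cluster 1) 3747 customers, the third (Cluster 2) 10 customers, the fourth (Cluster 3) 2208 customers, and the fifth (Cluster 4) 385 customers. The distance matrix shows that Cluster 2 lies far from all the others (distances of about 27 to 32), while Clusters 1 and 3 are the closest pair (about 2.03).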
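The pattern above, a tiny cluster far from everything else plus two large clusters close together, can be reproduced on synthetic data. In this sketch the three groups, their sizes and `n_init=10` are assumptions chosen for illustration; only `KMeans`, `cluster_centers_` and `euclidean_distances` correspond to calls used in the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances

# Two large neighbouring groups and one tiny, distant group (made-up data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[2, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[30, 30], scale=0.5, size=(5, 2)),
])

km = KMeans(n_clusters=3, n_init=10, random_state=217)
labels = km.fit_predict(X)

sizes = np.bincount(labels, minlength=3)          # customers per cluster
dist = euclidean_distances(km.cluster_centers_)   # symmetric, zero diagonal
print(sizes)
print(dist.round(2))
```

A very small cluster that sits far from all other centers usually reflects a handful of extreme observations, which is consistent with the article's decision to keep outliers such as very large business customers.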

Visualizing the segments

# Reduce the data to three dimensions using PCA (fit once, reuse all components)
pca = PCA(n_components=3)
components = pca.fit_transform(matrix[x_calls])
matrix['x'] = components[:, 0]
matrix['y'] = components[:, 1]
matrix['z'] = components[:, 2]

# Getting the center of each cluster for plotting
cluster_centers = pca.transform(cluster.cluster_centers_)
cluster_centers = pd.DataFrame(cluster_centers, columns=['x', 'y', 'z'])
cluster_centers['cluster'] = range(0, len(cluster_centers))
print(cluster_centers)
           x         y         z  cluster
0   3.091673  8.622535 -0.845156        0
1   0.264480 -0.242741 -0.019198        1
2 -14.255594  2.118913 -9.273445        2
3  -1.722686  0.087896  0.107413        3
4   6.897584 -0.321715  0.021218        4
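The key point in this step is that the cluster centers must be projected with the same fitted PCA as the data, so that points and centers share one coordinate system. A minimal sketch with random stand-in data (the array shapes and the use of the first five rows as "centers" are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))   # stand-in for the 11 standardized features
centers = X[:5]                  # stand-in for cluster.cluster_centers_

pca = PCA(n_components=3)
coords = pca.fit_transform(X)    # fit once and keep all three components
x, y, z = coords[:, 0], coords[:, 1], coords[:, 2]

# Project the centers with the already fitted PCA, not a new fit
centers_3d = pca.transform(centers)

# Fraction of the total variance kept by the three components
kept = pca.explained_variance_ratio_.sum()
print(kept)
```

Checking `explained_variance_ratio_` is a useful habit here: if the three components keep only a small share of the variance, apparent overlap or separation in the plots may not reflect the full 11-dimensional space.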

# Plotting in two dimensions
fig, ax = plt.subplots(figsize=(8, 6))
scatter=ax.scatter(matrix['x'],matrix['y'],c=matrix['cluster'],s=21,cmap=plt.cm.Set1_r)
ax.scatter(cluster_centers['x'],cluster_centers['y'],s=70,c='blue',marker='+')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(scatter)
plt.title('Data Segmentation')

Text(0.5,1,'Data Segmentation')

# Plotting in three dimensions
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
scatter=ax.scatter(matrix['x'],matrix['y'],matrix['z'],c=matrix['cluster'],s=21,cmap=plt.cm.Set1_r)
ax.scatter(cluster_centers['x'],cluster_centers['y'],cluster_centers['z'],s=70,c='red',marker='+')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.colorbar(scatter)
plt.title('Data Segmentation')

Text(0.5,0.92,'Data Segmentation')

The plots above show that the segments are well separated from one another, which suggests that the number of clusters chosen with BIC works well for this project.

From: https://blog.csdn.net/workflower/article/details/143248077
