
SAS EM Data Mining: Customer Credit Prediction Model (Regression and Decision Tree)

(2010-09-02 14:21:35)
Tags: Education

Category: Data mining

SAS EM Data Mining: Predictive Modeling

1 Problem definition

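Goal: build models that predict the credit status of loan applicants, then select the best model to make predictions and reduce losses.

Data set: SAMPSIO.DMAGECR

Data set size: 1000 observations

Number of variables: 21 (20 input variables, 1 target variable)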

Variable descriptions

https://pfztpq.bay.livefilestore.com/y1mhuHcFwOaTZa16rm7wDI9GJJ1yM-SYEtMV5ZWcvPamTR2tfdc10Aqwv29AdyEGRUGXXs_VILK92YonwPB8n8-iUEVnF9WX7MwTKZf5zmNtEt-Qkv2MUSumEDxwCnTWxCgrV3F2j3x4hTsPJ_zu_nEhw/clip_image002_thumb%2059DC01DA.jpg

Training data set: 60%

Validation data set: 40%

Custom loss matrix

https://pfztpq.bay.livefilestore.com/y1mlXRcjbQkT_0UfXjFLc4N92vnFTvMHvG8Vgn8iiq5LUG4ZtQVJtZ0X5iwSvQcDADXbMs8Nkw37iEVrfTFlAdvnlKlmWGB9Kbcp4pTgDmLOI6r8ZHuTi0IkbpiY6pakbtV1_GXlRib72WhP-dYn_-0aQ/clip_image004_thumb%207F719109.jpg

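The model is used to decide which applicants to accept, so the Accept decision is the one to focus on. To make the model's results easier to interpret, the decision matrix above can be converted into the following:

Actual loss matrix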

https://pfztpq.bay.livefilestore.com/y1mjv7pI2I6FQl3-2BaemUNIluVzh5eGOscPV85L_ZLJVbbVwAeaFfouQkmHgWSaLxvqhhtWHXxLU9yEEyClQc2dJnkqB_vbDyjrqKM34G-SD9abKoFYkUn7RRSoA0JRD6ebEKXa1wmYwq0pNW4shyE2A/clip_image006_thumb%20086D3517.jpg

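This loss matrix produces the same decisions as the first one. When one matrix is obtained from the other by adding a constant to every entry in a row (an actual outcome), the expected loss of every decision shifts by the same amount, so the lowest-loss decision does not change. The summary statistics produced by the second matrix are, however, easier to understand.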
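The equivalence of the two matrices can be checked numerically. The Python sketch below is only an illustration. The loss values are assumptions, because the real matrices appear only in the screenshots. They are chosen so that the second matrix equals the first one with 1 subtracted from the GOOD row. For several posterior probabilities, the sketch confirms that both matrices choose the same decision.

```python
# Illustrative loss matrices (assumed values, not read from the screenshots).
# Outer key: actual class; inner key: decision.
DECISIONS = ("accept", "reject")
LOSS_FIRST = {"GOOD": {"accept": 0.0, "reject": 1.0},
              "BAD": {"accept": 5.0, "reject": 0.0}}
# The same matrix with the constant 1 subtracted from the GOOD row.
LOSS_SECOND = {"GOOD": {"accept": -1.0, "reject": 0.0},
               "BAD": {"accept": 5.0, "reject": 0.0}}


def expected_losses(posterior, loss):
    """Expected loss of each decision: sum over classes of P(class) * loss."""
    return {d: sum(p * loss[c][d] for c, p in posterior.items())
            for d in DECISIONS}


def best_decision(posterior, loss):
    """Return the decision with the smallest expected loss."""
    el = expected_losses(posterior, loss)
    return min(el, key=el.get)


if __name__ == "__main__":
    for p_good in (0.5, 0.8, 0.9, 0.95):
        post = {"GOOD": p_good, "BAD": 1.0 - p_good}
        first = best_decision(post, LOSS_FIRST)
        second = best_decision(post, LOSS_SECOND)
        print(f"P(GOOD)={p_good}: {first} / {second}")
        assert first == second
```

Every decision's expected loss drops by the same amount (P(GOOD) × 1 in this example). The ordering of the decisions therefore stays the same, and so does the chosen decision. Only the reported loss values differ.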

Prior probabilities

In the training data set:

https://pfztpq.bay.livefilestore.com/y1m7f2Qii6gZoShW6YCoyeabVMx53ybAuDWZDE7cfPVo7bakWuZoT7UoxW0IancKjPIsOqd_8zbnoJaSbv7jwCxM5TLe-WfOfkixEK1byaEb684tva3HvTcaxIrz44Wy3kxycqP7okZdHDCCE25MPA9cA/clip_image008_thumb%2047D6FA75.jpg

Assumed actual distribution:

https://pfztpq.bay.livefilestore.com/y1mgn4qFmgubfmlAyS1-f0fBjZTVmb1eMjlV_CyKTVvr4hkFvhim24JXD1rrb2SbCp-q1uU53JKaMtD3F2slZrZUDfqG6pLkiim33qoI7vQ4Bsd04rQDZ2ov-vb3zwOSKEeQjCKnpgeVp_ZxylWNSTwOw/clip_image010_thumb%203F1303F7.jpg
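The priors may differ from the class proportions in the training data. In that case, posterior probabilities estimated on the training data must be re-weighted toward the assumed actual distribution. Below is a minimal sketch of the standard Bayes-rule adjustment. The 70/30 training split and the 90/10 actual split are hypothetical values, not read from the screenshots, and `adjust_posteriors` is an illustrative name.

```python
def adjust_posteriors(posterior, train_priors, new_priors):
    """Re-weight posteriors estimated under train_priors to new_priors.

    Bayes' rule: p'_c is proportional to p_c * new_priors[c] / train_priors[c];
    the weights are then normalised so the adjusted probabilities sum to 1.
    """
    weighted = {c: posterior[c] * new_priors[c] / train_priors[c]
                for c in posterior}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


if __name__ == "__main__":
    # Hypothetical priors: 70/30 in the training data, 90/10 assumed in reality.
    train = {"GOOD": 0.7, "BAD": 0.3}
    actual = {"GOOD": 0.9, "BAD": 0.1}
    for p_good in (0.5, 0.7, 0.9):
        adjusted = adjust_posteriors({"GOOD": p_good, "BAD": 1 - p_good},
                                     train, actual)
        print(p_good, "->", round(adjusted["GOOD"], 3))
```

For a quick sanity check, pass in a posterior equal to the training prior. It should map exactly onto the new prior.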

New applicants

Data to be scored: SAMPSIO.DMAGESCR

Data set size: 75 observations (no GOOD_BAD variable)

2 Create a data mining project

https://pfztpq.bay.livefilestore.com/y1mIRECpd9rqpxZhe9t2MRl5U-kBAJ5DJFs51fV7d-w3SUhiBHa1FtFgg6-Lha7qMbx8xetLnyesrxvc4VYkD-1gdd4G4cOW_Unmz5Sruw_rOmRhYpW1o0ruIV0nolpmo7XPMzGrh_J9-ZlHN75xk_zgA/clip_image012_thumb%20076C5B95.jpg

3 Define the input data set

https://pfztpq.bay.livefilestore.com/y1m8HNcKsuGNi-iiCsaLVTgsBvEboRjqaBr83eP0gXDJstRRbmdXDZWc1NlsKHkS1IVCWQEOtB3HH1TP0QJuIwF24fj2rLHyMLXuEyfA2QKjMaQfmYQoXZnZOoYPMsF6r95SynWg0XyH78vPnc8i2y37g/clip_image016_thumb%205F61AA44.jpg

4 Set the target variable

On the Variables tab, right-click the Model Role cell of GOOD_BAD and set its role to target.

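5 Define the target profile for GOOD_BAD

The target profile mainly defines three kinds of information:

(1) the target levels of a class variable

(2) the decision matrix

(3) the prior probabilities

On the Variables tab, right-click the GOOD_BAD target variable and edit its target profile.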

https://pfztpq.bay.livefilestore.com/y1mh6Hu6ervXu09P-cbJRQXuvOwVeUFYhYqFGAjst3sL14aXAO0a8hBC3FTg39HG3JESJ0GvIIDTGyr16U9PQCcI4MSp2i3LA4nDxhW8lBhYVI41-WUuO4gaoqMI6Za-JiiLodxy8EMc4xdpQYZP1hY_Q/clip_image018_thumb%2049C35DA8.jpg

Set the target event level

https://pfztpq.bay.livefilestore.com/y1m34wrfaBC974uoefWutpKK1oiBFE_55kLGAMvfIPWKVyTL2vN8Ye5HXYx_Cv_r6baaVHyZGF1jatC-yzieCVJx-kCAt9kjsQatgR9p-JzWh8icEtmBT54kyAKxBJPQpMhObWtR7X2rxo53e5-aO6Bfw/clip_image020_thumb%200BD5DEB8.jpg

Define the decision matrix for GOOD_BAD

Select the Assessment Information tab, which lists four predefined matrices.

Create a new loss matrix

https://pfztpq.bay.livefilestore.com/y1maFAhj96PUvJTz_Xltaj9IA3R1le1jlJH_aZpkJkYQL1RoRg6hQqrLEqEGMkCbhPRbwfK_9Hsw-49YnlmDE6mFvrm5cH5_K22QYn3EBiG06SBC4veuIvp4Hb_vikvtLaJRJvVdoqedSq7lrQHs4KI6A/clip_image022_thumb%20081425E9.jpg

Right-click Default Loss and select Copy.

https://pfztpq.bay.livefilestore.com/y1mmKuSIGFCW2bIl9pUMxMCYxLPyXfvkG86j_gPTl41oLC8oXhIEuPGhxZGrLmtoZ9kiEK4QcQE7REHkycmPcF6jeOwHSL_pXHuj-1CmpzKhECskUqYGUCvmSbZC414zk65zOynho-nP3ZHIBYxcH6MTA/clip_image024_thumb%204C13F601.jpg

Click Edit Decisions to edit the decision matrix.

https://pfztpq.bay.livefilestore.com/y1muIhGMnCPKoFpu1E7eOmk6wMRUV2rJVg_C5fGhViFoh-UxDHBf5oICArTeNPo8edWOLT_1ADCQKR7eFHcdfzYAOzfZN-C7Ohq5CyNCyiNRKsTJBUTmPOu7Rqtm_PykhsZCZ7tdOs7MQYa98YUTaMhOw/clip_image026_thumb%2068D97CE4.jpg

Define the loss matrix

https://pfztpq.bay.livefilestore.com/y1miJ4e4oowHErDgwDJic30AQHkTMk9BqRtwnOEZmmNfuHQFeMtuvqSP5KFhPyGYT-lHNR9S6TgDkqPtXT46yG3X56G-TTmVBo9fB8_J7VdWpUqeYu7-oV9FtyGW1Vx-XooF2aQDya0sUCMSvkkQ349aA/clip_image028_thumb%201A90863B.jpg

Define the correct prior probabilities

https://pfztpq.bay.livefilestore.com/y1mUfe6XcTjr2d1mzDC_94RI3JgfgKA5RnbSbGJ33RsSTSSM87dJUiDMJpGxSFq8aafo-V9B6uCgTtPYYi_BBCo4v0Os-LDTkq30b09mPh43LjhjbKKw_61K6iVOpFi9H_5-Xn2imFOFLY1EroXB1yi7g/clip_image030_thumb%20598E18A4.jpg

Add a prior vector

https://pfztpq.bay.livefilestore.com/y1mXnhVvl0fFK_s4T5YfGxDQsneBwyrS8v0szo60OoNC8MUaeu4EWz3D99yGo1xBMhKZenwvPu4VQ4ZQ6q4IG-x1NkihbOqHJgPj5ZGgQgMAl9HWauo8uSb3owp5VGxgF5ovI40qf7qJQHHfrcoT_Xwog/clip_image032_thumb%2038C641BE.jpg

6 View the statistics for the interval and class variables

There are no missing values, but the AMOUNT variable is highly skewed.

https://pfztpq.bay.livefilestore.com/y1me-UaPEFmR9-OkZPddeSzEJ0BCsDEO9YYr_AVTVOTcs9Aohyg-R4niU_dIvTVgas12s2px1tTuJRExv06lkC6hJi0E_EgPOpczHr8N6GrqtcD3siS2liFQHGbd5SN2jTR0-L7T8v9TXWDC1PtRmGWiw/clip_image034_thumb%203E40C430.jpg
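Skewness is the statistic that flags AMOUNT here. The sketch below computes the bias-adjusted sample skewness. Values well above 0 indicate a long right tail. The loan amounts in the usage example are made up for illustration.

```python
import math


def skewness(values):
    """Bias-adjusted sample skewness G1 = g1 * sqrt(n(n-1)) / (n-2)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    g1 = m3 / m2 ** 1.5  # moment-based skewness
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


if __name__ == "__main__":
    # Made-up loan amounts with one large value, i.e. a long right tail.
    amounts = [1000, 1200, 1500, 2000, 2500, 3000, 15000]
    print(round(skewness(amounts), 2))
```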

7 Create the training and validation data sets

https://pfztpq.bay.livefilestore.com/y1m5qsm8Nar266lGou0D3BmDudMQranhvqV4O0rNqiLGrLFlg2ELFP8Wzoa93bovq3UnNgvWS8LYllDxgiddGjlL_LZ5_QY8NsHz_9UZAC7YW9A0ASSaLjKJmLWOPBY-E9D2iDDUC3SkBXtygHk3xN6Yg/clip_image036_thumb%20163A26DF.jpg

https://pfztpq.bay.livefilestore.com/y1mbyMGTM6TM_yrfZAwso-kGJSeri-xzXVQqU_LH3CfVO5xkNESy7fmfeQ7ObZKscT1WqphmI31r8WNeLfbUhV07V5nI_Hbi9SDqz9AU6UGoYM3Y1NLHiqDnvrWqr3KPfgD8qLZd9H2KLjoxq7xp7bqQA/clip_image038_thumb%2042168C9C.jpg

https://pfztpq.bay.livefilestore.com/y1m5q-eMvEkz16G44vMeo5JpuppIDXdGcIN0K_E2tNcazcqRF5siK5KmtgqTFwf88BdDCAeFK1i-stuMjSUVJJsu7YXu0LHceWN_cDtAJQaXoc6UNEelVmmGmARstXZWA12DrA3gEQmwhWW9HNiMx4uIw/clip_image040_thumb%20317A7DAF.jpg
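The exact settings of the partition node appear only in the screenshots. As an illustration, the sketch below performs a 60/40 split stratified by the target, so both partitions keep roughly the same GOOD/BAD mix. The stratification, the seed, and the function name `stratified_split` are assumptions.

```python
import random


def stratified_split(rows, target, train_frac=0.6, seed=12345):
    """Split rows into (train, valid), keeping each target level's share.

    rows   -- list of dicts, one per observation
    target -- name of the class variable, e.g. "GOOD_BAD"
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_level = {}
    for row in rows:
        by_level.setdefault(row[target], []).append(row)
    train, valid = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_frac)
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid


if __name__ == "__main__":
    # Hypothetical data: 700 good and 300 bad applicants.
    data = ([{"id": i, "GOOD_BAD": "GOOD"} for i in range(700)]
            + [{"id": 700 + i, "GOOD_BAD": "BAD"} for i in range(300)])
    train, valid = stratified_split(data, "GOOD_BAD")
    n_bad = sum(r["GOOD_BAD"] == "BAD" for r in train)
    print(len(train), len(valid), n_bad)
```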

8 Create variable transformations

https://pfztpq.bay.livefilestore.com/y1mSTxiRinH-YvxGJVZhi-FvzlNCxIjTrEHplJUxWonpu3DzYGf0_xQBDNCL-jJwiGIALEORr8wHFmeY2hLyHPGb0IlkMhtdpvazNsAGHIICm3TwXgoXVksMQcnNiV18FzxJ2iJeFLqWwPgGE4D1EOmKA/clip_image042_thumb.jpg

Create a new input variable with a distribution closer to normal: right-click the AMOUNT variable and select Transform, then Maximize Normality.

https://pfztpq.bay.livefilestore.com/y1m0J_R0a18Go0xOQ9ojrLDomzdBcBCl449dzVO5Y_AEBmSFCBsUr1mTeG17dAdHmhVXonVdT1CW2QeSmfkzoRUmPQvMz-oChksXJxsqXVM39pBTP5pxiGFaA0Rz1_oRwgd4b3gwQ8f4Ffl8lJe8oZPZg/clip_image044_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mZPjOMvNwmHMpRiGJOYXsZ_UEQb5La6yBZjopRR8qBYUytPBZ-7B1IhVamzZAtl_hvGCf0YS5Nj-UQCdzac--W9iDiagAPYi7ojj31ubIYxAAj2kM6TGBpNk69glxfzTokCCiK635PrKsgYudXS2fkw/clip_image046_thumb.jpg
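As its name suggests, the Maximize Normality option aims to pick a transformation that makes the variable's distribution as close to normal as possible. The sketch below approximates this idea by choosing, from a few candidate transformations, the one with the smallest absolute skewness. Both this criterion and the candidate list are simplifying assumptions. They are not necessarily what Enterprise Miner uses internally.

```python
import math


def skewness(values):
    """Moment-based skewness g1; sufficient for comparing shapes."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# Candidate transformations for a strictly positive variable such as AMOUNT.
CANDIDATES = {
    "identity": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "inverse": lambda x: 1.0 / x,
}


def pick_transformation(values):
    """Return (name, transformed values) with the smallest |skewness|."""
    def score(name):
        return abs(skewness([CANDIDATES[name](v) for v in values]))

    best = min(CANDIDATES, key=score)
    return best, [CANDIDATES[best](v) for v in values]


if __name__ == "__main__":
    # Made-up, right-skewed loan amounts.
    amounts = [1000, 1200, 1500, 2000, 2500, 3000, 15000]
    name, _ = pick_transformation(amounts)
    print(name)
```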

Create an ordinal grouping variable

https://pfztpq.bay.livefilestore.com/y1mqBx1mLu64vtNUPBGPamPojZhP4tG7O2PhdVyU1FeV14SGNiRAwW-aBTPHWfa9lyAgW626bjDV1hgg9yPO6poalT-qOQTcizyNGq4A--ZzeGAJGRuHWfyf07wXRstDy5pKf9jgJ-BFfN_TjZeppCtBg/clip_image048_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mo_Sj0FcjwpPsK_Dnia3RGacywkF5hmVvwgIfBGVg8yBSvQIhDFPiWFbzCDvQg6igXIn7HpVxF2uahxszexDnohWEkOiXDCSVcfl5r4rMAKgoGzBi3dh5Kpx1IgA_2YBdP8R_9IuM_z3Ty_L-iSCXxA/clip_image050_thumb.jpg
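An ordinal grouping variable replaces a numeric value with the rank of the group that the value falls into. The sketch assumes quantile (equal-count) grouping into a hypothetical number of groups. The grouping actually used in this project is shown only in the screenshots.

```python
def quantile_groups(values, n_groups=4):
    """Assign each value an ordinal group 1..n_groups by rank (quantile binning).

    Ties at a group boundary may be split across groups; this is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        # Rank 0..n-1 mapped onto n_groups equal-count groups.
        groups[i] = rank * n_groups // len(values) + 1
    return groups


if __name__ == "__main__":
    # Hypothetical ages.
    ages = [19, 23, 27, 31, 36, 42, 51, 67]
    print(quantile_groups(ages, 4))
```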

9 Add variable selection

https://pfztpq.bay.livefilestore.com/y1m2r0noBnaQHtEBnTZHlliEH6E7oSr0PLCxcfSP7olPGSHu6X4bZFCUtln5yUoan4aV4PuGMZt1CU8HWUoTEvUfPTuWgszu58PycZiDt3rm2AFLNj5bk2C0azVkJE5XZwD-kR_RcVFexLMEN4cd7iDaQ/clip_image052_thumb.jpg

Select the Target Associations tab and set the variable selection method to Chi-square.

https://pfztpq.bay.livefilestore.com/y1mNKwqcDH_wF8cPwLnsV1cWeLF75yGrYABEvpsp9ZYzQ86e0mDcnoOkt3_MJmHWB9xYQmrfzXDOrTCM_nkhAkxYpExVwwTClekTJz6wpyT8JHc0Zp58KwLdyyAeHn_4lfG431XUUPkPI9XahxtVwlSew/clip_image054_thumb.jpg

Run the node and view the results.

Of the 21 variables, 9 are rejected, including the grouping variable AGE_GA2Y. Change its role back to input.
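The chi-square criterion tests whether an input and GOOD_BAD are independent. The sketch below computes only the basic Pearson chi-square test for one categorical input and compares the result with the 5% critical value. Enterprise Miner's chi-square selection does more than this: interval inputs, for instance, have to be binned before a contingency table can be formed. Treat the sketch as an illustration of the statistic. The function names and data are hypothetical.

```python
from collections import Counter

# Upper 5% critical values of the chi-square distribution for df = 1..5.
CRITICAL_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(xs)
    cells = Counter(zip(xs, ys))  # observed counts per (x, y) cell
    row_tot = Counter(xs)
    col_tot = Counter(ys)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df


def is_associated(xs, ys):
    """True if the input is associated with the target at the 5% level."""
    stat, df = chi_square(xs, ys)
    if df not in CRITICAL_5PCT:
        raise ValueError(f"no critical value stored for df={df}")
    return stat > CRITICAL_5PCT[df]


if __name__ == "__main__":
    # Hypothetical two-level input against GOOD_BAD.
    x = ["A"] * 50 + ["B"] * 50
    y = ["GOOD"] * 40 + ["BAD"] * 10 + ["GOOD"] * 20 + ["BAD"] * 30
    print(chi_square(x, y), is_associated(x, y))
```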

10 Create a stepwise logistic regression model

https://pfztpq.bay.livefilestore.com/y1mlFD7kYJEkPfIFe5-9JmTla4wsEVq9bPy4bpOZ0CFrLvfKMIiWvDSaT6fIBKbsUmcYo0KjzOTJSh2WyZkt7PV-KLGOfySgLH7gAibzjPdX4mwOlczCrAFoflY3u5IVM7kqXAn5LU3wRkOWnGcXNC-Bg/clip_image056_thumb.jpg

11 Create a decision tree model

https://pfztpq.bay.livefilestore.com/y1m7OQVyEDzkpq6eIQafIhUECfnG9-mLELWjc0NH0mCnzazolNXrv6qRrAgyupM7WX8BTca55E4afRWlcuuKi4QLpBNNmSrpa12toTUtC5SYF1yNP7WYajZ9PFL7KcK54JObUT-y4zrUDXqj9NlhghmwA/clip_image058_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mFU3iG_A1YqkDYqF-PFckfdx47U9Fz1XwI3YexlFBUrRQCrtVVV20Hg5X7vqIJInCixKRCRJZJRrenT9IaFxFQ8W47APxMHxUI6exzHv3qww27SuyfvXu_lyQZSWO9VEWvopnxixh_jx4E25D-XJcxg/clip_image060_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1m6xhIBIL4gfXE1PR9Bq56awltdXJi9_mEwOiUoCwNlMSnD_n8kqrzZA4AbMLCgYGenNVk9WjrP9c-weDhdJqn8mpLEioRuoOiU4qeD88GM71FvIkPcFdWpav0oDD9BOMfw2UIMrNe9Hmrc2S7pYHPCA/clip_image062_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mwzVMJVNHHqEpSFDSl3qbzBQiKEPgjpBoP5a87p9BTZEuIAwe0eDrBSSP6QXZ6eQNsN27gztz1Q4j4P2UDbdvXH_XhJyL7dJA9vAvu9tzd0Vtabqg54a9ERBJnlic3YKFyH38XT9dSFqvoppcfLTC3Q/clip_image064_thumb.jpg
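A decision tree compares candidate splits using a worth measure. For a binary target, one common measure is the logworth, -log10 of the chi-square p-value. Whether this tree actually uses it depends on the node settings shown in the screenshots, so treat the sketch as an assumption. A two-way split gives a test with one degree of freedom, so the p-value is erfc(sqrt(χ²/2)). The function name and branch counts are hypothetical.

```python
import math
import sys


def split_logworth(left, right):
    """Logworth (-log10 p) of the chi-square test for a binary split.

    left, right -- (n_good, n_bad) counts in the two branches.
    """
    n = sum(left) + sum(right)
    col = [left[0] + right[0], left[1] + right[1]]
    stat = 0.0
    for row in (left, right):
        for j in range(2):
            expected = sum(row) * col[j] / n
            if expected > 0:  # skip cells of an empty row or column
                stat += (row[j] - expected) ** 2 / expected
    # Chi-square with 1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    p = max(math.erfc(math.sqrt(stat / 2)), sys.float_info.min)
    return -math.log10(p)


if __name__ == "__main__":
    # Hypothetical candidate splits: (good, bad) counts in each branch.
    candidates = {"split_1": ((40, 10), (20, 30)),
                  "split_2": ((32, 18), (28, 22))}
    best = max(candidates, key=lambda k: split_logworth(*candidates[k]))
    print(best)
```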

12 Assess the models

https://pfztpq.bay.livefilestore.com/y1moN6tRRUUoO7tRD-MOpk102LhuewmN7P5iiDNXgoIYWFhccdsSVukGSlbbhs0znE87KEiyIOhy33o2JEcmrJaCruySUn8y5UjjvSJf5ByWhp4H1I2B5rRG4Djyeaz67kp9rWphJe_7fXn7huzzWKdxg/clip_image066_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mvOzyUESbG1UI27MqqCkejcVtBBI7y3NorksNE1QuW-woon6h7hVLo1YLNagUk2LbLvwJrbVIhBzGBRy84MZEYh-N_XmggQRoWRuDrdnKkVA6MgrptfepR4zNvt5ys-BE-H3RnWsPBVj4__f3vMH7bA/clip_image068_thumb.jpg

Run the Assessment node and view the model comparison. From the Tools menu, select Lift Chart.

https://pfztpq.bay.livefilestore.com/y1mqngDEFNjAICsuYUmIIatD8TGcAPWoFQMldr1g9gO7s3SIbxUl7sQI-Ddpo5mW65VWGWcZBS4ZnK97FFsoaKofMGWac75HfIo_PisiiVjaPoF_rFnwqVj4KNxr-olfbSxKFKfn1k6BJTVsiHNjUnxog/clip_image070_thumb.jpg

To view the loss chart, select the Loss radio button.

https://pfztpq.bay.livefilestore.com/y1mnPMhi1vwmfe6OCqFUXYEI35F0PlrZZYUj7x0CmlszZwXQPku3IzQPb4CGxtpF_9am-eoLaF878hBpZgHNiGlU9CkKypSXdVQAwqzsUXusDvzABJ5CNxpd5KWS63QtiXFO0Asl6Ya942XA0YaH0896A/clip_image072_thumb.jpg

13 Define the score data set

https://pfztpq.bay.livefilestore.com/y1mal7xl28BApb_s9JtmEl5d7NcIaM-xI0KC7hA4pBvO92ROjOn0rglQO5NvhwYUtOpzRQ3W-piz2p7qrb-p0mWZuv688Y_RIWiAxcsl0oDboQ9VxU6_kCG9Uu6tZ55h4NydmDYINg00gwDBNUC3EF86w/clip_image074_thumb.jpg

14 Score the data set

https://pfztpq.bay.livefilestore.com/y1mbsfNxn6Z2VCf2-8yoittFiF_q1y575kVMOUaDycW2-g3alGcJ4CXu-p3M4P27Yi__WVtmG5B0t8zE0VAqWpqmr5hOlH14RkC1NrPHDBRew-lNTHClbUre7xzI0NuI4f5wjie7x8OlHJv6VjXFW2hKA/clip_image076_thumb.jpg

Open the Score node and make the following selections:

https://pfztpq.bay.livefilestore.com/y1m7g_13-uBTHWjhIW4xRvwVYtZFvktqHeEsflYHv0Nq2Dx03AfuSKxdZzGXAW1egADKucMWtNj7iwN4p1QTo8oJMFi4GOj4DGIriWvrvBUcmEej6offSuaWVArA2wMkqlcabr9JzyN9WLm43Fkd4ouvw/clip_image078_thumb.jpg

15 View the expected losses

https://pfztpq.bay.livefilestore.com/y1m0buZkUh2lZLISMvVQ5ifRrRh4k80BDklX5tYrAvvRXNnj0MstrM-veaysyXeZKNfFoLeDSxQ-qZKLFKq-GLlJNtHJsHMRW2w4LcDfGD-JwjZQQJJna1lRX3mEmFErZ58B81G-OdwoXVy-mwhlLo-ZA/clip_image080_thumb.jpg

Open the Distribution Explorer node:

https://pfztpq.bay.livefilestore.com/y1mybzh6gcJ8GZuNa-Vx0NULgZCVnnUfK33S9ayLt3uMqviDHQ4ay2RnBcBg36vLTlaXzauODbuO96mNK4pJgylCsZVgMc_WgpNGr0nzT5xFs_cBEsAg7P1N8jF0RwbBe8Kq4Hb9cMd0vN7wbZElaOVug/clip_image082_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mzSxBLUPZvweBX5945NdPkrHU_YBwHMlkbCi62Bwm9xHq149mUC0-ErMCmY4D5SBza12g2OoKwu-t_Xkz1fbfzl8D8vtE39tLzFwLVELckV61cEKjv5SClWgQjWZpiw4rDCr_AKaIvWVlQK753KhO4w/clip_image084_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mrP-bi79pYYjHOVJTm6uIH_q3WNcslD4AEevzo0mEuhCffIcGFZgcGVO_fdnk_yFcuWFYanyrCrJFuMMWxk85R7tYpRYHvzYSDflNd-kJ69QRqFevcvTYgVRVf6uE2HDu2-fTJKRyUVRl1ugLoe_osA/clip_image086_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mgtb-jLhz7wHsof8_-ZmvnEOGPrbFLcNWUM4Jl2H6_JLvHopKeNVK6BWUNDOpHiExZ6o6OhwdFT4xdJ4IqEWb5AgAUgUrKtKugmL_O-hZu14Ql5dQWbTJthB8udQ0SgQcI8vo4zvwTU2Y5gU6UbGVug/clip_image088_thumb.jpg
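The program in step 16 uses a decision column (D_GOOD_BAD_) and an expected-loss column (EL_GOOD_BAD_). In essence, both come from applying the loss matrix to each applicant's posterior probabilities and picking the decision with the lowest expected loss. Below is a minimal sketch of that calculation. The loss matrix is assumed, the applicants and posteriors are made up, and `score_applicants` is a hypothetical name, not an Enterprise Miner function. A negative expected loss means an expected profit.

```python
# Assumed loss matrix (illustrative values): outer key = actual class, inner key = decision.
LOSS = {"GOOD": {"accept": -1.0, "reject": 0.0},
        "BAD": {"accept": 5.0, "reject": 0.0}}


def score_applicants(applicants):
    """Attach the minimum-expected-loss decision and its expected loss.

    applicants -- list of (custid, p_good) pairs, p_good being P(GOOD).
    """
    scored = []
    for custid, p_good in applicants:
        post = {"GOOD": p_good, "BAD": 1.0 - p_good}
        el = {d: sum(post[c] * LOSS[c][d] for c in post)
              for d in ("accept", "reject")}
        decision = min(el, key=el.get)
        scored.append({"custid": custid, "decision": decision,
                       "expected_loss": el[decision]})
    return scored


if __name__ == "__main__":
    # Made-up applicants and posterior probabilities.
    results = score_applicants([(1, 0.95), (2, 0.60), (3, 0.88)])
    accepted = [r for r in results if r["decision"] == "accept"]
    for r in accepted:
        print(r["custid"], round(r["expected_loss"], 3))
```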

16 Create a score card of good-credit loan applicants

https://pfztpq.bay.livefilestore.com/y1mincQmbCGwRW96oE8qn8gIkggpOK69aKknbcJ6apHcIq_XgL8-tlLEC59Eweba5jx1CFzNpl7kVET5U9zu_EV7UWdUrVcw-YartouZoBoZWnliNmRbRp8wP4F7oP86GG-IKagBcTPXqPK14XKTfe2lA/clip_image090_thumb.jpg

https://pfztpq.bay.livefilestore.com/y1mMGtibvNqmISruUIGQw3k6n_cypIiWaNWH5nLgnvEAM8Gbpqv3UNg2BkKL86sf9U6BJTAfpGIaAi2F0BdAVY9YKSwCCCBEf4WAdP4XN6eIUKdNOicjCINOcoSfWhNP4UNjf2RME_iepta9iw6cVeeNQ/clip_image092_thumb.jpg

On the Program tab, enter the following code to list the loan applicants with good credit:

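options nocenter nodate;

/* Keep only the applicants whose decision is 'accept' */
data goodapps;
  set &_SCORE;                 /* macro variable naming the scored data set */
  if D_GOOD_BAD_ = 'accept';   /* subsetting IF */
run;

/* Print customer ID, decision and expected loss, using variable labels */
proc print data=goodapps label;
  var custid D_GOOD_BAD_ EL_GOOD_BAD_;
  title "Good Credit Risk Applicants";
run;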

https://pfztpq.bay.livefilestore.com/y1mXo3TmDNGE0rNxW1VJWDQvsrY4Hg-PNiywYSuTR7r1c4xg_OVUAa9hZhiL_ZJgl30CoLmFVHQ0NiR9b67fDjUjVe3wUkr_5HGdLne11IbJgl_T013z_j5H7nOVsgigJYr7Xvm9BXAscobiXXg0pOqLQ/clip_image094_thumb.jpg

17 Create a report

https://pfztpq.bay.livefilestore.com/y1mhBh5Lk99dSd6NSUQtWyJ1qWda4hyiyjWPRwkeEu7PoPSLKqNRJQnCyjTcIXjquDun_FzEoiywpqWyKYIqYdK9WdQtSjBtZW7dHLJeb_U9yI8xKjT27-bqTT81goelDjeLG47CMeKgP_kK9Zde4QCrg/clip_image096_thumb.jpg

Open the generated report and view it in a browser.

18 Close the project. The project is complete.
