Tips 50: Filtering strings with regexm (Stata)

http://blog.sina.com.cn/u/1654372184


(2012-02-09 18:57:04)

Tags: filtering strings, stata, miscellaneous

Category: Data management

【Problem】

I just saw this question in a QQ group:

How can all the various "罚款" (fine/penalty) entries in the data below be picked out?

[Image: the data in question]

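【Method】

1. You could track down every value of the variable that refers to a fine and write the conditions out one by one:

keep if var=="罚款滞纳金支出" | var=="罚没支出"   // | ...and so on for every other item

But that is far too tedious. More to the point, with so many accounting line items, how could you possibly find them all?

2. Use the small function regexm. Its help entry reads: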

    regexm(s,re)
       Domain s:     strings
       Domain re:    regular expression
       Range:        strings
       Description: performs a match of a regular expression and evaluates to 1 if regular expression re is satisfied by the string s, otherwise returns 0. Regular expression syntax is based on Henry Spencer's NFA algorithm and this is nearly identical to the POSIX.2 standard.
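To see the 1/0 result for yourself, here is a minimal sketch on made-up data. The two "罚" item names are the ones quoted in method 1; the other two rows (办公费, 差旅费) and the indicator name has_fine are my own assumptions, added only for illustration.

```stata
* Toy data: the first two names come from the post, the last two are hypothetical
clear
input str30 var1
"罚款滞纳金支出"
"罚没支出"
"办公费"
"差旅费"
end

* regexm() evaluates to 1 when the pattern is found in var1, otherwise 0
generate byte has_fine = regexm(var1, "罚")
list var1 has_fine

* Number of observations that contain the character
count if has_fine == 1
```

Storing the result in an indicator first lets you check which rows match before dropping anything.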

【Example】

* Keep an observation as long as it contains the character "罚" (fine/penalty).

keep if regexm(var1, "罚") == 1

* Actually, this is easy to filter in Excel as well, but it is not as convenient as this small Stata function.
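The same filter can be tried end to end on a few rows. This is only a sketch: the row 办公费 is a hypothetical non-matching item, and the commented-out strpos() line is an alternative I am suggesting, not something from the original question.

```stata
* Toy data: two items containing "罚" plus one hypothetical item without it
clear
input str30 var1
"罚款滞纳金支出"
"罚没支出"
"办公费"
end

* Same filter as above; "== 1" can be omitted because regexm() is already 0/1
keep if regexm(var1, "罚")
list

* Plain substring alternative, no regular expression involved:
*   keep if strpos(var1, "罚") > 0
```

Since the pattern here is just a literal character, strpos() would do the same job; regexm() pays off once the pattern needs alternatives or anchors.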
