[Repost] Censored data (删失数据) _ 燕大雪雁

http://blog.sina.com.cn/u/1568232671

(2013-11-30 09:43:53)

Original post: Censored data (删失数据). Author: 微笑着的小猪

Censored data (删失数据)

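Over the past couple of days I have been reading the Mplus statistical manual and kept running into the term "censored data". The word "censored" literally means "reviewed by a censor"; in internet slang it means "blocked", roughly in the sense of a filtered sensitive word. But what would "censored data" be? I looked the term up. The Chinese statistical term is 删失数据, and the explanation given is that, during data collection, some values falling below a detection limit were replaced by the detection limit itself or by 0.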

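That explanation conveys the gist, but it still left me without a concrete picture. Later I found the passage below, which says that Amos 7.0 can handle censored data and gives a specific example, and I finally understood what censored data is. The passage reads: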

Censored data occurs when you know that a measurement exceeds some threshold, but you don’t know by how much. (There is a less common kind of censored data where you know that a measurement falls below some threshold, but do not know by how much.) As an example of censored data, suppose you watch people as they try to solve a problem and record how long each person takes to solve it. Suppose that you don’t want to spend more than 10 minutes waiting for a person to reach a solution, so that if a person has not solved the problem in 10 minutes, you call a halt and record the fact that “time to solve” was greater than 10 minutes. If five people solve the problem and two don’t, the data from seven people might look like this:

Case	Time to solve (minutes)
1	6
2	2
3	9
4	>10
5	4
6	9
7	>10
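
One way to hold a table like this in code is to store each entry as a value plus a flag saying whether the value is exact or only a lower bound. The sketch below is just an illustration of that idea; the names `parse` and `observations` are my own and do not come from Amos or Mplus.

```python
# Entries from the table above, in minutes; ">10" means "longer than 10".
raw = ["6", "2", "9", ">10", "4", "9", ">10"]

def parse(entry):
    """Return (value, censored). For ">10" the value is the threshold 10."""
    if entry.startswith(">"):
        return float(entry[1:]), True
    return float(entry), False

observations = [parse(e) for e in raw]
n_censored = sum(1 for _, censored in observations if censored)

for case, (value, censored) in enumerate(observations, start=1):
    label = "at least" if censored else "exactly"
    print(f"case {case}: {label} {value} minutes")
```

Keeping the flag next to the value means the threshold is never mistaken for a real measurement later in the analysis.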

In Amos 6.0, you could either treat the observation for cases 4 and 7 as missing, or substitute an arbitrary number like 10 or 11 or 12 for cases 4 and 7. Treating cases 4 and 7 as missing has the effect of biasing the sample by excluding poor problem solvers. Substituting an arbitrary number for a censored value is also undesirable, although the exact effect of substituting an arbitrary number is impossible to know.
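
To make the two workarounds concrete, here is a small sketch that computes the sample mean under each of them, using the seven cases from the table. The helper name `mean_with_substitute` is hypothetical and only serves the illustration.

```python
from statistics import mean

observed = [6.0, 2.0, 9.0, 4.0, 9.0]  # cases 1, 2, 3, 5, 6 (minutes)
n_censored = 2                        # cases 4 and 7: only "> 10" is known

def mean_with_substitute(value):
    """Sample mean when every censored case is replaced by `value`."""
    return mean(observed + [value] * n_censored)

# Workaround 1: treat cases 4 and 7 as missing.
mean_dropped = mean(observed)

# Workaround 2: substitute an arbitrary number for cases 4 and 7.
means = {v: mean_with_substitute(v) for v in (10.0, 11.0, 12.0)}

print(f"dropped: {mean_dropped:.2f}")
for v, m in means.items():
    print(f"substitute {v:.0f}: {m:.2f}")
```

Dropping the two slow solvers gives a mean of 6 minutes, while substituting 10, 11 or 12 gives about 7.14, 7.43 and 7.71. Since both censored times are greater than 10, the figure obtained with 10 is only a lower bound on the true sample mean, and the answer keeps moving with whichever number is picked.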

In Amos 7.0 you can take advantage of all the information you have about cases 4 and 7 without making assumptions other than the assumption of normality.
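
The passage does not say how Amos does this internally, so the following is not Amos's algorithm. It is only a minimal sketch of one standard way to use "time > 10" as information under a normality assumption: exact times contribute the normal density to the likelihood, censored cases contribute the probability of exceeding 10, and an EM iteration finds the maximum likelihood estimates of the mean and standard deviation. All function names here are my own.

```python
import math

observed = [6.0, 2.0, 9.0, 4.0, 9.0]  # exactly measured times (minutes)
censored_at = [10.0, 10.0]            # cases 4 and 7: only "time > 10" is known

def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def survival(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def log_likelihood(mu, sigma):
    # Exact observations: log density. Censored ones: log P(X > threshold).
    ll = sum(math.log(pdf((x - mu) / sigma) / sigma) for x in observed)
    ll += sum(math.log(survival((c - mu) / sigma)) for c in censored_at)
    return ll

def fit_censored_normal(iterations=500):
    n = len(observed) + len(censored_at)
    # Start from the estimates that ignore the censored cases.
    mu = sum(observed) / len(observed)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in observed) / len(observed))
    for _ in range(iterations):
        sum_x = sum(observed)
        sum_x2 = sum(x * x for x in observed)
        # E-step: expected X and X^2 for each censored case, given X > c.
        for c in censored_at:
            z = (c - mu) / sigma
            hazard = pdf(z) / survival(z)
            sum_x += mu + sigma * hazard
            sum_x2 += mu * mu + sigma * sigma + sigma * (c + mu) * hazard
        # M-step: update the mean and standard deviation.
        mu = sum_x / n
        sigma = math.sqrt(sum_x2 / n - mu * mu)
    return mu, sigma

mu_hat, sigma_hat = fit_censored_normal()
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

Because each censored case is replaced by its expected value given that it exceeds 10, the estimated mean necessarily lies above 50/7, the mean obtained by substituting 10, without anyone having to pick an arbitrary replacement number.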

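To sum up, censored data are data whose top end has been cut off in a study (sometimes it is the bottom end instead): you know that a value was cut off, but not by how much. In the past, many studies simply treated such values as missing, but doing so easily biases the sample (the poor performers are all dropped, so the data no longer reflect the full picture). This shows how much updates and upgrades to statistical software matter.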
