[Repost] Censored data (删失数据)
(2013-11-30 09:43:53)
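These past few days I have been reading the Mplus statistical manual and keep running into the term "censored data". The literal meaning of "censored" is "examined by a censor"; in internet slang it means "blocked", as with banned keywords. But what would "examined data" be? I looked up "censored data" again: the Chinese statistical term is 删失数据, and the explanation given was that, during data collection, some values that fell below the detection limit were replaced by the detection limit itself or by 0.
That explanation gave me a rough idea, but the concept still did not click. Later I found the passage below, which says that Amos 7.0 can handle censored data and gives a concrete example, and only then did I finally understand what censored data is. The English original and my translation follow: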
Censored data occurs when you know that a measurement exceeds some threshold, but you don’t know by how much. (There is a less common kind of censored data where you know that a measurement falls below some threshold, but do not know by how much.) As an example of censored data, suppose you watch people as they try to solve a problem and record how long each person takes to solve it. Suppose that you don’t want to spend more than 10 minutes waiting for a person to reach a solution, so that if a person has not solved the problem in 10 minutes, you call a halt and record the fact that “time to solve” was greater than 10 minutes.
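When you know that a measurement exceeds a threshold but not by how much, that is censored data (the less common kind is a measurement that falls below a threshold by an unknown amount). For example, suppose you run a problem-solving study and record how long each participant takes. If a participant has not finished after 10 minutes and you do not want to keep waiting, you stop them at the 10-minute mark and record their time as "greater than 10 minutes". Suppose 2 of 7 participants fail to finish; the data are then recorded as in the following table: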
| Case | Time to solve (minutes) |
|------|-------------------------|
| 1    | 6                       |
| 2    | 2                       |
| 3    | 9                       |
| 4    | >10                     |
| 5    | 4                       |
| 6    | 9                       |
| 7    | >10                     |
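To make the table concrete, here is a minimal Python sketch of one way to store such data: each observation carries a recorded time plus a flag saying whether the true time is only known to exceed it. The `Observation` class and `parse_time` helper are my own hypothetical names for illustration; they do not come from Amos or Mplus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical container for one row of the table (not an Amos/Mplus type)."""
    case: int
    time: float      # observed time in minutes, or the cutoff if censored
    censored: bool   # True means the true time is greater than `time`


def parse_time(case: int, raw: str) -> Observation:
    """Turn a table entry such as '6' or '>10' into an Observation."""
    raw = raw.strip()
    if raw.startswith(">"):
        # Right-censored: we only know the time exceeds the number after '>'.
        return Observation(case, float(raw[1:]), True)
    return Observation(case, float(raw), False)


# The seven entries of the "Time to solve" column, in case order.
RAW_TIMES = ["6", "2", "9", ">10", "4", "9", ">10"]

observations = [parse_time(i, raw) for i, raw in enumerate(RAW_TIMES, start=1)]
censored_cases = [o.case for o in observations if o.censored]
```

Keeping the flag separate from the number is the point: the value 10 for cases 4 and 7 is a lower bound, not a measurement.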
In Amos 6.0, you could either treat the observation for cases 4 and 7 as missing, or substitute an arbitrary number like 10 or 11 or 12 for cases 4 and 7. Treating cases 4 and 7 as missing has the effect of biasing the sample by excluding poor problem solvers. Substituting an arbitrary number for a censored value is also undesirable, although the exact effect of substituting an arbitrary number is impossible to know.
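In other words, Amos 6.0 left two options for cases 4 and 7: treat them as missing, or assign them an arbitrary value such as 10, 11, or 12. Treating them as missing can bias the sample, because the poorer problem solvers are dropped. The exact effect of assigning some other value cannot be known, but that is not a good approach either.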
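A quick Python sketch of the two workarounds, using the seven cases from the table, shows how much the sample mean depends on the choice. The function names are hypothetical, and `None` is simply my own way of marking the two censored cases.

```python
from statistics import mean

# Times in minutes for cases 1..7; None marks the censored cases 4 and 7.
times = [6, 2, 9, None, 4, 9, None]


def mean_dropping_censored(values):
    """Workaround 1: treat censored cases as missing and drop them."""
    return mean(v for v in values if v is not None)


def mean_with_substitute(values, substitute):
    """Workaround 2: replace every censored case with an arbitrary number."""
    return mean(substitute if v is None else v for v in values)


dropped = mean_dropping_censored(times)
substituted = {s: mean_with_substitute(times, s) for s in (10, 11, 12)}

print("censored cases dropped:", dropped)
for s, m in substituted.items():
    print(f"censored cases set to {s}: {m:.2f}")
```

Dropping the two slow solvers gives a mean of 6 minutes, while substituting 10, 11, or 12 gives 50/7, 52/7, or 54/7 minutes. None of these is "the" answer, since the result simply follows whichever number was plugged in.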
In Amos 7.0 you can take advantage of all the information you have about cases 4 and 7 without making assumptions other than the assumption of normality.
In Amos 7.0, by contrast, the researcher can use the information carried by cases 4 and 7 without making any assumption beyond normality.
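To illustrate what "using all the information under a normality assumption" can mean, here is a rough Python sketch. It is not Amos's algorithm, which the passage does not describe. It is a generic censored-normal likelihood: fully observed cases contribute the normal density, and each censored case contributes the probability of exceeding the 10-minute cutoff. The crude grid search, the grid ranges, and all function names are my own assumptions.

```python
from math import erfc, log, pi, sqrt

observed = [6.0, 2.0, 9.0, 4.0, 9.0]  # cases 1, 2, 3, 5, 6 (minutes)
n_censored = 2                        # cases 4 and 7
cutoff = 10.0                         # minutes


def log_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - log(sigma) - 0.5 * log(2.0 * pi)


def log_survival(c, mu, sigma):
    """log P(X > c) for X ~ Normal(mu, sigma), computed via erfc."""
    z = (c - mu) / sigma
    return log(0.5 * erfc(z / sqrt(2.0)))


def log_likelihood(mu, sigma):
    """Exact cases use the density; censored cases use the tail probability."""
    exact = sum(log_pdf(x, mu, sigma) for x in observed)
    tail = n_censored * log_survival(cutoff, mu, sigma)
    return exact + tail


def grid_search(step=0.1):
    """Crude maximizer: try mu in [0, 20] and sigma in [1, 15] (assumed ranges)."""
    best = None
    for i in range(0, 201):
        mu = i * step
        for j in range(10, 151):
            sigma = j * step
            ll = log_likelihood(mu, sigma)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best


best_ll, mu_hat, sigma_hat = grid_search()
print(f"mu_hat={mu_hat:.1f}, sigma_hat={sigma_hat:.1f}, log-lik={best_ll:.2f}")
```

Under these assumptions the estimated mean comes out above what substituting 10 would give, which makes sense: the likelihood knows that cases 4 and 7 lie somewhere beyond 10 minutes rather than exactly at it. A real analysis would use a proper optimizer or a statistics package instead of a grid.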
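To sum up, censored data are observations that had their top cut off during the study (or, in some cases, their bottom): you know a value was cut off, but not by how much. Many earlier studies simply treated such values as missing, but that tends to bias the sample (the poor performers are all dropped, so the data no longer reflect the full picture). It goes to show how much updates and upgrades to statistical software matter.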
