
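Repost: Test theory, discrimination, difficulty, option analysis

(2018-11-11 10:11:36)
Tags: test theory, discrimination, difficulty, option analysis

Category: Statistics basics

Original article: http://www.specialconnections.ku.edu/?q=assessment/quality_test_construction/teacher_tools/item_analysis

From the University of Kansas.

Annotations by 大猫咪. The terms used in the annotations may not be the standard technical ones, but they are meant to aid understanding.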


*****************************

Item Analysis (in this article, an "item" is one test question: one question stem with four answer options)

What is item analysis? 

Item analysis is a process of examining class-wide performance on individual test items. There are three common types of item analysis which provide teachers with three different types of information:

  • Difficulty Index (difficulty: number of students who answered the item correctly / number of students who took the test) - Teachers produce a difficulty index for a test item by calculating the proportion of students in class who got an item correct. (The name of this index is counter-intuitive, as one actually gets a measure of how easy the item is, not the difficulty of the item.) The larger the proportion, the more students have learned the content measured by the item.
  • Discrimination Index (discrimination: the detailed formula is given below) - The discrimination index is a basic measure of the validity of an item. It is a measure of an item's ability to discriminate between those who scored high on the total test and those who scored low. Though there are several steps in its calculation, once computed, this index can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an item. Perhaps the most crucial validity standard for a test item is that whether a student got an item correct or not is due to their level of knowledge or ability and not due to something else such as chance or test bias.
  • Analysis of Response Options (option analysis: the proportion of students who chose each of the four options A, B, C, and D) - In addition to examining the performance of an entire test item, teachers are often interested in examining the performance of individual distractors (incorrect answer options) on multiple-choice items. By calculating the proportion of students who chose each answer option, teachers can identify which distractors are "working" and appear attractive to students who do not know the correct answer, and which distractors are simply taking up space and not being chosen by many students. To eliminate blind guessing which results in a correct answer purely by chance (which hurts the validity of a test item), teachers want as many plausible distractors as is feasible. Analyses of response options allow teachers to fine tune and improve items they may wish to use again with future classes.

 

Performing item analysis 

Here are the procedures for the calculations involved in item analysis with data for an example item. For our example, imagine a classroom of 25 students who took a test which included the item below. The asterisk indicates that B is the correct answer. (In this example, 25 students took the test, and one of the questions was: who wrote The Great Gatsby?)

Who wrote The Great Gatsby?

Answer Option      Number of Students Choosing Each Answer Option
A. Faulkner        4
*B. Fitzgerald     16
C. Hemingway       5
D. Steinbeck       0

Total Number of Students: 25

(The counts above are the numbers of students who chose A, B, C, and D, respectively.)

Item Analysis Methods: Procedures and Examples

*********************

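Difficulty Index - Proportion of students who got an item correct

Procedures:
  • Count the number of students who got the correct answer.
  • Divide by the total number of students who took the test.

Difficulty Indices range from .00 to 1.0.

Example:
  • 16 students chose the correct answer.
  • 16/25 = .64

(Difficulty: 16 of the 25 students chose the correct answer, so the difficulty index is 0.64. Clearly, the difficulty index can range from 0% to 100%.)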
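The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `difficulty_index` and the idea of storing one answer letter per student are assumptions made here, not part of the original article.

```python
def difficulty_index(responses, correct_option):
    """Proportion of students whose response matches the keyed (correct) answer."""
    if not responses:
        raise ValueError("at least one response is required")
    n_correct = sum(1 for r in responses if r == correct_option)
    return n_correct / len(responses)


# Example item from the text: 4 students chose A, 16 chose B (the key),
# 5 chose C, and nobody chose D.
responses = ["A"] * 4 + ["B"] * 16 + ["C"] * 5
p = difficulty_index(responses, "B")
print(f"{p:.2f}")  # prints 0.64
```

A higher value means an easier item, which is why the article calls the name counter-intuitive.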

*********************

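Discrimination Index - A comparison of how overall high scorers on the whole test did on one particular item compared to overall low scorers.

First, split the class into two groups by total score. The test contains more than this one question, so rank the whole class from highest to lowest by total test score and divide it in half: a higher-scoring group and a lower-scoring group.

In this example, the 13 students in the top half by total score form one group and the 12 in the bottom half form the other. Call them the "high group" and the "low group".

In the high group, 10 students answered this item correctly: 10/13 = 0.77.

In the low group, 6 students answered this item correctly: 6/12 = 0.50.

The discrimination index is therefore 0.77 - 0.50 = 0.27.

The discrimination index ranges from -1.0 to +1.0. A value of +1.0 is the ideal case: every student in the top half answered the item correctly and every student in the bottom half answered it incorrectly, so the item perfectly "discriminates" between the two groups. A value of -1.0 is the strangest case: every student in the top half got it wrong and every student in the bottom half got it right. Such an item separates the two groups in exactly the wrong direction, which suggests that the item itself is flawed.

In short, the discrimination index looks at one item in the context of the whole test for the whole class and asks whether the item separates students with high total scores from students with low total scores.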

Procedures:
  • Sort your tests by total score and create two groupings of tests: the high scores, made up of the top half of tests, and the low scores, made up of the bottom half of tests.
  • For each group, calculate a difficulty index for the item.
  • Subtract the difficulty index for the low scores group from the difficulty index for the high scores group.

Discrimination Indices range from -1.0 to 1.0.

Example:
Imagine this information for our example: 10 out of 13 students (or tests) in the high group and 6 out of 12 students in the low group got the item correct.

High Group 10/13 = .77
Low Group 6/12 = .50

.77 - .50 = .27
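The same steps (sort by total score, split into halves, subtract the two difficulty indices) could be sketched in Python as follows. The function name, the data layout, and the total scores are made up for illustration; only the 10-of-13 and 6-of-12 counts come from the example. The sketch also assumes that, with an odd class size, the extra student goes to the high group, as in the 13/12 split above, and that ties at the boundary are broken by input order.

```python
import math


def discrimination_index(total_scores, item_correct):
    """High-group difficulty index minus low-group difficulty index.

    total_scores: each student's total score on the whole test.
    item_correct: parallel list of booleans, True if that student
                  answered this particular item correctly.
    """
    if len(total_scores) != len(item_correct) or len(total_scores) < 2:
        raise ValueError("need two parallel lists with at least two students")
    # Rank students from highest to lowest total score.
    ranked = sorted(zip(total_scores, item_correct),
                    key=lambda pair: pair[0], reverse=True)
    # Assumption: for an odd class size the high group gets the extra student.
    n_high = math.ceil(len(ranked) / 2)
    high = [correct for _, correct in ranked[:n_high]]
    low = [correct for _, correct in ranked[n_high:]]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical total scores 100, 99, ..., 76 for the 25 students.
total_scores = list(range(100, 75, -1))
# High group (top 13): 10 correct, 3 wrong. Low group (bottom 12): 6 correct, 6 wrong.
item_correct = [True] * 10 + [False] * 3 + [True] * 6 + [False] * 6
d = discrimination_index(total_scores, item_correct)
print(f"{d:.2f}")  # prints 0.27
```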

*********************

Analysis of Response Options - A comparison of the proportion of students choosing each response option. (Option analysis: how many students chose A, and how many chose B, C, and D.)

Procedures:
  • For each answer option, divide the number of students who chose that answer option by the number of students taking the test.

Example:
Who wrote The Great Gatsby?
A. Faulkner 4/25 = .16
*B. Fitzgerald 16/25 = .64
C. Hemingway 5/25 = .20
D. Steinbeck 0/25 = .00
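As a rough sketch, the per-option proportions can be tallied with Python's `collections.Counter`. The function name `option_proportions` and the list-of-letters input are assumptions for illustration; the option list is passed in explicitly so that an option nobody chose (D here) still shows up with a proportion of zero.

```python
from collections import Counter


def option_proportions(responses, options):
    """Proportion of students choosing each answer option."""
    if not responses:
        raise ValueError("at least one response is required")
    counts = Counter(responses)  # options nobody chose count as 0
    return {opt: counts[opt] / len(responses) for opt in options}


# Counts from the example item: A=4, B=16 (the key), C=5, D=0.
responses = ["A"] * 4 + ["B"] * 16 + ["C"] * 5
props = option_proportions(responses, options=["A", "B", "C", "D"])
for opt, p in props.items():
    print(f"{opt}: {p:.2f}")
# A: 0.16
# B: 0.64
# C: 0.20
# D: 0.00
```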

 

Interpreting the results of item analysis (how to interpret the difficulty index, the discrimination index, and the option analysis)

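In our example, the item had a difficulty index of .64. This means that sixty-four percent of students answered the item correctly. If a teacher believes that .64 is too low, he or she can change the way they teach to better meet the objective represented by the item. Another interpretation might be that the item was too difficult or confusing or invalid, in which case the teacher can replace or modify the item, perhaps using information from the item's discrimination index or analysis of response options. (A difficulty index of 0.64 means that 64% of the students answered this item correctly. Whether that value is high or low is for the teacher to judge. If it seems too low, for example, the teacher can teach the material again more thoroughly. The teacher may also feel that the 64% figure is not informative by itself and that the item may have been poorly written, in which case the discrimination index and the option analysis can help investigate.)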

The discrimination index for the item was .27. The formula for the discrimination index is such that if more students in the high scoring group chose the correct answer than did students in the low scoring group, the number will be positive. At a minimum, then, one would hope for a positive value, as that would indicate that knowledge resulted in the correct answer. The greater the positive value (the closer it is to 1.0), the stronger the relationship is between overall test performance and performance on that item. If the discrimination index is negative, that means that for some reason students who scored low on the test were more likely to get the answer correct. This is a strange situation which suggests poor validity for an item. (The discrimination index is 0.27, a positive number. The larger the positive value, the better the item discriminates. A negative discrimination index means that students with low total scores were actually more likely to answer the item correctly, which indicates a problem with the item's validity. Validity is another important property of an item, and the discrimination index is one indicator of it. Validity itself would need a separate article to explain.)

The analysis of response options shows that those who missed the item were about equally likely to choose answer A and answer C. No students chose answer D. Answer option D does not act as a distractor. Students are not choosing between four answer options on this item; they are really choosing between only three options, as they are not even considering answer D. This makes guessing correctly more likely, which hurts the validity of an item. (Option analysis: in this example nobody chose D, so D is not an effective distractor. A distractor that nobody chooses lowers the validity of the item.)
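To make the guessing argument concrete, here is a small Python sketch that lists the options still "in play" and compares the chance of a blind guess among four options with the chance among the three that students actually consider. The function name and the `min_count` cutoff (an option counts as functioning if at least one student chose it) are assumptions for illustration; the article itself does not give a numeric threshold.

```python
def functioning_options(option_counts, key, min_count=1):
    """Return the key plus every distractor chosen by at least min_count students."""
    return [opt for opt, n in option_counts.items()
            if opt == key or n >= min_count]


# Counts from the example item; B is the correct answer.
counts = {"A": 4, "B": 16, "C": 5, "D": 0}
live = functioning_options(counts, key="B")
print(live)  # prints ['A', 'B', 'C']

# Chance of a blind guess being right: nominal (4 options) vs effective (3 options).
print(f"{1 / len(counts):.2f} -> {1 / len(live):.2f}")  # prints 0.25 -> 0.33
```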

How can the use of item analysis benefit your students, including those with special needs? 

The fairest tests for all students are tests which are valid and reliable (validity and reliability). To improve the quality of tests, item analysis can identify items which are too difficult (or too easy if a teacher has that concern), are not able to differentiate between those who have learned the content and those who have not, or have distractors which are not plausible. (Items that are too hard or too easy do not help to distinguish whether students have learned the material. Distractors that are so obviously wrong that nobody chooses them also lower discrimination.)

If items are too hard, teachers can adjust the way they teach. Teachers can even decide that the material was not taught and, for the sake of fairness, remove the item from the current test and recompute scores. (If an item is too hard, the teacher can even remove it from the test score and recompute the scores.)
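Recomputing scores after removing an item is simple bookkeeping. The sketch below assumes a hypothetical layout in which each student's points are stored per item; the student names, item ids, and point values are invented purely for illustration.

```python
def rescore_without_item(item_points, dropped_item):
    """Recompute each student's total while ignoring one item.

    item_points maps student -> {item id -> points earned} (hypothetical layout).
    """
    return {
        student: sum(pts for item, pts in per_item.items() if item != dropped_item)
        for student, per_item in item_points.items()
    }


# Made-up results for three students on a three-item quiz (1 point per item).
item_points = {
    "student_1": {"q1": 1, "q2": 0, "q3": 1},
    "student_2": {"q1": 1, "q2": 0, "q3": 0},
    "student_3": {"q1": 0, "q2": 1, "q3": 1},
}
new_totals = rescore_without_item(item_points, dropped_item="q2")
print(new_totals)  # prints {'student_1': 2, 'student_2': 1, 'student_3': 1}
```

Note that the maximum possible score also drops by the removed item's points, so any percentage grades would need the new maximum as the denominator.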

If items have low or negative discrimination values, teachers can remove them from the current test and recompute scores, and remove them from the pool of items for future tests. A teacher can also examine the item, try to identify what was tricky about it, and either change the item or modify instruction to correct a confusing misunderstanding about the content. (If the discrimination index is too low or negative, the item has a problem and should be revised or simply dropped.)

When distractors are identified as being non-functional, teachers may tinker with the item and create a new distractor. One goal for a valid and reliable classroom test is to decrease the chance that random guessing could result in credit for a correct answer. The greater the number of plausible distractors, the more accurate, valid, and reliable the test typically becomes. (If nobody chooses a particular distractor, a new distractor should be written. A test with high validity and reliability reduces the cases in which students earn credit by guessing. The better the distractors, the more valid and reliable the test.)

