ITEM ANALYSIS
1. The Definition of Item Analysis
According to James Dean Brown and Thom Hudson, “Item analysis is the systematic statistical evaluation of the effectiveness of individual test items.”[1]
Wilmar Tinambunan stated that "Reexamining each test item to discover its strength and flaws is known as item analysis."[2]
Meanwhile, Robert Lado stated that "Item analysis is the study of validity, reliability, and difficulty of test items taken individually as if they were separate tests."[3]
Anthony J. Nitko stated in his book “item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupils’ responses to items.”[4]
According to a dictionary of education, "item analysis is an examination of student performance for each item on a test. It consists of reexamination of the responses to items of a test by applying mathematical techniques to assess two characteristics (difficulty and discrimination) of each objective item on the test."[5]
2. Kinds of Item Analysis
Item analysis usually concentrates on three vital features: the level of difficulty, the discriminating power, and the effectiveness of each alternative (the effectiveness of the distracters).[6]
a. Difficulty Level
The level of difficulty is the percentage of students who give the right answer.[7] James Dean Brown stated that it is a statistical index used to examine the percentage of students who correctly answer a given item.[8]
The difficulty of a test item is measured by comparing the number of students in the two criterion groups who missed the item to the number who attempted it. The greater the percentage of students who missed the item, the more difficult that item was for that administration of the test. Conversely, when a test item is answered correctly by nearly all students, the difficulty index is close to zero.[9]
In other words, the larger the proportion of students getting an item right, the easier the item. An item is classified as being of medium difficulty if the proportion of students answering incorrectly is about halfway between a chance value and the point where no student misses the item.[10]
A good test item should have a certain degree of difficulty. It should be neither too easy nor too difficult, because a test that is too easy or too difficult for the group tested yields a score distribution that makes it hard to identify reliable differences in achievement levels among members of the group.[11]
By analyzing the students' responses to the items, the level of difficulty of each item can be determined, and this information helps the teacher identify concepts that need to be retaught and give the students feedback about their learning.
Item difficulty goes by many other names: item facility, item easiness, p-value, or simply the abbreviation IF.[12] To make the computation of the difficulty level easier, the writer divides the students into three groups: upper, middle, and lower. The analysis focuses on the upper and lower groups, while the middle group is set aside. The writer uses the following formula to find the difficulty level of each item in the English National Examination Tryout Test for Junior High School Level, period 2007-2008, packet 12, administered to the third-year students of "SMPN" 2 Ciputat.
The formula for computing item difficulty is as follows:
TK = (U + L) / T
Where:
TK : The index of difficulty level (each item).
U : The number of students in the upper group who answered the item correctly.
L : The number of students in the lower group who answered the item correctly.
T : Total number of students in upper and lower group.[13]
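To illustrate how this formula works, the short Python sketch below applies it to invented counts; the function name and all numbers are assumptions made for the example, not data from the writer's analysis.

```python
# A minimal illustration of TK = (U + L) / T; the counts below are invented.
def difficulty_index(upper_correct, lower_correct, total_upper_lower):
    """Proportion of upper- and lower-group students answering the item correctly."""
    return (upper_correct + lower_correct) / total_upper_lower

# Suppose 27 students sit in each group; 20 upper-group and 12 lower-group
# students answered this item correctly.
U, L, T = 20, 12, 54
print(round(difficulty_index(U, L, T), 2))  # 0.59 -> a moderate item
```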
Based on the technique above, the writer tries to find out the difficulty level of all the items in the English National Examination Tryout Test for Junior High School Level, period 2007-2008, packet 12, administered to the third-year students of "SMPN" 2 Ciputat, using the following formula:
P = Σb / N
Where:
P : Difficulty level of all items.
b : Difficulty level of each item.
∑ : Sigma (Total)
N : Total number of test items.[14]
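As a rough illustration of this second formula, the sketch below averages a set of per-item difficulty indices to obtain P; the item values are invented for the example.

```python
# A minimal illustration of P = Σb / N; the per-item values are invented.
item_difficulties = [0.45, 0.72, 0.30, 0.59, 0.88]  # b for each of N = 5 items
P = sum(item_difficulties) / len(item_difficulties)
print(round(P, 2))  # 0.59
```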
The scores "TK" (the index of difficulty level of each item) and "P" (the difficulty level of all items) range from 0.00 to 1.00. If "TK" or "P" is less than 0.30, most of the students from the upper and lower groups could not answer the item correctly, so the item is classified as difficult. If "TK" or "P" is between 0.30 and 0.70, the proportion of students answering incorrectly is about halfway between a chance value and the point where no student misses the item, and the item is classified as moderate. If "TK" or "P" is more than 0.70, most of the students from the upper and lower groups could answer the item correctly, so the item is classified as easy.
To make this clear, the writer presents the table of difficulty level ranges as follows:[15]
Table 1
Level of Difficulty
P | Interpretation |
< 0.30 | Difficult |
0.30 – 0.70 | Moderate |
> 0.70 | Easy |
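The ranges in Table 1 can be expressed as a simple classification rule. The sketch below only mirrors the table; the function name is chosen here for illustration.

```python
# A small rule mirroring Table 1; the function name is chosen for this example.
def interpret_difficulty(p):
    if p < 0.30:
        return "Difficult"
    if p <= 0.70:
        return "Moderate"
    return "Easy"

for value in (0.25, 0.55, 0.80):
    print(value, interpret_difficulty(value))
# 0.25 Difficult, 0.55 Moderate, 0.80 Easy
```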
The level of difficulty shows how easy or difficult a test item is for the particular group tested. It is therefore influenced by the students' competence and will differ if the test is given to another group.
b. Discriminating Power
The discriminating power of a test item is its ability to differentiate between pupils who have achieved well (the upper group) and those who have achieved poorly (the lower group).[16] Students with high scores on the test (the upper group) should answer the item correctly more frequently than students with low scores on the test (the lower group). If the test items are given to students who have studied well, the scores will be high, and if they are given to those who have not, the scores will be low. On the contrary, if the test items yield the same scores when given to the two groups, or if the upper group gets low scores and the lower group gets high scores, then they are not good test items.
Effective and ineffective distracters can be identified from the analysis, and those which are not working as planned can be rewritten or replaced. A change in the alternatives of a multiple choice item can increase its discrimination.
The item discrimination statistic is calculated by subtracting the number of students in the lower group who answered the item correctly from the number of students in the upper group who answered the item correctly, and then dividing the difference by half of the total number of students in the upper and lower groups. The formula is as follows:
DP = (U - L) / (½ T)
Where:
DP : The index of item discriminating power.
U : The number of students in the upper group who answered the item correctly
L : The number of students in the lower group who answered the item correctly
T : Total number of students in upper and lower group.[17]
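Reusing the invented counts from the difficulty example, a minimal sketch of this formula might look as follows; the function name and numbers are assumptions for the example.

```python
# A minimal illustration of DP = (U - L) / (1/2 T), reusing the counts above.
def discriminating_power(upper_correct, lower_correct, total_upper_lower):
    """Positive values mean the upper group answered correctly more often."""
    return (upper_correct - lower_correct) / (total_upper_lower / 2)

U, L, T = 20, 12, 54
print(round(discriminating_power(U, L, T), 2))  # 0.3
```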
c. The Effectiveness of the Distracters
One important aspect affecting the difficulty of multiple choice test items is the quality of the distracters. Some distracters, in fact, might not be distracting at all, and therefore serve no purpose.[18] The parts of a multiple choice item include the item stem, or the main part of the item at the top; the options, which are the alternative choices presented to the student; the correct answer, which is the option that will be counted as correct; and the distracters, which are the options that will be counted as incorrect.[19]
A good distracter will attract more students who have not studied well (the lower group) than students from the upper group. On the contrary, a weak distracter will not be selected by any of the lower-achieving students.
In a good test item, the distracters must function effectively; if they do not, they should be rewritten or discarded. To find out whether the distracters function, a distracter analysis is carried out by comparing the number of students in the upper group and in the lower group who selected each incorrect alternative.[20]
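Such a distracter analysis can be sketched as below. The option labels, the keyed answer "C", and the response lists are invented for illustration; a distracter chosen more often by the lower group than by the upper group is taken to be working.

```python
# An invented distracter analysis: compare how often each incorrect option
# was chosen in the upper and lower groups.
from collections import Counter

upper_choices = ["C", "C", "B", "C", "A", "C", "C", "D", "C", "C"]
lower_choices = ["B", "A", "C", "D", "B", "A", "C", "B", "D", "A"]
key = "C"  # the option counted as correct

upper_counts = Counter(upper_choices)
lower_counts = Counter(lower_choices)
for option in ("A", "B", "D"):  # the distracters
    u, low = upper_counts[option], lower_counts[option]
    verdict = "working" if low > u else "rewrite or discard"
    print(f"Option {option}: upper={u}, lower={low} -> {verdict}")
```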
[1] James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (New York: Cambridge University Press, 2002), p. 113.
[4] Anthony J. Nitko, Educational Tests and Measurement: An Introduction, (New York: Harcourt Brace Jovanovich, Inc., 1983), p. 284.
[5] Charles D. Hopkins and Richard L. Antes, Classroom Testing: Construction, (Itasca: F.E. Peacock Publishers, Inc., 1979), p. 181.
[8] James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (New York: Cambridge University Press, 2002), p. 64.
[9] Charles D. Hopkins and Richard L. Antes, Classroom Testing: Construction, (Itasca: F.E. Peacock Publishers, Inc., 1979), p. 155.
[10] Charles D. Hopkins and Richard L. Antes, Classroom Testing…, p. 155.
[12] James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (New York: Cambridge University Press, 2002), p. 114.
[13] Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remaja Rosdakarya, 1986), p. 119.
[14] Asmawi Zainul and Noehi Nasoetion, Penilaian Hasil Belajar, (Jakarta: PAU-PPAI, UT, 1993), p. 153.
[15] Sumarna Surapranata, Analisis, Validitas, Reliabilitas dan Interpretasi Hasil Tes, (Bandung: Remaja Rosdakarya, 2006), p. 21.
[17] Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remaja Rosdakarya, 1986), p. 120.
[18] Nana Sujana, Penilaian Hasil Proses Belajar Mengajar, (Bandung: Remaja Rosda Karya, 2001), p. 141.