Monday, 27 April 2009

The Importance of Item Analysis


The results of item analysis can be used to select items of desired difficulty that best discriminate between high- and low-achieving students. In addition, the results of an item analysis can be useful in identifying faulty items and can provide information about student misconceptions and topics that need additional work.[1]

The benefits of item analysis are not limited to the improvement of individual test items; there are also a number of fringe benefits of special value to classroom teachers. The most important of these are the following:

a. Item analysis data provide a basis for efficient class discussion of the test results.

b. Item analysis data provide a basis for remedial work.

c. Item analysis data provide a basis for the general improvement of classroom instruction.

d. Item analysis procedures provide a basis for increased skill in test construction. [2]

Anthony J. Nitko, in his book, states that the uses of item analysis include: determining whether an item functions as the teacher intends, providing feedback to students about their performance and a basis for class discussion, giving the teacher feedback about pupil difficulties, identifying areas for curriculum improvement, revising the items, and improving item-writing skills.[3]



[1] Robert L. Linn and Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice Hall, Inc, 1995), p. 315.

[2] Robert L. Linn and Norman E. Gronlund, Measurement and…, p. 316.

[3] Anthony J. Nitko, Educational Tests and Measurement an Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 284.

Item Analysis


1. The Definition of Item Analysis

According to James Dean Brown and Thom Hudson, “Item analysis is the systematic statistical evaluation of the effectiveness of individual test items.”[1]

Wilmar Tinambunan states that "reexamining each test item to discover its strengths and flaws is known as item analysis."[2]

Meanwhile Robert Lado stated that: “Item analysis is the study of validity, reliability, and difficulty of test items taken individually as if they were separate tests.”[3]

Anthony J. Nitko states in his book that "item analysis refers to the process of collecting, summarizing, and using information about individual test items, especially information about pupils' responses to items."[4]

According to a dictionary of education, "item analysis is an examination of student performance for each item on a test. It consists of reexamination of the responses to items of a test by applying mathematical techniques to assess two characteristics, difficulty and discrimination, of each objective item on the test."[5]

2. Kinds of Item Analysis

Item analysis usually concentrates on three vital features: level of difficulty, discriminating power, and the effectiveness of each alternative (the effectiveness of the distracters).[6]

a. Difficulty Level

Level of difficulty means the percentage of students who give the right answer.[7] James Dean Brown states that it is a statistical index used to examine the percentage of students who correctly answer a given item.[8]

The difficulty of a test item is measured by the number of students in the two criterion groups who missed the item compared to the number who tried it. The greater the percentage of students who missed the item, the more difficult the item was for that administration of the test; when a test item is answered correctly by nearly all students, this difficulty index is close to zero.[9]

In other words, the larger the proportion of students getting an item right, the easier the item. An item is classified as being of medium difficulty if the proportion of students answering incorrectly is about halfway between a chance value and the point where no student misses the item.[10]

A good test item should have a certain degree of difficulty. It should be neither too easy nor too difficult, because a test that is too easy or too difficult for the group tested yields a score distribution that makes it hard to identify reliable differences in achievement levels between members of the group.[11]

By analyzing the students' responses to the items, the level of difficulty of each item can be known, and this information helps the teacher identify concepts that need to be retaught and give the students feedback about their learning.

Item difficulty goes by several other names: item facility, item easiness, or p-value, often abbreviated simply as IF.[12] To simplify the computation of the level of difficulty, the writer divides the students into three groups: upper, middle, and lower. The analysis focuses on the upper and lower groups, while the middle group is set aside. The writer uses the following formula to find the difficulty level of each item in the English National Examination Tryout Test for Junior High School level, period 2007-2008, packet 12, administered to the third-year students of "SMPN" 2 Ciputat:

The formula for computing item difficulty is as follows:


TK = (U + L) / T

Where:

TK : The index of difficulty level (each item).

U : The number of students in the upper group who answered the item correctly.

L : The number of students in the lower group who answered the item correctly.

T : Total number of students in upper and lower group.[13]
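
As an illustration only (not part of the thesis data), the following Python sketch shows how the upper and lower groups might be formed and how TK could be computed for a single item; the student totals, group sizes, and counts are invented for the example.

# Hypothetical sketch: forming the groups and computing TK = (U + L) / T.
def split_groups(totals, fraction=1/3):
    """Rank students by total test score and take the top and bottom thirds;
    the middle group is set aside, as described above."""
    ranked = sorted(totals, key=totals.get, reverse=True)
    k = int(len(ranked) * fraction)
    return ranked[:k], ranked[-k:]   # names of upper-group and lower-group students

def difficulty_index(upper_correct, lower_correct, total_upper_lower):
    """TK: the proportion of upper- and lower-group students who answered
    the item correctly."""
    return (upper_correct + lower_correct) / total_upper_lower

# Example: 18 of 20 upper-group and 10 of 20 lower-group students answered correctly.
tk = difficulty_index(upper_correct=18, lower_correct=10, total_upper_lower=40)
print(f"TK = {tk:.2f}")   # TK = 0.70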

Based on the technique above, the writer then finds the difficulty level of all the items in the English National Examination Tryout Test for Junior High School level, period 2007-2008, packet 12, administered to the third-year students of "SMPN" 2 Ciputat, using the following formula:

P = Σb / N




Where:

P : Difficulty level of all items.

b : Difficulty level of each item.

Σ : Sigma (the sum over all items).

N : Total number of test items.[14]

The scores "TK" (the difficulty index of each item) and "P" (the difficulty level of all items) range from 0.00 to 1.00. If "TK" or "P" is less than 0.30, most students in the upper and lower groups could not answer the item correctly (the item is classified as difficult). If "TK" or "P" is between 0.30 and 0.70, the proportion of students answering incorrectly is about halfway between a chance value and the point where no student misses the item (the item is classified as moderate). If "TK" or "P" is more than 0.70, most students in the upper and lower groups could answer the item correctly (the item is classified as easy).

To make this clear, the writer presents the range of difficulty levels in the following table:[15]

Table 1

Level of Difficulty

P              Interpretation
< 0.30         Difficult
0.30 – 0.70    Moderate
> 0.70         Easy

The level of difficulty shows how easy or difficult the test items are for that particular group. It is therefore influenced by the students' competence and will differ if the test is given to another group.
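
The computation of P and the interpretation in Table 1 can be sketched in Python as follows; this is only an illustration, and the TK values listed are invented rather than taken from the tryout test analysed in this paper.

# Hypothetical sketch: P = Σb / N, with each index read against Table 1.
def interpret(index):
    if index < 0.30:
        return "Difficult"
    if index <= 0.70:
        return "Moderate"
    return "Easy"

item_difficulties = [0.25, 0.55, 0.68, 0.80, 0.45]    # invented TK values (b)
p = sum(item_difficulties) / len(item_difficulties)   # P = Σb / N

for number, tk in enumerate(item_difficulties, start=1):
    print(f"Item {number}: TK = {tk:.2f} ({interpret(tk)})")
print(f"All items: P = {p:.2f} ({interpret(p)})")      # here P = 0.55, Moderate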

b. Discriminating Power

The discriminating power of a test item is its ability to differentiate between pupils who have achieved well (the upper group) and those who have achieved poorly (the lower group).[16] On a discriminating item, students with high scores on the test (the upper group) answer the item correctly more frequently than students with low scores on the test (the lower group). If the test items are given to students who have studied well, the scores will be high, and if they are given to those who have not, the scores will be low. Conversely, if the items yield the same scores for the two groups, or if the upper group scores low while the lower group scores high, they are not good test items.

Effective and ineffective distracters can be identified from analysis, and those which are not working as planned can be rewritten or replaced. A change in alternatives for a multiple choice item can increase discrimination.

The item discrimination statistic is calculated by subtracting the number of students in the lower group who answered the item correctly from the number of students in the upper group who answered it correctly, and then dividing the result by half of the total number of students in the upper and lower groups. The formula is as follows:

DP = (U - L) / (½ T)

Where:

DP : The index of item discriminating power.

U : The number of students in the upper group who answered the item correctly

L : The number of students in the lower group who answered the item correctly

T : Total number of students in upper and lower group.[17]
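
A minimal Python sketch of this formula, for illustration only (the counts are invented, continuing the example used for TK above), might look like this:

# Hypothetical sketch: DP = (U - L) / (½ T) for one item.
def discrimination_index(upper_correct, lower_correct, total_upper_lower):
    """Positive DP means the upper group answered the item correctly
    more often than the lower group did."""
    return (upper_correct - lower_correct) / (total_upper_lower / 2)

# Example: 18 of 20 upper-group and 10 of 20 lower-group students answered correctly.
dp = discrimination_index(upper_correct=18, lower_correct=10, total_upper_lower=40)
print(f"DP = {dp:.2f}")   # DP = 0.40, i.e. the item discriminates in the expected direction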

c. The Effectiveness of Distracters

One important aspect affecting the difficulty of multiple choice test items is the quality of the distracters. Some distracters, in fact, might not be distracting at all and therefore serve no purpose.[18] The parts of a multiple choice item are the stem, or the main part of the item at the top; the options, which are the alternative choices presented to the student; the correct answer, which is the option that will be counted as correct; and the distracters, which are the options that will be counted as incorrect.[19]

A good distracter will attract more students who have not studied well (the lower group) than students from the upper group. By contrast, a weak distracter will not be selected by any of the lower-achieving students.

In a good test item the distracters must function effectively; distracters that do not function should be rewritten or discarded. To find out whether the distracters function, a distracter analysis is carried out by comparing the number of students in the upper group and in the lower group who selected each incorrect alternative.[20]
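
As a rough illustration of such a comparison (not taken from the cited sources), the following Python sketch tallies, for one multiple choice item, how many upper- and lower-group students chose each incorrect alternative; the counts and the keyed answer are invented for the example.

# Hypothetical distracter analysis for one multiple choice item.
# Option "B" is assumed to be the keyed (correct) answer.
upper_choices = {"A": 1, "B": 16, "C": 2, "D": 1}   # 20 upper-group students
lower_choices = {"A": 7, "B": 8, "C": 5, "D": 0}    # 20 lower-group students
key = "B"

for option in sorted(upper_choices):
    if option == key:
        continue
    u, l = upper_choices[option], lower_choices[option]
    # A distracter is considered to be working if it attracts more
    # lower-group than upper-group students.
    verdict = "working" if l > u else "rewrite or discard"
    print(f"Distracter {option}: upper = {u}, lower = {l} -> {verdict}")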



[1] James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (New York: Cambridge University Press, 2002), p. 113.

[2] Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p. 137.

[3] Robert Lado, Language Testing, (New York: McGraw-Hill book Company, 1962), p. 342.

[4] Anthony J. Nitko, Educational Tests And Measurement an Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 284.

[5] Charles D. Hopkins and Richard L. Antes, Classroom Testing: Construction, (Illinois, F.E Peacock Publishers, Inc, 1979), p. 181.

[6] Wilmar Tinambunan, Evaluation of…, p.137.

[7] Andrew Harrison, A Language Testing Handbook, (London: Macmillan Press, 1983), p. 128.

[8] James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (New York: Cambridge University Press, 2002), p. 64.

[9] Charles D. Hopkins and Richard L. Antes, Classroom Testing: Construction, (Itasca: F.E. Peacock Publishers, Inc, 1979), p. 155.

[10] Charles D. Hopkins and Richard L. Antes, Classroom Testing…, p. 155.

[11] Wilmar Tinambunan, Evaluation of…, p.137.

[12] James Dean Brown and Thom Hudson, Criterion-referenced Language Testing, (New York: Cambridge University Press, 2002), p. 114.

[13] Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remaja Rosdakarya, 1986), p. 119.

[14] Asmawi Zainul and Noehi Nasoetion, Penilaian Hasil Belajar, (Jakarta: PAU-PPAI, UT, 1993), p. 153.

[15] Sumarna Surapranata, Analisis, Validitas, Reliabilitas dan Interpretasi Hasil Tes, (Bandung: Remaja Rosdakarya, 2006), p. 21.

[16] Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p.139.

[17] Ngalim Purwanto, Prinsip-prinsip dan Teknik Evaluasi Pengajaran, (Bandung: Remaja Rosdakarya, 1986), p. 120.

[18] Nana Sujana, Penilaian Hasil Proses Belajar Mengajar, (Bandung: Remaja Rosda Karya, 2001), p. 141.

[19] James Dean Brown, Testing in Language Programs, (New Jersey: Prentice Hall Regents, 1996), p. 70.

[20] Wilmar Tinambunan, Evaluation of Students Achievement, (Jakarta: Depdikbud, 1988), p.141.

Friday, 24 April 2009

Concept of Tryout Test (English Version)

TRYOUT TEST CONCEPT

(English version)


1. The Concept of Tryout Test

The word "tryout" originally denotes a test to ascertain the qualifications of applicants, as for an athletic team or a theatrical role, or an experimental performance of a play before its official opening. It also refers to a procedure that ascertains effectiveness, value, proper function, or some other quality.[1]

In education, a tryout test is a practice vehicle that prepares students to face a final test, whether a semester test or the national examination. Today, however, the tryout test is geared mainly toward preparing students for the national examination. The tryout test has in fact been routinely implemented in grades 7, 8, 9, 10, 11, and 12, which means that it does not merely serve as national examination practice. The difference is that for the grade 9 and 12 tryout tests the time and procedures are predetermined by the regional education office, while the tryout tests for grades 7, 8, 10, and 11 depend on each school's policy.


As we know, there are four kinds of tests (placement, formative, diagnostic, and summative). Generally, the tryout test falls into a broader category: the teacher can use it to obtain daily grades, to monitor learning progress during instruction, and to provide continuous feedback to both pupil and teacher concerning learning successes and failures (a formative test), as well as to find out how well students have comprehended the material learned. It is also concerned with the persistent or recurring learning difficulties that are left unresolved by the standard corrective prescriptions of formative evaluation (a diagnostic test).


In addition, this test can be used so that the teacher can find out whether students' comprehension falls below the "UKM/SKL" (graduate competence standard). The results of the test are then mapped, meaning that students with below-average grades are given "remedial" treatment (material enrichment).


In fact, even before the 1994 curriculum the tryout test was already applied under the name "exercise". In the "EBTANAS" era, this test was called the "EBTANAS exercise", not a tryout test. The name "tryout" is therefore only a matter of terminology borrowed from English. Initially, the term was used by course centres, "BIMBEL" (Bimbingan Belajar), to attract students to join their courses.


In practice, how many times the tryout test is administered depends mainly on the readiness and resources of each school in each region, so the number differs from one school to another. At "SMPN" 2 Ciputat, Tangerang, the tryout test is given three times (for grade 9). The first tryout is a "pre-test", given before the enrichment program, to evaluate how well the students comprehend the four subjects that will be examined in the national examination "UN". The second is a "mapping test", in which the students are categorized as "low", "medium", or "high" in their comprehension of the materials; students falling into the "low" category are given a material enrichment program. The third is a "strengthening test", given shortly before the national examination.


The materials of the tryout test refer fully to the "SKL"; the development process is called "SKL exploration" (Bedah SKL). Hence, not all materials included in the curriculum are used as references in writing tryout test materials. For example, there are only two "SKL" for the Indonesian language subject and the English language subject, while there are four "SKL" for Mathematics and ten for Natural Science ("IPA"); these do not represent all the materials in the curriculum. Since items for the national examination will not deviate from the "SKL", "SKL exploration" is carried out when preparing tryout test materials.

Theoretically, the tryout test is correlated with the national examination "UN", because the tryout test is a kind of practice for it. The tryout test can build students' confidence and motivation, that is, the extent to which an individual works or strives to learn because of the desire to do so and the satisfaction in the activity; by doing the tryout test, students also learn how to deal with the national examination. Moreover, the results of the tryout test can encourage students to study harder and to prepare themselves better than before, so that they study more intensively and tend to obtain good scores in the national examination.

The teachers believe that the tryout test can improve national examination "UN" scores, since the students and the schools will try hard to meet the demands of the examination. However, education deals with human beings, and the process of education is not like the production process in a factory. The examination is based on the accumulation of knowledge and skills mastered by the students during the process, and the test items have been carefully framed on certain statistical considerations.

Materials for the tryout test are gathered from all grades, namely grades 7, 8, and 9 (for junior high school) and 10, 11, and 12 (for senior high school), with different percentages:

§ 20% from grade 7 (JHS) or grade 10 (SHS),

§ 30% from grade 8 (JHS) or grade 11 (SHS), and

§ 50% from grade 9 (JHS) or grade 12 (SHS).


The expenditure for the tryout test is collected from each school's contributions and coordinated by the regional unit, covering drafting, copying, and the marking of the results. This differs from the national examination, whose entire expenditure is covered by the national budget.


2. The Developer of Tryout Test

In general, there are two kinds of tryout test developers:

· MGMP (Musyawarah Guru Mata Pelajaran) in school level, and

· MGMP in regional level.

Basically, the tryout test is developed by the subject teachers. If the tryout test is for grades 7, 8, 10, and 11, the MGMP (Musyawarah Guru Mata Pelajaran) at the school level is responsible for developing the materials (usually almost all subjects are examined). If the tryout test is for grades 9 and 12, the MGMP at the regional level is responsible for developing the materials. The subjects in this tryout test are the four that will also be given in the national examination "UN", namely Indonesian Language, English, Mathematics, and Natural Science. Before composing the tryout test materials, all teachers in a district usually gather to discuss the "SKL exploration" determined by "BSNP" (Badan Standar Nasional Pendidikan); the draft blueprint for the tryout test has thus been formulated by "Depdiknas" (the National Education Department).[2]

3. The Purpose of Tryout Test

The purpose of the tryout test is to familiarize students with completing items in a collective examination held by the school, the region, or the central department: the test for grades 9 and 12 is aimed at preparing students to face the national examination, while the test for grades 7, 8, 10, and 11 is aimed at preparing students to face the general examination.


The tryout test should provide data for such purposes as:

a. To detect and correct weaknesses in the test directions.

b. To identify weak items.

c. To find out item difficulty and item discrimination.

d. To identify and eliminate distracters that are too close to the keyed answers or that are not selected at all.

e. To determine appropriate time limits for the final test.[2]



[2] H.J.X. Fernandes, Testing and Measurement, (Jakarta: National Educational Planning, Evaluation and Curriculum Development, 1984), p. 14.

Concept of Tryout Test (Indonesian Version)

THE "TRYOUT" TEST CONCEPT

(Indonesian version)

The tryout test can be defined as a vehicle for examination practice for students, whether to face the semester examination or the national examination. In fact, the tryout test is already a routine activity carried out in grades 7, 8, 9, 10, 11, and 12, and is not focused solely on the national examination. At present, however, the tryout test is emphasized more as preparation for the national examination. What distinguishes them is that the timing of the tryout test for grades 9 and 12 is set by the district education office, whereas for grades 7, 8, 10, and 11 it depends entirely on each school.

The purpose of holding the tryout test is to familiarize students with completing test items, especially items of the joint-examination type, whether organized by the education office, the sub-district cluster, or the central government: for grades 9 and 12 this means preparing to face the national examination, while grades 7, 8, 10, and 11 prepare to face the joint general examination.

We know that there are four kinds of tests (placement, formative, diagnostic, and summative); in this respect the tryout test falls into a broader category. It can be taken as a daily grade by each teacher, or it can be used as a standard of students' mastery of a particular subject matter, so it can belong to the formative as well as the diagnostic category (especially in the period leading up to the national examination). The tryout test can be used as a tool for mapping students' ability, so that the teacher knows which students' mastery is still below the UKM/SKL (graduate competence standard), which is already sufficient, and which already exceeds the standard. The next stage is categorization in a "mapping", meaning that for the group of students who are still below average the teacher will provide "remedial" treatment (material enrichment).

As for its origins, the "tryout" test was actually already carried out before the 1994 curriculum; it simply was not yet called a "tryout" but only an "exercise". In the EBTANAS era it was called the "EBTANAS exercise", not a "tryout". So it can be said that the name "tryout" merely borrows a term recently taken from a foreign language (English), originally developed and popularized by BIMBEL (bimbingan belajar, tutoring centres) as an attraction intended to draw students to join those institutions. Who pioneered the administration of the "tryout" test cannot be established with certainty.

Basically, the tryout test items are written by the teachers of each subject; the difference lies in the kind of tryout to be held. If the tryout test is to be held for grades 7, 8, 10, and 11, the body responsible for writing the items is the MGMP (Musyawarah Guru Mata Pelajaran) at the school level, and the task is left entirely to the teachers of the school concerned (usually almost all subjects are tested). If the tryout test is to be held for grades 9 and 12, the body responsible for writing the items is the MGMP at the district level, and the test is held simultaneously; however, its coverage is limited to only four subjects (the same subjects that will be tested in the national examination), namely Indonesian, English, Mathematics, and Natural Science (IPA). This is done to prepare students for the national examination. Before writing the tryout items, the teachers of the whole district usually gather to discuss what is called "bedah SKL" (exploration of the Standar Kompetensi Lulusan), which has been determined by BSNP (Badan Standar Nasional Pendidikan); thus the blueprint and guidelines for the tryout materials have already been set by "Depdiknas".

In fact, the number of times the tryout test is held cannot be fixed; it depends on the capacity, readiness, and funding of each school in each region, because holding a tryout test requires both funds and adequate preparation, so the number of administrations differs from one school to another. At SMPN 2 Ciputat the tryout test is usually held three times (for grade 9). The first tryout is a "pre-test" (held before the tutoring/enrichment program), which evaluates the extent of the students' mastery of the four subjects that will be tested in the coming "UN". The second tryout is for "mapping" the students, who are then categorized as "low", "medium", or "high" in their mastery of the material to be examined; the students who fall into the "low" category are grouped again and given material enrichment. Then, as the national examination approaches, the third tryout is held (as reinforcement of the material).

The tryout test items refer fully to the "SKL"; the writing process is known as "bedah SKL" (explained above), so not all of the material presented in the curriculum is used as a reference in writing the tryout items. For example, Indonesian and English each have only two "SKL", Mathematics has four, and IPA has ten, and these do not represent all of the material in the curriculum. The national examination items, moreover, will not deviate from the "SKL". That is why "bedah SKL" is carried out in the process of writing the tryout items.

The tryout items are drawn from the material of all grades, namely grades 7, 8, and 9 (for junior high school) and grades 10, 11, and 12 (for senior high school); only the percentages differ: 20% is taken from grade 7 or 10 material, 30% from grade 8 or 11, and 50% from grade 9 or 12 (all of which is already summarized in the "SKL").

The cost of the tryout comes from the contributions of each school, coordinated by the district office; the whole process, from drafting and duplication through to the marking of the results, is funded by committee and school contributions, not by the government (because this is not a national program). This differs from the administration of the "UN", whose entire cost is already provided for in the state budget ("APBN").

Function of Test


"Test is the message." Through his test, the teacher tells the student, in the most direct and meaningful manner, what he really thinks is important. A test, of course, has several functions. Penny Ur lists some of them; in her book she states that tests may be used as a means to:

a. Give the teacher information about where the students are now, to help decide what to teach next.

b. Give the students information about what they know, so that they also have an awareness of what they need to learn or review.

c. Assess for some purpose external to current teaching (e.g., a final grade for the course, selection).

d. Motivate students to learn or review specific material.

e. Get a noisy class to keep quiet and concentrate.

f. Provide a clear indication that the class has reached a 'station' in learning, such as the end of a unit, thus contributing to a sense of structure in the course as a whole.

g. Get students to make an effort (in doing the test itself), which is likely to lead to better results and a feeling of satisfaction.

h. Give students tasks which themselves may actually provide useful review or practice, as well as testing.

i. Provide students with a sense of achievement and progress in their learning.[1]



[1] Penny Ur, A Course in Language Testing, (New York: Cambridge University Press, 1996), p. 34.

Kinds of Test


The teacher is in the best position to know which tests are appropriate for his class. Teachers may give classroom tests with the intention of motivating students' efforts to learn or of assessing the outcomes of those efforts, and the appropriateness of a test is largely determined by its purpose.[1] Tests can be categorized according to the types of information they provide. This categorization will prove useful both in deciding whether an existing test is suitable for a particular purpose and in writing appropriate new tests where these are necessary.[2]

a. Based on Its Purpose

1) Aptitude test

The aptitude test is conceived as a prognostic measure that indicates whether a student is likely to learn a second language readily. It is generally given before the student begins language study, and may be used to select students for a language course or to place students in sections appropriate to their ability.[3]

Aptitude tests are designed to predict future performance in some activity. It can provide information that is useful in determining learning readiness, individualizing instruction, organizing classroom groups, identifying underachievers, diagnosing learning problems, and helping students with their educational and vocational plans. It makes a special contribution.[4]

Aptitude tests are designed to predict potential. They attempt to indicate what a person could learn if opportunity and motivation are present; that is, they assess what the student "could do" more than what the student "will do".[5]

Aptitude tests do not measure a fixed capacity. Rather, they provide an indication of present level of learned abilities and can be useful in predicting future performance. Performance on aptitude tests is influenced by previous learning experiences, but it is less directly dependent on specific courses of instruction than is performance on achievement tests.[6]

Aptitude tests serve several useful functions.[7] First, from the results a tester can determine a test taker's readiness for an instructional program. Second, the tester can classify or place individuals in an appropriate class. Third, the tester can diagnose an individual's specific strengths and weaknesses. Finally, the tester can measure aptitude for learning.

2) Achievement test

An achievement test (also called an attainment or summative test) looks back over a longer period of learning than the diagnostic test, for example a year’s work, or a whole course, or even a variety of different courses. It is intended to show the standard which the students have now reached in relation to other students at the same stage.[8]

Achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives.[9] Achievement test is similar to the progress test in that it measures how much the student has learned in the course of second-language instruction.[10]

An achievement test is designed to indicate the degree of success in some past learning activity. This purpose is obviously different from that of the aptitude test, which is designed to predict success in some future learning activity.[11]

There are two kinds of achievement tests: final achievement tests and progress achievement tests. Final achievement tests are those administered at the end of a course of study. They may be written and administered by ministries of education, official examining boards, or by members of teaching institutions. Clearly the content of these tests must be related to the courses with which they are concerned, but the nature of this relationship is a matter of disagreement amongst language testers. The content of a final achievement test should be based directly on a detailed course syllabus or on the books and other materials used. Progress achievement tests, as their name suggests, are intended to measure the progress that students are making. They contribute to formative assessment.[12]

An achievement test is conducted at the end of an instructional segment to determine whether learning is sufficiently complete to warrant moving the learner to the next segment of instruction; that is, it establishes the status of achievement at the end of the segment, how well things went.[13]

John E. Horrocks lists several functions of achievement tests. In his book he states that achievement tests may be used as a means:

a) To gain a picture of the range and nature of individual differences in a group where some specified aspect of achievement is concerned

b) To equate groups for research and sectioning purposes.

c) To determine an examinee’s level of achievement in relation to his age and ability.

d) To provide a basis for selection, promotion, and termination.

e) To group students into relatively homogenous groups for instructional purpose.

f) To determine rate of progress by comparing present and past achievement.

g) To diagnose learning difficulties.

h) To evaluate the results of a method of instruction.

i) To evaluate teachers’ success in teaching students.

j) To provide a basis for counseling with parents as well as with students.

k) To provide a basis for grading.

l) To compare the status of instructional units (schools, classroom, cities, countries, states, etc.).

m) To diagnose a given school's strengths and weaknesses.

n) To diagnose a given school entrants.

o) To determine, in part, the efficiency of certain administrative policies.

p) To predict future success as well as present readiness.

q) To act as an adjunct to instruction and as a teaching tool.

r) To act as a motivating device.[14]

Fundamentally, achievement tests differ in nature from aptitude tests. An aptitude test is primarily designed to predict success in some future learning activity, whereas an achievement test is designed to indicate the degree of success in some past learning activity.[15] From this comparison it can be seen that the distinction between the two tests lies in the use of the results rather than in the qualities of the tests themselves.

To produce a good achievement test, the test must be constructed carefully, with attention to the following basic principles, which provide a firm base for constructing and using classroom tests as a positive force in the teaching-learning process:

§ Achievement tests should measure clearly defined learning outcomes that are in harmony with the instructional objectives.

§ Achievement tests should measure a representative sample of the learning tasks included in the instruction.

§ Achievement tests should include the types of test items that are most appropriate for measuring the desired learning outcomes.

§ Achievement tests should fit the particular uses that will be made of the results.

§ Achievement tests should be made as reliable as possible and should then be interpreted with caution.

§ Achievement tests should improve student learning.[16]

3) General Proficiency Test

Language proficiency tests are designed to measure control of language or cultural items and communication skills already present at the time of testing, irrespective of formal training.[17]

The proficiency test also measures what students have learned, but the aim of the proficiency test is to determine whether this language ability corresponds to specific language requirements. For example, is the student able to read professional literature in another language with a specific level (such as 90 percent) of accuracy?[18]

Proficiency tests are designed to measure people’s ability in a language, regardless of any training they may have had in that language. The content of a proficiency test, therefore, is not based on the content or objectives of language courses that people taking the test may have followed. Rather, it is based on a specification of what candidates have to be able to do in the language in order to be considered proficient. In the case of some proficiency tests, ‘proficient’ means having sufficient command of the language for a particular purpose.[19]

The aim of a proficiency test is to assess the student’s ability to apply in actual situations what he has learnt. This type of test is not usually related to any particular course because it is concerned with the student’s current standing in relation to his future needs. A proficiency test is the most suitable vehicle for assessing English for specific purposes (ESP).[20]

Proficiency tests normally measure a broad range of language skills and competence, including structure, phonology, vocabulary, integrated communication skills, and cultural insight. There are also proficiency tests that include the appropriateness of language usage in its specified social context, in other words, communicative competence.[21]

b. Based on Test Maker

1) Standardized Test

A standardized test is one that presupposes certain standard objectives, or criteria, that are held constant from one form of the test to another. The criteria in large-scale standardized tests are designed to apply to a broad band of competencies that are usually not exclusive to one particular curriculum.[22] To qualify as good, standardized tests should be produced through a thorough process of empirical research and development.

Standardized tests focus on general skills and content that are included among the educational objectives of virtually all school districts. A standardized test must span a much wider range of content than most teacher-constructed tests.[23]

A standardized test has certain distinctive features. These include a fixed set of test items designed to measure a clearly defined sample of behavior, specific directions for administering and scoring the test, and norms based on representative groups of individuals like those for whom the test was designed.[24]

Standardized tests usually consist of a set of materials including (1) a test booklet with the test items and instructions to the test taker, (2) an answer sheet, (3) an administration manual containing instructions on how to administer the test, and (4) a technical manual with information on uses of the test, how it was developed, how it is to be scored, and how the scores can be interpreted.[25]

The best standardized tests are carefully developed and refined by means of editorial review and item analysis based on field testing, so that every item functions well. Intrinsic ambiguity is removed, and implausible distracters are modified or replaced.[26]

A standardized test should be the product of a carefully conducted program of research and development. Such a program involves the work of many persons and includes the following steps.

a) Considering preliminary planning and marketing.

b) Developing test blueprint and item drafts.

c) Designing and professionally producing test items, materials, answer documents, and directions.

d) Pre testing items; collecting and analyzing data on them.

e) Selecting items for the final forms and professionally producing standardization edition.

f) Locating schools willing to participate in standardization and conducting standardization testing.

g) Collecting and analyzing standardization data and preparing norms tables, collecting and analyzing data for reliability and validity studies.

h) Professionally producing the final forms of the test and writing test manuals.

i) Marketing and selling the final edition.

j) Conducting post-publication special studies and developing special technical publications.[27]

2) Teacher Made Test

Teacher-made tests focus on a much more restricted range of content than standardized tests; they usually reflect a particular unit of study or a semester of study.[28] Teachers usually feel that standardized tests do not adequately measure their own or the local objectives of instruction.[29] Consequently, teachers must construct their own tests based on the instructional objectives or the subject matter the students have learned.

Teacher-made tests can be constructed to measure how well a specific set of objectives has been met, something that standardized tests are not expected to do.[30]

Teacher-made tests provide several kinds of information of benefit to the teacher: teachers can see how well students have mastered a limited unit of instruction, determine the extent to which distinctive local objectives have been achieved, and obtain a basis for assigning course marks.

It should be clear that teacher-made and standardized tests complement each other. They serve related but somewhat different purposes. Both kinds of test are needed for an adequate evaluation of educational achievement by individual students, school, and school districts. [31]

c. Based on the Way of Scoring

1) Objective Test

The objective test is a highly structured test that requires the pupils to supply a word or two or to select the correct answer from among a limited number of alternatives.[32]

An objective item is one for which there is a specific correct response; therefore, whether the item is scored by one teacher or another, today or last week, it is always scored the same way.[33]

Because objective test items usually have only one correct answer, they can be scored mechanically. However, objective tests require far more careful preparation than subjective tests. Objective tests are frequently criticized on the grounds that they are simpler to answer than subjective tests; items in an objective test, however, can be made just as easy or as difficult as the test constructor wishes.
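
As a simple illustration of what scoring an objective test mechanically can mean (this sketch is not drawn from any of the cited sources; the key and the answer sheets are invented), consider the following Python code:

# Hypothetical sketch: scoring objective answer sheets against a fixed key.
answer_key = ["B", "D", "A", "C", "B"]              # invented keyed answers
answer_sheets = {
    "Student 1": ["B", "D", "A", "C", "A"],
    "Student 2": ["C", "D", "A", "B", "B"],
}

for student, answers in answer_sheets.items():
    # Each response is either right or wrong; no judgement is involved,
    # so any scorer (or machine) produces the same result.
    score = sum(given == keyed for given, keyed in zip(answers, answer_key))
    print(f"{student}: {score} out of {len(answer_key)}")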

The objective test includes a variety of item types. Objective test items can be classified into supply type and selection type.

a) Supply type.

The supply type requires the pupil to supply the answer. This type is also known as 'short answer' or 'completion'. Short answer and completion items are essentially the same; they differ only in the method of presenting the problem: the short answer item uses a direct question, whereas the completion item consists of an incomplete statement.[34]

The short answer item and the completion item both are supply-type test items that can be answered by a word, phrase, number, or symbol. They are essentially the same, differing only in the method of presenting the problem. The short answer item uses a direct question, whereas the completion item consists of an incomplete statement.[35]

Example of the short answer item:[36]

§ What is the name of the first President of Republic of Indonesia?

(Ir. Soekarno).

Example of the completion item:

§ The name of the first President of the republic of Indonesia is………

(Ir. Soekarno).

The short answer item is subject to a variety of defects, even though it is considered one of the easiest to construct. The following suggestions will help avoid possible pitfalls and provide greater assurance that the items will function as intended:

§ Word the item so that the required answer is both brief and specific.

§ Do not take statements directly from textbooks to use as a basis for short answer item.

§ A direct question is generally more desirable than an incomplete statement.

§ If the answer is to be expressed in numerical units, indicate the type of answer wanted.

§ Blanks for answers should be equal in length and in a column to the right of the question.

§ When completion items are used, do not include too many blanks.[37]

b) Selection type.

The selection type requires the pupil to select the answer from a given number of alternatives. It can be further subdivided into true false, matching, and multiple choice items.[38]

§ True false

A true false item consists of a declarative statement and the student responds ‘true’ if it conforms to accepted truth, or ‘false’ if it is essentially incorrect. True false items are also referred to as alternative-response items [39]

The alternative-response test item consists of a declarative statement that the pupil is asked to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, agree or disagree and the like. In each case there are only two possible answers.[40]

The true false item does not directly test writing or speaking abilities, only listening or reading. It may be used to test aspects of language such as vocabulary, grammar, and the content of a reading or listening passage. It is fairly easy to design; it is also easy to administer, whether orally or in writing, and to mark.[41]

The most common uses of the true false item are:

§ To measure the ability to identify the correctness of statements of fact, definition of terms, statement of principles, and the like. [42]

Example:

Directions:

Read the following statement. If the statement is true, circle the T, if the statement is false, circle the F.

(T) (F) 1. The green coloring material in a plant leaf is called chlorophyll.

§ To measure the pupil’s ability to distinguish fact from opinion[43]

Example:

Direction:

Read the following statement. If the statement is a fact, circle the F. If the statement is an opinion, circle the O.

(F) (O) 1. Other countries should adopt a constitution like that of the United States

§ To measure aspect of understanding, that is, the ability to recognize cause-and-effect relationships. This type of item usually contains two true propositions in one statement, and the pupil is to judge whether the relationship between them is true or false.[44]

Example:

Direction:

In the following statement, both parts of the statement are true. You are to decide whether the second part explains why the first part is true. If it does, circle Y; if it does not, circle N.

(Y) (N) 1. Some plants do not need sunlight because they get their food from other plants.

§ To measure the simple aspect of logic.[45]

Example:

Direction:

Read the following statement. If the statement is true, circle the T; if it is false circle the F. Also, if the converse of the statement is true, circle the CT; if the converse is false, circle the CF; be sure to give two answers for each statement.

(T) (F) (CT) (CF) 1. All trees are plants.

§ Matching

The matching exercise consists of two parallel columns, with each word, number, or symbol in one column being matched to a word, sentence, or phrase in the other column. The items in the column for which a match is sought are called premises, and the items in the column from which the selection is made are called responses.[46] Matching items are useful in measuring students' ability to make associations, discern relationships, and make interpretations, or in measuring knowledge of a series of facts.

For example:

Direction:

On the line to the left of each province listed in Column I, write the letter of its capital city from Column II. Each capital city may be used once or not at all.

Column I

Provinces

( ) 1. Central Java

( ) 2. Central Kalimantan

( ) 3. East Java

( ) 4. Irian Jaya

( ) 5. North Sumatra

( ) 6. South Kalimantan

( ) 7. South Sulawesi

Column II

Capital cities

A. Bandung

B. Banjarmasin

C. Jayapura.

D. Medan.

E. Palangkaraya.

F. Samarinda.

G. Semarang.

H. Surabaya.

I. Ujung Pandang.

The advantage of using matching items is that a large quantity of associated factual material can be measured in a small amount of space, while the time students need to respond is relatively short.[47]

The following suggestions are designed to help in constructing matching exercises:

§ Use only homogeneous material in a single matching exercise.

§ Include an unequal number of responses and premises, and instruct the student that responses may be used once, more than once, or not at all.

§ Keep the list of items to be matched brief, and place the shorter responses on the right.

§ Arrange the list of responses in logical order: place words in alphabetical order and numbers in sequence.

§ Indicate in the directions the basis for matching the responses and premises.

§ Place all of the items for one matching exercise on the same page.[48]

§ Multiple choice

The multiple choice item consists of a premise and a set of alternatives. The premise, known as the "stem", is presented as a question or incomplete statement which the student answers or completes by selecting one of several alternatives. Usually either four or five alternatives (also called options or choices) are available.[49]

The pupil is typically requested to read the stem and the list of alternatives and to select the one correct, or best, alternative. The correct alternative in each item is simply called the answer, while the remaining alternatives are called distracters.[50] These incorrect alternatives receive their name from their intended function: to distract those students who are in doubt about the correct answer.[51]

Some of the more typical uses of the multiple choice form in measuring knowledge outcomes common to most school subjects are the following:

§ Knowledge of terminology

For this purpose, the pupil can be requested to show his knowledge of a particular term by selecting a word which has the same meaning as the given term or by selecting a definition of the term.[52]

For example:

1. Which one of the following words has the same meaning as the word “plush”?

a. Smart and confident.

b. Wet and dirty.

c. Expensive and comfortable

d. Poor and sad.

§ Knowledge of specific facts

This type provides a necessary basis for developing understanding, thinking skills, and other complex learning outcomes. Multiple choice items designed to measure specific facts can take many different forms, but questions of who, what, when, and where variety are most common.[53]

For example:

1. Who is the last prophet of Islam?

a. Isa AS.

b. Muhammad SAW.

c. Yahya AS.

d. Ilyasa AS.

§ Knowledge of principles

Multiple choice items can be constructed to measure knowledge of principles as easily as those designed to measure knowledge of specific facts. The items appear a bit more difficult but this is because principles are more complex than isolated facts.[54]

For example:

1. Which one of the following principles of taxation is characteristic of the federal income tax?

a. The benefits received by an individual should determine the amount of the tax.

b. A tax should be based on an individual’s ability to pay.

c. All citizens should be required to pay the same amount of tax.

d. The amount of tax an individual pays should be determined by the size of the federal budget.

§ Knowledge of methods and procedures

The multiple choice form is also able to measure knowledge of laboratory procedures; knowledge of methods underlying communication, computational, and performance skills; knowledge of methods used in problem solving; knowledge of governmental procedures; and knowledge of common social practices.[55]

For example:

1. If you were making a scientific study of a problem, your first step should be to:

a. Collect information about the problem.

b. Develop hypotheses to be tested.

c. Design the experiment to be conducted.

d. Select scientific equipment.

The multiple choice item is generally recognized as the most widely applicable and useful type of objective test item. It can effectively measure many of the simple learning outcomes measured by the short answer, completion, true false, and matching items.[56]

The following list shows some reasons why teachers, schools, and assessment organizations use multiple choice items so often:

§ Multiple choice tests are fast, easy, and economical to score. In fact, they are machine scorable.

§ They can be scored objectively and thus may give the test the appearance of being fairer and/or more reliable than subjectively scored tests.

§ They “look like” tests and may thus seem to be acceptable by convention.

§ They reduce the chances of learners guessing the correct answer, in comparison to true false.[57]

Even though the types of objective test items are various, they have one feature in common which distinguishes them from the essay test: they present the pupil with a highly structured task which limits the type of response the pupil can make. The pupil is not free to redefine the problem or to organize and present the answer in his own words.[58]

2) Subjective Test

A subjective test is one that does not have a single right answer. A short composition or an impromptu interview may be scored in different ways by different teachers, or even by the same teacher scoring the answer twice under different circumstances. Its questions allow students to give a variety of responses, each somewhat different from the others.[59]

The best-known item type for the subjective test is the essay test; it requires the examinee to read the question, formulate a response, and express that response in his own words.[60] Essay items permit the testing of a student's ability to organize ideas and thoughts and allow for creative verbal expression.[61]

Typical key words in the questions set in examinations of this kind are 'discuss', 'compare', 'contrast', and 'describe'; the answers they elicit may range from a single sentence to a dozen or more paragraphs. These answers are commonly called 'essays', the questions 'essay questions', and the whole examination is of the 'essay type'.[62]

The essay test, based on the amount of freedom of response, is subdivided into two types:

a) Extended response type.

The extended response type permits the pupil to decide which facts he thinks are most pertinent, to select his own method of organization, and to write as much as he deems necessary to provide a comprehensive answer.[63] Students are given almost complete freedom in making their responses.

For example:

1. Compare the strengths and weaknesses of the multiple choice test and the essay question.

b) Restricted response type.

The restricted response question usually limits both the content and the response. The content is usually restricted by the scope of the topic to be discussed. Limitations on the form of response are generally indicated in the question.[64]

In the restricted response type, the pupil is not given complete freedom in making his response.[65]

For example:

1. State three advantages of saving money in a bank.



[1] Wilmar Tinambunan, Evaluation of Students Achievement…, p. 7.

[2] Arthur Hughes, Testing for Language Teachers, (New York: Cambridge University Press, 2003), p.11.

[3] Rebecca M. Vallette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc, 1977), p. 5.

[4] Robert L. Linn and Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice Hall, Inc, 1995), p. 391.

[5] Kenneth D. Hopkins, Educational and Psychological Measurement and Evaluation, (Boston: Walsh & Associates, Inc, 1998), p. 369.

[6] Robert L. Linn and Norman E. Gronlund, Measurement and Assessment…, p.391.

[7] David P. Harris, Testing English as a Second Language, (New Delhi: McGraw Hill, 1977), p. 2.

[8] Andrew Harrison, A Language Testing Handbook, (London: Macmillan Press, 1983), p. 7.

[9] Arthur Hughes, Testing for Language Teachers, (New York: Cambridge University Press, 2003), p. 13.

[10] Rebecca M. Vallette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc, 1977), p. 5.

[11] Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: Depdikbud, 1998), p.7.

[12] Arthur Hughes, Testing for…, pp. 13-14.

[13] Robert L. Ebel and David A. Frisbie, Essential of Educational Measurement, (New Jersey: Prentice Hall, 1991), p. 24.

[14] John E. Horrocks, Assessment of Behavior, (Ohio: Charles E. Merrill Publishing Company, 1964), pp. 484-485.

[15] Wilmar Tinambunan, Evaluation of…, p. 7.

[16] Norman E. Gronlund, Constructing Achievement Tests, (New Jersey: Prentice Hall, Inc., 1968), p. 8.

[17] Mary Finocchiaro and Sydney Sako, Foreign Language…, p. 21.

[18] Rebecca M. Vallette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc, 1977), p. 6.

[19] Arthur Hughes, Testing for Language Teachers, (New York: Cambridge University Press, 2003), p. 11.

[20] Andrew Harrison, A Language Testing Handbook, (London: Macmillan Press, 1983), pp. 7-8.

[21] Mary Finocchiaro and Sydney Sako, Foreign Language…, p. 22.

[22] H. Douglas Brown, Language Assessment: Principles and Classroom Practices, (New York: Pearson Education Inc., 2004), p. 67.

[23] Kenneth D. Hopkins, Educational and Psychological Measurement and Evaluation, (Boston: Walsh & Associates, Inc, 1998), p. 368.

[24] Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1976), p. 287.

[25] Freed Genesee and John A. Upshur, Classroom-Based Evaluation in Second Language Education, (New York: Cambridge University Press, 1996), p. 233.

[26] Kenneth D. Hopkins, Educational and Psychological Measurement and Evaluation, (Boston: Walsh & Associates, Inc, 1998), p. 368.

[27] Anthony J. Nitko, Educational Tests And Measurement an Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 468.

[28] Kenneth D. Hopkins, Educational and Psychological Measurement…, p. 368.

[29] Victor H. Noll, Introduction to Educational Measurement, (Boston: Houghton Mifflin Company, 1965), p. 125.

[30] Charles D. Hopkins and Richard L. Antes, Classroom Testing: Construction, (Illinois, F.E Peacock Publishers, Inc, 1979), p. 9.

[31] Kenneth D. Hopkins, Educational and Psychological Measurement…, p. 369.

[32] Wilmar Tinambunan, Evaluation of Student…, p. 55

[33] Rebecca M. Vallette, Modern Language Testing, (New York: Harcourt Brace Jovanovich, Inc, 1977), p. 10.

[34] Wilmar Tinambunan, Evaluation of Student…, p. 55

[35] Robert L. Linn and Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice Hall, Inc, 1995), p. 148.

[36] Wilmar Tinambunan, Evaluation of Student…, p. 61.

[37] Robert L. Linn and Norman E. Gronlund, Measurement and…, p. 154.

[38] Wilmar Tinambunan, Evaluation of Student…, p. 55.

[39] Wilmar Tinambunan, Evaluation of Student…, p. 70.

[40] Norman E. Gronlund, Measurement and Evaluation in Teaching, (New York: Macmillan Publishing Co., Inc., 1981), p. 162.

[41] Penny Ur, A Course in Language Testing, (New York: Cambridge University Press, 1996), p. 39.

[42] Wilmar Tinambunan, Evaluation of Student…, pp. 70.

[43] Norman E. Gronlund, Measurement and…, p. 163.

[44] Wilmar Tinambunan, Evaluation of Student…, p. 71.

[45] Wilmar Tinambunan, Evaluation of Student…, p. 71.

[46] Norman E. Gronlund, Measurement and…, p. 170.

[47] Wilmar Tinambunan, Evaluation of Student…, pp. 65.

[48] Robert L. Linn and Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice Hall, Inc, 1995), pp. 168-170.

[49] Wilmar Tinambunan, Evaluation of Student…, p. 74.

[50] Norman E. Gronlund, Measurement and…, p. 178

[51] Robert L. Linn and Norman E. Gronlund, Measurement and…, pp. 173-174.

[52] Norman E. Gronlund, Measurement and…, p. 180.

[53] Robert L. Linn and Norman E. Gronlund, Measurement and…, pp. 176-177.

[54] Norman E. Gronlund, Measurement and…, p. 182.

[55] Robert L. Linn and Norman E. Gronlund, Measurement and…, p. 178.

[56] Wilmar Tinambunan, Evaluation of Student Achievement, (Jakarta: Dept. P&K Dirjen. Pendidikan Tinggi Proyek Pengembangan Lembaga Pendidikan Tenaga Kependidikan, 1998), p.75.

[57] Kathleen M. Bailey, Learning about Language Assessment: Dilemmas, Decisions, And Directions, (New York: Heinle&Heinle Publishers, 1998), pp.130-131.

[58] Wilmar Tinambunan, Evaluation of Student…, p. 55.

[59] Rebecca M. Vallette, Modern Language…, p. 10.

[60] Wilmar Tinambunan, Evaluation of Student…, p. 56.

[61] Anthony J. Nitko, Educational Tests And Measurement an Introduction, (New York: Harcourt Brace Jovanovich, Inc, 1983), p. 21.

[62] A.E.G. Pilliner, Language Testing Symposium, (Great Britain: Headley Brothers Ltd., 1976), p. 19.

[63] Wilmar Tinambunan, Evaluation of Student…, p. 56.

[64] Robert L. Linn and Norman E. Gronlund, Measurement and Assessment in Teaching, (New Jersey: Prentice Hall, Inc, 1995), p. 220.

[65] Wilmar Tinambunan, Evaluation of Student…, p. 56.