Summarize result (50%)

Criterion-related validity
The second form of evidence of a test's construct validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability.

To exemplify this kind of validation in achievement testing, let us consider a situation where course objectives call for an oral component as part of the final achievement test. Testing every function listed in the objectives might take 45 minutes for each student, which could well be impractical, so perhaps only ten minutes can be devoted to each student. The question then arises: can such a ten-minute session give a sufficiently accurate estimate of the student's ability with respect to the functions specified in the course objectives? From the point of view of content validity, this will depend on how many of the functions are tested in the component, and how representative they are of the complete set of functions included in the objectives. To establish concurrent validity, a random sample of the students would then be subjected to the full 45-minute oral component necessary for coverage of all the functions, using perhaps four scorers to ensure reliable scoring (see next chapter), and their scores would be compared with the ones they obtained on the ten-minute session. If the comparison between the two sets of scores reveals a high level of agreement, then the shorter version of the oral component may be considered valid, inasmuch as it gives results similar to those obtained with the longer version. If, on the other hand, the two sets of scores show little agreement, the shorter version cannot be considered valid; it cannot be used as a dependable measure of achievement with respect to the functions specified in the objectives. Of course, if ten minutes really is all that can be spared for each student, then the oral component may be included for the contribution that it makes to the assessment of students' overall achievement and for its backwash effect.

There are, in fact, standard procedures for comparing sets of scores in this way, which generate what is called a 'correlation coefficient' (or, when we are considering validity, a 'validity coefficient'), a mathematical measure of similarity. Whether or not a particular level of agreement is regarded as satisfactory will depend upon the purpose of the test and the importance of the decisions that are made on the basis of it. If, for example, a test of oral ability was to be used as part of the selection procedure for a high-level diplomatic post, then a coefficient of 0.7 might well be regarded as too low for a shorter test to be substituted for a full and thorough test of oral ability. A test may be validated against, for example, teachers' assessments of their students, provided that the assessments themselves can be relied on. This would be appropriate where a test was developed that claimed to be measuring something different from all existing tests.

Predictive validity concerns the degree to which a test can predict candidates' future performance, for example a student's ability to cope with a graduate course at a British university. The criterion measure here might be an assessment of the student's English as perceived by his or her supervisor at the university, or it could be the outcome of the course (pass/fail etc.). Should we rely on the subjective and untrained judgements of supervisors? Where outcome is used as the criterion measure, validity coefficients tend to be low. This is partly because of the other factors that contribute to every outcome, and partly because those students whose English the test predicted would be inadequate are not normally permitted to take the course, and so the test's (possible) accuracy in predicting problems for those students goes unrecognised.

Content validity, concurrent validity and predictive validity all have a part to play in the development of a test. For instance, in developing an English placement test for language schools, Hughes et al (1996) validated test content against the content of three popular course books used by language schools in Britain, compared students' performance on the test with their performance on the existing placement tests of a number of language schools, and then examined the success of the test in placing students in classes.

Original text

Criterion-related validity
The second form of evidence of a test's construct validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate's ability. This independent assessment is thus the criterion measure against which the test is validated. There are essentially two kinds of criterion-related validity: concurrent validity and predictive validity.

Concurrent validity is established when the test and the criterion are administered at about the same time. To exemplify this kind of validation in achievement testing, let us consider a situation where course objectives call for an oral component as part of the final achievement test. The objectives may list a large number of 'functions' which students are expected to perform orally, to test all of which might take 45 minutes for each student. This could well be impractical. Perhaps it is felt that only ten minutes can be devoted to each student for the oral component. The question then arises: can such a ten-minute session give a sufficiently accurate estimate of the student's ability with respect to the functions specified in the course objectives? Is it, in other words, a valid measure?

From the point of view of content validity, this will depend on how many of the functions are tested in the component, and how representative they are of the complete set of functions included in the objectives. Every effort should be made when designing the oral component to give it content validity. Once this has been done, however, we can go further. We can attempt to establish the concurrent validity of the component. To do this, we should choose at random a sample of all the students taking the test. These students would then be subjected to the full 45-minute oral component necessary for coverage of all the functions, using perhaps four scorers to ensure reliable scoring (see next chapter). This would be the criterion test against which the shorter test would be judged. The students' scores on the full test would be compared with the ones they obtained on the ten-minute session, which would have been conducted and scored in the usual way, without knowledge of their performance on the longer version. If the comparison between the two sets of scores reveals a high level of agreement, then the shorter version of the oral component may be considered valid, inasmuch as it gives results similar to those obtained with the longer version.
If, on the other hand, the two sets of scores show little agreement, the shorter version cannot be considered valid; it cannot be used as a dependable measure of achievement with respect to the functions specified in the objectives. Of course, if ten minutes really is all that can be spared for each student, then the oral component may be included for the contribution that it makes to the assessment of students' overall achievement and for its backwash effect. But it cannot be regarded as an accurate measure in itself.
References to 'a high level of agreement' and 'little agreement' raise the question of how the level of agreement is measured. There are, in fact, standard procedures for comparing sets of scores in this way, which generate what is called a 'correlation coefficient' (or, when we are considering validity, a 'validity coefficient'), a mathematical measure of similarity. Perfect agreement between two sets of scores will result in a coefficient of 1. Total lack of agreement will give a coefficient of zero. To get a feel for the meaning of a coefficient between these two extremes, read the contents of the box on page 29.

Whether or not a particular level of agreement is regarded as satisfactory will depend upon the purpose of the test and the importance of the decisions that are made on the basis of it. If, for example, a test of oral ability was to be used as part of the selection procedure for a high-level diplomatic post, then a coefficient of 0.7 might well be regarded as too low for a shorter test to be substituted for a full and thorough test of oral ability. The saving in time would not be worth the risk of appointing someone with insufficient ability in the relevant foreign language. On the other hand, a coefficient of the same size might be perfectly acceptable for a brief interview forming part of a placement test.

It should be said that the criterion for concurrent validation is not necessarily a proven, longer test. A test may be validated against, for example, teachers' assessments of their students, provided that the assessments themselves can be relied on. This would be appropriate where a test was developed that claimed to be measuring something different from all existing tests.

The second kind of criterion-related validity is predictive validity.
This concerns the degree to which a test can predict candidates' future performance. An example would be how well a proficiency test could predict a student's ability to cope with a graduate course at a British university. The criterion measure here might be an assessment of the student's English as perceived by his or her supervisor at the university, or it could be the outcome of the course (pass/fail etc.). The choice of criterion measure raises interesting issues. Should we rely on the subjective and untrained judgements of supervisors? How helpful is it to use final outcome as the criterion measure when so many factors other than ability in English (such as subject knowledge, intelligence, motivation, health and happiness) will have contributed to every outcome? Where outcome is used as the criterion measure, a validity coefficient of around 0.4 (only about 16 per cent agreement, since 0.4 squared is 0.16) is about as high as one can expect. This is partly because of the other factors, and partly because those students whose English the test predicted would be inadequate are not normally permitted to take the course, and so the test's (possible) accuracy in predicting problems for those students goes unrecognised. As a result, a validity coefficient of this order is generally regarded as satisfactory. The Further reading section at the end of the chapter gives references to the reports on the validation of the British Council's ELTS test (the predecessor of IELTS), in which these issues are discussed at length.

Another example of predictive validity would be where an attempt was made to validate a placement test. Placement tests attempt to predict the most appropriate class for any particular student. Validation would involve an enquiry, once courses were under way, into the proportion of students who were thought to be misplaced.
It would then be a matter of comparing the number of misplacements (and their effect on teaching and learning) with the cost of developing and administering a test that would place students more accurately.

Content validity, concurrent validity and predictive validity all have a part to play in the development of a test. For instance, in developing an English placement test for language schools, Hughes et al (1996) validated test content against the content of three popular course books used by language schools in Britain, compared students' performance on the test with their performance on the existing placement tests of a number of language schools, and then examined the success of the test in placing students in classes. Only when this process was complete (and minor changes made on the basis of the results obtained) was the test published.
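
As a rough illustration of the validity coefficients discussed above, the following Python sketch computes a correlation between students' scores on a short test and their scores on a criterion test. It is only a sketch under stated assumptions: the text does not say which correlation procedure is used, so the Pearson formula is assumed here, and the score lists and the function names pearson_r and shared_variance are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("each set of scores must contain some variation")
    return cov / math.sqrt(var_x * var_y)


def shared_variance(r):
    """Return the squared coefficient, the proportion of variance shared."""
    return r * r


# Hypothetical data: the same eight students on the ten-minute session
# and on the full 45-minute criterion test (values invented for illustration).
short_test = [12, 15, 9, 18, 14, 11, 16, 10]
full_test = [55, 62, 40, 75, 58, 50, 70, 41]

r = pearson_r(short_test, full_test)
print(f"validity coefficient: {r:.2f}")
print(f"shared variance: {shared_variance(r):.2f}")
```

Squaring a coefficient gives the proportion of variance that the two sets of scores share, so a coefficient of 0.7 corresponds to about 49 per cent and a coefficient of 0.4 to about 16 per cent.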
