Tests and Evaluation - Methodology

2007. Postgraduate professional conversion programme for teachers in rural areas. THE METHODOLOGY OF EVALUATION AND TESTING. Dumitru DOROBĂŢ. Specialisation: ENGLISH LANGUAGE AND LITERATURE. Form of study: distance learning (ID), semester IV.

All children in rural areas must go further!

You can help them!

Programme co-financed by the Government of Romania, the World Bank and rural communities.

Project Management Unit for Rural Education (Unitatea de Management a Proiectului pentru Învăţământul Rural)

Str. Spiru Haret nr. 10-12, etaj 2, sector 1, cod poştal 010176, Bucureşti

Tel: 021 305 59 99

Fax: 021 305 59 89

http://rural.edu.ro

e-mail: [email protected]

Ministry of Education and Research (Ministerul Educaţiei şi Cercetării)

ISBN 00 000-0-00000-0; ISBN 00 000-000-0-00000-0.



Ministry of Education and Research (Ministerul Educaţiei şi Cercetării)

Rural Education Project (Proiectul pentru Învăţământul Rural)

ENGLISH LANGUAGE AND LITERATURE

The Methodology of Evaluation and Testing

Dumitru DOROBĂŢ

2007


© 2007 Ministry of Education and Research, Rural Education Project. No part of this work may be reproduced without the written consent of the Ministry of Education and Research. ISBN 978-973-0-04814-8

TABLE OF CONTENTS

Introduction

Unit 1 Introduction to Language Testing
1.1 Unit Objectives
1.2 Assessment. Testing. Evaluation
1.3 Setting Testing Parameters
1.4 Participants in Testing
1.4.1 The Tester
1.4.2 The Test Takers/ The Testees
1.4.3 The Test User
1.5 The Beneficiaries of Testing
1.6 The Overall Impact of Testing in Students’ Motivation
1.7 Summary
1.8 Key Concepts
1.9 Checklist
1.10 Answers to SAQs
1.11 Further Readings

Unit 2 Conditions of a Good Test
2.1 Unit Objectives
2.2 Principles of Good Practice for Assessing Student Learning
2.3 Validity
2.3.1 Content Relevance
2.3.2 Content Coverage
2.3.3 Face Validity
2.3.4 Content Validity
2.3.5 Predictive Validity
2.3.6 Construct Validity
2.3.7 Curricular Validity
2.3.8 Criterion Related Validity
2.3.9 Concurrent Validity
2.4 Reliability
2.4.1 Measuring Reliability
2.4.1.1 Test-Retest Method
2.4.1.2 Parallel Forms of the Test to the Same Group
2.4.1.3 The Split-Half Method
2.4.1.4 Factors that Affect Language Scores
2.4.1.5 Test Length
2.5 Discrimination
2.6 Feasibility
2.7 Washback
2.7.1 Negative Washback
2.7.2 Positive Washback
2.8 Summary
2.9 Key Concepts
2.10 Checklist
SAA 1
2.11 Answers to SAQs
2.12 Further Readings

Unit 3 Types of Tests I
3.1 Unit Objectives
3.2 Informal Assessment
3.2.1 Informal Assessment of Speaking
3.2.2 Informal Assessment of Writing
3.2.3 Informal Assessment of Listening
3.2.4 Informal Assessment of Reading
3.2.5 Informal Assessment of Non-Linguistic Factors
3.2.6 Informal Assessment of Grammar and Vocabulary
3.3 Formal Assessment - Types of Tests and Testing
3.3.0 Classification by Stimulus Material
3.3.1 The purpose, or use, for which they are intended i.e. the types of decisions to be made function of the scores
3.3.1.1 Selection Tests
3.3.1.2 Entrance Tests
3.3.1.3 Readiness Tests
3.3.1.4 Placement Tests
3.3.1.5 Diagnostic Tests
3.3.1.6 Progress Tests
3.3.1.7 Achievement/ Attainment Tests
3.3.1.8 Mastery Tests
3.3.2 Function of Content
3.3.2.1 Proficiency Tests
3.3.2.2 Achievement or Attainment Tests
3.3.2.3 Aptitude or Prognostic Tests
3.3.3 The frame of reference
3.3.3.1 Norm-Referenced Tests
3.3.3.2 Criterion-Referenced Tests
3.4 Summary
3.5 Key Concepts
3.6 Checklist
3.7 Answers to SAQs
3.8 Further Readings

Unit 4 Types of Tests II
4.1 Unit Objectives
4.2 Formal Assessment - Types of Tests and Testing
4.2.1 Scoring Procedures
4.2.1.1 Subjective Tests
4.2.1.2 Objective Test
4.2.1.3 Performance Tests
4.2.2 The Specific Technique or Method They Employ
4.2.2.1 Multiple Choice, Completion, Dictation, Cloze Tests
4.2.3 The Approach to Test Construction
4.2.3.1 Direct Tests
4.2.3.2 Indirect Tests
4.2.4 Function of the Number of Elements Tested at a Time
4.2.4.1 Discrete Point Tests
4.2.4.2 Integrative Tests
4.2.5 Speed Tests vs. Power Tests
4.2.6 Other Test Categories
4.3 Self-Assessment
4.4 Standardized Tests
4.5 Summary
4.6 Key Concepts
4.7 Checklist
SAA 2
4.8 Answers to SAQs
4.9 Further Readings

Unit 5 Testing the Language Skills I
5.1 Unit Objectives
5.2 Testing Speaking
5.2.1 What Is Speaking?
5.2.2 Types of Speaking Based on Content and Function
5.2.3 Objectives
5.2.4 Types of Speaking Tests
5.3 Testing Listening
5.3.1 How Do We Comprehend?
5.3.2 Micro Skills
5.3.3 Informal Evaluation
5.3.4 Scoring the Listening Test
5.4 Summary
5.5 Key Concepts
5.6 Checklist
5.7 Answers to SAQs
5.8 Further Readings

Unit 6 Testing the Language Skills II
6.1 Unit Objectives
6.2 Testing Reading
6.2.1 Types of Reading based on Content and Function
6.2.2 Types of Reading based on Context and Processing Variables
6.2.3 Types of Reading according to Purpose
6.2.4 Cloze Passages
6.2.5 Passages with Questions
6.2.6 Microskills
6.2.7 True - False - Don’t Know Checks
6.2.8 Other Reading Techniques
6.2.9 Assessing Overall Comprehension
6.2.10 Issues in Teaching Reading
6.2.10.1 Narrative Text. Reading for Pleasure
6.2.10.2 Reading for Information
6.2.10.3 An Instructive Test
6.2.10.4 Types of Test Procedures
6.3 Testing Writing
6.3.1 Conditions under which Writing Takes Place
6.3.2 Current Theories of Writing with Particular Reference to Foreign Language Writing
6.3.2.1 Writing as a Product
6.3.2.2 Writing as a Process
6.3.2.3 Writing as a Social Activity
6.3.3 The Main Approach to Teaching Writing. Text-Based Approaches
6.3.3.1 Grammatical Form Practice
6.3.3.2 A Communicative Approach
6.3.3.3 Writer-Based Approach
6.3.4 Various Choices of Writing Tasks
6.3.4.1 Scoring Essay Type Tests
6.3.4.2 The Point Score Method
6.4 Summary
6.5 Key Concepts
6.6 Checklist
SAA 3
6.7 Answers to SAQs
6.8 Further Readings

Unit 7 Testing the Language System and Beyond
7.1 Unit Objectives
7.2 Testing Pronunciation
7.3 Testing Grammar and Usage
7.3.1 Multiple-Choice Fill-In
7.3.2 Modify and Fill-In
7.4 Testing Vocabulary
7.4.1 Cloze
7.4.2 Multiple-Choice Fill-In Type
7.4.3 Multiple-Choice Synonym Type
7.4.4 Matching
7.4.5 Simple Prompts
7.4.6 Selection of the Words to Be Tested
7.4.7 Translation
7.4.8 True/ False
7.4.9 Checklist Tests
7.5 Testing Beyond Language Form
7.5.1 Discourse and Culture
7.5.2 Speech events
7.5.3 Literature
7.6 Summary
7.7 Key Concepts
7.8 Checklist
SAA 4
7.9 Answers to SAQs
7.10 Further Readings

Unit 8 New Trends in Testing
8.1 Unit Objectives
8.2 General Trends
8.3 Computer-Based Language Testing
8.4 Alternative Assessment
8.4.1 Techniques
8.4.2 Journals
8.4.3 Conferences
8.4.4 Cooperative test construction
8.5 Portfolios
8.5.1 Characteristics
8.5.2 Assessing Portfolios
8.5.3 Portfolio Content
8.5.4 Useful advice on development of portfolios
8.6 Summary
8.7 Key Concepts
SAA 5
8.8 Answers to SAQs
8.9 Further Readings

Bibliography


INTRODUCTION

1. What this course is about

This course is an introduction to the methodology of evaluation and testing in teaching and learning English as a Foreign Language. Grading and reporting on student learning continue to challenge teachers. However, more is known at the beginning of the 21st century than ever before about the complexities involved and about how certain practices can influence teaching and learning. This introduction tries to identify grading and reporting practices that can influence teaching and learning beneficially. Developing teachers’ awareness is another area of interest. The course also has a practical side: to encourage the design and use of effective techniques in English language testing. In short, the course addresses a number of questions about testing and helps you to develop a scientific perspective before you begin using and devising tests.

2. Course objectives

One of the major goals is to assist you in recognizing that the purposes of measurement and evaluation are good – not bad. Measurement, evaluation and testing are essential to sound educational decision making. After reading this course you will be able to:

• recognize that evaluation and testing are essential to sound educational decision making;

• understand the components of a model of decision making;

• recognize the way evaluation and testing can assist in instructional, guidance, administrative, and research decisions;

• have a better understanding of the role of testing in language teaching;

• analyse and assess different kinds of tests;

• identify the different purposes of testing;

• identify the way in which testing can encourage good teaching and learning;

• learn how teachers can test the main skills, the language system and beyond;

• learn and apply techniques of test construction and administration;

• design tests that can assist/ complete good teaching and learning;

• develop techniques of self-learning;

• appreciate the variety of interesting issues in evaluation and testing that will be covered in subsequent chapters.


3. Course content and structure

This course is divided into eight units of study. Each unit comprises general presentations and self-assessment questions (SAQs) that aim to involve you actively in the learning process. Solutions and suggestions for the SAQs are provided in a separate section. SAQs also give you a sense of direction and motivate you to keep going. We also provide other instruments that might help you, e.g. a summary, key concepts, a checklist and, in four cases, assignments. All of the assignments must be submitted for evaluation. Each assignment will be marked by your tutor and returned to you with written feedback. If you fail to present adequate papers, you will be given two opportunities to submit work for assessment and feedback.

4. The units of learning

Unit 1 (Introduction to Language Testing) defines and differentiates the terms evaluation, assessment, testing; classifies the purpose of evaluation and testing. Unit 2 (Conditions of a Good Test) presents the principles of good practices for assessing student learning and the basic tools in assessing tests: validity, reliability, discrimination, feasibility, washback. Unit 3 (Types of Tests I) is an introduction to informal assessment of the main skills and of Grammar and Vocabulary; it also presents a classification of tests based on a number of criteria: the purpose for which they are intended (Selection, Entrance, Readiness, Placement, Diagnostic, Progress, Achievement/ Attainment, Mastery Tests), the content upon which they are based (proficiency, achievement, aptitude tests); the frame of reference within which the scores of the tests are interpreted (Norm-Referenced, Criterion-Referenced Tests). Unit 4 (Types of Tests II) continues the classification of the formal tests function of Scoring Procedures (Subjective, Objective, Performance Test), the specific technique they employ (multiple-choice, completion, dictation, cloze tests), of the approach to test construction (direct and indirect test), of the number of elements tested at a time (discrete point/ integrative tests, speed vs power tests); other test categories. Unit 5 (Testing the Language Skills I) explores the issues of testing two of the main language skills: speaking and listening. Unit 6 (Testing the Language Skills II) introduces you to the main techniques of testing the other two language skills: reading and writing.


Unit 7 (Testing the Language System and Beyond) discusses the main issues of testing pronunciation, grammar and usage, vocabulary, discourse, literature, culture, etc.

Unit 8 (New Trends in Testing) tries to identify the main contemporary trends in evaluation and testing: computer-based language testing, alternative forms of assessment, authentic testing, etc.

5. Self-Assessment Questions (SAQs)

The self-assessment questions aim to involve you actively in the learning process. The tasks aim mainly at activating your schemata and at making you think creatively. The variety of SAQs (multiple-choice, answering questions, matching, true-false, etc.) is meant to exemplify the theoretical and practical aspects of testing.

A self-assessment question (SAQ) is signalled by the icon on the left.

6. Point(s) to Ponder

Points to Ponder include aphorisms and quotations which may be starting points for personal reflection on various issues/ controversies.

A Point to Ponder is signalled by the icon on the left.

7. Solutions and suggestions for SAQs

You are advised to check your answers to each SAQ by going to this section at the end of each unit. You should not be discouraged if some of your answers are different from those offered in this section. Read them carefully and try to learn from them as, hopefully, you will find them interesting and thought-provoking.

8. Assessment and Evaluation

The course also contains four send-away assignments (SAAs) which will enable your tutor to assess your performance on the course.

A send-away assignment (SAA) is signalled in the course text by the icon on the left.

These SAAs count for 40% of your final grade. The exam at the end of the semester adds 40%, while portfolio assessment represents 20%. Your portfolio should contain samples of tests of various kinds, tests and essays written by your pupils, and other materials you have designed for evaluation purposes. Creativity is a must when you compile a portfolio. The table below shows the unit in which each assignment appears, the number of tasks, and the weight of each assignment.

Assignment no. | Unit containing the SAA | Number of tasks and their weight in each SAA | Weight of each SAA in the final assessment
SAA no. 1 | Unit 2 | 1 task (task 1: 100%) | 10%
SAA no. 2 | Unit 4 | 1 task (task 1: 100%) | 5%
SAA no. 3 | Unit 6 | 2 tasks (task 1: 50%; task 2: 50%) | 5%
SAA no. 4 | Unit 8 | 2 tasks (task 1: 25%; task 2: 75%) | 20%

In the assessment of each assignment, the tutor will take into account:
• the degree to which your answer proves that you meet the requirements of the task – 40%
• the correctness of your metalanguage – 10%
• discourse features (coherence, cohesion, etc.) – 20%
• grammatical accuracy – 20%
• spelling accuracy – 10%
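The five criteria above can be read as a weighted sum. The sketch below is only an illustration of that arithmetic: the 0-10 marking scale, the function name and the example scores are assumptions, not part of the course regulations.

```python
# Criterion weights as listed above (they add up to 100%).
CRITERIA_WEIGHTS = {
    "task_requirements": 0.40,
    "metalanguage": 0.10,
    "discourse_features": 0.20,
    "grammatical_accuracy": 0.20,
    "spelling_accuracy": 0.10,
}


def assignment_mark(scores):
    """Combine per-criterion scores (assumed 0-10 scale) into one weighted mark."""
    missing = set(CRITERIA_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(CRITERIA_WEIGHTS[name] * scores[name] for name in CRITERIA_WEIGHTS)


# Hypothetical scores given by a tutor for one assignment.
example_scores = {
    "task_requirements": 8,
    "metalanguage": 6,
    "discourse_features": 7,
    "grammatical_accuracy": 9,
    "spelling_accuracy": 10,
}
print(round(assignment_mark(example_scores), 2))
```

Because the task-requirements criterion carries 40% of the weight, a paper that misses the point of the task cannot be rescued by flawless spelling alone.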

Each assignment must be completed and sent to the tutor in the allotted study week, according to your study schedule. A typewritten paper is recommended. If this is not possible, take care that your handwriting is legible.

My advice is to contact your tutor for any queries.

9. Further readings

Before you start studying this textbook, I recommend that you read two books in Romanian:

• Vogler, Jean (2000) Evaluarea în învăţământul preuniversitar, translated by Cătălina Gârba and Ionela Băluţă, Iaşi: Polirom

• Pavelcu, Vasile (1968) Principii de docimologie, Bucureşti: EDP

10. Your study schedule

This course is designed for 42 hours of study: 28 hours are meant for individual study of the course material (including solving the SAQs), 6 hours are allotted to your tutorial meetings and 8 hours to the completion of your SAAs.

Plan your study by taking into account that an academic semester lasts 14 weeks. Depending on the difficulty of the various topics, I recommend the following study schedule:


Week | Unit | Number of study hours | Assignment | Number of hours for the SAAs
1 | Introduction | 2 | - | -
2, 3 | Unit 1, Unit 2 | 4 | SAA no. 1 | 2
4, 5 | Unit 3 | 4 | - | -
6, 7 | Unit 4 | 4 | SAA no. 2 | 2
7, 8 | Unit 5 | 4 | - | -
9, 10 | Unit 6 | 4 | SAA no. 3 | 2
11, 12 | Unit 7 | 2 | - | -
13 | Unit 8 | 2 | SAA no. 4 | 2
14 | Revision | 2 | - | -
TOTAL | - | 28 hours | - | -

Planning your course work is important, as it will enable you to send your assignments to your tutor in due time.

11. Appendices

To facilitate your acquisition of the main issues, several appendices have been added:

• At the end of each unit you may find a Summary, a list of Key words and a Checklist. The Further Reading section gives you a minimal bibliography, indicating the pages where you may find the information you need.

At the end of the course your final grade will take into account:
• attendance of and contribution to face-to-face meetings with your tutor and assisted activities, solving of SAQs and SAAs – 40%
• final examination – 40%
• portfolio (containing your tests/ your models) – 20%
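As a rough illustration of how these percentages combine, the sketch below assumes that the 40% coursework share is made up entirely of the four SAA weights given in the assignment table earlier in this introduction (10%, 5%, 5% and 20%), and that every component is marked on a 0-10 scale. Both points are assumptions made for the example; your tutor's actual procedure may differ.

```python
# Weights in percent. The SAA weights come from the assignment table,
# the exam and portfolio weights from the list above.
SAA_WEIGHTS = {"SAA 1": 10, "SAA 2": 5, "SAA 3": 5, "SAA 4": 20}
EXAM_WEIGHT = 40
PORTFOLIO_WEIGHT = 20


def final_grade(saa_marks, exam_mark, portfolio_mark):
    """Weighted final grade on the same (assumed 0-10) scale as the inputs."""
    if set(saa_marks) != set(SAA_WEIGHTS):
        raise ValueError("a mark is needed for each of the four SAAs")
    weighted = sum(SAA_WEIGHTS[name] * saa_marks[name] for name in SAA_WEIGHTS)
    weighted += EXAM_WEIGHT * exam_mark + PORTFOLIO_WEIGHT * portfolio_mark
    return weighted / 100


# Hypothetical marks for one student.
marks = {"SAA 1": 8, "SAA 2": 6, "SAA 3": 10, "SAA 4": 9}
print(final_grade(marks, exam_mark=7, portfolio_mark=9))  # prints 8.0
```

Note that SAA no. 4 alone weighs as much as the whole portfolio, so it deserves a proportionate share of your preparation time.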


Unit 1 INTRODUCTION TO LANGUAGE TESTING

1.1 Unit Objectives
1.2 Assessment. Testing. Evaluation
1.3 Setting Testing Parameters
1.4 Participants in Testing
1.4.1 The Tester
1.4.2 The Test Takers/ The Testees
1.4.3 The Test User
1.5 The Beneficiaries of Testing
1.6 The Overall Impact of Testing on Students’ Motivation
1.7 Summary
1.8 Key Concepts
1.9 Checklist
1.10 Answers to SAQs
1.11 Further Readings

1.1 Unit Objectives

After you have completed the study of this unit you will be:

• familiar with the background of language testing
• aware of the fact that testing is an important part of every teaching and learning experience
• aware that both experienced and inexperienced teachers of English as a Foreign Language (EFL) need to improve their skills in constructing and administering classroom tests
• able to understand how testing helps students create positive attitudes towards your class and able to identify the main issues of language testing
• able to define and differentiate the terms test, measurement, evaluation and assessment
• able to recognize that assessment, measurement, evaluation and testing are essential to sound educational decision making
• able to recognize the ways assessment, measurement and evaluation can assist in instruction, guidance, administrative and research decisions.


1.2 Assessment. Testing. Evaluation

The terms test, measurement, evaluation and assessment are occasionally used interchangeably, but some users make distinctions among them. Measurement is often taken to be a broader concept than testing: we can measure characteristics in ways other than by giving tests (observation, rating scales, etc.).

The term assessment refers to a variety of ways of collecting information on a learner’s ability or achievement. Although testing and assessment are often used interchangeably, the latter is an umbrella term encompassing measurement instruments such as tests, as well as qualitative methods of monitoring and recording student learning such as observation, simulations or project work. Assessment is also distinguished from evaluation, which in a TEFL setting is a process of collecting, analysing and interpreting information about teaching and learning in order to make informed decisions that enhance student achievement and the success of educational programmes. This means that evaluation is concerned with the overall language programme: textbooks, other instructional materials, student achievement.

Point to Ponder

At the end of the fifth grade, we have two pupils who are both reading at the fifth-grade level. However, at the beginning of the year, one pupil was reading at the third-grade level, and the other at the fourth-grade level. Are our evaluations of those outcomes the same?

Answer: Measurement is not the same as evaluation. In this particular case, the evaluations are not the same. One pupil progressed at an above-average rate, and the other at a below-average rate.
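The distinction can be made concrete with a small sketch. It treats the end-of-year reading level as the measurement and the gain over the year as the basis for evaluation; "average" is assumed here to mean the mean gain of the pupils being compared, and the pupil names are invented for the example.

```python
from statistics import mean

# (reading level at the start of the year, reading level at the end)
pupils = {"pupil_a": (3, 5), "pupil_b": (4, 5)}


def gains(records):
    """Progress made by each pupil during the year, in grade levels."""
    return {name: end - start for name, (start, end) in records.items()}


def evaluate(records):
    """Label each pupil's gain relative to the mean gain of the group."""
    progress = gains(records)
    average = mean(list(progress.values()))
    labels = {}
    for name, gain in progress.items():
        if gain > average:
            labels[name] = "above average"
        elif gain < average:
            labels[name] = "below average"
        else:
            labels[name] = "average"
    return labels


print(gains(pupils))
print(evaluate(pupils))
```

Both pupils obtain the same measurement at the end of the year (level 5), yet the evaluation differs because it also takes the starting point into account.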

Assessment of achievement establishes what a student has learned in relation to a particular course content or course objectives. Formative assessment is carried out by teachers during the learning process with the aim of using the results to improve instruction. Summative assessment is done at the end of a course to provide information on the programme to educational authorities.

When you teach you are part of a cultural and social system that extends beyond the walls of your classroom. Both you and your pupils have expectations about:
• what they will do,
• what they should get from and give to the experience, and
• how you will know if you are succeeding.

Beyond the walls of the classroom, almost everything is tested, as contemporary society values numbers, counting, and “doing research” based on figures. Measurement is a fact of life.


Point to Ponder

“To teach without testing is unthinkable.” The Joint Committee of the American Association of School Administrators

Testing is a very widespread and common management strategy if we accept the following:

• Testing represents the explicit codification of the real goals of a teaching and learning programme. Contemporary trends in testing show that this management strategy is rarely the decision of the individual teacher. Rather, unless we speak about formative tests, it is passed down from the next administrative level, or even from the Ministry of Education and Research. But how you feel about it, and the way you let it affect your attitudes and the attitudes of your pupils, is still under your control.

• Marking is a form of assessment. It involves giving the pupils a grade or mark. Any assignment, oral or written, can be marked. Marking is one of the most time-consuming parts of a teacher’s job. If you want to cope with all this marking, you have to take into account a number of options.

You may:
• correct all errors
• be selective in choosing particular errors
• correct misunderstandings or mistakes of content
• suggest/ require corrections to be done
• go over areas of common difficulty with the whole class
• see individual pupils about their work
• display the best papers
• simply put a tick to show it has been read

Students should know how they are to be assessed.

You should avoid:
• building up a backlog of unmarked work
• marking down a paper because a pupil misbehaved

SAQ 1

What is measured/ tested beyond the walls of your classroom? Write your answers in the space provided above (in no more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


1.3 Setting Testing Parameters

Experience has demonstrated that teachers, even successful and highly experienced ones, are relatively unsophisticated and careless when it comes to the design, operation and interpretation of evaluation instruments, and in the interpretation and presentation of results.

The first stage in the preparation of an evaluation instrument is the setting of appropriate parameters.

WHY? What is the purpose of the evaluation?

Possible answers: to determine the extent to which a course/ chapter has achieved its stated aims; to measure pupils’ reaction/ satisfaction; to provide a basis for comparison of different approaches, methods, techniques.

WHAT? What is to be evaluated?

Possible answers include some or all of the course; content; methodology; participants, teachers, etc. Always prepare a table of specifications according to what you want to evaluate.

WHEN? When is it to be evaluated? During the course? At the end? Before and after the course?

WHO? Who will evaluate? Teachers? Inspectors? An outside party?

WHY?

To give feedback on how your teaching is going

To provide feedback and guide improvement

To classify or grade learners

To enable student progression

To add variety to students’ learning experience

To enable grading

To provide statistics for the school

To maximize learners’ motivation

To diagnose faults and provide students with an essential tool to put things right


HOW? How will the evaluation be carried out? What form will it take? Will it be a pen-and-paper instrument or be conducted orally? Will it seek to elicit quantitative or qualitative data, or both?

Among these questions, the why? and the what? are obviously of major importance. Also, in the real world, there will be constraints (such as time, space, resources) that will operate on the parameters of when?, who? and how?.

Once the parameters are defined, it is possible to clarify valid objects for evaluation and to agree on an appropriate methodology. For small-scale evaluation projects in which time is an important factor, the most convenient vehicle is the pen-and-paper instrument.

At the next stage, that of instrument design, it is necessary to consider such questions as validity, format and administration.

1.4 Participants in Testing

The participants in language testing are the:
• tester
• test taker/ testee
• test user

WHAT DO YOU REALLY WANT TO ASSESS?

Do you test subject knowledge (information recall) or how well students can use such information for synthesis, analysis and evaluation?

Do you evaluate group work or individual work?

Is assessment formative or summative?

Does testing encourage deep, surface or strategic learning?

Is the assessment convergent (aimed at identical results) or divergent (to demonstrate individuality and diversity)?

Is the assessment norm-referenced or criterion-referenced?

Is it teaching or learning that is being assessed?

Are you going to assess product (report, essay) or process (how the learners achieved the outcome)?

Is the assessment time-/ context-specific?

Is the assessment holistic?


1.4.1 The Tester

The tester may be:
• a foreign language teacher who designs, administers, and interprets tests given to his or her own learners
• a group of people responsible for developing test requirements
• a private or governmental testing agency (PALSO in Greece; ETS, the Educational Testing Service, in New Jersey, USA; CITO in Holland; or the Ministry of Education and Research in Romania)
• other organizations/ international meetings: the annual Language Testing Research Colloquium, The Scientific Commission on Language Tests and Testing of the International Association of Applied Linguistics, “Language Testing” – a professional and academic journal

1.4.2 The Test Takers/ The Testees

The test takers may be:
• students in schools and universities
• applicants for positions that require foreign language abilities
• people seeking certification of language proficiency for their jobs

Candidates who are not test-wise (familiar with the test format and content) are usually at a disadvantage. In order to avoid this, some programmes offer preparation for tests and practice sessions (for example, the TOEFL textbook).

1.4.3 The Test User

The test users are the individuals or institutions that make use of the interpretation of scores, e.g.:
• foreign language teachers (to encourage and monitor learning, for personal feedback)
• the Ministry of Education, which uses tests to ensure that the National Curriculum is followed and to assess the standards achieved in school work
• foreign universities (American or British), which use language tests (TOEFL or Cambridge examinations) to assess proficiency and predict whether applicants can successfully attend a programme of instruction in English
• public and private institutions, which assess the linguistic competence of those employees who need a foreign language in their work
• foreign language teaching schools, which use tests for placement at an appropriate level in their courses

1.5 The Beneficiaries of Testing

• diagnostic and placement tests offer advantages of improved efficiency for learner, teacher, and educational system

• admission tests protect admitting institutions and agencies that offer scholarships from too high a failure rate

• certification tests offer advantages to the persons who pass the test and the agencies that hire them. They also offer protection to existing professionals organized in professional organizations who control access to certain professions

• testing agencies (TOEFL; the University of Cambridge Local Examinations Syndicate, UCLES, English as a Foreign Language): tests are a major source of income for testing agencies

Points to Ponder

• The most effective teachers assess student learning often.
• One research study found that of all the things a student learns, 80% is forgotten in one year. Most of what is forgotten consists of facts memorized for a quiz or test.

What conclusion should you draw from these two statements?

1.6 The Overall Impact of Testing on Students’ Motivation

Testing has an impact on students’ self-esteem. Self-esteem is an outcome of educational experience and a factor determining future learning. However, you have to be very careful because one impact of the tests is the reduction in self-esteem of those students who did not achieve well.

Although at primary school level pupils are not aware that tests give a narrow view of their learning, test performance is more highly valued than what is being learned. Some pupils are also aware of the narrowing of the curriculum. Only the pupils confident of success enjoy tests. High achievers use appropriate test-taking strategies. Low achievers may become overwhelmed and demotivated when they repeatedly receive low grades. If we are not careful, the gap between low and high achieving students may increase. Repeated practice tests are not good practice because pupils may adopt test-taking strategies designed to avoid effort.

How the assessment of students’ learning is reported back to the pupils (feedback) affects their motivation to learn. Your feedback should focus on how to improve or build on what has been done (task-related feedback) rather than on marks which are formally or informally compared with those of others. Motivation is increased if you explain to pupils the purpose of their tests and provide task-related feedback.

SAQ 2

Maslow’s hierarchy of needs includes:

• the self-actualization needs
• the esteem needs
• the belongingness and love needs
• the safety needs
• the physiological needs

What needs are satisfied if all learners experience success, and get praise and other reinforcement? Circle the correct answer. Compare your answer to that in the “Answers to SAQs” section at the end of the unit.

MOTIVATION AND TESTING


Improving traditional exam questions

Keep the language simple

Use questions that seek to discover what has been learned and that ask learners to reinterpret their knowledge intelligently

Be sure that learners understand the instructions

Avoid trick questions

Be creative. Allow your students to write one question of their own in an exam

Coping with exam failure

Help students focus on what they can do in the future to improve

Offer opportunities for practising under simulated exam conditions

Build up confidence (develop revision and exam techniques)

Let students play exams

Give students opportunities to reflect on unsuccessful exam performance

Help students identify their strengths

Take account of the feelings of students who fail exams

See failure as an opportunity for learning


1.7 Summary

This unit aimed to make you aware of the main issues of testing. We cannot stop testing. Tests seem inevitable because they are part of a much larger cultural system. Testing can become a healthy part of an honest and responsive learning atmosphere. The following statements summarize the major points of the first unit:

1. Assessment and testing, measurement and evaluation are essential to sound educational decision making;

2. The concept of assessment is broader than that of testing. The same is true about measurement. We can measure characteristics in ways other than by giving tests.

Who wants tests?

• Learners – in order to be motivated, learners want solid evidence of progress

• Parents – are concerned when children are not making “normal progress”

• Schools – tests are used as the basis for grades which schools require teachers to give

• Adults – want tests to use in getting into schools or into jobs

• Teachers – tests can give teachers ideas for effective teaching procedures, tests can help teachers plan for the future, tests are useful management tools as tests can stimulate effort, guide learning, provide fair rewards for honest work.

1.8 Key Concepts

• Language Test
• Assessment
• Measurement
• Assessment Criteria
• Evaluation
• Testing

1.9 Checklist

Do you ask questions which students can answer successfully?
Do you leave time for students to think?
Do you always praise or otherwise acknowledge correct responses?
Do you avoid ridiculing students’ answers?
If no answer comes, are you able to ask a simpler question that leads to the answer to the original question?


1.10 Answers to SAQs

SAQ 1

Your answer depends upon your personal experience.

Time – we are surrounded by machines, schedules, and procedures intended to get more things done more quickly.

Time – we tend to become impatient when someone/ something wastes our time: a disorganized book, a tie-up in traffic.

Possessions, money, land, other goods.

Anything that can be counted becomes more real, safer, more valuable (research results). Average income. Cars are tested repeatedly. Testing whether the consumer will buy a product. Drugs are tested.

SAQ 2

If your answer to SAQ 2 is not comparable to the one suggested below, please reread section 1.6.

Esteem needs. Students gain respect from the teacher and other learners. At the same time, almost all of them want to be praised by their parents.

1.11 Further readings

Harrison, Andrew (1983) A Language Testing Handbook, London: Macmillan, pp 1-4

Hughes, Arthur (1991) Testing for Language Teachers, Cambridge: Cambridge University Press, pp 1-9


Unit 2 CONDITIONS OF A GOOD TEST

2.1 Unit Objectives
2.2 Principles of Good Practice for Assessing Student Learning
2.3 Validity
2.3.1 Content Relevance
2.3.2 Content Coverage
2.3.3 Face Validity
2.3.4 Content Validity
2.3.5 Predictive Validity
2.3.6 Construct Validity
2.3.7 Curricular Validity
2.3.8 Criterion Related Validity
2.3.9 Concurrent Validity
2.4 Reliability
2.4.1 Measuring Reliability
2.4.1.1 Test-Retest Method
2.4.1.2 Parallel Forms of the Test to the Same Group
2.4.1.3 The Split-Half Method
2.4.1.4 Factors that Affect Language Scores
2.4.1.5 Test Length
2.5 Discrimination
2.6 Feasibility
2.7 Washback
2.7.1 Negative Washback
2.7.2 Positive Washback
2.8 Summary
2.9 Key Concepts
2.10 Checklist
SAA 1
2.11 Answers to SAQs
2.12 Further Readings

2.1 Unit Objectives

Testing, including all forms of language testing, is, among other things, one form of measurement. If we test reading comprehension or spelling, for example, we want to measure to what degree these abilities are present in the examinee. But there is potential for error whenever we measure something. Tests of language may be inaccurate (unreliable) or invalid. To be useful instruments, tests must offer reliable and valid scores.

This unit aims at:
• offering you tools for the evaluation of the adequacy of any given test
• making you recognize sources of error variance and factors that influence reliability estimates
• understanding and interpreting the reliability/ validity of different scores
• understanding the relationship between reliability and validity
• understanding the basic kinds of validity evidence
• interpreting various expressions of validity
• recognizing what factors affect validity and how they affect it
• recognizing the relationship between test validity and decision making
• making you familiar with the rudiments of statistical concepts
• developing your awareness of the characteristics that make a good test
• offering you an instrument for your personal classroom research
• giving you an example of the importance of interdisciplinary studies in TEFL
• making you capable of rating a test, taking into account several criteria:

Rating scale: 10 9 8 7 6 5 4 3 2 1 0 (10 – highly adequate, 0 – highly inadequate)

1. Validity (the test should adequately measure what it is supposed to measure)

2. Difficulty (Is the test neither too difficult nor too easy? Is it a test for adults or for children? Has the test been piloted?)

3. Reliability (Does the test minimize the presence of measurement error? Can it be used for important post-test decisions? Is the test long enough?)

4. Applicability (Is the test format familiar to the testees and to the administrator? Is it a tape-recorded test or one using live voices? etc.)

5. Relevance (Do all the testees have the same native language background? Is the sample of test items drawn from a relevant domain?)

6. Replicability (Is it possible to administer equivalent forms of the same test to avoid cases of security breakdown?)

7. Interpretability (How is the test to be scored, reported and interpreted?)

8. Economy (Is the test cheap or expensive to develop, purchase, duplicate, score, report, store and interpret?)

9. Availability (Can you find and administer an available standardized test?)

10. Acceptability (Is the test accepted by your immediate superiors or by the Ministry of Education and Research, are there any constraints?)
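The ten criteria and the 0-10 scale above lend themselves to a simple rating sheet. The sketch below is one possible way of summarizing such a sheet; the unweighted mean, the threshold of 5 and the example ratings are assumptions made for illustration, since the text does not prescribe how the ten ratings should be combined.

```python
from statistics import mean

# The ten criteria listed above, in the same order.
CRITERIA = [
    "validity", "difficulty", "reliability", "applicability", "relevance",
    "replicability", "interpretability", "economy", "availability",
    "acceptability",
]


def rate_test(ratings, threshold=5):
    """Return the mean rating and the criteria rated below the threshold."""
    if set(ratings) != set(CRITERIA):
        raise ValueError("exactly one rating per criterion is required")
    for name, value in ratings.items():
        if not 0 <= value <= 10:
            raise ValueError(f"rating for {name} must be between 0 and 10")
    overall = mean([ratings[name] for name in CRITERIA])
    weak_points = [name for name in CRITERIA if ratings[name] < threshold]
    return overall, weak_points


# Hypothetical ratings a teacher might give to a classroom test.
example_ratings = {
    "validity": 8, "difficulty": 7, "reliability": 6, "applicability": 9,
    "relevance": 8, "replicability": 3, "interpretability": 7, "economy": 9,
    "availability": 4, "acceptability": 9,
}
overall, weak_points = rate_test(example_ratings)
print(overall, weak_points)
```

A single average can hide a serious weakness, which is why the sketch also reports the criteria that fall below the threshold; a test rated very low on validity, for instance, should not be rescued by high marks for economy.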


2.2 Principles of Good Practice for Assessing Student Learning

• Assessment is not an end in itself but a vehicle for educational improvement. Educational values should determine what we choose to assess and how. When questions about educational mission and values are skipped over, assessment threatens to be an exercise in measuring what is easy, rather than a process of improving what we really care about.

• Assessment is most effective when it reflects an understanding of learning as multidimensional, integrated, and revealed in performance over time.

• Assessment should reflect that learning is a complex process, i.e. it involves knowledge, values, attitudes and habits of mind that affect both academic success and performance in real life.

• It follows that assessment should reflect the complexity of learning by:
- employing a diverse array of methods, including those that call for actual performance;
- using methods that cover time so as to reveal change, growth and increasing degrees of integration.

• Assessment works best when the programmes it seeks to improve have clear, explicitly stated purposes

• Assessment is a goal-oriented process. It entails comparing educational performance with educational purposes and expectations. These are derived from the institution’s mission, from teachers’ intentions in programme and course design, and from knowledge of students’ own goals. It follows that assessment as a process pushes an institution of learning towards clarity about where to aim and what standards to apply. Assessment also prompts attention to where and how programme goals will be taught and learned.

• Assessment requires attention to outcomes but also to the experiences that lead to those outcomes

• Assessment can help us understand which students learn best under what conditions. With such knowledge comes the capacity to improve the whole of their learning (students’ experience along the way, curricula, teaching).

• Assessment works best when it is ongoing, not episodic. Assessment is a process whose power is cumulative. The teacher should monitor progress towards intended goals in a spirit of intended improvement.


SAQ 1 Read the principles of good practice for assessing pupil learning and try to write your own assessment Decalogue on a single sheet of paper.

1. ---------------------------------- ---------------------------------------------

2. ----------------------------------- --------------------------------------------

3. ---------------------------------- --------------------------------------------

4. --------------------------------- --------------------------- ----------------

5. --------------------------------- ----------------------------------------------

6. ------------------------------------------------------------------------------

7. -------------------------------------------------------------------------------

8. -------------------------------------------------------------------------------

9. -------------------------------------------------------------------------------

10. -------------------------------------------------------------------------------

Write 10 principles in the spaces provided above. Your choices depend on your teaching and learning experience.

2.3 Validity

Validity, the most important quality of a test, refers not only to the degree to which the test actually measures what it is intended to measure, but also to the adequacy and appropriateness of the way we interpret and use test scores.

A valid test is one in which a testee’s score gives a true reflection of his ability on the trait. Statistical and descriptive means have been used to check validity. Content analysis of tests determines:

• the language items present in a test (quality, number, whether they are representative samples)
• the skills, or some aspects of a skill (the reading speed, the variety of text types, etc.)

Validity might also be a function of language knowledge, skill in using the language, ability to negotiate certain language activities, and task authenticity.


Points to Ponder
• If a large number of students do poorly on an exam, reconsider its worth.
• When used to describe a test, the term valid should be accompanied by the preposition for, e.g. This test is valid for ...
• Make the first test relatively easy to build up students’ confidence.
• Never argue with a student about a grade in front of the class. Offer to meet him/her the next day. Give him/her some time to cool off first.

If the test scores are reliable, then performance on the test is affected mainly by factors other than measurement error. In examining validity, we consider the relationship between performance on the test and other types of performance in other contexts. Validity also involves:
• the uses or interpretations we make of the test results
• the value systems that justify a given use of test scores
• the educational and social consequences of the uses we make of tests

“In test validation we are not examining the validity of the test content or of even the test scores themselves, but rather the validity of the way we interpret or use the information gathered through the testing procedure” (Bachman, 1990: 238)

Reliability is a requirement for validity. A test is not valid unless it meets the conditions of reliability. The investigations of reliability and validity are complementary aspects of identifying, estimating and interpreting different sources of variance in test scores. Correlation between scores on parallel tests demonstrates reliability. Correlation between scores on a multiple choice test of grammar and ratings of grammar in an oral interview demonstrates validity.

Validity is a unitary concept. The distinction among content validity, criterion-related validity and construct validity is inadequate. In practice, they are complementary types of evidence.

Point to Ponder Reliability is a necessary but not sufficient condition for validity to be present i.e. it is possible for a test to be reliable without being valid for a specific purpose, but it is not possible for a test to be valid without first being reliable.


2.3.1 Content Relevance (validity)

Content relevance (validity) requires the specification of the ability domain and of the task, or task domain, i.e. what it is that the test measures, the attributes of the stimuli presented to the testee, and the nature of the testee’s responses.

2.3.2 Content Coverage

Content coverage refers to the extent to which the tasks required in the test “adequately represent the behavioral domain in question”, i.e. the tasks required by the test must be representative of that domain.

2.3.3 Face Validity

A test has face validity when it looks right to other people (testers, teachers, testees). J.B. Heaton gives the following example: Is photography an art or a science? (a discussion topic from a public matriculation examination). It is obvious that this question demands specialized knowledge. Adapted tests may lack face validity (because of the presence of culturally bound words, for example). Face validity may increase motivation, as testees “will try harder if the test looks sound.” Face validity means that the testees feel that the test tests what it is “supposed” to test. In order to increase face validity, use the following advice:
• use a carefully constructed format;
• include items that are clear;
• give clear directions;
• be sure that the tasks are familiar and relate to the testees’ course of study.

For Oller (1979), face validity is a kind of impressionistic reaction on the part of the examinees.

SAQ 2
Why is face validity a desirable feature of a test?
Write your answer in the space provided above (in no more than 45 words) and compare it to that in the “Answers to SAQs” section at the end of the unit.

2.3.4 Content Validity

This concept answers the question: Is the content a comprehensive and representative sample of what you want to measure, i.e. of the language skills and structures? It follows that the test constructor needs a specification of the skills or structures from which to make a principled selection. The question to ask is whether the items in the test represent an adequate sample of the ability. Lack of content validity has a harmful backwash effect: the areas neglected in the test are usually neglected by testees. A written test which tries to measure pronunciation lacks face validity.

2.3.5 Predictive Validity

Predictive validity concerns the degree to which a test can predict a testee’s future behavior. It answers questions such as: Does the score predict a testee’s ability to cope with a graduate course at an American university? If a university admission exam is administered and its scores are correlated with successive annual grades, the highest validity is found after the first year of study. Predictive validity then tends to decrease with each successive year, reflecting maturational changes in the students. Where the predictive validity proves poor, criteria other than annual grades (e.g. annual ratings of job performance) should be selected as measures of success.

The choice of criterion measure raises interesting questions:

• should we rely on the subjective judgments of supervisors?
• how helpful is it to use final outcome as the criterion measure when so many factors other than ability (subject knowledge, intelligence, motivation, health) will have contributed to the outcome?

A typical example of predictive validity would be an attempt to validate a placement test: how many of the students were misplaced?

Information about criterion relatedness – concurrent or predictive – is by itself insufficient evidence for validation.

2.3.6 Construct Validity

Construct validity concerns the extent to which a test measures just the ability which it is supposed to measure, i.e. the purpose of construct validation is to provide evidence that the underlying theoretical constructs being measured are themselves valid. The word construct refers to a complex idea formed by combining single ideas. A synonym might be the word concept. Examples of constructs: reading ability, writing ability. Construct validation answers the question: to what extent is performance on tests consistent with predictions that we make on the basis of a theory of abilities? Construct validity is central to the appropriate interpretation of test scores and provides the basis for the view of validity as a unitary concept.

Bachman considers that “in conducting construct validation, we are empirically testing hypothesized relationships between test scores and abilities”. Construct validation can thus be seen as a special case of verifying, or falsifying a scientific theory, and just as a theory can never be proven, the validity of any given test use or interpretation is always subject to falsification. Construct validation requires both logical analysis and empirical investigation. It also reflects the extent to which the content of a test or of assessment reflects current understanding of the skill(s) or sub-skill(s) being used.


SAQ 3

You want to test competence in vocabulary and grammar. You decide to use two kinds of tests: a multiple-choice test and a writing sample. The scores of the multiple-choice tests are highly correlated with each other. The correlation between the multiple-choice and writing tests of grammar is poorer. What is the cause of this lack of correlation?
Write your answers in the space provided above (in no more than 30 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

Messick (after Bachman) considers the following types of empirical evidence among the means of construct validation:
1. the examination of patterns of correlations among item scores and test scores, and between characteristics of items and tests and scores on items and tests;
2. analysis and modeling of the processes underlying test performance;
3. studies of group differences;
4. studies of changes over time;
5. investigation of the effect of experimental treatment.

A correlation is a functional relationship between two measures: two sets of scores may be correlated with each other or they may vary independently.

Example of a correlation:
• a high score on a test of grammatical competence and
• a high grade in writing classes.

A correlation coefficient tells us to what extent variation in one (measure) goes with variations in the other. Due to the influence of Samuel Messick, an outstanding representative of educational measurement in the United States, construct validity, content validity, criterion-related validity and consequential validity have become included within a single unitary concept of validity centered on construct validity (Messick, 1980, 1983).

In discussing validity we have to take into account experimental evidence and the use of self-report data, i.e. what testees say about the experience of answering particular test items, in order to try to separate the measurement of relevant aspects (e.g. the use of reading strategies) from the use of test-taking strategies. This may be seen as a way of opening up construct validity, or the accuracy of measurement of the theoretical essentials of a given skill or area of knowledge.

Other factors that may influence test results:
• test bias (the result of differences in individual characteristics other than the ability being tested).

Test bias may include:
- misinterpretation of test scores;
- sexist or racist content;
- unequal prediction of criterion performance;
- unfair content with respect to the experience of the test taker;
- inappropriate selection procedures;
- inadequate criterion;
- threatening atmosphere;
- conditions of testing;
- background knowledge;
- cognitive characteristics;
- field independence (the extent to which a person perceives analytically);
- ambiguity tolerance (a person’s ability to function rationally and calmly in a situation in which the interpretation of all stimuli is not clear);
- native language, ethnicity;
- sex and age.

Messick has identified areas to be considered in the ethical use and interpretation of test results:
• Construct validity (Does the evidence offered by the test support the particular interpretation we wish to make? Does the test, for example, guarantee the certification of teachers?)
• The value systems that inform the particular test use
• The practical usefulness of the test
• Consequences to the educational system or society of using test results for a particular purpose.

Establishing the validity of a test or assessment may, thus, include an evaluation of the intended or unintended consequences of a test’s interpretation and use.

2.3.7 Curricular Validity

The term curricular validity relates to the question of the degree to which the test content is covered in the curriculum. This is certainly important if one wishes to make inferences about instructional effectiveness. Curricular validity is considered by many to be important for any type of minimal competency test required for, say, secondary school graduation. It seems unfair to withhold a diploma from someone for not having learned something that was not covered in the curriculum.

2.3.8 Criterion Related Validity

This approach to test validity answers the question: how far do results on the test agree with those provided by some independent outside criterion measure? It takes two forms:

• concurrent validity
• predictive validity

Predictive validity is usually discussed together with concurrent validity.


2.3.9 Concurrent Validity

When the test and the criterion are administered at about the same time we speak about concurrent validity.

Example:

The course objectives call for an oral component as part of the final achievement test. The testee is expected to perform orally a large number of functions. A full test might take 45 minutes for each student, but because of the great number of testees only ten minutes can be devoted to each of them. Does the shortened test have content validity? In order to check this, a sample of testees chosen at random is tested in full (45 minutes). The results of this extended test become the criterion against which the shorter tests will be judged. A high level of agreement between the two tests indicates that the shorter version of the oral component may be considered valid. The mathematical measure of similarity is called the validity coefficient. Perfect agreement between the two sets of scores will result in a validity coefficient of 1. Total lack of agreement will give a coefficient of zero.
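As a rough illustration of how such a validity coefficient could be computed, the sketch below assumes that the measure of agreement is the Pearson product-moment correlation, which is a common choice but is not named in the text. The function name `pearson_r` and the two lists of marks are hypothetical and invented for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores in each list must vary")
    return cov / sqrt(ss_x * ss_y)


# Hypothetical marks for six testees chosen at random (invented data)
short_test = [12, 15, 9, 18, 14, 11]  # ten-minute version
full_test = [13, 16, 10, 19, 13, 12]  # 45-minute version, used as the criterion

validity_coefficient = pearson_r(short_test, full_test)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 would support using the shorter oral test in place of the full one; a coefficient near zero would not.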

The teacher’s assessment of his or her students might also be considered a criterion for concurrent validation.

Point to Ponder
Before introducing a new test, its specifications and sample items have to be made available to everyone concerned with preparation for the test.

SAQ 4
Match the threats to test validity with the practical examples given below:

1. Misapplication of tests
2. Standardized proficiency tests, developed from a distinct population, are administered to subjects drawn from a qualitatively different population.
3. Items do not match the objectives or content of instruction
4. Imperfect cooperation of the examinee

a. The examinees are insincere, misinformed or hostile. They consider that the test is a waste of time. They respond quickly, giving a series of answers which do not at all reflect their opinions.

b. A test of reading comprehension designed to measure achievement in reading comprehension in accordance with the syllabus of 4th year high-school students is applied to measure the achievement of 4th year general school students.

c. The TOEFL test is a standardized proficiency test of high validity and reliability designed for the purpose of screening foreign students for entry to American universities. In spite of this, the vocabulary section is not difficult at all for Romanian testees (words of Romance origin which are difficult for Anglo-Saxons are easy for Romanians).

d. The test requires knowledge of vocabulary and structures to which the students were never exposed. The test requires knowledge of if-clauses only.

1. ……….. 2. …………. 3. ………….. 4. ……………..

Write your answers in the space provided and compare them to those at the end of the unit.

2.4 Reliability

If we buy a kilo of fruit, each time we weigh the parcel on the same scales we expect to get the same weight. The same thing is expected from a test. In order to be reliable, a test must be consistent in its measurements, i.e. if the test is given to the same learners on different occasions with no further language lessons between the two dates, the same scores are obtained.

Points to Ponder

• Make the first test relatively easy to build students’ confidence.

• Provide adequate feedback on students’ test performance. Any assessment should help a student to learn.

Reliability is vital especially when the test is used for an entrance examination. Factors affecting the reliability of a test:
• the size or the extent of the sample, i.e. the longer the sample, or the more tasks the pupil has to perform, the greater the reliability. Objective tests are favoured because they allow for a wide field to be covered.

2.4.1 Measuring reliability

Well-constructed tests usually have a reliability coefficient of r = 0.90 or greater.

2.4.1.1 Test-Retest Method

• Test-retest method, i.e. re-administer the test to the same testees after a lapse of time (no more than two weeks). Comparison of the two results would then show how reliable the test is. The following formula is recommended: $r_{tt} = r_{1,2}$, where $r_{tt}$ = the reliability coefficient using this method and $r_{1,2}$ = the correlation of the scores at time one with those at time two for the same test used with the same persons. Frequent use of this method is not recommended because of:
- the fact that a number of pupils will benefit more than others from familiarity with the test and its format;
- changes in performance resulting from the memory factor;
- personal factors.

In research that involves data collection at more than one point in time, the same question can be asked in different locations without drawing the respondent’s attention to what is being done. The test-retest method can sometimes produce surprising results, especially in the case of questionnaires. If the tester has several indicators of the variable in question, the test-retest method can be approximated through a different technique.

2.4.1.2 Parallel Forms to the Same Group

• Administer parallel forms of the test to the same group. The second test should be identical to the first in its sampling, difficulty, length and rubrics. If the correlation is high, the tests can be termed reliable. The following formula is recommended: $r_{tt} = r_{A,B}$, where $r_{tt}$ = the reliability coefficient and $r_{A,B}$ = the correlation of form A with form B of the test when administered to the same people at the same time.

2.4.1.3 The Split-Half Method

• The split-half method. Divide the test into two halves and correlate the corresponding scores obtained. The test is reliable if the two halves correlate with each other. Ways of splitting into halves:

- Divide it into the first and second halves. Language tests are designed as “power” tests, i.e. with the easiest questions at the beginning and the questions becoming progressively more difficult.

- Split the test into random halves (the odd-even method) when the items measure the same ability, e.g. multiple choice tests of grammar and vocabulary. The first half may comprise items 1, 4, 5, 8, 9, 12. The second half may contain items 2, 3, 6, 7, 10, 11.

The following formula is recommended:

$$r_{tt} = \frac{2\,r_{A,B}}{1 + r_{A,B}}$$

where $r_{tt}$ = reliability estimated by the split-half method and $r_{A,B}$ = the correlation of the scores from one half of the test with those from the other half.
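The split-half procedure can be sketched in a few lines of code. The sketch assumes that the recommended formula is the one commonly known as the Spearman-Brown correction, r_tt = 2·r_AB / (1 + r_AB), and that the correlation between halves is a Pearson correlation. The function names and the tiny data set are hypothetical and serve only to show the steps.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_ab):
    """Step the half-test correlation up to a full-length reliability estimate."""
    if r_ab <= -1:
        raise ValueError("correlation must be greater than -1")
    return 2 * r_ab / (1 + r_ab)


def split_half_reliability(item_scores, half_a_items):
    """item_scores: one list of 0/1 item scores per testee.
    half_a_items: 1-based item numbers of half A; all other items form half B."""
    half_a = set(half_a_items)
    totals_a, totals_b = [], []
    for scores in item_scores:
        totals_a.append(sum(s for i, s in enumerate(scores, start=1) if i in half_a))
        totals_b.append(sum(s for i, s in enumerate(scores, start=1) if i not in half_a))
    return spearman_brown(pearson_r(totals_a, totals_b))


# Invented scores of four testees on a 12-item test (1 = correct, 0 = wrong)
scores = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(round(split_half_reliability(scores, [1, 4, 5, 8, 9, 12]), 2))
```

The item numbers passed in for half A are the ones suggested in the text; any other split can be tried by changing that list.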

Point to Ponder
Tension between Reliability and Validity
The best measurements are those ranking high in both validity and reliability. However, there is a tension between the two characteristics. Increasing one often reduces the other. The solution to this dilemma is to use a variety of measurement techniques, varying in validity and reliability, whenever possible.


SAQ 5
Circle T(rue) or F(alse).
1. Reliability is a necessary but not sufficient condition for validity. T F
2. The reliabilities of the prediction and criterion measures and group heterogeneity cannot affect validity. T F
3. Availability of other data, cost of testing and faulty decisions, selection ratio and success ratio cannot affect whether a test is valid enough to be useful in decision making. T F
4. Reliability is the degree of consistency between two measures of the same thing. T F
5. Reliability will be higher when a test is given to a heterogeneous group. T F
6. All measurement is subject to error. T F

In order to give good answers, read 2.3 and 2.4. Compare your answers to those at the end of the unit.

2.4.1.4 Factors that affect language scores

1. Test method features

• Features of the test environment
- familiarity with the place and the equipment
- personnel
- time of testing
- physical conditions

• Features of the test
- the salience, sequence and relative importance of parts
- time allocation
- instructions (language, visual or oral channel)

• Features of the input format (language form and vehicle of presentation)
• Nature of language (vocabulary, contextualization, distribution of new information, type of information)
• Discourse characteristics
• Features of the expected response (format, nature of language)
• Restrictions on response (channel, format, time)
• Relationship between input and response (reciprocal, nonreciprocal, adaptive)

2. Personal attributes

• systematic individual characteristics (cognitive style, knowledge of particular content areas)
• group characteristics (sex, race and ethnic background)


3. Random (unsystematic) factors

• emotional state
• mental alertness
• changes in the test environment

The effects of all the above features:

• Individuals who take a language test are not likely to perform equally well
• Variation is due to the different factors above
• Different factors affect different individuals differently
• Individuals are affected by different methods of testing (some may do very well on a multiple choice test and perform poorly on a composition)

SAQ 6 If all error of measurement could be removed from a testee’s score, what would we call the remaining quantity?

Write your answer in the space provided above (in no more than 15 words) and compare it to that in the “Answers to SAQs” section at the end of the unit.

Conclusions

• A major concern is to minimize the effects of test method, personal attributes and random factors that are not part of language ability

• The interpretation and use of language test scores must be moderated by your assessment (or estimates) of the extent to which these scores reflect personal, test method or random features.

Points to Ponder
Are you aware that:

• Students with neat handwriting get higher marks on essay tests?
• A halo effect exists in the assignment of grades? Students who performed well on previous essays tend to be rated higher on subsequent ones, even if the quality diminishes.
• Longer essays get rated higher than better shorter essays?
• Students with common names get rated higher than students with unusual names?
• Grades have proven of little value in predicting any criteria of post-school success in any field?
(after Ronald L. Partin)


Random factors and test method features are sources of measurement error that affect reliability. Personal attributes are sources of test bias or test invalidity. Two statistical concepts are useful in discussing reliability: mean and variance.

The mean (symbolized by $\bar{X}$) is the arithmetic average of the scores of a given group of test takers. The variance (symbolized by $s^2$, the square of the standard deviation $s$) is a statistic that shows how much individual scores vary from the group mean. The subscripts x, t and e indicate specific types of variance; $s_x^2$ refers to the variance in observed test scores.
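Written out, the two statistics look as follows. This is a sketch in common textbook notation rather than a formula given in the unit: N (the number of test takers) is introduced here, and reading the subscripts t and e as true-score and error variance, as in the classical true-score model, is an assumption, as is the symbol used for the reliability coefficient in the last expression. Some textbooks divide by N - 1 rather than N when the group is treated as a sample.

```latex
% Mean of the observed scores X of N test takers
\bar{X} = \frac{\sum X}{N}

% Variance: average squared distance of each score from the mean
s^2 = \frac{\sum (X - \bar{X})^2}{N}

% Assumed classical decomposition: observed = true + error variance
s_x^2 = s_t^2 + s_e^2

% Under that assumption, reliability is the share of true-score variance
r_{xx'} = \frac{s_t^2}{s_x^2}
```

On this reading, anything that enlarges the error variance, such as random factors or test method features, lowers reliability.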

Norm-referenced (NR) test results are interpreted with reference to the performance of a given group, or norm. A norm group is made up of a large group of individuals who are similar to the testees.

Stages in the development of an NR test:
• a test is given to the norm group
• the results are used as reference points for interpreting the performance of other students who take the test
• the reference points are the mean (or average score) and the standard deviation s of the group, which indicates how spread out the scores of the group are
• the scores on an NR test are distributed in the shape of a bell-shaped curve
• statistical characteristics of a normal distribution of scores: 50% of the scores are below the mean and 50% are above; 34% of the scores are between the mean and one standard deviation above it (+1 s) and 34% are between the mean and one standard deviation below it (-1 s); 27% are between one and two standard deviations from the mean (13.5% above and 13.5% below); 5% of the scores will be as far away as two or more standard deviations from the mean

Example: The mean of the TOEFL is about 512, s = 66.

A score of 578 (512 + 66) is above average with reference to the norm group, i.e. the testee’s performance is equal to or greater than that of 84% of the students in the norm group (the 50% below the mean plus the 34% between the mean and +1 s).
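The 84% figure can be checked with a short calculation. The sketch below assumes that the norm group’s scores follow an exactly normal distribution with the mean and standard deviation quoted in the example; the function names are hypothetical. It uses `NormalDist` from Python’s standard `statistics` module.

```python
from statistics import NormalDist


def percent_at_or_below(score, mean, sd):
    """Percentage of a normally distributed norm group scoring at or below `score`."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


def percent_between(z_low, z_high):
    """Percentage of scores lying between two distances from the mean,
    both expressed in standard deviations."""
    unit = NormalDist()  # mean 0, standard deviation 1
    return 100 * (unit.cdf(z_high) - unit.cdf(z_low))


# TOEFL example from the text: mean about 512, s = 66, score 578
print(round(percent_at_or_below(578, 512, 66)))  # about 84
print(round(percent_between(0, 1)))              # about 34
print(round(percent_between(1, 2), 1))           # about 13.6
```

The rounded percentages used in the text (34%, 13.5%, 5%) are approximations of these values.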

SAQ 7 Imagine an interview aimed at testing speaking ability in EFL. Conditions: one rater; each testee was interviewed twice (test – retest reliability). What are the threats to this reliability?

Write your answers in the space provided above (in no more than 50 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


2.4.1.5 Test Length

The test must be of sufficient length to yield reliable scores. Usually, the longer the test, the more reliable the scores. If the teacher follows the table of specifications (i.e. the relative emphasis given to each content area, which usually reflects its relative importance to the instructional objectives), the three classifications of the cognitive domain (knowledge, understanding and application) should be indicated, e.g. knowledge 30%, understanding 40%, application 30%. The test cannot be valid unless it is reliable. After taking these decisions and some others (test and item characteristics, test difficulty, test instructions and layout, and obviously scoring), all that is now required is to construct a test of sufficient length. The following factors should be considered:

• If a test is to be administered during a class period, it should be constructed so that most of the examinees can easily finish it during the examination period.

• The age of the testees should also be considered, and the item length should take into account the pupils’ schemata and attention span.

In short, a test should be long enough to be adequately reliable and short enough to be administered.

Hints:

• 35 to 45 items should be reliable for the average end-of-unit revision
• 75 or more items for a final examination
• The time needed to answer test items varies with the grade level, the type of items used, the difficulty of the items and the level of cognitive activity required
• A typical learner can usually answer about two knowledge items per minute, or one application or understanding item per minute
• Allow ten minutes to distribute materials, explain procedures and collect materials
• A teacher can use only 40 minutes of a 50-minute class period if he/she administers an examination of 35 to 50 multiple-choice items
• 2 or 3 written true or false items, 2 or 3 matching items, or 1 or 2 completion items may be answered in 1 minute
• When the testee must supply the answer, the amount of time depends on the amount of thinking time and the amount of writing involved
• Do not include more than 6 to 7 essay questions per hour
• At the elementary level, allow more time per item
• Rely on your personal experience
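The hints above lend themselves to a quick back-of-the-envelope check when planning a test. The sketch below is only an illustration: it takes the rates given in the hints (about two knowledge items per minute, one understanding or application item per minute, ten minutes of overhead) at face value, and the function names and sample item counts are hypothetical.

```python
def estimated_minutes(knowledge_items, other_items, overhead_minutes=10):
    """Rough testing time: knowledge items at about 2 per minute,
    understanding/application items at about 1 per minute,
    plus time to distribute and collect materials."""
    return knowledge_items / 2 + other_items + overhead_minutes


def fits_class_period(knowledge_items, other_items, period_minutes=50):
    """True if the estimated time fits into one class period."""
    return estimated_minutes(knowledge_items, other_items) <= period_minutes


# 20 knowledge items and 25 understanding/application items
print(estimated_minutes(20, 25))   # 45.0 minutes
print(fits_class_period(20, 25))   # True
print(fits_class_period(40, 30))   # False: 60 minutes are needed
```

Younger pupils and more demanding items call for lower rates, so the figures should be adjusted in the light of the teacher’s own experience.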


SAQ 8 What general relationship exists between test reliability and the number of the items on the test?

Write your answers in the space provided above (in no more than 30 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


2.5 Discrimination

Another characteristic of a good test is discrimination, i.e. a test has to have the power to discriminate between testees. This is not a problem with tests for learners at much the same level (e.g. class achievement tests). In order to discriminate reliably, the test should be fairly long. Short tests are not always able to discriminate. The solution is to require the testees who score highly on a short test given to the majority to take a further, longer extension test that meets this condition.

Discrimination is also a property of individual items in a test. Each item should contribute to the discrimination power of the test as a whole. Item analysis answers the following questions: • Is an item answered correctly by candidates who answer most of

the rest of the items right? (good discrimination)

SAQ 9 What extraneous variables can be anticipated and controlled:

a. the sex of the examiner b. the uniformity of the procedures followed in administering

the test e.g. time, clearly specified instructions c. the uniformity of the procedures followed in scoring the test d. the explanations given by the examiner e. the examiner’s facial expression, tone of voice f. the manner in which the examiner presents the materials

Write your answers in the space provided above (in no more than 40 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


• Is an item answered correctly by poor testees and incorrectly by good ones? (poor discrimination)

• Do the items show a reasonably good record of agreeing with the overall score given by the rest of the items?
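One way to check whether an item agrees with the score given by the rest of the items is to correlate each item with the total of the remaining items. The sketch below is only an illustration of that idea: the response matrix is invented, and the helper names `pearson` and `item_rest_correlations` are assumptions, not part of the course material.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def item_rest_correlations(responses):
    """For each item, correlate it with the score on all the other items.

    `responses` holds one row per testee; 1 = correct, 0 = incorrect.
    """
    correlations = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        correlations.append(pearson(item, rest))
    return correlations


# Invented data: five testees, four items. The last item is answered
# correctly mainly by the testees who do poorly on everything else.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
correlations = item_rest_correlations(responses)
for number, r in enumerate(correlations, start=1):
    verdict = "agrees with the rest" if r > 0 else "poor discrimination"
    print(f"item {number}: r = {r:+.2f} ({verdict})")
```

A clearly positive value corresponds to the "good discrimination" case in the list above; a negative value corresponds to an item answered correctly by poor testees and incorrectly by good ones.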

A sophisticated technique known as Item Response Theory (IRT) models the response to items on the assumption that each item “has a level of difficulty associated with it”. More than that, IRT assumes that “the items can be ordered with respect to each other, as in a ‘power’ or ‘ladder’ test, and that candidates’ response can be expected to be inconsistent within the limits of their ability level”. The consequence is that with those techniques comes a “more sophisticated approach to test design and a pre-administration”.

In determining discriminability with sample separation, the first step is to separate the highest scoring group and the lowest scoring group from the entire sample on the basis of total score on the test. Then the following formula is applied:

\[
D = \frac{N_c}{N_c + L_c} = \frac{7}{7 + 4} = 0.64
\]

where D = discriminability, N_c = the number of correct responses in the high group, and L_c = the number of correct responses in the low group. The discriminability for item six is 0.64. A discriminability index of 0.67 is considered the lowest acceptable discriminability by this method.
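The sample-separation calculation can be sketched in a few lines. This is a hedged illustration: it takes the formula to be D = Nc / (Nc + Lc), and it assumes that item six was answered correctly by 7 testees in the high group and 4 in the low group, the figures that yield the 0.64 reported above. The function names and the constant name are hypothetical.

```python
# Lowest acceptable index for this method, as stated in the text.
MIN_ACCEPTABLE_D = 0.67


def discriminability(high_correct: int, low_correct: int) -> float:
    """D = Nc / (Nc + Lc): the share of correct answers that come from the high group."""
    total_correct = high_correct + low_correct
    if total_correct == 0:
        raise ValueError("nobody in either group answered the item correctly")
    return high_correct / total_correct


def is_acceptable(d: float) -> bool:
    """True when the index reaches the minimum named in the text."""
    return d >= MIN_ACCEPTABLE_D


# Assumed counts for item six: 7 correct in the high group, 4 in the low group.
d_item_six = discriminability(7, 4)
print(f"D = {d_item_six:.2f}, acceptable: {is_acceptable(d_item_six)}")
```

With these counts the index is 0.64, just under the 0.67 threshold, so by this method item six would be flagged for revision.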

2.6 Feasibility

Something that is feasible can be done, made, or achieved. This requirement has been made possible by a number of technological developments, such as

SAQ 10 Case study. Half of the testees pass a given item and half fail it. If we take difficulty into account, we would rate this item as an easy one. Unfortunately, the testees who passed the item were the weaker half of the testees, and those who failed the item were the better testees in the ability being measured. What is your conclusion? Write your answer in the space provided above (in no more than 35 words) and compare it to that in the “Answers to SAQs” section at the end of the unit.


• computer working for large-scale tests, in which the testee marks the answer on a special computer readable sheet;

• tests run on PCs;
• tests using video presentation (e.g. BBC English Video Test);
• prerecorded oral tests such as the Simulated Oral Proficiency Interview and the Test of Spoken English (TSE).

2.7 Washback

Washback is the effect a test has on teaching in the classroom. It is true that we are always advised not to teach “toward” a test. However, we can use tests as teaching tools. Tests (especially formative tests) may be used as feedback devices that make teachers aware of the areas where the learners need improvement. Formal tests may be channels through which learners can receive a diagnosis of areas of strength and weakness. Your prompt return of written tests with your feedback is a must if you really want to use washback positively. It is also important to comment upon your evaluation. Give praise for strengths and offer constructive criticism of weaknesses. Give learning hints on how a learner might improve his performance. Encourage learners to seek clarification about their grades / scores.

Tests have the power to influence the method and content of language courses. Their backwash effect may be positive or negative. In their turn, the teachers who use such tests and the testees who have suffered negative backwash may be able to influence the decisions of the testing organizations, which respond positively to positive feedback (e.g. the new TOEFL test reflects this type of feedback).

Washback can be positive (beneficial) or negative.

SAQ 11 What is affected in 1 and 2 below: reliability or validity?
1. A recording for an oral comprehension test is poor in quality.
2. The quality of the recording is good. One group hears it played under good acoustic conditions while another group hears it under poor conditions.

Write your answers in the space provided above (in no more than 40 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


2.7.1 Negative Washback

Examples of negative effects:
• teaching is dominated by coaching for the testing session / examination;
• the test content and testing techniques differ from the objectives of the course.

2.7.2 Positive Washback

Examples of positive effects:
• when motivation is increased.

How can positive backwash on teaching and learning be achieved?

• test the skills / abilities whose development you want to promote (if you want to develop oral skills, then test oral skills). If tests set two kinds of task (compare / contrast, describe / interpret), then teaching will be concentrated on these tasks. Backwash is harmful in this case;

• employ direct testing (i.e. tasks / tests that are as authentic as possible);

• make testing criterion – referenced (norm – referenced testing makes teachers and learners assume that a certain percentage of candidates will fail the exam). Use a series of criterion – referenced tests representing different levels of achievement and allow learners to choose the tests they are able to pass. This will encourage positive attitude to language learning;

• construct achievement tests based on objectives rather than on textbook content;

• be sure that students understand what the test demands of them.

Point to Ponder

On the whole, if learners fail to learn it is the fault of the teacher, the school, the curriculum, or poor curriculum.

SAQ 12 Diagnose a situation in which questions are difficult for the candidate to understand, or are possibly biased.

Write your answers in the space provided above (in no more than 30 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


2.8 Summary

The principal ideas, conclusions, and implications presented in the chapter Conditions of a Good Test are summarized in the following statements:

• Reliability is a necessary but not sufficient condition for validity;
• Validity can be defined as the degree to which certain inferences can be made from the test scores (or other measurements). Since a single test may have many different purposes, there is no single validity index for a test;
• Various factors affect validity;
• Reliability is the degree of consistency between measures of the same thing;
• The different methods of estimating reliability consider different sources of error. Which should be used depends upon how one wishes to use the results of the test;
• In general, longer tests are more reliable;
• Reliability will be higher when a test is given to a heterogeneous group.

2.9 Key Concepts

• construct validity
• content coverage
• content relevance
• content validity
• criterion related validity
• curricula
• discrimination
• face validity
• feasibility
• predictive validity
• reliability
• reliable variance
• the Split – Half Method
• test length
• test – retest Method
• washback
• validity

2.10 Checklist

Do you make questions short and clear?

Do you accept each student’s standard, however high or low, and set about improving it in steps, by reinforcement and encouragement?

Do you allow resubmission of unsatisfactory work?


Do you ask your learners to evaluate their own work, and set themselves targets?

Do you set each student achievable goals?

SAA No. 1 This activity aims at reviewing unit 2. Match the principles (letters) with their main characteristics (numbers).

I. Principles: a) face validity; b) practicality; c) authenticity; d) content; e) validity; f) reliability; g) washback; h) discrimination.

II. Characteristics:
1. a well-constructed format with familiar tasks; timing is clear; uncomplicated items; crystal clear instructions; a difficulty level that presents a reasonable challenge;
2. tasks that relate to the course work of the learners;
3. spending classroom time after reviewing the content; students discover their areas of strength and weakness; asking students to use test results as a guide to setting goals for their future effort; items can serve in diagnostic capacity;
4. the language in the test is as natural as possible; contextualized items; tasks represent real-world tasks;
5. objective scoring procedures; classroom conditions are equal and optimal for all students;
6. the test is not expensive and stays within appropriate time constraints; relatively easy to administer; a scoring procedure that is specific and time efficient.

Please note that each correct answer will count for 15 points. 10 points will be given for ordering the principles according to their importance. The maximum score for this assignment is 100 points. Do not forget to send your answers to your tutor in due time.


2.11 Answers to SAQs

SAQ 1

If your answer to SAQ 1 is not comparable to the one suggested below, please reread section 2.2 again. The answer might consist of a number of brief sentences:
• When I assess my pupils I aim at educating them.
• I ensure that my pupils understand the aim of my assessment.
• Learning is complex and multidimensional. So is assessment.
• Use tests that reveal change and growth.
• The goals of assessment should be clear and shared.
• Curricula, teaching, students’ effort should aim at reaching a certain outcome.
• My assessment is fair.
• I use formative tests.
• I prefer criterion – referenced tests.

Traditionally, the characteristics of a good test have been seen to be validity, reliability, discrimination and feasibility.

SAQ 2

If your answer to SAQ 2 is not comparable to the one suggested below, please reread section 2.2.3 again. Face validity is useful from a public acceptance standpoint. Untrained people who look at or take the test should think the test is measuring what its author claims. If a test appears irrelevant, examinees may not take the test seriously, or potential users may not consider the results useful.

SAQ 3

If your answer to SAQ 3 is not comparable to the one suggested below, please reread section 2.3.6 again. The poor correlation may be attributed to the effect of test method factors since the two highly correlated tests of different abilities shared the same multiple choice test method.

SAQ 4

If your answer to SAQ 4 is not comparable to the one suggested below, please reread section 2.3 again. 1 – b, 2 – c, 3 – d, 4 – a

SAQ 5

If your answer to SAQ 5 is not comparable to the one suggested below, please reread section 2.4 again. T: 1, 4, 5, 6 F: 2, 3

SAQ 6

If your answer to SAQ 6 is not comparable to the one suggested below, please reread section 2.4.1 again. Removal of all error leaves only the testee’s true score.


SAQ 7 If your answer to SAQ 7 is not comparable to the one suggested below, please reread section 2.4.1.4 again.

• fluctuations in the interviewer (fatigue, anxiety, mental awareness)
• fluctuations in the testee (fatigue, boredom, changing attitude towards the interviewer)
• fluctuations in the administration of the interview (length, time, place)

SAQ 8 If your answer to SAQ 8 is not comparable to the one suggested below, please reread section 2.4.1.5 again. In general, reliability increases as the number of items increases, up to a point of asymptote, where little is gained through the addition of new items.

SAQ 9 If your answer to SAQ 9 is not comparable to the one suggested below, please reread section 2.4.1 again. The reliability and validity of a test depend on the uniformity and standardization of the procedures. We can anticipate and minimize: b, c, d, f. We cannot control a,e (although such variables cannot be controlled, their influence should be taken into consideration)

SAQ 10 If your answer to SAQ 10 is not comparable to the one suggested below, please reread section 2.5 again. The item is not suitable. If a test comprises only such items, a high score would be an indication of inability and a low score would be an indication of comparative ability.

SAQ 11 If your answer to SAQ 11 is not comparable to the one suggested below, please reread sections 2.3 and 2.4 again. In the first case, the recording is poor in quality for all testees. In this case we may speak about the invalidity of the test. In the second case, we may speak about unreliability and therefore invalidity.

SAQ 12 If your answer to SAQ 12 is not comparable to the one suggested below, please reread section 2.3 again. Validity is compromised. It is common for teachers to confuse poor learning with a student’s difficulty in understanding examination questions

2.12 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp. 10-16
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press, pp. 22-48


Types of Tests I


Unit 3 TYPES OF TESTS I

3.1 Unit Objectives
3.2 Informal Assessment
3.2.1 Informal Assessment of Speaking
3.2.2 Informal Assessment of Writing
3.2.3 Informal Assessment of Listening
3.2.4 Informal Assessment of Reading
3.2.5 Informal Assessment of Non – Linguistic Factors
3.2.6 Informal Assessment of Grammar and Vocabulary
3.3 Formal Assessment - Types of Tests and Testing
3.3.0 Classification by Stimulus Material
3.3.1 The purpose, or use, for which they are intended i.e. the types of decisions to be made function of the scores
3.3.1.1 Selection Tests
3.3.1.2 Entrance Tests
3.3.1.3 Readiness Tests
3.3.1.4 Placement Tests
3.3.1.5 Diagnostic Tests
3.3.1.6 Progress Tests
3.3.1.7 Achievement/ Attainment Tests
3.3.1.8 Mastery Tests
3.3.2 Function of Content
3.3.2.1 Proficiency Tests
3.3.2.2 Achievement or Attainment Tests
3.3.2.3 Aptitude or Prognostic Tests
3.3.3 The frame of reference
3.3.3.1 Norm-Referenced Tests
3.3.3.2 Criterion – Referenced Tests
3.4 Summary
3.5 Key Concepts
3.6 Checklist
3.7 Answers to SAQs
3.8 Further Readings

3.1 Unit Objectives

Just as there are many purposes for which language tests are developed, so there are many types of language tests. Some types serve a variety of purposes, while others are more restricted in their applicability. If we were to consider all kinds of language tests, the remainder of this book might not suffice. However, there are some broad groups of tests that deserve description and explanation. Many stand in opposition to one another. We cannot but recognize that there is much overlap.


Many people still view tests as the best available means of determining what people can do. Others think that they are narrow and restrictive as they do not measure innovation, social skills, and qualities of leadership. As we describe and examine various kinds of tests, we will look at the evidence on both sides of this controversy. At the end of this unit, you will be able to:

• distinguish among the following concepts: informal assessment, formal assessment, and self-assessment
• learn how to informally assess students’ language skills
• be aware of the wide choice you have
• make changes in the way you assess and test your learners
• realize that testing aids you by helping:
a) to provide knowledge concerning the students’ entry behaviours;
b) to set, refine, and clarify realistic goals for each student;
c) to determine the degree to which objectives have been achieved;
d) to determine, evaluate and refine your instructional techniques.
• understand how testing aids the student by:
a) communicating the goals of the teacher;
b) increasing motivation;
c) encouraging good study habits;
d) providing feedback that identifies his/ her strengths and weaknesses.

3.2 Informal Assessment

Informal assessment is carried out by the teacher in normal classroom conditions.

Characteristics
• a way of collecting information
• a kind of continuous assessment (an academic year or more)
• the result of systematic observation

Where? In and outside the classroom (looking at samples of learner’s work/ portfolio)

What? Linguistic and non – linguistic factors

How? Establishing what we are going to assess:
• criteria for assessing learners (do not rely only on impressions)
• link informal assessment with formal assessment (tests) and with self – assessment

Why? To help learners identify difficulties, to give students positive feedback, to develop students’ awareness

It is useful to consider what things we are going to assess formally and which factors we are just going to get an impression of.


SAQ 1 Which of the following items are assessed informally or formally?

Written homework I F
Written grammar activities I F
Speaking I F
Projects I F
Portfolios I F
Listening tasks I F
Reading tasks I F
Writing tasks I F
Vocabulary activities I F
Attitude / effort I F
Participation in class I F
Group work I F
Pair work I F
Organization of work I F
Presentation of work I F

Circle in the margin I (informally) or F (formally). Compare your answers to those in the “Answers to SAQs” section at the end of the unit.

3.2.1 Informal Assessment of Speaking

Assessing speaking informally is important when you have practical difficulties in organizing oral tests. It is a way of providing positive feedback and motivation to the learners. The criteria you may take into account are: fluency (speed/ hesitations), the relevancy and appropriacy of the message, grammatical and lexical accuracy, pronunciation (sounds, intonation, and rhythm)

How
• walk around the classroom monitoring pair work or group work, thus learning about your learners’ pronunciation problems, their intonation
• give learners points based on pre-established criteria

SAQ 2 What kind of assessment is favoured by teachers:
A. who teach small classes I F
B. who teach small classes and have more than two hours a week I F
C. who teach large classes and have two hours of English each week I F

Circle in the margin I (informally) or F (formally). Compare your answers to those in the “Answers to SAQs” section at the end of the unit.


3.2.2 Informal Assessment of Writing

Assessing your learners’ written work can be very time consuming. It follows that a teacher, in order to avoid neglecting other aspects of teaching, has to take a number of decisions:
• correct the most important pieces of writing
• organize group writing activities (in this case, you correct only a limited number of essays)
• link informal assessment with formal assessment
• avoid unreliability (which is often the case when you correct essays)
• establish clear criteria (for a 5 band scale):

5. Excellent writer - writes fluently, no errors, little hesitation
4. Good writer - writes quite fluently, not many mistakes
3. Modest writer - some difficulties, limited structures, difficult to understand
2. Marginal writer - difficulty in writing, almost incomprehensible
1. Poor writer - unable to use vocabulary, grammatical structures

3.2.3 Informal Assessment of Listening

We can informally assess learners’ listening comprehension abilities by:
• observing which learners seem to understand
• getting an overall impression of what they have understood
• looking at the learners
• monitoring pair work activities
• assessing learners’ reactions to instructions
• asking for a show of hands i.e. how many learners have put up their hands
• going through the answers one by one
• asking learners to summarize what they have heard
• using a recorded text for a speaking activity
• using TPR techniques

SAQ 3 Is informal/ impressionistic assessment reliable? Answer the following question: What kind of mark do you give a composition?
a. when you are tired H L
b. on a Friday H L
c. on a Monday H L
d. at the beginning of the activity (when you have to correct 50 essays) H L
e. at the end of the activity H L

Circle the letter in the margin L (lower mark), H (higher mark). Compare your answers to those in the “Answers to SAQs” section at the end of the unit.


3.2.4 Informal Assessment of Reading

Reading can be assessed informally by:
• observing
• checking class understanding of several important points
• using a reading text for developing speaking and writing abilities
• using a reading text for role-play
• asking for a summary in Romanian
• assessing learner’s personal opinion

3.2.5 Informal Assessment of Non – Linguistic Factors

Non – linguistic factors are important in assessing learners’ overall educational development, in encouraging personal effort. Informal assessment of non-linguistic factors implies assessment of:
• attitude (passive versus active learner)
• cooperativeness (ability to work with other people in group work)
• independence (able to use dictionaries, other language materials)
• creativity (original through initiative)

Informal assessment of non – linguistic factors is carried out:
• by observing learners in class and giving an impression or rating them using a band scale
• by collecting vocabulary notebooks and marking them
• by using peer assessment of group work

3.2.6 Informal Assessment of Grammar and Vocabulary

• By observing and identifying problems students are having
• By observing what they are doing while they perform speaking and writing tasks
• By going round the class and writing down the most important mistakes
• By organizing language awareness exercises: “What is wrong with this sentence?”

Points to Ponder

Reflect upon some of your arguments against formal proficiency tests.

• Infants do not need to be tested formally. We interact with them offering them comprehensible input, i.e. we subtly adjust to their level of proficiency
• Testing erects a barrier between us and our pupils
• The results of testing are often used in ways that cause learners / teachers pain
• The test is artificial, destructive
• Reward acquisition and not test results
• Help students understand that testing is needed and that they have to accept it
• Humanize the experience (teachers must openly comment on test results when they seem to misrepresent abilities)

If you are not happy with the test, make suggestions to testing organizations (local inspector, Ministry of Education and Research, textbook authors)


3.3 Formal Assessment - Types of Tests and Testing

Language tests can be classified according to some distinctive criteria:
3.3.0. The types of stimulus material used to present the problems to the learners: verbal and non-verbal
3.3.1. The purpose, or use, for which they are intended
3.3.2. The content upon which they are based
3.3.3. The frame of reference within which their results (scores) are to be interpreted

3.3.1. The purpose, or use, for which they are intended i.e. the types of decisions to be made function of the scores
• Tests with regard to admission decisions
3.3.1.1. Selection tests
3.3.1.2. Entrance tests
3.3.1.3. Readiness tests
• Tests with regard to identifying the appropriate instructional level or the specific language areas in which instruction is needed:
3.3.1.4. Placement tests
3.3.1.5. Diagnostic tests
• Tests with regard to decisions about how learners should proceed through the language programme, or how well they are attaining the programme’s objectives:
3.3.1.6. Progress tests
3.3.1.7. Achievement tests/ attainment tests
3.3.1.8. Mastery tests

3.3.2. The content upon which they are based (tests may be based on a certain theory of language or a specific domain of content)
3.3.2.1. proficiency tests
3.3.2.2. achievement or attainment tests
3.3.2.3. aptitude or prognostic tests

3.3.3. The frame of reference within which their results (scores) are to be interpreted
3.3.3.1. norm-referenced tests
3.3.3.2. criterion – referenced tests

3.3.0 Classification by Stimulus Material: verbal – nonverbal

There are many instances where the stimulus material used to present the problem to the student need not be verbal. The stimulus can be pictorial (in humanities, art courses, foreign languages, mainly at an elementary level) or a recording (in a music test). Although nonverbal stimulus material items are infrequently used in the classroom, this does not mean that they are not a good medium to use.

CLASSIFICATION CRITERIA


3.3.1 The purpose or use for which they are intended i.e. the types of decisions to be made function of the scores

3.3.1.1. Selection Tests

A selection test is a special form of placement test. It excludes learners who are below a certain percentage. It selects candidates for a particular job or course of study (success or failure depends on the number of places available). Proficiency tests are often used for selection.

The true-false test is the most popular among the selection type examinations with classroom teachers.

Weaknesses:

• Its fifty – fifty chance of guessing the correct answer encourages students to guess wildly
• It does not discriminate well between those examinees receiving the highest score on the total test and those receiving the lowest score
• It is not as reliable as a multiple choice test of equal length
• It is quite difficult to develop statements which can be answered absolutely true or false
• It is seldom applicable to the measurement of complex understandings and other higher – order mental processes

Strengths:

• It may be rapidly and accurately scored by individuals unqualified to teach the subject matter area being examined
• The scoring is completely objective
• Extraneous factors have no influence on test scoring
• It can be administered relatively quickly (less time per item is required to answer true-false questions in comparison with any other item type)
• It takes less time to construct and refine the items
• The item statement need not include instructions on how to respond
• It is very useful in situations where the measurement of the acquisition of factual, non – interpretative information is desired (vocabulary, technical terms, formulae, dates, proper names)
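The fifty-fifty guessing problem listed among the weaknesses can be put into numbers with the binomial distribution. The sketch below is only an illustration: the function name, the test lengths and the pass marks are assumptions made for the example.

```python
from math import comb


def prob_at_least(correct_needed: int, n_items: int, p_guess: float = 0.5) -> float:
    """Probability of getting `correct_needed` or more items right by blind guessing."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(correct_needed, n_items + 1)
    )


# Hypothetical pass mark of 60% on true-false tests of different lengths.
for n_items in (10, 20, 50):
    pass_mark = (n_items * 6) // 10
    chance = prob_at_least(pass_mark, n_items)
    print(f"{n_items} items, pass mark {pass_mark}: chance by guessing = {chance:.3f}")
```

On a 10-item test, a testee who guesses blindly has roughly a 38% chance of getting six or more items right. The chance falls as the test gets longer, which is one reason a true-false test needs more items than a multiple choice test to reach comparable reliability.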

Construction

• Select from the table of specifications the areas that can be successfully tested by the true-false test

• Write each item on a separate 3x5 piece of paper (it is easier to place the items in the desired order on the test)

• The true – false item consists of: a statement, a disagreement with the statement

• The testee is instructed to mark the statement true or false, right or wrong, yes or no at the beginning of the test


Examples: The following items are true-false questions. If the statement is true circle A on your answer sheet; if the statement is false, circle B on your answer sheet. Be sure that the item number that you are marking on the answer sheet corresponds to the item number of the question you are answering:
1. According to the cognitive theorists, a learner will learn by heart if he lacks a cognitive structure. A B
2. According to the cognitive theorists, learning something new is a matter of seeing where it “fits” in. A B

In another variety of this item type, the statement may be: True (T), False (F), True or False (TF). A separate answer sheet should be used.

Example: All sides of a square are equal. T F TF

Rules for constructing true- false tests

• Be sure that the item is absolutely true or completely false (except when using the TF category)

• The true-false statement should possess one and only one central theme and should be free from ambiguities

• A test should contain the same number of true and false statements
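The balance rule above can be checked mechanically before a test is printed. The sketch below is a hypothetical helper, not a procedure from the course: the function name `audit_key`, the one-item tolerance for balance, and the extra check for a strictly alternating pattern are all assumptions.

```python
from itertools import groupby


def audit_key(key: str) -> dict:
    """Summarize a true-false answer key written as a string of 'T' and 'F'."""
    true_count = key.count("T")
    false_count = key.count("F")
    # Longest stretch of identical answers in a row (e.g. 'TTT' has length 3).
    longest_run = max((len(list(group)) for _, group in groupby(key)), default=0)
    # A key such as 'TFTFTF' is balanced but easy for testees to spot.
    alternating = len(key) > 2 and all(a != b for a, b in zip(key, key[1:]))
    return {
        "true": true_count,
        "false": false_count,
        "balanced": abs(true_count - false_count) <= 1,
        "longest_run": longest_run,
        "alternating": alternating,
    }


print(audit_key("TFTFTFTF"))  # balanced, but with a predictable pattern
print(audit_key("TTFTFFTF"))  # balanced, no regular pattern
```

A key that passes the balance check can still give answers away if it follows a regular order, so the sketch reports both properties separately.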

You should avoid:
• Negative statements
• Irrelevant clues (all and none should be used with caution)
• Qualifying clues (a long sentence)

SAQ 4 Rewrite the following items:

1. In Blake’s “The Lamb”, the lamb does not stand as a symbol of a child.

2. Only a few men have been elected presidents of the US, after having been defeated for that office.

Corect the items in the space provided above and compare them to those in the “Answers to SAQs” section at the end of the unit.

The following sentence may (may not) contain grammatical

errors. In front of each item are a T and a F. If the sentence is grammatically correct thoroughly, mark out the T. If it is not grammatically correct, mark out the F. TF 1. I heard you was a wedding party. TF 2. The rare tires wore out.


Sometimes, we want testees to identify the false element and correct it. Disadvantages: the speed of the response is reduced; fewer items can be constructed in a given period of time; scoring takes longer. Sometimes an answer not anticipated by the teacher is given (objectivity is reduced).

SAQ 5

Consider each statement below. Circle T if it is a do and F if it is a don't:

1. include items to adequately sample the material T F
2. include a Table of Specifications to assure adequate comprehension T F
3. use questions which are partially true and partially false T F
4. use unnecessary words and phrases T F
5. write concise, unambiguous and grammatically correct statements T F
6. have more than one theme in the item T F
7. have a pattern in the order of the response T F
8. have approximately the same number of true and false statements T F
9. use negative statements T F
10. use the qualifying terms: all, none, some, few, many T F

Compare your answers to those in the “Answers to SAQs” section at the end of the unit.

3.3.1.2. Entrance Tests

They are used to admit pupils to a certain school. They protect admitting institutions and student funding agencies from too high a failure rate. Tests are used by universities and other educational institutions to assess the proficiency and predict the readiness of applicants to benefit from instruction given in the foreign language.

Examples of entrance tests are the ACT (American College Testing Programme) and the SAT (Scholastic Aptitude Test), which is required for admission to many colleges in the USA. Applicants to law schools and medical schools must pass special admission tests, e.g. the LSAT (Law School Admission Test) and the MCAT (Medical College Admission Test).

3.3.1.3. Readiness Tests

They assess whether a child is ready to benefit from instruction in general or from instruction aimed at acquiring a certain skill, e.g. reading readiness.

3.3.1.4. Placement Tests

Closely related to the notions of diagnosis and selection is the concept of placement. A placement test is designed to place students at an appropriate level or stage in a programme or language course. Such tests are used to assign learners to groups at different levels. The term refers only to the purpose for which the test is used: various types of tests or testing procedures (e.g. dictation, interview, grammar test) can serve this purpose. A placement test typically assigns language learners to one of the following levels: beginner, lower intermediate, middle intermediate, upper intermediate, advanced. The UCLA Placement Exam is used to assign students to all levels and to identify students whose English proficiency is too low for participation in regular university instruction.

Placement tests designed by teachers are usually successful because they take into account the particular situation of their school. They should be simple, easy to administer and quick to mark.
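The assignment of learners to the five levels named above can be sketched in a few lines of Python. This is an illustration only: the cut-off scores, the 0 to 100 scale and the function names `place` and `group_by_level` are hypothetical and would have to be set by each school for its own test.

```python
from bisect import bisect_right

# Hypothetical cut-off scores (per cent) separating the five levels.
CUTOFFS = [30, 50, 65, 80]
LEVELS = ["beginner", "lower intermediate", "middle intermediate",
          "upper intermediate", "advanced"]


def place(score):
    """Return the level for a placement-test score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right counts how many cut-offs the score has reached.
    return LEVELS[bisect_right(CUTOFFS, score)]


def group_by_level(scores):
    """Assign each learner (name -> score) to a teaching group by level."""
    groups = {level: [] for level in LEVELS}
    for name, score in scores.items():
        groups[place(score)].append(name)
    return groups
```

Keeping the cut-offs in one list makes the test quick to mark and easy to adjust when the groups turn out too uneven.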

3.3.1.5. Diagnostic Tests

Diagnostic tests are designed to show what a learner knows and does not know, i.e. the strengths and weaknesses of the students' learning. Because they try to locate problem areas, diagnostic tests help teachers to design mastery learning and work out remedial activities. The data are also useful for self-assessment. For example, a pronunciation test may become a diagnostic test if it tries to identify which sounds a learner is or is not able to pronounce. Few tests serve only as diagnostic tests. Achievement and proficiency tests may be useful for diagnostic purposes.

Areas of focus that may serve for diagnostic purposes
• Phoneme discrimination tests
• Grammar and usage tests
• Controlled writing tests

They are usually used at the beginning of a language course.

Diagnostic tests are based on error analysis and deficiency analysis (on learners’ language deficiencies).

Diagnostic tests
• offer feedback to the learners
• are set after about eight hours of instruction
• are not longer than 15 minutes
• can be marked by the learners themselves
• motivate
• reduce anxiety about later summative tests
• quickly diagnose errors
• prevent “compound errors” of learning (a week’s poor learning makes the next week’s learning all the more difficult)
• are not used for grading or judging

Corrective help
• mastery learning also involves a self-correcting system
• retakes are allowed
• learners are advised to use appropriate instructional materials
• correction in group after the test
• out-of-class meeting to clear up difficulties
• encouraging family and friends to help
• corrective learning and retesting continue until mastery has been achieved


3.3.1.6. Progress Tests

A progress test is a small scale test (quiz) and it is an achievement test linked to a textbook/ a set of teaching materials. Progress tests are tests prepared by a teacher on the basis of a textbook/ curriculum/ a particular course of instruction given at the end of a unit, chapter, course or term. Besides being more specifically focused, they are narrower in scope than achievement tests. They are usually designed by the class teacher who can fully take into consideration the knowledge of the learners, the programme which they have been following, his/ her own particular aims and goals.

Teachers should learn how to construct such tests as they are extremely useful. Such tests
• are based on the language programme or curriculum (textbook, workbook) which the class has been following
• assess learning and teaching
• familiarize the teacher with the progress of each of his/ her students and of the whole class
• have a positive and motivating backwash effect
• reinforce what has been taught
• allow the learners to show what they have learned
• show high scores if progress has been made
• do not require a wide range of performance as in the case of standardized achievement or proficiency tests (the Gaussian, normal or bell-shaped curve, which shows the probability distribution of the values of a variable, does not apply in this case).

Progress tests are widely used, as they try to measure the extent to which the pupils have learned what has been taught. They are usually constructed by the class teacher, who can fully evaluate them, taking into account:
• his/ her knowledge of the students;
• the programme which they have been following;
• his/ her own particular aims and goals.

SAQ 6 Doing Well on Tests
Read each of the following questions and circle the correct answer.

1. When should preparation for taking a test begin?
a. when the teacher announces there will be a test
b. the first day of class
c. the night before the test

2. Which of the following is not an effective way to prepare for a test?
a. go over your notes, underlining key words and phrases
b. reread all of the material that might be covered on the test
c. write summaries and make outlines of what you have learned

3. Which method should you use when you take a test?
a. answer the more difficult questions first, since they will take the most thought
b. answer the questions in the order in which they are asked, so as not to take extra time deciding which question to answer first
c. answer the easiest questions first; then tackle the ones that are more difficult for you

Compare your answers to those in the “Answers to SAQs” section at the end of the unit.

3.3.1.7 Achievement or Attainment Tests

An achievement/ attainment test aims at showing how much of a language has been learned with reference to a particular study programme and in accordance with its explicitly stated objectives. It differs from the proficiency test, which is not linked to any language programme or language syllabus. The terms achievement and attainment are sometimes used interchangeably. Those who differentiate between them emphasize that the achievement test is based on past learning or a textbook, while an attainment test is based on what the learner can do now, irrespective of his/ her past learning. A more useful distinction is that between achievement tests and proficiency tests (related to a particular purpose). Achievement tests may be traditional or innovative. They may be used for certification of learned competence. Achievement tests may become diagnostic tests if they isolate learning deficiencies in the learner with the intention of remediation.

3.3.1.8. Mastery Tests

Formative assessment refers to the process of providing information to curriculum developers during the development of a curriculum or programme. It is also used in syllabus design and the development of language teaching programmes and materials. A formative test is given during a course of instruction. It provides feedback to the teacher and the student. It tests only what has been taught. The score shows whether the student needs extra work. The result is a pass or a fail. A person who fails is able to study more and take the test again. All tests in our schools should be criterion referenced. Criterion referencing is ideal for mastery objectives.

In order to avoid the usual 30 to 40 percent failure, teachers should adhere to formative assessment by
• allowing pupils as much individualized instruction as they feel they need;
• allowing them as much practice as they feel they need;
• defining the skill the learners need in order to pass;
• focusing practice on well-defined criteria in a checklist or a list of competences;
• identifying the causes of failure (gaps in the learning of the pupils);
• allowing any number of retakes.

B.S. Bloom has shown that as some people take five or six times longer than others to learn something, the solution is appropriate instruction. Intelligence and aptitude tests are measures of how quickly students can learn or measures of what they can learn.

The learner needs, according to B.S. Bloom:
• effort (the pupil should try hard enough for long enough);
• quality instruction;
• awareness of errors;
• more time (“the key to many learning situations”).

Characteristics of mastery testing
• in mastery testing, the person has either achieved (mastered) the objective satisfactorily or not
• typically, the objectives sampled in a mastery test are narrower
• mastery tests are used in programmes of individualized instruction where a mastery-learning model is employed
• the mastery-learning model suggests that the degree of learning is a function of the time spent on the material
• if the degree of learning is fixed at some mastery level, then the amount of time individuals must spend to reach this level will vary. The most rapid learners learn about 6 times as fast as the slowest learners
• mastery tests are useful at the early elementary school level
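The relation between time and degree of learning in the list above can be illustrated with a small Python sketch. The text only says that the degree of learning is a function of the time spent; the ratio of time spent to time needed used below is an assumed simplification chosen for the illustration, and the function names and the hours are hypothetical.

```python
def degree_of_learning(time_spent, time_needed):
    """Assumed simple form: share of the objective learned, capped at 1.0."""
    if time_needed <= 0:
        raise ValueError("time_needed must be positive")
    return min(1.0, max(0.0, time_spent) / time_needed)


def time_to_mastery(time_needed, mastery_level=1.0):
    """Time a learner must spend once the mastery level is fixed."""
    return mastery_level * time_needed


# Hypothetical learners: hours each one needs to master the same unit.
fast_hours = 2.0
slow_hours = 6 * fast_hours  # the "about 6 times" spread mentioned in the text
```

With a fixed lesson of two hours the fast learner reaches mastery while the slow learner has covered only about a sixth of the objective, which is why mastery learning varies the time rather than the standard.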

Mastery learning offers 90 percent success. How is mastery learning organized?

First stage – defining mastery objectives, i.e. objectives attainable by all the class after several hours of instruction and corrected practice

Writing mastery objectives
The learner should be able to list, to recall, to recognize …
The learner should be able to differentiate between, to summarize, to evaluate …

Mastery learning is based on the premise that all learners can master a subject given sufficient time.

Points to Ponder

There is a widespread belief that the success of Asian education systems depends in part on their “zero tolerance of failure”. They adopt a mastery learning style: “diagnostic testing” followed by corrective action. Geoffrey Petty

“The essence of mastery learning strategies is group instruction supplemented by frequent feedback and individualized corrective help as each student needs it.” B.S. Bloom, “Evaluation to Improve Learning”


Mastery learning implies

• individualized instruction (a step-by-step approach, especially for the corrected practice phase of learning; projects and open-ended activities allow learners to work at their own pace)
• competence only in the basic use of skills or knowledge (stretching activities that are not time-killers and that do not require extra teaching assistance are recommended)
• personalized extra work for students who experience difficulties
• peer tutoring
• access to reading materials at different levels
• awareness of what mastery learning means

3.3.2. Function of Content

A test may be based on a certain theory of language (e.g. structuralist, communicative) or a specific domain of content.

3.3.2.1 Proficiency Tests

Proficiency tests or attainment tests are theory-based tests. They are most often global measures of ability in a language and are not necessarily developed with reference to some previously experienced course of instruction. They are used for placement or selection because of their power to spread students out on a proficiency range within the desired area of learning.

What does a proficiency test mean?
• A summative test.
• At the beginning of the twentieth century it meant the present level of the learner’s proficiency, i.e. limited or advanced proficiency made up of a combination of reading and writing skills, but also translation abilities, grammatical knowledge, vocabulary range etc. The total or aggregate proficiency of the learner used to be assessed by combining the scores from separate tasks or sub-tests.
• After the Second World War the concept of unitary proficiency was favored, i.e. a proficiency test based only on a single factor, e.g. the concept of expectancy “grammar”, which samples only a limited range of skills (translation, grammar knowledge). It has been abandoned mainly because of its negative washback effect.
• After the 1980s the concept of proficiency was used to refer to the ability to do something with the language that has been learned. This concept is related to ESP: a doctor has to be proficient in English for medicine. The first test of proficiency was the Cambridge CPE, developed for foreign teachers of English. This is specific or applied proficiency, which does not contradict a communicative view of language. It has good face validity (it meets the needs of the learners) and a positive washback effect.

SAQ 7
What are the differences between a progress test and an achievement test?
Write your answers in the space provided above (in no more than 20 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

There are three concepts of language proficiency

• Aggregate proficiency – the learner’s present level of language mastery, as demonstrated in his/ her ability to carry out a range of language tasks. A test of this kind is also used as a summative or even as an achievement test. It has a good washback effect.

• Unitary proficiency – an underlying level of proficiency or language competence which can be applied to any language operation. Because a test of this kind uses only one technique (cloze, dictation, etc.) it has a negative washback effect. “Dirty tests” often adopt this unitary approach.

• Specific/ applied proficiency – an externally defined level of language needed for a particular job or academic course. This concept of proficiency is widely accepted. It has good face validity (e.g. an ESP test).
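The aggregate view, in which scores from separate sub-tests are combined into one result, can be sketched as a weighted average. This is a hedged illustration only: the sub-test names, the 0 to 100 scale, the weights and the function name `aggregate_proficiency` are assumptions, not a description of how any real examination is scored.

```python
def aggregate_proficiency(subtest_scores, weights=None):
    """Combine sub-test scores (0-100) into one aggregate score.

    subtest_scores: dict such as {"reading": 80, "writing": 60}.
    weights:        optional dict of hypothetical weights per sub-test;
                    equal weights are assumed when it is omitted.
    """
    if not subtest_scores:
        raise ValueError("at least one sub-test score is required")
    if weights is None:
        weights = {name: 1.0 for name in subtest_scores}
    total_weight = sum(weights[name] for name in subtest_scores)
    weighted_sum = sum(score * weights[name]
                       for name, score in subtest_scores.items())
    return weighted_sum / total_weight
```

A unitary test, by contrast, would report the score of a single technique (for example a cloze test) with no combination step at all.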

In discussions of proficiency testing, we refer to specific or applied proficiency. The main function of most proficiency tests is the same as that of placement/ selection testing. Proficiency tests have always been regarded as summative tests (given at the end of a language course). Initially, proficiency was seen as a complex combination of various skills: reading, writing, translation, grammatical knowledge, vocabulary range. The learner’s total or aggregate proficiency is the result of combining the scores from separate sub-tests. For instance, the Cambridge Syndicate Certificate of Proficiency in English – CPE (1913) consisted of five papers lasting about six hours.

Point to Ponder
Dictation is the bête noire of teaching methods, but students often find it quicker and easier than copying. However, dictation is a disaster for slow writers and bad spellers.

The second definition (Oller, 1979), unitary proficiency, is like Chomsky’s notion of linguistic competence: a single factor which can be applied to all aspects of language performance. If language proficiency is a single or unitary factor, then a proficiency test need only identify and assess that factor and that factor alone. For Oller, the factor was expectancy “grammar” (the fact that we are able to complete a sentence or to identify and correct a language form). A test based on a single factor is built on the theory of unitary proficiency.

Nowadays, proficiency means a person’s ability in using a language for a specific purpose. A proficiency test measures how much of a language someone has learned. A proficiency test is not linked to a particular syllabus or course of study, but measures the learner’s general level of language mastery. Although this may be the result of a particular course of instruction, the latter is not the focus of attention.

Some proficiency tests have been standardized for worldwide use, such as the American TOEFL Test which is used to measure the English language proficiency of foreign students who wish to study in the USA. The specific purpose is to answer the question: Does the student know enough English to follow a lecture in English?

Proficiency tests should be based on a specification of what candidates have to be able to do in a language in order to be considered proficient, i.e. able to function in a foreign language for a particular purpose. For example, a test may be designed to determine whether a candidate’s English is good enough to function as a guide or translator.

However, there are standard proficiency tests that do not aim at a certain occupation. They are administered to candidates from different schools:
• Cambridge Examinations: First Certificate Examination, Proficiency Examination
• Oxford Examination: Preliminary and Higher

They show whether a candidate has reached a certain standard with respect to certain abilities. Each proficiency test should be based on detailed specifications. The selection of the best candidates by teachers or employers should be based on these specifications. Proficiency tests are not based on courses that candidates might have previously attended. If an achievement test looks back (at what has been learned from a teaching programme or course), a proficiency test looks forward. It answers the question: Will the student be able to carry out a particular task which he/ she will be required to perform in English?

SAQ 8
The main function of most proficiency tests is the same as that of one of the other principal test types. Which?

• Aptitude testing
• Placement/ selection testing
• Diagnostic testing
• Progress testing
• Communicative testing

Circle the answer. Compare it with that in the “Answers to SAQs” section at the end of the unit.


Communicative Tests. Communicative testing aims at testing communicative proficiency. The items in this type of test are communicative events, i.e. items related directly to language use: authentic tasks, knowledge of language functions and of the appropriateness of expression to the social situation, and knowledge of structure and word meanings. Communicative testing allows the testee some choice of what to communicate or of the level of proficiency to be tested on in certain skills, and it uses up-to-date texts representative of the testee’s intended use of language.

Communicative testing aims at assessing communicative competence, which is made up of grammatical competence (knowledge of lexical items and rules of morphology), socio-cultural competence (knowledge of the relation of language use to its non-linguistic context and communicative functions, coherence and cohesion), strategic competence (verbal and non-verbal communication strategies that may be called upon to compensate for breakdowns in communication due to performance variables or to insufficient knowledge), and discourse competence. This model has exerted a considerable influence on all aspects of language teaching and assessment, including overall approach, syllabus design, methodology and testing.

The main characteristics of the communicative tests

Communicative tests are integrative rather than discrete point. Communicative testing addresses the full range of language skills:

• Administratively costly as they involve large numbers of markers
• The use of yardsticks reduces the subjective elements: 9 – expert user, 8 – very good user, 7 – good user, 6 – competent user, 5 – modest user, 4 – limited user, 3 – extremely limited user, 2 – intermittent user, 1 – non-user. Each band has a brief description of the expected language performance
• They will test knowledge of the language rather than knowledge of the elements of the language
• The students will have to produce language rather than simply recognize appropriate language
• The learners will have to respond by using their own language rather than merely the examiner’s language
• Are tests of actual performance
• Realistic. There is a real world purpose
• Authenticity
• Real world tasks
• Based on an information gap
• The learner uses strategies which are part of the communicative competence

Examples of simple communicative items:
a) Show that you can’t, or don’t, believe what the other person is telling you.
b) Show you are annoyed. Use some phrases which are strong and firm but without swearing.

Answers:
a) Come off it! You’re pulling my leg. That’s not true. You are having me on. You can’t really mean that!
b) Push off! Get lost! I’ve had enough! Just stop it. Leave me alone! I’ve already told you I don’t want to discuss that. Who on earth told you that? Where on earth have I put my keys (books) etc?

3.3.2.2 Achievement Tests

Achievement tests are related to a total syllabus or to classroom lessons or units. Most school examinations – secondary school entrance tests, school certification examinations – take the form of achievement tests. National examinations organized by the Romanian Ministry of Education and Research are achievement tests.

SAQ 9
Identify the main characteristics of proficiency tests in the modern sense of the word. Tick the correct statements.
Proficiency tests:

a. are based on a language syllabus
b. look forwards
c. look backwards
d. are based on an analysis of the language learner
e. have a positive washback effect
f. have a negative washback effect
g. are standardized
h. are made by teachers
i. are used for diagnostic purposes
j. are used for certificates
k. are used for selection

Compare your answers with those in the “Answers to SAQs” section at the end of the unit.


Innovative (communicative) achievement tests look backwards and forwards. They look backwards because they are based on a textbook/ syllabus that has been taught/ learned. They look forwards because they test whether the learner is able to transfer his/ her language knowledge or abilities to the world outside the walls of the classroom.

Revision tests. You can also ask students to learn for a short test. This works best if the “doing” detail is very precise. For example, don’t say “Read for a test next week”, but “I want you to learn the following…”. The learning task should be very well defined. Then, there is no excuse for failure. The revision test should be achievable by the great majority. The success should be rewarded with praise and recognition. Students get a real feeling of achievement from success in tests and this fuels future motivation.

3.3.2.3 Aptitude/ Prognostic Tests

Aptitude (prognostic) tests are based on abilities that are related to the process of acquisition, rather than the use, of language. Such tests measure the testee’s probable performance in a language he/she has not studied. They measure the suitability of a testee for a specific programme of instruction or a particular job. Sometimes the term is used synonymously with intelligence tests or scientific tests.

The theory of language aptitude, as described by Carroll (1956-1981) hypothesizes that cognitive abilities such as rote memorization, phonetic coding, and the recognition of grammatical analogies are related to an individual’s ability to learn a second or foreign language, and together constitute language aptitude. Language learning aptitude is also related to intelligence, age, motivation, phonological sensitivity and sensitivity to grammatical patterning. As you know, all these elements vary greatly from one pupil to another.

SAQ 10

a. Achievement tests are sometimes described as summative. Why?

b. How do we judge good achievement tests? Write your answers in the space provided above (in no more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


The Modern Language Aptitude Test (Carroll and Sapon, 1958) and Pimsleur’s Language Aptitude Battery have recently been severely criticized: they do not measure language aptitude but general intelligence or academic ability. These tests disregard learning strategies and styles, context, motivation and determination. A language aptitude test may be used to predict the likelihood of success of a candidate for instruction in a modern language. It is made up of several different tests that measure:
• sound coding ability, i.e. the testee has to identify and remember new sounds in a foreign language;
• grammatical coding ability: the testee has to identify the grammatical functions of different parts of sentences;
• inductive learning ability: learners are left to discover or induce rules from their experience of using the language, i.e. meanings are induced without explanations;
• memorization: the ability to remember words, sentences and rules in a foreign language.

3.3.3. The Frame of Reference

The results of a test can be interpreted in two different ways, depending on the frame of reference adopted:

3.3.3.1. if the frame is the performance of a particular group of individuals, then we may speak of norm-referenced tests (NRT) / psychometric tests (Cziko: 1981)
3.3.3.2. if the frame of interpretation is domain-referenced, i.e. interpreted with respect to a specific level/ ability, we may speak of criterion-referenced tests (CRT) / edumetric tests (Cziko: 1981)

There is no essential difference between CRT and domain-referenced tests. Both of them differ from objectives-referenced tests (in which items are selected to match objectives directly, without reference to a pre-specified domain of target behaviors).

SAQ 11 Read, reflect and take a decision. Would you administer your 3rd form pupils an aptitude test? Yes? No? Justify your decision. Write your answers in the space provided above (in no more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


3.3.3.1 Norm-Referenced Tests

A norm-referenced test compares candidates with each other and usually rewards the best. Why norm-referenced? Because marks show how the testee does compared with the norm, or average, for all the testees. If a test result shows that the testee obtained a score that placed him/ her in the top ten per cent of candidates or in the bottom five per cent, or that he/ she did better than 70% of those who took the test, we may say that the test is norm-referenced. The testee’s score relates one candidate’s performance to that of the other candidates. However, the score does not tell us directly what the testee is capable of doing in the language. You must not forget that, for statistical reasons, norm-referenced tests work effectively only for examinations with at least a few hundred testees. The percentage of candidates getting each grade remains unchanged, regardless of their marks, unless a conscious decision is made to change the percentages. It follows that variation in the difficulty of tests/ exams from year to year does not affect grades, but the method is sometimes unfair, as testees may be better one year than the next. Norm-referencing was used for deciding grades for admission exams in the Romanian system of education when a limited number of students were accepted to higher forms of learning. In England, norm-referencing is used for deciding A-level and GCSE grades.
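Two ideas from the paragraph above lend themselves to a short Python sketch: reporting a score as the percentage of testees who did worse, and awarding grades by fixed percentages so that a harder paper leaves the grades unchanged. The sketch is illustrative only; the function names, the grade labels and the quota shares are hypothetical and do not describe any real examination board.

```python
def percent_below(score, all_scores):
    """Percentage of testees whose score is lower than `score`."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    lower = sum(1 for s in all_scores if s < score)
    return 100.0 * lower / len(all_scores)


def grade_by_quota(scores, quotas):
    """Award grades by rank order only.

    scores: dict name -> mark.
    quotas: list of (grade, share) from best to worst; shares add up to 1.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades = {}
    start = 0
    cumulative = 0.0
    for grade, share in quotas:
        cumulative += share
        end = round(cumulative * len(ranked))  # size of each band is fixed in advance
        for name in ranked[start:end]:
            grades[name] = grade
        start = end
    for name in ranked[start:]:  # any rounding leftovers get the last grade
        grades[name] = quotas[-1][0]
    return grades
```

Because only the rank order matters, subtracting the same number of marks from every candidate gives exactly the same grades, which is the sense in which the difficulty of the paper "does not affect grades".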

Characteristics of Norm-Referenced or Standardized Tests:
• must have been previously administered to a large sample of people
• acceptable standards of achievement can only be determined after the test has been developed and administered (by reference to the mean or average score of other students)
• items at various levels of difficulty are included
• discriminate between low and high achieving students
• good reliability and validity
• norm-referenced measurement is necessary to make different predictions
• if learners differ in achievement levels, this normative information can often assist in decision-making
• norm-referenced testing is often considered a substantial component of program evaluation
• whether one uses norm- or criterion-referenced measurement depends upon the kind of decision one wishes to make

Weaknesses:
• norms change with time as the characteristics of the population change, and therefore tests must be re-normed
• developed independently of any particular course of instruction

Point to Ponder

“Never underestimate the pleasure, satisfaction and educational value which pupils get from satisfactorily completing an action however simple.” (Michael Marland – The Craft of the Classroom)


3.3.3.2 Criterion-Referenced Tests

Criterion-referenced tests measure what the testee can do, awarding a pass if he/ she can do it and a fail if he/ she cannot. It does not matter if all the candidates pass or if all of them fail. A clear example is the driving test. This method of assessment is reliable only if the criteria are well defined (e.g. in a checklist or a list of competences); otherwise, different markers will apply different standards, or the same marker may apply different standards on different days and to different candidates. This method is appropriate for mastery objectives.

Strengths of criterion-referenced tests
• set criteria meaningful in terms of what people can do
• the criteria do not change with different groups of candidates
• motivate learners to reach these criteria
• they have a beneficial backwash effect
• they are helpful in clarifying objectives
• useful with small groups
• test anxiety is reduced
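The pass/ fail decision against a checklist described at the start of this section can be sketched as follows. This is a minimal illustration under stated assumptions: the three criteria, the rule that every criterion must be met, and the function names `criterion_result` and `results_for_group` are all hypothetical.

```python
# Hypothetical checklist of competences for one speaking task.
CRITERIA = ("asks for directions", "understands the reply", "thanks the speaker")


def criterion_result(observed):
    """Judge one candidate against the checklist.

    observed: dict criterion -> True/False as recorded by the marker.
    Returns ("pass" or "fail", list of criteria not yet met).
    """
    missing = [c for c in CRITERIA if not observed.get(c, False)]
    return ("pass" if not missing else "fail", missing)


def results_for_group(group):
    """Each candidate is judged against the criteria, never against the others."""
    return {name: criterion_result(obs)[0] for name, obs in group.items()}
```

The list of unmet criteria returned for a failing candidate is what makes retakes and corrective help possible, and nothing in the code prevents the whole group from passing or failing.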

Weaknesses
• many criterion-referenced tests are shorter and therefore less reliable than norm-referenced tests
• students are unable to compare their performance with that of other students
• bright students, who easily attain the level of mastery, may not be motivated to reach high standards
• the results do not inform decision makers whether children achieve what they should when they should

SAQ 12
The following list summarizes the chief objectives of language testing:

1. to determine the readiness for instructional programmes
2. to classify or place individuals in appropriate language classes
3. to diagnose the individual’s specific strengths
4. to measure aptitude for learning
5. to measure the extent of student achievement of the instructional goals
6. to evaluate the effectiveness of instruction

Group these six objectives under three headings:

1. Aptitude test
2. General Proficiency test
3. Achievement test

Compare your groups with those in the “Answers to SAQs” section at the end of the unit.

Points to Ponder

• Learners require some reward or reinforcement for learning.
• Reinforcement should follow the desired behavior as soon as possible.
• Learning proceeds step by step rather than happening all at once, and it is strengthened by repeated success.
• Self-assessment is preferable to teacher assessment.

3.4 Summary

This unit has been concerned with informal assessment and formal assessment. Categories of tests have been introduced taking into account the purpose and the content of the tests and the frame of reference within which their scores are to be interpreted. The following types of tests have been introduced: selection tests, entrance tests, placement tests, diagnostic tests, achievement/attainment tests, mastery tests, proficiency tests, aptitude or prognostic tests, norm-referenced tests and criterion-referenced tests. Norm-referenced tests are used to interpret the score of an individual by comparing it with those of other individuals. Criterion-referenced tests are used to interpret a person’s performance by comparing it to some specified behavioural criterion.
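The difference between the two frames of reference can be illustrated with a short Python sketch. The learners' names, the scores, the pass mark of 60 and the 40% quota are hypothetical values chosen only for illustration; they are not taken from any real test.

```python
def norm_referenced(scores, top_fraction=0.4):
    """Select the top fraction of candidates by rank, whatever their raw scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = round(len(ranked) * top_fraction)
    return set(ranked[:cutoff])


def criterion_referenced(scores, pass_mark=60):
    """Pass every candidate who reaches the fixed criterion."""
    return {name for name, score in scores.items() if score >= pass_mark}


# Hypothetical scores out of 100
scores = {"Ana": 82, "Dan": 75, "Ioana": 64, "Mihai": 58, "Elena": 45}

selected = norm_referenced(scores)      # depends on how the other candidates scored
passed = criterion_referenced(scores)   # depends only on the pass mark

print(sorted(selected))
print(sorted(passed))
```

In this sketch Ioana reaches the criterion but is not selected under the quota: the same score leads to different decisions depending on the frame of reference used.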

SAQ 13
Suppose you give 200 learners a test, choosing the best 40% to attend a good high school and the remaining 60% to attend a vocational school. Is this test:

• criterion-referenced

• norm-referenced

• motivating

• demotivating

Tick the right answers. Compare your answers to those in the “Answers to SAQs” section at the end of the unit.


3.5 Key Concepts

• Achievement/attainment test
• Aptitude test
• Communicative test
• Criterion-Referenced test
• Diagnostic test
• Entrance test
• Informal assessment
• Norm-Referenced test
• Mastery test
• Placement test
• Proficiency test
• Progress test
• Readiness test
• Selection test

3.6 Checklist

What is the purpose of the test? Why am I giving it?
What skills, knowledge, and so on, do I want to measure?
Have I clearly defined the instructional objectives?
Have I prepared a table of specifications?
Do the test items match the objectives?
What kind of test/test format do I want to use? Why?
How long should the test be?
What do I need to do to prepare learners for taking the test?
How are scores/grades or levels of competency to be assigned?
How are the test results to be reported?

3.7 Answers to SAQs

SAQ 1

Your answer depends upon your personal teaching and learning experience. Formal assessment: written grammar activities, reading tasks, listening tasks, vocabulary activities. The others are generally assessed informally.

SAQ 2

Your answer depends upon your personal teaching and learning experience. A – I B – I C – F

SAQ 3

Your answer depends upon your personal teaching and learning experience. a. L, b. H, c. L, d. L, e. H


SAQ 4

If your answer to SAQ 4 is not comparable to the one suggested below, please reread section 3.3.1.1.2.
1. In Blake’s “The Lamb”, the lamb stands as a symbol of innocence.
2. Only three men have been elected President of the U.S. after having been defeated for that office in the preceding general election.

SAQ 5

If your answer to SAQ 5 is not comparable to the one suggested below, please reread section 3.3.1.
False: 3, 4, 6, 7, 9, 10 (Don’ts)
The rest are true (Do’s)

SAQ 6

Your answer depends upon your personal experience / common sense. 1. b, 2. b, 3. c

SAQ 7

If your answer to SAQ 7 is not comparable to the one suggested below, please reread sections 3.3.1.6 and 3.3.1.7.
Progress tests:
• look back over a period of learning
• are small-scale tests
• are used for diagnostic purposes
• are administered during the language programme
• are devised by the teacher
• have no pass/fail purpose

SAQ 8

If your answer to SAQ 8 is not comparable to the one suggested below, please reread section 3.3.1.

Placement/ selection testing

SAQ 9

If your answer to SAQ 9 is not comparable to the one suggested below, please reread section 3.3.2.1.

b, c, g, h

SAQ 10

If your answer to SAQ 10 is not comparable to the one suggested below, please reread section 3.3.2.2.

a. Achievement tests are sometimes described as summative, that is, they sample the total language syllabus at the end of the course, after a term, or at the end of the school year.
b. The most important criterion is content validity. They must sample the language syllabus fully and fairly. They should not test anything which has not been taught.

SAQ 11

The answer is definitely “no”. A justification may be found in the following paragraph:

“How is one to interpret a language aptitude test? Rarely does an institution have the luxury or freedom to test people before they take a foreign language to counsel certain people out of their decision to do so. So, an aptitude test biases both student and teacher. They are each led to believe that they will be successful or unsuccessful, depending on the aptitude test score, and a self-fulfilling prophecy occurs. It is better for teachers to be optimistic for students, and in the early stages of a student’s process of language learning, to monitor styles and strategies carefully, leading the student toward strategies that will aid in the process of learning and away from those blocking factors that will hinder the process.” (Brown: 1994, p. 261)

SAQ 12

If your answer to SAQ 12 is not comparable to the one suggested below, please reread sections 3.3.1.7, 3.3.2.1 and 3.3.2.2.

Aptitude test: 4
General Proficiency test: 1, 2, 3
Achievement test: 5, 6

SAQ 13

If your answer to SAQ 13 is not comparable to the one suggested below, please reread sections 3.3.3.1 and 3.3.3.2.

Norm-referenced, demotivating

3.8 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp 4-10
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press, pp 9-22, 48-59, 59-75


Unit 4
TYPES OF TESTS II

4.1 Unit Objectives
4.2 Formal Assessment - Types of Tests and Testing
4.2.1 Scoring Procedures
4.2.1.1 Subjective Tests
4.2.1.2 Objective Tests
4.2.1.3 Performance Tests
4.2.2 The Specific Technique or Method They Employ
4.2.2.1 Multiple Choice, Completion, Dictation, Cloze Tests
4.2.3 The Approach to Test Construction
4.2.3.1 Direct Tests
4.2.3.2 Indirect Tests
4.2.4 Function of the Number of Elements Tested at a Time
4.2.4.1 Discrete Point Tests
4.2.4.2 Integrative Tests
4.2.5 Speed Tests vs. Power Tests
4.2.6 Other Test Categories
4.3 Self-Assessment
4.4 Standardized Tests
4.5 Summary
4.6 Key Concepts
4.7 Checklist
SAA 2
4.8 Answers to SAQs
4.9 Further Readings

4.1 Unit Objectives

The history of language testing may be divided into three periods:
• the prescientific period (prior to the early 1950s)
• the psychometric-structuralist period (from the early 1950s through the late 1960s)
• the integrative-sociolinguistic period (from the late 1960s to the present time)

This unit aims at covering the last two periods in the history of testing and the new trend of self-assessment. It implicitly reflects teachers’ and researchers’ striving for objectivity.

At the end of this unit, you will be able to:
• distinguish among various objective tests
• understand why clarity of expression is so important in test items
• recognize how irrelevant clues to the correct answer can easily creep into objective items
• define and discuss the following objective-type formats: short answer, matching, true-false and multiple-choice
• apply the guidelines offered for constructing objective tests
• write better objective tests

4.2 Formal Assessment - Types of Tests and Testing

Language tests can also be classified according to several distinctive criteria:

4.2.1. The way in which they are scored
4.2.1.1. objective tests
4.2.1.2. subjective tests
4.2.2. The specific technique or method they employ
4.2.2.1. performance tests
4.2.2.2. multiple choice, completion, dictation, and cloze tests
4.2.3. The approach to test construction
4.2.3.1. direct tests
4.2.3.2. indirect tests
4.2.4. The number of elements tested at a time
4.2.4.1. discrete point tests
4.2.4.2. integrative tests

4.2.1. Scoring Procedures

According to their scoring procedures, tests may be:
• objective (the correctness of the test taker’s response is determined entirely by predetermined criteria, so that no judgment is required on the part of the scorers: multiple choice tests, cloze tests and dictation)
• subjective (the scorer must make a judgment about the correctness of the response based on his/her subjective interpretation of the scoring criteria: oral interviews or written compositions)

The objective-type item was developed in response to the criticisms leveled against essay questions: limited content sampling, unreliable scoring, time-consuming grading, and encouragement of bluffing. All objective-item formats may be subdivided into two classes:

• supply type (short answer)
• select type (true-false, matching, and multiple choice)

These two types are sometimes called recall and recognition.


4.2.1.1. Subjective Tests

A subjective test requires scoring by opinionated judgment on the part of the scorer. An example might be the scoring of free, written compositions for the presence of creativity when no definition of creativity is provided. Many tests, such as cloze tests that permit all grammatically acceptable responses to systematic deletions from a context, lie between the extremes of objectivity and subjectivity. It is true, however, that some subjective tests may be made more objective in scoring. This can be done by using a precise rating schedule that clearly specifies the kinds of errors to be quantified, or by using multiple independent raters.

4.2.1.2. Objective Tests

Objective tests can be marked without the use of the examiner’s personal judgment. The correctness of the testee’s response is determined entirely by predetermined criteria: examinee’s responses are compared with a scoring key.

Advantages of objective tests:

• Have only one correct answer
• Are scored mechanically; no particular knowledge or training in the examined content area is required on the part of the scorer
• All have the students working in a completely structured situation and responding to a large number of items
• Many questions can be asked during the examination period and more adequate content sampling can be obtained
• Objective items may create an incentive for pupils to build up a broad base of knowledge, skills and abilities
• May be marked by computer
• Require more careful preparation than subjective examinations
• Can be made just as easy or as difficult as the test constructor wishes
• Can be pre-tested before being administered on a wider basis
• Encourage guessing (4 or 5 alternatives for each item are sufficient to reduce the possibility of guessing)
• May emphasize irrelevant areas just because they are “testable”
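Mechanical scoring against a key, and the effect of the number of alternatives on guessing, can be illustrated with a brief Python sketch. The key, the responses and the function names below are hypothetical and serve only as an illustration.

```python
def score_with_key(responses, key):
    """Count the responses that match the scoring key, item by item."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def expected_chance_score(n_items, n_options):
    """Average score a testee would obtain by blind guessing."""
    return n_items / n_options


# Hypothetical five-item test
key = ["b", "d", "a", "c", "b"]
responses = ["b", "d", "c", "c", "a"]

print(score_with_key(responses, key))   # items 1, 2 and 4 match the key
print(expected_chance_score(20, 4))     # 20 four-option items
print(expected_chance_score(20, 2))     # 20 true/false items
```

No judgment is involved in the comparison with the key, which is why such tests can be marked by an untrained scorer or by computer. The second function shows why more alternatives per item reduce the share of the score that blind guessing alone can produce.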

SAQ 1
What is more subjective?

• The scoring of an essay
• The scoring of short answers in response to questions about a reading passage

Circle your answer. Compare your choice with that in the “Answers to SAQs” section at the end of the unit.


Drawbacks. Some critics of the objective-type item contend that objective tests:
• do not measure the higher mental processes, but rather encourage rote memorization
• encourage guessing
• neglect the measurement of writing ability

Point to Ponder
Everything it is possible for us to analyze depends on a clear method which distinguishes the similar from the not similar.

Linnaeus, Genera Plantarum, 1754

Objective items can be written so that they measure the higher mental processes of understanding, application, analysis and interpretation.

Guidelines for writing objective tests
• test for important facts and knowledge
• tailor the questions to fit the examinees’ age and ability levels as well as the purpose of the test
• write the items as clearly as possible. Ambiguity can often occur when qualitative rather than quantitative language is used (few, many, low can mean different things to different pupils)
• clarity can also be improved by using good grammar and sentence structure
• avoid using interrelated items (the correct answer may be found in another item)
• there should be only one correct answer
• avoid negative questions whenever possible (not, never, least)
• do not give the answer away

It is obvious that objective techniques are preferable in terms of practicality (speed of scoring) and reliability (consistency of scoring). Usually, tests of the so-called productive skills (writing and speaking) are scored subjectively. They are based on the teacher’s judgment or impression of the language output. The lack of reliability is evidenced by:
• low inter-rater reliability (the scores of the two markers do not correlate)
• low mark-remark reliability (the same marker gives different scores to the same test if asked to remark it after a while)

The first generation of tests used subjective scoring: impression marking. The second generation restricted the use of subjective tests. The third generation developed more reliable subjective techniques.
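One common way to check whether the scores of two markers correlate is the Pearson correlation coefficient. The Python sketch below is only an illustration: the two sets of marks are invented, and the choice of Pearson's coefficient is an assumption, since the unit does not prescribe a particular statistic.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical marks given by two markers to the same five compositions
marker_a = [7, 5, 9, 6, 8]
marker_b = [6, 5, 9, 7, 8]

r = pearson(marker_a, marker_b)
print(round(r, 2))
```

A coefficient close to 1 suggests that the two markers rank the papers in much the same way; a value near 0 indicates the lack of inter-rater reliability described above. The same function can be applied to one marker's first and second marking of the same papers to estimate mark-remark reliability.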

Examples of objective test items (excluding multiple-choice items)


1. Conversions
Helen is a very good learner of English.
Helen learns ….

Change the following sentences into questions:
I am a student.
We can work together.

2. Gap-filling
He will come at half … eight.
… high-speed computer is always expensive.
I always take … magazine with me.
… the United States, 30 million people have successfully kicked the habit of smoking … 1964, and only one … three Americans now smoke.

3. Combination
• Combine the following sentences into one sentence without using and or but:
Do you see that traffic policeman? He is stopping the cars. He is letting the school children cross the road.
Answer: Do you see that traffic policeman stopping the cars to let the school children cross the road?
Helen did her homework. Then she went swimming.
After ….
That cheese pie is hard to resist. I’m on a strict diet.

4. Addition
Yet …. Haven’t seen this film.

5. Rearrangement or Ordering
At/ poor/ look/ that/ woman/ old
I had seen/ immediately reminded me/ John’s face/ in a zoo/ of a silly monkey.

6. False/True
• Circle in the margin A if the sentence is true, B if the sentence is false.
A magazine is a newspaper shop. A, B
A library is a bookshop. A, B

7. Insertion
Identify a missing word. Find the place and write in the word:
You should go to see it. It’s best film I’ve ever seen.

8. Alternative response
Circle your answer:
Do you enjoy to watch television? Correct/ incorrect

9. Sentence completion
I hope you … if I leave. (free response)

10. Rewriting
Rewrite in reported speech:
“I think you’re great”, he said.

11. Correction
Correct the grammar:
Is that the woman which lives next door to you?


12. Short-Answer Items
They are something of a cross between the essay and other objective items. On the one hand, like essay items, they require recall rather than recognition. On the other hand, they can be objectively scored. Short-answer items are best for dates, names, places and vocabulary. Objectivity in scoring adds greater reliability to the results.

4.2.1.3. Performance Tests

Performance tests are tests of skill: the skill with which learners can identify objects, manipulate objects, perform assigned tasks, or react to simulated situations. Tests of performance are often found in tests of typing speed and in simulated-situation tests (the typing of business letters, personal letters). The technique is often used to assess skills in areas such as chemistry, physics, and foreign languages. Performance testing has long been neglected in classroom testing, as teachers are preoccupied with tests of verbal behavior. Performance tests are:
• used to measure the effectiveness of final behavior
• used at early grade levels, before paper and pencil tests can be effectively used (handwriting performance tests; sentences to be copied)

With very young examinees or with those who cannot write, it may be necessary to use an oral response format. Identification tests include tasks that may be presented orally or visually (from a reproduction on the exam paper). Examinees are asked to respond orally or in writing. In the latter case, responses may be short answers, completion, multiple choice and matching.

SAQ 2
Which is generally more suitable for each of the following language areas: an objective or a subjective test?

Language area      Objective    Subjective
Pronunciation      [ ]          [ ]
Vocabulary         [ ]          [ ]
Grammar            [ ]          [ ]
Discourse          [ ]          [ ]
Listening          [ ]          [ ]
Speaking           [ ]          [ ]
Reading            [ ]          [ ]
Writing            [ ]          [ ]

Tick your answers. Compare them to those in the “Answers to SAQs” section at the end of the unit.


Point to Ponder Darwin concluded that he needed to keep a notebook and pencil with him at all times, as he found that he remembered evidence in favour of his theories, but quickly forgot evidence against them! We too tend to have selective memories about our students’ work and behavior.

Geoffrey Petty

4.2.2. The Specific Technique or Method They Employ

4.2.2.1. Multiple-Choice Tests

The multiple-choice item is the most popular and useful of all objective item types. It can be used to measure rote memory as well as complex skills. It is simple to score and administer.

SAQ 3
Objective testing uses a variety of techniques. Classify them into discrete point techniques and integrative techniques.

Techniques              Discrete point techniques    Integrative techniques
Transformation          [ ]                          [ ]
Fill-in-the-blanks      [ ]                          [ ]
Blank and cue           [ ]                          [ ]
Joining elements        [ ]                          [ ]
Replacing elements      [ ]                          [ ]
Adding elements         [ ]                          [ ]
Arranging elements      [ ]                          [ ]
Matching elements       [ ]                          [ ]
True/false              [ ]                          [ ]
Multiple choice         [ ]                          [ ]
Cloze                   [ ]                          [ ]
Dictation               [ ]                          [ ]
Information transfer    [ ]                          [ ]

Tick your choices in the space provided and compare them to those in the “Answers to SAQs” section at the end of the unit.


Stages in Constructing the Test Items
• Consider the purpose of the test (mastery or discrimination) and the item’s specific instructional objective
• Determine the actual areas to be covered by multiple-choice items. Make a careful list of exactly which items it is desirable to include (table of specifications)
• Determine the number of items to be included in the test
• Include enough items to ensure reliability, depending on the level of difficulty, the nature of the areas being tested, and the purpose of the test
• Avoid tests that are too long (a source of administration difficulties, mental strain and tension)
• Have someone read the tentative items before preparing them in final form

The Structure of a Multiple Choice Test

• The initial part is the stem. The stem can be either a direct question or an incomplete statement which can be completed correctly. It contains the problem and sets an appropriate frame of reference. It must include all the conditions and limitations needed in order to respond. The direct question format has several advantages:
1. it forces the item writer to state the problem clearly in the stem
2. it reduces the possibility of giving the examinees grammatical clues
3. it may be more easily handled by younger and less able students because less demand is placed on good reading skills
One disadvantage of the direct question form is that it may require lengthier responses.
• The choices offered to the testee are referred to as options, responses, or alternatives.
• One option is the answer, correct option, or key. A multiple-response question can have more than one correct answer among the options. In this case, the pupil may be told how many there are, and is required to identify all of them to gain the mark.
• The other options are distracters (or foils). The role of the distracter/foil is to divert or distract the majority of learners from the correct option.
• Use the active, present tense in stems and options.
• Avoid using double negatives in either the stem or the options. Also be careful with negative questions. For example, if asking “Which of the following is not true?” or “Which is an exception to the rule?”, make it really stand out that it is a “wrong” option that has to be selected. You should be aware that testees usually look for correct options. It follows that it is much better to write “Which of the following is …” rather than “Which of the following is not …”.
• Use simple, direct sentences, careful layout, and appropriate emboldening.
• After the test has been constructed and carefully checked for ambiguities, inappropriate vocabulary, unintentional comprehension difficulty or obscurity, unlikely meaning, and a regular pattern in the order of correct choices (e.g. ABCABC), it should be passed on to another person to be read and checked for these weaknesses.
• Incorporate in the stem all words which otherwise would have to be repeated in each alternative.
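The check for a regular pattern in the order of correct choices can be partly automated. The following Python sketch is one possible way of doing it; the function names and the answer keys are hypothetical, and the limit of four positions per cycle is an arbitrary assumption.

```python
from collections import Counter


def key_position_counts(key):
    """How often each option letter is the correct answer."""
    return Counter(key)


def has_repeating_cycle(key, max_period=4):
    """True if the key is one short sequence repeated, e.g. ABCABC."""
    for period in range(1, max_period + 1):
        if len(key) >= 2 * period and all(
            key[i] == key[i % period] for i in range(len(key))
        ):
            return True
    return False


print(has_repeating_cycle("ABCABC"))    # the pattern warned against above
print(has_repeating_cycle("ACBDAB"))    # no short cycle
print(key_position_counts("ACBDAB"))
```

Such a check does not replace a second reader, who is still needed to spot ambiguity, obscurity and unlikely meanings; it only flags answer keys that testees could predict from position alone.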

Point to Ponder “God himself does not presume to judge a man till the end of his days. Why then should you or I?” (Ben Jonson)

Weaknesses

The multiple-choice test is both blamed and praised. It is mainly blamed because it:
• tests only factual knowledge (in fact, it can also measure understanding, application of principles, analysis, synthesis and evaluation);
• penalizes the “creative” student (research in this area has failed to support this assumption; in fact, a creative student may be better off with selection-type tests than with supply-type tests);
• is difficult to write: good multiple-choice items can be produced only by skillful individuals, and teachers cannot always think of plausible-sounding distracters;
• tends to be written by teachers so as to demand only factual recall;
• requires more time to answer than a true-false test;
• cannot be used to measure the testee’s ability to organize material or to express answers clearly according to acceptable language usage rules (the only solution in this case is to complement it with an essay-type examination).

SAQ 4
A paper is scored by two different scorers, or scored by the same scorer again after two weeks. The scores are similar in both cases. Comment upon: reliability, objectivity, subjectivity. Write your answers in the space provided (in no more than 30 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


Strengths

The multiple-choice test is the most flexible and versatile of all selection-type examinations. It can be used to measure instructional objectives at all levels of the cognitive domain (i.e. knowledge, comprehension, application, analysis, synthesis, and evaluation). It is also extremely versatile: multiple-choice tests may be designed for all subject matters and for learners at all grade levels.

• A large number of items can be answered during a brief period of time. It can have a relatively small content sampling error if a table of specifications is carefully designed and used
• There are no scoring errors
• It may be scored rapidly, accurately, and objectively even by individuals who are unqualified (secretaries, student assistants)
• The scoring is not influenced by the previous performance or by the personal appearance of the testee
• Scoring is completely objective

Example:
The first version
The language of the earliest literature of England was:
a) difficult;
b) French;
c) Anglo-Saxon;
d) Latin.

The improved version
The language of the earliest literature of England was:
a) midland;
b) French;
c) Anglo-Saxon;
d) Latin.

Example:
1. The works of Mihai Eminescu had a lasting effect on the development of modern Romanian poetry. – STEM
A. an enduring – correct option/key
B. an unknown – distractor/foil
C. a startling – distractor/foil
D. a final – distractor/foil

2. Where is water found? – STEM
A. in the air – foil/distractor
B. on the earth’s surface – foil/distractor
C. in the ground – foil/distractor
D. all of the above – KEY RESPONSE

Principles of construction

• Each multiple-choice item should have only one answer that is absolutely correct, although some instructions require choosing the best option. There must be no ambiguity in the choices.
• Test one feature at a time (it is less confusing for the testee and it helps to reinforce a particular teaching point). Items that test more than one feature at a time are called impure items. Test constructors use impure items because of the limited number of distractors.
• Each option should be concise, unambiguous, and grammatically correct when inserted in the stem (except in the case of specific grammar test items). Phrases or words not required to express the question or to set an appropriate frame of reference should be eliminated. Difficult and technical words and phrases should be eliminated. Figurative language and complex sentences should be avoided. Do not provide clues through tense conflicts, misuse of articles, or singular-plural conflicts between verbs and nouns. Each alternative must linguistically fit the stem.
• The vocabulary and general construction of each sentence should be at a level that the learners can easily comprehend, so that they are not distracted from the real task (for instance, deciding on tense and aspect).
• In a listening test, the correct choice should not repeat word for word a sentence from the listening text
• Avoid using stems ending with the article a or an, or with a preposition
• The correct choice should not depend on comprehension or non-comprehension of one unusual vocabulary item
• All multiple-choice items should be at a level appropriate to the linguistic ability of the learners
• The context should be at a lower level of difficulty than the actual problem that the item is testing
• Multiple-choice items should be brief and clear (although the tendency is to provide a context)
• Begin with one or two simple items
• In constructing items to test the ability to select correct verb forms, one must give sufficient indicators of time relationships to make the appropriate choice clear

Examples:
1. If she ……….. (understand) the situation, she would explain it to us.
The solution is “understood”, as required by “would explain”.

2. Sometimes an adverbial expression should be used:
He asked me how I was every time I …. (see) him.
They …. (live) in Europe for years when I first met him.

• In constructing the items, use sentences that learners might encounter or wish to use.

SAQ 5
Why is the following item wrong? Revise it.
American colonists were given the same rights as other Englishmen by:
a. local governors
b. 1542
c. charters
d. Parliament

Rewrite the above item. Compare it to the one in the “Answers to SAQs” section at the end of the unit.


The Stem

The stem is the first part of a multiple-choice item. The task has to be clear and concise. No irrelevant details should be included. After reading the stem, the testee should be able to identify exactly what the requirements are. The wording should be very clear. Options such as “all of these” or “none of these” should be avoided. Be careful with negative questions. The stem may take the following forms:

a. an incomplete statement
Example: The Romanian Constitution gives Parliament … to pass laws.
A. the power B. has the power C. the power is D. of the power

b. a complete statement
Example: People think of salt mostly as a seasoning for food, but this use accounts for less than five percent of the world’s salt production.
A. exclusively B. nutritionally C. mainly D. necessarily

c. a passage
Example: What does the passage mainly discuss?
A. A new type of telescope B. Ancient and modern attitudes towards stars C. A system of star classification D. Progress in identifying new stars

• The primary purpose of the stem is to present the problem clearly and concisely. The stem contains words/ phrases which would otherwise have to be repeated.

Example: The word spaceman is used in the passage to refer to a:
A. traveller in space B. traveller in a balloon C. traveller in a boat D. traveller in an ocean

The repeated word traveller should be part of the stem:
The word spaceman is used in the passage to refer to a traveller in:
A. space B. a balloon C. a boat D. the ocean

The same principle applies to grammar tests:


Example: The item:
I enjoy … children playing football in the park.
A. looking for B. looking to C. looking about D. looking at E. looking on

may be rewritten:
I enjoy looking … the children.
A. for B. to C. about D. at E. on
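The principle of moving repeated words into the stem can be checked mechanically. The following is only a minimal sketch in Python, not part of the original course material; the function name factor_common_prefix is hypothetical. It finds the leading words that all options share, so that the item writer can move them into the stem.

```python
def factor_common_prefix(options):
    """Return (shared leading words, remaining option texts).

    Illustrative helper (hypothetical name): the shared words are
    candidates for moving into the stem.
    """
    if not options:
        return "", []
    split = [opt.split() for opt in options]
    shared = []
    for words in zip(*split):
        if len(set(words)) == 1:
            shared.append(words[0])
        else:
            break
    # Never strip an option down to nothing.
    shortest = min(len(words) for words in split)
    if len(shared) >= shortest:
        shared = shared[:max(shortest - 1, 0)]
    rest = [" ".join(words[len(shared):]) for words in split]
    return " ".join(shared), rest


prefix, rest = factor_common_prefix(
    ["looking for", "looking to", "looking about", "looking at", "looking on"]
)
print(prefix)
print(rest)
```

For the item above, the shared word is "looking" and the remaining options are the five particles; the teacher still has to decide whether the rewritten stem reads naturally.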

SAQ 6
Analyze the following badly constructed multiple-choice items:

1. What was the ring made of?
A. it was made of gold B. it was made of iron C. it was made of cotton and rope D. it was made of light wood

2. Puts his own desires first:
A. egoist B. egotist C. altruist

Rewrite the above items. Compare them to those in the “Answers to SAQs” section at the end of the unit.

The Correct Option

Usually one correct or best option is recommended. Several correct options confuse students. The correct option should be approximately the same length as the distractors. A correct option that is longer than the distractors may become a “give-away” item.

Rules for writing options:
• Keep answer options short and concise (not longer than 15 words).
• Options should be approximately equal in length.
• Make all options parallel in grammatical structure and general appearance.
• Each option should follow logically and grammatically from the stem. The correct option must not be grammatically different from all the other options.
• Do not begin with none of the above or all of these.
• Check the options for careless clues.
• Be sure that the distractors are clearly incorrect, although plausible.
• If you use options which form a pair, for example by stating the opposite of each other, make sure that the remaining two options also form a pair.
• Words or phrases from the stem should not be repeated in the options.


• You may make a question more difficult by creating options which are very similar to each other

NOTE
• Do not write long stems (not more than 50 words, with sentences not longer than 15 words).
• Avoid the use of conditionals.
• Be careful to choose between "which is the best answer" and "which is the correct answer" (do not forget that multiple-choice questions may have more than one answer).
• A good testee should be able to give an answer without seeing the options.
• A stem should not include general instructions.
• Include as much of the problem as possible in the stem, so that the options can be kept short.

Example: John is a dutiful son.
A. stern B. kind C. very respectful and obedient D. lawful

Option C is a “give-away” item. It should be written in the following way:
John is a dutiful son.
A. stern B. kind C. obedient D. lawful
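Several of the rules above are simple enough to check automatically before a test is printed. The sketch below is illustrative only: the function name check_item is hypothetical, and the give-away test (the correct option has more words than every distractor) is an assumed threshold, not a rule stated in this unit. The 50-word and 15-word limits and the banned options are taken from the rules above.

```python
# Options the unit advises against.
BANNED = {"all of these", "none of these", "all of the above", "none of the above"}


def check_item(stem, options, correct_index,
               max_stem_words=50, max_option_words=15):
    """Return a list of warnings for one multiple-choice item."""
    warnings = []
    lengths = [len(opt.split()) for opt in options]
    if len(stem.split()) > max_stem_words:
        warnings.append("stem is longer than %d words" % max_stem_words)
    for opt, n in zip(options, lengths):
        if n > max_option_words:
            warnings.append("option too long: %r" % opt)
        if opt.strip().lower() in BANNED:
            warnings.append("avoid option: %r" % opt)
    # Assumed give-away heuristic: correct option longer than every distractor.
    distractor_lengths = [n for i, n in enumerate(lengths) if i != correct_index]
    if distractor_lengths and lengths[correct_index] > max(distractor_lengths):
        warnings.append("correct option is longer than every distractor (possible give-away)")
    return warnings


print(check_item("John is a dutiful son.",
                 ["stern", "kind", "very respectful and obedient", "lawful"], 2))
print(check_item("John is a dutiful son.",
                 ["stern", "kind", "obedient", "lawful"], 2))
```

The first call flags the original version of the item; the second, with the shortened option C, produces no warnings. Such a check cannot judge plausibility or grammar, so it supplements rather than replaces the teacher's own review.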

The Distracters or Foils

Each distracter should:
• be reasonably attractive and plausible to any testee
• be grammatically correct
• be constructed in such a way that students obtain the correct option by direct selection rather than by elimination of obviously incorrect options
• be at the level being tested
• not be absurd

Sources for plausible distracters:
• learners’ mistakes
• previous answers in tests
• the teacher’s experience
• contrastive analysis between the native language (Language 1) and the foreign language (Language 2)


Test Instructions

It is extremely important for the students to understand the question format clearly. If they do not understand it, then we do not measure what we want to measure, i.e. the instructional objectives.

Instructions may be oral, although a combination of written and oral instructions is probably desirable. The instructions should be clear, concise and explicit. They should be accompanied by examples of each type of item. Be extremely careful when an item type occurs for the first time. Encourage the students to ask questions.

Test Layout

The layout influences the speed and accuracy of the examinee. Hints:
• use all the space available without hindering readability
• make it easy for the examinee to keep track of his place in the examination
• a two-column page may be the best layout for multiple-choice or true-false items
• if you use various item types in the same examination, group together the same item types: true-false items, multiple-choice, completion. In this way, you reduce the number of shifts in mental orientation
• do not use more than two or three item types in a one-hour examination
• in order to reduce test anxiety, arrange test items in order from the easiest to the most difficult
• order the items on the basis of their content

Readability is increased if:
• each item is completed in the column and on the page in which it is started
• reference materials (paragraphs, graphs) occur on the same page as the item
• the items that refer to the same reference material are placed on the same page, separated from other unrelated items by dotted lines

SAQ 7
Analyze the following distractors:
1. Intimating oneself in another’s good graces.
A. extenuating B. ingratiating C. superseding

Circle A, B or C. Compare your answer to that in the “Answers to SAQs” section at the end of the unit.


• if you use Arabic numbers for the items, use letters to differentiate the alternatives

• the labels on the test should correspond with the labels on the answer sheet

Formats Writing Multiple - Choice Tests

A typical multiple – choice test will look like the following:

1. Write the correct option in full in the blank space:
The practice of making excellent films based on rather obscure novels has been going on so long in the United States ……. constitute a tradition.
a. being b. as to c. so that d. could

2. Write only the letter of the correct option in the box.
She … eating breakfast.
A. is B. has C. does D. no extra word

3. Why … that dog following us?
A. is B. has C. does D. no extra word

4. Underline the correct option:
What … your father do?
A. is B. has C. does D. no extra word

5. Put a circle round the letter at the side of the correct option.
As a result of … in physics and chemistry, scientists have been able to make important discoveries in biology and medicine.
A. there is more knowledge B. what is now known C. knowing now that D. known now

The correct option should appear in each position (e.g. A, B, C, D, E) approximately the same number of times in a test. Alternatively, the options may be placed in alphabetical order, judged by the first word in each option. However, figures and dates should be kept in chronological order.

William Shakespeare was born in

A. 1564 B. 1592 C. 1603


D. 1616
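The advice on the position of the correct option and on the ordering of options lends itself to a quick check. The following Python sketch is an illustration under stated assumptions, not a prescribed procedure: the names position_counts and order_options are hypothetical, options are assumed to be non-empty strings, and "figures" are taken to mean options made up of digits only.

```python
from collections import Counter


def position_counts(answer_key, labels="ABCD"):
    """Count how often each letter is the correct answer in a key such as 'ABCDAB'."""
    counts = Counter(answer_key)
    return {label: counts.get(label, 0) for label in labels}


def order_options(options):
    """Sort numerically (chronologically for years) if every option is a figure,
    otherwise alphabetically by the first word of each option."""
    if all(opt.strip().isdigit() for opt in options):
        return sorted(options, key=lambda opt: int(opt))
    return sorted(options, key=lambda opt: opt.split()[0].lower())


print(position_counts("ABCDABCA"))
print(order_options(["1603", "1564", "1616", "1592"]))
print(order_options(["stern", "kind", "obedient", "lawful"]))
```

If one letter is the key far more often than the others, some items can be reordered; the dates in the Shakespeare item come out in the chronological order shown above.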

Circle in the margin the letter corresponding to the correct form to complete the following sentences:

A. is B. has C. does D. no extra word

1. She … having lunch. A B C D
2. What … that phrase mean? A B C D
3. … he driven that car before? A B C D
4. What … your mother do? A B C D
5. John … never seen snow. A B C D
6. Who … always knows the answer? A B C D
7. Why … that dog following us? A B C D
8. … Jenny usually eat lunch at school? A B C D
9. When … the bus come? A B C D
10. What … your mother making? A B C D

Read and circle in the margin the letter corresponding to the tense and aspect of the verb that you would use to fill in the sentence:

A. simple past B. past continuous C. present perfect D. past perfect

1. He discovered that he … (lose) his money. A B C D
2. They came just to … (get) breakfast. A B C D
3. He asked me how I was every time I … (see) him. A B C D
4. The cake would have been better if it … (stay) in the oven longer. A B C D
5. While we … (wait) for the train, we heard a terrible noise. A B C D
6. If she … (understand) the case, she would explain it to us. A B C D
7. They … (live) in France six months when I first met them. A B C D
8. That’s the best play … this year. A B C D

Circle in the margin the letter corresponding to the most appropriate preposition:

A. back; B. along; C. through; D. out; E. off; F. up

1. “Was he rude?” “Yes, he told me to get …” A B C D E F
2. If you lend him money, you’ll never get it … A B C D E F
3. Although we were relatives, we didn’t get … A B C D E F
4. Be sure to get … the tram at the third stop. A B C D E F
5. I liked the first part of the movie, but I can’t get … the second. A B C D E F

Circle in the margin the letter corresponding to the word which correctly completes the sentence:

1. The sister of your father or mother is your A B C D
A. great aunt B. uncle C. stepsister D. aunt


2. The son of your sister is your A B C D
A. nephew B. cousin C. niece D. godson

3. The mother of your father is your A B C D
A. stepmother B. grandmother C. godmother D. mother-in-law

Cloze Tests

Cloze is a testing technique whereby a complete text is gapped after a few sentences of introduction. Learners try to fill each gap with a word that fits the context. Marking can accept either the exact word only, which is more reliable, or an equivalent.

Cloze is often used as a test of reading comprehension, though there are questions as to what reading skills it reveals. The term was coined in 1953 from the gestalt notion of “closure”, referring to the human tendency to complete a pattern once it is grasped.

Cloze tests have a variety of formats:

• A fixed-ratio deletion method, which deletes every nth word (usually every sixth or seventh word) regardless of what that word may be
• Rational deletion, which deletes words that meet certain grammatical or discourse criteria

Scoring can vary. The testee may be required to supply:

• The exact word that was deleted (efficient for exact scale testing; it can also be adapted to a multiple-choice format for easier scoring mechanisms)
• An acceptable word which “makes sense”

Both types are valid as long as there are 100 blanks (deletions).

The cloze test is considered an integrative test because it requires knowledge of vocabulary, grammatical structure, discourse structure, reading skills and strategies. It also demonstrates that the testee has an internalized “expectancy” grammar, i.e. he/she is able to predict what item comes next in a sequence.
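Fixed-ratio deletion and the two scoring methods described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard tool: the function names make_cloze and score_cloze, the gap format "(1) ......", and the default of one untouched lead-in sentence are all choices made for this example.

```python
import re


def make_cloze(text, n=7, lead_in_sentences=1):
    """Blank every nth word after an untouched lead-in.

    Returns (gapped text, list of deleted words).
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    lead_in = " ".join(sentences[:lead_in_sentences])
    body_words = " ".join(sentences[lead_in_sentences:]).split()
    answers = []
    for i in range(n - 1, len(body_words), n):
        # Keep punctuation attached to the word outside the gap.
        before, word, after = re.match(r"^(\W*)(.*?)(\W*)$", body_words[i]).groups()
        if not word:
            continue
        answers.append(word)
        body_words[i] = "%s(%d) ......%s" % (before, len(answers), after)
    return (lead_in + " " + " ".join(body_words)).strip(), answers


def score_cloze(responses, answers, acceptable=None):
    """Exact-word scoring by default; `acceptable` maps a gap index (from 0)
    to extra words the marker is willing to accept."""
    acceptable = acceptable or {}
    score = 0
    for i, (given, exact) in enumerate(zip(responses, answers)):
        allowed = {exact.lower()} | {w.lower() for w in acceptable.get(i, ())}
        if given.strip().lower() in allowed:
            score += 1
    return score


text = ("After the trip I will return to my country. "
        "I am not anxious to do so since I am not interested in that business.")
gapped, answers = make_cloze(text, n=6)
print(gapped)
print(answers)
```

Rational deletion is not automated here, because choosing which words meet a grammatical or discourse criterion is a judgement the test writer makes. Passing an acceptable mapping to score_cloze switches from exact-word to acceptable-word scoring.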

CLOZE FROM CLOSURE

A CLOZE TEST – AN INTEGRATIVE TEST

SAQ 8
Starting from the following paragraph, construct as many cloze tests as possible, using both fixed-ratio deletion and rational deletion.

After the trip I will return to my country, where my family is expecting me to take over the family business. I am not anxious to do so since I am not interested in that business, and while it is profitable, it is not personally rewarding to me. Quite frankly, my real hope is to develop a strong leadership within the company itself, hopefully with the involvement of my younger brother, who says he really wants to play a vital role. I think it is realistic to expect the company to prosper in this way, leaving me free to pursue my own interests.

Compare your solutions to those provided in the “Answers to SAQs” section at the end of the unit.

4.2.3 Function of Approach to Test Construction

4.2.3.1 Direct Tests

Direct tests rate language use in real, authentic communication, i.e. they test language performance directly, e.g. an interview or a contextualized vocabulary test.

4.2.3.2 Indirect Tests

Indirect tests measure language proficiency indirectly, i.e. they are less valid for measuring language proficiency, e.g. multiple-choice recognition tests or synonym matching tests. However, the value of a test should be decided on the basis of other criteria in addition to whether it is direct or indirect, e.g. cost-efficiency.

4.2.4 Function of the number of elements tested at a time, a distinction is made between:

4.2.4.1 Discrete Point Tests

Discrete point tests are designed to measure knowledge or performance in very restricted areas of the foreign language, e.g. a test of the ability to use the perfect tenses in English correctly, or to supply correct prepositions in a cloze passage. Discrete point tests are based on the theory that language consists of different parts (e.g. grammar, phonology, vocabulary) and different skills (listening, speaking, reading, and writing), and that these are made up of elements that can be tested separately. The meaning of discrete is separate and distinct from each other. Tests consisting of multiple-choice items are usually discrete point tests. Discrete point items operate at the sentence level. This shortcoming is aggravated by the lack of context.

4.2.4.2 Integrative Tests

An integrative test is one that tests several language skills at the same time (e.g. a dictation test requires the learner to use knowledge of grammar, vocabulary, and listening comprehension). Integrative tests have in view a greater variety of language abilities. They have a greater value in measuring overall language proficiency, e.g. random cloze, dictation, oral interviews, and oral imitation tasks.


Dictations and cloze tests can combine to form “partial dictations” or “oral cloze” tests. Dictations are considered integrative tests because they require: careful listening, reproduction in writing of what is heard, efficient short – term memory.

Test batteries solve the conflict between the two kinds of tests as they comprise discrete point subtests for diagnostic purposes and integrative tests. The test battery provides a total score that is considered to reflect overall language proficiency.

4.2.5 Speed Tests vs. Power Tests

A speed test is one in which the items are so easy that every testee might be expected to get every item correct, given enough time. But in a speed test enough time is not provided, i.e. testees are compared on their speed of performance rather than on knowledge alone.

A power test allows sufficient time for every person to finish, but the items are so difficult that very few testees are expected to get every item correct.

4.2.6 Other Test Categories

• Examinations vs. quizzes
• Questionnaires
• Rating schedules
• Single-stage and multiple-stage tests
• Language skills tests
• Language feature tests (verb tense/aspect/voice; subject/verb agreement; modifiers, comparatives, superlatives, relativization, embedding)
• Memory-span tests
• Sentence completion tests
• Word-association tests, etc.

SAQ 9
An integrative test is one that measures knowledge of a variety of language features, modes, or skills simultaneously. An example would be dictation. What does it measure simultaneously?
Write your answers in the space provided (in no more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

4.3 Self-Assessment

Self-assessment is carried out by students themselves. There is a danger in students relying solely on their teacher for the evaluation of their performance. If they are never trusted to evaluate their own experience, they will not acquire the habits and skills of reflecting on their own performance. The aim is to produce a student with the confidence and skill to reflect and evaluate independently of the teacher, to become a reflective practitioner. Ask questions: What were the main difficulties? Ask students to draw up an assessment checklist which will aid reflection.

Point to Ponder

Research shows that students are generally quite accurate in their self-scoring.

Students are often harsher in evaluating their own performance than the teacher would be. Provide self-assessment questions. Provide students with model answers after they have completed the worksheet. The students can then use these to mark their own or each other’s work. Self-assessment can be:
• Low heat: self-worked tests, quizzes
• High heat: presentations or exams

Example: The Stanford-Binet test is an example of a reliable intelligence test. It is mainly used for children and, among other things, for the diagnosis of academic achievement. It measures:
• attention (absorbed by task versus easily distracted)
• reactions during test performance:
- normal activity level versus abnormal activity level
- initiates activity versus waits to be told
- quick to respond versus urging needed
• emotional independence:
- socially confident versus unsure
- realistically self-confident versus distrusts own ability
- comfortable in adult company versus ill-at-ease
- assured versus anxious
• problem-solving behavior:
- persistent versus gives up easily
- reacts to failure realistically versus reacts to failure unrealistically
- eager to continue versus seeks to terminate
- challenged by hard tasks versus prefers only easy tasks
• independence of examiner support:
- needs minimum of encouragement versus needs constant praise and encouragement
• expressive language:
- excellent articulation versus very poor articulation
• receptive language:
- excellent sound discrimination versus very poor discrimination
• establishing rapport:
- easy versus difficult

The test also takes into account:
• verbal reasoning: vocabulary, comprehension, verbal relations
• abstract/visual reasoning: pattern analysis, copying
• quantitative reasoning
• short-term memory


Point to Ponder
The ultimate goal of the educational system is to shift to the individual the burden of pursuing his own education.

J.W. Gardner

Self-assessment is a component of learner-centred education or student autonomy. It underpins the individualization of instruction, the development of patterns of self-directed learning and of the methodology of self-access, as well as implying some degree of learner training.

It follows that autonomy refers to the learner’s capacity to take charge of both the strategy and the content of learning. The psychological argument in favour of autonomy suggests that learning is more efficient and motivating to the degree that it matches a learner’s own style and strategies. An autonomous learner can identify what has been taught and is able to formulate his/her own learning objectives. Autonomous students select and implement appropriate strategies. They can monitor these for themselves, and finally they know how to give up strategies that are not working for them. So-called self-directed learning may also involve other issues: syllabus, negotiation, the role of autonomy in whole-class instruction.

In this context the learners have to assume a degree of responsibility over the assessment of the progress of their own learning. The concept of the trainability of autonomous learning skills and implicitly of self-assessment is currently brought into discussion. An important role is also played by attitudes to language learning that range from anxiety about the language and the situation, through attitude to speakers of the L2, the country in which it is spoken, the classroom, the teacher, other learners, the nature of language learning, particular elements in learning activities, tests and grades.

Attitudinal information has a place in language teaching in two areas: preparing the student to learn, which may involve both the discovery of the student’s own underlying attitudes and a process of attitude change, and preference for particular kinds of learning activities. The effectiveness of all learning depends on the learner’s ability to judge when her/his performance (comprehension and production) is adequate for the situation in which she/he is operating. The effective learner is one who can judge when his/her proficiency is adequate for the purpose. The learner who is satisfied with inadequate performance will tend to allow the language to fossilize. On the other hand, the learner who strives for perfection will limit his/her progress in range and quality. Judging the adequacy of one’s performance is, after all, a matter of self-assessment.

Point to Ponder
It does not seem reasonable to impose freedom on anyone who does not desire it.

Carl Rogers, Freedom to Learn

SELF-ASSESSMENT – A WAY TO STUDENT AUTONOMY

AUTONOMOUS LEARNING SKILLS


Self-monitoring versus self-assessment

Although both of them refer to judging one’s performance, the difference between them is one of scale and timing. Self-monitoring refers to small stretches of language, self-assessment to large stretches of language. It is also true that self-assessment depends upon and includes self-monitoring. The new trends in teaching and learning make teachers responsible for explicitly helping the learner to become more proficient in self-assessment, so that he/she can become better and better at judging his/her performance. Although most assessment is explicitly summative, much everyday classroom assessment is concerned with the learning itself, i.e. it is formative assessment. If I hadn’t separated formal and informal testing, I would have considered self-assessment as an integral part of formative assessment.

The ability to judge when performance is at an adequate level for the situation in which one is operating includes:

• establishing appropriate criteria or standards, which may be overt or covert; they may also be informal, e.g. what a particular phrase sounds like, or a sense that the phrase feels “right” or “wrong”
• judging the situation to decide what the minimum acceptable standard is

Point to Ponder
Don’t assess key or “common” skills, such as “problem-solving” or “working with others”, without teaching these skills.

Geoffrey Petty

Self-assessment may involve:

• the motivation to undertake it
• the willingness to reject inadequate performance, judged against some internal standard established by oneself or learned
• the ability to measure one’s own performance against the standards
• the confidence to make these assessments
• the recognition that one’s ability to judge is limited

Self-monitoring and self-assessment, although part of the process of learning, can be developed i.e. the learner may be helped to improve his self-assessment techniques. This process begins with the awareness activity that aims at persuading the learner that self-assessment is a useful activity.

SELF-EVALUATION FORM
• What were your objectives for this term?
• What did you achieve?
• What do you feel you have been responsible for this year?
• Did you live up to this responsibility? Why/Why not?
• Give examples of good activities.
• Give examples of good materials.
• Which type of evaluation have you made use of? Diaries? Whole-class talks? Talks with the teacher?
• How do you judge the usefulness of the various types of evaluation? Good? Bad? In between? Don’t know? Why?
• Positive things about your teacher.
• Negative things about your teacher.
• Ideas for next term.

Techniques of self-assessment

You can involve your learners in self-assessment if you ask them to write reports about their English and give them to you, if you learn their problems from their diaries, if you involve them in rating their skills in English, if they monitor their language when they edit their essays, when they use your correction codes, if you ask them to grade their mistakes. They may also list difficulties (pronunciation problems), favorite activities, organize group and class surveys to find out about their learning preferences and problems. You may begin by distributing a questionnaire to learn how they feel about their English e.g.

1. Learning English is … (difficult, easy, very difficult)
2. Which of these areas of English are easiest for you? Rank them starting from the easiest area. (speaking, listening, writing, reading, grammar, vocabulary)
3. Give yourself a mark ……
4. Is English useful?
5. Do you try to speak English with your classmates?

You may also gather information with the help of the following questions:
1. Why do you learn English?
2. How do you learn English?
3. How do you tackle an unknown text?
4. What is it to “learn” a word?

Point to Ponder
You learn from your own mistakes only when you think about them.

Michael Harris and Paul McCann

Self-assessment
• Implies knowledge about language (language awareness)
• Implies teacher’s doubts (unreliability); usually students give themselves lower marks than they deserve
• Should not involve marking but thinking about performance and progress
• Must be integrated with other classroom activities
• Can take a lot of time


Useful tips for grading
• For some homework, provide keys that the students use to score their own work.
• Require students to keep a page in their notebook on which they record each test or quiz grade when they receive it. That way they always have a record of their own.
• Teach students to edit and revise their papers before turning them in.
• You’ll manage your time more efficiently if the assignments are spaced.
• Use a computerized grade-keeping system.
• If the teacher does not have time to correct every student assignment, allow students to swap and grade papers. Spot-check to prevent cheating.

Points to Ponder
• Independence, maturity, and self-reliance are all facilitated when self-criticism and self-evaluation are basic and evaluation by others is of secondary importance. (Carl Rogers)
• Effective teachers let students know they are somebody, not some body. (William Purkey)
• Nothing is so fatiguing as the eternal hanging on of an uncompleted task. (William James)

4.4 Standardized Tests

Tests and examinations of proficiency are designed by many organizations in many English-speaking countries. Examinations are generally closed and restricted to particular educational systems. Tests are open and available at all levels. Some tests are restricted to limited proficiency levels; others claim to measure across the range of all levels. Some tests concentrate on one skill (e.g. oral skills); others test all four skills.

Great Britain, University of Cambridge, LES Tests
1. Diploma in English Studies (DES);
2. Certificate of Proficiency in English (CPE);
3. First Certificate in English (FCE), the most frequently taken language test in the world;
4. Preliminary English Test (PET);
5. Key English Test (KET).

The English Speaking Union Framework’s nine-point scale is approximately equivalent to the nine-band descriptors used by the International English Language Testing System (IELTS):

9. Expert user; 8. Very good user; 7. Good user; 6. Competent user; 5. Modest user;


4. Limited user; 3. Extremely limited user; 2. Intermittent user; 1. Non user.

USA Tests
Proficiency tests based on the so-called ACTFL Proficiency Guidelines have been developed in collaboration by the American Council on the Teaching of Foreign Languages, the Educational Testing Service and the Federal Interagency Language Roundtable.

There are four levels (in fact seven if a more refined approach is used):
• Novice; Novice High;
• Intermediate; Intermediate High;
• Advanced; Advanced High;
• Superior.

General English
• The Cambridge range already mentioned;
• The Certificate in Communicative Skills in English (Cambridge CCSE);
• The Association of Recognized English Language Schools (ARELS);
• The Oxford Delegacy of Local Examinations (ODLE);
• The English Speaking Board;
• The Institute of Linguistics;
• Pitman’s Examination Institute;
• Trinity College.

Placement tests
• Nelson Quick Check;
• Oxford Placement Test.

Study English Tests
• IELTS;
• TOEFL;
• CENRA;
• Northern Examination and Assessment Board;
• Pitman;
• Cambridge CPE;
• Certificate in Advanced English (CAE);
• Michigan English Language Battery;
• University of London.

Business English Tests
• London Chamber of Commerce and Industry (LCCI);
• Oxford International Business English Certificate;
• Pitman’s English for Business;
• Cambridge’s Certificate in English for International Business and Trade;
• Educational Testing Service’s (USA) Test of English for International Communication (TOEIC).

Tourism
• Oxford’s Tourism Proficiency;
• LCCI.


Teaching English
• Cambridge Examination in English for Language Teachers (CEELT)

Young learners
• ARELS/ODLE: Junior Counterpart (ages 12-17);
• Associated Examination Board: English as an acquired language (ages 7-12);
• Pitman (ages 9-13).

Special purpose testing
Testing language required for specific purposes is an aspect of English for Specific Purposes (ESP). Where individuals or small groups are concerned, there may be no need for a standardized test. Cases of large-scale use are the fields of academic English, business English, and medical English. In these fields, there has been considerable test development, aimed at accurately assessing appropriate language skills for relevant activities. Features of this development are the use of authentic texts, authentic materials from appropriate situations, communicative activities and group tasks for assessing spoken language as if in real-life situations.

SAQ 10
Self-assessment is extremely important for you. Examine the ways in which self-assessment can be carried out in a work context. Circle the forms you can use:

• Use a learning contract
• Include assessment of a work log
• Ask students to produce a reflective journal
• Consider using a portfolio
• Devise record-keeping aids for students
• Use technology (e-mail)
• Encourage networking

Compare your answers to those in the “Answers to SAQs” section at the end of the unit.

SAQ 11

1. Objective-type tests can use either the “one correct answer” or the “best answer” format. Which one would you use? Why would you use this type over the other?

2. If you were preparing a true-false test, would you have more true than false items? Why/why not?

Write your answers in the space provided (not more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


4.5 Summary

The principal ideas, conclusions, and recommendations presented in this unit are summarized in the following statements:

1. Objective tests must be written as simply and clearly as possible so that all examinees will be able to make the same interpretation of the items’ intent

2. Test items should be tailored to fit the age and ability level of the examinees

3. Technical jargon and excessively difficult vocabulary should be avoided

4. Irrelevant clues should be avoided

5. Trivial details should be avoided (otherwise, we encourage rote memory)

4.6 Key Concepts

• Cloze test
• Criterion-referenced test
• Diagnostic test
• Direct test
• Discrete point test
• Formative evaluation
• Indirect test
• Integrative tests
• Norm-referenced test
• Objective test
• Objectives-referenced test
• Power test
• Speed test
• Standardized test
• Subjective test
• Summative evaluation

4.7 Checklist
• Do you set the most able learners the highest targets?
• Are your learners kept informed of how their attainment compares with the needs of the course?
• Do students evaluate their own performance?
• Is this evaluation checked by the teacher?
• Are successes and failures related to theory and to how to do better next time?


SAA No. 2

Write a multiple-choice test made up of 20 items. Use as distracters some of the mistakes made by your pupils. Please note that each correct item will count for 5 points. Do not forget to send your multiple-choice test to your tutor.
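As a rough illustration of the marking scheme stated in this assignment (20 items, 5 points per correct item, so a maximum of 100 points), the following Python sketch totals one pupil's score. The function name, the answer key, and the pupil's responses are hypothetical, and the sketch assumes that wrong or blank answers carry no penalty, which the assignment does not specify.

```python
POINTS_PER_ITEM = 5  # from the assignment: each correct item counts for 5 points


def score_multiple_choice(answer_key, responses, points_per_item=POINTS_PER_ITEM):
    """Return the total score for one pupil.

    answer_key and responses are equal-length sequences of option letters.
    Assumption: a wrong or blank response simply earns nothing.
    """
    if len(answer_key) != len(responses):
        raise ValueError("responses must cover every item in the key")
    correct = sum(1 for key, given in zip(answer_key, responses) if key == given)
    return correct * points_per_item


# Hypothetical 20-item key and one pupil's answers (illustrative data only).
key = list("ABCDABCDABCDABCDABCD")
pupil = list("ABCDABCDABCDABCDAAAA")
print(score_multiple_choice(key, pupil))
```

Keeping the points per item as a parameter makes it easy to reuse the same function if a tutor later weights items differently.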

4.8 Answers to SAQs

SAQ 1
SAQ 1 is meant to activate your schemata. Marking an essay

SAQ 2
Your answer depends upon your personal teaching/learning experience.

Language area Objective Subjective pronunciation X

vocabulary X grammar X discourse X listening X speaking X reading X Writing X

SAQ 3
If your answer to SAQ 3 is not comparable to the one suggested below, please reread sections 4.2.4.1 and 4.2.4.2.

Techniques Discrete point techniques

Integrative techniques

transformation X Fill – in –the

blanks X

Blank and cue X Joining element X

Replacing elements

X

Adding elements X Arranging elements

X

Matching elements

X

True/ false X Multiple choice X


Cloze X dictation X

Information transfer

X

SAQ 4
If your answer to SAQ 4 is not comparable to the one suggested below, please reread section 4.2.1.2.

The test is reliable. The fact that the score is the same points to its objectivity.

SAQ 5
If your answer to SAQ 5 is not comparable to the one suggested below, please reread section 4.2.2.1.

Item B may also be considered an acceptable answer. In this case both B and D are acceptable answers. The revised form is:
Who gave the American Colonists the same rights as the Englishmen?
A. The King
B. Local governors
C. Colonial legislatures
D. Parliament

SAQ 6
If your answer to SAQ 6 is not comparable to the one suggested below, please reread section 4.2.2.1.

The stem contains words repeated in each option.
What was the ring made of?
A. gold
B. iron
C. cotton
D. wood

1. The distracters are too difficult.

SAQ 7
If your answer to SAQ 7 is not comparable to the one suggested below, please reread section 4.2.2.1.

The distracters are too difficult. They may distract even the good student. Such a tendency happens in vocabulary test items.

SAQ 8
If your answer to SAQ 8 is not comparable to the one suggested below, please reread section 4.2.2 (the paragraph on cloze tests).

1. After the trip I will … to my country, where my … is expecting me to take … the family business. I am … anxious to do so since … am not interested in that …, and while it is profitable, … is not personally rewarding to ….
2. After the trip I will r…n to my country, where my f…y is expecting me to take o…r the family business.


3. After the trip I will return/ visit/ go again to my country, where my family/ friends/ relatives is expecting me to take in/ up/ over/ for the family business.
4. After the trip I … to my country, where my family … me … over the family business.
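In the first variant of this answer the gaps fall at every sixth word, which is the classic fixed-ratio cloze. A minimal Python sketch of that deletion procedure is given below. The function name `make_cloze`, its parameters, and the sample sentence are illustrative assumptions rather than part of the course material, and the deletion interval is left to the test writer.

```python
import re


def make_cloze(text, n=6, start=5):
    """Blank every n-th word, beginning at word index `start` (0-based).

    Returns the gapped text and the list of deleted words (the answer key).
    Punctuation attached to a deleted word is kept in place.
    """
    tokens = text.split()
    answers = []
    for i in range(start, len(tokens), n):
        match = re.match(r"^(\W*)(\w[\w'-]*)(\W*)$", tokens[i])
        if match is None:  # token has no word characters; leave it alone
            continue
        lead, word, trail = match.groups()
        answers.append(word)
        tokens[i] = f"{lead}…{trail}"
    return " ".join(tokens), answers


# Hypothetical sample sentence (not taken from the unit).
sample = "The pupils read the story twice and then answered five short questions."
gapped, key = make_cloze(sample, n=4, start=3)
print(gapped)
print(key)
```

Returning the deleted words together with the gapped text gives the teacher an answer key at the same time as the test paper.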

SAQ 9
If your answer to SAQ 9 is not comparable to the one suggested below, please reread section 4.2.2.2.

Listening comprehension, spelling, or general language proficiency

SAQ 10
It is obvious that this depends on your teaching/learning experience. Personal choice.

SAQ 11
If your answer to SAQ 11 is not comparable to the one suggested below, please reread sections 3.3.1.1 and 4.2.1.2.

1. The “one correct answer” variety should be used, inasmuch as it is difficult to obtain agreement, even among experts, on what is the best answer.
2. By having approximately an equal number of true and false statements in the test, we limit the influence of response set on the validity of the test score. But having exactly the same number of true and false statements could be a clue to the test-wise student. It is much better to have more false than true statements since there is evidence that false statements tend to be more discriminating.

4.9 Further Readings
Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp 4-10
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press, pp 152-155


Unit 5 TESTING THE LANGUAGE SKILLS I

5.1 Unit Objectives
5.2 Testing Speaking
5.2.1 What Is Speaking?
5.2.2 Types of Speaking Based on Content and Function
5.2.3 Objectives
5.2.4 Types of Speaking Tests
5.3 Testing Listening
5.3.1 How Do We Comprehend?
5.3.2 Micro Skills
5.3.3 Informal Evaluation
5.3.4 Scoring the Listening Test
5.4 Summary
5.5 Key Concepts
5.6 Checklist
5.7 Answers to SAQs
5.8 Further Readings

5.1 Unit Objectives

Today language teaching is communicative, interactive, and integrated. In spite of this, we have to find ways of simplifying things so that learners can listen and talk about real things in real ways. The next two units examine the approaches employed in order to test the four basic language skills.

This unit attempts to provide the most obvious, natural, and effective general strategies of testing speaking and listening. The aim of this unit is to familiarize you with the principal techniques of testing and marking the skills of speaking and listening.

By the end of this unit you should be able to:
• construct your own speaking and listening tests
• evaluate the speaking and listening tests from the textbooks you currently use
• construct and administer such tests competently

5.2 Testing Speaking

Speech is probably the most socially visible skill, the one that allows the greatest amount of interaction, the one that will most quickly lead people to conclude that you are proficient.

However, we should not forget that speaking:
• probably comes from listening
• is less important than reading or listening for many who are studying in academic programs in the target language


• in rare cases is less important than writing
• cannot be detached from the overall context or from the extra-linguistic components such as conversational rules, gestures, cultural information, social status, level of formality, gender and age differences, and so on.

5.2.1 What Is Speaking?

The range of things people do when speaking is so broad that saying only that we are teaching or testing speaking is almost meaningless. Oral production varies by:
• content
• function (purpose)
• emotional and social context
• processing capabilities (proficiency)

5.2.2 Types of Speaking Based on Content and Function
• short exchanges for information or courtesy: we talk to officials, clerks, police officers, classmates to get the information we need. We greet and are pleasant to people we meet or deal with (our relatives, the postman)
• longer impersonal exchanges: job interviews, business transactions, problem solving, seminars or group work. These are likely to have a formal or semiformal register.
• speaking directed at others in structured contexts, such as class presentations, lectures, news broadcasts. The language tends to be at the formal end of the register.
• conversations over dinner, gossip, friendly discussions, and so on. These are at the informal end of the register.
• aesthetic, ritualistic and entertaining speech: what we do with others (reciting a pledge, a prayer or singing) or address to others with the purpose of eliciting aesthetic responses, such as with plays or poetry readings.
• classroom discourse

Types of speaking based on emotional and social context and on processing variables
• speech can be objective, cold, and impersonal, or it can be intimate, relaxed, and informal
• speech can vary in relation to:
- relative status, age, or gender
- in-group or out-group affiliation

The conclusions about types of speaking are important for testing speaking. The variety of types points to the complexity of the task of testing speaking. Many different types of speech exist. They are affected by topic, context, relationship between speakers, and proficiency. Mere vocalization of correct utterances or repetition of sentences in a dialog is not really talking at all.

Real talking means saying something important to someone under certain conditions and for a certain purpose.


The linguistic form will be among the most important considerations, but will certainly not be the only one. Normal human speech is not produced in the sequence grammatical form, selection of words, application of the phonological rules; it starts from a message triggered during the course of a conversational interaction. This message (feeling, idea or fact) is embedded in our schemata (i.e. it is part of our experience). Because we wish to communicate this message, we activate the linguistic, discourse, content, event, or strategic schemata of our listener, by means of which we can convey our message.

Tests of spoken language also test other skills (listening, for example). There are no pure tests of spoken language because only in traditional academic lectures or at the theatre is the language one-way. The development of the ability to interact in a foreign language involves comprehension as well as production. At the earliest stages of learning a foreign language, formal testing is avoided. Informal observation provides the necessary diagnostic information.

5.2.3 Objectives
• to set tasks that are representative samples of the oral tasks that we expect students to be able to perform
• tasks should elicit behavior which truly represents the candidates’ ability and which can be scored validly and reliably
• the testing of speaking is the most difficult of all language tests to design, administer, and score. Why?
• it is difficult to choose the criteria in evaluating speaking: which is more important: grammar, vocabulary, pronunciation, fluency, listening comprehension, correct tone (fear, anxiety), reasoning ability, or initiative in asking for clarification? How can we properly evaluate each of these criteria? Shall we also include among them questions of response?

Other difficulties

• The tester has to get students to speak and to evaluate them at the same time

• Each testee has to be evaluated individually
• How are we to evaluate learners at the elementary level?
• How are we going to evaluate those testees who want to work in professions like teaching, business, translation, professional oral reading by radio announcers?

• Tests of speaking appeared much later than other types of tests. The main reasons for this are that, for academic purposes, the written language was more important than the spoken language, and that tests of speaking are subjective and unreliable, as well as time-consuming, expensive, and therefore impractical if we take into account that the learners have to be tested individually. The exclusion of speaking tests from examinations has a poor washback on learning and teaching. The spoken test has to have a place in all kinds of language examinations.

SPEAKING- A DIFFICULT SKILL TO TEST


Situations
Learner speaks to:
• assessor
• learner and assessor
• interlocutor and assessor

Three views on the nature of spoken language:
1. The literary view, e.g. public speeches, dramatic monologues; these are not spontaneous spoken language, they are prewritten or memorized.
2. The linguistic view sees the spoken language as the oral expression of writing.
3. The communicative view sees language as a spontaneous and interactive means of developing social relationships.

Point to Ponder

“Evaluation can be done surreptitiously, and it can be done with flags and trumpets; but it must be done, otherwise the teacher will not know if learning is taking place”

(Geoffrey Petty)

5.2.4 Types of Speaking Tests

• reading aloud (problem-solving working in pairs)
• oral interview – the interviewer tends to intimidate the learner or to dominate the interaction; reduced reliability

The Speaking Tests
• literary tests of spoken language – reciting a poem or speech
• the reading aloud of a prepared passage of prose or poetry
• summarizing or retelling a story, book
• the discussion of other aspects of a piece of literature
• free talk – prepared talk or lectures on given topics
• long turns which the learner has only a limited time to prepare (the description of a picture)

The limitations of free-speaking tests
• limited interaction
• lack of authenticity
• poor washback effect
• doubtful test security (who is the author of the grading and what is graded is difficult to assess)
• limited sample

Linguistic speaking tests have in view the assessment of stress patterns, intonation, grammatical structure, range of lexical units.

Examples:
• an unprepared passage is read aloud (the assessor listens out for the pronunciation of a few pre-selected words, intonation)
• elicitation through questions/direct instruction, e.g. Ask me a question beginning with …

THE NATURE OF SPOKEN LANGUAGE

WEAKNESSES OF SPEAKING TESTS


• Accuracy of function realization is tested by describing a situation to which the testee must respond, e.g. Situation: You are in a restaurant. You find a fly in your soup. What would you say to the waiter?

The limitations:
• the unauthentic, non-communicative purpose, no interaction, restricted method (what is not tested will not be taught), no washforward effect (no real link with the outside world)

Advantages:
• no open answers
• easy-to-work tests

Guided-speaking tests provide a task environment. The test provides details about:
• Who is speaking?
• To whom?
• Why and about what?
• The context of situation

Advantages: authenticity (role-play, a real-world scenario)

Example: The learner will be given a topic to prepare before entering the room.

Time for preparation: 3 minutes.
• Part 1: Presentation: three-minute uninterrupted talk
• Assessor: takes notes
• Examples of topics:
a. your interests and free time
b. your reasons for studying English
c. your future plans
• Part 2: Personalized interview. The assessor asks questions about the learner’s family, education, interests, future plans (5 minutes): Where do you live? What do you enjoy about living there? How long have you been studying English? Have you been studying English in the same class?

SAQ 1
Which of the following characteristics are generally typical of modern tests of speaking? Underline your choices.

a. integrative – discrete point
b. subjective – objective
c. high reliability – low reliability
d. authentic – contrived
e. direct – indirect
f. good validity – poor face validity

Compare your answers to those in the “Answers to SAQs” section at the end of the unit.

GUIDED TESTS


Example: Look at the two pictures below. Discuss with your partner.

Point to Ponder

Getting accurate measurements of how skillful students are in authentic communication becomes very difficult in practice. Furthermore, indirect measures of correction take a great deal more skill and planning, though they may be worth the extra effort when dealing with students who are feeling discouraged or threatened.

Reading Aloud

The testee reads aloud to the assessor a passage of a text, a dialogue (one of the parts is read by the interviewer), a specialized technical English text, a descriptive passage, or instructions (how to cook a dish, or giving instructions by phone), or retells a story.

Variants:
• Reading scripted dialogue with someone else reading the other part
• Reading text with phonetic markers (sounds, words, technical vocabulary, idiomatic or conversational expressions), speech factors (assimilation, liaison or contractions), words or sounds that are known to cause problems for speakers of a certain language
• Reading sentences containing minimal pairs
• Spelling aloud (if testees apply for a job in a travel agency or for orders by phone)
• Reading from a table: figures, abbreviations, or initials in different quantities

SAQ 2
Tests of speaking do not test objectively more than two of the four components of communicative competence. Circle the correct competences:

a. linguistic competence
b. discourse competence
c. sociolinguistic competence
d. strategic competence

Compare your choices to those in the “Answers to SAQs” section at the end of the unit.

Advantages
• The assessor may choose the topic
• The same test may be given to all testees (higher reliability)
• Simple to administer and quick to score
• Correct production of sentence stems and intonation patterns suggests good comprehension

Disadvantages
• The technique is not authentic (we rarely read aloud in real life)


• The test is not communicative
• Reading aloud is a skill that can be improved in a short time
• Only the mechanical skills are tested, i.e. pronunciation, intonation

Using pictures, maps, diagrams (pictures provide a realistic context)

• Pictures of a single object (for testing the production of phoneme contrasts)

• Pictures of scenes (for description, for narration)

1. The testee is given a picture.
2. The testee studies the picture for a few minutes.
3. Then he/she is required to describe the picture in a given time.
4. The number of words he/she speaks is counted by one examiner.
5. The number of errors is counted.
6. Separate scores are given for fluency, grammar, vocabulary, phonology, and accuracy of description.

Other examples:
• advertisements
• pictures for comparison
• pictures for instructions
• if the picture depicts a story or sequence of events, it is useful to give the testee one or two sentences as a starter
• oral interview
• question and answer (disconnected questions are graded in order of increasing difficulty; suitable for lower levels; easy to adapt questions to suit level)
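The counting steps of the picture-description procedure (words spoken, errors made, and separate marks per criterion) lend themselves to a simple record sheet. The Python sketch below is one possible way to keep such a record. The class name, the two derived rates (words per minute and errors per 100 words), and the sample figures are assumptions made for illustration; the unit itself only says that words and errors are counted and that separate scores are given.

```python
from dataclasses import dataclass, field

# The five criteria named in the procedure.
CRITERIA = ("fluency", "grammar", "vocabulary", "phonology",
            "accuracy of description")


@dataclass
class PictureDescriptionRecord:
    words_spoken: int   # counted by one examiner
    errors: int         # counted separately
    minutes: float      # length of the description, in minutes
    scores: dict = field(default_factory=dict)  # criterion -> mark

    def words_per_minute(self):
        """Crude fluency indicator (an assumption, not prescribed by the unit)."""
        return self.words_spoken / self.minutes

    def errors_per_100_words(self):
        """Error rate normalised for how much the testee said."""
        if self.words_spoken == 0:
            return 0.0
        return 100 * self.errors / self.words_spoken

    def missing_criteria(self):
        """Criteria the examiner has not yet marked."""
        return [c for c in CRITERIA if c not in self.scores]


# Hypothetical testee: 120 words and 6 errors in a two-minute description.
record = PictureDescriptionRecord(words_spoken=120, errors=6, minutes=2.0,
                                  scores={"fluency": 4, "grammar": 3})
print(record.words_per_minute(), record.errors_per_100_words())
print(record.missing_criteria())
```

Normalising the error count by the number of words avoids penalising a testee simply for saying more than another.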

Point to Ponder

Whatever the homework, if it is set it must be seen, marked or tested by the teacher, otherwise it will be evaded.

Individual Elicitation

Research and specialized testing have devised various ways of eliciting specific parts of the language. One technique is to use pictures. The tester talks about part of the picture (for example using a singular noun, or present tense verbs or question forms). The learner responds and the tester notes if the expected plural, past tense verb, or the inflected form of the verb was present.

Interviews

Interviews are a direct, face-to-face exchange between testee and interviewer.

Advantages
• they are structured
• the interviewer maintains firm control, keeps the initiative
• more authentic
• several topics may be raised

ASSESSING SPEAKING: OTHER PROCEDURES


Disadvantages
• the testee sees the assessor as a superior (the result is only one style of speech)
• many functions are absent
• only at intermediate level or below
• a candidate may dominate another

Stages
• introduction (polite social questions to put the learner at ease)
• find level against a specific scale
• check questions above and below the established level
• several more questions at about the right level
• self-assessment (oral ability, strength/weakness)
• feedback, tell the learner the result, invite any comment
• assessor should not over-correct errors, fill pauses automatically, interrupt unless necessary

Example
Question and answer for a lower-level achievement test. If one question is not understood, the interviewer can move to another. This model may be adapted for various levels, simplified or made more complex.

• What’s your name? Could you spell it?
• How are you?
• How well can you speak English?
• Do you like speaking English?
• What do you do? What’s your job?
• Tell me a little about your family
• Can you count up to twenty?
• Can you tell me the time?
• What is the date today?
• What day of the week is it?
• What is the weather like?
• Where/ how did you learn to speak English?
• Tell me three things you did yesterday.
• What were you doing/ where were you at this time yesterday?
• Where do you live? How long have you been living there?
• Where do you work? Do you like it there? How long have you been working?
• What is your hobby? What do you like doing in your spare time?
• What will you do when you leave here today?
• What are you going to do for your next holiday?
• What are your plans for the future?
• Have you been to England/ America?
• How many foreign countries have you visited?
• When did you go there? How long did you stay?
• What did you see/ do?
• Did you enjoy your stay/ visit? Why? Why not?


• What differences in lifestyle/ transport/ food/ people did you notice?
• Would you like to live/ work/ go back there?
• Can you speak any other languages? How well?

Interaction with Peers

Two or more candidates are asked to discuss a topic.

Pictures
When looking at pictures, you can often use the present tense: This picture shows a … who seems to be … In the center of the picture I can see …

Role play
The candidates are asked to assume a role in a particular situation.

Interpreting
In part 3 of Paper …, you and your partner will have to reach a decision or work something out using one or more pictures or diagrams.

Discussion between candidates
Discuss some of the important achievements that have influenced the world we live in. Which three achievements have offered the greatest benefit?

Imitation
Candidates have a series of sentences, each of which they have to repeat in turn. If the sentences are long enough, testees will make the same mistakes in performing this task as they will when speaking freely.

Advantages
• control in choice of sentences
• good for a placement test

SAQ 3 Oral interviews are criticized for at least 3 reasons. Can you identify them? Write your answers in the space provided above (in no more than 10 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


SAQ 4 What should you test in order to encourage oral ability?

Write your answers in the space provided above (in no more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

5.3 Testing Listening

Listening comprehension may be considered the language skill of the greatest significance to foreign language teachers because it:
• offers the most natural form of input
• is a highly valuable skill in its own right
• is a prerequisite of meaningful speaking and authentic interaction

In order to identify the techniques of testing listening comprehension we have to understand:

What? Types of listening (as a function of intonation, rate of delivery, stress, rhythm):
• Short, impersonal material: announcements in airports, train stations, information about office hours, the weather
• Longer impersonal material: lectures, reports, presentations
• Interpersonal interchanges: conversations, greetings, invitations, compliments, eavesdropping
• Aesthetic and entertainment functions: songs, poems, movies
• Instructional functions: orders, dictations, true and false tests, fill-in exercises

Types of listening based on context and discourse variables:
• face-to-face
• remote
• live
• recorded
• with a friend
• with a stranger
• between social equals
• between people different in age, sex, and status
• with or without visual or non-verbal cues

TYPES OF LISTENING


• surrounded by noise or in a quiet background
• over the phone. Communication is not less effective. The listener may have difficulties because there are no non-verbal signals

Types of listening based on production and processing variables. Speech can be:
• slow or fast
• formal or informal
• full of hesitations, pauses, repetition
• polished or casual
• linguistically complex or simple
• whispered
• clearly articulated
• native or non-native

SAQ 5 Which is more difficult to understand in a foreign language? A face-to-face or a telephone conversation? Why? Write your answers in 20 words in the space provided and compare them to those in the “Answers to SAQs” section at the end of the unit.

5.3.1 How Do We Comprehend?

We comprehend because we know a lot about what the speaker is saying. We have access to the same background, the same frame of reference, to the same schemata. Generally, we comprehend by linking up interactively with the speaker on familiar cognitive and emotional ground.

The main modern approaches to TEFL are rich sources for testing listening comprehension, e.g. Total Physical Response (the learner responds to a number of commands) and the Comprehension Approach (show two pictures and give a command, or ask a question, e.g. point to the picture that …).

5.3.2 Micro Skills

Micro skills might include:
• discriminate among the sounds of English
• recognize sound patterns, rhythmic structures, intonation, and their role in signaling information



• recognize forms of words, grammatical word classes, tense and agreement, patterns, rules, elliptical forms, cohesive devices, communicative functions
• infer situations, participants
• infer links and connections between events, deduce causes and effects
• detect the main idea, supporting information, and new information

5.3.3 Informal Evaluation

You may need to know continuously whether your students really comprehend. Informal evaluation is much easier than you might imagine. Learn to walk around and “read” your students’ faces and gestures and listen to the overall communicability of their responses:
• a stiff look may mean boredom (the task is too easy)
• fear, frustration, restlessness, whispering (the input is too difficult)

Correction is part of the evaluation process, but on the whole listening does not require correction. You can make mistakes in hearing, but most of these mistakes are self-evident. The recommended materials and procedures from modern textbooks that aim at developing listening comprehension are a source for listening comprehension tests. In principle, listening is an easy skill to measure. In practice, objective testing requires high-quality sound, special testing materials, and proper facilities.

Listening materials tend to test rather than to teach. A comparison of teaching materials with explicitly designed testing materials (e.g. those used for the TOEFL examination, taken by foreign students who intend to study at a United States university, or the Cambridge First Certificate intermediate-level examination) shows that the differences are often slight: the most important ones involve the degree of attention paid to the learner’s responses. In tests, the responses really do count.

R. Lund (1990), in Taxonomy for Teaching Second Language Listening (Foreign Language Annals 23 (1), 105 – 115), identifies nine different ways in which we can check listeners’ comprehension:
• Doing – the testee responds physically to a command
• Choosing – the testee selects from alternatives or pictures, objects, texts
• Transferring – the testee draws a picture of what is heard
• Answering – the testee answers questions about a message
• Condensing – the testee reduces the message, for example by taking notes or outlining
• Extending – the testee provides an ending to a story heard
• Duplication – the testee translates the message into the native language or repeats it verbally
• Modeling – the testee may order, for example, a meal after listening to a model order
• Conversing – the testee engages in a conversation that indicates appropriate processing of information

TAXONOMY OF LISTENING TASKS


The main procedures are:
• Teacher talk – the most authentic and useful listening experience is when the teacher is saying something real and learners are doing something in response, e.g. Today we are going to talk about …, Listen to this story about …, I am going to talk about some of the people in your class and you will tell me who I am talking about

• Total Physical Response. Give the following commands. Beginning with everyone sitting down, with three pencils on each desk: Pick up one pencil in your right hand, Point the tip of the pencil up, and Lift both hands over your desk.

• Imaginary movement. Say the following sentences and ask learners to act them out: You push the door open slowly and quietly. You put your head through, you look around thoroughly

• Non – verbal and short response. Follow the movement on the map. Begin at … go out the front door and turn right … walk along … cross the street … where are you now?

• Look at the picture and follow the details. Pick the picture. Talk about one of a series of pictures on a cartoon strip, mentioning as many details as you can that also apply to several other pictures. Then ask the testee to point to the picture you have referred to.

• True or false. Say true or false after you hear the following sentences: The flight lasted 10 days

• Identifying or pointing. Who is wearing … who has three books on her desk?

• Find … and color/ Find and cut it out
• Definitions e.g. It lives in the ocean, breathes air, is smaller than a whale, and likes to jump in and out of the water playfully. Answer?
• Unnamed biography. Describe a famous person. After you finish, ask who the person is. The same can be used for places, things, activities.

• Connected discourse. Stories: for all ages at any level. Use stories of all types. Select those appropriate for the age and interest level of the class. If you are using a story book that is well illustrated, you have the added possibility of using the pictures and general story line.

• Dialogues. Take both sides of the dialogue yourself, shifting posture or tone of voice, or otherwise indicating which of the two characters is speaking.

• Dictation. It is a specialized but highly useful listening activity which also requires writing. It can be used with all ages and levels of learners who can write. Choose a passage that is comprehensible and also writable. Read over the whole passage, asking for general responses. Reread the passage two or three times, broken into short segments. You can read phrase by phrase, and then repeat the whole sentence, and so on through all the sentences. Repeat a final time, reading through at normal speed. Correct them yourself.

• Auditory scanning (i.e. listening for detail/ specific information). Give the questions in advance. Tell them that they will only hear the selection once. Weather report. Thank you for calling weather line. Currently at the downtown weather forecast station the temperature is 24 degrees. And now our forecast. Overnight, clear skies and lows in the upper 10s. Clear and sunny tomorrow morning, with highs around 28. Watch for afternoon showers with temperature dropping into the low 10s. The questions, in written form, are in front of the testee: What is the current temperature? What will the high temperature be tomorrow?

PROCEDURES

• Individual responses. In a one – to – one setting you can talk about pictures and have the student point, you can make reports and see if the learner can follow them, or, if the level is high enough, you can have conversational interchanges and see if the learner can respond to what you are saying. If you want to know about comprehension, you will do most of the talking and require rather simple, but unambiguous responses.

• Simple paper and pencil test. Normally, you should pre-record the tests if you want to have objective results i.e. comparable results with different groups. Be careful! In giving live cues you are very tempted to respond to the non – verbal feedback of the group and vary your production considerably.

• True – false. Provide a series of statements that must be comprehended for their general meanings. They are clearly true or false. The testees mark a standard answer sheet.

• Pictures. You say something about one of a group of 3 or 4 pictures. The testee picks the one you referred to and marks the answer sheet.

• Multiple – choice. You can make statements, ask questions, or have short conversations. The test contains 3 or 4 choices which the testees read. They pick the one most related to what they hear.

• Completion multiple – choice. Learners choose the best way of completing the lines of a conversation from among the 3 or 6 choices and mark their answer sheets.

• Oral cloze. Testees see a passage that has blanks. As they listen the second time, they fill in the blanks.

• Note taking. Candidates take notes during the talk. After the talk is finished they see the questions they have to answer.

SAQ 6 Dictate the following text: Dear Sir, I am answering your advertisement/ for an engineer./ I saw it in the paper yesterday./ Can I come for an interview next week?/ I left my job last month/ and I am free every day of the week./ I am 25 in August/ and I am not married./ I studied at Bolton University/ and I finished there in 2004. What do you test? Why is the dictation test written this way?


Write your answers in the space provided above (in no more than 60 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

5.3.4 Scoring the Listening Test

Tests may be classified into non – productive exercises (only listening is involved; scoring is objective) and productive exercises (integrated with other skills, e.g. writing or speaking; scoring is subjective).

Non – productive listening tests
• Multiple – choice questions
• True – false questions
• Multiple – choice cloze
• Matching
• Sequencing
• Information transfer

Productive listening tests
• Open – ended questions
• True – false (false items to be corrected by the testee)
• Listening cloze
• Summarize
• Note taking
• Completion tasks
• Dictation

Non – productive tests can be scored objectively. Cloze, dictation, gap – filling can be scored semi-objectively. For reasons of practicality and reliability, these tasks should be selected. For reasons of washback, productive tasks are recommended.

SAQ 7 Which humanistic approach is recommended for developing and testing listening? Write your answer in the space provided above (in no more than 5 words) and compare it to that in the “Answers to SAQs” section at the end of the unit.


A check list of listening task types that may be used in formal and informal testing

Each entry gives the task type first, followed by an example.

1. Listen and Do: During or after listening, students are asked to perform some action.
Example: Numbering a drawing, completing a map, ordering items in a list, matching items, labeling, ticking.

2. Listen and Do Nothing: no output.
Example: Listening to a story or a poem.

3. Listen and Follow: students may be given a map or picture and match what they hear with what they see.
Example: Map, picture, diagram work.

4. Listen and Respond: Students are asked for an affective response.
Example: Listening to tape – did they like / dislike it, did they empathize with the person, etc.

5. Listen and Answer: The traditional type of question task.
Example: Students have to answer questions of a variety of possible types: T/F, Wh-, m-c questions or open-ended Qs.

6. Listen and Compare: Listening for similarities / discrepancies between two (or more) inputs.
Example: The inputs may be both/all listening inputs (e.g. Jigsaw Listening) or a mixture of a tape and print material, e.g. radio and press reports on the same event.

7. Listen and Complete: Gap-filling.
Example: Cloze-type exercise; masked-word tape task (word obscured by noise on tape); collating fragments of text (Patchwork Listening).

8. Listen and Predict: Partial text is provided and students are asked to anticipate.
Example: What will Mrs. X say next? How will Mrs. Y respond to that? How will the story end?

9. Listen and Correct: Students have a written text which they correct to match the spoken version.

10. Listen and Recall/Write: Making and using notes.
Example: Students take notes as they listen, in order to prepare a written summary, or reach agreement on what happened, in group discussion. (Dictation is a variant of Listen and Write.)

11. Listen and Discuss: Using a tape as an information source for oral interaction.
Example: Deduction of information from spoken (and/or written) texts, evaluation of information, problem solving on the basis of taped text.

12. Listen and React: Expressing value judgments.
Example: Students are asked to make value judgments about opinions given or actions described on tape, e.g. Did the person do the right thing?


Point to Ponder

• Talking is sharing, but listening is caring. (Anonymous)
• Education is the ability to listen to almost anything without losing your temper or your confidence.

Rating Scale for testing listening

8. Handles all general listening operations; shows confidence and competence similar to those in his/ her native language; able to compensate for difficulties
7. Similar to nine, low repetition, repairs, adjusts listening strategies for purpose
6. Extracts the majority of messages with minor loss of details, few corrections
5. Handles listening operations moderately well; extracts most of the message; need for repetition
4. Loss of detail, little grasp of subtlety, frequent need of repetition, and difficulties in handling input at normal speed
3. Only the gist of the message, need for repetition, and difficulties in handling input at normal speed, no compensation for errors
2. Comprehension of isolated points, dependent on repetition, a narrow range of language
1. Little confidence, comprehends only basic messages, unable to compensate

SAQ 8
Listening comprehension test

PART A

• Testee’s objective: to demonstrate your ability to understand spoken English

• For each question you will hear a short sentence. Each sentence will be spoken just once.

• After you hear each sentence, read the four choices in your book, marked (A), (B), (C), and (D), and decide which one is closest in meaning to the sentence you heard. Then, on your answer sheet, find the number of the question and fill in the space that corresponds to the letter of the answer you have chosen. Fill in the space completely so that the letter inside the oval cannot be seen.

• Listen to the example: Please turn in the key of your room before you leave.

• In your textbook, you read:
A. Please lock your room when you leave.
B. Turn the key to the left to enter the room.
C. Please return your room key before leaving.
D. You must leave your room by four o’clock.
The correct answer is C.

• Listen to another example: What is Mary going to do tomorrow?
• In your textbook, you read:
a. Will Mary be traveling tomorrow?
b. What are Mary’s plans for tomorrow?
c. Who will be with Mary tomorrow?
d. Does Mary have to do it tomorrow?
The correct answer is b.

PART B – 15 short conversations

In Part B, you will hear short conversations between two people. After each conversation, a third person will ask a question about what was said. Read the four possible answers and decide which one is the best answer to the question you heard. Then, on your own answer sheet, find the number of the question and fill in the space that corresponds to the letter of the answer you have chosen.

• Listen to an example: A man tells a woman that he doesn’t like the painting either. The question is: What does the man mean?

• In your textbook, you read:
a. he doesn’t like the painting either
b. he doesn’t know how to paint
c. he doesn’t love any paintings
d. he doesn’t know what to do
The correct answer is a.

PART C

In this part of the test, you will hear longer conversations and talks. After each conversation or talk, you will be asked some questions. After you hear a question, read the four possible answers in your textbook and decide which is the best answer to the question you heard.

• Listen to an example: The topic is computer animation.
• Question: What is the main purpose of the program?
• In your textbook, you read:
a. to demonstrate the latest use of computer graphics
b. to discuss the possibility of an economic depression
c. to explain the workings of the brain
d. to dramatize a famous mystery story
The correct answer is c.

• Question: Why does the speaker recommend watching the program?
• In your textbook, you read:
a. it is required of all science majors
b. it will never be shown again
c. it can help viewers improve their memory skills
d. it will help with coursework
The correct answer is d.

Comment upon the above test: a. Is it a discrete/ integrative test? b. What language areas does it cover? c. What type of test is it? d. How do you evaluate such a test: easy/ difficult

Write your answers in the space provided above (in no more than 10 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


5.4 Summary

This unit has been concerned with testing speaking and listening. Specific procedures have been introduced for formal and informal assessment of speaking and listening. The changing emphasis in the assessment of speaking and listening is a move towards integrative tests that cover all four elements of communicative competence, and towards objective scoring and higher reliability and authenticity. We have also identified other characteristics:
• Assessing processes
• Internal assessment (during the course) instead of external assessment (at the end of the course)
• Use of a variety of methods
• Criterion referencing
• Formative identification of strengths and weaknesses and recording of positive achievement instead of pass/ fail summative assessment.

5.5 Key Concepts

• Oral interview
• Linguistic competence
• Discourse competence
• Sociolinguistic competence
• Strategic competence
• Transactional
• Information transmitting
• Interactional
• Authenticity
• Top – down
• Bottom – up
• Interlocutor/ assessor

5.6 Checklist

Do students draw up checklists of criteria for success?
Do your learners get frequent reinforcement, e.g. marks, comments, praise, etc.?
Does your reinforcement or recognition of success come as quickly as possible after the student has completed the work?
Are the standards you set seen as worth achieving by your students, as well as being achievable by them?
Do you test regularly, and set well-managed deadlines for students’ work?


5.7 Answers to SAQs

SAQ 1
If your answer to SAQ 1 is not comparable to the one suggested below, please reread sections 2.3, 2.4, 4.2.1.2, and 4.2.4.2.

a. integrative b. objective c. high reliability d. authentic e. direct f. good validity

SAQ 2
If your answer to SAQ 2 is not comparable to the one suggested below, please reread section 3.3.2.1.

a, d

SAQ 3
Your answer depends upon your personal teaching / learning experience.

The interviewer can intimidate the listener and dominate the interaction.

SAQ 4
If your answer to SAQ 4 is not comparable to the one suggested below, please reread section 5.2.4.

If you want to encourage oral ability, then test oral ability. Generally, the abilities / skills should be given sufficient weight in relation to other abilities. Some teachers ask their learners to bring something to the test that is appropriate to their age/ interest (a favourite object, a picture). The testee is asked to speak about it. This technique reduces the fear of the unknown. The disadvantage is that the presentation can be prepared.

SAQ 5
If your answer to SAQ 5 is not comparable to the one suggested below, please reread section 5.3.

A telephone conversation is more difficult to understand. It has a high information content and little verbal redundancy. If you do not understand, it can be embarrassing to ask for repetition. At the same time, the telephone adds a lot of noise interference. Moreover, we cannot enjoy the benefits that body language contributes to improved listening comprehension.

SAQ 6
If your answer to SAQ 6 is not comparable to the one suggested below, please reread section 5.3.3.

Listening comprehension is tested. The dictation is read three times. 1) The first reading of the whole text is done aloud at normal speed. 2) Then each block of text /…/ is read twice in succession with a pause between. 3) Finally, the whole passage is read through, and students are allowed time to check what they have written before the answer papers are collected.

SAQ 7
If your answer to SAQ 7 is not comparable to the one suggested below, please reread section 5.3.3.

The Total Physical Response

SAQ 8
If your answer to SAQ 8 is not comparable to the one suggested below, please reread sections 3.3.2.1 and 4.2.4.1.

a. discrete b. grammar, vocabulary, discourse c. proficiency

5.8 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp 16-24
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press, pp 101-116, 134-141


Unit 6 TESTING THE LANGUAGE SKILLS II

6.1 Unit Objectives
6.2 Testing Reading
6.2.1 Types of Reading based on Content and Function
6.2.2 Types of Reading based on Context and Processing Variables
6.2.3 Types of Reading according to Purpose
6.2.4 Cloze Passages
6.2.5 Passages with Questions
6.2.6 Microskills
6.2.7 True – False – Don’t Know Checks
6.2.8 Other Reading Techniques
6.2.9 Assessing Overall Comprehension
6.2.10 Issues in Teaching Reading
6.2.10.1 Narrative Text. Reading for Pleasure
6.2.10.2 Reading for Information
6.2.10.3 An Instructive Test
6.2.10.4 Types of Test Procedures
6.3 Testing Writing
6.3.1 Conditions under which Writing Takes Place
6.3.2 Current Theories of Writing with Particular Reference to Foreign Language Writing
6.3.2.1 Writing as a Product
6.3.2.2 Writing as a Process
6.3.2.3 Writing as a Social Activity
6.3.3 The Main Approach to Teaching Writing. Text – Based Approaches
6.3.3.1 Grammatical Form Practice
6.3.3.2 A Communicative Approach
6.3.3.3 Writer – Based Approach
6.3.4 Various Choices of Writing Tasks
6.3.4.1 Scoring Essay Type Tests
6.3.4.2 The Point Score Method
6.4 Summary
6.5 Key Concepts
6.6 Checklist
SAA 3
6.7 Answers to SAQs
6.8 Further Readings

6.1 Unit Objectives

Interactive, integrated skills approaches to language teaching emphasize the interrelationship of skills. Reading is usually developed and tested in association with writing, listening and speaking activities.

The aim of this unit is to familiarize you with the main procedures of testing the skills of reading and writing so that you will feel confident to construct and administer such tests competently and appropriately. We consider the major types of tests, their advantages and limitations. We shall also give some suggestions on how to prepare and grade the essay questions.

At the end of this unit you should be able to address the main issues in testing reading and writing. Moreover, you will be able to apply many of the techniques to your own situation.

6.2 Testing Reading

Reading theories, activities and materials, and research reports on reading are increasing both in quality and quantity. In addition, the number of publications has also increased in spite of the expansion of the visual media that apparently make reading almost unnecessary. More than that, the general public is keenly concerned about the promotion of literacy.

Reading is an extremely useful skill in itself. It makes it easier for learners to get usable intake, comprehensible input as they can adjust their reading to fit their own level. They can stop, back up, use a dictionary, or ask for help much more easily than if they were listening or speaking.

In other words, reading is the ideal form of comprehensible input. The classification of the main types of reading gives you an idea of the kind of authentic readings that should be used in teaching and testing.

Point to Ponder The art of reading is to skip judiciously.

P.G. Hamerton, 1834 – 1894, The Intellectual Life

6.2.1 Types of Reading based on Content and Function

The list is important if we think of the positive washback. A variety of types of reading tests will encourage the learners to read a broad range of texts. The list also gives you suggestions for using authentic texts in testing listening comprehension.
• Short, impersonal material: highway signs, tickets, time tables, labels, instructions. The style is usually formal and elliptic (missing articles, regular verbs)
• Larger, informational material: articles, books, reports, business letters, etc
• Personal material: personal letters, notes and messages, biographies, what we write
• Literary and aesthetic material: short stories, novels, plays, poetry
• Instructional materials: textbooks, workbooks, tests, cloze exercises

6.2.2 Types of Reading based on Context and Processing Variables

Reading can be:
• Formal
• Informal
• Impersonal
• Light


6.2.3 Types of Reading according to Purpose

• Skimming
• Scanning
• Speed reading
• Reading for enjoyment
• Reading for information
• Reading for studying

Reading can vary depending on:
• The age group
• Gender
• Social level
• Professional background
• Legal status

Reading varies according to:
• Length of sentences
• Difficulty of words
• Complexity of sentence structure
• Size of print
• Use of co-text

Reading abilities and types of reading can vary greatly according to readers’ interests, proficiency, reading habits, and background.

Reading is possible because the reader and the writer share the same schemata. The reader reads selectively, using the appropriate knowledge, skill and experience (schemata) to get on the same ground as the writer and figure out what the message is. Reading is favored by the grammar – translation and the reading approach. Evaluation plays an important role in the process of developing this skill. You need to evaluate students’ progress, to plan future reading and also to detect problems before they become serious. Evaluation is reasonably easy, based on learners’ ability to respond to the meaning of the material. Teachers are sometimes tempted to take the phonological performance of oral readings as a guide to reading ability, but we must view such a tactic as a serious error. Some learners can perform a “passage” perfectly without understanding it at all, while others can sound terrible and know exactly what they read.

Since some reading material can be talked about reasonably well simply by looking at the co-text and the titles, teachers should not be content with a superficial interpretation. By having a variety of response types, you can also better judge which types of comprehension each student has. Formal tests of general skill, when used with caution, can also be useful.

The procedures that can lead to growth in reading ability are a source for formal and informal testing:
• reading what you have said
• reading around the classroom (e.g. the learners read the labels placed on the door, table, chair etc)
• reading material presented first orally
• managed reading of short passages
• non – expository, non – narrative reading
• skimming and scanning
• speed reading

HOW DO WE READ?

PROCEDURES

In order to make the reader give overt responses you may ask the reader:
• To respond physically to a command rendered in written form
• To select from written alternatives
• To summarize what has been read
• To answer questions about a written text
• To outline and take notes from a written passage
• To provide an ending to a story
• To translate the message into the native language
• To read instructions and assemble a toy
• To talk in order to prove comprehension

SAQ 1 Why does reading share some techniques of testing with listening? Write your answers in 20 words in the space provided and compare them to those in the “Answers to SAQs” section at the end of the unit.

POINT TO PONDER
Before testing reading in a foreign language, the native language reading skills of the testee must be assessed. If some reading skills are not yet developed by the testee in his native language, you can be sure that the learner will encounter difficulties in reading in the foreign language. However, do not conclude that reading skills from the native language can be transferred to the foreign language.

Reading tests can follow the general outline of listening tests, i.e. the same types of tests that are used for listening can be used for reading. They can be handled individually, with true – false questions, written instructions, and so on. Easy as reading is to assess informally, however, it is surprisingly difficult to measure formally.

6.2.4 Cloze Passages

Cloze passages are easy to prepare, but the fill – in type can be difficult to score. Multiple – choice cloze tests appear to do much the same thing, taking only a little longer to prepare and being many times easier to score. You have to supply the correct choice and two or three other “attractive” distractors (that have the right meaning but the wrong grammatical form, or somehow look right but have the wrong meaning, and so on). A cloze passage of over 40 blanks can approach a reliability of .85 and above rather easily. But once made, such tests are difficult to revise. You cannot just cut out the parts of the passage that do not give good results, as you can do with multiple – choice tests or other test types.

WHEN IS A CLOZE PASSAGE RELIABLE ?
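The mechanical part of preparing a fill – in cloze passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fixed-ratio rule (blanking every n-th word) is the common cloze convention and is not prescribed by this unit, and the function name make_cloze and its parameters are hypothetical. Punctuation attached to a deleted word is removed with it, so the output still needs checking by the teacher.

```python
def make_cloze(text, n=7):
    """Blank out every n-th word of `text` (assumed fixed-ratio deletion).

    Returns the gapped passage and the list of deleted words,
    which serves as the answer key.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    gapped = []
    answer_key = []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answer_key.append(word)
            gapped.append(f"({len(answer_key)}) ____")
        else:
            gapped.append(word)
    return " ".join(gapped), answer_key


passage, key = make_cloze("one two three four five six", n=3)
print(passage)
print(key)
```

For a multiple – choice cloze, each entry in the answer key would still need two or three distractors written by hand, as described above.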

6.2.5 Passages with Questions

Many reading tests use passages followed by 3 or 5 questions (for example, the TOEFL reading section). These tests are extremely difficult to construct. First, it takes a lot of time to find suitable passages that have content which is equally familiar to all testees. The questions tend to overlap, and the answer to one question often gives cues to the answers of the others. Reading through the questions in fact can often tell you what the passage is about and allow a high level of performance even if you do not read the passage. Such tests take a lot of time per item, and in some ways require more academic and study skills than reading skills.

6.2.6 Microskills

Lists of reading skills give us an idea of what to teach. At the same time, these microskills can become testing criteria:
• Discriminate among the graphemes and orthographic patterns of English
• Retain chunks of language of different lengths in short – term memory
• Process writing at an efficient rate of speed
• Recognize the cohesive devices, rhetorical forms, communicative functions
• Realize that a particular meaning can be rendered in different grammatical forms
• Infer context that is implicit
• Infer links, connection, causes, effects, main idea, supporting idea, new information
• Distinguish between literal and implied meaning
• Develop and use reading strategies (scanning, skimming, guessing the meaning of words from the context)
• Identify referent of pronouns
• Understand relations between parts of a text (introduction, development, conclusion)

Setting criterial levels for reading is problematical. According to Arthur Hughes, “the best way to proceed is to use the tasks themselves to define the level. All of the items (and so the task that they require the candidates to perform) should be within the capabilities of anyone to whom we are prepared to give a pass. In other words, in order to pass, a candidate should be expected, in principle, to score 100%. But since we know that human performance is not reliable, we can set the actual cutting point rather lower, say at the 80% level. In order to distinguish between candidates of different levels of ability, more than one test will be required.”
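The cutting point described above amounts to a simple comparison, sketched here in Python. The function name has_passed and the default value of 80 are illustrative assumptions taken from the figure quoted in the passage, not a fixed rule; integer arithmetic is used so that a score exactly on the cutting point is not lost to rounding.

```python
def has_passed(raw_score, max_score, cut_off_percent=80):
    """Return True when raw_score reaches the cutting point.

    cut_off_percent is an assumed, adjustable value; the passage
    suggests 80 as an example.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    if not 0 <= raw_score <= max_score:
        raise ValueError("raw_score must lie between 0 and max_score")
    # Compare raw_score / max_score with cut_off_percent / 100
    # without using floating-point division.
    return raw_score * 100 >= cut_off_percent * max_score


print(has_passed(16, 20))  # 16 out of 20 is exactly 80%
print(has_passed(15, 20))  # 15 out of 20 is 75%
```

A teacher who wants to separate several levels of ability would, as the passage notes, need a separate test for each level rather than several cutting points on one test.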

READING SKILLS

CRITERIA LEVELS


6.2.7 True – False – Don’t Know Checks

True – false tests are useful as quick assessment of reading comprehension. Scoring: 2 points for a correct answer, -1 point for an incorrect answer, no penalty for answering don’t know. Wilga Rivers (1978: 267) suggests that this procedure discourages wild guesses. Advantages: the True – false test requires attention to structural cues.
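The scoring rule above can be worked through in a short Python sketch. The function name and the answer codes "T", "F" and "DK" are illustrative assumptions; only the point values (2, -1 and 0) come from the text.

```python
def score_true_false_dont_know(responses, answer_key):
    """Score a True - False - Don't Know check.

    responses:  list of "T", "F" or "DK" (hypothetical answer codes)
    answer_key: list of "T" or "F"
    Correct answer: +2, incorrect answer: -1, "DK": 0.
    """
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    total = 0
    for given, correct in zip(responses, answer_key):
        if given == "DK":
            continue  # no penalty for admitting ignorance
        total += 2 if given == correct else -1
    return total


key = ["T", "F", "T", "F"]
answers = ["T", "F", "F", "DK"]
print(score_true_false_dont_know(answers, key))  # 2 + 2 - 1 + 0 = 3
```

Because a wrong answer costs a point while "don't know" costs nothing, a testee who guesses on an item he cannot judge risks lowering his score, which is the effect the procedure is meant to have.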

6.2.8 Other Reading Techniques

• Questions in English requiring answers in the students’ native language. This is helpful for students who comprehend the text but have trouble expressing what they have understood in correct English.
• If the text is too difficult, students may be asked to read the questions before they start reading.

Point to Ponder Eye movements during reading are an important source of information about the reading process. Modern theories of reading usually analyze reading into a sequence of processes that are applied to each word as it is encountered in the text.

6.2.9 Assessing Overall Comprehension

• Supply in English a suitable title/ suitable subtitles for a text
• Outline the plot
• Give the main idea for each paragraph
• Supply paraphrases or definitions for words in a text

Example of Reading Comprehension Test

After reading the passage, answer the questions by choosing the letter for the alternative which could accurately complete the statement.

SAQ 2
What happens if, on a nine – point scale, the best paper is assigned an 8 and the worst paper a 2 because the teacher considers that the extreme scores should not be used, since no paper is good enough to receive the top score and no paper is bad enough to receive the bottom score?

Write your answers in the space provided above (in no more than 40 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.


“I must tell you that there is something in the proximity of the woods which is very singular. It is with men as it is with the plants and animals that grow and live in the forests: they are entirely different from those that live in the plains – by living in or near the woods, their actions are regulated by the wildness of the neighborhood. The deer often come to eat their grain, the wolves to destroy their sheep, the bears to kill their hogs, the foxes to catch their poultry. This surrounding hostility immediately puts the gun into their hands, they watch these animals, they kill some; and then, by defending their property, they soon become professed hunters; this is the progress; once hunters, farewell to the plough. The chase renders them ferocious, gloomy and unsociable; a hunter wants no neighbor, he rather hates them, because he dreads the competition. In a little time their success in the woods makes them neglect their tillage.” (Letters from an American Farmer, III, “What is an American?” by St. John de Crèvecoeur)

1. Living in the woods affects:
A. animals but not plants and men
B. plants but not animals and men
C. men as well as plants and animals
D. men but not animals and plants

2. The frontiersman
A. is forced to become a hunter
B. hunts for sport
C. hunts rarely
D. prefers fishing

3. Hunting and farming
A. go hand in hand
B. do not work well together
C. have no effect on one another
D. are alternate pursuits

4. Hunting makes the frontiersman
A. more sociable
B. a snob
C. less sociable
D. indifferent to neighbors

5. The author’s opinion of the frontiersman seems
A. high
B. unflattering
C. flattering
D. envious

6.2.10. Issues in Teaching Reading

• Skills may be used in isolation (e.g. reading without responding in writing) or in combination, with one skill predominant. Testing skills in isolation is favoured for washback reasons and also because integrated tests generate a lot of output (written or spoken) which cannot be marked objectively. Moreover, integrated testing is time consuming, costly and less reliable than objective testing.

AUTHENTIC READING

• Reading is not a passive or receptive skill but an active process.
• Authenticity, an extremely important factor in testing reading, refers to the degree to which language teaching materials have the qualities of natural writing (texts from newspapers, magazines, and other authentic materials). The opposite of authentic texts is simplified texts. When authenticity is discussed, Widdowson’s sense (1976: 165) has to be taken into consideration: “Authenticity … is a function of the interaction between the reader and the text which incorporates the intentions of the writer/ speaker.”

• Authenticity of task is also an important issue. A task is authentic when the result is a behavioral outcome (we do something with the information derived from reading, e.g. we read a recipe book and then make a cake). However, it is also true that reading in the real world is not always accompanied by a behavioral outcome. The texts we read have various purposes, for example reading for reference, for interest or pleasure, for information, or for instruction or advice. Each of these types has a different style (it uses a certain tense or voice, or is concise). Depending on its relation to the real world, comprehension may be fragmentary (a telephone number from a directory) or global (a message). The reading strategies, which depend on the purpose in reading, are: skimming, scanning, intensive reading, and extensive reading.

Starting from the type of text we use, we may require the testee to carry out a certain task.

Point to Ponder Novice teachers are nearly always surprised by the results of evaluation; it is not easy to guess who is learning and who is not.

6.2.10.1 Narrative Text. Reading for Pleasure

• Sequencing a series of pictures/ statements to reconstruct a plot
• Drawing
• Assembling a number of lines

6.2.10.2 Reading for Information

• Drawing a map
• Labeling
• Completing a table

6.2.10.3 An Instructive Text

• Sequencing
• Following instructions


6.2.10.4 Types of Test Procedures

• Open – ended questions; rejected by the second generation testers because of their reduced objectivity and practicality; accepted by some testers for their authenticity and fragmentary comprehension
• True or false, same or different, yes or no questions (authentic, realistic)
• Multiple – choice questions (advantages: objectivity, reliability, practicality)
• Global multiple – choice questions (an answer is given after scanning the whole text)
• Information transfer, e.g. transferring information given by a picture or map
• Matching vocabulary
• Cloze/ discourse cloze
• Translation. Drawbacks: poor practicality, negative washback effect, subjective, time consuming
• Testing reference skills and study skills

SAQ 3
Match the letters (i.e. the strategies) with the figures (objectives/ tasks).

I. a) responding; b) summarizing; c) skimming; d) scanning; e) note-taking; f) outlining.

II.
1. The test-taker must:
• locate a date, a name, or place in an article; the setting for a story; the principal divisions of a chapter; the cost of an item;
• comprehend labels, headings, numbers, and symbols, making inferences that are not presented overtly.
2. What is the main idea of this text? What is the author's purpose in writing the text? What kind of writing is this? What do you think you will learn from the text?
3. Write a summary of the text.
4. In this article.................., the author suggests that .................... Write an essay in which you agree or disagree with the author's thesis. Support your opinion with information from the article or from your experience.

Write your answers in the space provided and compare them to those in the “Answers to SAQs” section at the end of the unit.

6.3 Testing Writing

Writing appears to be the most complex, the most variable, and perhaps the least urgent of the four main language skills. However, for virtually all students at all levels, writing is a skill they simply cannot ignore. Writing is often of crucial importance. Judgment on the performance of the learner may have consequences for him/ her, such as exclusion from a specific discourse community.


Writing is also the means through which assessment and testing of learning regularly take place. Writing is an important skill for a learner in supporting other learning experience. It is obviously the major means of recording, assimilating and reformulating knowledge, and of developing and working through his/ her own ideas. It is also a means of creativity and of self-expression.

6.3.1 Conditions under which Writing Takes Place

We cannot think merely of products; we must also consider the processing conditions. Those conditions include the motivation and proficiency of the writer, the topic, the intended audience and the amount of time available.

Physical processes
• you put your fingers on a pen or keyboard
• environment: you are sitting at a desk
• you have to have a lot of information before doing it (different from reading, when knowing is a result of doing it)
• you need more energy
• your ego is much more at risk
• you expect writing to have more aesthetic qualities
• you have to observe a number of cultural conventions
• instructions are complex and compulsory
• the main types of writing are expected to be more polished and more permanent
• writers can go back and forth and can use dictionaries and reference materials
• written language is more complex, more formal, more concise, and uses less frequent vocabulary
• written language has orthographic and punctuation rules
• the writer’s emotions are “translated” indirectly into words on the page
• writing lacks the social immediacy and urgency of speaking
• writers feel lonely
• the writer must think of an imagined reception audience
• writing implies creation

Point to Ponder The invention of writing…has had a greater influence in uplifting the human race than any other intellectual achievement.

James Breasted, The Conquest of Civilization

How Do We Write?

Writers compose on the basis of their schemata:
• Linguistic
• Textual, which affects quantity and quality (background knowledge)
• Event (knowledge of the order of events)
• Strategic (strategies for how to compose)
• A level of print schemata is also involved (spelling, punctuation)
• In second language acquisition, writing correlates with the degree of language proficiency
• Reading and writing are correlated

In other words, when you sit down to write something simple, you activate the schemata (ideas, information, and organization) regarding the topic. Sometimes you have to activate various parts of the schemata. If you find a lot of information, you do not just copy everything down. Your discourse schemata are then activated to give you possible ways of writing essay answers, and your event schemata to decide in which order the events occur. You also need strategies for impressing the audience, for deciding how to use your time, and for controlling your excitement. As a foreign learner of English, you also have to pay attention to the structure of the language (vocabulary, idioms, the correct level of formality, spelling).

6.3.2 Current Theories of Writing with Particular Reference to Foreign Language Writing

Current theories of writing may be classified into:
• Linguistic
• Cognitive
• Social

depending on the particular emphasis given to the text, the writer or the context (readers).

Two of these perspectives on writing are important in language teaching. Writing is seen both as a product and as a process.

6.3.2.1 Writing as a Product

Writing means, among other things, the output of the activity of writing, i.e. a static text, visible on paper, isolated in time and place from the writer. Writing as a product has been analyzed in many ways:
• Comparisons of L1 and L2 writings
• Length (fluency)
• Accuracy of form (error)
• Effectiveness (quality)
• Structure

Many of the techniques of analysis of written texts derive from Discourse Analysis. In 1966, Kaplan introduced the concept of cultural variation in “thought patterning” in L1, which is also important for the acquisition of L2 writing. As a result, research has been carried out on:
• Distribution of information
• Inter-clause relations to measure patterning at the whole text level

Other differences between L1 and L2 texts:
• Differences in the ways clauses are sequenced to build up argument structures
• Organization and elaboration of structures
• The way in which readers’ requirements are met through topic – signaling and attention – getting devices


Stylistic features of L2:
• Relative inconsistency
• Inappropriateness or limitations in variety of style and tone
• Specific morpho – syntactic – lexical – semantic features
• The nature and frequency of clause connectors
• Types of modification
• Occurrence of passives
• Cohesion
• The use of collocations

SAQ 4
Which one of the following statements is most applicable to the selection of distractors for multiple – choice items?
A. distractors should be unequivocally false
B. one should avoid tricky distractors based on misconceptions
C. distractors should be attractive to the uninformed
D. distractors should be heterogeneous
Circle your answer in the space provided and compare it to the one in the “Answers to SAQs” section at the end of the unit.

6.3.2.2 Writing as a Process

Writing as a process focuses on the process of producing text, i.e. the activity of transforming ideas into written text. According to this view, writing is a complex cognitive activity, involving the use of a range of problem – solving strategies and composing processes. The research tools include the verbal protocol (an analysis of the recorded verbalizations of people thinking aloud while writing), observation, and post-interviewing techniques. The resulting model consists of the following components:

• The writer’s long – term memory (knowledge of the topic, audience)
• The task environment (the assignment, topic, audience, exigency, the text produced)
• The processes themselves, i.e. planning (generating ideas, goal setting, organizing), expressing ideas in verbal form, reviewing (reading and editing)

The model is interactive. Both the writer’s internal resources and the external context interact with the composing processes.

6.3.2.3 Writing as a Social Activity

Writing is an act of communication between the writer and reader within an external context. The important thing here is the interaction between producer and receptor in terms of common schemata and situational context.

The main components of writing are summarized in a diagram presented by Raimes (1983) in Techniques in Teaching Writing, Oxford University Press.


6.3.3 The Main Approaches to Teaching Writing. Text – Based Approaches

6.3.3.1 Grammatical Form Practice

This approach follows the requirement of the Audio-Lingual Approach that considers writing as a way of reinforcing other language skills. It requires:
• Practice of syntactic and morphological patterns that can be isolated
• Reinforcement of target structures, achieved through sentence completion, combining texts, gap filling, and manipulation and imitation activities using model paragraphs that contain the selected structures

Its objective is formal linguistic accuracy. Appropriateness to context and self – expression are corrected by the teacher.

This limiting sentence – bound approach was replaced in the 1960s by a discourse analysis approach known as “current traditional rhetoric”. Although the aim is still structural, this approach emphasizes the formation of habits in writing: paragraph patterns and sequences of units of meaning over longer stretches of discourse. Rhetorical categories (e.g. description, narration) and functions are practiced by constructing and manipulating discourse forms through completion type tasks, topic sentence and paragraph development exercises.

Point to Ponder
The study of languages ... should be joined to that of objects that our own acquaintance with the objective world and with language … may progress side by side. For it is people we are forming and not parrots.

Comenius, 1957

6.3.3.2 A Communicative Approach

In a communicative approach to writing, the emphasis is on message forms. The goal is purposeful interaction. Its objectives include:
• Appropriateness to the purpose of communication, i.e. content
• Appropriate techniques
• Information/ opinion
• Information transfer exercise (from visual to text)
• An emphasis on real time, holistic practice, risk taking strategies, free practice

6.3.3.3 Writer – Based Approach

It is based on the idea that writing is a non – linear, exploratory and generative process. It focuses on writers’ efforts to formulate and communicate ideas. It involves problem – solving cognitive activities, using strategies of goal – setting, idea generation, organization, drafting, revising and editing. The main characteristics are:
• A text is worked and reworked through a number of draft versions
• Writing is a collaborative activity to be shared within the classroom
• The teacher is an advisor rather than a tester

However, in a successful approach to teaching, we should not forget that there is no process without a product, and no product which has not arisen out of a process. Both approaches should be considered in a practical approach to writing in L2.

Writing is fairly easy to elicit and reasonably easy to grade, at least linguistically. But testing writing presents some important issues. Selecting fair topics is one difficulty, but grading is more serious. You must decide the relative weight of:
• Mechanics
• Structures
• Vocabulary
• Fluency
• Accuracy
• Communication
• Organization
• Total amount of information

6.3.4 Various Choices of Writing Tasks

• Controlled writing. You can have learners manipulate sentences, expand, edit, summarize.
• Completion or synthesis. You can begin a passage and have testees complete it. Or you can present paragraphs and ask students to sort, arrange and expand them.
• Simple stories, dialogues, or descriptions. You can ask learners to develop written material.

6.3.4.1 Scoring Essay Type Tests

With the exception of the oral test, the essay is the oldest test format in use today. The distinctive features of the essay question are:
1. the examinee is permitted freedom of response
2. the answers vary in degree of quality or correctness

Advantages of the essay

1. it is relatively easier to prepare an essay test than to prepare a multiple-choice test
2. it is the only means that we have to assess the examinee’s ability to compose an answer and present it in effective prose
3. it tests the pupils’ ability to supply rather than select the correct answer

Limitations of the essay

1. their poor content sampling
2. their low reader reliability
3. the student does not always understand the question and therefore is not sure how to respond
4. reading essays and grading them is time consuming and laborious

Why Are Essay Tests Still Popular?
1. Essay tests can indirectly measure attitudes, values and opinions
2. Good essay tests are more easily prepared than objective tests
3. They provide good learning experiences
4. Essay tests require the student to express himself logically, coherently, and in good English
5. Essay questions should be restricted to advanced foreign learners of English

It is very difficult to score essays. Many approaches have been tried, ranging from skimming the response for a quick estimate of its worth to assigning points for specific bits of information. Regardless of the method of scoring used, the emphasis should be on the ideas presented, the relationships developed, and the judgments made.

If factual information is desired, use selection type items or completion items. If several questions are to be scored, score the first question on every paper before proceeding to the next question. When all the items are scored, the points assigned can be summed for each paper. In order to increase reliability, ignore students’ names.

There are two methods of scoring essay responses:
• The sorting method
• The point score method

Sorting Method
• Decide on the number of groups to be used (for example 5) prior to scoring the papers
• Read the papers
• Sort the papers and place them into several groups ranging from high to low
• Re-sort the questionable papers or the border-line ones
• Take care to see that the poorer papers in the “better” group are superior to the top papers in the “poorer” group
• A predetermined number of papers may be assigned to each group, depending on the size of the class and the number of points assigned to the question. For example, if the number of groups is five, then two papers may be assigned to the first (top) group, four to the second, eight to the third, four to the fourth, and two to the fifth. The teacher is not required to conform exactly to the expected distribution. The method establishes an expected standard of measurement. All groupings should be used in order to increase the reliability of the examination.
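The grouping step of the sorting method can be sketched as follows. The sketch assumes that the teacher has already ranked the papers from best to worst by reading them, and it treats the expected distribution (2, 4, 8, 4, 2) as fixed, although the text allows the teacher to depart from it. The function name and the paper labels are hypothetical.

```python
def sort_into_groups(ranked_papers, group_sizes):
    """Split papers, already ranked from best to worst, into consecutive groups.

    group_sizes gives the expected number of papers per group, top group first.
    """
    if sum(group_sizes) != len(ranked_papers):
        raise ValueError("group sizes must add up to the number of papers")
    groups = []
    start = 0
    for size in group_sizes:
        groups.append(ranked_papers[start:start + size])
        start += size
    return groups


# Twenty hypothetical papers, P01 (best) to P20 (weakest), and five groups.
papers = [f"P{i:02d}" for i in range(1, 21)]
groups = sort_into_groups(papers, [2, 4, 8, 4, 2])
for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {', '.join(group)}")
```

Because every group receives papers, the full range of the scale is used, which is the point made above about reliability.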

SAQ 5
Imagine you are in a library. Find your reference materials in one of the sections of the library. Which of these items relate to:
a. study skills
b. reference skills

1. Where might you look for a book about submarines?
a. section J; b. section G; c. section H
2. Where would you find the latest issue of Time magazine?
a. section A; b. section B; c. section K
3. Where might you find a mystery about a teen-age detective?
a. section D; b. section E; c. section F
4. Where would you find an atlas with maps of Africa?
a. section J; b. section D; c. section C

Section A: Listening and Video-Tape Room
Section B: Reading Room
Section C: Reference Section
Section D: Children’s Section
Section E: Young Adult Section
Section F: Biography Section
Section G: Non-fiction Section
Section H: Fiction Section
Section K: Newspapers and periodicals

5. Read each of the questions that follow, and decide in which book you would look first to find the answer. Circle your answer:
a. almanac b. encyclopaedia c. others d. Reader’s Guide to Periodical Literature
1. Where are some famous lighthouses located in North America?
2. What is the current population of Vancouver?

Your answers depend on your personal experience. Compare your solutions to those in the “Answers to SAQs” section at the end of the unit.

6.3.4.2 The Point Score Method

The Point Score method of scoring essays is more reliable than the sorting method. However, in this case, validity may easily be sacrificed for an increase in reliability. Rules:

• Avoid selecting “facts” as scoring points
• A graded key that contains the expected responses should be constructed
• The specific points for the reader to note must be isolated and assigned relative weights
• The teacher reads a response and assigns to it the appropriate number of points
• At the end, all the points assigned to the questions can be added to determine the total score for each paper
• The method is useful for scoring the short – answer essay test
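The rules above can be sketched in a few lines. Everything specific in this sketch is hypothetical: the scoring points, their weights and the function names are invented for the illustration, and a real graded key would be written by the teacher for each question.

```python
def point_score(graded_key, credited_points):
    """Add up the weights of the scoring points the reader found in a response.

    graded_key:      dict mapping each expected scoring point to its weight
    credited_points: the scoring points the reader credited in this response
    """
    unknown = set(credited_points) - set(graded_key)
    if unknown:
        raise ValueError(f"points not in the graded key: {sorted(unknown)}")
    return sum(graded_key[point] for point in set(credited_points))


def paper_total(question_scores):
    """Total score for one paper: the sum of its question scores."""
    return sum(question_scores)


# Hypothetical graded key for one short-answer essay question (10 points).
key = {
    "states a position": 2,
    "gives a relevant reason": 3,
    "supports it with an example": 3,
    "draws a conclusion": 2,
}

first_question = point_score(key, ["states a position", "supports it with an example"])
print(first_question)
print(paper_total([first_question, 7, 4]))
```

Keeping the key as an explicit table of weighted points is what makes the method repeatable from one reader to another; the scoring points themselves should be ideas or relationships rather than isolated “facts”, as the first rule says.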

Example

Essay writing

You will have 30 minutes to plan, write, and correct your essay. Your essay will be graded on its overall quality.

INSTRUCTIONS

1. When the supervisor tells you to begin, read the essay question carefully.
2. Think before you write. Making notes may help you to organize your essay. Below the essay question is a space marked NOTES. Use only this area to outline your essay or make notes.
3. Write only on this topic. If you write an essay on a different topic, it will not be scored. Write clearly and precisely. How well you write is much more important than how much you write, but to cover the topic adequately, you may want to write more than one paragraph.
4. Write neatly and legibly. Do not skip lines. Do not write in very large letters or leave large margins.
5. Check your work. Allow a few minutes before time is called to read over your essay and make small changes.
6. After thirty minutes, the supervisor will tell you to stop. You must stop writing and put your pencil down. If you continue to write, it will be considered cheating.

Essay Question (30 minutes)
Do you agree or disagree with the following statement?
A zoo has no useful purpose.
Use specific reasons and examples to explain your answer.
Notes: …………………………………………………

Score / Level / Criteria / Comments

CONTENT
30-27 EXCELLENT TO VERY GOOD: knowledgeable – substantive – thorough development of thesis – relevant to assigned topic
26-22 GOOD TO AVERAGE: some knowledge of subject – adequate range – limited development of thesis – mostly relevant to topic, but lacks detail
21-17 FAIR TO POOR: limited knowledge of subject – little substance – inadequate development of topic
16-13 VERY POOR: does not show knowledge of the subject – non-substantive – not pertinent – OR not enough to evaluate

ORGANIZATION
20-18 EXCELLENT TO VERY GOOD: fluent expression – ideas clearly stated / supported – succinct – well-organized – logical sequencing – cohesive
17-14 GOOD TO AVERAGE: somewhat choppy – loosely organized but main ideas stand out – limited support – logical but incomplete sequencing
13-10 FAIR TO POOR: non-fluent – ideas confused or disconnected – lacks logical sequencing and development
9-7 VERY POOR: does not communicate – no organization – OR not enough to evaluate

VOCABULARY
20-18 EXCELLENT TO VERY GOOD: sophisticated range – effective word/idiom choice and usage – word form mastery – appropriate register
17-14 GOOD TO AVERAGE: adequate range – occasional errors of word/idiom form, choice, usage but meaning not obscured
13-10 FAIR TO POOR: limited range – frequent errors of word/idiom form, choice, usage – meaning confused or obscured
9-7 VERY POOR: essentially translation – little knowledge of English vocabulary, idioms, word form – OR not enough to evaluate

LANGUAGE USE
25-22 EXCELLENT TO VERY GOOD: effective complex constructions – few errors of agreement, tense, number, word order/function, articles, pronouns, prepositions
21-18 GOOD TO AVERAGE: effective but simple constructions – minor problems in complex constructions – several errors of agreement, tense, number, word order/function, articles, pronouns, prepositions but meaning seldom obscured
17-11 FAIR TO POOR: major problems in simple/complex constructions – frequent errors of negation, agreement, tense, number, word order/function, articles, pronouns, prepositions and/or fragments, run-ons, deletions – meaning confused or obscured
10-5 VERY POOR: virtually no mastery of sentence construction rules – dominated by errors – does not communicate – OR not enough to evaluate

MECHANICS
5 EXCELLENT TO VERY GOOD: demonstrates mastery of conventions – few errors of spelling, punctuation, capitalization, paragraphing
4 GOOD TO AVERAGE: occasional errors of spelling, punctuation, capitalization, paragraphing, but meaning not obscured
3 FAIR TO POOR: frequent errors of spelling, punctuation, capitalization, paragraphing, poor handwriting – meaning confused or obscured
2 VERY POOR: no mastery of conventions – dominated by errors of spelling, punctuation, capitalization, paragraphing – handwriting illegible – OR not enough to evaluate

Total score / Reader / Comments
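The total score of the profile is the sum of the five component scores. The sketch below only adds them up and checks that each score lies inside the range printed in the profile (Content 13-30, Organization 7-20, Vocabulary 7-20, Language use 5-25, Mechanics 2-5); the dictionary keys and the function name are assumptions made for the illustration.

```python
# Lowest and highest score for each component, as printed in the profile above.
SCORE_RANGES = {
    "content": (13, 30),
    "organization": (7, 20),
    "vocabulary": (7, 20),
    "language_use": (5, 25),
    "mechanics": (2, 5),
}


def total_profile_score(scores):
    """Check each component score against its range and return the total."""
    if set(scores) != set(SCORE_RANGES):
        raise ValueError("scores must contain exactly the five components")
    for component, value in scores.items():
        lowest, highest = SCORE_RANGES[component]
        if not lowest <= value <= highest:
            raise ValueError(f"{component} must be between {lowest} and {highest}")
    return sum(scores.values())


# A hypothetical essay rated "good to average" on most components.
essay = {
    "content": 26,
    "organization": 17,
    "vocabulary": 15,
    "language_use": 20,
    "mechanics": 4,
}
print(total_profile_score(essay))
```

With these ranges the lowest possible total is 34 and the highest is 100, so content carries the greatest weight and mechanics the least.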

6.4 Summary

The assessment and testing of reading and writing, especially in a communicatively oriented classroom, is a thorny issue. Because reading, like listening comprehension, is totally unobservable, it is as important in reading as it is in other skills to be able to assess students’ comprehension and development of skills accurately. The following overt responses indicate comprehension: doing, choosing, transferring, summarizing, condensing, extending (providing an ending to a story), duplicating (translating), modeling (after reading instructions), conversing (engaging in a conversation that indicates appropriate processing of information). The following general categories form the basis for the evaluation of student writing: meaning, organization, content, vocabulary, discourse (sentences, grammar), syntax, mechanics (spelling, punctuation). This unit provides a wide range of procedures that can be applied to your own situation.

6.5 Key Concepts

• Schema theory
• Skimming
• Scanning
• Silent reading
• Reading aloud
• Bottom – up approach
• Top – down approach
• Process vs product
• Authenticity
• Interaction
• Correction symbols
• Communicative approach
• Writer – based approach

6.6 Checklist

Do students draw up checklists of criteria for success?
Do your learners get frequent reinforcement, e.g. marks, comments, praise, etc.?
Does your reinforcement or recognition of success come as quickly as possible after the student has completed the work?
Are the standards you set seen as worth achieving by your students, as well as being achievable by them?
Do you test regularly, and set well-managed deadlines for students’ work?
Are the questions realistic in terms of difficulty, time allowed the student to respond, and complexity of the test?
Does the essay question establish a framework to guide the student to the expected answer?
a) Is the problem delimited?
b) Are descriptive words used (compare, contrast, define instead of discuss, explain)?

SAA No. 3
1. Evaluate the paragraphs below: use the levels and criteria from the L2 composition profile. Fill in the description in

a. In the beginning of life there was no classroom, but we read about many people have a big deal of knowledge. There was no classroom told the first man in the world how to plan, how to build his huts. I read about many potteries that have good poems in the first and second centuries, they knew hoe these poems without any classroom. In ancient the women knew how to sewing there chesses without any teacher.

b. I believe that we get more knowledge out side the classroom than we do inside. A classroom can give us only limited kinds of information. If we look at the beginning of civilization, foe example, we will note that people back then did not have formal classrooms, yet many of them were well informed. There were no classrooms to teach the first men how to plant or how to build huts. The great early poets of the first and second centuries didn’t learn their poems in a classroom, nor did the women find out how to sew their clothes there.

a.
Meaning: ……………………………
Organization: ……………………………
Content: ……………………………
Vocabulary: ……………………………
Sentences: ……………………………
Grammar: ……………………………
Mechanics: ……………………………
Grade:

b.
Meaning: ……………………………
Organization: ……………………………
Content: ……………………………
Vocabulary: ……………………………
Sentences: ……………………………
Grammar: ……………………………
Mechanics: ……………………………
Grade:

Please note that the appropriateness of your evaluation of each paragraph will count for 50% of your grade. Try to give full descriptions, e.g. Meaning: clear, confusing parts, clear presentation of the point of view, etc. Do not forget to send your evaluation to your tutor in due time.

6.7 Answers to SAQs

SAQ 1 If your answer to SAQ 1 is not comparable to the one suggested below, please reread section 5.3.6.2.

When testees read and listen there is nothing to observe, as there is no overt behaviour.

SAQ 2 If your answer to SAQ 2 is not comparable to the one suggested below, please reread sections 2.4 and 4.2.11.

The resulting reduction in the range of scores reduces the reliability of the examination. However, the teacher need not feel compelled to assign a 10 to the top paper or a 1 to the bottom one. This evaluation depends not only on the score assigned to the paper but also on the teacher’s judgment of his/her work.

SAQ 3 If your answer to SAQ 3 is not comparable to the one suggested below, please reread section 6.2.10.

1. d, 2. c, 3. b, 4. a.

SAQ 4 If your answer to SAQ 4 is not comparable to the one suggested below, please reread section 4.2.2.1.

The answer is C.

SAQ 5 Your answer depends upon your personal study skills experience. All of them are reference skills.

6.8 Further Readings

Harrison, Andrew (1983), A Language Testing Handbook, London: Macmillan, pp. 24-110
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press, pp. 75-101, 116-134


Unit 7 TESTING THE LANGUAGE SYSTEM AND BEYOND

7.1 Unit Objectives
7.2 Testing Pronunciation
7.3 Testing Grammar and Usage
7.3.1 Multiple-Choice Fill-In
7.3.2 Modify and Fill-In
7.4 Testing Vocabulary
7.4.1 Cloze
7.4.2 Multiple-Choice Fill-In Type
7.4.3 Multiple-Choice Synonym Type
7.4.4 Matching
7.4.5 Simple Prompts
7.4.6 Selection of the Words to Be Tested
7.4.7 Translation
7.4.8 True/False
7.4.9 Checklist Tests
7.5 Testing Beyond Language Form
7.5.1 Discourse and Culture
7.5.2 Speech Events
7.5.3 Literature
7.6 Summary
7.7 Key Concepts
7.8 Checklist
SAA 4
7.9 Answers to SAQs
7.10 Further Readings

7.1 Unit Objectives

Although teachers think that only language skills are usually of interest, most proficiency tests still retain a grammar or vocabulary section. It is believed that the lack of grammatical ability sets limits to what can be achieved in the way of skills performance. The same is true about vocabulary and pronunciation. Other tests also contain a literature or culture section.

The aim of this unit is to familiarize you with the different techniques of testing grammar, pronunciation, vocabulary, discourse, literature and culture.

7.2 Testing Pronunciation

While most language acquisition occurs because of natural processes within the learner, these processes can be made more efficient by selectively focusing on language form, even at the expense of authentic communication, interaction, or integration. We can significantly accelerate and enhance learning a foreign language and make it more satisfying and cost-effective by knowing more about the linguistic system, and about the processes used by that linguistic system. One of the components of the language system is phonology.

Research emphasizes that although teaching does not seem to affect the sequence in which language is acquired, it does seem to affect the rate. As conservative as this statement is, however, it hardly leads to the conclusion that formal instruction is of necessity worthless or wasteful. The lack of effectiveness in the teaching of pronunciation might very well stem from bad methods, from aiming at inappropriate goals (e.g. native-like pronunciation instead of near-native pronunciation), from compressing instruction into insufficient periods of time, or from placing too much attention on conscious manipulation.

Successful strategies:

• Use a top-down approach. Instead of beginning with the articulation of individual sounds, newer methods emphasize the relevant features of pronunciation, e.g. stress, rhythm and intonation. “The rhythm and intonation of English are two major organizing structures that native speakers rely on to process speech. Because of their major roles in communication, rhythm and intonation merit greater priority in the teaching program than attention to individual sounds.” (Rita Wong, 1987, 21, Teaching Pronunciation: Focus on English Rhythm and Intonation, Prentice Hall Regents)
• Spread instruction over a longer time
• Stress the need to overcome psychological blocks: fear, frustration, self-consciousness, self-image, and so on
• Place emphasis on the overall discourse patterns

SAQ 1 True or false? Competence in culture and discourse is easy to measure/test. Circle True or False and compare your answer to that in the “Answers to SAQs” section at the end of the unit.

Some of the microskills for listening comprehension (adapted from Richards) apply to pronunciation too:
• produce chunks of language of different length
• orally produce differences among the English phonemes and allophonic variants
• produce English stress patterns, words in stress positions, rhythm, structure
• produce reduced forms of words and phonemes


As our goal is to focus on clear and comprehensible pronunciation, we have to think of the factors that affect pronunciation learning:
• the native language of the learner
• age
• exposure
• innate phonetic ability
• identity and language ego
• motivation and concern for good pronunciation

Tests devoted exclusively to pronunciation are rare today. This does not mean that pronunciation is not important. It means that it is evaluated together with listening and speaking. Pronunciation tests today incorporate context and meaning.

It is normal for a teacher to try to know what his/her learners are learning, and how well, so that he/she will know what they are ready for. To a large extent, testing should go beyond phonology, but specialized prosodic levels can be useful. Correction should make maintaining self-image and motivation a high priority. In communication activities, correction should be indirect and should not interfere with the activity. In drills where phonology is the centre of attention, I advise you to give a concise, clear and direct identification of the problem and to request repetition, both by the individual and by the group. You can, of course, take notes of students’ mistakes and either mention these collectively at the end of the activity, without identifying individual students, or give this information individually.

Point to Ponder
My voice goes after what my eyes cannot reach,
With the twirl of my tongue I encompass worlds and volumes of worlds.

Walt Whitman

Pronunciation is normally not tested repeatedly except for specialized purposes. The following list of techniques does not exhaust the procedures for testing pronunciation. Except for the first and the second, most such testing must be done individually.

• Discrimination tests. Tests in which testees listen and decide which sounds are similar or different, correct or incorrect, are easy to score. In minimal pair tests, the testee must discriminate between two words that are identical except for the sounds being focused on, e.g. sink – link, cheap – chip, sheep – ship, pin – pen.

Examples: You can use pictures while you say (keeping the intonation identical in both cases): The sheep is in the lake. The ship is in the lake. The testee points to the right picture.

Write two words on the board labeled 1 and 2. Repeat the words. Then ask testees to give you the number when you give additional examples: cheap, sheep, lip, chip, sheep, meet, skip, leap, ship, leap. Be sure to use the same intonation. You can also use triplets:

Cheap, chip, cheap
Meat, meet, meet
Ship, sheep, ship

Ask testees to pronounce the contrasting word. If you say “ship”, they say “sheep” and so on. Then have them repeat phrases:

Cheap ship
Meet the jeep
Leap in the jeep
Ship the chips

• Identification. Testees can be asked to indicate what they heard. The listening task can be structured so that complex auditory skills can be assessed, such as the ability to hear contractions, reduced vowels, intonation patterns, and so on.

Example
Sometimes all three words are the same (AAA), sometimes only the first and the second words are (AAB). Which is the correct answer?

Ship sheep ship    AAA AAB ABA ABC
bad bad bat    AAA AAB ABA ABC
bed bad beard    AAA AAB ABA ABC
road rod nod    AAA AAB ABA ABC

Then have them give full sentences. You might use the pictures from discrimination tests: The sheep is in the lake. The ship is in the lake. They may make up sentences that contrast the sounds: The ship has a leak. The sheep has pink lips.
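The answer key for same/different triplet items of the kind shown in the identification example can be worked out mechanically. The sketch below is only an illustration: the function name `triplet_pattern` is hypothetical, and it assumes that each item is given as a label for how the word sounds. Ordinary spelling stands in for the sound here, which would give the wrong key for homophones such as meat/meet.

```python
def triplet_pattern(sounds):
    """Label three items as AAA, AAB, ABA, ABB or ABC.

    Assumption: each item is a label for the *sound* of a word
    (for example a phonemic transcription), not just its spelling.
    """
    if len(sounds) != 3:
        raise ValueError("expected exactly three items")
    labels = {}    # maps each distinct sound to the letter A, B or C
    pattern = ""
    for sound in sounds:
        if sound not in labels:
            # the next unused letter goes to a sound not heard before
            labels[sound] = "ABC"[len(labels)]
        pattern += labels[sound]
    return pattern


# Answer key for three of the items in the example (spelling used as the label)
key = [
    triplet_pattern(["ship", "sheep", "ship"]),
    triplet_pattern(["bad", "bad", "bat"]),
    triplet_pattern(["bed", "bad", "beard"]),
]
```

Note that a triplet can also come out as ABB, so a test that offers only AAA, AAB, ABA and ABC as options should avoid such triplets.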

SAQ 2

Study the following statements and circle T (true) or F (false):
1. Written examinations were introduced because oral examinations were found not to be valid. T F
2. “Objective” tests were introduced because it was found that traditional techniques lacked reliability. T F
3. Because of their format, objective tests can be assumed to possess reliability and validity. T F
4. Tests, unlike examinations, give accurate information about a testee’s abilities. T F
5. By eliminating marker variability, validity is ensured. T F
6. Some people are more variable in their performance than others. T F

Compare your answers to those in the “Answers to SAQs” section at the end of the unit.

• Repetition. Testees mimic the tester, who then evaluates the accuracy of specific components, including stress and rhythm.

For initial or final evaluation of pronunciation, you can use a recording of a passage that includes some of the following features:
• General features: lax muscular control, central tongue position, general articulation, stress and rhythm
• Vowels: lengthening, diphthongization
• Consonants: consonant contrasts, voiceless/voiced pairs, inflectional endings
• Word stress, phrase stress, intonation (yes-no questions, information questions, series, etc.)

Example
Read the passage clearly and expressively into the microphone with your tape recorder set at record:

Joe: Where are you going, Betty?
Betty: Hello, Joe, I am going shopping. I’ve just moved and I need some things for my room. Would you like to come with me, or are you going to work?
J: Thanks, I’d like to come. I want to buy a few things too.
B: I’m going to look for chairs, a rug, and perhaps a picture.
J: A rug? How big? Did you measure your room?
B: Oh, no. I’m only going to get a little one. A big one would be very expensive. I haven’t got much money.
J: I haven’t either. First, let’s go to that old shop – the one near the railway station.
B: OK. My boyfriend told me that was a good place to start.

• Reading of isolated elements. Testees can be asked to produce words or phrases from a list. Alone, this task gives little information about overall comprehensibility, but it can help pinpoint specific types of mistakes in articulation or stress.

• Reading of dialogues. Testees can practice and then read aloud natural discourse. The advantage is that everyone has the same task and it is easier to make comparisons among those being tested.

• Pronunciation in discourse. You can separately rate the various elements of pronunciation during the course of an interview: overall physical projection and upper body movement, stress and rhythm patterns, vowel reduction, articulation, and so on.

• Dictation. Dictation is also recommended for its ability to sharpen listening acuity. If you wish to make the task more specific to phonological issues, you can use material that includes minimal pairs, contractions, allophonic variations, and so on.

• Intonation. The teacher models and then asks the testee to achieve a certain effect.

Example
The window is open. – neutral statement
The window is open. – complaint
The window is open. – request that the window be closed
The window is open. – warning


7.3 Testing Grammar and Usage

Although grammar is no longer seen as the goal or even the primary means of language acquisition, a wide range of evidence convincingly suggests that knowing about grammatical processes and structures can accelerate learning. This means we can enhance the process, make it more effective or efficient, prevent wasted energy and spare the learner unnecessary frustration.

Grammar tests are normally of the objective scoring type. Since there are so many grammatical patterns in a language, it is relatively easy to design reliable multiple-choice tests of grammar. These often correlate strongly with other types of tests, so there is evidence of their validity. But the ease of construction and the vast number of possible items are perhaps a temptation to overuse such items at the expense of more elusive communicative testing.

7.3.1 Multiple-Choice Fill-In

Many tests have sentences that look like sentences students have incorrectly produced with a blank where the mistake would be. The correct choice is one of the choices, of course, along with the common mistake, and the distractors/ additional mistakes that students might make or find tempting.

Example: Circle in the margin the letter corresponding to the form that completes the following sentences, where: A – is; B – has; C – does; D – no extra word

1. … Tom usually eat lunch at school? A B C D
2. Why … that man following us? A B C D
3. Who … always knows the solution? A B C D
4. John … never seen his cousin. A B C D
5. What … your mother do? A B C D
6. … he ridden that bike before? A B C D
7. What … that phrase mean? A B C D

7.3.2 Modify and Fill-In

Some tests provide blanks with uninflected forms in parentheses. The student has to decide on the correct form and write it in.
• In fill-in tests, testees must supply missing words or forms, sometimes from a list.
• Cloze passages are fill-in exercises, but the sentences are all part of a context. In the following example, the passage requires a focus on verb tense and aspect. The choices would be provided in a separate list, or left completely up to the testee. Here is an easier version:

Example: Sylvia … (come) here about a month ago. She … (leave) her village because her father … (die) and … (leave) her with a cruel stepmother and sister. Sylvia … (not hear) from her since she left. She … (live) in a flat with two roommates for the last three weeks. They … (make) her do most of the work while they … (go out) to have a good time. Last night she … (finish) her work early and … (go) to the local disco. She … (have) a good time but … (leave) at midnight because her feet … (hurt). Since then she … (find) more interesting things to do and so she … (feel) much happier now.

• In a choice drill, students make choices between alternatives. The format is essentially that of a multiple-choice test.
• An editing test presents students with sentences in or out of context that have mistakes for them to correct. Alertness to erroneous forms in proofreading is extremely important. In real life we are often required to proofread. It can also be a problem-solving activity. Sometimes in such tests the mistakes are marked. Be aware that when testees have to find and also correct errors, the test can be much more difficult than it might seem to be.
• Patterned performance. Some tests require modifications such as transformation, deletion and expansion. Normally, papers would be graded individually.
• Translation. Testees translate sentences that have the target grammatical form. The responses are graded individually.

Scale of Grammatical Competence (after Bachman and Palmer, 1983)

Rating 0
Range: No evidence of correct morphological and syntactic structures
Accuracy: No control of structures; errors of all types

Rating 1
Range: Limited range of morphologic and syntactic structures
Accuracy: Control of very few structures; many errors of all types

Rating 2
Range: The same as above but with signs of systematic evidence
Accuracy: Control of some structures; many error types

Rating 3
Range: Large but incomplete range of morphologic and syntactic structures
Accuracy: The same as above

Rating 4
Range: Large but incomplete range of morphologic and syntactic structures
Accuracy: Control of most structures used; few error types

Rating 5
Range: Complete range of morphologic and syntactic structures
Accuracy: Control of most structures used; few error types

Rating 6
Range: Complete range of morphologic and syntactic structures
Accuracy: No systematic errors


SAQ 3 True or false? Grammar has been the skeleton around which most testing has developed techniques. Circle T or F and compare your choice to that in the “Answers to SAQs” section at the end of the unit.

7.4 Testing Vocabulary

We are fully aware that teachers can effectively help learners expand both the size and sophistication of their vocabularies. But how many words are enough to use a new language reasonably well? Are words quite independent of the grammatical system and other schemata? These and other questions form the core of this unit.

Words are not really yours until you appropriate them by using them for your own purposes. In fact, schooling is little more than the acquisition of new concepts and the words that go with them. How many words you need depends entirely on what you are trying to do, with whom, and under which circumstances. As you well know, words are not evenly distributed: a small number of words are used very frequently, and the remaining words become increasingly rare as the size of the vocabulary grows. Nor do all words have the same number of meanings. The most common ones have the greatest number of occurrences and meanings. For example, the articles the, a, and an represent about ten percent of over one million written words. If we add about two hundred other structure words (like they, their and them), we can account for over 50 percent of all written words. A 92 percent figure is reached with only 10,000 words. By 25,000 (half of all the words in the count), you have almost 98 percent of all the running words. The number of words used in normal phone conversations by non-native speakers is said to run around 2,000. Some even say that 850 words can do the work of 20,000. It is said that a range of about 7,000 words is adequate as a beginning level for functioning in a United States university. It is obvious that for undergraduate programmes more words are needed.

In summary, the frequency of words decreases geometrically with the size of the vocabulary. It follows that attention should be paid to words that appear frequently.
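Coverage figures of this kind (what share of all running words is accounted for by the most frequent words) can be computed from any word count. The sketch below is a minimal illustration, not the procedure used for the counts quoted above: the function name `coverage_by_rank` and the tiny sample text are invented for the purpose, and a real study would use a corpus of a million words or more.

```python
from collections import Counter


def coverage_by_rank(tokens, ranks):
    """Return the share of running words covered by the top-N word types.

    tokens: list of word tokens (running words)
    ranks:  list of cut-off points, e.g. [10, 200, 10000]
    """
    counts = Counter(tokens)
    # frequencies of the distinct words, most frequent first
    freqs = sorted(counts.values(), reverse=True)
    total = sum(freqs)
    return {rank: sum(freqs[:rank]) / total for rank in ranks}


# Invented toy text: 13 running words, 8 different words
sample = "the cat sat on the mat and the dog sat on the rug".split()
coverage = coverage_by_rank(sample, [1, 3, 8])
```

Even in this toy text the single most frequent word ("the") accounts for 4 of the 13 running words, which is the same pattern, on a very small scale, as the figures given for the articles and structure words.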

Words are also part of a complex network that goes from the phonological level to the level of background knowledge. Words are linked to grammar, to their frequency, to other words of the same type, to words with opposite meaning, to words that begin or end with the same letter or syllable, and to words that sound the same but have different meanings (homophones). It follows that a foreign language teacher has to select new words carefully and introduce them properly, with plenty of contextual cues, built-in explanations, synonyms, and so on.


You should know roughly which words students know in order to ascertain which materials to use and how to use them. In other words, you need to figure out roughly which new words students are ready for. You can do this quite easily informally, as well as with tests. Feedback (correction) also seems necessary at times. Adjust correction to fit the learner, the context, the social dynamics, and the time available, with maintaining incentive and self-esteem as the top priority. That is, avoid emphasizing the fact that students do not know words or that they use them inappropriately.

Various procedures are used to study words. They offer useful suggestions for testing. From among these we mention:
• Reading or listening to loosely graded materials
• Identifying and studying words
• Introducing words in the experience process of other language teaching activities
• Introducing words through songs, poems
• Introducing words through enriched short contexts (materials that are not authentic)
• Exercises – fill in, cloze, matching, complete the words, define and translate
• Word study (prefixes and suffixes, stems)
• Use of dictionaries
• Self-study activities (cards, computers)

Point to Ponder The gift of language is the single human trait that marks us all genetically, setting us apart from the rest of life.

Lewis Thomas, The Lives of a Cell

Why test vocabulary?

• To find learners’ vocabulary size
• To motivate and encourage the learner by setting short-term goals
• To provide feedback for teachers
• To evaluate progress
• To compare vocabulary size before and after a language programme

Some rules have to be observed when testing vocabulary:
• Avoid traditional types of vocabulary test designed to test words that are rarely used in everyday situations. Vocabulary achievement tests are highly valued for their backwash effect.
• Decide whether you want to test the learner’s active or passive vocabulary, the spoken or the written language
• In the case of beginners, focus on vocabulary deriving from the spoken language


The selection may be based on:
• A syllabus
• A frequency list: A General Service List of English Words and The Wright Frequency List (both of them are based on written language; no account is taken of difficulty levels or of areas of interference between L1 and L2)
• The students’ textbook or reading materials
• Errors taken from the written work of the student

• Besides quantitative tests, add qualitative ones (test whether the learner is able to discriminate between words)
• Avoid difficult grammatical structures when you test vocabulary (words can be grouped according to their frequency and usefulness; a test should contain more frequent and useful words)

Vocabulary, like grammar, is so easy to test that it might be better to measure communicative competence, rather than to measure vocabulary.

SAQ 4 True or false? The ten thousandth word in frequency might appear only once in over a million words. T F
Circle T or F and compare your choice to that in the “Answers to SAQs” section at the end of the unit.

7.4.1 Cloze

Cloze is indirectly a vocabulary test.

Example
Great Britain is an island that …1… the Atlantic Ocean and the North Sea. It …2… the mainlands of England, Wales and Scotland. Ireland …3… the west coast of Great Britain.
(Answers: 1. is surrounded by; 2. comprises, consists of, is composed of; 3. lies off)

Point to Ponder
Words are one of our chief means of adjusting to all the situations of life.

Bergen Evans
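The cloze example in 7.4.1 deletes hand-picked phrases. Another common way of building a cloze passage is fixed-ratio deletion, in which every nth word is removed regardless of what it is. The sketch below shows that variant only as an illustration: the function name `make_cloze` and its parameters `every` and `start` are hypothetical, and punctuation attached to a deleted word is removed together with it.

```python
def make_cloze(text, every=5, start=2):
    """Replace every `every`-th word with a numbered gap.

    start: index (counting from 0) of the first word to delete, so that
           the opening words of the passage are left intact as context.
    Returns the gapped passage and the list of deleted words (the key).
    """
    words = text.split()
    answers = []
    for i in range(start, len(words), every):
        answers.append(words[i])
        # gaps are numbered in the order in which they appear
        words[i] = f"...{len(answers)}..."
    return " ".join(words), answers


passage = "Great Britain is an island in the Atlantic Ocean"
gapped, key = make_cloze(passage, every=3, start=2)
```

The teacher still has to check the result by hand, because mechanical deletion can remove words that no testee could reasonably restore from the context.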

7.4.2 Multiple-Choice Fill-In Type

The test item presents a complete sentence or short utterance within which the target word fits naturally. A blank is inserted for the target word, and three other words that might conceivably seem possible (to a non-native) are also provided. Students mark the answer sheet with the letter of their choice.


7.4.3 Multiple-Choice Synonym Type

The target word is underlined in a sentence, and 4 choices are provided. Students pick the one word which they think comes closest in meaning to the underlined word and mark the answer sheet accordingly. These tests are harder to construct and are more confusing than the fill-in type.

7.4.4 Matching

Some tests merely present a single word and then a list of four additional words. The testee picks the one word that matches the target word. While some object that this is artificial and a distortion of material processes, the evidence now seems to indicate that the more proficient readers are the more likely they are to respond immediately to words out of context.

7.4.5 Simple Prompts

Tests occasionally present pictures, words in L1, or definitions and ask testees to supply the words in L2.

7.4.6 Selection of the Words to Be Tested

• It is not easy to test all the words
• General Service List – choose 60 out of 100 words which will be used to represent the 2000 headwords
• Exclude all the words that cannot be easily tested (a, the, of, be)
• It is much easier to test nouns, adjectives, verbs, adverbs
• If we use pictures, the selection is based on concrete nouns
• Choose the test items from the words left
• Number the words and choose every tenth word
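The last three steps of the selection procedure (exclude the words that cannot easily be tested, number the words that are left, take every tenth one) can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_test_words` is hypothetical, and the short placeholder list stands in for a real headword list such as the General Service List.

```python
def sample_test_words(headwords, excluded, step=10):
    """Pick every `step`-th word from the words left after exclusion.

    headwords: ordered list of candidate words
    excluded:  set of words that cannot easily be tested (a, the, of, be ...)
    """
    # keep the original order, dropping the excluded words
    candidates = [word for word in headwords if word not in excluded]
    # "number the words and choose every tenth word": 10th, 20th, 30th ...
    return candidates[step - 1::step]


# Placeholder list (w1 ... w30) standing in for a real headword list
headwords = [f"w{i}" for i in range(1, 31)]
chosen = sample_test_words(headwords, excluded={"w5"}, step=10)
```

Because the sample is taken at regular intervals from the whole list, the words chosen can be treated as representative of the list they were drawn from.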

7.4.7 Translation

Translation is a useful way of providing a quick check of learning. The learners can be asked to translate the underlined words; this makes it possible to test words that we could not test with a multiple-choice test. The aim is to find which words in the General Service List were known and which were not.
• The 2,000 and 3,000 word levels contain high frequency words
• The 5,000 word level is on the boundary of high and low frequency words
• The 10,000 word level contains low frequency words

7.4.8 True/False

The words to be tested are put in sentences. If the tested word is not known, the learner will find it difficult to answer correctly.

7.4.9 Checklist Tests

Checklists using some non-words should be used with caution as learners with a small vocabulary overestimate their vocabulary. The method is unreliable with learners who are poor at spelling and with words having multiple meanings. Testing non-words in sentences is easy to prepare and score.


Examples:

1. To test yourself on the vocabulary, fill in the missing letters in the incomplete words:
A superstition is an untrue b- - - - f held by many p - - - - e based on fear of n - - - e. The ground hog s - - - y is one of the c - - - - - - t superstitions.

The matching lexical cloze is a similar type of test. The words are listed below. In a true matching lexical cloze the words are omitted according to a system.

2. Choose appropriate words from the list below to complete the passage. You may need to change the forms of some of the words.
Capable, permit, privilege, employ, complaint
Women in the United States were looked upon, for a long time, as being less … than men. This is the … why they were not … to have as many … as men.

3. Circle in the margin the letter corresponding to the most appropriate completion for the following sentences when A = back; B = along; C = through; D = out; E = off; F = up
I liked the first volume, but I can’t get … the second. A B C D E F
Be sure to get … the bus at the second stop. A B C D E F

4. Circle in the margin the letter corresponding to the phrase which correctly completes the sentence:
The mother of your father or mother is your A B C D
A. stepmother; B. grandmother; C. godmother; D. mother-in-law

5. Sets. Three of the four words in each line are similar in meaning or share some common features. Draw a circle around the word that does not fit:
1. conference, congress, meeting, ethics
2. collapse, dissipate, speculate, decay

6. Multiple-choice in context.
Later I ... to them for my bad behavior.
a. apologized b. applauded c. enquired d. entertained

7.5 Testing Beyond Language Form

As you know, language is inextricably intertwined with information, culture, and products of various kinds including literature. Although these concepts are usually included within areas of language skills and language system, they do merit a focus of their own. If they are interwoven, we cannot separate them anyway, and by focusing on them we are teaching language in any case.

Furthermore, it is quite impossible to acquire a new language and not a new culture. Culture offers some students an incentive: to find out how other people think and live. Cross-cultural and even multicultural learning is nowadays considered highly desirable.

Point to Ponder
Were Shakespeare suddenly to materialize in London or New York today, he would be able to understand, on the average, only five out of every nine words in our vocabulary. The Bard would be a semi-literate.

Toffler

7.5.1 Discourse and Culture

In its turn, discourse is not only a linguistic property, but a socio-linguistic and cultural component as well. Discourse at a simple level includes how people select, arrange and time utterances in order to produce certain effects in those they talk to. In all cultures, people talk in order to get things done, and the things that they try to get done fall into the same general categories. Finocchiaro and Brumfit (1983, 65-66) identify 5 broad functions, each in turn containing several other functions:
• Personal. People clarify and express how they are feeling
• Interpersonal. People use speech to initiate, interrupt, and end conversations and to negotiate many other social functions: complimenting, apologizing, offering, accepting, refusing invitations, agreeing and disagreeing
• Directive. People attempt to influence others and to respond to the attempts of others to influence them
• Referential. People exchange, compress, summarize, and list information.

All cultures have structured conversations that probably have the same rules. Grice’s maxims are observed in all cultures: be just as informative as you need to be, say things that are true, say things that are relevant, say things clearly, briefly, and in an organized manner. But beyond these similarities, cultures have very different types of speech acts. There are also differences between the sexes and among different members of the same culture. In any case, all learners of a foreign language should know that:
• Discourse is shaped by your view of yourself and your understanding of what right you have to express your opinions to others
• Discourse is shaped by social relations and topic, and by how that topic can or cannot be handled
• Discourse is timed, and different cultures have different guidelines for determining the lengths of pauses, interruptions, length of talking, who gets more time, and who may talk to whom
• Discourse occurs in physical space: it matters where people are positioned and how far apart they are
• Discourse follows particular conventions (how people begin a speech act, and how they close it)

GRICE’S MAXIMS

DISCOURSE FUNCTIONS


All these rules are learned by experience, observation, and individual hypothesis testing. Subtle and complex features have to be taught, because the wrong rules can be learned but never unlearned. So explanation seems a necessary strategy. Explanation should be preceded by observation and followed by meaningful practice.

Discourse functions are normally taught explicitly and by means of dialogues. Emphasize recognition and comprehension first, provide selective explanation, and then offer many additional opportunities for continuing observation and appropriate responses. When you teach functions, begin with a short and interesting dialogue illustrating the functions, e.g. greeting, apologizing. After the dialogue, offer an explanation (when, why, and how people express these functions) and provide further examples. Students practice the dialogue material and hold discussions to react to and interpret the functions, relating them to the same function in their native language. Finally, they are asked to use the new function (a variety of prompts is provided: pictures, unfinished dialogues, practice tasks).

Testing is done by means of tests of knowledge, by recognition of appropriate responses (in a multiple-choice test), by correctly responding to a prompt, or by performances. When we teach and test culture we may use:
• Cultural concepts. Introduce concepts that include information about the culture, and require students to solve problems within the possibilities the culture has available.
• Values-clarification activities. Students are given a situation, e.g. a conflict between going out with family and getting homework finished, or turning in a classmate who is cheating. The discussion should remain open-ended.
• Role-play. Students are asked to take roles, some being the stereotyped group (“the outsiders”) and the others the stereotyping group (“the insiders”)

Facts about a culture can easily be approached as content (how the Smiths celebrate a birthday). Facts are easy to list and quantify, and are therefore easy to test.

Behavior can indirectly be tested with regular tests, although such testing does not at all guarantee that the learner would actually behave in a comparable way in real life.

Interviews, role playing and simulation also offer opportunities for you to guide students. But practically speaking, setting up a role play with the purpose of giving a grade can make you put a lot of weight on one or two behaviors. Add the amount of time required and the risk that the testing is not effective, and we are left with few valid, reliable, and cost-effective ways of measuring what seems to be among the most significant of our objectives. Language testing can also include the higher level of functions, content, literature and culture.

Example:
1. What is the difference between the United Kingdom and Great Britain?
2. What does the Union Flag stand for and how should it be flown?
3. Does Britain have a National Day?
4. How do the British celebrate traditional and religious holidays?
5. What and when are bank holidays?
6. What is Pancake Day?
7. What is Guy Fawkes Night?
8. What is the significance of the poppy and when is it worn?
9. What are Britain’s national flags?
10. What are the most common superstitions in Britain?
11. What is the most popular food in Britain?
12. Why do the British like drinking tea?
13. Why do the British like going to the pub?
14. Why is the Tower of London so popular with tourists?
15. What is Speakers’ Corner?

Point to Ponder
Novice teachers are nearly always surprised by the results of evaluation; it is not easy to guess who is learning and who is not.

Testing discourse analysis implies assessing the quality of coherence deriving from the interaction of a text with given participants, i.e. the participants’ knowledge and perception of paralanguage, the situation, the culture, the world in general, and the roles, intentions and relationships of participants. It also implies the study of cohesive devices (pronouns, ellipsis and conjunctions) and of the differences between the written and the spoken form. Discourse is more difficult to test than other areas.

Example: Test lexical cohesion by synonyms: each gap should be filled by a synonym or near synonym of the italicized word in the first sentence.
1. These animals live in rainforests. They are beautiful …
2. Have you seen this gadget for cleaning combs? It’s an excellent little …; you should buy it.
3. New moves are afoot to stamp out tax evasion. The … are intended to stamp out tax evasion.
4. Mr. Smith said an altercation had arisen between himself and Mr. Jones. He said the … was over a bill.
Answers: 1. creatures/ beasts; 2. device; 3. measures; 4. argument/ quarrel
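Because each gap in the example above accepts more than one synonym, marking it amounts to checking a response against a small set of accepted answers. The following is a minimal sketch of that idea, not a prescribed marking scheme; the names ANSWER_KEY and score_gap_fill are hypothetical, and the key simply restates the answers listed above.

```python
# Hypothetical answer key: item number -> set of accepted synonyms,
# taken from the "Answers" line of the example above.
ANSWER_KEY = {
    1: {"creatures", "beasts"},
    2: {"device"},
    3: {"measures"},
    4: {"argument", "quarrel"},
}


def score_gap_fill(responses, answer_key=ANSWER_KEY):
    """Return (total score, per-item results) for a dict of item -> response."""
    results = {}
    for item, accepted in answer_key.items():
        # Ignore case and surrounding spaces; a missing response counts as wrong.
        given = responses.get(item, "").strip().lower()
        results[item] = given in accepted
    return sum(results.values()), results


score, detail = score_gap_fill(
    {1: "Beasts", 2: "tool", 3: "measures", 4: "quarrel "}
)
print(score, detail)
```

A teacher would probably extend the accepted sets after seeing real answers, since learners often supply defensible synonyms that the key did not anticipate.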

7.5.2 Speech Events

Learners of foreign languages are often asked to perform on certain specific occasions, and those occasions have culturally specific forms: learners give class presentations and attend lectures. Each of these has a structure determined by culture. To perform successfully, learners must follow those patterns.

Speech act theory postulates necessary conditions for particular acts. In an order, for example, the speaker must refer to a possible future action by the addressee and must have the right to give orders, and the addressee must have the obligation to do the action. When these conditions hold for an utterance such as “You ought to tidy up”, the result is the act of ordering.

COHESION


7.5.3 Literature

Literature is the collection of products, usually written down, that is valued for its aesthetic rather than its informational content. The assumption that literature is the ideal basis for both cultural and linguistic learning is often disputed. Nowadays, the writing of literature is not seen as utilitarian or functional. The themes of novels and plays certainly take place within cultures, but can hardly be said to be members of the culture. A story that made all aspects of culture explicit would strike most readers as tedious and unrealistic. Writers do not write for non-native speakers. Nevertheless, despite changes of approach, doubts about pedagogic validity and even doubt about its distinct existence as a discourse type, literature continues to be popular with students and an inexhaustible resource for the language teacher. Traditionally, literature has occupied a central position in the teaching of English. It is still seen as a model of the “best” language. Language learning in which literature is central inevitably focuses more upon the written than the spoken language. The intrinsic value of literature, and the fact that it does provide interesting and authentic use of the language, have guaranteed its continued prominence.

If you teach literature or teach language through literature, use it in the way that the author intended. If you use it for linguistic or cultural analysis, you are not observing the intentions of the author. Select literature that your learners will respond to aesthetically, personally and emotionally. Select material that is within the right range of difficulty. Use the procedures employed in the teaching of other skills: pre-reading, silent reading or listening, social interaction, integration of the four skills, and appropriate focus on language. Include writing as a possible response. Literature continues as an internationally recognizable discourse in spite of fashions and cultural differences. It allows people to gain insights into other cultures while also appreciating the universality of human nature. It also allows you to enjoy a universal pleasure in language art. All these factors have ensured that literature teaching has survived. It continues strengthened rather than weakened by the current debates.

Point to Ponder
Do not underestimate the motivating effect of an anticipated test.

When you teach culture, you teach about a culture (family size, customs, holidays, and educational system), attitudes towards a culture, and behavior appropriate for a culture (how to behave in a family, act during a ceremony, etc.). When we think about teaching a culture we have to think of what is essential to teach in order to survive in the respective culture. Becoming “native” would take many years of effort. In other words, choosing the right things to teach seems more important for culture than it does for language.

HOW TO SELECT LITERARY TEXTS IN TEFL

TEACHING CULTURE


7.6 Summary

This unit explained how to assess mastery of the subskills of English i.e. to test how well each component has been mastered as a subskill of the four main skills. Of course, tests of grammar, vocabulary, pronunciation do not show exactly how well a person uses English, but they can help teachers identify students’ strengths and weaknesses in oral or written communication. Choosing which procedure to use depends on the learner’s age and language ability as well as on the kind of skill being taught. It is true that tests devoted exclusively to pronunciation are rare today. This does not mean that testing pronunciation is useless. It simply means that this subskill is assessed in conjunction with listening and speaking, incorporating context. Pronunciation items can be useful as they may measure progress made on specific points of pronunciation.

7.7 Key Concepts

• Top-down
• Bottom-up
• Discrimination test
• Minimal pairs
• Intonation
• Modify and fill-in
• Discourse
• Grice’s rules
• Cohesion
• Coherence
• Speech events
• Sheltered academic programme
• Culture

7.8 Checklist

Do all students achieve some success and get some reinforcement?
Do you ask your students to set themselves tasks?
Do you adopt assessment methods which do not rely exclusively on written assessments?
Do you ask questions equally of males and females?
Do all your students get some measure of success in their learning? Does this success get quickly reinforced?
Do you encourage self-evaluation and student responsibility?
Do your homework assignments combine maximum learning value with minimum marking effort?
Are you rigorous about setting, collecting and marking homework? And do your students’ parents know this?
Do you use homework marks for reports or record cards?


SAA No. 4
Give your learners three tests of one grammatical feature (e.g. the form and use of the personal pronoun): a multiple-choice test, a cloze test, and ten sentences to translate into English. Make graphs of the number of errors made by the learners on each test. Repeat the tests in a different order for another feature (e.g. Past Tense versus Present Perfect) and examine the results as well. Write an analysis of what this experiment has revealed about the relative difficulty and discriminatory power of the tests, and about the most persistent problems for the students who are learning these features. Send your paper to your tutor.
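As a starting point for the graphs requested in this assignment, the error counts can be tallied and drawn as simple text bars. This is only a sketch with invented figures: the errors data, the summarize function, and the use of the range of errors as a very rough stand-in for discriminatory power are all assumptions made for illustration, not part of the assignment.

```python
# Hypothetical data: number of errors made by each of five learners on each test.
errors = {
    "multiple-choice": [1, 0, 2, 1, 3],
    "cloze": [3, 2, 4, 2, 5],
    "translation": [4, 1, 6, 3, 7],
}


def summarize(errors_by_test):
    """Return total errors and the range (max - min) of errors for each test."""
    summary = {}
    for test, counts in errors_by_test.items():
        summary[test] = {
            "total": sum(counts),                 # crude indicator of difficulty
            "spread": max(counts) - min(counts),  # crude indicator of separation
        }
    return summary


summary = summarize(errors)
for test, stats in summary.items():
    # One '#' per error gives a rough text "graph" of relative difficulty.
    print(f"{test:16} {'#' * stats['total']}  (spread {stats['spread']})")
```

A test on which all learners make about the same number of errors tells you little about who has mastered the feature, which is why the spread is reported next to the total.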

7.9 Answers to SAQs

SAQ1
Your answer depends upon your personal teaching and learning experience.
False. Cultural and discourse competence are very difficult to measure. One reason is that the number of “items” is smaller, but the bigger problem is that attitude and behavior are difficult to assess.

SAQ2
Your answer depends upon your personal teaching and learning experience. You may also read sections 4.2.1.2 and 4.2.1.3
1, 2, 3, 4, 5, 6, 7 – True

SAQ3
If your answer to SAQ 3 is not comparable to the one suggested below, please reread section 7.3.
True

SAQ4
If your answer to SAQ 4 is not comparable to the one suggested below, please reread section 7.4.

True. The probability that a word will be encountered drops off sharply as its frequency decreases. 98% of the words in learned writing are among the six to ten thousand most frequent. The 50,000th most frequent word can be expected to appear only once in about a million words of running text.
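The coverage claim above can be checked on any text you have to hand: count how often each word occurs, then see what share of the running words the most frequent ones account for. The sketch below assumes a very simple tokenizer and a toy sample sentence; the function name coverage is hypothetical.

```python
import re
from collections import Counter


def coverage(text, top_n):
    """Fraction of running words covered by the top_n most frequent word types."""
    # Simplifying assumption: a word is any run of letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(freq for _, freq in counts.most_common(top_n))
    return covered / len(tokens)


sample = "the cat sat on the mat and the dog sat on the rug"
for n in (1, 3, 8):
    print(n, round(coverage(sample, n), 2))
```

On a real corpus the same calculation shows how quickly coverage rises for the first few thousand words and how little each additional rare word contributes.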

7.10 Further Readings

Harrison, Andrew (1983), A Language Teaching Handbook, London: Macmillan, pp 110-118
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press, pp 141-152


Unit 8
NEW TRENDS IN TESTING

8.1 Unit Objectives
8.2 General Trends
8.3 Computer-Based Language Testing
8.4 Alternative Assessment
8.4.1 Techniques
8.4.2 Journals
8.4.3 Conferences
8.4.4 Cooperative test construction
8.5 Portfolios
8.5.1 Characteristics
8.5.2 Assessing Portfolios
8.5.3 Portfolio Content
8.5.4 Useful advice on development of portfolios
8.6 Summary
8.7 Key Concepts
SAA 5
8.8 Answers to SAQs
8.9 Further Readings

8.1 Unit Objectives

This unit tries to identify the main trends in assessing and testing learners of English as a foreign language. By the end of this unit you will:
• be aware of the advantages of using qualitative methods of assessment
• be able to adopt and apply alternative procedures of assessment
• appreciate the value of assessment as a process
• be able to assess learners’ portfolios
• be aware of the advantages of using computer-adaptive testing

8.2 General Trends

Language testing is highly resistant to change. Testing is conservative because it concerns institutional standards, norms and society. In spite of this, there are signs of:
• a closer relation between second language acquisition studies and language testing
• testing providing a very useful link between second language acquisition and Applied Linguistics
• the elaboration of more scientific criteria for language tests
• more attention to validity than to reliability


8.3 Computer- Based Language Testing

Definition: A computer-based language test is a test that is delivered and scored by computer. The computer needs to be able to judge whether or not a particular response is correct.

Developments in computer-based language testing have tended not to keep pace with those in CALL (Computer Assisted Language Learning) because of the difficulty of programming the computer to deal with open-ended input.

In spite of this, the computer plays a significant role in language testing. It provides:
• a user-friendly testing environment for the test candidate
• a variety of options within a test
• a way of recording information that both assesses linguistic performance and helps to identify a candidate’s test-taking strategies

The computer may be used:
• as a testing device, especially for informal classroom tests
• as a tool for research into test-taking and language-learning strategies

Limitations
• the quality of language produced during a direct test of oral production cannot be assessed by the computer
• a computer cannot cope with grammatical analysis above the sentence level or with semantic analysis
• direct testing of written production is limited to items which require relatively short, predictable responses

Advantages
• it can stimulate oral production by using a simulation with a group of learners
• it has the characteristics of speed, memory, patience and flexibility
• there are two ways in which these characteristics can be exploited to allow computers to provide computer-adaptive tests (tests which the computer adapts to the individual):
o learner-adaptive testing implies immediate feedback, a second try at a question if the first answer is wrong, access to a dictionary or glossary, and clues (the number of letters in the word that is the correct answer is given, one or more letters are given in various positions, reference to other information in the text, explanations)
o in learner-adaptive tests (tests adapted to the individual candidate), the testee can select from a list of 9 different item types (multiple choice, gap filling, transformation, correction, insertion, deletion, identification, organization, matching)
• it can be programmed to accept alternative responses and misspellings, and to carry out some syntactic analysis of responses


• Self-assessment tests, in which candidates are asked to say whether they think they know the answer to a question or how well they think they would be able to perform on a test, offer feedback to the teacher/test constructor, reduce learner alienation from the testing process, and provide evidence of progress in the area of self-assessment. The use of self-assessment procedures has an important role in student motivation. The results of self-assessment and of conventional procedures can be calculated, compared, and presented immediately.
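The point made in the list above about programming the computer to accept alternative responses and misspellings can be illustrated with a small sketch. It assumes that a response counts as acceptable when it is within one letter change of any accepted answer; that threshold, and the names edit_distance and is_acceptable, are choices made for this illustration rather than a description of any particular testing system.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions needed."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def is_acceptable(response, accepted, max_distance=1):
    """True if the response is close enough to any of the accepted answers."""
    response = response.strip().lower()
    return any(edit_distance(response, answer) <= max_distance
               for answer in accepted)


accepted_answers = ["apologized", "apologised"]  # alternative spellings
print(is_acceptable("aplogized", accepted_answers))   # one letter missing
print(is_acceptable("applauded", accepted_answers))   # a different word
```

Whether a misspelling should earn full credit depends on what the item is meant to test, so the tolerance is best left as a setting the teacher controls.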

Features that are characteristic of a teaching programme should have a place in a test because:
• it is difficult to distinguish between a teaching activity and a test (it is obvious that learners should learn from both)
• take-away tests are becoming increasingly common
• a candidate’s reference to the help facilities can be recorded (in this way we gather information about strategies and about the state of the candidate’s knowledge)

Other information that can help research:
• the time taken for the test
• the order in which candidates answer questions, which provides information about test-taking strategies and the candidate’s processing, and about the difficulty a candidate is having with different questions as a function of the number of times the candidate considers each question before answering it

Conclusions about computer assessment
• Testees enjoy computer-based tests more than paper-based ones
• Testees consider computer-based tests more useful (immediate feedback and the second-try facility are particularly valued)
• Testees feel more relaxed
• Computer-based tests measure more than language performance
• The results are influenced by past experience of using computers
• Computers are at a point where they are suitable for classroom tests
• Computer-based tests tend to become more flexible and friendlier
• The barriers between teaching and testing are being blurred
• Computer-based tests provide access to other sources of information, e.g. about a candidate’s test performance and the strategies used to achieve it

Three examples of computer-adaptive tests
• Decision point tests: these tests are constructed with items of difficulty limited to the task difficulty at the decision cut-off points
• Step ladder tests: on these tests precalibrated items are clustered at a series of graduated difficulty steps
• Error-controlled tests: these tests are distinguished in that examinee ability is re-estimated using an appropriate ability estimation algorithm after each item is evaluated
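To make the step ladder idea above more concrete, here is a minimal sketch. The text only says that precalibrated items are clustered at graduated difficulty steps; the rule used here (move up one step after a correct answer, down one step after a wrong one) and the names step_ladder, item_bank and answer_fn are assumptions made for illustration.

```python
def step_ladder(item_bank, answer_fn, start_step=0, max_items=5):
    """Run a simple step-ladder test.

    item_bank: list of steps (easiest first), each a list of items.
    answer_fn: function taking an item and returning True if answered correctly.
    Returns the step reached and the history of (step, item, correct) tuples.
    """
    step = start_step
    next_item = [0] * len(item_bank)  # index of the next unused item per step
    history = []
    for _ in range(max_items):
        if next_item[step] >= len(item_bank[step]):
            break  # no unused items left at this step
        item = item_bank[step][next_item[step]]
        next_item[step] += 1
        correct = answer_fn(item)
        history.append((step, item, correct))
        # Assumed rule: climb after a correct answer, descend after an error.
        if correct:
            step = min(step + 1, len(item_bank) - 1)
        else:
            step = max(step - 1, 0)
    return step, history


bank = [["e1", "e2", "e3"], ["m1", "m2", "m3"], ["h1", "h2", "h3"]]


def knows_easy_and_medium(item):
    # Simulated candidate who can answer easy and medium items only.
    return item[0] in "em"


final_step, history = step_ladder(bank, knows_easy_and_medium)
print(final_step, history)
```

An error-controlled test would replace the fixed up-or-down rule with a statistical re-estimate of ability after each item, which this sketch does not attempt.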


SAQ 1
What kind of tests are these questions specific to?
1. This is a test about the use of the Past Tense and Present Perfect in English. Put a cross on the number which shows how well you think you can use these tenses:
• I make few mistakes when I use these tenses.
10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0
• I always make mistakes when I use these tenses.
2. Do you think you can answer this question correctly? Yes / Not sure / No

Write your answers in the space provided above (in no more than 20 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

The curriculum for the twenty-first century emphasizes:
• A shift from content and objectives to skills and processes
• The empowerment of learners to act on their own
• Focus on new knowledge, on the interdependence of knowledge areas, and on the relevance of school knowledge to everyday problems

Implications for the assessment of learning
A norm-referenced approach to assessment is no longer suitable for assessing the quality of work. What matters more is a curriculum context that encourages feedback focused on learning purposes and that values critical, reflective, interactive processes for development and improvement.

Ethical Issues of Critical Language Testing

Taking into account that psychometric traditions in testing are challenged by interpretative, individualized procedures for evaluating ability and that tests are undoubtedly embedded in culture and ideology, test designers have to offer new ways of testing for varying styles, abilities, and intelligences among test takers.

Other challenged convictions
• Standardized tests are not infallible in their predictive validity
• Tests are culture-biased

Points to Ponder
• Tests serve as “gatekeepers” in society.
• Tests are milestones in the journey to success.


Recent developments in classroom testing show that educators’ thinking about testing is changing. These developments are driven by a broader view of the measurement of ability and by the development of more authentic testing rubrics. These changes derive from:
• the research on intelligence by Howard Gardner and Robert Sternberg, who identified more than one type of intelligence (linguistic, logical-mathematical, visual-spatial representation, bodily-kinesthetic thinking, and two forms of personal understanding: interpersonal and intrapersonal); this freed testing from exclusive reliance on timed, discrete-point, analytical tests in measuring language
• the above research conclusions on intelligence, which infused this field with a responsibility of “tapping into whole language skills, learning processes and the ability to negotiate meaning. Our challenge was to test interpersonal, communicative, interactive skill, and in doing so, to place some trust in our subjectivity, our intuition” (Brown: 404)
• as a result, more performance-based testing in typical school subjects, in spite of the fact that it is time-consuming and expensive: open-ended problems, hands-on projects, student portfolios, experiments
• more and more interactive language tests, i.e. tests that assess testees while they actually perform the behavior we want to measure: speaking, requesting, responding, combining listening and speaking, or reading and writing. For example, Swain’s test battery includes paper-and-pencil multiple-choice tests, oral communication skills and written proficiency; the OPI (Oral Proficiency Interview), a widely used interactive proficiency test, is currently in the process of revision.

The current period uses fewer and fewer decontextualized tests in favour of alternative and more authentic means of testing. Brown offers a table that highlights the differences between the traditional and alternative approaches to assessment:

Differences between the traditional and alternative approaches to assessment (after Brown)

Traditional                      Alternative
One-shot standardized exam       Continuous long-term assessment
Timed, multiple-choice format    Untimed, free-response format
Decontextualized test items      Contextualized communicative tasks
Scores suffice for feedback      Formative, interactive feedback
Norm-referenced scores           Criterion-referenced scores
Focus on the “right” answer      Open-ended, creative answers
Summative                        Formative
Oriented to product              Oriented to process
Non-interactive performance      Interactive performance
Fosters extrinsic motivation     Fosters intrinsic motivation

RESEARCH ON INTELLIGENCE


8.4 Alternative Assessment

Alternative assessment includes self-assessment and peer-assessment. Self-assessment has the following advantages: speed, direct involvement of learners, learning autonomy, increased motivation. Disadvantage: subjectivity.

8.4.1 Techniques
• Oral production: use of self-checklists or peer checklists to detect pronunciation or grammar errors
• Listening comprehension: listening to TV or to broadcast tapes and checking comprehension with a partner, asking for help when you do not understand something
• Writing: revising written work on your own or with a peer, proofreading
• Reading: reading passages followed by self-check comprehension questions, reading and checking comprehension with a partner, vocabulary quizzes

Other characteristics
Alternative, authentic assessment:
• is varied and cohesive
• encourages multiple methods for demonstrating learning
• can promote learning opportunities beyond the classroom
• encourages students to develop skills, understanding, and insights relevant to their particular needs and contexts
• makes assessment fairer by reducing the dependence on performance in a single examination as the only determinant of student achievement and by giving individuals the opportunity to demonstrate attainment over time and in a variety of contexts
• promotes complex thinking and problem-solving
• encourages students’ performance of their learning
• engages with issues of equity
• makes assessment more accurate and reflective of an individual’s learning and development by identifying the abilities being examined
• assesses knowledge in terms of its constructive use for further learning rather than viewing it simply as a measure of achievement

8.4.2 Journals

Journals, a form of dialogue between teacher and students, may include language learning logs, student grammatical discussions, reactions to readings, personal feelings, and attitudes. You should:
• Tell the learners how to get started and give them a model
• Make the learners aware of the importance of journals and of their content
• Give directions about the length of each entry
• Collect, read and promptly return the journals
• Make the feedback clear
• Help learners to process your feedback


8.4.3 Conferences

Conferencing implies a one-to-one interaction between teacher and student. The teacher assumes the role of facilitator and guide, an ally of the student. Conferencing develops self-reflective attitudes in your learners. These alternative types of assessment are formative, looking forward to further development.

8.4.4 Cooperative test construction

Ask the testees about the things they have learned that should be in a test. Then ask them to formulate the actual test questions. The teacher makes them aware that the real test will contain some of the questions they have selected. Seen in this way, the cooperative test becomes a way to stimulate review and integration.

SAQ 2
List and explain three advantageous features of a system for the machine construction of tests.
Write your answers in the space provided above (in no more than 30 words) and compare them to those in the “Answers to SAQs” section at the end of the unit.

8.5 Portfolios

The increasing dissatisfaction with traditional, quantitative forms of assessment has led to the development of alternative assessment approaches. The theories behind the use of the portfolio for both assessment and learning purposes are the constructivist learning theories (which see learners as actively making sense of new knowledge and deciding how to integrate it with previously held concepts) and Vygotsky’s notion of the zone of proximal development. The use of portfolios for both assessment and learning purposes provides opportunities for demonstrating learning and for the development of important learning dispositions, processes and strategies.

Point to Ponder
Lev Semyonovich Vygotsky (1896 – 1934), Russian psychologist, born in Orsha. He studied various social sciences at Moscow University, and turned to psychology when aged 28. The last decade of his life, when he was at the Institute of Psychology in Moscow (1924 – 34), was his most productive period. His theory of cognitive development, especially his view of the relationship between language and thinking, has strongly influenced western psychology. He was open to intuition, had an undogmatic approach to experimental methodology, and moved easily between the pure and applied fields. He always emphasized the role of cultural and social factors in the development of cognition. “Thought and Language” is now a classic text in university courses in psycholinguistics.

Portfolio use for assessment parallels the shift from the quantitative tradition of assessment to a more qualitative approach. In the quantitative tradition, the curriculum is viewed as a set of discrete units (listed as decontextualized objectives and possibly including facts, skills, competences and performance indicators). The teacher transmits units of knowledge to learners in a fixed time frame, and the learner attempts to acquire the content transmitted by the teacher. Assessment is norm-referenced (based on multiple-choice or essay marks). Such assessment practices atomize knowledge. Portfolios offer an opportunity to address some of the limitations of quantitative assessment.

The significance of engaging students in the process of developing a portfolio of work is best understood in the context of this conception of learning. Teachers provide feedback to promote learning. In the portfolio process this occurs when the students collect, select and evaluate their own work.

Points to Ponder
• What the child is able to do in collaboration today, he/she will be able to do independently tomorrow. (Vygotsky)
• Many people are increasingly likely to live so-called “portfolio lives”, constantly needing to update their skills and knowledge in order to take advantage of opportunities as they arise. Their skills will need to be transferable. This changing world will thus place much greater emphasis on individuals taking responsibility for reflecting on what they have already experienced, setting future learning goals and preparing plans in order to improve their contribution and their employability. (Report of the Steering Group of the National Records of Achievement Review)
• Whenever Sir Isaac Newton had a particularly thorny problem, he worked on it just before he went to sleep. He said, “I invariably wake up with a solution.”

The notion of a portfolio of work, developed over time and incorporating critical reflection on and self-evaluation of what has been achieved, makes for an assessment system that is more compatible with this view of learning.

8.5.1 Characteristics

• Reflection on the inclusions in the portfolio (assignment, written paper, test)
• Reflection on what has been learned (this provides the opportunity for another level of analysis, i.e. the extent to which intentions and purposes have been achieved)
• The system empowers students to take responsibility for learning

Portfolio use requires a constructivist pedagogy characterized by:
• Opportunities to analyze learning
• Teacher facilitation of learning
• Group and pair work
• Student-teacher dialogue about the student’s learning
• Available support

All students need to acquire skills in self-assessment, continuing learning, self-evaluation and planning of future work. Radical global changes of an economic, social and political nature make these skills necessary, and they imply the changes to assessment, pedagogy and the curriculum that the use of portfolios brings about. All portfolios should contain “pieces of evidence”: the more relevant the evidence, the more useful it is for evaluating the level of achievement.

Developmental phases and uses of portfolios:
1. conceptualization of portfolios to support the learning process
2. construction and development of portfolios to support learning processes
3. grading the portfolios: reliability, standards, summative assessment or holistic assessment

Guidelines for using portfolios: state the purpose, show learners how to get started (for example with a sample portfolio from a previous student), indicate acceptable material, evaluate the portfolios, give feedback, and help learners respond to your feedback.

8.5.2 Assessing Portfolios

• assess in a team
• assess the quality, not the quantity, of the content
• be clear about what you are assessing
• structure your feedback
• encourage creativity
• provide opportunities for self-assessment
• set up an exhibition
• get students to provide a structure that is clearly labeled and numbered for easy reference
• get students to provide carefully structured route maps
• ask students to answer questions such as: After you have completed your portfolio, what do you consider you did especially well, and what would you now do differently?
• self-evaluation is an integral part of portfolio assessment
• portfolios are used to supplement, not replace, traditional assessment procedures
• make the whole portfolio process a collaborative teacher-student effort (the teacher is a consultant to the student)

PORTFOLIO AND THE CONSTRUCTIVIST PEDAGOGY


8.5.3 Portfolio Content

A portfolio contains evidence of a student’s achievement, skills and accomplishments:
• Journals and logs
• Examples of written work
• Videotapes of student performance
• Self-evaluation
• Mind maps and notes
• Charts, graphs
• Questionnaire results
• List of books read/summaries
• Tests and quizzes
• Audiotapes of presentations


8.5.4 Useful advice on development of portfolios (after Ronald L. Pastin)

• Help the parent examine the portfolio (make him/her aware of evidence of progress and of areas needing improvement)
• Ask students to reflect on which items are worth including
• Set a limited number of objectives
• Organize conferences to review students’ portfolios
• Be sure each item is dated (to assess the evolution of progress)
• Develop your own teaching portfolio as a means of facilitating your professional development (videotapes of successful classes, curriculum materials, sample lesson plans, your goals and objectives, workshop classes attended, publications, awards, certificates, professional affiliations, your teaching philosophy, principal’s evaluation, inspections)


SAQ 3
Why are people who do well in intelligence tests usually better learners than those with excellent memories?

Write your answer in 20 words in the space provided and compare it to the one in the “Answers to SAQs” section at the end of the unit.

8.6 Summary

This unit pinpoints the main trends in contemporary assessment and testing, which reveal a move from quantitative to qualitative testing, from paper-based tests (PBT) to computer-based tests, and from traditional testing to alternative testing, including self-assessment. The advantages of CAT (Computer Adaptive Testing) may be described as follows:
• Individual testing time may be reduced
• Frustration and fatigue are minimized
• Boredom is reduced
• Test scores may be provided immediately
• Diagnostic feedback may be given immediately
• Test security may be enhanced
• Record-keeping functions are improved
• Reporting, research and evaluation capabilities are expanded

Portfolio evaluation, one of the alternatives to traditional testing, follows a constructivist approach. Several of its advantages are:
• It promotes cooperation rather than competition
• It enhances professional communication
• It requires no technical knowledge of quantitative evaluation procedures
• Ideas are conserved for future application in other classes

Other characteristics of contemporary trends include:
• Continuous long-term assessment
• Formative tests
• Criterion-referenced scores
• Interaction and motivation
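The reduction in testing time claimed for CAT in the summary above comes from choosing each item in the light of the previous answers, so that the learner is not given many items that are far too easy or far too hard. A minimal sketch of this idea follows. It assumes a simple "one level up after a correct answer, one level down after a wrong one" rule with nine difficulty levels; operational CAT systems use statistical models instead, and the function names and the simulated learner are invented for illustration.

```python
from typing import Callable, List, Tuple


def run_adaptive_test(
    answer_item: Callable[[int], bool],
    n_levels: int = 9,
    n_items: int = 6,
) -> Tuple[int, List[Tuple[int, bool]]]:
    """Administer n_items items, adapting the difficulty after each answer.

    answer_item(level) must return True if the learner answers an item of
    that difficulty level correctly. Returns the final level reached and
    the history of (level, correct) pairs.
    """
    level = (n_levels + 1) // 2  # start in the middle of the scale
    history: List[Tuple[int, bool]] = []
    for _ in range(n_items):
        correct = answer_item(level)
        history.append((level, correct))
        if correct:
            level = min(n_levels, level + 1)  # harder item next
        else:
            level = max(1, level - 1)  # easier item next
    return level, history


def simulated_learner(level: int) -> bool:
    """Hypothetical learner who copes with items up to level 7 only."""
    return level <= 7


final_level, history = run_adaptive_test(simulated_learner)
```

With this simulated learner the test climbs quickly from level 5 and then hovers around levels 7 and 8, which is where the learner's ability lies; a fixed paper-based test would have spent items on every level.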

8.7 Key Concepts

• Alternative assessment
• Bodily-kinaesthetic intelligence
• Constructivist approach
• Computer
• Computer adaptive testing
• Computer administered test
• Computer assisted instruction (CAI)
• Computer assisted language learning (CALL)
• Computer based instruction
• Computer literacy
• Computational linguistics
• Error control test
• Interpersonal intelligence
• Intrapersonal intelligence
• Journal
• Linguistic intelligence
• Log
• Logical-mathematical intelligence
• Microcomputer
• Mainframe computer
• Musical intelligence
• Portfolio evaluation
• Spatial intelligence
• Zone of proximal development

SAA No. 5
This paper aims to help you identify, clarify and develop an informed, comprehensive personal philosophy of grading that is consistent with your philosophy of teaching and evaluation.

Examine the following items and circle the numbers of all items that should be included in your set of criteria for determining a final mark. Write as a percentage the weight you would assign to each circled item (the percentages must total 100%).
1. Language performance of the learner (based on tests, quizzes, other objective tests) ..........
2. Your informal observation of the learner's language ......
3. Oral participation in class activities ..............
4. Attitudes and behaviour: degree of cooperation, politeness, disruption in the classroom ..............
5. Effort ...........
6. Motivation .................
7. Punctuality and attendance .............
8. Self-assessment ...............

Write an essay about your philosophy of grading, taking into account your answers to the questionnaire above. Do not write more than three pages. Consider the following questions: Do you consider yourself consistently impeccable in your objectivity? Can you capture the totality of your student's competence through formal tests alone? What is the value of alternatives in assessment?

Example: I base my final marks/grades on student language performance because I think grades should represent .................. Grades should not be contaminated by ........................... I discourage the inclusion of ..............................

Your essay will be scored according to the following grading scale and explanations:
50% - clarity and strength of all main ideas and supporting ideas, argument and logic;
10% - grammatical and mechanical errors, word choices and expressions;
20% - cohesive devices within and across paragraphs;
10% - documentation, citation of sources, evidence, and other support;
10% - adequacy and strength of the conclusion.

Do not forget to send your essay to your tutor in due time.
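Both the questionnaire and the essay grading scale above rest on the same arithmetic: each criterion receives a weight, the weights total 100%, and the final mark is the weighted sum of the ratings. The sketch below illustrates this with the essay scale. The criterion names, the 0-100 rating scale and the sample ratings are assumptions made for the example, not part of the course material.

```python
from typing import Dict

# Weights taken from the essay grading scale above (percent of the final score).
ESSAY_WEIGHTS: Dict[str, float] = {
    "ideas_argument_logic": 50,
    "grammar_mechanics_word_choice": 10,
    "cohesive_devices": 20,
    "documentation_and_support": 10,
    "conclusion": 10,
}


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion ratings (each on a 0-100 scale) into one score.

    Raises ValueError if the weights do not total 100 or a rating is missing.
    """
    total_weight = sum(weights.values())
    if abs(total_weight - 100) > 1e-9:
        raise ValueError(f"weights must total 100, got {total_weight}")
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(ratings[name] * weight for name, weight in weights.items()) / 100


# Hypothetical ratings for one essay, invented for illustration only.
example_ratings = {
    "ideas_argument_logic": 80,
    "grammar_mechanics_word_choice": 70,
    "cohesive_devices": 60,
    "documentation_and_support": 90,
    "conclusion": 100,
}
example_score = weighted_score(example_ratings, ESSAY_WEIGHTS)
```

The same function can be reused for the final-mark questionnaire by passing in your own circled items and weights; the check on the total guards against a set of weights that does not add up to 100%.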

8.8 Answers to SAQs

SAQ 1
If your answer to SAQ 1 is not comparable to the one suggested below, please reread section 4.3.

These are self-assessment questions.

SAQ 2
If your answer to SAQ 2 is not comparable to the one suggested below, please reread section 8.3.

• Difficulty and content range could be specified in advance
• By consideration of the number of items employed, test reliability could be estimated in advance
• Since the items are assembled by machine from a large item bank, security is maintained

SAQ 3
If your answer to SAQ 3 is not comparable to the one suggested below, please reread section 8.3 (the paragraph on intelligence).

IQ tests measure the abilities of pattern recognition, non-verbal and verbal reasoning, and problem solving. People who do well in them tend to be better learners because learning requires creative “meaning making”, not passive remembering.
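The three features listed in the answer to SAQ 2 can be made concrete with a short sketch. It assumes a hypothetical item bank in which every item carries a topic and a difficulty index, draws a random test form within a specified content and difficulty range, and estimates reliability from the number of items with the Spearman-Brown formula. The bank, the topic names and the mean inter-item correlation of 0.2 are invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Item:
    item_id: str
    topic: str
    difficulty: float  # proportion of learners expected to answer correctly (0-1)


def assemble_test(
    bank: List[Item],
    topics: Set[str],
    min_difficulty: float,
    max_difficulty: float,
    n_items: int,
    seed: Optional[int] = None,
) -> List[Item]:
    """Draw n_items at random from the items matching the specification."""
    eligible = [
        item
        for item in bank
        if item.topic in topics and min_difficulty <= item.difficulty <= max_difficulty
    ]
    if len(eligible) < n_items:
        raise ValueError(f"only {len(eligible)} eligible items, {n_items} requested")
    return random.Random(seed).sample(eligible, n_items)


def predicted_reliability(n_items: int, mean_inter_item_r: float) -> float:
    """Spearman-Brown estimate of the reliability of an n-item test."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


# Hypothetical bank of 40 items on two topics, difficulty from 0.30 to 0.69.
bank = [
    Item(f"Q{i}", "past_simple" if i % 2 == 0 else "articles", round(0.30 + 0.01 * i, 2))
    for i in range(40)
]
test_form = assemble_test(bank, {"past_simple"}, 0.40, 0.60, n_items=10, seed=1)
reliability = predicted_reliability(len(test_form), mean_inter_item_r=0.2)
```

Because each form is a fresh random draw from the eligible items, two learners are unlikely to receive exactly the same test, which is the sense in which security is maintained. The reliability estimate of roughly 0.71 for ten items is available before any learner has sat the test.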

8.9 Further Readings

Brown, H. Douglas (1994), Teaching by Principles, Englewood Cliffs: Prentice Hall Regents, pp 373-395
Harrison, Andrew (1983), A Language Teaching Handbook, London: Macmillan, pp 118-132


Bibliography

Atkinson, Rita L., Richard C. Atkinson, Ernest R. Hilgard (1983), Introduction to Psychology, Eighth Edition, San Diego: Harcourt Brace Jovanovich
Brown, H. Douglas (1994), Teaching by Principles. An Interactive Approach to Language Pedagogy, Englewood Cliffs: Prentice Hall Regents
Brown, H. Douglas (1994), Language Learning and Teaching, Third Edition, Englewood Cliffs: Prentice Hall Regents
Brown, H. Douglas (2004), Language Assessment. Principles and Classroom Practices, San Francisco State University, Longman
Harrison, Andrew (1983), A Language Teaching Handbook, London: Macmillan
Heaton, Brian (1991), Language Testing, London: Modern English Publications
Hedge, Tricia (2000), Teaching and Learning in the Language Classroom, Oxford: Oxford University Press
Hughes, Arthur (1991), Testing for Language Teachers, Cambridge: Cambridge University Press
Jones, Leo (1991), Cambridge Advanced English, Cambridge: Cambridge University Press
O’Grady, William and Michael Dobrovolsky (1989), Contemporary Linguistics, New York: St. Martin’s Press
Patton, Michael Quinn (1982), Practical Evaluation, London: Sage Publications, Newbury Park
Pavelcu, Vasile (1968), Principii de docimologie, Bucureşti: EDP
Seaton, Brian (1982), A Handbook of English Language Teaching Terms and Practice, London: Macmillan
Vogler, Jean (2000), Evaluarea în învăţământul preuniversitar, translated by Cătălina Gârba and Ionela Băluţă, Iaşi: Polirom