Delta Module Three ELT specialism: testing and assessment
This guide assumes that you have already followed the guide to testing, assessment and evaluation and/or that you know about testing formats and purposes.
Which test, when?
Testing happens more than once for the Module Three course:
- It forms an integral part of the needs analysis (diagnostic assessment), but the test itself must come after the rest of the needs analysis or you won't be able to target it properly.
- It is central to the evaluation of the course itself both during the course (formative assessment) and at the end (summative assessment).
That means that Part 2 (the Needs Analysis) and Part 4 (the Assessment) are both impacted by the ways you test and your understanding of the guiding principles. That's nearly half the whole of Module Three. There are four areas:
- As part of, but not instead of, the needs analysis process you need a test which performs a dual function: it is both a diagnostic and a proficiency test (a sketch of turning its results into a learner profile follows this list).
  - As a proficiency test, it will tell you how well people have already mastered the language and skills they need before the course begins.
  - As a diagnostic test, it will tell you where their strengths and weaknesses lie, to inform the course design.
- During and at the end of the course, part of your evaluation of its success must involve tests to see how well the learning objectives are being or have been addressed. These are achievement tests.
- Additionally, you will want to include some type of formative assessment and evaluation while the course progresses. That, too, will be a form of achievement testing and may also include elements of diagnostic testing to help you plan where to go next and know whether you need to tweak the course programme to take account of the learning process.
- Finally, at the end of the course, you need to evaluate its success and comment on its suitability for future use.
The principles set out below apply to all of these.
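To make the dual diagnostic / proficiency function concrete, here is a minimal sketch in Python of turning raw section scores from a pre-course test into a strengths-and-weaknesses profile. It is purely illustrative: the section names, scores and the 60% cut-off are invented assumptions, not part of any Module Three requirement.

# raw marks per test section: {learner: {section: (score, max_score)}}
results = {
    "Learner A": {"listening": (14, 20), "reading": (16, 20),
                  "writing": (9, 20), "speaking": (12, 20)},
    "Learner B": {"listening": (18, 20), "reading": (11, 20),
                  "writing": (13, 20), "speaking": (15, 20)},
}

THRESHOLD = 0.6  # assumed cut-off below which a section counts as a weakness

for learner, sections in results.items():
    # proficiency: how well each area is already mastered, as a proportion
    profile = {name: score / maximum for name, (score, maximum) in sections.items()}
    # diagnosis: which areas fall below the (assumed) threshold
    weaknesses = [name for name, pct in profile.items() if pct < THRESHOLD]
    print(learner, {k: f"{v:.0%}" for k, v in profile.items()},
          "-> target:", ", ".join(weaknesses) or "none")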
The three essentials to discuss
Three fundamental characteristics of a test were established in the general guide to this area. They are:
- Reliability
  A reliable test is one which will produce the same result if it is administered again.
- Validity
  - does the test measure what we say it measures? (construct validity)
  - does the test contain a relevant and representative sample of what it is testing? (content validity)
- Practicality
  Is the test deliverable in practice?
Which of these is the most important for the purposes of Module Three?
You are dealing with a defined group of people (possibly quite a small group, or even a single learner) who will probably take each of your tests only once.
Reliability is not, therefore, a central issue.
You are not constructing a public examination or an institution-wide
repeatable test. You are targeting only what you need to know
about this group for the purposes of planning a course
for them. You will need to make this clear in
your discussion if only to reassure the reader-marker that you
understand the principle involved.
It is validity and
practicality that we are mostly looking for.
Reliability will become an issue if you plan to repeat the course with
the same end-of-course achievement test.
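Reliability can be quantified. One standard estimate is test-retest correlation: the same test takers sit the test twice and the two sets of scores are correlated. A minimal sketch in Python, with invented scores (the closer r is to 1, the more stable the test):

from statistics import correlation  # available from Python 3.10

first_sitting  = [12, 15, 9, 18, 14, 11]   # each taker's score, first administration
second_sitting = [13, 14, 10, 17, 15, 10]  # the same takers, second administration

r = correlation(first_sitting, second_sitting)
print(f"test-retest reliability estimate: r = {r:.2f}")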
Issues of validity
You will recall that there are five types of validity to consider; think about how each applies to your Module Three focus.
[Graphic: the five types of validity, with comments relating each to a Module Three focus.]
For the diagnostic test, all the above apply but for an end-of-course
achievement test, predictive validity may be less crucial.
Construct validity and content validity will still be important.
You have to test what has been taught / learned and you have to be able
to describe what you are testing.
Issues of practicality and reliability
More than likely, you will be conducting the diagnostic / proficiency test at a distance but the achievement test will be a face-to-face event. There are implications:
- You need to make sure that the test is reliable if different people take it at different times. That means clear instructions to the subjects and/or colleagues elsewhere.
- You need to consider the group's nature:
- If it's a large group, use a test format which allows for as much objective marking as possible. The larger the group and the larger the number of markers, the less reliable subjective marking becomes (a sketch of checking agreement between markers follows this list).
- If it's a small group or a single student, you can use more subjective marking because reliability is less of an issue.
- There may be institutional constraints concerning the amount of time which can be devoted to this both by the group and by the markers and course designer (you). If there are, discuss them.
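To illustrate the point about subjective marking across several markers, here is a minimal sketch of checking inter-rater agreement with Cohen's kappa, a standard chance-corrected agreement statistic. The band labels and ratings are invented for illustration.

from collections import Counter

marker_1 = ["pass", "merit", "fail", "pass", "merit", "pass", "fail", "pass"]
marker_2 = ["pass", "pass",  "fail", "pass", "merit", "merit", "fail", "pass"]

n = len(marker_1)
# raw proportion of scripts on which the two markers agree
observed = sum(a == b for a, b in zip(marker_1, marker_2)) / n

# agreement expected by chance, from each marker's marginal band frequencies
c1, c2 = Counter(marker_1), Counter(marker_2)
expected = sum(c1[band] * c2[band] for band in c1) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement {observed:.0%}, kappa = {kappa:.2f}")

Values of kappa near 1 indicate the markers are applying the criteria consistently; values near 0 mean their agreement is little better than chance.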
Impact: the fourth factor
This is a fourth factor you may like to consider, especially
regarding the end-of-course achievement test.
Impact refers to the effect the process of testing has on the test takers and on the course itself. You don't need to discuss it when referring to the diagnostic / proficiency, pre-course test, but it may be an issue when you come to discuss the achievement test, because any in-course and end-of-course testing you do can affect the course itself:
- the learners must perceive the tests as a fair assessment of their progress (face validity)
- the teachers may be influenced by the nature of the tests they know will be done (washback / backwash)
- the learners may become fatigued and demotivated by too much testing
- testing can be a time-consuming process and time may be at a premium
Designing the test
When you get down to designing the actual test items, much will naturally depend on what you want to test. Testing skills usually requires a different approach from testing language items such as structure and lexis. Think carefully before you plunge in; here are some ways of doing it.
Test-item types
Think about the advantages and disadvantages of each before you read on.
[Graphic: common test-item types, with their advantages and disadvantages.]
You will need to discuss the (dis)advantages in these sections of your Module Three report. This discussion will be brief when it is inserted into the Needs Analysis section because you are only concerned with diagnostic / proficiency testing here and you only have 900 or so words to devote to the whole section.
Response types
The next thing to consider is how you want the subjects to respond to the items. On this will depend how long the test takes, how much data you can gather and how subjective, or otherwise, the marking is.
[Graphic: response types, with their advantages and disadvantages.]
All tests are compromises so there are pros and cons to discuss in the report:
- Increasing reliability by making marking as objective as possible and using discrete items with limited response types will reduce validity by disallowing direct testing of the target skill.
- Ensuring high levels of content validity may mean employing subjective marking of free-response items, with a loss of reliability and practicality (the sketch after this list makes the marking trade-off concrete).
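The following minimal sketch illustrates that trade-off: discrete, limited-response items are marked objectively against a key, while free-response items are queued for subjective marking. The item identifiers, the key and the responses are invented for illustration.

answer_key = {"q1": "went", "q2": "b", "q3": "have been living"}  # discrete items

def mark(responses: dict) -> tuple[int, list]:
    """Return (objective score, items needing subjective marking)."""
    score, for_marker = 0, []
    for item, response in responses.items():
        if item in answer_key:                       # limited-response item
            score += response.strip().lower() == answer_key[item]
        else:                                        # free-response item
            for_marker.append((item, response))
    return score, for_marker

score, queue = mark({"q1": "went", "q2": "c", "q3": "have been living",
                     "q4": "Last summer I travelled to the coast and ..."})
print(score, "objective marks;", len(queue), "item(s) for the marker")

The objective part is fast and perfectly reliable; everything that tests the skill directly ends up in the marker's queue, which is where the reliability and practicality costs arise.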
Only you can decide, given the priority you place on certain
kinds of data, what the appropriate mix is but you must be clear in
your discussion that you are aware of the compromises you have made.
The questions posed above, about which test types and item designs are appropriate for the data you want to capture, are not easy to answer (and cannot be answered here for your specific module focus). What must be clear in the
discussions in the Needs Analysis section but particularly in the Course
Assessment section of the information report is that you have
thought about it all and based your tests on principles rather than
intuition and practicality alone.
Writing the testing sections for Module Three
In your report, you will have to describe and justify your testing procedures but also discuss them twice: once in the Needs Analysis section and once in the Course Assessment section. In other words, you will embed a discussion inside an information report. That discussion will also include an evaluation of whether the test actually served its function.
In the in-service training section of this site there are guides to assessing the four main language skills and to assessing grammar and vocabulary competence independently. They are:
- The guide to assessing Listening Skills
- The guide to assessing Reading Skills
- The guide to assessing Speaking Skills
- The guide to assessing Writing Skills
- The guide to assessing grammar
- The guide to assessing vocabulary
If you have followed the guide to structuring the whole of Module Three, you will know how this works, but here's the overview:
The content will depend on whether this forms part of the Needs Analysis or part of the Course Assessment section. In either case, you need to include all tests in your appendices.
- For the Needs Analysis, keep it brief and to the point but don't be afraid to refer forward to a deeper discussion of principles which will come later.
- For the Course Assessment section, you need more discussion
and more depth.
Here, too, you must devise ways of canvassing people's views about the course generally. For some discussion of questionnaire formats (which can be used for face-to-face debriefs, too), refer to the relevant section of the guides to Needs Analyses on this site; a minimal tallying sketch follows.
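As one possible format (not a prescribed one), here is a minimal sketch of tallying Likert-scale questionnaire responses so that the Course Assessment discussion can cite simple figures. The statements and scores are invented for illustration.

from statistics import mean

# 1 = strongly disagree ... 5 = strongly agree
responses = {
    "The course met my stated needs":        [4, 5, 3, 4, 5],
    "The amount of testing was about right": [3, 2, 4, 3, 3],
}

for statement, scores in responses.items():
    print(f"{statement}: mean {mean(scores):.1f} (n={len(scores)})")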
Avoid the obvious errors
Examiners' reports contain more or less the same catalogue of
weaknesses year after year so be careful to avoid any of the
following being levelled at your use of tests and evaluation
procedures.
For this section, the most frequently cited problems are:
- a failure to distinguish between formative and summative
assessment
- Make sure you show that you understand the difference so you can be clear about what is being tested and why.
- failing to link assessment procedures to the course objectives
- Make sure you show that all your testing procedures are designed with the course objectives in mind.
- being over-reliant on public examination materials
- This is especially relevant for assignments focused on preparation for examinations. Be aware that simply lifting test items or whole parts of the target examination is rarely a good way to assess the learners. Do not confuse proficiency testing (the examination) with summative and formative assessment on your course.
General references for testing and assessment
You may find some of the following useful. The text by Hughes
is particularly clear and accessible:
Alderson, JC, 2000, Assessing Reading, Cambridge: Cambridge University Press
Cambridge English Language Assessment, 2013, Principles of Good Practice: Quality management and validation in language assessment, available at https://www.cambridgeenglish.org/images/22695-principles-of-good-practice.pdf
Carr, N, 2011, Designing and Analyzing Language Tests: A Hands-on Introduction to Language Testing Theory and Practice, Oxford: Oxford University Press
Douglas, D, 2000, Assessing Languages for Specific Purposes, Cambridge: Cambridge University Press
Fulcher, G, 2010, Practical Language Testing, London: Hodder Education
Harris, M & McCann, P, 1994, Assessment, London: Macmillan Heinemann
Heaton, JB, 1990, Classroom Testing, Harlow: Longman
Hughes, A, 2003, Testing for Language Teachers, Cambridge: Cambridge University Press
Martyniuk, W, 2010, Aligning Tests with the CEFR, Cambridge: Cambridge University Press
McNamara, T, 2000, Language Testing, Oxford: Oxford University Press
Rea-Dickins, P & Germaine, K, 1992, Evaluation, Oxford: Oxford University Press
Underhill, N, 1987, Testing Spoken Language: A Handbook of Oral Testing Techniques, Cambridge: Cambridge University Press