
Assessing listening skills


In many ways, the consideration of testing and assessing listening ability parallels that of assessing reading.  Both are receptive skills and both can be broken down in similar ways.  For that reason, you should read this guide after or in conjunction with the guide to assessing reading ability.

The essential difference between the skills is that the listener cannot move backwards and forwards through the text at will but must listen for the data in the order and at the speed the speaker chooses to deliver them.

In common with the assessment of reading skills, that of listening skills is, perforce, indirect.  When someone speaks or writes, there is a discernible and assessable product.  Merely watching people listen often tells us little or nothing about the level of comprehension they are achieving or the skills they are deploying.  This accounts for the fact that both listening and speaking skills are often assessed simultaneously.  In real life, listening is rarely practised in isolation and the listener's response to what is heard is a reliable way to assess how much has been comprehended.

Rarely, however, does not mean never and there are a number of times when listening is an isolated process.  For example, listening to the radio or TV, a lecture or a station announcement are all tasks which allow no interruption or feedback from the listener to gain clarification or ask questions.  One can, of course, allow the listener access to a recording which he or she can replay as frequently as is needed to understand a text but, as this cannot be said to represent a common real-life task, we'll exclude it from what follows.

We can, of course, test some of the underlying skills discretely: for example, the ability to discriminate between phonemes, to identify word boundaries, to recognise intonation patterns and so on.

However, before we do any of that, we need to define what listening skills we want to test and why.  For more on the subskills of listening, see the guide to understanding listening skills.  The following is premised on the assumption that you are familiar with the content of that guide.



The aims of the teaching programme

All assessment starts (or should start) from a consideration of the aims of instruction.  With listening skills, as with reading skills, however, it is notoriously difficult to identify specific skills which are linked to specific purposes.  An argument can almost always be made that the following are key macro listening skills whatever the setting, whatever the purpose and whatever the topic and text type:

  1. Listening to locate specific data is required:
    1. for academic purposes to locate the part of a lecture or address or book which focuses on what needs to be learned
    2. by general listeners to locate items of interest in, e.g., announcements and news programmes
    3. in the workplace to make the identification and absorption of heard data efficient and focused
  2. Listening to obtain the gist is needed:
    1. by busy people in their occupations so they can judge whether something they are hearing is relevant or ignorable in part or whole
    2. by students to judge whether a comment or section of a lecture is relevant to their studies and current concerns
    3. by general listeners who simply want to get the gist of a text and don't need detailed understanding
  3. Following directions and instructions:
    1. by general listeners needing to know what to do or where to go in response to an enquiry which may be as simple as Where's the toilet? or much more complicated
    2. by students to find what to do, what to read and when to submit work
    3. by people in the workplace to allow them to follow an instruction and organise their working time

Underlying these three macro skills are a number of micro listening skills without which few texts can be properly understood.  These will include, for example:

  1. Recognising the phonemes of English, especially those which are full phonemes in English but allophones in the learners' first language(s)
  2. Identifying lexemes and word boundaries
  3. Using context and co-text to infer meaning (including visual information)
  4. Understanding intonation and recognising attitude
  5. Recognising the communicative functions of utterances: questions, instructions, responses, initiations etc.


Three basic tenets

  1. We have to use assessment tasks which focus on the kinds of texts the learners will hear in 'the real world'.
  2. We need to design tasks which accurately show the learners' ability.
  3. We need to have a reliable way to score the learners' performance.

These three factors are to do with ensuring reliability and validity.  For more on those two concepts, see the guide to testing, assessment and evaluation.  The rest of this guide assumes basic familiarity with the content of that guide.
Fulfilling all three criteria adequately requires a little care.



Identifying listening text types

The first step is to find out what sorts of texts the learners will need to access and what strategies are appropriate for the purposes of listening.  This is by no means an easy undertaking, especially if the course is one in General English (also known as ENOP [English for No Obvious Purpose]), because it is almost impossible to predict what sorts of texts, and for what purposes, the learners may one day need to access (see below for a generic check-list of skills).
On courses for very specific purposes, it is easier to identify the sorts of texts the learners will encounter and the purposes for which they will listen to them but there is no related set of subskills that we can identify with confidence that will allow them easy access to texts in particular topic areas.  We can, however, look at the types of texts and identify key listening strategies to focus on.  For example:

Situation | Skills needed
ANNOUNCEMENTS | Good monitoring skills to decide on relevance (Is this my flight?) and the ability to extract vital data (gate numbers, platforms etc.)
LECTURES | Listening for signposting (sequences, itemisation, prioritisation, importance etc.)
RADIO AND TV | Gist listening to entertainment to follow a plot; monitoring for relevance in a news broadcast; using visual clues to understand TV programmes
INSTRUCTIONS AND DIRECTIONS | Intensive listening for detailed understanding
MEETINGS AND SEMINARS | Intensive listening to understand detail and locate relevance; on-going monitoring to identify questions and invitations to comment
DIALOGUES | Gist listening to follow a conversation; intensive listening if the listener is a (potential) participant

When we know the kinds of settings in which our learners will need to operate, we can get on with designing tests which assess how well they are able to deploy the skills they will need.



A general listening-skills check-list

It may be the case that you find yourself teaching General English rather than English for Specific Purposes.  If that is so, you need a general-purpose check-list of abilities at various levels against which to assess your learners' ability to listen.  Here's one:
[Listening-skills check-list by level]

The abilities and text types are, of course, cumulative.  At, e.g., B2 level, a learner should be able to handle everything from A1 to B1 already.



Designing tasks

Now we know what sorts of things we want to assess, the text types we are targeting, the purposes of listening, the subskills deployed and so on, we can get on and design some assessment procedures.
There are some generic guidelines for all tasks.
If you have followed the guide to testing, assessment and evaluation (see above), you will know that this is something of a balancing act because there are three main issues to contend with:

  1. Reliability:
    A reliable test is one which will produce the same result if it is administered again (and again).  In other words, it is not affected by the learners' mood, level of tiredness, attitude etc.
    This is a challenging area in the case of assessing listening because the skill requires high levels of concentration, especially if more than gist is to be gleaned.
    We need to be aware that very long listening tasks will result in fatigue and that may overwhelm learners who are otherwise good listeners.  Unless there is a good reason for using a long text (e.g., when preparing people for study in English), a range of short tasks focused as far as possible on micro skills is a better way forward in most circumstances.
    Assessment outcomes are often in written form and the listening text itself often recorded and repeatable so marking can be quite reliable.
  2. Validity:
    Two questions here:
    1. do the texts represent the sorts of texts the learners are likely to encounter?
      For example, if we set out to test someone's ability to understand a lecture, we need to ensure that the topic area is valid for them.
      On the other hand, if we know that our learners will rarely, if ever, encounter the need to listen to extended monologues from native speakers but will need to understand what they are told in service and informational encounters, then we have to match the texts we use for assessment of their abilities.
    2. do we have enough tasks to target all the skills we want to assess?
      For example, if we want to test the ability to use context and co-text to infer meaning, do we have a task or tasks focused explicitly and discretely on that skill?  If we want to test the ability to monitor a series of announcements for crucial data, do we have a test that requires that skill?
  3. Practicality:
    Against the two main factors, we have to balance practicality.
    It may be advisable to set as many different tasks as possible to ensure reliability and to try to measure as many of the subskills as possible in the same assessment procedure to ensure validity but, in the real world, time is often limited and concentration spans are not infinite.
    Practicality applies to learners, to assessors and to the technology available:
    1. for learners, the issue is often one of test fatigue.
      Too many tests over too short a time may result in learners losing commitment to the process.
      On shorter courses, in particular, testing too much can be perceived as a waste of learning time.
    2. for the assessors, too many time-consuming tests which need careful assessment and concentration may put an impractical load on time and resources.  Assessors may become tired and unreliable.
    3. The third issue concerns technology.  If we know, for example, that our learners will rarely have to understand audio-only, disembodied text, then providing context and clues through the use of video recordings should be considered.  Even settings which are heavily text laden (such as lectures) are accompanied by gesture, expression and visual data that cannot be excluded from a valid test of the skills.


Examples may help

Say we have a short (150-hour) course for motivated B2-level learners who will need to operate comfortably in an English-speaking culture where they will live and work.  They will need, therefore, to be able to understand a wide and unpredictable range of texts for a similarly wide range of purposes, so we need to focus our assessment on generic, recognisable listening skills.
We have around three hours of the course earmarked for this assessment.
What sorts of items could we include, bearing in mind reliability, validity and practicality?
Evaluate the following ideas against these principles and then read the comments which follow each one.

Get the learners to watch a 20-minute news broadcast and give them a worksheet designed to get them to identify, from a set of six or so, two essential facts about three of the items.


Negatives:
  1. The issue may be one of text validity for individuals.  For one student, familiar with the topic of football or international politics, the task may be made easier than it is for a learner wholly uninterested in either area.  Ensuring a wide enough range of topics to engage everyone is challenging.
  2. Here we are focused on two subskills: monitoring and intensive listening.  Many people only want the gist of news reports so the task may not fully match the text type.

On the positive side:

  1. The fact that this is a video recording enhances the test's validity because the subjects can use visual clues to infer meaning even if they miss detail in the audio track.
  2. Imaginative use of variations in question setting can make the assessment more valid and reliable and the tasks can be graded for level so that even if the material is well above what the learners would normally understand, careful question-setting can ameliorate the problem.

Get the students to listen to a range of four short texts, each targeting a different listening skill:

  a. Three station announcements, only one of which refers to a train to a particular destination stating a platform number and time.  The learners' task is to identify it and record the data.
  b. An audio text giving directions to a particular place.  The listeners have a map on which to record the destination and route.
  c. An audio-only recording of a dialogue of a heated disagreement.  The task is simply to identify the nub of the argument from a set of five alternatives.
  d. A video recording of the final ten minutes of a lecture in which the speaker sums up the four key points she has made.  The task is to make a note of each one.


Negatively:
  1. Task b. is audio only and arguably unrealistic.  Asking someone for directions is often a face-to-face encounter and in that case, the informant is likely to point to the place on the map and show the route.
  2. Not everyone is very good at map-reading.
  3. Task c. has a similar fault.  Unless the learners actually need to identify topic from a radio play or soap opera, it is hard to see what real-life encounter it parallels.
  4. Task d. is only relevant to learners with specific listening skills needs.
  5. Care has to be taken not to penalise the subject for poor writing skills in task d.

On the positive side:

  1. The tasks accurately target real subskills.
  2. There is a good range of skills being tested.
  3. The answers are in a permanent form and can be reviewed and re-marked if need be.
  4. Two tasks employ video recordings and that may contribute to validity.

Give the learners either:
  1. two listening texts on the same subject with speakers giving opposing points of view
  2. two spoken accounts of the same event containing three differences between them

Ask the students to summarise, in writing, the similarities and differences between the two texts.

Negatively:
  1. This requires a written response so favours learners with good writing skills.
  2. It's quite time-consuming to prepare.
  3. It is limited in terms of topic area and not easily altered to suit ESP concerns.
  4. Care has to be taken not to penalise the subject for poor writing skills or low grammatical accuracy.

On the positive side:

  1. Comparing and contrasting points of view is a key skill.
  2. A written response can be carefully and reliably marked.
  3. Carefully designed with needs in mind, it can be a very valid test of the ability to understand fully what one hears.

Designing anything in life involves striking a balance between competing priorities.

The kind of evaluative thinking that you have used here can be applied to any kind of assessment procedure, regardless of topic, level and task types.



Other listening-skill assessment task types

It may be that your circumstances allow for very simple listening tasks such as those requiring the learners to respond spontaneously to a set of prepared initiations.  This kind of test can be done in the classroom or a language laboratory or even on the telephone.
Those are all legitimate tasks providing the task type and content suits the purposes of assessment.
There are other ways.
No list can be complete, but here are some further ideas for ways to set listening tasks for assessment purposes.  The content of any task will, of course, depend on all the factors discussed so far.

  1. Monitoring tasks
    The station-announcement task above is an example of this kind of procedure.  Longer texts can be used as well, asking the learners, for example, to identify the linguistic signal a speaker uses to signpost a summary of key points.  Such tasks can be graded even if the text is ostensibly beyond the learners' level.  Just locating a gate number or a name in an otherwise complex and indistinct recording is a good test of the ability to monitor and ignore the unnecessary.
  2. Compare and contrast tasks
    See above for an example of this task type.  Two similar but distinct events can be described in speech for listeners to identify key words (e.g., roadside vs. harbour).
  3. Matching tasks
    Getting people to match a short audio description to a picture (or series of similar pictures where only one represents the content of the text) is a good test of detailed understanding.
  4. Multiple-choice tests
    These tests can be carefully targeted on particular items in the text to test the ability to listen for detail, infer likely meaning of lexemes and understand tense relationships and so on.  They can also be targeted at the ability to listen for gist and identify key words and phrases.
    The great disadvantage in terms of listening skills assessment is that the learners need to hold all the alternatives in their heads while they are also being required to focus on the text itself.  Alternatives need to be kept short if cognitive overload is to be avoided.
  5. Directions and instructions
    In these tests, learners may be required to listen and follow instructions.  Such tasks, because of their artificiality, have limited uses but they do test intensive listening skills.  Popular topics are origami and following directions to locate something.  They can be motivating and intriguing tests.
  6. Labelling tasks
    In these tasks, the learners are given a diagram of something fairly complicated and asked to match the descriptions of various labels (A, B, C ...) to the parts of the diagram that the listening text refers to.  This is an important academic skill for some learners but of limited utility in other settings.
  7. Note-taking tasks
    In these tasks, the usual procedure is to require learners to take notes as they listen and then, when the address or lecture is over, they are presented with questions to answer on what they have heard, using their notes to respond.  Providing there is a level playing field (i.e., no learners can answer the questions without reference to their notes simply because they are familiar with the topic), this can be a valid and reliable test of the skill.
  8. Dictation
    Dictation wanders in and out of fashion but is still seen as a reliable, if not very authentic, way of testing listening ability.  The text has to be carefully chosen to be relevant.  The problem with this sort of test is that it isn't always clear what is being tested because a good deal depends on the learners' ability to deploy grammatical knowledge and logic to infer what the text should be.
    It is a fairly flexible procedure because we can require learners to start with a blank piece of paper or have them fill gaps in a text.  The latter can be quite finely targeted.


Measuring outcomes

Unlike writing and speaking tests, which can be marked holistically by impression, listening tests are normally marked analytically.
This involves breaking down the tasks and being specific about the criteria you are using to judge success.  Any amount of weighting can be applied to whichever of the micro skills you judge to be most important.
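By way of illustration only (the micro skills, weights and marks here are invented for this example), a weighted analytic score is simply the sum of each micro-skill mark multiplied by its weight.  If, say, monitoring and intensive listening are each weighted at 40% and inferring meaning from context at 20%, and a learner scores 7, 6 and 8 out of 10 respectively, then:

weighted score = (0.4 × 7) + (0.4 × 6) + (0.2 × 8) = 2.8 + 2.4 + 1.6 = 6.8 out of 10

Adjusting the weights (so long as they still sum to 1) shifts the final mark towards whichever micro skills you judge most important for your learners.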
Normally, the results of a listening test are permanent in some way (short answers, multiple choice responses and so on).  Even the success of a Directions / Instructions task (above) can be objectively marked.  This means that marking can be objective (and the test is reliable, to that degree) but, unless the test items target recognisable and definable micro skills, validity is always problematic.


The summary

[Summary graphic]

Go to the in-service skills index