Testing and assessing vocabulary
It is almost a truism that vocabulary is tested directly or indirectly in all tests of a learner's language ability. It is difficult to conceive of any type of test which is lexis free. Even when a test item looks like this, for example:
Select the correct answer:
John couldn't get in because he ____________ his keys at the office.
a) was leaving
b) had left
c) would leave
d) has left
and is presumably designed to test the subject's knowledge of
past-tense forms in English, it requires the test taker to
understand the connection between keys and the ability to
get in somewhere, the meaning of the modal auxiliary verb, the meaning of the
adverb in get in and the logical connection implied by the
word because.
Without that lexical knowledge, it's hard to demonstrate grammatical
knowledge and get the right answer.
So, if vocabulary knowledge is routinely a part of testing all the
other sorts of language ability ...
... why test vocabulary separately?
This is not the place to set out the different purposes that
tests fulfil, whether they are achievement, diagnostic, proficiency
or progress tests. Nor is this the place to discuss the motivating
factors that tests sometimes enhance. We are concerned here
with testing vocabulary in particular, not testing in general.
Guides to general areas of testing and lexis are linked in the list
of related guides at the end.
There are a number of good reasons for testing vocabulary discretely from other skills and abilities.
- Backwash:
Explicit vocabulary testing often results in teachers paying more attention to its teaching and being more consistent and discerning about which items they focus on.
Backwash may also have an effect on the learners. If they know that vocabulary is going to be tested discretely, they may well be motivated to review what they have encountered and consigned to notebooks, probably in no particular order. They may even be persuaded to revisit and reorganise their vocabulary notebooks.
- As a measure of overall ability:
Vocabulary knowledge has been shown to be a good indicator of a learner's overall ability in a language so, for diagnostic and placement purposes, vocabulary testing is a useful tool.
- Face validity:
Some learners make very great efforts to acquire vocabulary because they recognise, quite rightly, that although it is difficult to communicate without grammar, it is impossible without words. If we do not test vocabulary in an identifiably discrete way, learners may not feel that their abilities are being fairly assessed.
- Depth vs. breadth:
Testing vocabulary incidentally, in a mix of other test types, may give us some measure of the breadth of learners' lexical knowledge (i.e., the size of their lexicons in crude terms) but is unlikely to provide anything like the precision we require if we want to measure the depth of their knowledge of lexis. This means testing vocabulary separately so we can get some estimation of how well items are known, not just how many are recognised.
- Learning is remembering:
Vocabulary learning is not subject to a rule-based approach in the same way that learning grammar rules and applying them can be. There are distinct patterns, of course, such as collocational aspects, affixation, multi-word verbs, synonymy, homonymy and so on but, essentially, learning vocabulary means remembering vocabulary, and testing it is a strong motivating factor in encouraging learners to review and recycle what they have encountered.
- Revision and review:
Vocabulary is an area where it has been shown that multiple exposures to lexemes in context are required before items can be said to have been acquired.
Vocabulary testing provides an opportunity, when giving feedback, to review, recycle and extend learners' knowledge.
- Spacing:
It has been shown that it is better to space out vocabulary learning and recycling rather than concentrate it in blocks of intense effort. In the trade, this is known as distributed practice. Such practice, it is argued, allows short- and long-term memories to integrate.
Testing at regular intervals allows the teacher to space out the learning and recycling at gradually increasing intervals.
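As a rough illustration of what "gradually increasing intervals" might look like in a course plan, here is a minimal sketch. The starting gap of one day and the doubling factor are assumptions made for the example only; nothing above prescribes particular intervals, and the function name is hypothetical.

```python
from datetime import date, timedelta

def review_dates(first_taught, first_gap_days=1, factor=2, reviews=5):
    """Return dates for short vocabulary tests at gradually increasing intervals.

    first_gap_days and factor are illustrative assumptions, not recommended values.
    """
    dates = []
    gap = first_gap_days
    current = first_taught
    for _ in range(reviews):
        current = current + timedelta(days=gap)
        dates.append(current)
        gap *= factor  # each gap is longer than the one before
    return dates

schedule = review_dates(date(2024, 1, 1))
for d in schedule:
    print(d.isoformat())
```

With these assumed settings, the gaps between tests are 1, 2, 4, 8 and 16 days. A teacher would adjust the numbers to fit the length of the course.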
What to test: depth vs. breadth
Hughes asserts that for the purposes of a placement test, i.e., "in essence a proficiency test" (Hughes, 1989:147):
All we would be looking for is some general
indication of the adequacy of the student's vocabulary.
Ibid.
If that were all there was to it, we would simply need to focus our
test on the vocabulary items we consider most frequent and useful
for our learners, perhaps drawing on something like the General
Service List or an academic word list, and design a test to see
if our learners can accurately understand (by some kind of matching
task) and use (through a form of gap-fill testing) the items we have
targeted.
That will give us a rough-and-ready indication of the breadth
of their passive and active vocabulary.
Another possibility, of course, lies in asking how well learners
know vocabulary items, not how many they know. This will mean
focusing on phenomena such as pronunciation, collocation,
colligation, word formation (morphology), word grammar
(transitivity, countability etc.) and perhaps some other factors
concerning hyponymy, synonymy, simile, metaphor, style, register and
idiomaticity. This is what is meant by focusing on depth of
understanding as well as on breadth of knowledge.
For this to work, we need to be a little more imaginative in how
we construct test items, as we shall see.
What to test: targeting the test
What you test is dependent on why you test, i.e., what the test is designed to tell you.
- Achievement and progress tests
will focus, if they are to be fair, only on the items taught or encountered on the course.
They are either (or both):
  - Formative and frequently carried out to identify what needs to be recycled and reviewed.
  - Summative and carried out at the end of a course to see how well the items have been acquired as a way of evaluating the success of the programme.
- Proficiency, placement and diagnostic tests
will focus on getting an estimate of the size and depth of learners' knowledge and will depend, usually, on some kind of sampling.
If, for example, we have identified 2000 words that a learner at A2 level should know, a test of 100 randomly selected words from the list will represent a sample of 5%, which is actually rather good statistically, although a sample of 200 words would, of course, be more reliable and twice as time consuming.
When learners are at C1 level, however, they are expected to know around 4000-5000 words and random sampling becomes much more difficult because, to attain a 10% sample rate, we would need a test of at least 400 items, which would take over 3 hours to complete if one allows 30 seconds per item.
That is impractical in most settings and explains why vocabulary testing in public examinations is often integrated into testing other aspects of language knowledge and ability.
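The arithmetic behind these figures can be checked with a short sketch. The function name and the use of whole-number percentages are choices made for this example; the 30 seconds per item is the allowance mentioned above.

```python
def sampling_plan(list_size, sample_percent, seconds_per_item=30):
    """Return (number of items, minutes needed) for a sampled vocabulary test."""
    items = list_size * sample_percent // 100
    minutes = items * seconds_per_item / 60
    return items, minutes

# Word-list sizes and sample rates taken from the discussion above.
for level, size, percent in [("A2", 2000, 5), ("C1", 4000, 10), ("C1", 5000, 10)]:
    items, minutes = sampling_plan(size, percent)
    print(f"{level}: {items} items, about {minutes:.0f} minutes")
```

For the A2 list this gives 100 items and 50 minutes; for a 4000-word C1 list it gives 400 items and 200 minutes, which is where the figure of over 3 hours comes from.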
Measuring breadth: vocabulary size
This is the attempt to discover how wide the test taker's vocabulary is in terms of understanding and using lexemes. It is not a particularly sophisticated measure of language competence but there is evidence that breadth of vocabulary is a good indicator of general language proficiency. However, there are provisos:
- Cross-language facilitation and interference:
This is not an issue when all the learners share a common first language and no other apart from the target language.
However, in groups where the learners come from a variety of language backgrounds and/or in which some members of the group may have learned other additional languages, the issue of cross-language influences begins to be felt.
When learning English, for example, it is unlikely that learners from a Romance language background or who have learned a Romance language, no matter what their first language is, will have very much difficulty understanding a word such as consolidation because a word which looks similar and carries the same meaning exists in most of these languages (French, Italian, Spanish, Romanian, Portuguese etc., in all of which the word begins consolida- with only the ending differing slightly if at all from English).
Speakers of Slavic languages will be slightly more challenged but a cognate word exists in most of them which can be identified with a little effort.
However, learners from other language backgrounds, especially non-Indo-European ones, will have no such support from their first languages and will need to have learnt the word from scratch. Even in German, where a similar word exists, a more natural translation might be Vertiefung, which bears no superficial relationship to the English word. In other languages, the form of the word also bears no relationship to the English word at all:
samstæðu (Icelandic)
укрепление (ukrepleniye) (Russian)
sağlamlaştırma (Turkish)
ukuhlanganiswa (Zulu)
vakauttaminen (Finnish)
pagpapatatag (Filipino / Tagalog)
fanamafisana (Malagasy)
etc.
- Register:
Learners who have certain interests and/or professions may find that some items are well known to them which are obscure to people with other backgrounds. For example, a learner, whatever his or her first language, who happens to be a chemist will have little difficulty understanding a word like sulphate (or sulfate if you prefer AmE) which is similar if not the same in a very wide range of languages (but not all). Other learners will be more challenged.
Equally, a learner with a background in banking might well understand the terms direct debit, exchange rate, deposit etc., whatever her or his first language, where other learners will struggle.
A learner who is particularly interested in motor racing will probably be familiar with bend, pit-stop, chicane, chequered flag and a number of other terms which are obscure to those without any specialist knowledge.
The moral is to try, as far as possible, to avoid bias of this sort when selecting items to test and also, where appropriate, to focus only on the items which have been taught, or at least encountered, on the course. That is feasible if one is designing a progress or achievement test, less so in designing placement, diagnostic or proficiency tests, as we saw.
Ways of measuring vocabulary size
The following are not confined to measuring vocabulary size because they can be used to measure other aspects of lexical knowledge (if they are designed somewhat differently). Here are five examples and some comments:
Synonym tests
These are simple to design and administer. For example:
Choose the item which is closest in meaning to fire:
- blaze
- combustion
- ignition
- eruption
This is a test which depends on knowing all the words and being able to match meanings. A more searching test, sometimes, is to choose words and distractors which are close in form but not meaning. For example:
Choose the item which is closest in meaning to search:
- seek
- clench
- reach
- trench
and a test like that can also be used to test whether learners can distinguish between homophones, like this:
Choose the item which is closest in meaning to feet:
- pause
- pores
- pours
- paws
The problem, of course, with test items like these is that there is no context so distractors must be very carefully chosen to eliminate any possibility that, in certain circumstances and with certain meanings, more than one correct answer may exist.
Definition tests
These are also easy to design if you have a learners' dictionary to hand. Examples are:
Which word means extremely frightened?
- afraid
- horrified
- petrified
- scared
Which definition of frown is correct?
- an expression showing anger or disapproval
- a gesture showing dislike
- raising your eyebrows to show surprise
- stretching your mouth to show dislike
There are problems with both of these test types because the first depends on understanding that the words are adjectives not verbs and the second, of course, depends on understanding the words in the definitions such as disapproval, gesture (vs. expression), eyebrows etc.
Gap-fill tests
These can get over the issue of a lack of co-text. For example:
Fill the gap with the correct word from the
list of four:
The computer program __________ much faster processing of
information.
- empowered
- enabled
- let
- qualified
The drawback with this kind of test is that, although the distractors should not in theory contain words the test taker does not know, it is often very difficult to identify distractors which are conceivable (but wrong) rather than wrong (and obviously so). Another issue is that the co-text should also not contain unknown or ambiguous items.
Gap-fill tests in which no alternatives are given may also be a way of measuring productive ability rather than recognition. For example:
Fill the gap with one word only:
Mary lost her key so she __________ mine to get into the flat.
The obvious drawback with this sort of test of vocabulary is that
it is very hard to write a series of items in which only one possible
word is allowable. In the gap in this example, borrowed,
took, stole, appropriated, nicked and a range of other possibilities are
allowable and that complicates marking by introducing an element of
judgement of appropriacy. Would you allow purloined,
for example?
A way around this is to redesign the task like this to give the
first letter and an indication of how many letters the word
contains:
Fill the gap with one word only:
Mary lost her key so she b _ _ _ _ _ _ _ mine to get into the flat.
but that, naturally, makes it easier.
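Both versions of this item can be sketched in a few lines. The set of accepted answers below is simply the list of possibilities mentioned above; the function names and the "refer to marker" label are hypothetical choices for the example, not part of any marking scheme.

```python
# Accepted answers are those listed in the discussion above; extend as needed.
ACCEPTED = {"borrowed", "took", "stole", "appropriated", "nicked"}

def mark_open_gap(answer, accepted=ACCEPTED):
    """Mark an open gap-fill answer; anything unexpected goes to a human marker."""
    cleaned = answer.strip().lower()
    return "correct" if cleaned in accepted else "refer to marker"

def first_letter_hint(word):
    """Show the first letter followed by one blank per remaining letter."""
    return " ".join([word[0]] + ["_"] * (len(word) - 1))

def hinted_item(sentence, target):
    """Replace the first occurrence of the target word with its hint."""
    return sentence.replace(target, first_letter_hint(target), 1)

sentence = "Mary lost her key so she borrowed mine to get into the flat."
print(mark_open_gap("Nicked"))
print(mark_open_gap("purloined"))
print(hinted_item(sentence, "borrowed"))
```

An answer such as purloined is not rejected outright but passed to the marker, which reflects the element of judgement involved in open items. The hinted version removes that judgement at the cost of making the item easier.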
Using pictures
Pictures can elicit productive vocabulary and this is a technique commonly deployed. For example:
Write the correct words for the sports by the pictures:
[picture] _______________
[picture] _______________
[picture] _______________
Unfortunately, this only works for lexical items that can be unambiguously identified from pictures and even then, some items may be representable by more than one word and that complicates marking.
Definition tests
These, too, can be used to measure productive ability, like this:
Fill the gap with one word only:
Electronic devices which are connected to an amplifier and fit over
both ears to play sounds are called:
____________________ .
The drawback is that there are very few words which can be completely unambiguously defined in this way.
Measuring depth: vocabulary use
The first decision concerns the selection of the aspects of a word that you want to test. In the general guide to teaching lexis, the following were identified as what it may be necessary to know in order to 'know' a word:
- what a word means – what it denotes and what it connotes (if appropriate)
- how it is connected to other words which mean similar things (e.g., buy, sell, bargain, discount etc.)
- what words it commonly goes with (collocation) so we know we can't have a high tree but prefer tall as the adjective, for example
- what other meanings it can have (e.g., shop, bank etc. can have different meanings and fall into different word classes)
- how the word changes depending on its grammar (e.g., shop, shops, shopping, shopped etc.)
- what grammar the word uses (e.g., does it take a direct object, an indirect object, both, a preposition, does it have an odd plural or an irregularity? etc.)
- how to pronounce the word.
- what kind of situations the word is used in and who might use it? Is it, for example, typical of a certain register?
Depth of meaning also concerns passive and active vocabulary, of course. Here are thirteen example test items with a commentary.
Word knowledge
Use the word complain in a sentence of a minimum of 8 words. Your sentence must contain a subject and an object.
____________________________________________________________________________________________
Clearly, this sort of test requires subjective marking, although the marker will look only for accuracy in the use of the target word and ignore the rest. It nevertheless tests a wide range of knowledge because the test taker needs to be able to:
- recognise the word class
- understand the meaning of the verb
- know that it is a prepositional verb usually combined with about or of
- use the verb transitively with an appropriate object
For small test samples, this kind of item can be revealing. The test can also be done orally and that will include a check on whether it can be pronounced adequately.
Collocation
Mark with a ✓ or a ✗ which words on the left can be used with the words at the top.
The first one is an example.
| | rain | snow | wind | sunshine |
| heavy | | | | |
| pouring | | | | |
| strong | | | | |
| powerful | | | | |
| drifting | | | | |
| blowing | | | | |
| blazing | | | | |
| swirling | | | | |
| forceful | | | | |
For variety and a little more precision, test takers can also be invited to put a ? by any item they consider doubtful.
Collocation and naturalness
Collocation can also be tested on a scale of naturalness so we could have:
Mark these phrases with a 1, 2 or 3 using a ✓.
1 means it is the most natural
2 means it is possible but unnatural
3 means it is very unlikely or impossible
You can use each number as many times as you like.
| | 1 | 2 | 3 |
| weighty issue | | | |
| heavy issue | | | |
| bulky issue | | | |
| lions groan | | | |
| lions rumble | | | |
| lions roar | | | |
| out of control | | | |
| beyond control | | | |
| on control | | | |
Collocations of many sorts can be tested this way because there is a cline from wholly unnatural through slightly unnatural to fully natural.
Formality
Formality sensitivity can be tested in the same way:
Style:
Mark these sentences with a 1, 2 or 3 using a ✓.
1 means it is formal
2 means it is neutral
3 means it is informal
You can use each number as many times as you like.
| | 1 | 2 | 3 |
| please pass the salt | | | |
| give me the salt | | | |
| would you hand me the salt, please | | | |
| they tend to be annoying | | | |
| they are a pain | | | |
| they are irritating | | | |
| I'm averse to swimming | | | |
| I am disinclined to swim | | | |
| I don't like swimming | | | |
Register
Register sensitivity can be addressed in the same way:
Mark with a ✓ or a ✗ which words on the left you would expect to hear in the settings at the top.
The first one is an example.
| | IT | business | football | theatre |
| spreadsheet | | | | |
| transfer | | | | |
| performance | | | | |
| applause | | | | |
| critic | | | | |
| shoot | | | | |
| substitute | | | | |
| understudy | | | | |
| replacement | | | | |
Paradigmatic and syntagmatic relationships
Mark with a ✓ or a ✗ which words on the left you can associate with the words at the top.
The first one is an example.
| | delayed | alteration | computer | light |
| late | | | | |
| change | | | | |
| machine | | | | |
| electronic | | | | |
| train | | | | |
| minor | | | | |
| bright | | | | |
| program | | | | |
| operation | | | | |
This is not an easy test to understand in terms of what you have
to do so learners need a little training to look for the two types
of relationships at which it is aimed.
Again, for variety and a little more precision, test takers can
also be invited to put a ? by any item they consider doubtful.
The test encourages the learner to try to recognise words of a
similar nature and word class (paradigmatic relationships) as well
as those likely to co-occur syntactically (syntagmatic
relationships).
Colligation
It is possible to test learners' understanding of word grammar in a number of different ways. For example:
Mark with a ✓ or a ✗ which phrases are correct.
Then, if necessary, write the correct form in the box on the right.
| | Phrase | ✓ or ✗ | Correction |
| 1 | I am sorry for late | | |
| 2 | I allowed him to come | | |
| 3 | She let him to stay | | |
| 4 | I concealed it under the table | | |
| 5 | I concealed behind the curtain | | |
| 6 | They arrived the hotel | | |
| 7 | He donated them the money | | |
| 8 | We handed over the doorman the tickets | | |
| 9 | We expected him to arrive late | | |
| 10 | We hoped her to come early | | |
| 11 | We can probable come | | |
| 12 | It's difficult but please try | | |
| 13 | It's hard but please attempt | | |
| 14 | She's an unwell child | | |
| 15 | I very almost was late | | |
Only four of the above are correct (2, 4, 9 and 12) and the others target specific aspects of colligation which are exemplified in the guide to the area, linked below in the list of related guides.
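The tick-or-cross part of this item can be marked mechanically against the answer key just given. The following is a minimal sketch; the function name is hypothetical, and the corrections column is deliberately left out because it still needs a human marker.

```python
# Answer key as stated above: only items 2, 4, 9 and 12 are correct.
CORRECT_ITEMS = {2, 4, 9, 12}
TOTAL_ITEMS = 15

def score_judgements(ticked):
    """Count the items judged rightly.

    ticked is the set of item numbers the test taker marked as correct;
    every other item is assumed to have been marked as incorrect.
    """
    score = 0
    for number in range(1, TOTAL_ITEMS + 1):
        if (number in ticked) == (number in CORRECT_ITEMS):
            score += 1
    return score

print(score_judgements({2, 4, 9, 12}))
print(score_judgements({2, 3, 4, 9}))
```

The second test taker loses two marks: one for accepting item 3 and one for rejecting item 12.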
Word class
This is a simple test of what is known about a word:
Mark with a ✓ or a ✗ which words on the left belong in the word class at the top.
The first one is an example.
| | noun | transitive verb | intransitive verb |
| bank | | | |
| sugar | | | |
| strength | | | |
| dig | | | |
| drift | | | |
| blow out | | | |
| haste | | | |
| music | | | |
| pause | | | |
Lexical sets
Sensitivity to sense relations can be tested in the same way:
Word sets:
Mark the odd ones which are not in the same word set as the first word with a ✗.
The first one is an example.
| late | delayed | overtime | behind | overdue |
| change | alter | modify | cancel | postpone |
| machine | device | tool | utensil | gadget |
| light | fire | ignite | illuminate | show |
| taxi | mini-cab | rickshaw | train | rental car |
| minor | trivial | small | minimum | important |
Hyponymy
This is a key set of relationships to test. It can be done, for example, by:
Mark with a ✓ which word includes the meaning of the four words on the left.
The first one is an example.
| bank, post office, health centre, town hall | facility | shop | building |
| tin, box, can, carton | container | holder | vessel |
| university, college, school, academy | education | building | institution |
Synecdoche, simile, metaphor and other matters
We can also test more sophisticated and difficult areas of lexical relationships, like this.
Which words can best replace the underlined words in:
The White House has decided to impose tariffs on steel.
- the US Senate
- the US President
- the President's office
Which words can best replace the underlined words in:
He became an actor.
- He went on the stage
- He studied acting
- He went into the film industry
Complete the similes:
- He's like a fish out of __________
- It's as fast as __________
- She's as thick as __________
- I'm as deaf as __________
- It went like a __________
- They purred like __________
I have a lot on my plate this week means:
- I eat too much
- I'm very busy
- I am worried by many things
Word formation
The understanding of affixation can be tested both receptively and productively, like this:
Mark these words as correct (✓) or incorrect (✗). If a word is incorrect, put the correct form on the right.
| | ✓ or ✗ | |
| advertisement | | |
| hopeability | | |
| understandingness | | |
| annoyment | | |
| painfulness | | |
| treatable | | |
| hatefulness | | |
| walkable | | |
| capableness | | |
Productive ability can be tested this way, too, as in, e.g.:
Fill the gaps with the correct form of the base word. Put a ✗ where it is not possible to make a word.
The first one is an example.
noun | transitive verb | intransitive verb | adjective | |
snow | snow | snow | snowy | |
sweet | ||||
add | ||||
dig | ||||
love | ||||
rain | ||||
hurry | ||||
contain | ||||
old |
As you can see from the example of love here, items need to be carefully
chosen because a range of derived words may be formable (lovable,
lovely, loving, loved etc.).
Another way to do this is to populate a grid with some of the target
stems or derivatives and get the learners to complete it with a word
or a ✗ if no item is available.
Like this:
Fill the gaps with the correct form of the words. Put a ✗ where it is not possible to make a word.
The first one is an example.
noun | verb | adverb | adjective |
snow | snow | snowy | |
hate | |||
hurriedly | |||
advertisement | |||
hot | |||
please | |||
sideways | |||
thought | |||
cheerful |
Not all the answers rely on affixation so if that is the focus, amendments to the items are in order.
A simpler way is something like:
Select the correct word:
- unpossible
- inpossible
- impossible
Select the correct word:
- dirtity
- dirtiness
- dirtfulness
Pronunciation
Although pronunciation is probably best tested orally for obvious reasons, that is not always practical especially if the test setter and the test taker are not in the same place. It is possible to test it in writing, however. For example:
Which word rhymes with hoped?
- dropped
- shopped
- soaped
- locked
- adopt
Which word contains the same sound as the 's' in sugar (/ʃ/)?
- sword
- leisure
- school
- shame
- measure
- muscle
One can design items of this sort in which the test taker needs to select multiple possibilities and, if the test taker is familiar with the phonemic script, it makes life considerably easier so we can have, e.g.:
Which words contain the sound /uː/?
- sword
- foot
- lost
- loose
- sure
- should
- goose
- cruise
and one can add, "as in choose" to the rubric to make it clearer.
This may not be an ideal way of testing pronunciation and it is unlikely that one can focus on anything more than vowel and consonant pronunciation this way but it may be the only way in some settings. Trying to test features of connected speech, with the possible exception of the weak-form schwa (/ə/), is very difficult.
If you follow some of the guides linked below, you may discover other phenomena concerning lexis which, with a little imagination, you can assess in ways similar to those exemplified above.
Related guides:
testing index | for the index to this area of the site
idiomaticity | which considers levels of transparency, strong collocation, binomials and so on
collocation | a guide to a key area to see what you might be testing
colligation | a guide with examples of colligation types that you may consider testing
synonymy | which includes explanations of metonymy, synecdoche, simile, metaphor and hyponymy, all of which can be tested
lexical relationships | for an overview of synonymy, hyponymy and other terms
word formation | for the guide to suffixation and prefixation and much else
testing grammar | for the sister guide. Grammar and vocabulary are often tested simultaneously.
testing and assessment | a general guide to testing, assessment and evaluation with some key terms explained
the lexis index | for a list of other guides in this area
References:
Hughes, A, 1989, Testing for Language Teachers, Cambridge: Cambridge University Press
Schmitt, N, 2000, Vocabulary in Language Teaching, Cambridge: Cambridge University Press