Dealing with pronunciation: the essential guide


Especially at the beginning of their teaching careers, some people are reluctant to treat pronunciation very thoroughly in the classroom. This is because:

It is a technical area and that's slightly intimidating
They don't know what to do apart from getting learners to repeat the teacher's model
Teachers who are non-native speakers of English are worried that their own pronunciation is faulty

Here are some counterarguments:

It is a technical area and that's slightly intimidating
- Yes, it is in parts but the basics of phonemic transcription can be learned in a few hours and the terminology needn't be used with learners. It's the what of pronunciation which is important not the what's-it-called.
They don't know what to do apart from getting learners to repeat the teacher's model
- There are some ideas on this page for getting beyond modelling.
Teachers who are non-native speakers of English are worried that their own pronunciation is faulty
- That's often true because it is very difficult, after a certain age, to learn native-like pronunciation but the teacher's production is still going to be better than most of the learners' efforts. Teachers who are native speakers of the learners' first language are often better placed than most to know where the problems lie.

So, don't shy away from teaching pronunciation. Most learners need explicit help, most learners know they need help and most learners appreciate their teacher's help in this area.

For this guide it will be useful if you have followed either the guide to essential phonology terminology or the guide to the essentials of pronunciation (preferably both).
If you have completed the course for learning to transcribe English phonemes, that's even better because this guide has to use some transcription.

Learn about your learners' language(s)

If you are a native or fluent speaker of your learners' first languages, you have an immediate advantage because you know where the issues lie.
Most learners will attempt to impose the phonology of their first language on the phonology of the language they are learning. So, for example:

Consonants

Dutch, Danish or German speakers will often use an unvoiced consonant at the end of words instead of the English voiced consonant and say
    His name is Bop
(/hɪz.ˈneɪm.z.bɒp/)
instead of
    His name is Bob
(/hɪz.ˈneɪm.z.bɒb/)
or:
    That's my back
(/ðæts.maɪ.ˈbæk/)
instead of
    That's my bag
(/ðæts.maɪ.ˈbæɡ/)
Many languages lack the 'th' sounds in words like thank and this and even those that do have the sound (such as Spanish or Greek) will not always differentiate between the voiced /ð/ sound (in this) and the unvoiced /θ/ sound (in thank). The temptation is to use a substitute such as /f/, /s/ or /z/ so a French speaker may say fankyou, a German speaker may say zankyou and so on.
In English, the sounds /p/, /t/ and /k/ are often aspirated at the beginning of words (as in pat, cat and tack [/pʰæt/, /kʰæt/ and /tʰæk/]). Spanish, Dutch and Russian speakers, for example, may sound like they are saying bat, gat and dack for these words.
The voiced sounds /dʒ/ and /ʒ/ (as in jab [/dʒæb/] and leisure [/ˈle.ʒə/]) do not exist in many languages and learners may use a native language alternative such as /tʃ/ or /ʃ/ saying chap or chab and lesher respectively.
Many languages, such as most Slavic languages, do not make a great (or any) difference between /w/ and /v/ and that results in vite for white etc. German, incidentally, does make the distinction but the spelling of the /f/ sound in German is often 'v', and the spelling of /v/ is often 'w' leading German speakers to confuse the sounds.
Sounds that are distinguished as full phonemes in English, separating, for example, pit from bit (/pɪt/ vs. /bɪt/), are not separate phonemes in some other languages (such as most varieties of Arabic) so, even if the sounds exist in the learners' first languages, they may not perceive any difference between pox and box or cot and got and so on.
On the other hand, some languages, such as Thai, treat the aspirated sounds /pʰ/, /kʰ/ and /tʰ/ as phonemes distinct from the non-aspirated /p/, /k/ and /t/, whereas in English they are only allophones.
Some sounds, such as /h/, are missing from languages like Greek, so speakers often substitute /x/, as in the Scottish loch.
Thai has a very limited number of final consonants so /d/, /s/, /z/, /dʒ/, /ʒ/, /tʃ/, /ʃ/, /ð/ and /θ/ may all be replaced by /t/ or left out altogether.
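
The final-devoicing pattern described above for Dutch, Danish or German speakers can be sketched in a few lines of Python, which may help when predicting likely errors before a lesson. This is only an illustration: the function name devoice_final and the small pairing table are assumptions made for the example, and words are represented simply as lists of phoneme symbols.

```python
# Voiced consonants paired with their unvoiced counterparts.
# (A deliberately small table for illustration, not a full inventory.)
VOICED_TO_UNVOICED = {
    "b": "p", "d": "t", "ɡ": "k",
    "v": "f", "z": "s", "ð": "θ",
    "ʒ": "ʃ", "dʒ": "tʃ",
}

def devoice_final(phonemes):
    """Return a copy of the word with its final sound devoiced, if it has a pair."""
    if not phonemes:
        return []
    *rest, last = phonemes
    return rest + [VOICED_TO_UNVOICED.get(last, last)]

print("/" + "".join(devoice_final(["b", "ɒ", "b"])) + "/")  # Bob -> /bɒp/
print("/" + "".join(devoice_final(["b", "æ", "ɡ"])) + "/")  # bag -> /bæk/
```

Words that already end in an unvoiced sound, or in a vowel, come back unchanged, so the same function can be run over a whole word list to pick out the items worth practising.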

Vowels

The vowel /æ/ as in sad does not exist in many languages (such as Turkish, Dutch, Scandinavian languages, the Chinese languages, Greek and German) so speakers of those languages will insert a native equivalent such as /e/ and say pet for pat.
Vowel length is extremely variable across languages so speakers of many languages (Italian, Greek, French, Chinese languages etc.) will often not distinguish, or hear the distinction, between heap and hip (/hiːp/ vs. /hɪp/).
Languages like Japanese, the Chinese languages and Greek have a much more limited range of vowels than does English so substitution is a real issue with /iː/ for /ɪ/, /e/ for /ɜː/ and many more.
Speakers of Thai and other South-East Asian languages will sometimes pronounce English diphthongs as long pure vowels so /eɪ/ becomes /eː/ and /əʊ/ becomes /oː/ making day sound like dare and boat sound like bought.
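
One way to turn the vowel substitutions above into lesson material is to work out which word pairs collapse into one for a given group of learners. The sketch below is hypothetical: the tiny word list, the MERGERS table and the function names are assumptions chosen to match the examples in this section, and a real list would have to come from your own research into the learners' first language(s).

```python
from itertools import combinations

# A tiny lexicon: word -> list of phonemes (illustrative only).
LEXICON = {
    "pat":  ["p", "æ", "t"],
    "pet":  ["p", "e", "t"],
    "heap": ["h", "iː", "p"],
    "hip":  ["h", "ɪ", "p"],
    "sad":  ["s", "æ", "d"],
}

# Substitutions a learner might make: target vowel -> vowel actually produced.
MERGERS = {"æ": "e", "ɪ": "iː"}

def as_produced(phonemes, mergers):
    """Apply the learner's substitutions to a transcription."""
    return tuple(mergers.get(p, p) for p in phonemes)

def collapsed_pairs(lexicon, mergers):
    """Return pairs of different words that sound the same after substitution."""
    return [
        (a, b)
        for a, b in combinations(sorted(lexicon), 2)
        if as_produced(lexicon[a], mergers) == as_produced(lexicon[b], mergers)
    ]

print(collapsed_pairs(LEXICON, MERGERS))  # [('heap', 'hip'), ('pat', 'pet')]
```

The pairs it returns are candidates for minimal-pair work; sad is left out because nothing else in this small list collides with it.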

Timing

The distinction between broadly syllable-timed languages, in which each syllable takes the same amount of time to utter, and stress-timed languages, in which the timing is controlled by stressed syllables, should not be taken as an absolute either-or difference.
Timing affects the production of weak vowel sounds and broadly stress-timed languages such as English, Dutch, German and the Scandinavian languages will use many more weakened syllables than the broadly syllable-timed languages such as French, Japanese, Spanish, the Chinese languages and Turkish.

This list could be considerably extended so you will need to do your own research. If you are not a native or fluent speaker of your learners' language(s), the simplest way is just to listen to them and make a note of the issues you hear. There are useful reference books such as Swan and Smith's Learner English (2001, Cambridge University Press), and internet sources, if handled with care and scepticism, are often very helpful.
To help a little, here is a list of languages divided by the nature of the timing of syllables and sounds. It is not complete and disguises the fact that languages exist on a cline from highly syllable-timed to highly stress-timed with most occupying part of the middle ground:

BROADLY STRESS-TIMED LANGUAGES	BROADLY SYLLABLE-TIMED LANGUAGES
ARABIC (with variations)	CHINESE LANGUAGES (also tonal)
CATALAN	FRENCH
DUTCH	GREEK
ENGLISH	INDIAN LANGUAGES
GERMAN	ITALIAN
PERSIAN (FARSI / DARI / TAJIK)	JAPANESE
PORTUGUESE (EUROPEAN)	PORTUGUESE (BRAZILIAN)
RUSSIAN	SPANISH
SCANDINAVIAN LANGUAGES	SWAHILI
	THAI (also tonal)
	TURKISH
	VIETNAMESE (also tonal)
	WEST AFRICAN LANGUAGES

You also need to be aware of when languages do not differ so you don't waste time teaching and practising the already known.
For example, Spanish shares 16 consonants with English, Greek even more and German somewhat fewer. Turning to vowels, Russian shares 17 vowel sounds with English, Thai 14 but Italian only 7.

Two views of teaching pronunciation

Among teachers of English, there are two views about how to handle pronunciation:

As and when
In this view pronunciation work is only undertaken when it arises naturally out of the lesson and the learners' production. This approach:
1. requires the teacher to be alert to and prepared for problems as they arise
2. means that pronunciation teaching is often unplanned
3. does not allow for specially designed materials
4. may interfere with the timing and pacing of a lesson by inserting unplanned phases
5. is instant and gratifying for learners
6. can be short, sharp and to the point
7. takes advantage of language needs as they emerge
Planning a series of dedicated pronunciation lessons
This approach:
1. requires the teacher thoroughly to research the learners' likely problems bearing their first language(s) in mind
2. allows for time to plan and design targeted materials
3. is often appreciated by learners as a sign that the teacher takes the matter as seriously as they do
4. can be effectively combined with the as-and-when approach by having revision materials and procedures to hand

There's no right answer. What follows can be used with either approach.

Drilling

There is a separate guide to drilling on this site. This part only concerns drilling for pronunciation work.

There are conflicting theories concerning the usefulness of drilling learners. The debate is between those who believe that learning a language is essentially a process of acquiring new habits and those who believe that learning involves a cognitive, thinking process. The arguments include:

In favour	Against
Most learners like it	Some learners find it embarrassing
It's essential for pronunciation work	It makes no difference to learning
It makes production automatic	It's based on an outdated learning theory
Drills give learners confidence	Drills are meaningless and non-communicative
Drills provide valuable speaking practice	Drilling is boring

Modelling

Before learners can employ a listen-and-imitate strategy, they need, of course, to hear an accessible, accurate and natural model.

Accessible: This means that the salient feature you are trying to get learners to imitate is obvious to them.
If the pronunciation of a single phoneme is your target, that should be the model. If, for example, the distinction between the short /ɪ/ sound as in pip and the longer /iː/ sound as in peep is the concern (as it often is) don't bury the sounds in long tongue twisters at the modelling stage. Just form the two vowels clearly, avoiding other sounds such as the initial /pʰ/ and final /p/ in both words which may also be difficult for some learners. Why not use deep and dip instead, if that's easier for your learners, or just model the vowel in isolation?
Make sure that the shape of your mouth and your lips are clearly visible to everyone. With some sounds, /ʊ/, /uː/, /ɔː/ and /əʊ/ as in put, loose, caught and coat, respectively, it is important to get the mouth shape right to be able to make the sounds accurately. That requires lip rounding. The amount of rounding varies and this affects the sound. Other vowels, such as /e/ and /iː/ as in bed and bead are not rounded but the latter requires lateral stretching of the mouth – say cheese. This should be obvious to learners so drilling from an audio recording or even a video recording is often unwise because the learners cannot see the information they need.
In terms of drilling stress and, especially, intonation, some form of exaggeration is often required so that the learners can easily identify the pattern they are being asked to imitate. This is particularly true for learners whose first languages exhibit a narrower pitch or stress range than does English (i.e., most of them).
Accurate: Obviously, you have to be drilling the sound you want the learners to produce so if you have a regional accent or your first language does not have the sound in question, you'll need to practise a little. We should be aiming at enabling our learners to pronounce the sounds in a regionally neutral manner. There's nothing wrong with, e.g., having a Newcastle, Mississippi or Hong Kong accent but you need to consider whether that is the accent you want your learners to acquire (and whether they want it).
Be careful with stress and intonation in this, because it takes some practice to be able to model a stress pattern over a sentence consistently in the same way so that learners are not being given conflicting messages. Recorded audio models are useful in this respect because they are unchanging.
Natural: We need to distinguish here between the connected-speech version of a sound and its citation form (i.e., when it is pronounced in isolation, perhaps in a list). Many words change their form in connected speech and, for example, while clothes is pronounced /kləʊðz/ in isolation, it will often be pronounced as close (/kləʊz/) in connected speech.
Weak forms are another obvious case. For example, the word for will rarely be pronounced as /fɔː/ and almost never as /fɔːr/ but will usually simply be /fə/.
Even at lower levels, it pays to avoid losing naturalness for the sake of clarity so be unafraid to practise contractions, elisions and weak forms.

An effective model is sometimes a silent model. Just mouthing vowels, especially, and some consonants such as the unvoiced /θ/ in thank can be as effective as saying them aloud because it allows the learners to focus on mouth shape and not worry about how they sound. You can do it like this:


[Images: mouth shapes for 'eee' /iː/, 'aah' /ɑː/ and 'sh' /ʃ/]

Backchaining

Actors often learn their lines by breaking up the part and learning the last section first. Dog trainers sometimes teach the final part of a command before the beginning and so do animal trainers in circuses. The theory, such as it is, is that the learners focus on the form not the meaning of what they are doing.

The procedure is simple. Instead of drilling the whole of a long word, phrase or sentence from the beginning, start at the end. E.g.:

Don't drill: inde > indepen > independent > independently
Drill instead: ently > pendently > dependently > independently
Don't drill: I would > I would love > I would love t' come
Drill instead: t' come > love t' come > would love t' come > I would love t' come
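
The backchaining sequences above follow a simple rule (keep adding one chunk to the front of what has already been drilled), which can be sketched as follows. The function name backchain is an assumption for this example, and how a word or sentence is divided into chunks is left entirely to the teacher.

```python
def backchain(chunks, sep=""):
    """Return the drill steps for backchaining, starting with the final chunk."""
    return [sep.join(chunks[i:]) for i in range(len(chunks) - 1, -1, -1)]

# A long word, split into the chunks used in the example above.
print(" > ".join(backchain(["in", "de", "pend", "ently"])))
# ently > pendently > dependently > independently

# A sentence, with chunks joined by spaces.
print(" > ".join(backchain(["I", "would", "love", "t' come"], sep=" ")))
# t' come > love t' come > would love t' come > I would love t' come
```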

Who to drill

Many teachers confine themselves to drilling the whole class together or drilling individuals only.
The problem with drilling the whole class, especially if it's large, is that a) you can't hear everyone and b) people don't start and finish together so you get an overlapping cacophony.
Here are some alternatives:

Drill in small groups. If you have island tables in the room, that's easy. If you have rows or sides of a rectangle, drill one row or side at a time. It's easier for you and the learners to hear if they are getting it right.
Drill males and females separately. The mixed voice tones when they speak together make it difficult to hear what's right and wrong.
Select groups to drill randomly (or, actually, quite carefully):
- give everyone a letter and drill all the Bs, all the Ds etc.
- everyone wearing something red
- everyone wearing jeans
- everyone over 25
- everyone who drank coffee at breakfast time
  etc.
After you have modelled the target, get a learner to do the group selection or decide who should repeat.
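
For the letter-based selection above, a short script can hand out the letters evenly and pick a group to drill. This is a minimal sketch under stated assumptions: the class list is invented, and the function names assign_letters and group_for are hypothetical helpers, not part of any existing tool.

```python
import random
from itertools import cycle

def assign_letters(names, letters="ABCD"):
    """Give each learner a letter in turn so the groups are as even as possible."""
    return dict(zip(names, cycle(letters)))

def group_for(letter, assignment):
    """Return the learners holding the given letter, in class-list order."""
    return [name for name, held in assignment.items() if held == letter]

# Hypothetical class list.
learners = ["Ana", "Ben", "Chen", "Dita", "Emre", "Fay", "Goran", "Hoa"]
assignment = assign_letters(learners)

print(group_for("B", assignment))                    # all the Bs: ['Ben', 'Fay']
print(group_for(random.choice("ABCD"), assignment))  # a randomly chosen group
```

Choosing the letter yourself rather than with random.choice is the 'quite carefully' option: you decide which group needs the practice.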

Cognitive approaches

Drilling has its critics because it is often seen as a behaviourist throwback to a time when we believed that learners acquired the targets by a process of imitation, repetition and the acquisition of good habits. No longer.
If it is true that the language-learning process is one of forming and adjusting hypotheses based on the input you notice, then there is no obvious reason why this should not apply to the acquisition of the phonological system as much as it does to the grammar and lexical systems.

There are three possible approaches:

An inductive approach

You can, of course, expose your learners to rich examples of how the language is pronounced and assume that, being thinking animals, they will eventually work out how to form the sounds of the language acceptably. In other words, this is a modelling approach with the drilling that follows it.
It might work.

A deductive approach

This requires a bit more work on the teacher's part. A deductive approach to grammar, for example, involves providing the learners with the rules and then letting them loose on the data to form acceptable syntax.
A deductive approach to pronunciation involves telling learners explicitly how the sounds are formed and getting them to follow the rules to produce the sounds.
This is easier said than done because it requires some quite sophisticated understanding of mouth part positioning and, especially, tongue positioning for vowel sounds.
The usual way for all sounds is to use a diagram like this:
[Diagram: the vocal tract]

Sorting out the difference between voiced and unvoiced sounds is the easy part because you can get learners to place their hands on their throats to feel the vibrations that voiced sounds need.

Getting learners to distinguish between aspirated and non-aspirated sounds is also a simple matter of getting them to hold a small piece of paper in front of their mouths and try to make it move on /pʰ/, /kʰ/ and /tʰ/ but not on /p/, /k/ and /t/.

Moving on to something more difficult, it is then possible to explain to learners, for example, the nature of a labio-dental voiced fricative (/v/) by telling them to position their bottom lips to touch the top set of front teeth and blow air between them while at the same time using some vocal cord vibration. The same can be done for a range of consonant sounds that cause problems.
It is easier, usually, for people to produce the unvoiced sound before adding in voice so get people to produce /f/ before /v/, /tʃ/ before /dʒ/ and so on (i.e., fan - van, chain - Jane etc.).

Vowels are more of a challenge but some understanding of the required tongue position in terms of height above the floor of the mouth and distance from the back of the mouth can actually be quite productive. For that, you'll need a different diagram:
[Diagram: vowel grid]

If you try saying beat, bit, bet, about, verse, cup, cap, cruise, foot, hot, fought, bark you will feel the tongue position change from left to right, top to bottom of the grid. In your mouth, it'll move up and down and forward and back depending on the vowel. If you can do this, so can your learners.
If you start with the extremes and distinguish between the /iː/ sound in, e.g., keys and the /ɔː/ sound in caught, it becomes somewhat easier for learners to identify what they should be doing.
Once learners can do some of that, you can move on to lip rounding and vowel length in a similar way.
There are four defining characteristics of vowels: tongue position (forward or back), tongue height (up or down), length (long or short) and lip rounding (or stretching laterally).
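
The four characteristics can be treated as a small record for each vowel, which makes it easy to see exactly what a learner has to change to get from one vowel to another. The sketch below is illustrative only: the class and function names are assumptions, and the feature values are deliberately simplified (three tongue positions and three heights) rather than a full phonetic description.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Vowel:
    symbol: str
    position: str   # tongue position: "front", "central" or "back"
    height: str     # tongue height: "high", "mid" or "low"
    long: bool      # True for long vowels
    rounded: bool   # True when the lips are rounded

# Simplified values for a few vowels mentioned in this guide.
VOWELS = {
    v.symbol: v
    for v in [
        Vowel("iː", "front", "high", True, False),
        Vowel("ɪ", "front", "high", False, False),
        Vowel("ɔː", "back", "mid", True, True),
        Vowel("uː", "back", "high", True, True),
        Vowel("ɑː", "back", "low", True, False),
    ]
}

def differences(a, b):
    """List the characteristics on which two vowels differ as (name, a_value, b_value)."""
    va, vb = VOWELS[a], VOWELS[b]
    return [
        (f.name, getattr(va, f.name), getattr(vb, f.name))
        for f in fields(Vowel)
        if f.name != "symbol" and getattr(va, f.name) != getattr(vb, f.name)
    ]

print(differences("iː", "ɔː"))  # keys vs. caught: position, height and rounding
print(differences("iː", "ɪ"))   # peep vs. pip: length only, in this simplified table
```

In this simplified table /iː/ and /ɪ/ differ only in length; in practice there is also a small difference in tongue position and height which a finer-grained table could record.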

A noticing approach

Now that making an audio (or even video) recording is so much simpler in classrooms, it is possible to apply a consistent noticing approach to the pronunciation issues your learners face.
A recording of your model (or someone else's) which learners can compare to their own output is often useful to help them notice the gap between their own production and the model.
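
If you (or the learners) transcribe both recordings, the gap can also be located mechanically. The sketch below uses SequenceMatcher from Python's standard difflib module to compare two lists of phoneme symbols. It does not analyse audio: the transcriptions are assumed to have been made by hand, and the function name gaps and the sample data are hypothetical.

```python
from difflib import SequenceMatcher

def gaps(model, learner):
    """Return the stretches where the learner's phonemes differ from the model's."""
    matcher = SequenceMatcher(None, model, learner)
    return [
        (tag, model[i1:i2], learner[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# "That's my bag": the model, and a learner's version transcribed from a recording.
model = ["ð", "æ", "t", "s", "m", "aɪ", "b", "æ", "ɡ"]
learner = ["z", "æ", "t", "s", "m", "aɪ", "b", "æ", "k"]

for tag, expected, heard in gaps(model, learner):
    print(tag, expected, heard)
# replace ['ð'] ['z']
# replace ['ɡ'] ['k']
```

Each line of output is one thing for the learner to notice, which keeps the focus narrow even when the utterance is long.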

A noticing approach can be taken with all aspects of pronunciation from individual sounds, word stress, sentence stress, features of connected speech up to intonation patterns across longer utterances. Be aware, however, that the longer the targets and the more data the learners have, the harder it is for them to identify the aspects you need them to notice so be careful to guide and help.

A noticing approach can, naturally, be combined with either a deductive or an inductive approach to pronunciation work.

Other guides

This is, of course, not the end of the story by a long way, but it is somewhere to begin.

There is a range of other guides on this site to various aspects of pronunciation that you may like to consider.

Related guides
In-service section (more technical): consonants, syllables and phonotactics, vowels, word stress, connected speech, intonation, teaching troublesome sounds
Initial-plus section (slightly easier): a word-stress exercise for learners, essentials of pronunciation, drilling
Whatever your background: phonology terminology, learn to transcribe

Some more references to help:
Celce-Murcia, M, Brinton, D & Goodwin, JM, 1996, Teaching Pronunciation: A Reference for Teachers of English to Speakers of Other Languages, Cambridge: Cambridge University Press
Kenworthy, J, 1987, Teaching English Pronunciation, Harlow: Longman
There is a comprehensive bibliography of other references available at:
http://liceu.uab.es/~joaquim/applied_linguistics/L2_phonetics/Corr_Fon_Bib.html#Specific_works_on_pronunciation_teaching [accessed April 2017]