Pronunciation: the overview

Before you start following the guides to aspects of pronunciation, it will be useful to explain, or remind you of, some of the key areas you need to be familiar with if you are hoping to teach people how to say things acceptably in English.

Where the sounds are made

Understanding the terminology in this area depends on having a knowledge of the mouth parts where humans make sounds. Here is the cut-out-and-keep picture to have by your side.

vocal tract

Articulators affect the type of sound you make. Cavities affect the quality of the sound.

Some key terms

Phonetics

is the study of all speech sounds and is applicable to all languages. It is concerned with how we produce sounds, how they are transmitted physically and how they are perceived and decoded.

Phonology

is to do with the sounds of a specific language rather than all human speech and it is this which concerns us here.

The phoneme

is the basic sound unit of a language. We combine phonemes into morphemes (meaning units) and into words. The phoneme is written with its symbol between two slash marks: /t/ = the 't' sound in later.
The phoneme is an ideal type for the sound it represents. Individual speakers may pronounce certain phonemes quite variably but native speakers will generally recognise which phoneme is being used (and make a mental adjustment if it varies from what they produce). In this sense, phonemes are a way of digitising the information contained in the sound stream. For example:
Although /b/ and /p/ are distinct phonemes in English, with the first voiced and the second unvoiced, the amount of voicing is not an on-off phenomenon because in some environments voicing will be more energetically done than in others. Compare, for example, the voicing on the /b/ sound in rubber and the same sound at the end of pub. In the former, voicing is nearer the fully voiced end of the cline from /p/ to /b/ than it is in the latter. Nevertheless, native-speaker hearers will easily note the difference between pup and pub even though the amount of voicing on the /b/ is quite minimal.
Phonemes may be better described as sets of allophones (see below) than as single sounds. The fact that some of them are variably pronounced is not semantically significant.

Consonants

When you produce a sound by completely or partially blocking the air flow through the vocal tract, you produce a consonant. For example, if you block and then release air by pressing your lips together, you will produce the sound /p/. If you block and release the air at the back of your mouth by raising your tongue, you will produce /k/.

Vowels

If you produce a sound without blocking the air flow you will make a vowel such as the 'a' in cat (/kæt/). The quality of the sound is affected by where your tongue is vertically and horizontally in the mouth and whether your lips are rounded or not.

Semi-vowels

are sounds which are produced like vowels but do not function like them. An example is the /j/ sound at the beginning of the word yet. The letter y represents a consonant in this case but at the end of the word fly it represents a vowel and the word is transcribed as /flaɪ/. The letter w also has this characteristic: at the beginning of was it is close to being a consonant (called a glide, in the trade) but in the centre of cower it is a vowel sound so the transcription of was cowering is /wəz ˈkaʊər.ɪŋ/.

Markedness

In some theoretical approaches, it is asserted that certain phonemes are more heavily marked than others. For example, the unvoiced consonants (/p/, /t/, /k/ etc.) are less marked, and therefore more widespread across languages, than their voiced equivalents (/b/, /d/, /ɡ/ etc.). The significance is that the more heavily marked phonemes are more difficult to acquire because they are not common to most languages, and so deserve more attention. Vowels, too, may show the same distinctions, with high back vowels being less marked than low front vowels. For more, see the guide to markedness.

Minimal pairs

are words which differ in meaning because of a change to a single phoneme. In fact, this is part of the definition of a phoneme.
For example, the words cat and hat are distinguished only by the first phoneme: /k/ in the first case, /h/ in the second. In English, these words form a minimal pair so the sounds are phonemes. Some languages do not recognise the distinction between /h/ and /k/ in this way so in those languages the words are not a minimal pair and the sounds are not separate phonemes.
Similarly, /p/ and /b/ can readily be seen to be phonemes in English by applying the minimal pairs test. We know that bat and pat are different words with different meanings so the sounds, /p/ and /b/, are phonemes. In some languages (e.g., most varieties of Arabic) changing /b/ to /p/ will have no effect on the meaning of a word so in those languages the sounds are not separate phonemes.
To see how minimal pairs may be exploited in the classroom, see the guide to teaching troublesome sounds.
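The formal half of the minimal pairs test can be sketched in a few lines of Python. This is only an illustration, not an established tool: the function name is_minimal_pair and the hand-made phoneme lists are assumptions made for the example. It checks that two words differ in exactly one phoneme in the same position; it cannot check that the two words have different meanings, which still has to be judged by someone who knows the language.

```python
def is_minimal_pair(first, second):
    """Return True if two phoneme sequences differ in exactly one position."""
    # Words with different numbers of phonemes cannot differ by one substitution.
    if len(first) != len(second):
        return False
    differences = sum(1 for a, b in zip(first, second) if a != b)
    return differences == 1


# Each word is a list of phoneme symbols (transcribed by hand for this example).
cat = ["k", "æ", "t"]
hat = ["h", "æ", "t"]
pup = ["p", "ʌ", "p"]
pub = ["p", "ʌ", "b"]
kid = ["k", "ɪ", "d"]
skid = ["s", "k", "ɪ", "d"]

print(is_minimal_pair(cat, hat))    # True: only the first phoneme differs
print(is_minimal_pair(pup, pub))    # True: only the last phoneme differs
print(is_minimal_pair(kid, skid))   # False: skid has an extra phoneme
```

Words are held as lists of symbols rather than as strings so that a phoneme written with more than one character (a long vowel or a diphthong, for example) still counts as a single unit.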

Allophone

If you say kid and skid aloud you will have produced two different allophones of /k/. In the first, there is slight aspiration (a puff of air like an /h/ sound) following the /k/; in the second it is absent. These sounds are different but they are not separate phonemes. If you add the aspiration to the sound in skid you will not make a different word although it will sound odd. In some languages, the two sounds are phonemes and people will understand a different meaning depending on whether the /k/ is aspirated or not. Because they are phonetic rather than phonemic, these allophones are written in square brackets: [k] (unaspirated) and [kʰ] (aspirated). The other aspirated / non-aspirated pairs in English are /p/, aspirated to [pʰ], and /t/, produced as [tʰ].
The other common pair of allophones in English affects the pronunciation of /l/. The light /l/ sound appears, for example, at the beginning of light and the dark sound, transcribed phonetically as [ɫ], appears at the end of a word like full.
Allophonic variation is not an on-off phenomenon because there are variations, for example, in the amount of aspiration given to some sounds and the amount of voicing to others on a cline from no aspiration to full aspiration and no voicing to full voicing.
Sets of allophones form single phonemes, in other words.

Voicing

describes how phonemes may be different depending on whether the vocal cords vibrate or not at the time of pronunciation. For example, the /k/ sound is made without voicing but the /ɡ/ sound is made with the mouth parts in the same place but with voice added. If you put your hand on your throat and say the words sue and zoo, you will see what is meant and feel a slight vibration on the second word (/s/ is unvoiced but /z/ is voiced).
Voicing is not an either-or distinction. There is a cline from the fully unvoiced to the fully voiced. For more, see the guide to consonants.
Unvoiced sounds are described as fortis sounds and voiced sounds as lenis.

Intonation

is the way in which the speaker's pitch (or tone) rises and falls to signal, e.g., a question, surprise, disappointment etc. It is frequently shown in transcriptions using arrows: ↑↓→ etc.

Stress

is the term used to describe the emphasis speakers give to certain syllables in a word or certain words in a sentence. For example:
Word stress: when the stress falls on the first syllable in export, the word is a noun (EXport); when it falls on the second, it is a verb (exPORT).
The conventional way to mark stress in transcriptions, used throughout this site and by most authorities, is to place a short, raised vertical line before the stressed syllable (ˈ). Secondary stress is denoted by a short lowered vertical line (ˌ). So, for example, the word unbelievably is transcribed as:
    /ˌʌn.bɪ.ˈliː.və.bli/
with the secondary stress marked on the first syllable and the primary stress marked on the third syllable.
You may encounter other ways to mark stress but this is the convention used almost everywhere.
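As a rough illustration of how this convention can be read mechanically, here is a short Python sketch. The function name stress_pattern is invented for the example, and the sketch assumes that syllables are separated by full stops, as they are in the transcription of unbelievably above.

```python
PRIMARY = "ˈ"    # short raised vertical line
SECONDARY = "ˌ"  # short lowered vertical line


def stress_pattern(transcription):
    """Label each syllable of a dotted transcription by its stress mark."""
    # Remove the enclosing slashes, then split at the syllable boundaries.
    syllables = transcription.strip("/").split(".")
    labels = []
    for syllable in syllables:
        if syllable.startswith(PRIMARY):
            labels.append("primary")
        elif syllable.startswith(SECONDARY):
            labels.append("secondary")
        else:
            labels.append("unstressed")
    return labels


print(stress_pattern("/ˌʌn.bɪ.ˈliː.və.bli/"))
# ['secondary', 'unstressed', 'primary', 'unstressed', 'unstressed']
```

The output matches the description: secondary stress on the first syllable and primary stress on the third.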
(There is an exercise on word stress for teachers and advanced learners on this site.)
Sentence stress in English usually falls on the new information being provided and that, for English, generally comes towards the end of the utterance. So for example in the exchange:
    A: What did you do yesterday?
    B: I went to see my mother.
the first speaker will normally stress yesterday and the second speaker will normally stress my mother because that is the key information in both cases.
We can, of course, stress other elements in order to emphasise their importance. This is called special stress. For example, try reading these sentences aloud, stressing the word indicated:

I went to London with my brother (stress on brother, i.e., not with another person)
I went to London with my brother (stress on London, i.e., not to another place)
I went to London with my brother (stress on I, i.e., it was not someone else who went with my brother)
I went to London with my brother (stress on my, i.e., not someone else's brother)

Stress and syllable timing

The following is the theory.
There are, it is claimed, two fundamental forms of stressing.
In some languages, such as French, Italian, Spanish, Cantonese and Mandarin, every syllable is perceived as taking up the same amount of time. This is the so-called 'machine gun' sound of these languages. So we get:
I ... went ... to ... Lon ... don ... with ... my ... bro ... ther
That's syllable timing.
In other languages, notably English, Dutch, Persian languages and Scandinavian languages, some syllables take longer to utter than others and this results in a reduction of the syllables in between. So we get:
Iwentto ... L o n d'n ... withmy ... b r o the(r)
That's stress timing.
For this reason, the preposition to is not pronounced in its full form as /tuː/ (rhyming with 'two' and 'too') but with a weak form of the vowel /tə/. The funny, upside-down 'e' is called a schwa and is the commonest weak form in English. Additionally, my is often reduced to m' and so on and in most varieties of British English the final /r/ sound on brother is not pronounced.
Be aware that even if this distinction exists, it is not an either-or one. Languages will vary along a cline from syllable- to stress-timing tendencies.
There is, in fact, a third form of timing: mora timing. In Japanese, for example, a vowel (V) takes the same time to utter as a consonant (C) plus a vowel, so V takes the same time as CV, and CVV takes twice as long as CV. Slovak is also often considered to be a mora-timed language.
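The arithmetic of mora timing for the three patterns just mentioned can be shown with a very small Python sketch. It is a deliberate simplification and the function name count_moras is invented for the example: it assumes each vowel counts as one mora and a preceding consonant adds nothing, and it ignores anything else that may count as a mora in a real language.

```python
def count_moras(pattern):
    """Count moras in a syllable pattern written with C and V only."""
    if set(pattern) - {"C", "V"}:
        raise ValueError("pattern may contain only 'C' and 'V'")
    # Simplifying assumption: one mora per vowel, none for the consonant.
    return pattern.count("V")


for pattern in ("V", "CV", "CVV"):
    print(pattern, count_moras(pattern))
# V 1
# CV 1
# CVV 2
```

So V and CV each take one unit of time and CVV takes two, which is the relationship described above.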

Tone unit

Tone units are also called intonation contours and tone groups. They are also called sense units because identifying them aids understanding the sense of what is heard. Authorities do not agree about how the tone unit is divided (or even what it's called) but most would agree that the most important part of it is the nucleus or tonic syllable where the key information the speaker wishes to convey is found.
For example, we can divide a sentence such as:
    She spoke to the man outside the house
into two distinct patterns of tone units, separated by ||, like this:
    She spoke to || the man || outside the house
which conveys the fact that where she spoke to the man is important, or we can have:
    She spoke to || the man outside the house
which conveys the fact that we now know which man it was that she spoke to.

Weak forms

Because English is a stress-timed language (allegedly), many vowels are reduced in rapid, connected speech so, e.g., for is pronounced /fə/ (not /fɔː/), been is heard as /bɪn/ (not /biːn/), we is heard as /wɪ/ and so on.
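The choice between a full and a weak form can be pictured as a simple lookup, sketched here in Python. The table holds only the four words mentioned in this guide, the names FORMS and pronounce are invented for the example, and the full form of we (/wiː/) is supplied here because it is not given in the text above. It is a toy model of the idea, not a description of how speakers actually decide.

```python
# word: (full form, weak form), both without the enclosing slashes
FORMS = {
    "to": ("tuː", "tə"),
    "for": ("fɔː", "fə"),
    "been": ("biːn", "bɪn"),
    "we": ("wiː", "wɪ"),  # full form added for this example
}


def pronounce(word, stressed=False):
    """Return the full form if the word is stressed, otherwise the weak form."""
    # Raises KeyError for any word that is not in the small table above.
    full, weak = FORMS[word.lower()]
    return full if stressed else weak


print(pronounce("for"))                 # fə
print(pronounce("for", stressed=True))  # fɔː
print(pronounce("to"))                  # tə
```

Making the weak form the default reflects the point that, in rapid connected speech, these words are normally unstressed.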
