ELT Concourse teacher training
Concourse 2

Connected speech


If anything in the first part of this guide is unfamiliar to you, you should probably take a little time to refresh your memory concerning the essential concepts in phonology.  You should also have worked through the guide to consonants and the guide to vowels before tackling this.
It is also assumed, in what follows, that you can read and write phonemic transcription.



Isolated or in a stream?

Connected speech phenomena occur where words meet.  The first distinction to get clear is between the pronunciation of a word in isolation and its pronunciation in a stream of speech.  For example, if you read the words on this list aloud, one at a time, you will probably pronounce them in what is called their 'canonical', 'citation' or 'isolation' form.  Here's the list to try.  If you can, transcribe the words on a piece of paper as you pronounce them.

    are
    been
    have
    that
    from
    and
    ten
    bottles

Now memorise this sentence and then say it aloud at normal speed, contracting any words you can.

I have been to town and here are the ten bottles of beer I said that I would get from the shop.

That probably would have been pronounced something like this:

/aɪv bɪn tə taʊn ənd hɪər ə ðə tem ˈbɒt.l̩z əv bɪər ˈaɪ ˈseðət aɪd ˈɡet frəm ðə ʃɒp/

Compare that transcription with your own transcription of the isolated forms of the words.  What do you notice?



The features of connected speech

There are six main areas to understand.

Weak forms
We saw examples of these in the sentence above.
The most common weak forms use the schwa (/ə/) so, for example:
    for is pronounced /fə/
    are is pronounced /ə/
    to is pronounced /tə/
    but is pronounced /bət/ before a vowel or /bə/ in other environments
and so on.
There are other weakenings, such as the replacement of the /iː/ in been with the shortened /ɪ/ sound.  The word our in its full form is pronounced /ˈaʊə/ in isolation but is usually weakened to /ə/ or /ɑː/ in connected speech.
Most weak forms affect structural (function) words rather than meaning-carrying (content) words, but the reduction of the sound at the end of father, with the elision of the /r/ before a consonant sound (in British English), is also an example of weakening and illustrates another feature of connected speech: elision.
Assimilation
This occurs when a sound is altered because the speaker is anticipating the following sound or influenced by a previous one (or both).
There are three possibilities:
Anticipatory assimilation:
In our example, ten bottles sounds like tem bottles because the speaker anticipates the voiced bilabial plosive /b/ and changes the alveolar nasal /n/ to the bilabial nasal /m/, which makes the /b/ sound easier to pronounce.
Try saying
    his son and his daughter
quickly.
It is pronounced like this:
    /hɪs sʌn n̩dɪz ˈdɔː.tə/
The 's' in his son is not voiced as it is in his daughter: the /z/ becomes /s/ in anticipation of the unvoiced /s/ at the start of son.  (We also drop the 'h' on the second his, which is an example of elision.)
(Anticipatory assimilation is sometimes, slightly confusingly, called regressive assimilation.)
Progressive assimilation:
Sounds may change because the speaker is influenced by the preceding sound.  For example, try saying
    There's not much cider left
quickly and focus on how the 'c' in cider is pronounced.  If you say cider individually, the 'c' is pronounced /s/ as one expects (/ˈsaɪ.də/).
However, in this environment, the influence of the /tʃ/ at the end of much means that the 'c' in cider is pronounced as if it were 'sh', as /ʃ/.  The transcription is, then, not
    /ðeəz.nɒt.ˈmʌtʃ.ˈsaɪ.də.left/
but
    /ðeəz.nɒt.ˈmʌtʃ.ˈʃaɪ.də.left/
A simple example of progressive assimilation is the pronunciation of the plural 's' in English.  Words ending in unvoiced consonants such as /t/, /k/ or /p/ take a plural 's' pronounced as /s/:
    hats and coats (/hæts.ənd.kəʊts/)
    talks and walks (/tɔːks.ənd.wɔːks/)
    tops and tips (/tɒps.ənd.ˈtɪps/)
but words ending with voiced consonants such as /d/, /ɡ/ or /b/ will have the 's' pronounced as /z/:
    odds and sods (/ɒdz.ənd.sɒdz/)
    lugs and mugs (/lʌɡz.ənd.mʌɡz/)
    bags and logs (/bæɡz.ənd.lɒɡz/)
It's even easier to spot the difference in
    cats and dogs (/kæts.ənd.dɒɡz/)
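The plural rule above is regular enough to be sketched in a few lines of Python, which may be a useful way to check your understanding of it.  This is a minimal illustration, not a complete model: it assumes words are supplied as lists of phoneme symbols, the phoneme sets are simplified and hypothetical, and the extra /ɪz/ branch after sibilants (as in buses or watches) goes beyond what is described above.

```python
# A simplified sketch of the plural 's' rule (progressive voicing assimilation).
# Words are lists of phoneme symbols; the sets below are illustrative, not exhaustive.
UNVOICED = {"p", "t", "k", "f", "θ"}
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}  # assumption: these take /ɪz/


def plural_ending(final_sound):
    """Return the form of the plural ending that follows the word's final sound."""
    if final_sound in SIBILANTS:
        return "ɪz"  # not covered in the text above: buses, watches
    if final_sound in UNVOICED:
        return "s"   # an unvoiced consonant is followed by an unvoiced ending
    return "z"       # voiced consonants (and vowels) are followed by a voiced ending


def pluralise(phonemes):
    """Join the phonemes of a singular noun and add the assimilated ending."""
    return "".join(phonemes) + plural_ending(phonemes[-1])


print(pluralise(["k", "æ", "t"]), pluralise(["d", "ɒ", "ɡ"]))
```

Run as written, the last line prints the transcriptions of cats and dogs, kæts and dɒɡz.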
A similar pattern may be observed with the pronunciation of the regular past-tense ending in English.
After unvoiced consonants, the -d or -ed ending is usually pronounced as /t/ as in:
    asked (/ɑːskt/)
    spaced (/speɪst/)
    tapped (/tæpt/)
but following voiced consonants it is voiced as /d/ as in:
    clubbed (/klʌbd/)
    fazed (/feɪzd/)
    dragged (/dræɡd/)
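The past-tense pattern can be sketched the same way.  Again this is a minimal illustration assuming words are lists of phoneme symbols; the phoneme sets are simplified and hypothetical, and the /ɪd/ branch after /t/ and /d/ (as in wanted or needed) is an addition not discussed above.

```python
# A simplified sketch of the regular past-tense ending.
UNVOICED = {"p", "k", "f", "θ", "s", "ʃ", "tʃ"}
ALVEOLAR_STOPS = {"t", "d"}  # assumption: these take /ɪd/


def past_ending(final_sound):
    """Return the form of the -ed ending that follows the verb's final sound."""
    if final_sound in ALVEOLAR_STOPS:
        return "ɪd"  # not covered above: wanted, needed
    if final_sound in UNVOICED:
        return "t"   # unvoiced ending after unvoiced consonants
    return "d"       # voiced ending after voiced consonants and vowels


def past_tense(phonemes):
    """Join the phonemes of the base form and add the assimilated ending."""
    return "".join(phonemes) + past_ending(phonemes[-1])


for verb in (["ɑː", "s", "k"], ["s", "p", "eɪ", "s"], ["k", "l", "ʌ", "b"]):
    print(past_tense(verb))
```

This prints ɑːskt, speɪst and klʌbd, matching asked, spaced and clubbed in the lists above.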
Reciprocal assimilation:
Here, sounds influence each other and may fuse together.  For example, try saying
    Won't you come with us
quickly and note how won't you is pronounced.  It is not /wəʊnt ju/ except in slow careful speech but is actually pronounced /wəʊntʃu/.  What has happened is that the 't' and 'y' sounds have coalesced to make the /tʃ/ sound.
(Reciprocal assimilation is sometimes called coalescent assimilation.)

Assimilation, by the way, also explains the tendency in English to mess with prefixes, using 'im-' before words beginning with bilabials (so we have impossible, impolite, immobile etc. rather than *inpossible, *inpolite, *inmobile).  On the other hand, words beginning in alveolar sounds such as /t/ or /d/ or velar sounds such as /k/ and /ɡ/ will normally take either un- or in- (so we have intolerant, undefined, unconnected, ungrateful etc.).  This is not an absolute rule, unfortunately, because exceptions such as unmoved, unpleasant etc. are common.
There are lots of possible assimilation changes in English.
Assimilation happens like this (after Field, 2008:150):
Before these sounds | this sound | assimilates to | for example (transcription)
/m/, /b/, /p/ | /n/ | /m/ | then bake it /ðem.beɪk.ɪt/, then put it /ðem.ˈpʊt.ɪt/, then mix it /ðe.mɪks.ɪt/
/m/, /b/, /p/ | /t/ | /p/ or /ʔ/ | that mixture /ðəʔ.ˈmɪks.tʃə/, that bread /ðəp.bred/, that paper /ðəʔ.ˈpeɪ.pə/
/m/, /b/, /p/ | /d/ | /b/ or /ʔ/ | mad man /mæʔ.mæn/, mad boy /mæʔ.ˌbɔɪ/, mad policy /mæb.ˈpɒ.lə.si/
/k/, /ɡ/ | /n/ | /ŋ/ | bean cakes /biːŋ.keɪks/, bean good /biːŋ.ɡʊd/
/k/, /ɡ/ | /t/ | /k/ or /ʔ/ | that cake /ðəʔ.keɪk/, but go /bək.ɡəʊ/
/k/, /ɡ/ | /d/ | /ɡ/ | bed clothes /beɡ.kləʊðz/
/j/ | /t/ | /tʃ/ | might you /maɪtʃu/
/j/ | /d/ | /dʒ/ | had you /hədʒu/
/ʃ/ | /s/ | /ʃ/ | glass shop /ˈɡlɑː.ʃɒp/
/ʃ/ | /z/ | /ʃ/ | has shut /hæ.ʃʌt/
The change noted above, when had you is pronounced as /hədʒu/, occurs frequently and is sometimes called yod coalescence.  Other examples include:
    would you as /wʊdʒu/
    could you as /kədʒu/
and so on, instead of the more careful /wʊd.ju/ and /kəd.ju/.
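The assimilation table can be read as a lookup from a pair of sounds at a word boundary to a replacement.  The Python sketch below encodes the rows of the table in that way; it is an illustration under simplifying assumptions, not a model of real speech.  Words are assumed to be lists of phoneme symbols, the names REPLACEMENTS, FUSIONS and assimilate are hypothetical, only the place-assimilated variant is produced (the glottal-stop option is ignored), and the /t/ or /d/ plus /j/ cases are treated as fusion into a single sound, as in yod coalescence.

```python
# A sketch of boundary assimilation, after the table above (Field 2008:150).
# Only the place-assimilated variant is produced; the /ʔ/ alternative is ignored.
BILABIALS = ("m", "b", "p")
VELARS = ("k", "ɡ")

# (final sound of first word, first sound of second word) -> replacement for the final sound
REPLACEMENTS = {("s", "ʃ"): "ʃ", ("z", "ʃ"): "ʃ"}
for following in BILABIALS:
    REPLACEMENTS.update({("n", following): "m", ("t", following): "p", ("d", following): "b"})
for following in VELARS:
    REPLACEMENTS.update({("n", following): "ŋ", ("t", following): "k", ("d", following): "ɡ"})

# Coalescence: the two boundary sounds fuse into one new sound
FUSIONS = {("t", "j"): "tʃ", ("d", "j"): "dʒ"}


def assimilate(first, second):
    """Join two words (lists of phoneme symbols), applying assimilation at the boundary."""
    if not first or not second:
        return first + second
    boundary = (first[-1], second[0])
    if boundary in FUSIONS:
        return first[:-1] + [FUSIONS[boundary]] + second[1:]
    if boundary in REPLACEMENTS:
        return first[:-1] + [REPLACEMENTS[boundary]] + second
    return first + second


print("".join(assimilate(["t", "e", "n"], ["p", "ɪ", "n"])))
```

The last line prints tempɪn, the pronunciation behind the bowling-alley joke that follows.  Where the second word begins with /ʃ/, the sketch keeps both segments, which corresponds to a lengthened /ʃ/; it also makes no attempt to simplify the doubled /m/ in then mix it.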
Anticipatory assimilation also has comedic possibilities:
    I've got a job at a bowling alley.
    Ten pin?
    No, permanent.
If you have understood the joke, you have appreciated anticipatory assimilation.
Elision
Elision is the omission of one or more sounds in connected speech.  A clear example is the tendency in English to use contracted forms, leaving out whole sections of words (hasn't, can't, wouldn't've etc.), but there are other examples such as:
    the loss of the /d/ in sandwich (/ˈsæn.wɪdʒ/)
    the pronunciation of library as /ˈlaɪ.bri/ or comfortable as /ˈkʌmf.təb.l̩/
    the dropping of /h/ sounds in rapid speech, as in give it to him rendered as /ɡɪv.ɪt.tu.ɪm/.
Essentially, three kinds of elision are recognised (as well as the initial /h/ elision):
Function word reduction occurs when all or part of a function word such as of is elided, as in
    cup of coffee
being pronounced
    cuppa coffee (/kʌpə ˈkɒ.fi/)
In many cases the word and is reduced to 'n' as in tea 'n' cakes (/tiː n̩ keɪks/).
Polysyllabic word reduction occurs in our example of library as /ˈlaɪ.bri/ and also in many other longer words such as probably (/ˈprɒbli/), comfortable (/ˈkʌmf.təb.l̩/) etc.
Cluster reduction occurs when a consonant cluster, such as the one at the end of sixths, is simply difficult to pronounce.  The result is usually something like /sɪkθs/ or even /sɪkfs/.  Learners whose languages do not allow the same clusters as English are often tempted to use cluster reduction inappropriately, for example, pronouncing crisps as /krɪps/ rather than /krɪsps/.  For more see the guides to syllables and phonotactics and the guide to teaching troublesome sounds.
The sounds most often elided in this way are /t/, /d/, /p/ and /k/, so, for example:
    text message becomes /teks.ˈme.sɪdʒ/
    midst becomes /mɪst/
    glimpse becomes /ɡlɪms/
    asked can be pronounced /ˈɑːst/
A word that causes persistent problems is clothes because learners feel they should have a go at the consonant cluster at the end /kləʊðz/.  In rapid speech, however, the word is often pronounced /kləʊz/ with the elision of the /ð/.  If learners always say it that way, they will never be misunderstood and it's a good deal easier for them.
The same phenomenon is observable with the unvoiced /θ/ sound so asthma is pronounced as /ˈæ.smə/.
Occasionally, elision can become fixed in the language so, for example, the confection now known as ice cream was originally iced cream but the /t/ sound of the letter 'd' was routinely elided and the phrase took on its current spelling.

There is some overlap and some debate about whether certain phenomena are examples of assimilation or simple elision.
For example, in the table above, we have classified the dropping of the /s/ sound when it precedes /ʃ/ as a case of assimilation.  So we get, e.g.:
    face shape
pronounced as
    /feɪ.ʃeɪp/
rather than
    /feɪs.ʃeɪp/
At first sight this appears to be a case of elision because the /s/ is not changed but omitted entirely.  However, there is some evidence that the /ʃ/ sound is lengthened in connected speech, so the correct transcription might properly be
    /feɪʃ.ʃeɪp/
retaining both instances of the phoneme and clearly constituting a change rather than an omission.
We can avoid the debate altogether and simply refer to both phenomena as simplifications, of course.
For teaching purposes, a technicality like this is not something on which to dwell.

Catenation
Catenation (also called linking) usually occurs when the consonant sound at the end of one word joins the vowel at the beginning of the next, so we get, for example:
    an orange
pronounced as
    a norange
(/ə.ˈnɒ.rɪndʒ/)
and
    right arm
becomes something like
    rye tarm
(/raɪ tɑːm/).
Note, too, the way the pronunciation of
    the boys of Eton
differs from
    the boys have eaten
in rapid speech.
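The re-attachment of a final consonant to a following vowel, as in a norange and rye tarm, can be illustrated with a short Python sketch.  It assumes words are given as lists of phoneme symbols and uses a deliberately crude, hypothetical vowel test (the first character of each symbol); it shows only where the consonant ends up and says nothing about stress or the linking /r/.

```python
# A sketch of catenation: a word-final consonant moves to a following vowel-initial word.
# Simplified vowel test: the first character of each vowel symbol.
VOWEL_LETTERS = set("iɪeæɑɒɔʊuʌɜəa")


def is_vowel(phoneme):
    return phoneme[0] in VOWEL_LETTERS


def catenate(words):
    """Return the words joined with spaces after re-attaching final consonants."""
    words = [list(word) for word in words]  # copy so the caller's lists are untouched
    for current, following in zip(words, words[1:]):
        # Keep at least one sound in each word; move a final consonant before a vowel
        if len(current) > 1 and following and not is_vowel(current[-1]) and is_vowel(following[0]):
            following.insert(0, current.pop())
    return " ".join("".join(word) for word in words)


print(catenate([["ə", "n"], ["ɒ", "r", "ɪ", "n", "dʒ"]]))
print(catenate([["r", "aɪ", "t"], ["ɑː", "m"]]))
```

The two lines print ə nɒrɪndʒ and raɪ tɑːm, the a norange and rye tarm pronunciations described above.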
A by-product of catenation, incidentally, is the phenomenon variously known as false splitting, misdivision, false separation or coalescence, in which the boundary between words is redrawn: a napron (from the Old French naperon) was heard as an apron, which gives us the Modern English form.  There are other examples in the guide to word formation.
In British English, the final 'r' on many words is unsounded so, for example, harbour is pronounced as /ˈhɑː.bə/, whereas in AmE, the standard pronunciation includes the /r/ sound and the pronunciation is /ˈhɑːr.bər/.
However, when a word ending in 'r' immediately precedes a word with an initial vowel, we get a phenomenon known as the linking /r/ and the sound is produced so, for example:
    My father asked
will be pronounced as
/maɪ.ˈfɑːð.ər.ˈɑːskt/ in BrE and as
/maɪ.ˈfɑːð.r̩.ˈæskt/ in AmE.
Juncture
Juncture refers to the boundaries between words, and awareness of it allows us to distinguish between, for example:
    I scream
and
    ice cream
or
    my turn
and
    might earn
Usually, the distinction between these pairs is recognisable by either stress:
    /ˈaɪ.skriːm/ vs. /ˈaɪ.ˌskriːm/
or whether a consonant is aspirated:
    /maɪtʰɜːn/ vs. /maɪtɜːn/
or by noticing the syllabic structure:
    /maɪ.tɜːn/ vs. /maɪt.ɜːn/
In practice, the detail of how we identify the juncture between words is usually redundant because the context almost always makes clear what is meant.
Other examples of juncture provided by Roach (2009: 116) include:
might rain vs. my train
(in the first, the /r/ is voiced and in the second it is voiceless)
all that I'm after today vs. all the time after today
(in the first, the final /t/ on that is unaspirated and in the second the initial /t/ on time is aspirated)
Intrusion
Intrusion is, in contrast, the addition of sounds in connected speech.  The three sounds usually intruded are the approximants (semi-vowels) /w/, /j/ and /r/.  Consider the pronunciation of these phrases and note the intrusive sound in each transcription.
an intrusive /w/:
    go on (/ɡəʊw ɒn/)
    hoe in (/həʊw.ɪn/)
an intrusive /j/:
    I ate it (/aɪj et ɪt/)
    fly it (/flaɪj.ɪt/)
an intrusive /r/:
    law and order (/ˈlɔːr ənd ˈɔː.də/)
    Victoria and Albert Museum (/vɪk.ˈtɔː.rɪər.ənd.ˈæl.bət.mjuː.ˈzɪəm/)
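In the examples above, the intruded sound appears to depend on the vowel that ends the first word: /w/ after vowels ending in /uː/ or /ʊ/ (go on, hoe in), /j/ after vowels ending in /iː/ or /ɪ/ (I ate it, fly it), and /r/ after /ə/, /ɑː/, /ɔː/ or /ɜː/ (law and order).  The Python sketch below encodes that generalisation, which is an assumption drawn from the examples rather than a rule stated here.  It treats words as lists of phoneme symbols, does not consult spelling (so it cannot tell a linking /r/ from an intrusive one), and ignores the glottal-stop alternative; the function names are hypothetical.

```python
# A sketch of intrusion: choose a glide from the final vowel of the first word.
VOWEL_LETTERS = set("iɪeæɑɒɔʊuʌɜəa")  # simplified: first character of each vowel symbol


def starts_with_vowel(word):
    return bool(word) and word[0][0] in VOWEL_LETTERS


def glide_for(final_sound):
    """Return the intrusive glide after a final vowel, or None (e.g. after a consonant)."""
    if final_sound.endswith(("uː", "ʊ")):
        return "w"  # e.g. /əʊ/, /aʊ/, /uː/
    if final_sound.endswith(("iː", "ɪ")):
        return "j"  # e.g. /aɪ/, /eɪ/, /iː/
    if final_sound.endswith(("ə", "ɑː", "ɔː", "ɜː")):
        return "r"  # e.g. /ɔː/, /ɪə/, /ə/
    return None


def intrude(first, second):
    """Join two words, inserting a glide where a vowel meets a vowel."""
    if first and starts_with_vowel(second):
        glide = glide_for(first[-1])
        if glide:
            return first + [glide] + second
    return first + second


print("".join(intrude(["l", "ɔː"], ["ə", "n", "d"])))
```

The last line prints lɔːrənd, the law and with an intrusive /r/.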
An intrusive /j/ sound may occur in individual words so, for example, British English speakers may insert /j/ in words such as tune, fortune, produce, century, nature, mixture, picture, creature, opportunity, situation, actually in which the /t/ or /d/ sound is followed by a /j/ not shown in the spelling.
The transcription may therefore be:
    tune /tjuːn/
    actually /ˈæk.tjuə.li/
    situation /ˌsɪ.tjʊ.ˈeɪʃ.n̩/
etc., although /ˈæk.tʃuə.li/ and /ˌsɪ.tʃʊ.ˈeɪʃ.n̩/ are also heard.  Not all speakers do this.
In some speakers' production, the intrusive sound is avoided and replaced by a glottal stop so, for example, we may find
    Go out
produced as /ɡəʊʔaʊt/ rather than /ɡəʊ.ˈwaʊt/,
    The gorilla and me
produced as /ðə.ɡə.ˈrɪ.ləʔənd.miː/ rather than /ðə.ɡə.ˈrɪ.lə.rənd.miː/
and
    I am here
produced as /aɪʔæm.hɪə/ rather than /ˈaɪ.jæm.hɪə/.

Intrusion, too, has comedic possibilities.
In the 19th century it was common for ship's biscuits to be attacked by small insects called weevils.  In a famous scene from Patrick O'Brian's book concerning the era, we find the following exchange:
    You see those weevils, Stephen? said Jack solemnly.
    I do.
    Which would you choose?
    There is not a scrap of difference ... there is nothing to choose between them.
    But suppose you had to choose?
    Then I should choose the right-hand weevil; it has a perceptible advantage in both length and breadth.
    There I have you, cried Jack. Don't you know that in the Navy you must always choose the lesser of two weevils?

(O'Brian 1970)
If you have got the joke, such as it is, you have understood the nature of the intrusive /w/ in the lesser of two evils.

Erroneous intrusion:
Learners whose languages do not have many (or any) consonant clusters are often tempted to intrude a vowel, often a /ə/, /ɪ/ or a /e/, between elements of a difficult cluster.
Many Arabic speakers, for example, may pronounce screwdriver as /ˈsekəruː.dəraɪ.vər/ rather than /ˈskruː.draɪ.və/, i.e. 6 instead of 3 syllables.  Japanese speakers may do likewise.
Speakers of many other languages will produce crisps as /krɪspəs/ or /krɪspes/ instead of /krɪsps/, and clothes, which we saw above causes persistent problems, is often produced as /kləʊðez/ or /kləʊðɪz/ instead of /kləʊðz/.
Speakers of other languages, notably French and Italian, are also tempted to intrude a redundant /h/ sound and pronounce, e.g.:
    He is my ally
as /hi.hɪz.maɪ.ˈhæl.aɪ/ when it should be /hiːz.maɪ.ˈæl.aɪ/.






References:
Field, J, 2008, Listening in the Language Classroom, Cambridge: Cambridge University Press
O'Brian, P, 1970, Master and Commander, London: Collins
Roach, P, 2009, English Phonetics and Phonology: A practical course (4th edition), Cambridge: Cambridge University Press