Assessing Speaking

Prepared by Elena Onoprienko, Yulia Polshina, Tatiana
Based on material by Fumiyo Nakatsuhara
Key questions
Nature of speaking
Speaking as a skill
Test purposes and types of test
Speaking test tasks
Washback effect
Key questions
What is speaking?
Why assess speaking?
Task types
Scoring criteria
Nature of speaking:
spoken language;
speaking as interaction;
speaking as a social activity;
speaking as a situation-based activity.
What is speaking?
A part of the shared social activity of talking
(Luoma, 2004: 29).
In comparison with writing, speaking is typically:
• transient
• dynamic
• interpersonal
• context dependent.
Writing, by contrast, tends to be:
• planned
• complex
• formal
• lexically dense.
Speaking vs Writing
The main differences lie in two sets of
conditions, processing and reciprocity:
• Processing conditions are connected with time: speaking
takes place under greater time pressure.
• Reciprocity is how spoken language solves this problem:
speakers take turns and create a text together.
Spoken language
• Pronunciation
• Spoken grammar
• Lexis
Pronunciation:
• Listeners often judge speech on the basis of pronunciation.
• What is the standard? Native-speaker vs non-native-speaker norms.
• Communicative effectiveness, which is based on
comprehensibility and probably guided by native speaker
standards but defined in terms of realistic learner
achievement, is a better standard for learner pronunciation.
(Luoma, 2004).
• What should an assessment of pronunciation include?
• Pronunciation: individual sounds, pitch, volume, speed,
pausing, stress and intonation.
Spoken grammar:
• grammar is easy to judge because it is easy to
detect in speech and writing;
• speakers do not usually speak in sentences;
• speech consists of idea units, often linked by ‘and’,
‘or’, ‘but’ or ‘that’;
• planned speech tends to contain complex structures,
while unplanned speech consists of short idea units;
• within idea units, topicalisation (e.g. ‘That film, I really
liked it’) and tails (e.g. ‘It was great, that film’) create an
impression of naturalness.
Features of spoken lexis:
• ‘simple’ and ‘ordinary’ words are common in
normal spoken discourse, and their natural use marks a
highly advanced level of speaking skill (Luoma, 2004);
• generic words (important for the naturalness of speech);
• vague words;
• fixed conventional phrases;
• ‘small words’ (the more of them a speaker uses, the
better the speaker tends to be perceived).
Slips and errors
Normal speech contains a fair number of
slips and errors such as mispronounced
words, mixed sounds, and wrong words
due to inattention (Luoma, 2004).
Speaking as a skill
What is skilful speech?
• task fulfilment/content;
• vocabulary and grammar range;
Speaking as meaningful interaction
• Speaking is both personal and a part of the
shared social activity of talking.
• The openness of meanings is not only a
convenience in speech; it is also an effective
strategy for speakers. (Luoma, 2004)
• Chatting vs information-related talk.
• The role of speaking situations.
• Roles, role relationships and politeness.
Why assess speaking?
No single answer:
• different groups of language learners have different needs, such as:
– international travellers: language for travel, leisure;
– migrants: survival skills, access to employment;
– students: exams, academic communication, social
– professionals: workplace communication, presentations.
• different users have different purposes when they seek
information from tests;
• but most users of language do need to speak.
Test purposes and types of test
Test purposes:
• proficiency tests
• achievement tests
• placement tests
• diagnostic tests.
What do we need to decide before
giving a speaking test?
• what aspects of language we want to assess;
• how to elicit ratable samples of language from test-takers that suit those aspects of language.
We need to decide:
• rating criteria (marking categories, levels and
descriptors; holistic vs analytic scales);
• elicitation techniques / test format (types of
questions, task types).
Performance testing
Performance testing in second language
proficiency assessment is traditionally used
to describe the approach in which a
candidate produces a sample of spoken or
written language that is observed and
evaluated by an agreed judging process.
(McNamara, 1996)
What is performance testing?
• a sample of written or spoken language is produced;
• the task simulates behaviour in the real world, unlike
paper-and-pencil ‘objective’ tests;
• the performance is observed and evaluated by an agreed
judging process.
Speaking tasks
• A communicative task is a piece of classroom work
which involves learners in comprehending,
manipulating, producing or interacting in the target
language while their attention is principally focused
on meaning rather than form… (Nunan 1993:59).
• Speaking tasks can be seen as activities that involve
speakers in using language for the purpose of
achieving a particular goal or objective in a particular
speaking situation (Bachman and Palmer 2010).
Types of information-related talk
Factually-oriented talk:
• description
• narration
• instruction
• comparison.
Evaluative talk:
• explanation
• justification
• prediction
• decision.
Communicative functions
‘Microfunctions’ according to CEFR:
• giving and asking for factual information (describing, reporting);
• expressing and asking about attitudes (agreement/disagreement);
• suasion (suggesting, requesting, warning);
• socialising (attracting attention, addressing, greeting, introducing);
• structuring discourse (opening, summarising, changing the topic);
• communication repair (signalling non-understanding, appealing
for assistance, paraphrasing).
(Council of Europe, 2001:123, Luoma, 2004:33)
Features of a speaking task:
• input, or material used in the task;
• roles of the participants;
• settings, or classroom arrangements for pair or group work;
• actions, or what is to happen in the task;
• monitoring, or who is to select input, choose role or setting,
alter actions;
• outcomes, or the goals of the task;
• feedback, or the evaluation given to participants.
(Candlin, 1987, cited in Fulcher, 2003)
Speaking test task formats
By number of test-takers:
• Individual
• Paired
• Group
By type of task:
• Open-ended tasks
• Structured tasks
Advantages and disadvantages of an interview
+ tester’s control over the interaction
+ opportunity for the examinee to show the
range of their speaking skills
- costly in terms of the tester’s time
- the interviewer’s power over the examinee
Advantages and disadvantages of paired formats
+ Capable of eliciting more symmetrical contributions to the
interaction from test-takers
+ Capable of eliciting much richer and more varied language
+ Positive reactions from test-takers (who feel less anxious), a
possible sign of positive washback
+ Practical: time-efficient and cost-effective, placing less burden
on examiners and requiring less examiner training
- Places considerable responsibility on examinees, who are not
trained in interview techniques
Advantages and disadvantages of group formats
+ Well-received by learners
+ Support learning
- Difficult to administer and manage (size
of the groups and mixture of learners’
- Difficult to monitor the progress of the
Speaking test tasks:
• oral presentation (verbal essay, prepared monologue);
• information transfer (description of a picture sequence,
questions on a single picture, alternative visual stimuli);
• interaction tasks (information gap: student–student,
student–examiner; open role play; guided role play);
• interview (free, structured);
• discussion (student–student, student–examiner).
(O’Sullivan, 2008: 10-11)
Framework for designing test tasks
• Operations (activities/skills) - informational routines (e.g.
telling a story) and improvisational skills (negotiation of
meaning and management of interaction)
• Conditions under which the tasks are performed (e.g. time
constraints, the number of people involved and familiarity
with each other)
• Quality of output, the expected level of performance in
terms of various relevant criteria, e.g. accuracy and fluency
(Weir, 1993: 30)
Developing criteria for assessing speaking
• Double marking is clearly important for reducing
unreliability.
• The criteria need to reflect the features of spoken
interaction that the test task is designed to generate.
• The criteria used will depend on the nature of the skills
being tested and the level of detail the end users need.
The crucial question is what the tester wants to find
out about a student’s performance on appropriate spoken
interaction tasks.
(Weir, 1993: 30)
Rating criteria
Phonological control; Grammatical accuracy; Vocabulary range; Fluency
(Council of Europe, 2001)
Test format: an interview with the following structure:
1. Openings (1 minute).
2. Conversation on familiar topics (3 minutes): the interviewer asks the
candidate to talk about him/herself.
3. Picture description (2 minutes): the interviewer asks the candidate to
describe a photo.
4. Conversation on topics from the picture (5 minutes): the interviewer
asks the candidate questions linked to the picture (from general to extended questions).
5. Closings (1 minute).
(Nakatsuhara, 2012)
Holistic scale
e.g. Trinity College
Bands A, B, C, D
Analytic scale
e.g. IELTS
Fluency and coherence
Lexical resource
Grammatical range and accuracy
Pronunciation
Holistic rating scales
• Positive features:
– practicality: fast;
– rating holistically may be more naturalistic.
• Disadvantages:
– no useful diagnostic information: single score;
– not always easy to interpret: raters are not required
to use the same criteria to arrive at a score.
Analytic rating scales
• Positive features:
– can provide diagnostic information if scores are reported separately;
– potentially clear, explicit and detailed;
– usually more reliable (multiple scores);
– useful in training raters to focus on the construct being assessed;
– potentially useful in guiding learners.
• Disadvantages:
– time-consuming;
– may overburden raters.
(Green, 2012)
The role of an interviewer
Interrater / intrarater reliability
The solution – training raters:
• understanding criteria for assessment;
• agreement with other raters;
• consistency of performance.
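To make ‘agreement with other raters’ concrete, here is a minimal Python sketch that compares the band scores two raters award to the same set of recorded performances. It reports exact agreement (identical bands) and adjacent agreement (bands no more than one apart). The function name `agreement_rates`, the 1-9 band range and all the scores are hypothetical, chosen only for illustration; they are not taken from any particular test.

```python
# Hypothetical illustration: comparing two raters' band scores for the same performances.

def agreement_rates(scores_a, scores_b, tolerance=1):
    """Return (exact, adjacent) agreement between two raters as proportions.

    exact:    share of performances given the identical band by both raters.
    adjacent: share of performances whose bands differ by at most `tolerance`.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, adjacent


if __name__ == "__main__":
    # Invented bands (assumed 1-9 scale) for ten recorded performances.
    rater_1 = [5, 6, 7, 4, 6, 5, 8, 6, 7, 5]
    rater_2 = [5, 7, 7, 4, 5, 5, 6, 6, 7, 6]
    exact, adjacent = agreement_rates(rater_1, rater_2)
    # prints "exact agreement: 60%, within one band: 90%"
    print(f"exact agreement: {exact:.0%}, within one band: {adjacent:.0%}")
```

With these invented scores the raters agree exactly on 6 of 10 performances and within one band on 9 of 10. The same function could compare one rater’s first and second marking of the same recordings as a rough check on intrarater consistency. Simple percentage agreement does not correct for agreement that occurs by chance; statistics such as Cohen’s kappa do.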
Washback Effect:
The effect of testing on teaching and learning
Positive / negative washback:
• positive – test stimulates classroom
teaching of important skills;
• negative – narrow focus on teaching just for
the test.
