
1、Write out the full form of the following acronyms. (1x10=10)
SD = standard deviation (标准差)
CV = coefficient of variability (差异系数)
FV = facility value (易度值)
MC format = multiple-choice format (多项选择题)
TOEFL = Test of English as a Foreign Language (托福考试)
IRT = item-response theory (项目反应理论)
NR test = norm-referenced test (常模参照测验)
CR test = criterion-referenced test (标准参照测验)
CLA = communicative language ability (语言交际能力)
CTS = classical true score theory (经典真分数理论)
G-study = generalizability study (概化研究)
D-study = decision study (决策研究)
SEM = standard error of measurement (测量标准误)
ICC = item characteristic curve (项目特征曲线)
ACTFL = American Council on the Teaching of Foreign Languages (全美外语教学学会)
TIF = test information function (测验信息函数)
ANOVA = analysis of variance (方差分析)
IELTS = International English Language Testing System (雅思考试)
MTMM design = multitrait-multimethod design
RL approach = real life approach
IA approach = interactional/ability approach
2、Give the Chinese or English version of the following terms. (1x10=10)
结构主义/心理测量法 the structuralist-psychometric approach
定量评估方式 quantitative modes of assessment
考试后效作用 washback effect
语言磨蚀 language attrition
进行性评估 formative evaluation
终结性评估 summative evaluation
分离式考试 discrete point test
考试信度 test reliability
平行卷测试法 parallel-form method
共时效度 concurrent validity
构念效度 construct validity
因子分析 factor analysis
考试规范 test specification
考试命题细目表 test development chart
学期档案袋式评估 portfolio
整体评分法 holistic scoring
信息沟 information gap
转换分数 weighted score
标准分数 standard score
频数分布 frequency distribution
正态分布 normal distribution
易度指数或值 item facility index or value
题目区分度 item discrimination
区分度指数 discrimination index
干扰项/选择项分析 distractor analysis
3、Define the following terms. (4x5=20)
(1)G-theory (概化理论)
Generalizability theory (G-theory) provides a conceptual framework and a set of procedures for examining several different sources of measurement error simultaneously. Using G-theory, test developers can determine the relative effects, for example, of using different test forms, of giving a test more than once, or of using different scoring procedures, and can thus estimate the reliability, or generalizability, of tests more accurately. ‘G-theory’ has recently been used to analyze different sources of measurement error in subjective ratings of oral interviews and writing samples.
(2) Item response theory (项目反应理论)
Item response theory (IRT) is a powerful measurement theory that provides a superior means for estimating both the ability levels of test takers and the characteristics of test items (difficulty, discrimination). If certain specific conditions are satisfied, IRT estimates are not dependent upon specific samples, and are thus stable across different groups of individuals and across different test administrations. This makes it possible to tailor tests to individual test-takers’ levels of ability, and thus to design tests that are very efficient in the way they measure these abilities. These characteristics are particularly useful for developing computer-adaptive tests, and item response theory is being used increasingly in the development and analysis of language tests.
(3)Pragmatic competence (语用能力)
According to Van Dijk, pragmatics is concerned with the relationships between utterances and the acts or functions that speakers (or writers) intend to perform through these utterances, which can be called the illocutionary force of utterances, and the characteristics of the context of language use that determine the appropriateness of utterances. The notion of pragmatic competence presented here thus includes illocutionary competence, or the knowledge of pragmatic conventions for performing acceptable language functions, and sociolinguistic competence, or knowledge of the sociolinguistic conventions for performing language functions appropriately in a given context.
(4)Sociolinguistic competence (社会语言能力)
Sociolinguistic competence is the sensitivity to, or control of, the conventions of language use that are determined by the features of the specific language use context; it enables us to perform language functions in ways that are appropriate to that context. This includes sensitivity to differences in dialect or variety, to differences in register, and to naturalness, as well as the ability to interpret cultural references and figures of speech.
(5)Spearman-Brown prophecy formula
The Spearman-Brown prophecy formula yields a split-half reliability coefficient:
\[ r_{xx'} = \frac{2 r_{hh'}}{1 + r_{hh'}} \]
where $r_{hh'}$ is the obtained correlation between the two halves of the test. Two assumptions must be met in order to use this method. First, the two halves must have equal means and variances. Second, the two halves must be experimentally independent of each other.
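As a rough illustration of the formula, the Python sketch below steps a half-test correlation up to a full-test estimate. The odd-even split, the helper names (`pearson`, `spearman_brown`, `split_half_reliability`), and the data layout (one row of item scores per test taker) are assumptions made for this example only; they are not taken from the original notes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def spearman_brown(r_hh):
    """Apply r_xx' = 2 * r_hh' / (1 + r_hh') to a half-test correlation."""
    return 2 * r_hh / (1 + r_hh)


def split_half_reliability(item_scores):
    """Estimate reliability from an odd-even split (an assumed way of halving).

    item_scores: list of rows, one row of item scores per test taker.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_half, even_half))
```

For instance, a correlation of 0.60 between the two halves gives 2(0.60) / (1 + 0.60) = 0.75 as the estimate for the full-length test.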
(6)Coefficient alpha
Cronbach (1951) developed a general formula for estimating internal consistency which he called ‘coefficient alpha’, and which is often referred to as ‘Cronbach’s alpha’:
\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum s_i^2}{s_x^2}\right) \]
where $k$ is the number of items on the test, $\sum s_i^2$ is the sum of the variances of the different parts of the test, and $s_x^2$ is the variance of the test scores.
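Using the quantities just defined, coefficient alpha in its standard form, k/(k - 1) multiplied by (1 minus the sum of the part variances over the total-score variance), can be sketched in Python as follows. The function names and the data layout (one row of item scores per test taker) are assumptions for illustration. The sketch uses the sample variance (n - 1 denominator) throughout; because alpha depends only on a ratio of variances, the population variance would give the same result as long as it is used consistently.

```python
def variance(values):
    """Sample variance (n - 1 denominator) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(item_scores):
    """Coefficient alpha for a score matrix.

    item_scores: list of rows, one row of item scores per test taker.
    """
    k = len(item_scores[0])  # number of items (parts) on the test
    # Variance of each item across test takers, then summed.
    sum_item_vars = sum(
        variance([row[i] for row in item_scores]) for i in range(k)
    )
    # Variance of the total test scores.
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)
```

When every item ranks the test takers identically, the item variances account for only 1/k of the total variance and alpha reaches 1; when the items are unrelated, the ratio approaches 1 and alpha approaches 0.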
(7)Construct validation
Construct validity concerns the extent to which performance on tests is consistent with predictions that we make on the basis of a theory of abilities, or constructs. A construct is defined as ‘a postulated attribute of people, assumed to be reflected in test performance’. Thus, constructs can be viewed as definitions of abilities that permit us to state specific hypotheses about how these abilities are or are not related to other abilities, and about the relationship between these abilities and observed behavior.
(8)Plato’s problem
Plato’s problem, also called the logical problem of language acquisition, refers to the fact that children come to know more about the structure of their language than they could reasonably be expected to learn from the language samples available to them. Thus, even when the input is confusing, or when guidance or correction is not available, children, who are born with Universal Grammar (UG), can discover for themselves the underlying rules of the language system.
4、Answer the following questions. (4x10=40)
(1)What is strategic competence? Try to exemplify the influence of strategic competence on language test performance.
Answer: One characteristic of recent frameworks of communicative competence is the recognition of language use as a dynamic process, involving the assessment of relevant information in the context, and a negotiation of meaning on the part of the language user. There have been two approaches to defining communication strategies: the ‘interactional’ definition and the ‘psycholinguistic’ definition. According to Canale, strategic competence refers to mastery of verbal and nonverbal strategies both (a) to compensate for breakdowns in communication due to insufficient competence or to performance limitations and (b) to enhance the rhetorical effect of utterances. Strategic competence is thus seen as the capacity that relates language competence, or knowledge of language, to the language user’s knowledge structures and the features of the context in which communication takes place. Strategic competence performs assessment, planning, and execution functions in determining the most effective means of achieving a communicative goal.
At this point we may well wonder about the extent to which strategic competence affects scores on language tests. Suppose that two nonnative speakers of a language were to take three tests: a test of usage, a test of contextualized receptive performance in which the scores are influenced in part by practical outcomes, and a test of productive oral performance. Suppose we find that the two subjects’ scores are the same on the first two tests but different on the third. When we analyze tapes of the third test, we find that the more effective test taker made use of more of the various different ways of performing illocutionary acts than the less effective one did, and that her propositions made more references to relevant objects in the environment. When asked about this, the less effective test taker replied that she just didn’t think of them at the time, that she didn’t notice the objects in the environment, or that it didn’t seem worth the effort. The more effective language user was more willing to make use of what she knew was available, and more adept at doing so, in order to perform a function using language, but we would be reluctant to say that the two speakers’ language competence differed.
(2) How do tests differ from evaluations and measurements? Give some specific examples of the following:
a. measures that are not tests
b. measures that are not evaluative
c. evaluation that does not involve measurement.
Answer: The terms ‘measurement’, ‘test’, and ‘evaluation’ are often used synonymously.
