Linguistics and Other Fields

Crystal D. Linguistics. Second ed.
Penguin Books, 1990. – pp. 256-267.
The main merit of research over the past few years is that people now have a much clearer idea as to what the important questions of linguistic theory are: over the next few years, we may go some way towards solving some of them. It should be clear from this attitude, then, that those who clamour for applications of linguistics - myself included - are not likely to be satisfied for a while. Too much of the subject is in an unformulated state to be able to be applied in any useful way to the study of some other field - though, as we shall see, some restricted areas have come to be fairly well investigated and introduced. The absence of any complete grammar of English (which has been the most analysed of all languages) is one of the most obvious limitations of the applicability of linguistics at the present time. The presence of so much fundamental theoretical disagreement, which has to be gone into before one can adopt a particular 'applied' line, is another. However, it would be wrong to criticize linguistics for failing to come up to expectations, or for being too negative (in its criticisms of earlier work), or for being too complicated and abstract - such criticisms are not uncommon. The negative flavour of early linguistics was, as we have seen, an essential preliminary to the development of a more constructive and open-minded state of mind on the part of language scholars. Understanding the weaknesses of early accounts of language helped them to reach an understanding of the fact that it was complex, and to appreciate the nature and extent of its complexity. It was this awareness which promoted the careful analysis of data and the development of the necessary (albeit abstract) distinctions of phonetics, morphology, and the other levels. It is in fact this very complexity which is the reason why linguistics has not developed further than it has. 
It would be perfectly possible for any competent linguist to sit down and write a linguistic grammar of English, in the light of available knowledge, for the purpose of language teaching; but it is unlikely that it would be a wholly satisfying job. There is still too much dispute about the theoretical principles on which such a grammar should be based, too much dispute over terminology, and too much uncertainty over the facts of the language, to produce a sound, comprehensible and comprehensive grammar. And bearing in mind that linguistics has been with us such a short time, this inadequacy is perhaps not surprising. A great deal has nonetheless been achieved.
Awareness of this inadequacy has not of course stopped people from trying to write such grammars; nor should it. The more attempts there are to formulate adequate grammars for particular applications in teaching and elsewhere, the more quickly the difficulties will be appreciated, and the sooner they will be overcome. What is important is that the potential users of these books should not make premature demands for their production (rushed research is regretted research), and that the authors of these books - or their publishers - should not make premature claims for their product. This prematurity can be possible in two ways. First, a linguistic introduction to the structure of English, let us say, can be premature in the sense that the kind of model in which it presents its rules and facts has been outdated by new ideas about the nature of the model, or about the formalization of the rules, or even about the nature of the facts (e.g. new statistical information about usage having become available). This has often happened, particularly in generative grammar, where the development of ideas has been so rapid that a grammar book is liable to find itself dismissed as old-hat by linguists, even when it is hot off the press. Naturally, teachers who are trying to get to grips with generative grammar are disturbed by this reaction; but they should not be, if they appreciate the inevitable movement in the progression of scientific theory. They should use a grammar book, for the time being, not as an authoritative account of linguistic structure, that has to be taught to the letter; but as a set of suggestions about ways of looking at language which they are likely to find illuminating and applicable to specific problems. This can be done even though there is a likelihood of further developments in the subject which will make some of the specific features of the approach redundant. 
This critical attitude is also helpful, I believe, in that it helps to reduce the difficulties inherent in the second cause of prematurity mentioned above, namely, that not enough is known about the psychological and other demands linguistics makes upon the student, or about the methodological difficulties involved in grading linguistic material for presentation pedagogically. One book may be suitable for pedagogical context A (e.g. language teaching to immigrants from the West Indies), but not for context B (e.g. language teaching to immigrants from India and Pakistan). Teachers, however, who are eclectic in their use of linguistic material, who build up, in a personal but informed way, their own 'theory' of language and their own description of English, bearing in mind the specific needs of the situation in which they are working, are likely to avoid the more serious of these pragmatic difficulties. This of course is what many teachers already try to do, if they are in the unfortunate position of not having an applied linguistics research project trying to do the job for them (and there are more and more such projects producing materials in a variety of fields these days). It is good to see an increasing number of centres in Britain and the United States organizing courses, conferences, in-service training, and the like, in order to try to bridge the gap between theory/research and pedagogy, and to develop a positive and selective state of mind of this kind.
For such a gap does exist, and there is no point in trying to deny it. There is a considerable gap in this book, for instance, between the practical claims and suggestions which show the potential applicability of the subject. There might almost seem to be two subjects involved, the study of language, on the one hand, and the study of linguistics, on the other - and there are those who make this distinction in their work. But ultimately there is and can be no such distinction: whether or not we commit ourselves to the detail of a specific linguistic approach, when we commence the study of language, on no matter how small a scale, we are necessarily committed to the demands for clarity, consistency and accuracy, which it is the ultimate purpose of linguistic study to fulfil. As soon as we ask ourselves how we are using terms, as soon as we impose a certain grading or selection on material, we are committing ourselves to a particular linguistic view of the world. Whether we realize it is another matter. Naturally, one hopes that intelligent people will take pains to realize what they are doing - linguists included. But developing this awareness of principles of analysis is at once to do linguistics. There is no natural gap between theory and practice in language study; but there is a very real psychological and practical gap, due to the apparent complexity of many linguistic ideas, and the lack of time and material for people outside the subject to get into it. Indeed, the bridging of this gap is the whole purpose of the present book.
But there is another way in which this gap can be bridged, through the development of the relationship between linguistics and other fields of study. A cardinal principle underlying the whole linguistic approach is that language is not an isolated phenomenon; it is a part of society, and a part of ourselves. It is a distinctive feature of human nature (some, who talk of 'homo loquens', say it is the distinctive feature); and it is a prerequisite - or so it would appear - for the development of any society or social group. […] it enters into a very large number of specialized fields. Consequently, it is not possible to study language, using the methods of linguistics or any other, without to some extent studying - or at least presupposing the study of - other aspects of society, behaviour, and experience. The way in which linguistics overlaps in its subject-matter with other academic studies has become well appreciated over the last few years, and in the past decade we have seen the development of quite distinct interdisciplinary subjects, such as sociolinguistics, psycholinguistics, philosophical linguistics, biological linguistics, and mathematical linguistics. These, as their titles suggest, refer to aspects of language which are relevant and susceptible to study from two points of view (sociology and linguistics, psychology and linguistics and so on), and which thus require awareness and development of concepts and techniques derived from both. And as many of the points of contact refer to issues which are obviously of everyday concern, these marginal branches of the subject stand a much better chance of avoiding the charges of irrelevance levelled at its 'purer' aspects. This can be seen by looking briefly at the kind of topic covered by the two most important branches to have developed so far, sociolinguistics and psycholinguistics.
Sociolinguistics studies the ways in which language interacts with society. It is the study of the way in which language's structure changes in response to its different social functions, and the definition of what these functions are. 'Society' here is used in its broadest sense, to cover a spectrum of phenomena to do with race, nationality, more restricted regional, social and political groups, and the interactions of individuals within groups. Different labels have sometimes been applied to various parts of this spectrum. 'Ethno-linguistics' is sometimes distinguished from the rest, referring to the linguistic correlates and problems of ethnic groups - illustrated at a practical level by the linguistic consequences of immigration; there is a language side to race relations, as anyone working in this field is all too readily aware. The term 'anthropological linguistics' is sometimes distinguished from 'sociological linguistics', depending on one's particular views as to the validity or otherwise of a distinction between anthropology and sociology in the first place (e.g. the former studying primitive cultures, the latter studying more 'advanced' political units). Usage of British and American scholars differs considerably in this respect. 'Stylistics' is another label which is sometimes distinguished, referring to the study of the distinctive linguistic characteristics of smaller social groupings (such as those due to occupational or class differences). More usually, however, stylistics refers to the study of the literary expression of a community, using linguistic methods. None of these labels has any absolute basis: the subject-matter of ethnolinguistics gradually merges into that of anthropological linguistics, that into sociological linguistics, and that into stylistics, and the subject-matter of social psychology. 
The kinds of problem which turn up are many and various, and some have been illustrated in Chapter 1, which was very much concerned with the role of language in society. They include: the problems of communities which develop a standard language, and the reactions of minority groups to this (as in Belgium, India, or Wales); the problems of people who have to be educated to a linguistic level where they can cope with the demands of a variety of social situations; the problems of communication which exist between nations or groups using a different language, which affects their 'world-view'; the problems caused by linguistic change in response to social factors; the problems caused (and solved) by bilingualism or multilingualism; the problems caused by the need for individuals to interact with others in specific linguistic ways (language as an index of intimacy or distance, of solidarity, of prestige or power, of pathology, and so on). I am not arguing that sociolinguistics by itself can solve problems such as these; but it can identify precisely what the problems are (this is sometimes a major task in itself), and obtain information about the particular manifestation of a problem in a given area, so that possible solutions can thereby be hastened.
One thing is clear. There is little chance of solving any of these problems until certain basic principles about the relationship of language to society have been established, and accurate techniques of study developed. And so far, there are many basic issues about which there is much controversy - for example, the extent to which our social background determines our linguistic abilities, or the rationale on which multilingual individuals use their different languages for different social purposes. There are of course innumerable facts to be discovered, even about a language as well investigated as English, concerning the nature of the different kinds of English we use in different situations - when we are talking to equals, superiors or subordinates; when we are 'on the job'; when we are old or young, upper class or lower class, male or female; when we are trying to persuade, inform or bargain; and so on. An informal definition of sociolinguistics highlights this concern to get even the most elementary of descriptive information down on paper: 'Who can say what, how, using what means, to whom, when, and why?' If we knew all these factors, we would know a great deal about social problems. These days sociolinguistics has progressed far in accumulating its own data in order to answer these questions.
To analyse a problem sociolinguistically implies being able to analyse it linguistically. Sociolinguistics makes use of the findings of linguistic theory and description in its work; and in one sense its success is dependent on success in 'pure linguistics'. On the other hand, the nature of its subject-matter means that there will arise a great deal which will be both theoretically and methodologically novel - explanatory constructs of one kind or another which are not constructs of either linguistics or sociology, but a derivative of both. One example of this is the notion of 'interference', that is, linguistic disturbance which results from two languages (or dialects) coming into contact in a specific situation. The problem of interference is not something which linguistics, or any other subject, on its own, could handle. There has been some debate as to whether the existence of uniquely sociolinguistic problems of this kind requires the establishment of a quite independent discipline, with a theoretical identity and methodology of its own, or whether the dependence on linguistics in its general sense is so fundamental that such a prospect is impossible. This is an issue which will doubtless continue to be discussed for some time. Meanwhile, it is the case that for practical purposes (as in teaching linguistics) most courses would not make a clear-cut distinction, but would consider the study of sociolinguistics to be an essential part of the explanation of the subject as a whole.
An even stronger link is argued these days for my second example of interdisciplinary overlap, psycholinguistics. The relation of linguistics to psychology has been the source of some heated discussion of late, largely due to Chomsky's particular emphasis on this question. His view of linguistics, as outlined for instance in his book Language and Mind, is that the most important contribution linguistics can make is to the study of the human mind; and that linguistics is accordingly best seen as a branch of cognitive psychology. This is not an altogether surprising thing in view of the mentalistic claims of parts of his theory (cf. p. 103) and his particular views on the nature of language acquisition in children. But it is an extreme view, which most linguists at the present time do not share. On the other hand, no one would want to deny the existence of strong mutual bonds of interest operating between psychology and linguistics. The extent to which language mediates or structures thinking, the extent to which talk about language 'simplicity' or 'complexity' can be given any meaningful psychological basis, the extent to which language is influenced by and itself influences such things as memory, attention, recall and constraints on perception, and the extent to which language has a central role to play in the understanding of human development are broad illustrations of such bonds.
Psycholinguistics as a distinct area of interest developed in the early sixties, and in its early form covered the psychological implications of an extremely broad area, from acoustic phonetics to language pathology. Nowadays, certain areas of language and linguistic theory tend to be concentrated on by those who call themselves psycholinguists, and most of them have been influenced by the development of generative theory. The most important area is the investigation of the acquisition of language by children. Here, there have been many studies of both a theoretical and a descriptive kind. The descriptive need is prompted by the fact that until recently hardly anything was known about the actual facts of language acquisition in children, in particular about the order in which grammatical structures were acquired. Even elementary questions such as when and how children develop their ability to ask questions syntactically, or when they learn the inflectional systems of their language, went unanswered. And a great deal of work has gone on recently into the methodological and descriptive problems involved in obtaining and analysing information of this kind.
The theoretical questions have focused on the issue of how we can account for the phenomenon of language development in children at all. Normal children have mastered most of the structure of their language by the age of five. The generative approach argued against the earlier behaviourist assumptions that it was possible to explain language development largely in terms of imitation and selective reinforcement. It asserted that it was impossible to explain the rapidity or the complexity of language development solely in terms of children imitating the language used by the people around them. And as a result of the arguments supporting this assertion, it would now be generally agreed that imitation alone is not enough. Imitation is an important factor in the development of language (cf. p. 46), but it cannot be the major one, and thus the basis of any theory of language acquisition, because there is too much of central importance in language which is not amenable to direct observation, and thus not imitatable - the various meaning-relations between sentences or parts of sentences, for instance, or, more generally, the abstract knowledge of the grammatical rules of their language which adults have as part of their competence. All normal children come to develop this abstract knowledge for themselves; and the generative approach argues that such a process is only explicable if one postulates that certain features of this competence are present in the brains of children right from the beginning. In other words, what is being claimed is that children's brains contain certain innate characteristics which 'pre-structure' them in the direction of language learning. To enable these innate features to develop into adult competence, children must be exposed to human language, i.e. they must be stimulated in order to respond. But the basis on which they develop their linguistic abilities is not describable in behaviourist terms.
What we have here, then, is a hypothesis about the nature of language acquisition. So far, it has not been tested in any convincing way (and it may not be possible to test it, in the usual sense); but it has provoked a great deal of speculation. In particular, it raises the question of how far the innate features could be identified with the primitive meaning-relations of grammatical theory - that is, the linguistic universals talked about at the end of Chapter 4. Are all children born with an ability to discriminate 'subjects' from 'objects', let us say, in some sense? How many such basic relations might one plausibly ascribe to the child? And how specific is its innateness? Clearly, it is not possible to suggest that the child has any features of a particular language innate, for instance a particular feature of English syntax which does not occur in French or German. To suggest this would be tantamount to saying that children of any race would find it easier to learn English than to learn other languages (that is, their brains would predispose them towards English); and all available evidence points to the implausibility of this conclusion. A Zulu child learns Zulu just as rapidly as an English child learns English, it seems. No, the innate features must be sufficiently general, sufficiently 'deep', to be capable of equally readily underlying the structure of any language. And on this point, the identity of interests between linguistic and psycholinguistic theory (at least, in this field) should be clear. There have of course been a number of objections raised to the innateness hypothesis - for example, on the grounds that what is innate is not so much deep structural information, but rather learning principles of a more general kind. Some people would like to see what would happen if the hypothesis were formulated in terms other than those provided by Chomsky's later work. 
As someone put it once, 'Why should we see the child as if it were born with a copy of Aspects of the Theory of Syntax tucked inside its head!' Unkind, perhaps; for without Aspects, and the work which followed it, many interesting questions might never have been raised. The issue, however, is by no means determined.
In the 1980s, the interest in the innateness hypothesis has been largely replaced by a focus on the relationship between language development and a child's cognitive skills, following on the influential work of Jean Piaget and other psychologists. There has been renewed interest in the strategies which children use in acquiring language, and the significance of such topics as imitation has come to be reconsidered in this light. Above all, there has been a concern to study the factors which characterize children's learning environment - in particular, the nature of the input language they receive from mothers and other caretakers (motherese). The Journal of Child Language, which commenced publication in 1975, is now the best source of information on current trends in the subject. Its contributors span the disciplines of psychology and linguistics, and their work illustrates a wide range of experimental and naturalistic approaches to the subject. Without doubt, the field of language acquisition remains one of the most intriguing areas of linguistic study at the present time, and one which will certainly remain in the forefront of linguists' attention over the next few years.
There are many other applications of linguistics in fields not so far mentioned, which tend to be grouped together anonymously as 'applied linguistics'. Foreign language teaching and learning is the major application, as suggested in Chapter 1; but there is also native language teaching, translation (either individually, or using machines), the many facets of telecommunications, lexicography . . . The list could go on for some time. Each of these fields selects its basic information and theoretical framework from the overall perspective which linguistics provides, and applies it to the clarification of some general area of human experience. And it is surely the many branches of applied linguistics that will ultimately provide the main link between Chapters 1 and 4, if such a link be needed. But, as always, we must remember that an application is but the tip of a theoretical iceberg: many hours of research and discussion, much of it highly specialized, abstract, and quite unpractical, will have taken place in order to provide the basic knowledge which can be implemented in a specific application. Indeed, in many cases it is only through the illuminating models developed in linguistic theory, and the demonstration of a coherent system underlying apparently disorganized data, that applications and approaches to a problem have been thought of at all.
[…] 'What does it matter', such queries run, 'whether the basic phonological unit is the phoneme or the distinctive feature? or whether the morpheme concept fits all cases? or whether there is a boundary-line between syntax and semantics?' If these questions are still being asked, then the arguments underlying my Chapter 3 about the scientific aims of linguistics have not been appreciated. The kinds of distinction drawn there are essential if we hope to build up a general theory of language; we have to appreciate the kinds of reasoning relevant to this task, even if we do not always agree with the conclusions reached. If we are adopting a rational approach to our study of (or interest in) language, then we cannot just blindly analyse and describe in a random, arbitrary way. Whatever our purpose, whether 'pure' or 'applied', we must know why we are doing what we are doing, if we hope to be clear and consistent and wish to convince others (or even ourselves) of its validity. It does matter about these questions, and many others like them, because the answers constitute our world-view of language. Choosing to work with distinctive features is one choice we make, along with many others, which ultimately builds up a coherent and self-consistent picture of language structure that intuitively satisfies us. We sit back and say, 'Yes, that makes sense.' To a certain extent, then, our final decisions about which concepts to work with are a matter of taste. But the more we understand the relative merits and demerits of the various theories, descriptions and procedures which the subject provides, the more likely we will be to reach a view of language that is reasonable and convincing, as well as personally satisfying.
Corpus Linguistics
Kennedy G. An Introduction to Corpus Linguistics.
London and New York: Addison Wesley
Longman Limited, 1998. – pp. 1-12.
Introduction
In the language sciences a corpus is a body of written text or transcribed speech which can serve as a basis for linguistic analysis and description. Over the last three decades the compilation and analysis of corpora stored in computerized databases has led to a new scholarly enterprise known as corpus linguistics. The purpose of this book is to introduce the various activities which come within the scope of corpus linguistics, and to set current work within its historical context. It brings together some of the findings of corpus-based studies of English, the language which has so far received the most attention from corpus linguists, and shows how quantitative analysis can contribute to linguistic description. It is hoped that, by concentrating in particular on some of the results of corpus analysis, the book will whet the appetites of the growing body of teachers and students with access to corpora to discover more for themselves about how languages work in all their variety. The book is intended primarily for those who are already familiar with general linguistic concepts but who want to know more of what can be done with a corpus and why corpus linguistics may be relevant in research on language. Corpus linguistics is not an end in itself but is one source of evidence for improving descriptions of the structure and use of languages, and for various applications, including the processing of natural language by machine and understanding how to learn or teach a language.
The main focus of this book is on four major areas of activity in corpus linguistics:
* corpus design and development
* corpus-based descriptions of aspects of English structure and use
* the particular techniques and tools used in corpus analysis
* applications of corpus-based linguistic description
Readers may choose to work through the book in the above order or to begin with the sections dealing with corpus-based descriptions of English in order first to become more familiar with some of the results of corpus analysis. In focusing on the contribution of corpus linguistics to the description of English and on some of the central issues and problems which are being addressed within corpus linguistics, the book also attempts to bring together disparate work which is often hard to get hold of. However, such is the speed of development and change in corpus linguistics at the present time that anyone writing about it must be conscious that it would be easy to produce a Ptolemaic picture of the field - with the world distorted and with Terra Australis Incognita, the Great Southern Continent, both misconceived and misplaced. Work relevant for corpus linguistics is being done in many fields, including computer science and artificial intelligence, as well as in various branches of descriptive and applied linguistics. It would not be surprising if some of the scholars contributing to corpus linguistics from these and other perspectives found that their work is inadequately represented here. However, they can be assured that such neglect is not intended.
Because corpus linguistics is a field where activity is increasing very rapidly and where there is as yet no magisterial perspective, even the very notion of what constitutes a valid corpus can still be controversial. It also needs to be understood at the outset that not every use of computers with bodies of text is part of corpus linguistics. For example, the aim of Project Gutenberg to distribute 10,000 texts to 100 million computer users by the year 2001 is not in itself part of corpus linguistics although texts included in this ambitious project may conceivably provide textual data for corpus analysis. Similarly, contemporary reviews of computing in the humanities show the enormous extent of corpus-based work in literary studies. While some of the methodology used in literary studies resembles some of the activity being undertaken in corpus linguistics, research on authorial attribution or thematic structure, for example, does not come within the scope of this book. Nor does the book attempt to cover systematically the wide range of corpus-based work being undertaken in computational linguistics in such areas of natural language processing as speech recognition and machine translation.
Although there have been spectacular advances in the development and use of electronic corpora, the essential nature of text-based linguistic studies has not necessarily changed as much as is sometimes suggested. In this book, reference is made to corpus studies which were undertaken manually before computers were available. Corpus linguistics did not begin with the development of computers but there is no doubt that computers have given corpus linguistics a huge boost by reducing much of the drudgery of text-based linguistic description and vastly increasing the size of the databases used for analysis. It should be made clear, however, that corpus linguistics is not a mindless process of automatic language description. Linguists use corpora to answer questions and solve problems. Some of the most revealing insights on language and language use have come from a blend of manual and computer analysis. It is now possible for researchers with access to a personal computer and off-the-shelf software to do linguistic analysis using a corpus, and to discover facts about a language which have never been noticed or written about previously. The most important skill is not to be able to program a computer or even to manipulate available software (which, in any case, is increasingly user-friendly). Rather, it is to be able to ask insightful questions which address real issues and problems in theoretical, descriptive and applied language studies. Many of the key problems and challenges in corpus linguistics are associated with the following questions:
* How can we best exploit the opportunities which arise from having texts stored in machine-retrievable form?
* What linguistic theories will best help structure corpus-based research?
* What linguistic phenomena should we look for?
* What applications can make use of the insights and improved descriptions of languages which come out of this research?
In answering these and other questions corpus linguistics has potential to provide solutions and new directions to some of the major issues and problems in the study of human communication.
Corpora
The definition of a corpus as a collection of texts in an electronic database can beg many questions for there are many different kinds of corpora. Some dictionary definitions suggest that corpora necessarily consist of structured collections of text specifically compiled for linguistic analysis, that they are large or that they attempt to be representative of a language as a whole. This is not necessarily so. Not all corpora which can be used for linguistic research were originally compiled for that purpose. Historically it is not even the case that corpora are necessarily stored electronically so that they can be machine-readable, although this is nowadays the norm. […] electronic corpora can consist of whole texts or collections of whole texts. They can consist of continuous text samples taken from whole texts; they can even be made up of collections of citations. At one extreme an electronic dictionary may serve as a kind of corpus for certain types of linguistic research while at the other extreme a huge unstructured archive of texts may be used for similar purposes by corpus linguists.
Corpora have been compiled for many different purposes, which in turn influence the design, size and nature of the individual corpus. Some current corpora intended for linguistic research have been designed for general descriptive purposes - that is, they have been designed so that they can be examined or trawled to answer questions at various linguistic levels on the prosody, lexis, grammar, discourse patterns or pragmatics of the language. Other corpora have been designed for specialized purposes such as discovering which words and word meanings should be included in a learners' dictionary; which words or meanings are most frequently used by workers in the oil industry or economics; or what differences there are between uses of a language in different geographical, social, historical or work-related contexts.
A distinction is sometimes made between a corpus and a text archive or text database. Whereas a corpus designed for linguistic analysis is normally a systematic, planned and structured compilation of text, an archive is a text repository, often huge and opportunistically collected, and normally not structured. It is generally the case, as Leech (1991:11) suggested, that 'the difference between an archive and a corpus must be that the latter is designed or required for a particular "representative" function'. It is nevertheless not always easy to see unequivocally what a corpus is representing, in terms of language variety.
Databases which are made up not of samples, but which constitute an entire population of data, may consist of a single book (e.g. George Eliot's Middlemarch) or of a number of works. These corpora may be the work of a single author (e.g. the complete works of Jane Austen) or of several authors (e.g. medieval lyrics), or all the editions of a particular newspaper in a given year. Some projects have assembled all the known available texts in a particular genre or from a particular historical period. Some of these databases or text archives described in Section 2.4 are very large indeed, and although they have rarely yet been used as corpora for linguistic research, there is no reason why they should not be in the future. In many respects it is thus the use to which the body of textual material is put, rather than its design features, which defines what a corpus is.
A corpus constitutes an empirical basis not only for identifying the elements and structural patterns which make up the systems we use in a language, but also for mapping out our use of these systems. A corpus can be analysed and compared with other corpora or parts of corpora to study variation. Most importantly, it can be analysed distributionally to show how often particular phonological, lexical, grammatical, discoursal or pragmatic features occur, and also where they occur.
In the early 1980s it was possible to list on a few fingers the main electronic corpora which a small band of devotees had put together over the previous two decades for linguistic research. These corpora were available to researchers on a non-profit basis, and were initially available for processing only on mainframe computers. The development of more powerful microcomputers from the mid-1970s and the advent of CD-ROM in the 1980s made corpus-based research more accessible to a much wider range of participants.
By the 1990s there were many corpus-making projects in various parts of the world. Lancashire (1991) shows the huge range of corpora, archives and other electronic databases available or being compiled for a wide variety of purposes. Some of the largest corpus projects have been undertaken for commercial purposes, by dictionary publishers. Other projects in corpus compilation or analysis are on a smaller scale, and do not necessarily become well known. Undertaken as part of graduate theses or undergraduate projects, they enabled students to gain original insights into the structure and use of language.
The role of computers in corpus linguistics
The analysis of huge bodies of text 'by hand' can be prone to error and is not always exhaustive or easily replicable. Although manual analysis has made an important contribution over the centuries, especially in lexicography, it was the availability of digital computers from the middle of the 20th century which brought about a radical change in text-based scholarship. Rather than initiating corpus research, developments in information technology changed the way we work with corpora. Instead of using index cards and dictionary 'slips', lexicographers and grammarians could use computers to store huge amounts of text and retrieve particular words, phrases or whole chunks of text in context, quickly and exhaustively, on their screens. Furthermore the linguistic items could be sorted in many different ways, for example, taking account of the items they collocate with and their typical grammatical behaviour.
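The retrieval of a word 'in context' and the sorting of the resulting lines can be illustrated with a minimal sketch in Python. It is not the software used by any of the projects mentioned here; the function names (concordance, sort_by_following_word, format_hits), the 30-character context width and the sample sentences are all assumptions made for illustration.

```python
import re

def concordance(text, keyword, width=30):
    """Return (left context, keyword, right context) for each occurrence."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        hits.append((left, match.group(), right))
    return hits

def sort_by_following_word(hits):
    """Sort hits by what follows the keyword, so similar contexts cluster."""
    return sorted(hits, key=lambda hit: hit[2].strip().lower())

def format_hits(hits, width=30):
    """Right-align the left context so the keyword forms a single column."""
    return [f"{left:>{width}}{keyword}{right}" for left, keyword, right in hits]

# Invented sample text, standing in for a corpus.
sample = ("Tell me when you arrive. I was asleep when the phone rang. "
          "When in doubt, ask.")

hits = concordance(sample, "when")
for line in format_hits(sort_by_following_word(hits)):
    print(line)
```

Sorting on the right-hand context is only one choice; sorting on the left-hand context instead would group the items which typically precede the word.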
Corpus linguistics is thus now inextricably linked to the computer, which has introduced incredible speed, total accountability, accurate replicability, statistical reliability and the ability to handle huge amounts of data. With modern software, computer-based corpora are easily accessible, greatly reducing the drudgery and sheer bureaucracy of dealing with the increasingly large amounts of data used for compiling dictionaries and other information sources. In addition to greatly increased reliability in such basic tasks as searching, counting and sorting linguistic items, computers can show accurately the probability of occurrence of linguistic items in text. They have thus facilitated the development of mathematical bases for automatic natural language processing, and brought to linguistic studies a high degree of accuracy of measurement which is important in all science. Computers have permitted linguists to work with a large variety of texts and thus to seek generalizations about language and language use which can go beyond particular texts or the intuitions of particular linguists. The quantification of language use through corpus-based studies has led to scientifically interesting generalizations and has helped renew or strengthen links between linguistic description and various applications. Machine translation, text-to-speech synthesis, content analysis and language teaching have been among the beneficiaries.
Some idea of the changes which the computer has made possible in text studies can be gauged from a report in an early issue of the ALLC Bulletin, the forerunner of the journal Literary and Linguistic Computing. A brief report by Govindankutty (1973) on the coming of the computer to Dravidian linguistics captures the moment of transition between manual and electronic databases. The text he was working with of 300,000 words is small by today's standards, but what took the researcher and his long-suffering colleagues nearly six years of data management and analysis could, 20 years later, be carried out in minutes.
It took nearly six years' hard labour and the co-operation of colleagues and students to complete the Index of Kamparamayanam, the longest middle Tamil text, in the Kerala University under the supervision of Professor V. I. Subramoniam. The text consists of nearly 12,500 stanzas and each stanza has four lines; each line has an average of six words. All the words and some of the suffixes were listed on small cards by the late Mr. T. Velaven who is the architect of this voluminous index. Later, the cards were sorted into alphabetical order and each item was again arranged according to the ascending order of the stanza and line. Finally, each entry was checked with the text and the meaning and grammatical category were noted. The completed index consists of about 3,500 typed pages (28 x 20 cm).
While indexing, some suffixes such as case were listed separately. This posed some problems when I started to work on the grammar of the language of the text. When it was necessary to find out after what kind of words and after which phonemes and morphemes the alternants of a suffix occur, it became necessary again to go through all the entries. Though I have tried to work out the frequency of all the suffixes, for want of time it was not completely possible. However, the frequency study helped to unearth different strata in the linguistic excavation and indirectly emphasized that it is a sine qua non, at least, for such a descriptive and historical study.
Though it took a lot of time, energy and patience, the birth of an index brought with it an unknown optimism in the grammatical description. After completing the index and the grammatical study of Kamparamayanam, three months ago I started indexing Ramacaritam, an early Malayalam text, using small cards. This project is being carried out in the Leiden University with the guidance of Professor F. B. J. Kuiper. While I was half my way through the indexing, Dr. B. J. Hoff of the Linguistics Department informed me of the work done in the Institute for Dutch Lexicology with the help of a computer. When I discussed the problems with Dr. F. de Tollenaere, who is the head of this institute, he outlined with great enthusiasm how a computer can be utilized for this purpose. Immediately, I started transcribing the text and now it is being punched on paper tape, using an AREA paper tape punch at the Institute. This paper tape punch, having an extra shift, has twice the eighty-eight standard possibilities, which results in one hundred and seventy-six different punching codes, which for the computer has the value of one hundred and seventy-six characters. Moreover, a coding system makes it possible to have up to two hundred and seven possibilities, which are also available at the output stage, as the Institute has at its disposal a print train with two hundred and seven symbols.
To a present-day corpus linguist, even the laborious data entry by punched paper seems quaintly archaic, and Govindankutty's task could now be undertaken on a personal computer accessed directly through a keyboard.
Until the mid-1980s corpus linguistics typically involved mainframe computing and was largely associated with universities having access to large machines. In the 1970s, with shared access to a standard mainframe, it could take an hour or more to make a concordance consisting of all the instances of a word such as when in a one-million-word corpus. By the late 1980s, the time taken to run such a program had been reduced to minutes. In the 1990s, the same job can be done just as quickly on the faster personal computers running at 60 or more megahertz. Hard disk drives of 500 megabytes or more on personal computers and input from a CD-ROM are now common, thus facilitating storage and rapid analysis.
In the early 1980s a captive computer scientist or friendly computer programmer was almost indispensable to assist many aspiring corpus linguists to cope with inevitable technical problems associated with data management and the programming skills necessary for corpus analysis. By the 1990s, improvements in personal computers of the kind already mentioned, and the availability of commercial software packages designed for corpus analysis, have meant that most corpus linguists can now concentrate not on how to program and use a computer but on problems and issues in linguistics which can be addressed through a corpus.
The scope of corpus linguistics
Corpus linguistics is based on bodies of text as the domain of study and as the source of evidence for linguistic description and argumentation. It has also come to embody methodologies for linguistic description in which quantification of the distribution of linguistic items is part of the research activity. As Leech (1992:107) has noted, the focus of study is on performance rather than competence, and on observation of language in use leading to theory rather than vice versa.
It would be misleading, however, to suggest that corpus linguistics is a theory of language in competition with other theories of language such as transformational grammar, or even more that it is a new or separate branch of linguistics. Linguists have always needed sources of evidence for theories about the nature, elements, structure and functions of language, and as a basis for stating what is possible in a language. At various times, such evidence has come from intuition or introspection, from experimentation or elicitation, and from descriptions based on observations of occurrence in spoken or written texts. In the case of corpus-based research, the evidence is derived directly from texts. In this sense corpus linguistics differs from approaches to language which depend on introspection for evidence. In his celebrated work, Coral Gardens and their Magic, Malinowski (1935: 9) wrote about the paradigm shift which he considered was necessary in the linguistics of the day.
The neglect of the obvious has often been fatal to the development of scientific thought. The false conception of language as a means of transfusing ideas from the head of the speaker to that of the listener has, in my opinion largely vitiated the philological approach to language. The view set forth here is not merely academic: it compels us, as we shall see, to correlate other activities, to interpret the meaning - text; and this means a new departure in the handling of linguistic evidence. It will also force us to define meaning in terms of experience and situation.
Linguists may not see the necessity for such a sea change today. However, it is the case that corpus linguists often have different concerns from many other linguists. Corpus linguists are concerned typically not only with what words, structures or uses are possible in a language but also with what is probable - what is likely to occur in language use. The use of a corpus as a source of evidence however is not necessarily incompatible with any linguistic theory, and progress in the language sciences as a whole is likely to benefit from a judicious use of evidence from various sources: texts, introspection, elicitation or other types of experimentation as appropriate. Any scientific enterprise must be empirical in the sense that it has to be supported or falsified on evidence and, in the final analysis, statements made about language have to stand up to the evidence of language use. The evidence can be based on the introspective judgment of speakers of the language or on a corpus of text. The difference lies in the richness of the evidence and the confidence we can have in the generalizability of that evidence, in its validity and reliability. The boundaries, therefore, between corpus-based description and argumentation and other approaches to language description are not rigid, and linguists of varied theoretical persuasions now use corpora for evidence which is complementary to evidence obtained from other sources.
Corpus linguistics, like all linguistics, is concerned primarily with the description and explanation of the nature, structure and use of language and languages and with particular matters such as language acquisition, variation and change. Corpus linguistics has nevertheless developed something of a life of its own within linguistics, with a tendency sometimes to focus on lexis and lexical grammar rather than pure syntax. This is partly a result of using methodologies such as concordancing where the contextual evidence available in a single line of wide-carriage computer printout of 130 characters is sometimes too limited for the analysis of syntax or discourse.
Work in corpus linguistics is currently associated with several quite different activities. Scholars working in the field tend to be identified with one or more of them. The first group of researchers consists of corpus makers or compilers. These scholars are concerned with the design and compilation of corpora, the collection of texts and their preparation and storage for later analysis.
A second group of researchers has been concerned with developing tools for the analysis of corpora. Important contributions to software development especially for the syntactic analysis of corpora have been associated particularly but not exclusively with researchers in computational linguistics. These researchers have been concerned with the use of corpora to develop, among other things, algorithms for natural language processing and the modelling of linguistic theories.
A third group of researchers consists of descriptive linguists whose main concern has been to make use of computerized corpora to describe reliably the lexicon and grammar of languages, both of the linguistic systems we use and our likely use of those systems. It is the probabilistic aspect of corpus-based descriptive linguistic studies which especially distinguishes them from conventional descriptive fieldwork in linguistics or lexicography. That is, corpus-based descriptive linguistics is concerned not only with what is said or written, where, when and by whom, but how often particular forms are used. The measurement of the distribution of words and grammar has encouraged new ways of studying the linguistic basis of variation in text types, language change and regional and other varieties of language. The corpus provides contexts for the study of meaning in use and, by making available techniques for extracting linguistic information from texts on a scale previously undreamed of, it facilitates linguistic investigations where empiricism is text based.
A fourth area of activity, which has been among the most innovative outcomes of the corpus revolution, has been the exploitation of corpus-based linguistic description for use in a variety of applications such as language learning and teaching, and natural language processing by machine, including speech recognition and translation.
At the present time in corpus linguistics, some researchers tend to focus on issues in corpus design, others on methods for text analysis and processing, and still others, probably the majority, on corpus-based linguistic description and the application of such descriptions.
Although the scope of corpus linguistics may be defined in terms of what people do with corpora, it would be a mistake to assume that corpus linguistics is simply a faster way of describing how a language works, or is about the nature of linguistic evidence. Analysis of a corpus by means of standard corpus linguistic research software can and frequently does reveal facts about a language which we might never previously have thought of seeking. Altenberg's (1991a) study of amplifier collocations in English, for example, raised questions about semantic classes of maximizers and boosters such as perfectly or awfully which probably would not have been asked without the evidence of a corpus. He found for example that frequent maximizers such as quite tend to collocate with non-scalar words (quite obviously) while absolutely has a greater tendency than other maximizers to collocate with negatives (absolutely not). The major shift in methodology associated with corpus linguistics comes not from theory but rather from what the use of corpora makes possible.
As we have seen, corpus linguistics goes beyond the use of corpora as a source of evidence in linguistic description. It also revives and carries on a concern of some linguists with the statistical distribution of linguistic items in the context of use. From the 1920s there was, especially in the United States and the United Kingdom, a tradition of word counting in texts in order to discover the most frequent, and arguably therefore the most pedagogically useful, words and grammatical structures for language teaching purposes.
From the 1930s, Prague School linguists undertook quantitative studies (mainly of Czech, English and Russian) of the frequency of certain grammatical processes, the relative frequencies of different parts of speech, the location and distribution of information in the sentence, and the statistical distribution of syllable types and structures. Some of this work was directed towards comparative stylistic analysis (e.g. Kramsky, 1972) and some towards quantitative comparisons of varieties of English (e.g. Duskova, 1977). Such Prague School quantitative studies, which were carried out manually, differ from modern computer corpus-based studies particularly in the size of the corpora and in their representativeness. Duskova, for example, studied 10,000 finite verb forms from 10 plays to draw conclusions about the functions and use of the preterite and the perfect in British and American English, but it is not clear why these 10 plays were chosen as representative of contemporary English. Nevertheless, the Prague School focus on quantitative studies was commendable at a time when orthodox linguistics eschewed them. Other quantitative studies were directed towards discovering the 'statistical laws' of text.
The work of the American philologist George Zipf, from the 1930s, was concerned with such quantitative analyses as the relation between the frequency of words in text and text length, the frequency of words and their antiquity, and the relation between the rank order of an item in a word frequency list and the number of occurrences or tokens of that item in a text. Zipf (1949) sets out his famous 'law', which held that the product of the frequency of use of a word in a text (f) and the rank order of that word in a frequency list (r) is approximately a constant (f × r = c).
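The rank-frequency relation attributed to Zipf can be checked on any text by building a frequency list and multiplying each word's rank by its frequency. The following is a minimal sketch in Python, not Zipf's own procedure; the function name rank_frequency, the simple tokenizing pattern and the toy sentence are assumptions made for illustration. On a text this small the products are not expected to be stable; any approximate constancy would only be looked for in a large corpus.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    # Crude tokenization: lower-case alphabetic strings, optional internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    rows = []
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, rank * freq))
    return rows

# Invented toy text; a real test would use a corpus of many thousands of words.
rows = rank_frequency("the cat and the dog and the bird")
for rank, word, freq, product in rows:
    print(rank, word, freq, product)
```

Words with equal frequency receive consecutive ranks here in the order in which they first occur; other conventions for tied ranks are possible and would change the products slightly.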
As noted above, the earliest computerized corpora compiled for linguistic research from the 1960s required the use of mainframe computers, and researchers frequently had to design their own software for analysis. Initial interest was often in lexis, including word counts, but it was quickly apparent that a computer corpus facilitated the study of permissible or likely word sequences or collocations (are we more likely to write different from, different to or different than?) and grammatical and stylistic characteristics of particular authors and genres. There was a particular interest in what characterized 'scientific style', 'newspaper style' and 'literary or imaginative style'.
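The question of whether different is more often followed by from, to or than is a matter of counting adjacent word pairs. A minimal sketch in Python is given below; the function name following_words and the sample sentences are invented for illustration, and the counts they yield say nothing about actual English usage.

```python
import re
from collections import Counter

def following_words(text, node, candidates=None):
    """Count the words that immediately follow `node` in `text`.

    If `candidates` is given, only those following words are counted.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for current, following in zip(words, words[1:]):
        if current == node and (candidates is None or following in candidates):
            counts[following] += 1
    return counts

# Invented sample sentences, standing in for a corpus.
sample = ("This is different from that. Hers is different to mine. "
          "It is no different from before, and different than expected.")

result = following_words(sample, "different", {"from", "to", "than"})
for word, count in result.most_common():
    print(word, count)
```

This sketch ignores sentence boundaries and punctuation between the two words, which a more careful study of collocation would need to take into account.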
With a corpus stored in a computer, it is easy to find, sort and count items, either as a basis for linguistic description or for addressing language-related issues and problems. It is not surprising, therefore, that a wide range of research activities have come to be within the scope of corpus linguistics. Analyses can contribute to the making of dictionaries, word lists, descriptive grammars, diachronic and synchronic comparative studies of speech varieties, and to stylistic, pedagogical and other applications. With appropriate software it is easy to study the distribution of phonemes, letters, punctuation, inflectional and derivational morphemes, words (as variously defined), collocations, instances of particular word classes, syntactic patterns, or discourse structures. Recent work at Birmingham University described by Renouf (1993) shows how new words and new uses can be identified in corpora at the time these words enter journalistic use.
The scope and current concerns of a field of scholarship can sometimes be seen or defined through the topics which make up conference programmes and the content of specialist journals. In the 1990s the topics which appear on conference programmes and in journals which cover corpus linguistics include improved ways of annotating corpora, the tagging of parts of speech and the senses of polysemous word forms, improved automatic parsing, identification of collocations, phraseological units and discourse structure, text categorization, research methodology in the face of more and bigger corpora, and the application of this work in lexicography, syntactic description, translation, speech and handwriting recognition, and language teaching. Educational applications are increasingly on the agenda. At Lancaster University in 1994 and 1996 the pedagogical significance of electronic corpora was the subject of conferences on the teaching of linguistics and the teaching of languages.
In March 1993, a Georgetown University Round Table meeting in Washington, DC, on corpus-based linguistics identified the following topics as those in particular need of investigation and dissemination at a time when linguistics was returning to more text-based approaches to language:
* the design and development of text-speech corpora
* tools for searching and processing on-line corpora
* critical assessments of on-line corpora and corpus-processing tools
* methodological issues in corpus-based analysis
* applications and results in linguistics and related disciplines, including language teaching, computational linguistics, historical linguistics, discourse analysis and stylistic analysis
The scope of computer corpus-based scholarship can also be measured by some of its achievements. In lexicography the revision of the Oxford English Dictionary, its publication in electronic form on CD-ROM and the publication of new learners' dictionaries of English by other major publishers were all based on corpora. The completion of the 100-million-word British National Corpus in 1994 set a new standard in corpus design and compilation. Another important international standard set in corpus preparation and formatting has been in the gradual adoption of the Standard Generalized Markup Language (SGML) through the Text Encoding Initiative (TEI) (see Section 2.6.5). In the analysis of corpora there have been improvements in the accuracy of the automatic grammatical tagging and parsing of texts. There has also been a substantial and rapidly growing amount of descriptive detail on the elements and structure of languages (particularly English) arising from corpus-based research.
Current issues
Widdowson H.G. Linguistics. – Oxford
University Press, 1996. – pp. 69-77.
Linguistics, like language itself, is dynamic and therefore subject to change. It would lose its validity otherwise, for like all areas of intellectual enquiry, it is continually questioning established ideas and questing after new insights. That is what enquiry means. Its very nature implies a degree of instability. So although there is, in linguistics, a reasonably secure conceptual common ground, which this book has sought to map out, there is, beyond that, a variety of different competing theories, different visions and revisions, disagreements and disputes, about what the scope and purpose of the discipline should be. There are three related issues which are particularly prominent in current debate. One has to do with the very definition of the discipline and takes us back to the question of idealization. Another issue concerns the nature of linguistic data and has come into prominence with the development of computer programs for the analysis of large corpora of language. A third issue raises the question of accountability and the extent to which linguistic enquiry should be made relevant to the practical problems of everyday life.
The scope of linguistics
[…] linguistics has traditionally been based on an idealization which abstracts the formal properties of the language code from the contextual circumstances of actual instances of use, seeking to identify some relatively stable linguistic knowledge (langue, or competence) which underlies the vast variety of linguistic behaviour (parole, or performance). It was also pointed out that there are two reasons for idealizing to such a degree of abstraction. One has to do with practical feasibility: it is convenient to idealize in this way because the actuality of language behaviour is too elusive to capture by any significant generalization. But the other reason has to do with theoretical validity, and it is this which motivates Chomsky's competence-performance distinction. The position here is that the data of actual behaviour are disregarded not because they are elusive but because they are of little real theoretical interest: they do not provide reliable evidence for the essential nature of human language. Over recent years, this formalist definition of the scope of linguistics has been challenged with respect to both feasibility and validity.
As far as feasibility is concerned, it has been demonstrated that the data of behaviour are not so resistant to systematic account as they were made out to be. There are two aspects of behaviour. One is psychological and concerns how linguistic knowledge is organized for access and what the accessing processes might be in both the acquisition and use of language. This has been a subject of enquiry in psycholinguistics. The second aspect of behaviour is sociological. This accessing of linguistic knowledge is prompted by some communicative need, some social context which calls for an appropriate use of language. These conditions for appropriateness can be specified, as indeed was demonstrated in part in the discussion of pragmatics. The account of the relationship between linguistic code and social context is the business of sociolinguistics.
Psycholinguistic work on accessing processes and sociolinguistic work on appropriateness conditions have demonstrated that there are aspects of behaviour that can be systematically studied, and that rigorous enquiry does not depend on the high degree of abstraction proposed in formalist linguistics. In other words, psycholinguistics and sociolinguistics have things to say about language which are also within the legitimate scope of the discipline. Such a point of view would be a tolerant and neighbourly one: we stake out different areas of language study, each with its own legitimacy.
But the challenge to the formalist approach in respect to validity is quite different. It is not tolerant and neighbourly at all, but a matter of competing claims for the same territory. It is not just an issue of delimitation but of definition, and proposes a functionalist one in opposition to a formalist one. The argument here is that it diminishes the very study of language to reduce it to abstract forms because to do so is to eliminate from consideration just about everything that is really significant about it and to make it hopelessly remote from people's actual experience. Language, the argument goes, is not essentially a static and well-defined cognitive construct but a mode of communication which is intrinsically dynamic and unstable. Its forms are of significance only so far as we can associate them with their communicative functions. On this account, the only valid linguistics is functional linguistics.
But, as was indicated in Chapter 2, there are two senses in which linguistic forms can be said to be associated with functions, and therefore two ways of defining functional linguistics. Firstly, we can consider how the linguistic code has developed in response to the uses to which it is put. In this sense, functional linguistics is the study of how the formal properties of language are informed by the functions it serves, how it encodes perceptions of reality, ways of thinking, cultural values, and so on.
Secondly, we can think of the form-function association as a matter not of encoded meaning potential but of its actual realization in communication; and here we are concerned with the way language forms function pragmatically in different contexts of use. In this case formalist linguistics is challenged not because it defines the language code too narrowly without regard to the social factors which have formed it, but because it defines language only in reference to the code, without regard to how it is put to use in communication. The argument here is that linguistics should extend its scope to account not only for the knowledge of the internalized language of the code, or linguistic competence, but for the knowledge people have of how this is appropriately acted upon, or communicative competence.
These two senses of functional linguistics are frequently confused, and there has sometimes been a tendency to suppose that if you define the code in reference to the communicative functions that have influenced its formation over time, then it follows that you will automatically be accounting for the way in which the code functions in communication here and now. But to do this is to equate the semantic potential of the code with actual pragmatic realizations of it in communication.
Functional linguistics, in both senses, considers language as an essentially social phenomenon, designed for communication. There is no interest in what makes human language a species-specific endowment, or in those universal features of language, described in Chapter 1, which might provide evidence of innateness. The concerns of functional linguistics are closer in this respect to the reality of language as people experience it, and it is therefore often seen as more likely than formal linguistics to be applicable to the problems of everyday life. Opponents might argue that this is only achieved at the expense of theoretical rigour. This raises the general question of how far relevance and accountability are valid considerations in linguistic enquiry, and this will be taken up again a little later. It also raises the question of what the source of linguistic data should be, and it is to this matter that we now turn.
The data of linguistics
There are, broadly speaking, three sources of linguistic data we can draw upon to infer facts about language. We can, to begin with, use introspection, appealing to our own intuitive competence as the data source. This is a tradition in linguistics of long standing, and essentially makes operational Saussure's concept of langue as common knowledge, imprinted in the mind like a book of which all members of the community have identical copies. So if linguists want data, as representative members of a language community they have only to consult the copy in their head. Most grammars and dictionaries until recent times have been based on this assumption that linguistic description can be drawn from the linguist's introspection. And it is not only linguistic competence which is accessible to introspection, but communicative competence as well, so the argument is that the conventions that define appropriate language use can also be drawn from the same intuitive source.
If, however, there is some reason to doubt the representative nature of such intuitive sampling, there is a second way of getting at data, namely by elicitation. In this case, you use other members of the community as informants, drawing on their intuitions. And again, this might be directed at obtaining the data of the code or its communicative use. Thus, you might ask informants whether a particular combination of linguistic elements is grammatically possible in their language, or what would be an appropriate expression given a particular context.
Introspection and elicitation can be used to establish both the formal properties of a language and how they typically function in use. But in both cases the data are abstract knowledge, and not actual behaviour. Both methods reveal what people know about what they do but not what they actually do. If you want data of that kind, the data of performance rather than competence, you need to turn to observation.
The development of computer technology over recent years has made observation possible on a vast scale. Programs have been devised within corpus linguistics to collect and analyse large corpora of actually occurring language, both written and spoken, and this analysis reveals facts about the frequency and co-occurrence of lexical and grammatical items which are not intuitively accessible by introspection or elicitation.
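The kind of counting involved can be illustrated with a minimal sketch in Python. It is not a description of any actual corpus tool: the function names, the crude tokenizer, the two-word window, and the two-sentence sample are all assumptions made purely for illustration, and the sketch ignores sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequencies(tokens):
    """Count how often each word form occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Count unordered pairs of distinct words that occur within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        # Look only at the next `window` tokens so each pairing is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


# A toy sample standing in for a corpus (hypothetical data, far too small to be meaningful).
sample = "The man opened the door. The man closed the door."
tokens = tokenize(sample)
freq = frequencies(tokens)
pairs = cooccurrences(tokens, window=2)

print(freq.most_common(3))
print(pairs.most_common(3))
```

On a real corpus of millions of words, counts of this kind are what bring to light the patterns of frequency and co-occurrence that speakers cannot report by introspection.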
It would seem on the face of it that this is a much more reliable source of data. It is surely better to find out what people actually do than depend on intuitions which are often uncertain and contradictory. Claims have indeed been made that these large-scale observations reveal patterns of attested usage which call for a complete revision of the existing categories of linguistic description, which are generally based on intuition and elicitation. Corpus linguistics, in dealing with actual behaviour, clearly has an affinity with functional linguistics in that it too claims to get closer to the facts of 'real' language.
There is no doubt that corpus analysis can reveal facts of usage, the data of actual linguistic performance, which throw doubt on the validity of any model of language based on the idea of a stable and well-defined system. The elaborate picture it presents is very different from the abstract painting proposed by the formal linguist. If language use is indeed a rule-governed activity, as is often said, the rules are not easy to discern in the detail. And it is also true that this detail is not accessible to introspection or elicitation. Even a limited corpus analysis can show patterns of occurrence of which language users, the very producers of the data, are unaware. Corpus linguistics transcends intuitive knowledge and in this respect can be seen as a valuable, and valid, corrective to unfounded abstraction: a case of description influencing theory for once, rather than the other way round.
But the claims of corpus linguistics can be questioned too. The facts of usage revealed by computer analysis, for example, carry no guarantee of absolute truth. The intuitions that people have about their language have their own validity as data. These conceptual constructs are also real, but the reality is of a different order.
One example of this is the way lexical knowledge (in some areas of vocabulary at least) seems to be organized semantically in terms of prototypes, and these cannot be observed, but only elicited. Thus, when a group of English-speaking informants were asked to give the first example that came to mind of a more inclusive category of things, they showed a striking unanimity. The word 'bird' elicited 'robin' (rather than, say, 'chaffinch' or 'wren') and the word 'vegetable' elicited 'pea' (rather than, say, 'parsnip' or 'potato'). For these informants, then, a robin is the prototypical bird, a pea the prototypical vegetable. But this conceptual preference does not correspond with how frequently these words actually occur in a corpus. The same point can be made about grammatical structures. If English-speaking informants are asked to provide examples of a sentence, they are likely to come up with simple subject-verb-object (SVO) constructions ('The man opened the door'; 'John kissed Mary'). These, we might say, are prototypical English sentences. But they are unlikely to figure very frequently in a corpus of actual usage. Since people do not use simple sentences like this very often, they do not have much reality as observed data, but they may have a significant psychological reality nevertheless. They may be evidence of competence which is not reflected in the facts of performance.
Prototypes thus elicited do not, of course, invalidate the observed data of corpus linguistics. They provide a different kind of data which are evidence of competence which is not directly projected into performance. Intuitive, elicited, and observed data all have their own validity, but this validity depends on what kind of evidence you are looking for, on what aspects of language knowledge or behaviour you are seeking to explain. If you are looking for evidence of the internal relationship between language and the mind, you are more likely to favour intuition and elicitation. If you are looking for evidence of how language sets up external links with society, then you are more likely to look to the observed data of actual occurrence. The validity of different kinds of linguistic data is not absolute but relative: one kind is no more 'real' than another. It depends on what you claim the data are evidence of, and what you are trying to explain.
The relevance of linguistics
From questions of validity we turn now to questions of utility. What is linguistics for? What good is it to anybody? What practical uses can it be put to? One response to such questions is, of course, to deny the presupposition that it needs any practical justification at all. Like other disciplines, linguistics is an intellectual enquiry, a quest for explanation, and that is sufficient justification in itself. Understanding does not have to be accountable to practical utility, particularly when it concerns the nature of language, which, as was indicated in Chapter 1, is so essential and distinctive a feature of the human species.
Whether or not linguistics should be accountable, it has been turned to practical account. Indeed, one important impetus for the development of linguistics in the first part of this century was the dedicated work done in translating the Bible into languages hitherto unwritten and undescribed. This practical task implied a prior exercise in descriptive linguistics, since it involved the analysis of the languages (through elicitation and observation) into which the scriptures were to be rendered. And this necessarily called for a continual reconsideration of established linguistic categories to ensure that they were relevant to languages other than those, like English, upon which they were originally based. The practical tasks of description and translation inevitably raised issues of wider theoretical import.
They raise other issues as well about the relationship between theory and practice and the role of the linguist, issues which are of current relevance in other areas of enquiry, and which bear upon the relationship between descriptive and applied linguistics.
The process of translation involves the interpretation of a text encoded in one language and the rendering of it into another text which, though necessarily different in form, is, as far as possible, equivalent in meaning. In so far as it raises questions about the differences between language codes it can be seen as an exercise in contrastive analysis. In so far as it raises questions about the meaning of particular texts, particular communicative uses of the codes, it can be seen as an exercise in discourse analysis. Both of these areas of enquiry have laid claim to practical relevance and so to be the business of applied linguistics.
With regard to contrastive analysis, one obvious area of application is language teaching. After all, second language learning, like translation, has to do with working out relationships between one language and another: the first language (L1) you know and the second language (L2) you do not. It seems self-evident that the points of difference between the two codes will constitute areas of difficulty for learners and that a contrastive analysis will therefore be of service in the design of a teaching programme.
It turns out, however, that the findings of such analysis cannot be directly applied in this way. Although learners do undoubtedly refer the second language they are learning (L2) to their own mother tongue (L1), in effect using translation as a strategy for learning, they do not do so in any regular or predictable manner. Linguistic difference is not a reliable measure of learning difficulty. The data of actual learner performance, as established by error analysis, call for an alternative theoretical explanation.
One possibility is that learners conform to a pre-programmed cognitive agenda and so acquire features of language in a particular order of acquisition. In this way they proceed through different interim stages of an interlanguage which is unique to the acquisition process itself. Enquiry into this possibility in Second Language Acquisition (SLA) research has been extensive.
There is another possibility. It might be that the categories of description typically used in contrastive analysis are not sufficiently sensitive to record certain aspects of learner language. Learners may be influenced by features of their L1 experience other than the most obvious forms of the code. Contrastive analysis has been mainly concerned with syntactic structure, but this is only one aspect of language, and one which, furthermore, inter-relates with others in complex ways. So it may be that the learners' difficulties do correspond to differences between their L1 and L2, but that we need a more sophisticated theory to discern what the differences are, a theory which takes a more comprehensive view of the nature of language by taking discourse into account.
Discourse analysis is potentially relevant to the problems of language pedagogy in two other ways. Firstly, it can provide a means of describing the eventual goal of learning, the ability to communicate, and so to cope with the conventions of use associated with certain discourses, written or spoken. Secondly, it can provide the means of describing the contexts which are set up in classrooms to induce the process of learning. In this case it can provide a basis for classroom research.
But the relevance of discourse analysis is not confined to language teaching. It can be used to investigate how language is used to sustain social institutions and manipulate opinion; how it is used in the expression of ideology and the exercise of power. Such investigations in critical discourse analysis seek to raise awareness of the social significance and the political implications of language use. Discourse analysis can also be directed to developing awareness of the significance of linguistic features in the interpretation of literary texts, the particular concern of stylistics.
In these and other cases, descriptive linguistics becomes applied linguistics to the extent that the descriptions can be shown to be relevant to an understanding of practical concerns associated with language use and learning. These concerns may take the form of quite specific problems: how to design a literacy programme, for example, or how to interpret linguistic evidence in a court of law (the concern of the growing field of forensic linguistics).
But other concerns for relevance are more general and more broadly educational. We began this book by noting how thoroughly language pervades our reality, how central it is to our lives as individuals and social beings. To remain unaware of what it is and how it works is to run the risk of being deprived or exploited. Control of language is, to a considerable degree, control of power. Language is too important a human resource for its understanding to be kept confined to linguists. Language is so implicated in human life that we need to be as fully aware of it as possible, for otherwise we remain in ignorance of what constitutes our essential humanity.
