Computational Lexicography, Feb. 2012, Anca Dinu, University of Bucharest



Page 1

Computational Lexicography

Feb., 2012

Anca Dinu

University of Bucharest

Page 2

Introduction

Computational lexicology is the use of computers in the study of the lexicon (the theoretical side).

Computational lexicography is the creation of machine-readable dictionaries (MRDs) (the practical side).

The two terms are sometimes used as synonyms.

Page 3

Introduction

MRDs are essential resources for NLP (summarization, question answering, inference, etc.).

They matter even more for languages with rich morphology. Such MRDs have a generative component that builds the inflected forms from lemmas and formation rules.
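A minimal sketch of such a generative component, assuming a toy paradigm for regular English verbs. The rule table, the paradigm name `verb_regular`, and the function `inflect` are illustrative assumptions, not taken from any real MRD.

```python
# Toy generative component: inflected forms are derived from a lemma plus
# formation rules. The rules below are a simplified, hypothetical paradigm.

SIBILANT_ENDINGS = ("s", "sh", "ch", "x")

RULES = {
    "verb_regular": {
        "base": lambda lemma: lemma,
        "3sg": lambda lemma: lemma + ("es" if lemma.endswith(SIBILANT_ENDINGS) else "s"),
        "past": lambda lemma: lemma + ("d" if lemma.endswith("e") else "ed"),
        "gerund": lambda lemma: (lemma[:-1] if lemma.endswith("e") else lemma) + "ing",
    },
}


def inflect(lemma, paradigm):
    """Apply every formation rule of the paradigm to the lemma."""
    return {tag: rule(lemma) for tag, rule in RULES[paradigm].items()}


forms = inflect("bake", "verb_regular")
for tag, form in forms.items():
    print(tag, form)
```

The lexicon then only has to store the lemma and the name of its paradigm; the forms are computed on demand. A language with rich morphology would need many more paradigms and rules, which is where this design pays off.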

Page 4

Directions in CL

Corpus annotation (usually in XML): markup languages make it possible to build corpora annotated in a standardized way, from which data for building lexicons (argument structure, thematic roles, etc.) can then be extracted automatically or semi-automatically (Corpus Pattern Analysis).
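As a rough illustration of extracting lexicon data from an XML-annotated corpus, the sketch below collects the role frames observed for each predicate. The annotation scheme (the `s`, `pred`, and `arg` elements and the `lemma` and `role` attributes) is an assumption made up for this example, not a standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation scheme: element and attribute names are assumptions.
ANNOTATED = """
<corpus>
  <s id="1"><arg role="agent">The man</arg> <pred lemma="paint">painted</pred> <arg role="patient">the wall</arg>.</s>
  <s id="2"><arg role="agent">Mary</arg> <pred lemma="break">broke</pred> <arg role="patient">the vase</arg>.</s>
  <s id="3"><arg role="patient">The vase</arg> <pred lemma="break">broke</pred>.</s>
</corpus>
"""


def extract_frames(xml_text):
    """Map each predicate lemma to the set of role sequences seen with it."""
    frames = {}
    root = ET.fromstring(xml_text.strip())
    for sentence in root.iter("s"):
        pred = sentence.find("pred")
        if pred is None:
            continue  # sentence without an annotated predicate
        roles = tuple(arg.get("role") for arg in sentence.findall("arg"))
        frames.setdefault(pred.get("lemma"), set()).add(roles)
    return frames


frames = extract_frames(ANNOTATED)
for lemma, role_frames in sorted(frames.items()):
    print(lemma, sorted(role_frames))
```

Here `break` ends up with two frames, one transitive and one intransitive, which is the kind of argument-structure information a lexicon entry could record.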

Page 5

Directions in CL

Building Lexical Knowledge Bases (LKBs). They contain the same information as a printed dictionary, plus syntactic, semantic, and relational information (ontologies).

E.g.: WordNet; FrameNet

Page 6

WordNet

Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser.
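The synset network can be pictured with a small data-structure sketch. The synset identifiers, lemmas, and hypernym links below are invented for illustration and are not WordNet data; `synsets_for` and `hypernym_path` are hypothetical helper names.

```python
# Toy synset network: each synset groups synonymous lemmas and points to a
# more general synset (a conceptual-semantic relation). Invented data.
SYNSETS = {
    "dog.n.01": {"lemmas": {"dog", "domestic dog"}, "hypernym": "canine.n.01"},
    "canine.n.01": {"lemmas": {"canine", "canid"}, "hypernym": "animal.n.01"},
    "animal.n.01": {"lemmas": {"animal", "beast"}, "hypernym": None},
}


def synsets_for(word):
    """Return the ids of all synsets that list the word as a lemma."""
    return sorted(sid for sid, synset in SYNSETS.items() if word in synset["lemmas"])


def hypernym_path(synset_id):
    """Follow hypernym links from a synset up to the root of the toy network."""
    path = []
    while synset_id is not None:
        path.append(synset_id)
        synset_id = SYNSETS[synset_id]["hypernym"]
    return path


print(synsets_for("dog"))
print(hypernym_path("dog.n.01"))
```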

Page 7

Annotation: general principles

Annotation schemata should focus on a single coherent theme:

• Different linguistic phenomena should be annotated separately over the same corpus.

Annotations must be consistent with each other:

• Unification and merging of multiple annotations is necessary (standard XML).

Page 8

Examples of Semantic Annotation

• Predicators and their named arguments:

[The man]agent painted [the wall]patient.

• Anaphors and their antecedents:

[The protein] inhibits growth in yeast. [It] blocks production . . .

• Acronyms and their long forms:

[Platelet-derived growth factor] (known as [pdgf]) impacts . . .

• Semantic Typing of entities:

[The man]human fired [the gun]firearm.
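The bracket notation used in these examples can be read mechanically. The sketch below is one possible reader, assuming that an annotated span is written `[span]` and that an optional label follows the closing bracket with no space; the function name `parse_annotations` is hypothetical.

```python
import re

# "[span]label": the label is optional, since the anaphora and acronym
# examples mark spans without labelling them.
ANNOTATION = re.compile(r"\[([^\]]+)\](\w*)")


def parse_annotations(text):
    """Return (span, label) pairs; label is None when the span is unlabelled."""
    return [(span, label or None) for span, label in ANNOTATION.findall(text)]


print(parse_annotations("[The man]agent painted [the wall]patient."))
print(parse_annotations("[The protein] inhibits growth in yeast. [It] blocks production"))
```

In practice the same information would be stored as XML markup rather than inline brackets, so that several annotation layers can be merged over one corpus.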

Page 9

Problems with existing LKBs

The traditional organization of lexicons is static, i.e. it assumes that the meaning of a word can be defined exhaustively by enumerating its senses (as a list).

Consequently, when a natural language interpretation task runs into lexical ambiguity, the system tries to select the closest definition from the list provided by the lexicon.

Page 10

Problems with existing LKBs

Two drawbacks:

"All" possible contexts in which a word may occur must be specified a priori; otherwise coverage is incomplete.

The creative use of words in new contexts cannot be explained or predicted.
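A small sketch of a sense-enumerative lexicon makes both drawbacks concrete. The sense inventory, the cue words, and the overlap-based selection in `select_sense` are assumptions chosen for illustration; real systems use richer selection methods.

```python
# Sense-enumerative lexicon: every word has a fixed list of senses, each with
# the context words it was written for. Invented inventory for illustration.
SENSES = {
    "bake": [
        ("create by cooking", {"cake", "bread", "cookies"}),
        ("heat to change state", {"potato", "clay"}),
    ],
}


def select_sense(word, context):
    """Pick the listed sense sharing most cue words with the context.

    Returns None when no listed sense matches the context at all.
    """
    best, best_overlap = None, 0
    for gloss, cues in SENSES.get(word, []):
        overlap = len(cues & set(context))
        if overlap > best_overlap:
            best, best_overlap = gloss, overlap
    return best


print(select_sense("bake", ["woman", "cake", "oven"]))
print(select_sense("bake", ["woman", "potato", "oven"]))
print(select_sense("bake", ["woman", "pizza", "oven"]))
```

The last call fails because "pizza" was not anticipated when the list was written: the list is only as good as the contexts enumerated in advance, and nothing in it explains how a new use relates to the existing senses.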

Page 11

Solution: Generative Lexicon (GL)

James Pustejovsky: 1995 (the book The Generative Lexicon), 2001, 2005

Reading for next time: the article "Type Theory and Lexical Decomposition"

Page 12

Assumptions for GL

Language meaning is compositional.

Compositionality is a desirable property of a semantic model.

Many linguistic phenomena appear non-compositional.

GL exploits richer representations and fixes the holes in the compositionality model.

Page 13

Examples of linguistic phenomena that appear to be non-compositional

• intensionality (think),

• binding (she),

• quantification (most),

• interrogatives (who),

• focus (only), and

• presuppositions (the king of France).

Page 14

Compositionality

The meaning of a complex expression is determined by its structure and the meanings of its constituents.

Questions . . .

1. What is the nature of the structure?

2. What is the meaning of a constituent?

3. What counts as a constituent?

Page 15

Challenges to Simple Compositionality

(1) a. Mary began [to eat her breakfast].
    b. Mary began [eating her breakfast].
    c. Mary began [her breakfast].

(2) a. Mary enjoyed her beer.
    b. John enjoys his coffee in the morning.
    c. Bill enjoyed the movie.

Page 16

Challenges to Simple Compositionality

(3) a. The woman baked a potato in the oven.
    b. The woman baked a cake in the oven.

(4) a. John swept.
    b. John swept the floor.
    c. John swept the dirt into the corner.
    d. John swept the dirt off the sidewalk.
    e. John swept the floor clean.
    f. John swept the dirt into a pile.

Page 17

Challenges to Simple Compositionality

Verbs that pattern like sweep in (4): shovel, rake, shave, weed.

(5) a. John whistled.
    b. John whistled at the dog.
    c. John whistled a tune.
    d. John whistled a warning.
    e. John whistled his appreciation.
    f. John whistled to the dog to come.

Verbs that pattern like whistle in (5): yell, snap, whisper.

Page 18

Challenges to Simple Compositionality

(6) Externally Caused Events: break, etc.
    a. The vase broke.
    b. Mary broke the vase.
    c. The storm broke the window.

(7) Internally Caused Events (unaccusatives): decay, bloom, etc.
    a. The flowers bloomed early.
    b. *The gardener bloomed the flowers.

Page 19

Composition = function application

1. What is the nature of the function?

2. What does it apply to; i.e., what can be an argument?

Example derivation:

1. John loves Mary.

2. love(Arg1,Arg2)

3. Apply love(Arg1,Arg2) to Mary

4. love(Arg1,Mary)

5. Apply love(Arg1,Mary) to John

6. love(John,Mary)
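The derivation for "John loves Mary" can be replayed as ordinary function application. This is a sketch under one assumption: the two-place predicate is curried so that it takes the object first and the subject second, matching the order of the steps; the tuple encoding of the result is also just an illustrative choice.

```python
def love(arg2):
    """Curried predicate: takes the object (Arg2) first, then the subject (Arg1)."""
    def with_subject(arg1):
        return ("love", arg1, arg2)
    return with_subject


# Steps 3-4: apply love(Arg1, Arg2) to Mary, giving love(Arg1, Mary).
love_mary = love("Mary")

# Steps 5-6: apply love(Arg1, Mary) to John, giving love(John, Mary).
result = love_mary("John")

print(result)
```

Each step consumes exactly one argument, so the meaning of the sentence is built up one constituent at a time.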

Page 20

Lambda Calculus

(a) e is a type.

(b) t is a type.

(c) If a and b are types, then a -> b is a type.

A simple type tree:

        t
       / \
      e   e->t

Function Application: If α is of type a, and β is of type a -> b, then β(α) is of type b.
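The type rules and Function Application can be checked with a few lines of code. In this sketch the basic types are the strings "e" and "t" and a function type a -> b is the pair (a, b); this encoding and the names `arrow`, `is_type`, and `apply_type` are assumptions made for illustration.

```python
E, T = "e", "t"  # the two basic types


def arrow(a, b):
    """Build the function type a -> b, encoded as a pair."""
    return (a, b)


def is_type(x):
    """Rules (a)-(c): e and t are types; a -> b is a type if a and b are."""
    if x in (E, T):
        return True
    return isinstance(x, tuple) and len(x) == 2 and is_type(x[0]) and is_type(x[1])


def apply_type(fn_type, arg_type):
    """Function Application: a function of type a -> b applied to an a yields a b."""
    if not isinstance(fn_type, tuple):
        raise TypeError("not a function type")
    domain, codomain = fn_type
    if domain != arg_type:
        raise TypeError("argument type does not match the function's domain")
    return codomain


# The simple tree above: an e combines with an e -> t to give a t.
print(apply_type(arrow(E, T), E))

# A transitive verb such as "love" can be typed e -> (e -> t).
love_type = arrow(E, arrow(E, T))
print(apply_type(apply_type(love_type, E), E))
```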

Page 21

Lambda calculus: next time