Computational Lexicography, Feb. 2012, Anca Dinu, University of Bucharest

  • Computational Lexicography

    Feb., 2012

    Anca Dinu

    University of Bucharest

  • Introduction

    Computational lexicology is the use of computers in the study of the lexicon (theoretical).

    Computational lexicography means the creation of machine-readable dictionaries (MRDs) (practical).

    The two terms are sometimes used as synonyms.

  • Introduction

    MRDs are essential resources for NLP (summarization, question answering, inference, etc.).

    They are even more important for languages with rich morphology.

    They have a generative component that builds the inflected forms starting from lemmas and formation rules.
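
    As a rough illustration of such a generative component, the sketch below builds inflected forms from a lemma and a few suffix rules. The rules are simplified English noun-plural rules chosen only for the example, and the names pluralize and paradigm are hypothetical; a real MRD for a morphologically rich language would need far more rules and exception lists.

```python
# Toy generative component: inflected forms are built from a lemma plus rules.
# The rules are simplified English noun-plural rules used for illustration
# only; they are an assumption, not part of the original slides.

def pluralize(lemma: str) -> str:
    """Build the plural form of an English noun lemma from suffix rules."""
    if lemma.endswith(("s", "x", "z", "ch", "sh")):
        return lemma + "es"
    if lemma.endswith("y") and len(lemma) > 1 and lemma[-2] not in "aeiou":
        return lemma[:-1] + "ies"
    return lemma + "s"


def paradigm(lemma: str) -> dict:
    """Return the (tiny) inflectional paradigm generated for one lemma."""
    return {"singular": lemma, "plural": pluralize(lemma)}


if __name__ == "__main__":
    for lemma in ["dictionary", "lexicon", "box", "day"]:
        print(paradigm(lemma))
```

    Only the lemma is stored; every other form is derived on demand, which is what keeps the dictionary small when each lemma has many inflected forms.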

  • Directions in CL

    Corpus annotation (usually in XML):

    Markup languages allow the creation of corpora annotated in a standardized way, from which data for building lexicons (argument structure, thematic roles, etc.) can then be extracted automatically or semi-automatically (Corpus Pattern Analysis).
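
    A minimal sketch of automatic extraction from an XML-annotated corpus follows. The tag and attribute names (s, pred, arg, lemma, role) form a hypothetical annotation scheme invented for this example, and extract_frames is an illustrative helper, not an existing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation scheme (an assumption for this example):
# <s> is a sentence, <pred> marks the predicator, <arg role="..."> an argument.
ANNOTATED = """
<corpus>
  <s><arg role="agent">The man</arg> <pred lemma="paint">painted</pred> <arg role="patient">the wall</arg>.</s>
  <s><arg role="agent">Mary</arg> <pred lemma="break">broke</pred> <arg role="patient">the vase</arg>.</s>
</corpus>
"""


def extract_frames(xml_text):
    """Collect, for every predicate lemma, the role sequences observed with it."""
    lexicon = {}
    for sentence in ET.fromstring(xml_text.strip()).iter("s"):
        pred = sentence.find("pred")
        if pred is None:
            continue  # sentence without an annotated predicator
        roles = tuple(arg.get("role") for arg in sentence.findall("arg"))
        lexicon.setdefault(pred.get("lemma"), set()).add(roles)
    return lexicon


if __name__ == "__main__":
    print(extract_frames(ANNOTATED))
```

    The result maps each lemma to the argument structures seen in the corpus, which is the kind of data a lexicon entry could be built from.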

  • Directions in CL

    Building Lexical Knowledge Bases (LKBs).

    They contain the same information as a printed dictionary, plus syntactic, semantic and relational information (ontologies).

    E.g. WordNet, FrameNet

  • WordNet

    Nouns, verbs, adjectives and adverbs are

    grouped into sets of cognitive synonyms

    (synsets), each expressing a distinct

    concept. Synsets are interlinked by means of

    conceptual-semantic and lexical relations.

    The resulting network of meaningfully related

    words and concepts can be navigated with

    the browser.
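
    To make the idea of navigating interlinked synsets concrete, here is a small in-memory sketch. The synset identifiers, member words and hypernym links are toy data invented for the example; they are not copied from the WordNet database and this is not WordNet's API.

```python
# Toy synset network. Identifiers, member words and links are invented for
# illustration (an assumption), not taken from the actual WordNet data.
SYNSETS = {
    "dog.n": {"words": {"dog", "domestic dog"}, "hypernym": "canine.n"},
    "canine.n": {"words": {"canine", "canid"}, "hypernym": "animal.n"},
    "animal.n": {"words": {"animal", "beast"}, "hypernym": None},
}


def synsets_for(word):
    """Return the ids of all synsets that contain the given word."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s["words"])


def hypernym_path(synset_id):
    """Follow hypernym links from a synset up to the root of the toy network."""
    path = []
    current = synset_id
    while current is not None:
        path.append(current)
        current = SYNSETS[current]["hypernym"]
    return path


if __name__ == "__main__":
    print(synsets_for("dog"))
    print(hypernym_path("dog.n"))
```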

  • Annotation: general principles

    Annotation schemata should focus on a single coherent theme:

    different linguistic phenomena should be annotated separately over the same corpus.

    Annotations must be consistent with each other:

    unification and merging of multiple annotations is necessary (standard XML).
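
    One way to picture separate annotation layers that are later merged is stand-off annotation over shared token offsets, sketched below. The span convention, the layer names and the role labels are assumptions made for the example; merge_layers and render are hypothetical helpers.

```python
# Two independent annotation layers over the same token sequence.
# Spans are (start, end) token offsets with end exclusive. The role labels
# and this span convention are assumptions made for the example.
TOKENS = ["The", "man", "fired", "the", "gun"]

ROLE_LAYER = {(0, 2): "agent", (3, 5): "patient"}
TYPE_LAYER = {(0, 2): "human", (3, 5): "firearm"}


def merge_layers(**layers):
    """Unify several layers into one mapping: span -> {layer name: label}."""
    merged = {}
    for name, layer in layers.items():
        for span, label in layer.items():
            merged.setdefault(span, {})[name] = label
    return merged


def render(tokens, merged):
    """Replace token offsets by the text they cover, for readability."""
    return {" ".join(tokens[s:e]): labels for (s, e), labels in sorted(merged.items())}


if __name__ == "__main__":
    print(render(TOKENS, merge_layers(role=ROLE_LAYER, type=TYPE_LAYER)))
```

    Because both layers refer to the same offsets, each phenomenon can be annotated on its own and still be combined without touching the corpus text.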

  • Examples of Semantic Annotation

    Predicators and their named arguments:

    [The man]_agent painted [the wall]_patient.

    Anaphors and their antecedents:

    [The protein] inhibits growth in yeast. [It] blocks

    production . . .

    Acronyms and their long forms:

    [Platelet-derived growth factor] (known as [pdgf])

    impacts . . .

    Semantic Typing of entities:

    [The man]_human fired [the gun]_firearm.

  • Problems with existing LKBs

    The traditional organization of lexicons is static, i.e. it assumes that the meaning of a word can be defined exhaustively by an enumeration of its senses (a sense list).

    Consequently, when a natural-language interpretation task runs into lexical ambiguity, the system tries to select the closest definition from the list provided by the lexicon.
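
    The selection step can be sketched as follows, assuming a toy entry for bake with two listed senses and a simple word-overlap score. The entry, the cue words and the scoring heuristic are all assumptions made for illustration, not a method described in the slides.

```python
# Toy sense-enumeration lexicon. The entry, its cue words and the overlap
# heuristic are assumptions made for illustration.
LEXICON = {
    "bake": [
        ("change of state", {"heat", "cook", "potato", "food"}),
        ("creation", {"make", "create", "cake", "bread"}),
    ],
}


def select_sense(word, context):
    """Pick the listed sense whose cue words overlap most with the context."""
    context_words = set(context.lower().replace(".", "").split())
    best_label, best_score = None, 0
    for label, cues in LEXICON.get(word, []):
        score = len(cues & context_words)
        if score > best_score:
            best_label, best_score = label, score
    return best_label  # None when no listed sense matches the context


if __name__ == "__main__":
    print(select_sense("bake", "The woman baked a cake in the oven."))
    print(select_sense("bake", "The woman baked a pie."))
```

    The second call returns None: a context that was not anticipated in the list simply falls outside the lexicon's coverage.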

  • Problems with existing LKBs

    Two disadvantages:

    All the possible contexts in which a word can occur must be specified a priori; otherwise coverage is incomplete.

    The creative use of words in new contexts cannot be explained or predicted.

  • Solution: Generative Lexicon (GL)

    James Pustejovsky: 1995 (the book The Generative Lexicon), 2001, 2005

    To read for next time: the article Type Theory and Lexical Decomposition

  • Assumptions for GL

    Language meaning is compositional.

    Compositionality is a desirable property of a semantic model.

    Many linguistic phenomena appear non-compositional.

    GL exploits richer representations and fixes the holes in the compositionality model.

  • Examples of linguistic phenomena that appear to be non-compositional

    intensionality (think),

    binding (she),

    quantification (most),

    interrogatives (who),

    focus (only), and

    presuppositions (the king of France).

  • Compositionality

    The meaning of a complex expression is determined by its structure and the meanings of its constituents.

    Questions . . .

    1. What is the nature of the structure?

    2. What is the meaning of a constituent?

    3. What counts as a constituent?

  • Challenges to Simple Compositionality

    (1) a. Mary began [to eat her breakfast].

    b. Mary began [eating her breakfast].

    c. Mary began [her breakfast].

    (2) a. Mary enjoyed her beer.

    b. John enjoys his coffee in the morning.

    c. Bill enjoyed the movie.

  • Challenges to Simple Compositionality

    (3) a. The woman baked a potato in the oven.

    b. The woman baked a cake in the oven.

    (4) a. John swept.

    b. John swept the floor.

    c. John swept the dirt into the corner.

    d. John swept the dirt off the sidewalk.

    e. John swept the floor clean.

    f. John swept the dirt into a pile.

  • Challenges to Simple Compositionality

    shovel, rake, shave, weed.

    (5) a. John whistled.

    b. John whistled at the dog.

    c. John whistled a tune.

    d. John whistled a warning.

    e. John whistled his appreciation.

    f. John whistled to the dog to come.

    yell, snap, whisper.

  • Challenges to Simple Compositionality

    (6) Externally Caused Events: break, etc.

    a. The vase broke.

    b. Mary broke the vase.

    c. The storm broke the window.

    (7) Internally Caused Events (unaccusatives): decay, bloom, etc.

    a. The flowers bloomed early.

    b. *The gardener bloomed the flowers.

  • Composition = function application

    1. What is the nature of the function?

    2. What does it apply to, i.e. what can be an argument?

    Example:

    1. John loves Mary.

    2. love(Arg1,Arg2)

    3. Apply love(Arg1,Arg2) to Mary

    4. love(Arg1,Mary)

    5. Apply love(Arg1,Mary) to John

    6. love(John,Mary)
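
    The six steps above can be mimicked with a curried function that takes one argument at a time, the object first and then the subject. This is only a sketch: the string output and the helper names love_arg2 and loves_mary are illustrative choices, not notation from the slides.

```python
def love(arg2):
    """Two-place predicate, curried: takes the object first, then the subject."""
    def love_arg2(arg1):
        return f"love({arg1},{arg2})"
    return love_arg2


# Step 3-4: apply love(Arg1,Arg2) to Mary, giving love(Arg1,Mary).
loves_mary = love("Mary")

# Step 5-6: apply love(Arg1,Mary) to John, giving love(John,Mary).
sentence = loves_mary("John")

if __name__ == "__main__":
    print(sentence)
```

    The intermediate value loves_mary is itself a function, which is the point of the derivation: the verb combined with its object still waits for a subject.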

  • Lambda calculus

    (a) e is a type.

    (b) t is a type.

    (c) If a and b are types, then a -> b is a type.

    A simple type tree:

          t
         / \
        e   e->t

    Function Application: If α is of type a, and β is of type a -> b, then β(α) is of type b.
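
    The type rules and the function application rule can be checked mechanically. The sketch below is one possible encoding, assuming basic types are represented as the strings "e" and "t" and function types as a small dataclass; Fn, apply_type and show are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fn:
    """Function type a -> b (rule c)."""
    arg: object
    res: object


E = "e"  # basic type e (rule a)
T = "t"  # basic type t (rule b)


def apply_type(fn_type, arg_type):
    """Function application: given a -> b and a, return b; otherwise raise."""
    if not isinstance(fn_type, Fn) or fn_type.arg != arg_type:
        raise TypeError(f"cannot apply {fn_type} to {arg_type}")
    return fn_type.res


def show(ty):
    """Pretty-print a type, parenthesizing function types on the left."""
    if isinstance(ty, Fn):
        left = f"({show(ty.arg)})" if isinstance(ty.arg, Fn) else show(ty.arg)
        return f"{left} -> {show(ty.res)}"
    return ty


if __name__ == "__main__":
    intransitive = Fn(E, T)        # e -> t, as in the type tree
    transitive = Fn(E, Fn(E, T))   # e -> e -> t, e.g. a two-place verb
    print(show(apply_type(intransitive, E)))
    print(show(apply_type(apply_type(transitive, E), E)))
```

    Applying e -> t to e yields t, which corresponds to the small type tree: the two daughters e and e->t combine into the mother node t.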

  • Lambda calculus

    next time