[PAPER REVIEW 231228] NLP review paper

Sungyeon Kim 2023. 12. 28. 03:21

Natural language processing: state of the art, current trends and challenges

Jul. 2022

 

1. NLP

 

1) Natural Language Understanding (NLU) = Linguistics

 

(1) Phonology - sound

 

(2) Morphology - the smallest units of meaning

   e.g., precancellation -> pre (prefix), cancella (root), -tion (suffix)

   a. Lexical morphemes (e.g., table, chair)

   b. Grammatical morphemes (e.g., Worked, Consulting)

   c. Bound morphemes (e.g., -ed, -ing)

      c-1. inflectional morphemes: change grammatical categories of the word (e.g., tense, number)

      c-2. derivational morphemes: change the semantic meaning of the word
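
The prefix/root/suffix split in the precancellation example can be sketched as simple affix stripping. This is only a toy illustration, not the paper's method: the affix lists and the `segment` function are hypothetical, and a real morphological analyzer would need a full lexicon.

```python
# Hypothetical, tiny affix inventories (assumption); not taken from the paper.
PREFIXES = ("pre", "un", "re")
SUFFIXES = ("tion", "ing", "ed")

def segment(word):
    """Split a word into (prefix, root, suffix); missing parts are ''."""
    word = word.lower()
    # Take the first listed prefix the word starts with, if any.
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    # Take the first listed suffix, as long as something is left for the root.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    root = rest[:len(rest) - len(suffix)] if suffix else rest
    return prefix, root, suffix

print(segment("precancellation"))  # ('pre', 'cancella', 'tion')
print(segment("worked"))           # ('', 'work', 'ed')
print(segment("table"))            # ('', 'table', '')
```

Here "table" comes back unsplit, in line with its role as a lexical morpheme, while "worked" separates into a root and the bound morpheme -ed.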

 

(3) Lexical

   a. part-of-speech

   b. stemming: strip the suffix to get a stem (which may not be a real word)

   c. lemmatization: reduce a word to its correct base form (lemma)
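
The difference between stemming and lemmatization is easier to see side by side. The sketch below is a toy under stated assumptions: the suffix list and the `LEMMAS` lookup table are hypothetical, and real tools rely on much larger rule sets and dictionaries.

```python
# Hypothetical lookup table (assumption); real lemmatizers use a full dictionary.
LEMMAS = {"went": "go", "studies": "study", "worked": "work", "mice": "mouse"}
SUFFIXES = ("ing", "ies", "ed", "es", "s")

def stem(word):
    """Crude stemming: strip the first matching suffix, with no dictionary check."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def lemmatize(word):
    """Toy lemmatization: look the word up, else return it unchanged."""
    return LEMMAS.get(word, word)

for w in ["studies", "went", "worked"]:
    print(w, stem(w), lemmatize(w))
# studies stud study
# went went go
# worked work work
```

The stemmer turns "studies" into the non-word "stud" and cannot handle the irregular "went", while the lookup-based lemmatizer returns a valid base form for both.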

 

(4) Syntax - sentence structure

   -> does not involve stemming or lemmatization

 

(5) Semantics - literal meaning

(6) Pragmatics - inferred meaning

   e.g., "Do you know what time it is?"

   Semantics: "Asking for the current time"

   Pragmatics: "Expressing resentment to someone"

 

2) Natural Language Generation (NLG)

 

(1) Components and Levels of Representation

   a. Content selection

   b. Textual Organization

   c. Linguistic Resources

   d. Realization
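
The four components above can be read as stages of a pipeline. The following is a minimal sketch of that reading, not an implementation from the paper: the input record, its field names, and every function name are hypothetical.

```python
# Hypothetical input record (assumption); field names are illustrative only.
record = {"city": "Seoul", "temp_c": 3, "humidity": 40, "station_id": "X17"}

def select_content(data):
    """Content selection: keep only the facts worth reporting."""
    return {key: data[key] for key in ("city", "temp_c")}

def organize(facts):
    """Textual organization: decide the order in which the facts appear."""
    return [("city", facts["city"]), ("temp_c", facts["temp_c"])]

def choose_words(ordered):
    """Linguistic resources: pick a phrase for each fact."""
    phrases = []
    for key, value in ordered:
        if key == "city":
            phrases.append(f"In {value}")
        elif key == "temp_c":
            phrases.append(f"it is {value} degrees Celsius")
    return phrases

def realize(phrases):
    """Realization: assemble the phrases into the final sentence."""
    return ", ".join(phrases) + "."

def generate(data):
    return realize(choose_words(organize(select_content(data))))

print(generate(record))  # In Seoul, it is 3 degrees Celsius.
```

Each stage consumes the output of the previous one, so humidity and station_id are dropped at content selection and never reach the text.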

 

2. NLP tasks

1) Automatic Summarization

 

2) Co-Reference Resolution - find the different expressions that refer to the same entity and link them to that entity

 

3) Discourse Analysis - identifying the discourse structure of connected text (e.g., chat data)

 

4) Machine translation

 

5) Morphological Segmentation - breaking words into individual meaning-bearing morphemes

 

6) Named entity recognition (NER)

 

7) Optical Character Recognition

 

8) Part Of Speech Tagging
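
To make two of the tasks above concrete, here is a toy sketch of part-of-speech tagging and named entity recognition by dictionary lookup. The lexicon, the gazetteer, the tag names, and both functions are hypothetical; real systems learn these from annotated corpora rather than from hand-written tables.

```python
# Hypothetical mini-lexicon and gazetteer (assumptions), for illustration only.
POS_LEXICON = {"the": "DET", "visited": "VERB", "in": "ADP", "office": "NOUN"}
GAZETTEER = {"Alice": "PERSON", "Seoul": "LOCATION"}

def pos_tag(tokens):
    """Tag tokens by lookup; unknown capitalized tokens become PROPN, others NOUN."""
    tags = []
    for tok in tokens:
        tag = POS_LEXICON.get(tok.lower())
        if tag is None:
            tag = "PROPN" if tok[0].isupper() else "NOUN"
        tags.append((tok, tag))
    return tags

def find_entities(tokens):
    """Return (token, entity type) pairs for tokens listed in the gazetteer."""
    return [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]

tokens = "Alice visited the office in Seoul".split()
print(pos_tag(tokens))
print(find_entities(tokens))  # [('Alice', 'PERSON'), ('Seoul', 'LOCATION')]
```

A lookup approach like this fails on ambiguous or unseen words, which is why both tasks are usually handled with statistical or neural models.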

 

3. Datasets

1) Language Modelling

(1) Salesforce's WikiText-103: 103 million tokens

(2) WikiText-2: 2 million tokens

(3) Penn Treebank portion of the Wall Street Journal corpus: 929,000 training tokens