
[Text Mining] Introduction to Text Mining

by Air’s Big Data 2020. 3. 30.

Text Mining

 1. Introduction to Text Mining

 

 

Data mining

Data mining : a process of automatically extracting meaningful, useful, previously unknown and ultimately comprehensible information from large databases.

  • Descriptive: Understanding underlying processes or behavior

  • Predictive: Predicting unknown or future values from known data

Why data mining? We are drowning in data but starving for knowledge!

 

Cross-Industry Standard Process for Data Mining (CRISP-DM)

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

 

Text mining

Text mining : a highly interdisciplinary research area, bringing together research insights from the fields of data mining, natural language processing, machine learning, and information retrieval.

  • Text mining = Statistical NLP + Data mining

Relations : Data Retrieval (DR), Information Retrieval (IR), Data Mining (DM), Text Mining (TM)

 

 

 

 

Data Retrieval : finds records within a structured database.

  • Database type: Structured

  • Search mode: Goal-driven

Information Retrieval : finds relevant information in an unstructured information source.

  • Database type: Unstructured

  • Search mode: Goal-driven

Data Mining : discovers new knowledge through analysis of data.

  • Database type: Structured

  • Search mode: Opportunistic

Text Mining : discovers new knowledge through analysis of text.

  • Database type: Unstructured

  • Search mode: Opportunistic
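
The contrast between goal-driven search over structured and unstructured sources can be sketched as follows. This is a minimal illustration, not a real retrieval engine: the records, the documents, and the helper names `data_retrieval` and `information_retrieval` are all hypothetical, and the relevance score is a crude count of shared terms.

```python
# Data retrieval: exact match against a structured table (hypothetical records).
records = [
    {"id": 1, "author": "kim", "year": 2019},
    {"id": 2, "author": "lee", "year": 2020},
    {"id": 3, "author": "kim", "year": 2020},
]

def data_retrieval(rows, **conditions):
    """Return every row whose fields equal the given conditions exactly."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]

# Information retrieval: rank free-text documents by relevance to a query.
documents = [
    "text mining discovers knowledge from text",
    "databases store structured records",
    "spam filtering is a text classification task",
]

def information_retrieval(docs, query):
    """Rank documents by the number of distinct query terms they contain."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), i) for i, d in enumerate(docs)]
    # Keep only documents with at least one matching term, best score first.
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]

exact_hits = data_retrieval(records, author="kim", year=2020)
ranked_hits = information_retrieval(documents, "text classification")
```

Data retrieval returns a set of records that either match or do not, while information retrieval returns a ranking. Data mining and text mining differ from both in that no query is given up front.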

The steps of text mining

1. Application understanding

2. Corpus generation

3. Data understanding

4. Text preprocessing

5. Search for patterns / modeling

6. Evaluation

7. Deployment

 

 

 

Text mining process

  • Text preprocessing

  • Feature generation

  • Feature selection

  • Core mining operation: Classification (supervised), Clustering (unsupervised)

  • Analyzing the results
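
The five stages above can be sketched end to end on a toy spam/ham task. This is a minimal sketch using only the Python standard library. The stopword list, the example messages, the document-frequency threshold, and the count-overlap classifier are all assumptions chosen for brevity, not a method prescribed by the course.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "for", "and", "you", "your", "at"}  # tiny illustrative list

def preprocess(text):
    """Text preprocessing: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def generate_features(tokens):
    """Feature generation: bag-of-words term counts."""
    return Counter(tokens)

def select_features(feature_dicts, min_df=2):
    """Feature selection: keep terms that occur in at least min_df documents."""
    df = Counter(term for feats in feature_dicts for term in feats)
    return {term for term, n in df.items() if n >= min_df}

def train(feature_dicts, labels, vocabulary):
    """Core mining operation (supervised): sum selected term counts per class."""
    profiles = {}
    for feats, label in zip(feature_dicts, labels):
        profile = profiles.setdefault(label, Counter())
        for term, count in feats.items():
            if term in vocabulary:
                profile[term] += count
    return profiles

def classify(text, profiles, vocabulary):
    """Pick the class whose term profile overlaps most with the document."""
    feats = generate_features(preprocess(text))
    def score(label):
        return sum(c * profiles[label][t] for t, c in feats.items() if t in vocabulary)
    return max(sorted(profiles), key=score)

# Hypothetical labeled corpus.
train_texts = [
    "Win a free prize now",
    "Free prize waiting for you",
    "Meeting at noon tomorrow",
    "Project meeting moved to tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham"]

features = [generate_features(preprocess(t)) for t in train_texts]
vocabulary = select_features(features)
profiles = train(features, train_labels, vocabulary)

# Analyzing the results: accuracy on a small held-out set.
test_texts = ["Claim your free prize", "Team meeting tomorrow"]
test_labels = ["spam", "ham"]
predictions = [classify(t, profiles, vocabulary) for t in test_texts]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

In practice each stage would be replaced by something stronger (stemming, TF-IDF weighting, a statistical feature-selection test, a trained classifier), but the order of the stages stays the same.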

 

Text mining applications

  • Classification

    – Spam detection (filtering)

    – Document classification

    – Language identification

    – Sentiment analysis

  • Clustering

    – Trend analysis, topic identification

  • Web mining

    – Trend analysis, opinion mining, ontology creation

  • Classical NLP

    – Text summarization

    – Question answering, e.g. IBM’s Watson

    – Information extraction, e.g. event detection in e-mail
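
As a counterpart to the classification applications, topic identification by clustering can be sketched without any labels. The example below assumes a simple single-pass "leader" grouping with Jaccard similarity, chosen here only because it is short. The headlines, the threshold, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def leader_cluster(docs, threshold=0.2):
    """Put each document in the first cluster whose leader is similar enough,
    otherwise start a new cluster with this document as its leader."""
    leaders, clusters = [], []
    for i, doc in enumerate(docs):
        tokens = set(doc.lower().split())
        for c, leader in enumerate(leaders):
            if jaccard(tokens, leader) >= threshold:
                clusters[c].append(i)
                break
        else:
            # No existing leader was similar enough.
            leaders.append(tokens)
            clusters.append([i])
    return clusters

headlines = [
    "stock market falls sharply",
    "team wins championship game",
    "stock market rebounds today",
    "championship game draws record crowd",
]
groups = leader_cluster(headlines)
```

The result groups the two finance headlines and the two sports headlines by document index. The outcome depends on document order and on the threshold, which is a known limitation of single-pass grouping.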

 

 

Quizlet : https://quizlet.com/_89awfc?x=1qqt&i=184b21
