Text Mining
1. Introduction to Text Mining
Data mining
Data mining: the process of automatically extracting meaningful, useful, previously unknown, and ultimately comprehensible information from large databases.
- Descriptive: understanding underlying processes or behavior
- Predictive: predicting unknown or future values from known data
Why data mining? "We are drowning in data, but starving for knowledge!"
★Cross-Industry Standard Process for Data Mining (CRISP-DM)
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
Text mining
Text mining: a highly interdisciplinary research area that brings together insights from data mining, natural language processing (NLP), machine learning, and information retrieval.
- Text mining = Statistical NLP + Data mining
Relations: DR, IR, DM, TM
Data Retrieval (DR): finds records within a structured database.
- Database type: Structured
- Search mode: Goal-driven
Information Retrieval (IR): finds relevant information in an unstructured information source.
- Database type: Unstructured
- Search mode: Goal-driven
Data Mining (DM): discovers new knowledge through analysis of data.
- Database type: Structured
- Search mode: Opportunistic
Text Mining (TM): discovers new knowledge through analysis of text.
- Database type: Unstructured
- Search mode: Opportunistic
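To make the two goal-driven cases concrete, here is a minimal Python sketch contrasting data retrieval (an exact match on structured records) with information retrieval (ranking free text by relevance to a query). The records, the documents, and the function names `data_retrieval` and `information_retrieval` are hypothetical, and term overlap is only a stand-in for a real relevance score.

```python
# Data retrieval: exact match against structured records (hypothetical toy data).
records = [
    {"id": 1, "name": "Kim", "dept": "Sales"},
    {"id": 2, "name": "Lee", "dept": "R&D"},
]

def data_retrieval(rows, field, value):
    """Return every record whose field equals value exactly."""
    return [row for row in rows if row.get(field) == value]

# Information retrieval: rank unstructured documents by query-term overlap.
documents = [
    "Text mining discovers new knowledge from text",
    "Databases store structured records",
    "Information retrieval finds relevant text documents",
]

def information_retrieval(docs, query):
    """Return (score, doc) pairs with score > 0, best match first."""
    terms = set(query.lower().split())
    scored = []
    for doc in docs:
        # Score = number of distinct query terms that occur in the document.
        score = len(terms & set(doc.lower().split()))
        if score > 0:
            scored.append((score, doc))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

exact = data_retrieval(records, "dept", "R&D")
ranked = information_retrieval(documents, "relevant text")
```

Data retrieval returns either a match or nothing, while information retrieval returns a ranked list of more or less relevant documents. Data mining and text mining differ from both in that no query is given up front; the patterns themselves are what is being looked for.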
The steps of text mining
1. Application understanding
2. Corpus generation
3. Data understanding
4. Text preprocessing
5. Search for patterns / modeling
6. Evaluation
7. Deployment
Text mining process
• Text preprocessing
• Feature generation
• Feature selection
• Core mining operation: Classification (supervised), Clustering (unsupervised)
• Analyzing the results
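The stages above can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the stopword list, the four training sentences, the document-frequency threshold `min_df`, and the nearest-neighbor classifier with cosine similarity are all choices made for this example, not something the notes prescribe. Only the supervised branch (classification) is shown.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "and", "of"}  # tiny illustrative list

def preprocess(text):
    """Text preprocessing: lowercase, tokenize, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def generate_features(tokens):
    """Feature generation: bag-of-words term counts for one document."""
    return Counter(tokens)

def select_features(bags, min_df=2):
    """Feature selection: keep terms that occur in at least min_df documents."""
    df = Counter(term for bag in bags for term in bag)
    return sorted(term for term, count in df.items() if count >= min_df)

def to_vector(bag, vocabulary):
    """Project a bag of words onto the selected vocabulary."""
    return [bag[term] for term in vocabulary]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(vector, labeled_vectors):
    """Core mining operation (supervised): label of the most similar training vector."""
    return max(labeled_vectors, key=lambda pair: cosine(vector, pair[0]))[1]

# Hypothetical toy corpus with labels.
train = [
    ("The team won the football match", "sports"),
    ("Football team signs a new coach", "sports"),
    ("The bank raised interest rates", "finance"),
    ("Interest rates hurt bank profits", "finance"),
]
bags = [generate_features(preprocess(text)) for text, _ in train]
vocab = select_features(bags)
labeled = [(to_vector(bag, vocab), label) for bag, (_, label) in zip(bags, train)]

query = to_vector(generate_features(preprocess("bank interest")), vocab)
prediction = classify(query, labeled)
```

Analyzing the results would then mean checking predictions like this one against held-out labeled documents rather than a single query.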
Text mining applications
• Classification
– Spam detection (filtering)
– Document classification
– Language identification
– Sentiment analysis
• Clustering
– Trend analysis, topic identification
• Web mining
– Trend analysis, opinion mining, ontology creation
• Classical NLP
– Text summarization
– Question answering (e.g., IBM's Watson)
– Information extraction (e.g., event detection in e-mail)
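As one illustration of the classification applications listed above, spam detection can be sketched with a small multinomial Naive Bayes classifier. Naive Bayes is an assumption made for this example (the notes do not name a specific algorithm), and the training messages and the function names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label). Returns the counts needed for prediction."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary

def predict(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled messages.
examples = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("lunch meeting at noon", "ham"),
    ("project report attached", "ham"),
]
model = train_naive_bayes(examples)
label_spam = predict("free prize offer", model)
label_ham = predict("meeting at noon", model)
```

The same train-then-predict structure would carry over to document classification, language identification, or sentiment analysis by changing the labels and the training texts.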