So, as we explained, different text representations tend to enable different analyses. In particular, we can gradually add deeper and deeper analysis results to represent text data, and that opens up more interesting representation opportunities and analysis capacities.

This table summarizes what we have just seen. The first column shows the text representation. The second indicates the generality of such a representation, meaning whether we can produce this kind of representation accurately for all text data or only for some of it. The third column shows the enabled analysis techniques, and the final column shows some examples of applications that can be achieved through this level of representation.

So let's take a look at them. As a string, text can only be processed by string processing algorithms. It's very robust and general, and there are still some interesting applications that can be done at this level. For example, compression of text doesn't necessarily need to know the word boundaries, although knowing word boundaries might also help.

Word-based representation is a very important level of representation. It's quite general and relatively robust, and it enables a lot of analysis techniques, such as word relation analysis, topic analysis, and sentiment analysis. There are many applications that can be enabled by this kind of analysis. For example, thesaurus discovery has to do with discovering related words, and topic and opinion related applications abound. People might be interested in knowing the major topics covered in a collection of text; this can be the case in the research literature, where scientists want to know what the most important research topics are today. Customer service people might want to know the major complaints from their customers by mining their e-mail messages. And business intelligence people might be interested in understanding consumers' opinions about their products and their competitors' products, to figure out what the winning features of their products are. In general, there are many applications that can be enabled by representation at this level.

Now, moving down, we can gradually add additional representations. By adding syntactic structures, we can enable syntactic graph analysis; we can use graph mining algorithms to analyze syntactic graphs. Some applications are related to this kind of representation. For example, stylistic analysis generally requires syntactic structure representation. We can also generate structure-based features, which might help us classify text objects into different categories; by looking at the structures, the classification can sometimes be more accurate. For example, if you want to classify articles into different categories corresponding to different authors, to figure out which of k authors has actually written an article, then you generally need to look at the syntactic structures.

When we add entities and relations, we can enable other techniques such as knowledge graph analysis, or information network analysis in general. This kind of analysis enables applications about entities, for example, discovery of all the knowledge and opinions about a real-world entity. You can also use this level of representation to integrate everything about anything from scattered resources.
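To make the entity-relation level concrete, here is a minimal Python sketch of a tiny "knowledge graph" stored as labeled edges. The entities, relations, and the neighbors helper are invented for illustration, not the output of any real extraction system.

```python
# Minimal entity-relation graph: each edge is (subject, relation, object).
# All entities and relations below are hypothetical examples.
edges = [
    ("Acme Phone", "made_by", "Acme Corp"),
    ("Acme Phone", "has_feature", "long battery life"),
    ("Acme Corp", "competitor_of", "Globex Corp"),
]

def neighbors(entity):
    """Return every stored fact that mentions the entity, in either role."""
    return [(s, r, o) for (s, r, o) in edges if entity in (s, o)]

# "Discover all knowledge about a real-world entity," at toy scale:
for triple in neighbors("Acme Phone"):
    print(triple)
```

Graph mining and network analysis algorithms operate on exactly this kind of structure, just at far larger scale and with facts extracted automatically from text.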
Finally, when we add logical predicates, that enables logic inference. This can be very useful for integrative analysis of scattered knowledge. For example, we can add an ontology on top of the information extracted from text to make inferences. A good example of an application enabled by this level of representation is a knowledge assistant for biologists: a program that can help a biologist manage all the relevant knowledge from the literature about a research problem, such as understanding the functions of genes. The computer can make inferences about some of the hypotheses that the biologist might be interested in, for example, whether a gene has a certain function. The intelligent program can read the literature to extract the relevant facts through information extraction, and then use a logic system to answer the researchers' questions about which genes are related to which functions. In order to support this level of application, we need to go as far as logical representation.

Now, this course covers techniques mainly based on word-based representation. These techniques are general and robust, and thus more widely used in various applications. In fact, in virtually all text mining applications you need this level of representation, and then techniques that support analysis of text at this level. But obviously all these other levels can be combined, and should be combined, in order to support more sophisticated applications.

So to summarize, here are the major takeaway points. Text representation determines what kind of mining algorithms can be applied. There are multiple ways to represent text: strings, words, syntactic structures, entity-relation graphs, knowledge predicates, etc. These different representations should in general be combined in real applications to the extent we can. For example, even if we cannot represent syntactic structures accurately, we can still represent partial structures, and if we can recognize some entities, that would be great. So in general we want to do as much as we can, and when different levels are combined together, we can enable richer, more powerful analysis.

This course, however, focuses on word-based representation. Such techniques have several advantages. First, they are general and robust, so they are applicable to any natural language; that's a big advantage over other approaches that rely on more fragile natural language processing techniques. Second, they do not require much manual effort, and sometimes no manual effort at all; that's again an important benefit, because it means you can apply them directly to any application. Third, these techniques are actually surprisingly powerful and effective for many applications, although not all, of course, as I just explained. They are very effective partly because words were invented by humans as the basic units for communication, so they are actually quite sufficient for representing all kinds of semantics. That makes this kind of word-based representation quite powerful. And finally, such a word-based representation, and the techniques enabled by it, can be combined with many other sophisticated approaches; they are not competing with each other.
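Since the course builds on the word-based level, here is a minimal sketch of that representation in Python: a deliberately naive tokenizer reduces each document to word counts, the starting point for the word relation, topic, and sentiment analyses mentioned above. The tokenizer, the toy documents, and the overlap heuristic are all assumptions made for illustration.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count word tokens (a very naive tokenizer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Toy collection: each document becomes a word-count vector.
docs = [
    "The product is great, but shipping was slow.",
    "A customer complained that shipping was slow and support unhelpful.",
]
vectors = [bag_of_words(d) for d in docs]

# Words shared across documents hint at a recurring topic (here: shipping).
print(set(vectors[0]) & set(vectors[1]))
```

More serious topic or sentiment analysis would add stop-word removal, term weighting, and a statistical model on top of these counts, but the underlying representation is the same word-count vector.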