
Short Text Classification Based on LDA and SVM

Chengfang Tan

Abstract


Short text classification differs from traditional document classification in that short texts have highly sparse features, strong context dependency, and considerable noise. To address this problem, this paper proposes a classification method based on LDA (Latent Dirichlet Allocation) and SVM (Support Vector Machine). We use LDA to model the topics of the text corpus, estimate the model parameters indirectly by Gibbs sampling, and determine the optimal number of topics with a standard Bayesian method. Each document is then represented as a probability distribution over a fixed set of latent topics, which yields a latent topic-text matrix on which the SVM classifier is trained. Compared with other classification methods, the experimental results verify the effectiveness and superiority of the proposed method: the feature dimensionality is reduced markedly, and Macro-P, Macro-R, Macro-F, and precision all improve.
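The topic-modelling step described in the abstract can be illustrated with a minimal collapsed Gibbs sampler for LDA. This is only a sketch under stated assumptions, not the author's implementation: the function name lda_gibbs, the toy corpus, the number of topics, and the hyperparameters alpha and beta are all hypothetical, since the abstract gives none of them. The sketch stops at the document-topic matrix; selecting the number of topics and training the SVM are not shown.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns the document-topic matrix.

    docs is a list of token lists. alpha and beta are symmetric Dirichlet
    priors (illustrative values, not taken from the paper).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * n_words for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                    / (n_k[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # theta[d][k]: smoothed share of document d's tokens assigned to topic k.
    return [
        [(n_dk[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


# Hypothetical toy corpus of short texts, already tokenised.
docs = [
    "stock market price shares".split(),
    "market shares fall price".split(),
    "football match goal team".split(),
    "team wins match goal".split(),
]
theta = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of theta is one document expressed as a distribution over topics, so a corpus with thousands of vocabulary terms is reduced to n_topics features per document. In the pipeline the abstract describes, these rows would serve as the feature vectors passed to the SVM classifier.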

Keywords


latent Dirichlet allocation, vector space model, short text classification, feature selection.


