Abstract:
In recent years, conditional random fields (CRFs) have been widely used for labeling many types of sequence data. When CRFs are used to model context for Chinese part-of-speech tagging, the feature templates expand into hundreds of millions of features. We optimize the feature template set after an in-depth analysis of the context features. We further study how the feature template set relates to training model size and tagging accuracy for Chinese part-of-speech tagging using a maximum entropy model. Experimental results show that the optimized feature template set performs best overall.