Machine Learning for UIMA

ClearTK is a framework for developing machine learning and natural language processing components within the Apache Unstructured Information Management Architecture (UIMA).


Train Classifiers

ClearTK provides a common interface and wrappers for popular machine learning libraries such as SVMlight, LIBSVM, LIBLINEAR, OpenNLP MaxEnt, and Mallet. ClearTK’s interfaces support simple classification, sequence classification, BIO-style chunking classification, and more.
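BIO-style chunking turns the task of finding multi-token spans (for example, person or location names) into ordinary per-token classification: the first token of a chunk is labeled B-TYPE, later tokens I-TYPE, and everything else O. The standalone Java sketch below illustrates that encoding and the reverse decoding of predicted labels back into chunks. The class and method names (`BioEncoder`, `Chunk`, `encode`, `decode`) are hypothetical and are not ClearTK’s own API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch, not ClearTK's API: BIO encoding and decoding
// of chunks over a tokenized sentence.
final class BioEncoder {

    /** A chunk covering token indices [begin, end) with a type such as "PER". */
    static final class Chunk {
        final int begin;
        final int end;
        final String type;

        Chunk(int begin, int end, String type) {
            this.begin = begin;
            this.end = end;
            this.type = type;
        }
    }

    /** Produces one label per token; assumes the chunks do not overlap. */
    static List<String> encode(int numTokens, List<Chunk> chunks) {
        String[] labels = new String[numTokens];
        Arrays.fill(labels, "O"); // tokens outside every chunk
        for (Chunk c : chunks) {
            for (int i = c.begin; i < c.end; i++) {
                // First token of a chunk is B-TYPE, the rest are I-TYPE.
                labels[i] = (i == c.begin ? "B-" : "I-") + c.type;
            }
        }
        return Arrays.asList(labels);
    }

    /** Rebuilds chunks from per-token labels, e.g. a classifier's predictions. */
    static List<Chunk> decode(List<String> labels) {
        List<Chunk> chunks = new ArrayList<>();
        int start = -1;
        String type = null; // type of the currently open chunk, or null if none
        // Iterate one step past the end so a chunk ending at the last token is closed.
        for (int i = 0; i <= labels.size(); i++) {
            String label = i < labels.size() ? labels.get(i) : "O";
            boolean continues = type != null && label.equals("I-" + type);
            if (type != null && !continues) {
                chunks.add(new Chunk(start, i, type));
                type = null;
            }
            // B-X always opens a chunk; a stray I-X with no matching open chunk is treated as B-X.
            if (label.startsWith("B-") || (label.startsWith("I-") && !continues)) {
                start = i;
                type = label.substring(2);
            }
        }
        return chunks;
    }
}
```

For a five-token sentence with a PER chunk over tokens 0 to 1 and a LOC chunk over token 3, `encode` yields B-PER, I-PER, O, B-LOC, O, and `decode` recovers the same two chunks. Treating a stray I-X as the start of a new chunk is one common convention for repairing inconsistent predictions; other decoders simply discard such tokens.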

Extract Features

ClearTK provides a rich feature extraction library that can be used with any of the supported machine learning classifiers. Under the covers, ClearTK knows the input format each native machine learning library expects and translates your features into the format appropriate to whatever model you’re using.
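As an illustration of what such a translation involves, the sketch below converts named, mixed-type features into one line of the sparse SVMlight input format (`<target> <index>:<value> ...`, with feature indices in increasing order). It assumes a common scheme in which numeric features keep their value and any other value becomes a binary indicator feature named `name=value`; this is not a description of ClearTK’s internal encoders, and `SvmLightEncoder`, `indexOf`, `addFeature`, and `encode` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone sketch, not ClearTK's API: translating named,
// mixed-type features into the sparse SVMlight input format.
final class SvmLightEncoder {
    // Stable mapping from feature name to a positive integer index (starting at 1).
    private final Map<String, Integer> featureIndex = new HashMap<>();

    int indexOf(String name) {
        Integer index = featureIndex.get(name);
        if (index == null) {
            index = featureIndex.size() + 1;
            featureIndex.put(name, index);
        }
        return index;
    }

    /**
     * Numeric features keep their value; any other value becomes a binary
     * indicator feature named "name=value" with value 1.0.
     */
    static void addFeature(Map<String, Double> vector, String name, Object value) {
        if (value instanceof Number) {
            vector.put(name, ((Number) value).doubleValue());
        } else {
            vector.put(name + "=" + value, 1.0);
        }
    }

    /** Builds one line "<label> <index>:<value> ..." with indices in increasing order. */
    String encode(String label, Map<String, Double> vector) {
        // TreeMap keeps the integer indices sorted, as the SVMlight format requires.
        TreeMap<Integer, Double> byIndex = new TreeMap<>();
        for (Map.Entry<String, Double> f : vector.entrySet()) {
            byIndex.put(indexOf(f.getKey()), f.getValue());
        }
        StringBuilder line = new StringBuilder(label);
        for (Map.Entry<Integer, Double> e : byIndex.entrySet()) {
            if (e.getValue() != 0.0) { // sparse format: zero values are simply left out
                line.append(' ').append(e.getKey()).append(':').append(e.getValue());
            }
        }
        return line.toString();
    }
}
```

The same name-to-index table must be reused when classifying new data, otherwise the indices seen at prediction time would not match those the model was trained on; that bookkeeping is exactly the kind of detail a wrapper can hide from feature-extraction code.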

Parse Language

ClearTK provides UIMA wrappers for common natural language processing (NLP) tools, including the Snowball stemmer, OpenNLP tools, the MaltParser dependency parser, and Stanford CoreNLP. It also provides UIMA readers for corpora including the Penn Treebank, ACE 2005, CoNLL 2003, Genia, TimeBank, and TempEval.