API Documentation
API documentation (Javadoc) for the latest release:
ClearTK 2.0.0 API
You can also view the ClearTK 1.4.1 API documentation for the previous release, though 1.4.1 is no longer supported.
Mailing Lists
The best place to ask questions about ClearTK is the main mailing list:
ClearTK-Users
We also have a separate mailing list, ClearTK-Developers, primarily for those with commit access to the ClearTK repository.
Maven Setup
ClearTK is built with Apache Maven, and we strongly recommend that you build your ClearTK-based projects with Maven as well. See the Maven Getting Started Guide if you are not already familiar with Maven.
To use ClearTK in your Maven-based project, add something like the following to your pom.xml:
<properties>
  <cleartk.version>2.0.0</cleartk.version>
</properties>
...
<dependencies>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-ml</artifactId>
    <version>${cleartk.version}</version>
  </dependency>
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-ml-liblinear</artifactId>
    <version>${cleartk.version}</version>
  </dependency>
  ...
</dependencies>
The example above declares dependencies on cleartk-ml, the main ClearTK machine learning interfaces, and cleartk-ml-liblinear, one of the more efficient implementations of these interfaces. For the full listing of ClearTK modules, see below.
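Other modules from the listing can presumably be declared the same way. The fragment below is a minimal sketch, not an official configuration: it assumes that every module is published under the org.cleartk group ID at the same version as the example above (this page implies that but does not state it), and that the cleartk.version property is already defined in your pom.xml. The choice of cleartk-eval and cleartk-opennlp-tools is only illustrative.

```xml
<dependencies>
  <!-- Core machine learning APIs, as in the example above -->
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-ml</artifactId>
    <version>${cleartk.version}</version>
  </dependency>
  <!-- Assumption: cleartk-eval uses the same groupId and version as cleartk-ml -->
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-eval</artifactId>
    <version>${cleartk.version}</version>
  </dependency>
  <!-- Assumption: cleartk-opennlp-tools uses the same groupId and version as cleartk-ml -->
  <dependency>
    <groupId>org.cleartk</groupId>
    <artifactId>cleartk-opennlp-tools</artifactId>
    <version>${cleartk.version}</version>
  </dependency>
</dependencies>
```

Keeping the version in a single property means that all ClearTK modules in the project move to a new release together when the property is changed.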
ClearTK Modules
Machine learning
- cleartk-ml: the core machine learning APIs: classifiers, features, instances, feature extractors, feature encoders, etc.
- cleartk-ml-crfsuite: wrappers to use CRFsuite as a ClearTK classifier
- cleartk-ml-liblinear: wrappers to use LIBLINEAR as a ClearTK classifier
- cleartk-ml-libsvm: wrappers to use LIBSVM as a ClearTK classifier
- cleartk-ml-mallet: wrappers to use MALLET and GRMM as ClearTK classifiers
- cleartk-ml-opennlp-maxent: wrappers to use OpenNLP-maxent as a ClearTK classifier
- cleartk-ml-svmlight: wrappers to use SVM-light as a ClearTK classifier
- cleartk-ml-tksvmlight: wrappers to use Tree Kernels in SVM-light as a ClearTK classifier
Evaluation
- cleartk-eval: classes for evaluating pipelines - train/test, cross-validation, etc.
Type system
- cleartk-type-system: the official UIMA type system for ClearTK
- cleartk-corpus: readers and writers that operate on corpora using the ClearTK type system, including support for ACE 2005, CoNLL 2003, CoNLL 2005, Genia, PennTreebank, PropBank, TimeML, etc.
- cleartk-feature: feature extractors for the ClearTK type system, e.g. for extracting features from ClearTK tokens or ClearTK parse trees
Wrappers for external components
- cleartk-snowball: a wrapper around the Snowball stemmer
- cleartk-opennlp-tools: wrappers around the OpenNLP sentence segmenter, part-of-speech tagger, and syntactic parser
- cleartk-berkeleyparser: a wrapper around the Berkeley syntactic parser
- cleartk-clearnlp: wrappers around ClearNLP, the successor to clearparser, covering its tokenizer, POS tagger, morphological analyzer (lemmatizer), dependency parser, and semantic role labeler
- cleartk-maltparser: a wrapper around the Malt dependency parser
- cleartk-stanford-corenlp: a wrapper around the Stanford CoreNLP sentence segmenter, tokenizer, part-of-speech tagger, named-entity tagger, syntactic parser, dependency parser and coreference resolution system.
Home-grown components
- cleartk-token: sentence segmenters, tokenizers and part-of-speech taggers, including a segmenter based on java.text.BreakIterator and a PennTreebankTokenizer
- cleartk-timeml: models for extracting events, times and temporal relations, trained on TempEval 2013 data
Utility modules
- cleartk-test-util: test case base classes and utilities for testing licenses, parameter names, etc.
- cleartk-util: simple type-system agnostic readers and writers, and various utilities used by other ClearTK modules
Example code
Note: The cleartk-examples module is provided only as source code, and the code may change at any time. Never add cleartk-examples as a dependency.
- cleartk-examples: example code for part-of-speech tagging, BIO-chunking, bag-of-words document classification, etc.