`transformertc`¶

Submodules¶

Package Contents¶

transformertc.__version__ = '0.0.1'¶

class transformertc.BertTC(config: BertConfig, configtc: ConfigTC, tokenizer: BertTokenizer, model: BertForTokenClassification)¶

Bases: object

BertTC class: BERT for token classification tasks.

This class allows:

loading pretrained and/or fine-tuned BERT models;
fine-tuning (really, training) models;
using fine-tuned models for classification (inference).

It acts primarily as a wrapper around a transformer model, its config object, and its tokenizer, bundled with a handful of useful functions: save/load and fine-tune/classify.
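The load, classify, and save round trip can be sketched as follows, using only the method names and parameters documented on this page. The helper name `classify_with_saved_model`, its arguments, and the `"cpu"` device string are illustrative assumptions, and the sketch assumes `from_pretrained` returns a ready-to-use `BertTC` instance. The import is deferred so that defining the function does not require the package to be installed.

```python
from typing import List


def classify_with_saved_model(model_path: str,
                              texts: List[List[str]],
                              save_directory: str,
                              device: str = "cpu"):
    """Load a saved BertTC, classify pre-tokenized texts, then save it again.

    Hypothetical helper: only the BertTC method names and their parameters
    come from this page; everything else is illustrative.
    """
    # Deferred import: transformertc is only needed when the function is called.
    from transformertc import BertTC

    tc = BertTC.from_pretrained(model_path)  # load from a given path
    tc.to(device)  # assumption: a torch-style device string is accepted
    results = tc.classify(texts, batch_size=None, n_jobs=-1, progressbar=False)
    tc.save_pretrained(save_directory)  # save to a given directory
    return results
```

A call would look like `classify_with_saved_model("path/to/model", [['This', 'is', '1']], "path/to/copy")`, where both paths are placeholders.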

config¶

pretrained BertConfig object from transformers.

Type: BertConfig

configtc¶

ConfigTC object.

Type: ConfigTC

tokenizer¶

pretrained BertTokenizer from transformers.

Type: BertTokenizer

model¶

pretrained BERT model with the token classification layers added in (but not necessarily trained).

Type: BertForTokenClassification

The constructor sets the attributes listed above.

save_pretrained(self, save_directory: str)¶: Save to a given directory.

to(self, device)¶: Send model to a specific device.

classmethod from_pretrained(cls, model_path)¶: Load from a given path.

classmethod create_from_pretrained(cls, model_name_or_path, labels, max_seq_length=0, task_format='BIO')¶
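The `task_format='BIO'` default presumably refers to the common Begin/Inside/Outside tagging scheme; this page does not say so explicitly, nor whether `labels` should be passed with or without the `B-`/`I-` prefixes. Assuming the usual scheme, the sketch below shows how labeled token spans map to per-token BIO tags. The helper `spans_to_bio` and the example sentence are hypothetical and not part of `transformertc`.

```python
from typing import List, Tuple


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, label) token spans into BIO tags."""
    tags = ["O"] * n_tokens  # tokens outside every span stay "O"
    for start, end, label in spans:
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"span ({start}, {end}) is out of range")
        tags[start] = f"B-{label}"  # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens of the span
    return tags


tokens = ["Ada", "Lovelace", "lived", "in", "London"]
tags = spans_to_bio(len(tokens), [(0, 2, "PER"), (4, 5, "LOC")])
# tags == ["B-PER", "I-PER", "O", "O", "B-LOC"]
```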

classify(self, texts: List[List[str]], batch_size: int = None, n_jobs: int = -1, progressbar: bool = False)¶

Classify a list of tokenized texts with this model.

Parameters

texts (list of list of str) – list of (word) tokenized documents (e.g. sentences). Example: [['This', 'is', '1'], ['And', 'this', 'is', '2']].
batch_size (int) – size of batches to use. Defaults to None, in which case len(texts) is tried as the batch_size.
n_jobs (int) – number of threads/processes to use when converting texts to features (i.e. InputFeaturesTC). Defaults to -1 which means a number equal to the number of CPU cores.
progressbar (bool) – show a progress bar (via tqdm) while classifying.

Returns

A list of lists of ResultTC corresponding to the list of texts.
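The documented defaults for `batch_size` and `n_jobs` can be illustrated with a small standalone sketch. The helpers `iter_batches` and `resolve_n_jobs` are hypothetical names chosen for this example; they mirror the behavior described in the parameter list and are not taken from the library's source.

```python
import os
from typing import Iterator, List, Optional


def resolve_n_jobs(n_jobs: int = -1) -> int:
    """Map -1 to the number of CPU cores; pass any other value through."""
    if n_jobs == -1:
        return os.cpu_count() or 1  # cpu_count() may return None
    return n_jobs


def iter_batches(texts: List[List[str]],
                 batch_size: Optional[int] = None) -> Iterator[List[List[str]]]:
    """Yield consecutive batches; batch_size=None means one batch of len(texts)."""
    if not texts:
        return
    size = len(texts) if batch_size is None else batch_size
    if size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


texts = [['This', 'is', '1'], ['And', 'this', 'is', '2'], ['Three']]
single_batch = list(iter_batches(texts))                # one batch holding all texts
two_batches = list(iter_batches(texts, batch_size=2))   # batches of 2 and 1 texts
```

Passing all texts as a single batch can exhaust memory on large inputs, so an explicit `batch_size` is usually the safer choice there.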

finetune(self, dataloader, epochs: int = 4, lr: float = 5e-05, wdecay: float = 0.0, warmup_steps: int = 0, adam_epsilon: float = 1e-08, progressbar: bool = False)¶

Fine-tune the pretrained model on a token classification (TC) task.

Parameters

dataloader (DataLoader) – a PyTorch DataLoader.
epochs (int) – number of epochs to fine-tune for.
lr (float) – the learning rate.
wdecay (float) – weight decay.
warmup_steps (int) – number of steps to run linear warmup for.
adam_epsilon (float) – epsilon parameter for Adam optimizer.
progressbar (bool) – show a tqdm progress bar during fine-tuning.
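To make the roles of `lr` and `warmup_steps` concrete, here is a minimal sketch of a linear warmup schedule in plain Python. This page only states that the warmup is linear; the linear decay to zero after warmup is an assumption modeled on `get_linear_schedule_with_warmup` from transformers, and `total_steps` (for instance `epochs * len(dataloader)`) is likewise assumed. The function name `linear_warmup_lr` is hypothetical.

```python
def linear_warmup_lr(step: int, total_steps: int,
                     base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step.

    Rises linearly from 0 to base_lr over warmup_steps, then (assumed)
    decays linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# With warmup_steps=0 (the documented default) the schedule starts at base_lr.
start_lr = linear_warmup_lr(0, total_steps=100)
# With 10 warmup steps, the rate is halfway to base_lr at step 5.
mid_warmup_lr = linear_warmup_lr(5, total_steps=100, warmup_steps=10)
```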
