transformertc
Submodules
Package Contents
-
transformertc.__version__ = 0.0.1
-
class transformertc.BertTC(config: BertConfig, configtc: ConfigTC, tokenizer: BertTokenizer, model: BertForTokenClassification)
Bases: object
BertTC class: BERT for token classification tasks.
- This class allows:
loading pretrained and/or fine-tuned BERT models;
fine-tuning (really, training) models;
using fine-tuned models for classification (inference).
It acts primarily as a wrapper around a transformer model, its config object, and its tokenizer, bundled together with a handful of useful functions: save/load and fine-tune/classify.
-
config
pretrained BertConfig object from transformers.
- Type
BertConfig
-
tokenizer
pretrained BertTokenizer from transformers.
- Type
BertTokenizer
-
model
pretrained BERT model with the token classification layers added in (but not necessarily trained).
- Type
BertForTokenClassification
-
The constructor sets the attributes above.
-
save_pretrained(self, save_directory: str)
Save to a given directory.
-
to(self, device)
Send model to a specific device.
-
classmethod from_pretrained(cls, model_path)
Load from a given path.
-
classmethod create_from_pretrained(cls, model_name_or_path, labels, max_seq_length=0, task_format='BIO')
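The default task_format='BIO' presumably refers to the standard BIO (begin/inside/outside) tagging scheme. The source does not say whether labels should be bare entity types or full BIO tags, so the sketch below only illustrates the scheme itself. The helper spans_to_bio and the example entity types are hypothetical and not part of transformertc.

```python
from typing import List, Tuple


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end_exclusive, entity_type) token spans to BIO tags.

    Hypothetical helper for illustration; not part of transformertc.
    """
    tags = ["O"] * n_tokens  # every token starts as "outside"
    for start, end, entity_type in spans:
        tags[start] = f"B-{entity_type}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"  # remaining tokens of the entity
    return tags


tokens = ["Ada", "Lovelace", "lived", "in", "London"]
bio_tags = spans_to_bio(len(tokens), [(0, 2, "PER"), (4, 5, "LOC")])
for token, tag in zip(tokens, bio_tags):
    print(token, tag)
```

Each token receives exactly one tag, so a tag sequence has the same length as its token list.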
-
classify(self, texts: List[List[str]], batch_size: int = None, n_jobs: int = -1, progressbar: bool = False)
Classify a list of tokenized texts with this model.
- Parameters
texts (list of list of str) – list of (word) tokenized documents (e.g. sentences). Example: [['This', 'is', '1'], ['And', 'this', 'is', '2']].
batch_size (int) – size of batches to use. Defaults to None which will try to use len(texts) as the batch_size.
n_jobs (int) – number of threads/processes to use when converting texts to features (i.e. InputFeaturesTC). Defaults to -1 which means a number equal to the number of CPU cores.
progressbar (bool) – show a progressbar (via TQDM) for the classification progress.
- Returns
A list of lists of ResultTC corresponding to the list of texts.
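As a rough illustration of the input shape and the batch_size default described above, the following sketch builds texts from plain sentences and splits them into batches. It is not the library's implementation. The whitespace tokenization and the helper iter_batches are assumptions made for the example.

```python
from typing import Iterator, List, Optional


def iter_batches(
    texts: List[List[str]], batch_size: Optional[int] = None
) -> Iterator[List[List[str]]]:
    """Yield consecutive batches of tokenized texts.

    Hypothetical helper: mirrors the documented default, where
    batch_size=None means a single batch of len(texts) items.
    """
    if batch_size is None:
        batch_size = max(len(texts), 1)
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


# Word-tokenize by whitespace (an assumption; any word tokenizer would do).
sentences = ["This is 1", "And this is 2", "Here is 3"]
texts = [sentence.split() for sentence in sentences]

batches_default = list(iter_batches(texts))              # one batch of 3
batches_of_two = list(iter_batches(texts, batch_size=2))  # batches of 2 and 1
print([len(batch) for batch in batches_default])
print([len(batch) for batch in batches_of_two])
```

With the default, all texts go through the model in one batch, which may not fit in memory for large inputs. An explicit batch_size is likely the safer choice there.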
-
finetune(self, dataloader, epochs: int = 4, lr: float = 5e-05, wdecay: float = 0.0, warmup_steps: int = 0, adam_epsilon: float = 1e-08, progressbar: bool = False)
Fine-tune pretrained model on a TC task.
- Parameters
dataloader (DataLoader) – a PyTorch DataLoader.
epochs (int) – number of epochs to fine-tune for.
lr (float) – the learning rate.
wdecay (float) – weight decay.
warmup_steps (int) – number of steps to run linear warmup for.
adam_epsilon (float) – epsilon parameter for Adam optimizer.
progressbar (bool) – use a TQDM progress bar during fine-tuning.
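To make the interaction between lr and warmup_steps concrete, here is a minimal sketch of a linear warmup schedule. The source only states that warmup is linear. The linear decay to zero after warmup is an assumption, chosen because it is a common pairing in BERT fine-tuning. The function lr_at_step and the total_steps value are hypothetical.

```python
def lr_at_step(
    step: int, total_steps: int, base_lr: float = 5e-05, warmup_steps: int = 0
) -> float:
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr over warmup_steps, then (assumed)
    linear decay from base_lr down to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


# Assumed setup: 100 optimizer steps in total, 10 of them warmup.
total_steps = 100
schedule = [lr_at_step(s, total_steps, warmup_steps=10) for s in range(total_steps + 1)]
print(schedule[0], schedule[5], schedule[10], schedule[100])
```

With the default warmup_steps=0 the schedule starts directly at lr. In a typical training loop, total_steps would be epochs multiplied by the number of batches in the dataloader.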