Chat Bot Using Microsoft's DialoGPT Model

Hello, Folks!

I am a researcher working on Data Science, Machine Learning, Deep Learning, and Artificial Intelligence on a cloud computing platform.

In this article, we are going to build a conversational bot that answers users' questions in real time. I am going to create a basic conversational bot with the help of Microsoft's DialoGPT model.

Let's first understand what a conversational bot is. There are two ways to define it:

First, in simple words, it is a program that answers users' questions in real time.

Second, in technical terms, a chatbot (short for chatterbot) is an artificial intelligence (AI) feature that can be embedded in and used through any major messaging application.

There are many libraries for building conversational chatbots. If you are a beginner, start with this tutorial, where I demonstrate how to program, train, test, and deploy the model on the Hugging Face platform.

Let's set up the initial requirements

from google.colab import drive
drive.mount('/content/drive/')
Mounted at /content/drive/

Install Transformers

!pip -q install transformers
import os
os.chdir("/content/drive/My Drive")
import glob
import logging
import os
import pickle
import random
import re
import shutil
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd
import torch  # used by the dataset, seeding, and training utilities defined below

from sklearn.model_selection import train_test_split

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from tqdm.notebook import tqdm, trange

from pathlib import Path

from transformers import (
    MODEL_WITH_LM_HEAD_MAPPING,
    WEIGHTS_NAME,
    AdamW,
    AutoConfig,
    PreTrainedModel,
    PreTrainedTokenizer,
    get_linear_schedule_with_warmup,
)


try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter

Now let's get the dataset from Kaggle. (The next cell assumes your kaggle.json API token is saved in the Drive folder we just changed into.)

!mkdir ~/.kaggle
!cp kaggle.json ~/.kaggle/kaggle.json
!kaggle datasets download vedantpandya/harry-potter-1csv -f "Harry Potter 1.csv"
Downloading Harry%20Potter%201.csv to /content/drive/My Drive
  0% 0.00/65.9k [00:00<?, ?B/s]
100% 65.9k/65.9k [00:00<00:00, 8.55MB/s]
!head Harry%20Potter%201.csv
Character; Sentence
Dumbledore; I should've known that you would be here, Professor McGonagall.
McGonagall; Good evening, Professor Dumbledore.
McGonagall; Are the rumors true, Albus?
Dumbledore; I'm afraid so, professor.
Dumbledore; The good and the bad.
McGonagall; And the boy?
Dumbledore; Hagrid is bringing him.
McGonagall; Do you think it wise to trust Hagrid with something as important as this?
Dumbledore; Ah, Professor, I would trust Hagrid with my life.
data = pd.read_csv("Harry%20Potter%201.csv", sep=";")
data.sample(6)
Character Sentence
255 Harry Hagrid, what exactly are these things?
1292 Hagrid I shouldn't have told you that.
931 Hermione Whoo-hoo!
546 Percy They like to change.
1362 Ron Ahh! Harry!
233 Harry Oh, nice to meet you.

I'm using Harry's dialogues for training, pairing each of his responses with the seven lines that precede it as context.

CHARACTER_NAME = 'Harry'
data.rename(columns = {'Character' : 'name', 'Sentence' : 'line'}, inplace = True)
contexted = []
n = 7

for i in data[data.name == CHARACTER_NAME].index:
  if i < n:
    continue
  row = []
  prev = i - 1 - n  # subtract 1 extra so the row contains the current response plus the 7 previous responses
  for j in range(i, prev, -1):
    row.append(data.line[j])
  contexted.append(row)

columns = ['response', 'context'] 
columns = columns + ['context/' + str(i) for i in range(n - 1)]

df = pd.DataFrame.from_records(contexted, columns=columns)
df.sample(6)
response context context/0 context/1 context/2 context/3 context/4 context/5
8 Do you...? It's just, I've never talked to a snake before. Can you hear me? ...watching people press their ugly faces in o... He doesn't understand what it's like, lying th... Sorry about him. He's boring. He's asleep!
136 I found him! Besides, if anyone cared to notice my eyebrows... I don't appreciate the insinuation, Longbottom. You to set my bloody kneecups on fire! No, that's all I need! I'll do the countercurse! How? I can barely stand at all! You have got to start standing up to people, N...
114 Give it here or I'll knock you off your broom! What an idiot. Besides, you don't even know how to fly. You heard what Madam Hooch said. Harry, no way! Bit beyond your reach? What's the matter, Potter? How about on the roof?
37 Just Harry. I mean, I'm just Harry. I mean...l can't be a wizard. No, you've made a mistake. And a thumping good one, I'd wager, once you'r... A wizard. I'm a what? You're a wizard, Harry.
130 Stand there. There. Look in properly. Go on. I only see us. Come on. Come. Come look, it's my parents! Now, come on! There's something you've got to see. Why? Ron, Ron, come on. Get out of bed!
90 Hey, he's gone! I got about six of him. I've got Dumbledore! They've only got one good jump in them to begi... Oh, that's rotten luck. Watch it! I've got about 500 myself. Each pack's got a famous witch or wizard.
trn_df, val_df = train_test_split(df, test_size=0.1)
trn_df.head()
response context context/0 context/1 context/2 context/3 context/4 context/5
38 Dear Mr. Potter, Anything you couldn't explain, when you were a... Well, Just Harry, did you ever make anything h... Just Harry. I mean, I'm just Harry. I mean...l can't be a wizard. No, you've made a mistake. And a thumping good one, I'd wager, once you'r...
138 I knew the name sound familiar. For the discovery of the twelve uses of dragon... Go on! Dumbledore is particularly famous for his defe... I found him! Besides, if anyone cared to notice my eyebrows... I don't appreciate the insinuation, Longbottom. You to set my bloody kneecups on fire!
136 I found him! Besides, if anyone cared to notice my eyebrows... I don't appreciate the insinuation, Longbottom. You to set my bloody kneecups on fire! No, that's all I need! I'll do the countercurse! How? I can barely stand at all! You have got to start standing up to people, N...
24 Get off! Give me that letter! Give me that! Mummy, what's happening? Stop it! No, sir, not one blasted, miserable... Not one single bloody letter. Not one! No blasted letters today! No, sir.
65 He killed my parents, didn't he? You seem very quiet. You all right, Harry? Happy birthday. Harry! Harry! Terrible yes, but great After all, He-Who-Must-Not-Be-Named did great ... But I think it is clear that we can expect gre...
def construct_conv(row, tokenizer, eos = True):
    flatten = lambda l: [item for sublist in l for item in sublist]
    conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
    conv = flatten(conv)
    return conv

class ConversationDataset(Dataset):
    def __init__(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):

        block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

        directory = args.cache_dir
        cached_features_file = os.path.join(
            directory, args.model_type + "_cached_lm_" + str(block_size)
        )

        if os.path.exists(cached_features_file) and not args.overwrite_cache:
            logger.info("Loading features from cached file %s", cached_features_file)
            with open(cached_features_file, "rb") as handle:
                self.examples = pickle.load(handle)
        else:
            logger.info("Creating features from dataset file at %s", directory)

            self.examples = []
            for _, row in df.iterrows():
                conv = construct_conv(row, tokenizer)
                self.examples.append(conv)

            logger.info("Saving features into cached file %s", cached_features_file)
            with open(cached_features_file, "wb") as handle:
                pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, item):
        return torch.tensor(self.examples[item], dtype=torch.long)
def load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False):
    return ConversationDataset(tokenizer, args, df_val if evaluate else df_trn)


def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)


def _sorted_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
    ordering_and_checkpoint_path = []

    glob_checkpoints = glob.glob(os.path.join(args.output_dir, "{}-*".format(checkpoint_prefix)))

    for path in glob_checkpoints:
        if use_mtime:
            ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
        else:
            regex_match = re.match(".*{}-([0-9]+)".format(checkpoint_prefix), path)
            if regex_match and regex_match.groups():
                ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))

    checkpoints_sorted = sorted(ordering_and_checkpoint_path)
    checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
    return checkpoints_sorted


def _rotate_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> None:
    if not args.save_total_limit:
        return
    if args.save_total_limit <= 0:
        return

    checkpoints_sorted = _sorted_checkpoints(args, checkpoint_prefix, use_mtime)
    if len(checkpoints_sorted) <= args.save_total_limit:
        return

    number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - args.save_total_limit)
    checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
    for checkpoint in checkpoints_to_be_deleted:
        logger.info("Deleting older checkpoint [{}] due to args.save_total_limit".format(checkpoint))
        shutil.rmtree(checkpoint)
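To make the preprocessing concrete, here is a minimal sketch of what construct_conv does with a single row (the example lines are made up, and it assumes the DialoGPT tokenizer loaded in the next section): each line is encoded, an end-of-sequence token is appended, the order is reversed back to chronological, and everything is flattened into one list of token IDs.

# Hypothetical row, newest line first, mirroring the response/context columns above.
row = ["I'm just Harry.", "You're a wizard, Harry."]
ids = construct_conv(row, tokenizer)
# ids = tokens("You're a wizard, Harry.") + [EOS] + tokens("I'm just Harry.") + [EOS]
print(tokenizer.decode(ids))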

Now let's Build the Model

DialoGPT comes in three variants: small, medium, and large. I have used microsoft/DialoGPT-small for this chatbot.
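The larger variants only differ by checkpoint name, so swapping one in is a one-line change. A sketch using the imports from the next cell (the medium and large checkpoints need noticeably more GPU memory and training time):

# Hypothetical swap to a larger variant; the rest of the notebook stays the same.
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")  # or "microsoft/DialoGPT-large"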

from transformers import AutoModelWithLMHead, AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/modeling_auto.py:698: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
  FutureWarning,
logger = logging.getLogger(__name__)

MODEL_CONFIG_CLASSES = list(MODEL_WITH_LM_HEAD_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
class Args():
    def __init__(self):
        self.output_dir = 'output-small'
        self.model_type = 'gpt2'
        self.model_name_or_path = 'microsoft/DialoGPT-small'
        self.config_name = 'microsoft/DialoGPT-small'
        self.tokenizer_name = 'microsoft/DialoGPT-small'
        self.cache_dir = 'cached'
        self.block_size = 512
        self.do_train = True
        self.do_eval = True
        self.evaluate_during_training = False
        self.per_gpu_train_batch_size = 4
        self.per_gpu_eval_batch_size = 4
        self.gradient_accumulation_steps = 1
        self.learning_rate = 5e-5
        self.weight_decay = 0.0
        self.adam_epsilon = 1e-8
        self.max_grad_norm = 1.0
        self.num_train_epochs = 16
        self.max_steps = -1
        self.warmup_steps = 0
        self.logging_steps = 1000
        self.save_steps = 3500
        self.save_total_limit = None
        self.eval_all_checkpoints = False
        self.no_cuda = False
        self.overwrite_output_dir = True
        self.overwrite_cache = True
        self.should_continue = False
        self.seed = 42
        self.local_rank = -1
        self.fp16 = False
        self.fp16_opt_level = 'O1'

args = Args()
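The Args class simply replaces command-line arguments with hard-coded values, so you can tweak a run by overriding a few fields after constructing it. A small sketch with assumed values for a quicker trial run (not what was used for the results below):

# Assumed overrides for a shorter, lighter run.
args = Args()
args.num_train_epochs = 4          # fewer passes over the small dialogue set
args.per_gpu_train_batch_size = 2  # lower this if you hit CUDA out-of-memory errors
args.logging_steps = 100           # log the loss more often on a short run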

It's time to Train and Evaluate the Model

Training Function

def train(args, train_dataset, model: PreTrainedModel, tokenizer: PreTrainedTokenizer) -> Tuple[int, float]:
    if args.local_rank in [-1, 0]:
        tb_writer = SummaryWriter()

    args.train_batch_size = args.per_gpu_train_batch_size * max(1, args.n_gpu)

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    train_sampler = RandomSampler(train_dataset) if args.local_rank == -1 else DistributedSampler(train_dataset)
    train_dataloader = DataLoader(
        train_dataset, sampler=train_sampler, batch_size=args.train_batch_size, collate_fn=collate, drop_last = True
    )

    if args.max_steps > 0:
        t_total = args.max_steps
        args.num_train_epochs = args.max_steps // (len(train_dataloader) // args.gradient_accumulation_steps) + 1
    else:
        t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

    model = model.module if hasattr(model, "module") else model 
    model.resize_token_embeddings(len(tokenizer))


    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": args.weight_decay,
        },
        {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0},
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total
    )

    if (
        args.model_name_or_path
        and os.path.isfile(os.path.join(args.model_name_or_path, "optimizer.pt"))
        and os.path.isfile(os.path.join(args.model_name_or_path, "scheduler.pt"))
    ):
        optimizer.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "optimizer.pt")))
        scheduler.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "scheduler.pt")))

    if args.fp16:
        try:
            from apex import amp
        except ImportError:
            raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
        model, optimizer = amp.initialize(model, optimizer, opt_level=args.fp16_opt_level)

    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    if args.local_rank != -1:
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True
        )

    logger.info("**** Running training ****")
    logger.info("  Num examples = %d", len(train_dataset))
    logger.info("  Num Epochs = %d", args.num_train_epochs)
    logger.info("  Instantaneous batch size per GPU = %d", args.per_gpu_train_batch_size)
    logger.info(
        "  Total train batch size (w. parallel, distributed & accumulation) = %d",
        args.train_batch_size
        * args.gradient_accumulation_steps
        * (torch.distributed.get_world_size() if args.local_rank != -1 else 1),
    )
    logger.info("  Gradient Accumulation steps = %d", args.gradient_accumulation_steps)
    logger.info("  Total optimization steps = %d", t_total)

    global_step = 0
    epochs_trained = 0
    steps_trained_in_current_epoch = 0
    if args.model_name_or_path and os.path.exists(args.model_name_or_path):
        try:
            checkpoint_suffix = args.model_name_or_path.split("-")[-1].split("/")[0]
            global_step = int(checkpoint_suffix)
            epochs_trained = global_step // (len(train_dataloader) // args.gradient_accumulation_steps)
            steps_trained_in_current_epoch = global_step % (len(train_dataloader) // args.gradient_accumulation_steps)

            logger.info("  Continuing training from checkpoint, will skip to saved global_step")
            logger.info("  Continuing training from epoch %d", epochs_trained)
            logger.info("  Continuing training from global step %d", global_step)
            logger.info("  Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
        except ValueError:
            logger.info("  Starting fine-tuning.")

    tr_loss, logging_loss = 0.0, 0.0

    model.zero_grad()
    train_iterator = trange(
        epochs_trained, int(args.num_train_epochs), desc="Epoch", disable=args.local_rank not in [-1, 0]
    )
    set_seed(args)
    for _ in train_iterator:
        epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])
        for step, batch in enumerate(epoch_iterator):

            if steps_trained_in_current_epoch > 0:
                steps_trained_in_current_epoch -= 1
                continue

            inputs, labels = (batch, batch)
            if inputs.shape[1] > 1024: continue
            inputs = inputs.to(args.device)
            labels = labels.to(args.device)
            model.train()
            outputs = model(inputs, labels=labels)
            loss = outputs[0]

            if args.n_gpu > 1:
                loss = loss.mean()
            if args.gradient_accumulation_steps > 1:
                loss = loss / args.gradient_accumulation_steps

            if args.fp16:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            tr_loss += loss.item()
            if (step + 1) % args.gradient_accumulation_steps == 0:
                if args.fp16:
                    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                else:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
                optimizer.step()
                scheduler.step()
                model.zero_grad()
                global_step += 1

                if args.local_rank in [-1, 0] and args.logging_steps > 0 and global_step % args.logging_steps == 0:
                    if (
                        args.local_rank == -1 and args.evaluate_during_training
                    ):
                        results = evaluate(args, model, tokenizer)
                        for key, value in results.items():
                            tb_writer.add_scalar("eval_{}".format(key), value, global_step)
                    tb_writer.add_scalar("lr", scheduler.get_lr()[0], global_step)
                    tb_writer.add_scalar("loss", (tr_loss - logging_loss) / args.logging_steps, global_step)
                    logging_loss = tr_loss

                if args.local_rank in [-1, 0] and args.save_steps > 0 and global_step % args.save_steps == 0:
                    checkpoint_prefix = "checkpoint"
                    output_dir = os.path.join(args.output_dir, "{}-{}".format(checkpoint_prefix, global_step))
                    os.makedirs(output_dir, exist_ok=True)
                    model_to_save = (
                        model.module if hasattr(model, "module") else model
                    )
                    model_to_save.save_pretrained(output_dir)
                    tokenizer.save_pretrained(output_dir)

                    torch.save(args, os.path.join(output_dir, "training_args.bin"))
                    logger.info("Saving model checkpoint to %s", output_dir)

                    _rotate_checkpoints(args, checkpoint_prefix)

                    torch.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                    torch.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
                    logger.info("Saving optimizer and scheduler states to %s", output_dir)

            if args.max_steps > 0 and global_step > args.max_steps:
                epoch_iterator.close()
                break
        if args.max_steps > 0 and global_step > args.max_steps:
            train_iterator.close()
            break

    if args.local_rank in [-1, 0]:
        tb_writer.close()

    return global_step, tr_loss / global_step

Evaluation Function

def evaluate(args, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, df_trn, df_val, prefix="") -> Dict:
    eval_output_dir = args.output_dir

    eval_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=True)
    os.makedirs(eval_output_dir, exist_ok=True)
    args.eval_batch_size = args.per_gpu_eval_batch_size * max(1, args.n_gpu)

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    eval_sampler = SequentialSampler(eval_dataset)
    eval_dataloader = DataLoader(
        eval_dataset, sampler=eval_sampler, batch_size=args.eval_batch_size, collate_fn=collate, drop_last = True
    )

    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    logger.info("**** Running evaluation {} ****".format(prefix))
    logger.info("  Num examples = %d", len(eval_dataset))
    logger.info("  Batch size = %d", args.eval_batch_size)
    eval_loss = 0.0
    nb_eval_steps = 0
    model.eval()

    for batch in tqdm(eval_dataloader, desc="Evaluating"):
        inputs, labels = (batch, batch)
        inputs = inputs.to(args.device)
        labels = labels.to(args.device)

        with torch.no_grad():
            outputs = model(inputs, labels=labels)
            lm_loss = outputs[0]
            eval_loss += lm_loss.mean().item()
        nb_eval_steps += 1

    eval_loss = eval_loss / nb_eval_steps
    perplexity = torch.exp(torch.tensor(eval_loss))

    result = {"perplexity": perplexity}

    output_eval_file = os.path.join(eval_output_dir, prefix, "eval_results.txt")
    with open(output_eval_file, "w") as writer:
        logger.info("***** Eval results {} *****".format(prefix))
        for key in sorted(result.keys()):
            logger.info("  %s = %s", key, str(result[key]))
            writer.write("%s = %s\n" % (key, str(result[key])))

    return result
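A quick note on the metric: the batch serves as both inputs and labels (the model shifts the labels internally for next-token prediction), and perplexity is just the exponential of the average cross-entropy loss, so lower is better.

# Relationship used in evaluate(): perplexity = exp(mean loss).
# e.g. an average eval loss of about 1.58 gives exp(1.58) ≈ 4.85,
# roughly the value reported in the run below.
import math
print(math.exp(1.58))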

Main Function

def main(df_trn, df_val):
    args = Args()

    if args.should_continue:
        sorted_checkpoints = _sorted_checkpoints(args)
        if len(sorted_checkpoints) == 0:
            raise ValueError("Used --should_continue but no checkpoint was found in --output_dir.")
        else:
            args.model_name_or_path = sorted_checkpoints[-1]

    if (
        os.path.exists(args.output_dir)
        and os.listdir(args.output_dir)
        and args.do_train
        and not args.overwrite_output_dir
        and not args.should_continue
    ):
        raise ValueError(
            "Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.".format(
                args.output_dir
            )
        )

    device = torch.device("cuda")
    args.n_gpu = torch.cuda.device_count()
    args.device = device

    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,
    )
    logger.warning(
        "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
        args.local_rank,
        device,
        args.n_gpu,
        bool(args.local_rank != -1),
        args.fp16,
    )

    set_seed(args)

    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
    model = AutoModelWithLMHead.from_pretrained(
        args.model_name_or_path,
        from_tf=False,
        config=config,
        cache_dir=args.cache_dir,
    )
    model.to(args.device)

    logger.info("Training/evaluation parameters %s", args)

    if args.do_train:
        train_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False)

        global_step, tr_loss = train(args, train_dataset, model, tokenizer)
        logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

    if args.do_train:
        os.makedirs(args.output_dir, exist_ok=True)

        logger.info("Saving model checkpoint to %s", args.output_dir)
        model_to_save = (
            model.module if hasattr(model, "module") else model
        )
        model_to_save.save_pretrained(args.output_dir)
        tokenizer.save_pretrained(args.output_dir)

        torch.save(args, os.path.join(args.output_dir, "training_args.bin"))

        model = AutoModelWithLMHead.from_pretrained(args.output_dir)
        tokenizer = AutoTokenizer.from_pretrained(args.output_dir)
        model.to(args.device)

    results = {}
    if args.do_eval and args.local_rank in [-1, 0]:
        checkpoints = [args.output_dir]
        if args.eval_all_checkpoints:
            checkpoints = list(
                os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + "/**/" + WEIGHTS_NAME, recursive=True))
            )
            logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN)  # Reduce logging
        logger.info("Evaluate the following checkpoints: %s", checkpoints)
        for checkpoint in checkpoints:
            global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
            prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""

            model = AutoModelWithLMHead.from_pretrained(checkpoint)
            model.to(args.device)
            result = evaluate(args, model, tokenizer, df_trn, df_val, prefix=prefix)
            result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
            results.update(result)

    return results

Now it's time to Run the Main Function

main(trn_df, val_df)
11/29/2021 14:29:48 - WARNING - __main__ -   Process rank: -1, device: cuda, n_gpu: 1, distributed training: False, 16-bits training: False
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/modeling_auto.py:698: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
  FutureWarning,
11/29/2021 14:30:18 - INFO - __main__ -   Training/evaluation parameters <__main__.Args object at 0x7f2db3df8c50>
11/29/2021 14:30:18 - INFO - __main__ -   Creating features from dataset file at cached
11/29/2021 14:30:19 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
11/29/2021 14:30:19 - INFO - __main__ -   **** Running training ****
11/29/2021 14:30:19 - INFO - __main__ -     Num examples = 139
11/29/2021 14:30:19 - INFO - __main__ -     Num Epochs = 16
11/29/2021 14:30:19 - INFO - __main__ -     Instantaneous batch size per GPU = 4
11/29/2021 14:30:19 - INFO - __main__ -     Total train batch size (w. parallel, distributed & accumulation) = 4
11/29/2021 14:30:19 - INFO - __main__ -     Gradient Accumulation steps = 1
11/29/2021 14:30:19 - INFO - __main__ -     Total optimization steps = 544



Epoch:   0%|          | 0/16 [00:00<?, ?it/s]

Iteration:   0%|          | 0/34 [00:00<?, ?it/s]

11/29/2021 14:33:17 - INFO - __main__ -    global_step = 544, average loss = 1.5153655338046306
11/29/2021 14:33:17 - INFO - __main__ -   Saving model checkpoint to output-small
11/29/2021 14:33:22 - INFO - __main__ -   Evaluate the following checkpoints: ['output-small']
11/29/2021 14:33:25 - INFO - __main__ -   Creating features from dataset file at cached
11/29/2021 14:33:25 - INFO - __main__ -   Saving features into cached file cached/gpt2_cached_lm_512
11/29/2021 14:33:25 - INFO - __main__ -   **** Running evaluation  ****
11/29/2021 14:33:25 - INFO - __main__ -     Num examples = 16
11/29/2021 14:33:25 - INFO - __main__ -     Batch size = 4



Evaluating:   0%|          | 0/4 [00:00<?, ?it/s]


11/29/2021 14:33:25 - INFO - __main__ -   ***** Eval results  *****
11/29/2021 14:33:25 - INFO - __main__ -     perplexity = tensor(4.8437)





{'perplexity_': tensor(4.8437)}

Now let's Load the Trained Model

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')
model = AutoModelWithLMHead.from_pretrained('output-small')
/usr/local/lib/python3.7/dist-packages/transformers/models/auto/modeling_auto.py:698: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.
  FutureWarning,
for step in range(4):
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    chat_history_ids = model.generate(
        bot_input_ids, max_length=200,
        pad_token_id=tokenizer.eos_token_id,  
        no_repeat_ngram_size=3,       
        do_sample=True, 
        top_k=100, 
        top_p=0.7,
        temperature=0.8
    )

    print("Bot: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
>> User: Hi
Bot: Hello.
>> User: How are you?
Bot: Goodbye.
>> User: What?
Bot: You?
>> User: How are you?
Bot: I'm fine.
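About the FutureWarning above: AutoModelWithLMHead is deprecated, and the same fine-tuned checkpoint can be loaded with AutoModelForCausalLM instead. A minimal single-turn sketch (generation is sampled with top_k, top_p, and temperature, so replies will differ from the transcript above):

# Non-deprecated way to load and query the fine-tuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')
model = AutoModelForCausalLM.from_pretrained('output-small')

user_ids = tokenizer.encode("Hi" + tokenizer.eos_token, return_tensors='pt')
reply_ids = model.generate(
    user_ids, max_length=200,
    pad_token_id=tokenizer.eos_token_id,
    no_repeat_ngram_size=3,
    do_sample=True, top_k=100, top_p=0.7, temperature=0.8,
)
print("Bot: " + tokenizer.decode(reply_ids[:, user_ids.shape[-1]:][0], skip_special_tokens=True))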

Now let's Push the Model to Hugging Face

os.chdir('/content/')
!pip install huggingface_hub
Requirement already satisfied: huggingface_hub in /usr/local/lib/python3.7/dist-packages (0.1.2)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (4.62.3)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (3.4.0)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (4.8.2)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (6.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (2.23.0)
Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (21.3)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.7/dist-packages (from huggingface_hub) (3.10.0.2)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=20.9->huggingface_hub) (3.0.6)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->huggingface_hub) (3.6.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->huggingface_hub) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->huggingface_hub) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->huggingface_hub) (2021.10.8)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->huggingface_hub) (3.0.4)
!huggingface-cli login
        _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
        _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
        _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
        _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
        _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|


Username: "YOUR_USER_NAME_HERE"
Password: "YOUR_PASSWORD_HERE"
Login successful
Your token has been saved to /root/.huggingface/token
Authenticated through git-crendential store but this isn't the helper defined on your machine.
You will have to re-authenticate when pushing to the Hugging Face Hub. Run the following command in your terminal to set it as the default

git config --global credential.helper store
!huggingface-cli repo create DialoGPT-small-Bot
git version 2.17.1
Looks like you do not have git-lfs installed, please install. You can install it from https://git-lfs.github.com/. Then run `git lfs install` (you only have to do this once)

You are about to create pandyaved98/DialoGPT-small-Bot
Proceed? [Y/n] Y

Your repo now lives at:
https://huggingface.co/pandyaved98/DialoGPT-small-Bot

You can clone it locally with the command below, and commit/push as usual.

git clone https://huggingface.co/pandyaved98/DialoGPT-small-Bot
!sudo apt-get install git-lfs
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following NEW packages will be installed:
  git-lfs
0 upgraded, 1 newly installed, 0 to remove, and 37 not upgraded.
Need to get 2,129 kB of archives.
After this operation, 7,662 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 git-lfs amd64 2.3.4-1 [2,129 kB]
Fetched 2,129 kB in 1s (2,044 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package git-lfs.
(Reading database ... 155222 files and directories currently installed.)
Preparing to unpack .../git-lfs_2.3.4-1_amd64.deb ...
Unpacking git-lfs (2.3.4-1) ...
Setting up git-lfs (2.3.4-1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
!cat /root/.huggingface/token
"YOUR_TOKEN_HERE"
!git clone https://"YOUR_USER_NAME_HERE":"YOUR_TOKEN_HERE"@huggingface.co/pandyaved98/DialoGPT-small-AlchemyBot
Cloning into 'DialoGPT-small-AlchemyBot'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
!ls "/content/drive/My Drive/output-small"
config.json      pytorch_model.bin       tokenizer.json
eval_results.txt  special_tokens_map.json  training_args.bin
merges.txt      tokenizer_config.json    vocab.json
!mv /content/drive/My\ Drive/output-small/* DialoGPT-small-AlchemyBot
os.chdir('DialoGPT-small-AlchemyBot')
!git lfs install
Updated git hooks.
Git LFS initialized.
!ls
config.json      pytorch_model.bin       tokenizer.json
eval_results.txt  special_tokens_map.json  training_args.bin
merges.txt      tokenizer_config.json    vocab.json
!pwd
/content/DialoGPT-small-AlchemyBot
!git status
On branch main
Your branch is up to date with 'origin/main'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

config.json
eval_results.txt
merges.txt
pytorch_model.bin
special_tokens_map.json
tokenizer.json
tokenizer_config.json
training_args.bin
vocab.json

nothing added to commit but untracked files present (use "git add" to track)
!git add .
!git config --global user.email "YOUR_EMAIL_HERE"
!git config --global user.name "YOUR_USER_NAME_HERE"
!git commit -m "Push Model on HuggingFace"
[main af3998c] Push Model on HuggingFace
 9 files changed, 50050 insertions(+)
 create mode 100644 config.json
 create mode 100644 eval_results.txt
 create mode 100644 merges.txt
 create mode 100644 pytorch_model.bin
 create mode 100644 special_tokens_map.json
 create mode 100644 tokenizer.json
 create mode 100644 tokenizer_config.json
 create mode 100644 training_args.bin
 create mode 100644 vocab.json
!git push
Git LFS: (2 of 2 files) 486.76 MB / 486.76 MB
Counting objects: 11, done.
Delta compression uses up to 2 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (11/11), 753.80 KiB | 4.26 MiB/s, done.
Total 11 (delta 2), reused 0 (delta 0)
To https://huggingface.co/pandyaved98/DialoGPT-small-AlchemyBot
   2712e3f..af3998c  main -> main
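With the push complete, the model should be loadable straight from the Hub by its repo id, so others can try the chatbot without retraining. A minimal sketch:

# Load the fine-tuned chatbot directly from the Hugging Face Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("pandyaved98/DialoGPT-small-AlchemyBot")
model = AutoModelForCausalLM.from_pretrained("pandyaved98/DialoGPT-small-AlchemyBot")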

All Done!
