NLP Techniques: Text Generation, Semantic Search & More
Preprocessing the Dataset
a. Normalize the Text
Python
import re
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')
# Preprocessing Function
def preprocess(text):
    text = re.sub(r'[^a-z\s]', '', text.lower())
    return word_tokenize(text)
# Apply Preprocessing
df['Processed'] = df['Sentence'].apply(preprocess)
print(df)
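The snippet above assumes a DataFrame `df` with a `Sentence` column. As a dependency-free sanity check of the normalization step, `str.split` is used below in place of `word_tokenize`; once punctuation has already been stripped, the two behave the same on simple sentences:

```python
import re

def preprocess(text):
    # Lowercase, then drop everything except letters and whitespace
    text = re.sub(r'[^a-z\s]', '', text.lower())
    # On punctuation-free text, whitespace splitting approximates word_tokenize
    return text.split()

print(preprocess("I LOVED the Service!!"))  # ['i', 'loved', 'the', 'service']
```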
GPT-2 Text Generation
Python
from transformers import pipeline
# Load GPT-2 Model for Text Generation
generator = pipeline('text-generation', model='gpt2')
# Generate Text for a Given Prompt
prompt = "Once upon a time"
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
GPT-2 for AI Prompts
a. Prompt 1 - Future of AI
Python
prompt = "What is the future of AI?"
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
b. Prompt 2 - Importance of Dialogue in AI
Python
prompt = "Explain the importance of dialogue in AI systems."
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
Custom Transformer Model for Dialogue System
Python
import torch.nn as nn
from transformers import BertModel
# Define Custom Transformer Model
class TransformerModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = BertModel.from_pretrained('bert-base-uncased')
        self.fc = nn.Linear(768, 3)  # Sentiment classification (3 classes)

    def forward(self, x):
        out = self.encoder(**x).pooler_output
        return self.fc(out)
# Instantiate the Model
model = TransformerModel()
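The `nn.Linear(768, 3)` head maps BERT's pooled 768-dimensional vector to three class logits, and the predicted class is the argmax over those logits. A dependency-free sketch of that final step (the label order here is an illustrative assumption, not something the model fixes):

```python
def logits_to_label(logits, labels=('negative', 'neutral', 'positive')):
    # Pick the index of the largest logit and map it to a sentiment label
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]

print(logits_to_label([-0.4, 0.2, 1.7]))  # 'positive'
```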
Text Categorization (Positive, Negative, Neutral)
Python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
# Train Naive Bayes Model
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['Sentence'])
model = MultinomialNB()
model.fit(X, df['Sentiment'])
# Test Prediction
print(model.predict(vectorizer.transform(["I love it!"])))
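Since `df` is not defined in this snippet, here is a self-contained version with a small made-up dataset (the sentences and labels are illustrative only):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny illustrative dataset (made up for this sketch)
sentences = ["I love it", "Absolutely wonderful", "I hate it", "Truly awful", "It is okay"]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral"]

# Bag-of-words features + multinomial Naive Bayes
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(sentences)
clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["I love it!"])))  # ['Positive']
```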
Semantic Search with Sentence-BERT
Python
from sentence_transformers import SentenceTransformer, util
# Load Pre-trained Sentence-BERT Model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Encode Corpus and Query
corpus_embeddings = model.encode(df['Sentence'].tolist())
query = "I enjoyed the service."
query_embedding = model.encode(query)
# Find Similar Sentences
distances = util.pytorch_cos_sim(query_embedding, corpus_embeddings)
print(df.iloc[distances.argmax().item()])
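`util.pytorch_cos_sim` scores every corpus embedding against the query embedding by cosine similarity, and `argmax` picks the closest match. The underlying computation can be sketched without the library, using toy 2-d "embeddings" made up for illustration:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity: dot product divided by the product of vector norms
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

query = [1.0, 0.2]
corpus = [[0.9, 0.1], [0.1, 1.0]]
scores = [cosine_sim(query, emb) for emb in corpus]
print(scores.index(max(scores)))  # 0 -> the first corpus entry is closest
```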
K-Means Clustering with TF-IDF
Python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['Sentence'])
model = KMeans(n_clusters=3)
clusters = model.fit_predict(X)
df['Cluster'] = clusters
print(df)
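Again, `df` is undefined in the snippet above. A self-contained run on a made-up four-sentence corpus (two topics, two clusters), with `random_state` fixed so the result is repeatable:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Made-up corpus: two sentences per topic
sentences = [
    "cats purr softly",
    "cats sleep often",
    "stock prices fell",
    "stock markets rose",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)
model = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = model.fit_predict(X)
# The two cat sentences share one label, the two stock sentences the other
print(clusters)
```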
GPT-2 Creative Text Generation
Python
# Generate Creative Text Based on Prompt
prompt = "Once there was a dragon"
result = generator(prompt, max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
Impact of Prompts on GPT Outputs
Different Prompts for Text Generation
Python
from transformers import pipeline
# Load pre-trained model and tokenizer
generator = pipeline("text-generation", model="gpt2")
# Define prompts
prompts = [
"Tell a story about a robot.",
"Describe a futuristic city.",
"Write a poem about love."
]
# Generate Outputs for Each Prompt
for prompt in prompts:
    result = generator(prompt, max_length=50, num_return_sequences=1, truncation=True)
    print(f"Prompt: {prompt}\nOutput: {result[0]['generated_text']}\n")