AI Research Scientist Lesson Plan
Teaching English for AI Professionals
Topic: Supervised vs. Unsupervised Learning
Core AI & Machine Learning Topics
English Level: B2-C1
Task 1: Vocabulary Building
Group 1: Jargon
Labeled Data (n.) – Data where each input is paired with the correct output.
Example: A dataset of emails marked as "spam" or "not spam."
Sentence: "Supervised learning requires labeled data to train the model effectively."
Clustering (n.) – Grouping similar data points without predefined categories.
Example: Customer segmentation based on purchasing behavior.
Sentence: "Unsupervised learning uses clustering to find hidden patterns in data."
Regression (n.) – Predicting continuous numerical values (e.g., temperature, prices).
Example: Forecasting house prices based on features like location and size.
Sentence: "Linear regression is a common supervised learning technique."
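Optional illustration for technically minded learners: the sketch below shows what "training on labeled data" and "regression" can look like in practice. It is a minimal example, not a production method. The house sizes, prices, and the helper names fit_line and predict_price are invented for this lesson.

```python
# Labeled data: each input (house size in square meters) is paired with
# the correct output (price in thousands). All numbers are invented.
sizes = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training" here means estimating the line from the labeled examples.
slope, intercept = fit_line(sizes, prices)


def predict_price(size):
    """Predict a continuous value (a price) for a size not seen in training."""
    return slope * size + intercept


print(predict_price(90.0))  # prints 270.0
```

Learners can describe the code aloud using the target vocabulary, for example: "The model was trained on labeled data and now predicts a continuous value."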
Group 2: Phrasal Verbs & Collocations
Train on (phr. v.) – To teach a model using a specific dataset.
Example: "The AI was trained on thousands of medical scans."
Collocation: "Train a model on labeled data."
Break down (phr. v.) – To analyze or separate into smaller components.
Example: "We need to break down the dataset before preprocessing."
Collocation: "Break down complex algorithms."
Sort through (phr. v.) – To organize or examine systematically.
Example: "Unsupervised learning helps sort through unorganized data."
Collocation: "Sort through raw input."
Group 3: Antonyms
Structured (data) ↔ Unstructured (data)
Example: Spreadsheets (structured) vs. social media posts (unstructured).
Predictive (model) ↔ Descriptive (model)
Example: Forecasting sales (predictive) vs. summarizing trends (descriptive).
Explicit (guidance) ↔ Implicit (patterns)
Example: Supervised learning uses explicit labels, while unsupervised learning finds implicit patterns.
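Optional illustration: the contrast between explicit labels and implicit patterns can be shown with a small clustering sketch. This is a simplified one-dimensional version of k-means with two fixed starting centers, chosen so that the result is repeatable. The spending values and the helper name k_means_1d are invented for this lesson.

```python
# Unlabeled data: monthly spending per customer, with no categories attached.
# The values are invented for illustration.
spending = [12.0, 15.0, 14.0, 95.0, 102.0, 98.0]


def k_means_1d(values, centers, rounds=10):
    """Group 1-D values around the nearest of the given starting centers."""
    centers = list(centers)
    clusters = []
    for _ in range(rounds):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Start from the smallest and largest values so the run is deterministic.
centers, clusters = k_means_1d(spending, centers=[min(spending), max(spending)])
print(clusters)  # prints [[12.0, 15.0, 14.0], [95.0, 102.0, 98.0]]
```

No one told the algorithm which customers are "low spenders" or "high spenders". The two groups are implicit in the data, and naming them is left to the analyst.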
Group 4: Idiomatic & Figurative Language
"Connect the dots" – To find hidden relationships.
Example: "Unsupervised learning helps connect the dots in large datasets."
"Learn the ropes" – To understand the basics.
Example: "New researchers must learn the ropes of both supervised and unsupervised methods."
"Throw spaghetti at the wall" – To test many ideas randomly.
Example: "Without labeled data, the team just threw spaghetti at the wall to see what stuck."
Group 5: Slang & Informal Terms
"Garbage in, garbage out" (GIGO) – Poor input leads to poor output.
Example: "If your training data is messy, remember—GIGO!"
"Noob" (n.) – A beginner (derogatory or playful).
Example: "Only a noob would mix up supervised and unsupervised learning!"
"Glitchy" (adj.) – Unpredictable due to errors.
Example: "The unsupervised model's results were glitchy without clean data."
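Optional illustration of "garbage in, garbage out": the sketch below compares an average computed from messy input with one computed after cleaning. The raw entries and the helper names clean_ratings and naive_average are invented for this lesson, and the rule "valid ratings are whole numbers from 1 to 5" is an assumption.

```python
# Raw ratings as they might arrive from a web form (invented values).
# Assumption: a valid rating is a whole number from 1 to 5.
raw_ratings = ["5", "4", "", "500", "3", "n/a", "4"]


def clean_ratings(raw):
    """Keep only entries that parse as whole numbers from 1 to 5."""
    cleaned = []
    for entry in raw:
        if entry.isdigit() and 1 <= int(entry) <= 5:
            cleaned.append(int(entry))
    return cleaned


def naive_average(raw):
    """Average every entry that looks numeric, without checking its range."""
    numbers = [int(entry) for entry in raw if entry.isdigit()]
    return sum(numbers) / len(numbers)


ratings = clean_ratings(raw_ratings)
print(naive_average(raw_ratings))   # prints 103.2 (the typo "500" distorts it)
print(sum(ratings) / len(ratings))  # prints 4.0
```

A single bad entry is enough to make the first result meaningless, which is the point of the expression.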
Task 2: Dialogue Scenarios
Participants: Emma (Data Scientist), Liam (ML Engineer), Noah (Product Manager)
Emma: "We need to decide whether to use supervised or unsupervised learning for the customer segmentation project. Liam, what's your take?"
Liam: "Well, since we have labeled data from past purchases, training on that would give us a predictive model. But if we want to sort through raw behavior patterns, clustering might reveal hidden trends."
Noah: "I'm worried about garbage in, garbage out—our sales data is messy. Emma, can we break down the dataset first?"
Emma: "Good point. Let's connect the dots with unsupervised learning first, then refine with supervised techniques. That way, we avoid glitchy results."
Liam: "Agreed. And Noah, no noob mistakes—we'll clean the data before training the model!"
Participants: Aria (Candidate), Mr. Chen (Hiring Manager)
Mr. Chen: "How would you explain regression to a non-technical stakeholder?"
Aria: "Imagine predicting house prices (continuous values) based on features like location. That's regression, a form of supervised learning: it's like teaching a child with an answer key. But if we had no prices, we'd sort through similarities instead (clustering), which is a descriptive approach."
Mr. Chen: "What if data lacks labels?"
Aria: "That's where unsupervised learning shines. It's like throwing spaghetti at the wall—testing patterns blindly. But with clean data, you connect the dots faster."
Mr. Chen: "Ever dealt with unstructured data?"
Aria: "Yes! I once broke down social media posts—garbage in, garbage out at first, but after preprocessing, the model learned the ropes."
From: Dr. Patel (AI Researcher)
To: Prof. Müller (Computer Science Dept.)
Subject: Proposal for Joint Research on Unsupervised Learning
Dear Prof. Müller,
Our team is exploring clustering techniques to analyze unstructured medical records. Since labels are scarce, supervised methods aren't ideal. Instead, we'll train on unlabeled data to connect the dots in patient histories.
Could your lab help break down the dataset? We'd avoid glitchy outcomes by combining your expertise in descriptive models with our predictive tools. Let's discuss—no noob mistakes on this project!
Best regards,
Dr. Patel
Participants: Host, Guest 1 (Pro-Unsupervised), Guest 2 (Pro-Supervised)
Host: "Today's topic: 'Should AI learn the ropes without human guidance?'"
Guest 1 (Pro-Unsupervised): "Letting algorithms sort through data alone finds implicit patterns—like how Netflix recommends shows!"
Guest 2 (Pro-Supervised): "But explicit guidance keeps a model in check. Garbage in, garbage out: remember Tay, the Microsoft chatbot that started posting offensive tweets after learning from its users?"
Guest 1: "Fair, but throwing spaghetti at the wall sometimes works! Unsupervised learning connects the dots in ways humans miss."
Host: "So is labeled data the gold standard, or are we stifling innovation?"
Participants: User, Support
User: "My predictive model keeps failing!"
Support: "Did you train on labeled data, or is it glitchy from raw input?"
User: "I tried clustering first... maybe a noob move?"
Support: "No worries! Break down your dataset, then switch to supervised learning. Remember, garbage in, garbage out—clean it first!"
User: "Got it. I'll learn the ropes before throwing spaghetti at the wall again!"
Task 3: What Would You Do? (Problem Solving)
Problem: You're hired by an e-commerce company to improve product recommendations. They give you two datasets:
- Dataset A: Clean, labeled data (customer ratings: 1-5 stars).
- Dataset B: Raw, unstructured data (browsing history, cart abandonments).
Challenge: The CEO wants quick results but refuses to pay for extra data labeling. What would you do?
- Use supervised learning (Dataset A) and risk limited insights?
- Try unsupervised clustering (Dataset B) and face "garbage in, garbage out"?
- Propose a hybrid approach?
Explain your choice.
Problem: Your predictive hiring tool (trained on labeled data from past job applications) unfairly rejects candidates from minority groups. The HR team panics and says, "Just remove gender/race fields!"
Challenge:
- Would you retrain the model with unsupervised learning to find implicit patterns without labels?
- Or argue for better structured data collection?
- How would you fix this without causing glitchy outcomes?
Problem: Your health-tech startup is building a supervised model to diagnose diseases from X-rays, but hospitals won't share labeled data because of privacy laws. Investors demand a demo in 3 months.
Challenge:
- Would you switch to unsupervised learning and "throw spaghetti at the wall" with unlabeled scans?
- Or generate synthetic data (risking inaccurate predictions)?
- How would you "connect the dots" ethically?
Task 4: Personal Experiences & Discussions
Personal Experience Questions
- "Have you ever worked with labeled data? Describe a project where it was essential."
- "Share a time when you had to sort through messy, unstructured data. How did you handle it?"
- "Have you encountered a glitchy AI model? What went wrong, and how did you fix it?"
- "Describe an instance where clustering revealed unexpected patterns in your work or studies."
- "Have you ever had to 'throw spaghetti at the wall' with unsupervised learning? Did it work?"
Discussion Prompts
- "Should companies prioritize supervised learning (accuracy) over unsupervised learning (flexibility)? Debate pros and cons."
- "How can we avoid 'garbage in, garbage out' when working with raw datasets?"
- "In what real-world scenarios is descriptive modeling more useful than predictive modeling?"
- "Can unsupervised learning ever replace human intuition? Discuss examples."
- "What ethical risks arise when AI 'connects the dots' in sensitive data (e.g., healthcare, finance)?"
Opinion Questions (Agree/Disagree)
- "Labeled data is a luxury—most real-world problems require unsupervised methods." Agree/Disagree? Justify your view.
- "'Noobs' should avoid unsupervised learning until they master supervised techniques." Agree/Disagree? Why?
- "Clustering is just a fancy term for 'guessing'—it's not reliable science." Agree/Disagree? Support your stance.
- "AI researchers spend too much time 'breaking down' data instead of solving problems." Agree/Disagree? Explain.
- "'Learn the ropes' with supervised learning, then experiment freely." Agree/Disagree? Share your reasoning.