Class 3 Notes

AI Applications, Statistics, and Ethical Frameworks

Bridging Traditional Statistics with Contemporary AI Challenges

Assignment Guidelines

Topic Proposals: Due by midnight on class day

Key Principle: "Less can be more" - favor deep analysis of a narrow topic over comprehensive coverage

Evaluation Criteria: Evidence of critical thinking over summarization

Discussion Work: Pre-work and post-work discussions on Slack, including question preparation for guest speakers

Key Insights from Class 3

Big Data Paradox

The more data, the surer we fool ourselves - quantity cannot compensate for poor quality

Statistical vs Practical Significance

Large samples can show statistical significance for practically meaningless differences

AI Ethics in Education

Balance AI assistance with critical thinking development and academic integrity

Student AI Applications and Use Cases

Financial Analysis

Market monitoring and political interpretation with source verification

Academic Research

RAG-powered databases for research references and quotes

Social Work

AI solutions for underserved communities with cultural sensitivity

Accessibility

Color identification for color blindness and real-time support

Content Creation

Social media assistance with brand consistency and authenticity

Educational Support

AI tutoring with specialized tools like Khan Academy's Khanmigo

Crisis Support Systems

Digital safe houses for vulnerable populations, with privacy features for dangerous situations and culturally sensitive AI responses

AI Ethics and Academic Integrity

Academic Integrity Concerns

Detection Errors

False positives in AI detection affecting innocent students

Attribution Issues

Difficulty citing AI-generated content without clear sources

ESL Impact

AI detection tools disproportionately flagging non-native speakers

Copyright Concerns

Training data includes copyrighted material without attribution

Educational Framework for AI Use

Design assessments acknowledging AI as part of normal workflow
Provide clear guidelines for ethical AI use
Focus on teaching effective AI collaboration skills
Create learning experiences requiring critical thinking beyond AI capabilities
Use system prompting for brainstorming rather than direct answers

Responsible AI Use Strategies

  • System prompting for brainstorming buddy approach
  • Source verification and fact-checking
  • Iterative questioning for quality improvement
  • Human oversight in sensitive decisions

Sampling Methods and Data Quality

The Big Data Paradox

"The more data, the surer we fool ourselves" - quantity cannot compensate for poor quality

Historical Example: The 1936 Literary Digest poll incorrectly predicted that Alf Landon would defeat Franklin D. Roosevelt because of biased sampling (telephone and car owners skewed wealthy)
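A minimal simulation sketch of the paradox. All numbers here are assumptions chosen for illustration, not figures from class: true support is set to 55%, and the large poll is assumed to reach non-supporters twice as easily as supporters. The function name `simulate_polls` is hypothetical.

```python
import random

def simulate_polls(true_support=0.55, big_n=200_000, small_n=1_000, seed=0):
    """Return (biased_big_estimate, random_small_estimate) of support."""
    rng = random.Random(seed)

    # Biased poll: non-supporters are twice as likely to end up in the sample.
    big_yes = 0
    big_total = 0
    while big_total < big_n:
        supporter = rng.random() < true_support
        reach_prob = 0.5 if supporter else 1.0  # hypothetical selection bias
        if rng.random() < reach_prob:
            big_total += 1
            big_yes += supporter

    # Small poll: simple random sample, everyone equally likely to be reached.
    small_yes = sum(rng.random() < true_support for _ in range(small_n))
    return big_yes / big_n, small_yes / small_n

big_estimate, small_estimate = simulate_polls()
print(f"biased poll of 200,000: {big_estimate:.3f}")
print(f"random poll of 1,000:   {small_estimate:.3f}")
```

Under these assumptions the large poll settles very precisely on the wrong answer (near 0.55 / 1.45, about 38%), while the small random poll lands close to the true 55%.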

Simple Random Sampling

Every individual has equal probability of selection

Example: Names drawn from a hat

Systematic Sampling

Select at a regular interval (every k-th unit), ideally from a random starting point to preserve randomness

Example: Every 10th customer leaving a store

Cluster Sampling

Randomly select whole groups (clusters), then sample only within the selected clusters

Example: 20 of 50 neighborhoods for door-to-door surveys

Stratified Sampling

Sample from ALL subgroups to ensure representation

Example: Sampling from each state for geographic representation
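The four sampling methods above can be sketched in a few lines each. This is an illustrative sketch, not a survey-grade implementation: the function names are hypothetical, and the cluster version assumes one-stage sampling (every unit in a chosen cluster is taken).

```python
import random

def simple_random(population, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(population, n)

def systematic(population, k, rng):
    """Every k-th unit, starting from a random offset in [0, k)."""
    start = rng.randrange(k)
    return population[start::k]

def cluster(clusters, n_clusters, rng):
    """Pick whole clusters at random; unselected clusters contribute nothing."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def stratified(strata, n_per_stratum, rng):
    """Draw from ALL strata so each subgroup is represented."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

rng = random.Random(42)
people = list(range(100))
# Hypothetical grouping: 5 neighborhoods of 20 people each.
groups = {f"area_{i}": people[i * 20:(i + 1) * 20] for i in range(5)}

print(len(simple_random(people, 10, rng)))   # 10 people
print(len(systematic(people, 10, rng)))      # every 10th person
print(len(cluster(groups, 2, rng)))          # everyone in 2 of 5 areas
print({k: len(v) for k, v in stratified(groups, 3, rng).items()})
```

The key contrast: `cluster` leaves some groups out entirely, while `stratified` always returns something from every group.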

Advanced Concepts

  • Oversampling: Higher proportions of underrepresented groups
  • Weighting: Adjusting analysis to reflect true population proportions
  • NHANES Example: Multi-stage sampling (geography → counties → blocks → households → individuals)
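A rough sketch of how weighting corrects an unrepresentative sample. The respondent counts and population shares below are made up for illustration, and `poststratified_mean` is a hypothetical helper implementing simple post-stratification (weight = population share / sample share).

```python
from collections import Counter

def poststratified_mean(sample, population_share):
    """Weighted mean of (group, value) pairs using post-stratification weights."""
    counts = Counter(group for group, _ in sample)
    n = len(sample)
    # Groups overrepresented in the sample get weights below 1, and vice versa.
    weights = {g: population_share[g] / (counts[g] / n) for g in counts}
    weighted_sum = sum(weights[g] * value for g, value in sample)
    total_weight = sum(weights[g] for g, _ in sample)
    return weighted_sum / total_weight

# Hypothetical poll: graduates are 60% of respondents but 35% of the population.
sample = (
    [("grad", 1)] * 24 + [("grad", 0)] * 36          # 40% of graduates say yes
    + [("non_grad", 1)] * 28 + [("non_grad", 0)] * 12  # 70% of non-graduates say yes
)
shares = {"grad": 0.35, "non_grad": 0.65}

unweighted = sum(value for _, value in sample) / len(sample)
weighted = poststratified_mean(sample, shares)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

With these made-up numbers the unweighted estimate is 0.52, while the weighted estimate is 0.595 (0.35 × 0.40 + 0.65 × 0.70).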

Real-World Failures

  • 2016 Election: College graduates overrepresented, Trump supporters less likely to respond
  • COVID Vaccine Surveys: Delphi-Facebook (about 250K respondents) was least accurate; Axios-Ipsos (about 1K) was most accurate

Probability and Conditional Probability

COVID Testing Example

Scenario:

  • 3% COVID prevalence
  • 90% test sensitivity
  • 95% test specificity

Result:

A positive test implies only a 35.8% chance of actually having COVID
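The result follows from Bayes' rule: true positives divided by all positives. A short sketch using the scenario's numbers (the function name is an illustrative choice):

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule."""
    true_positives = prevalence * sensitivity                 # sick and flagged
    false_positives = (1 - prevalence) * (1 - specificity)    # healthy but flagged
    return true_positives / (true_positives + false_positives)

ppv = positive_predictive_value(prevalence=0.03, sensitivity=0.90, specificity=0.95)
print(f"{ppv:.1%}")  # 35.8%
```

Out of every 1,000 people, about 27 true positives compete with about 48.5 false positives, which is why most positive results come from healthy people when prevalence is low.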

Statistical Inference and P-Values

Understanding P-Values

Definition

Probability of seeing results as extreme as, or more extreme than, those observed, assuming no true effect (the null hypothesis)

Common Misconception

P-value is NOT the probability that the hypothesis (null or alternative) is true

0.05 Threshold

Arbitrary benchmark, not a magical cutoff point
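To make the definition concrete, here is a small sketch with a coin-flip example that is not from class: if a fair coin (the "no true effect" assumption) is flipped 100 times, how likely is a result at least as far from 50 heads as the one observed? The function name is hypothetical.

```python
from math import comb

def binomial_two_sided_p(heads, flips):
    """Exact two-sided p-value for a fair-coin null hypothesis."""
    expected = flips / 2
    distance = abs(heads - expected)
    # Count every outcome at least as far from the expected value as observed.
    extreme = [k for k in range(flips + 1) if abs(k - expected) >= distance]
    return sum(comb(flips, k) for k in extreme) / 2 ** flips

p = binomial_two_sided_p(heads=60, flips=100)
print(f"p-value for 60 heads in 100 flips: {p:.4f}")
```

The value sits just above 0.05, which illustrates why the threshold should not be read as a sharp line: 60 heads and 61 heads are nearly the same evidence but fall on opposite sides of it.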

Multiple Testing Problem

XKCD Jelly Bean Example

  • Testing 20 different jelly bean colors for acne correlation
  • Expected: about 1 false positive (20 tests × 0.05) at the p < 0.05 level by chance alone
  • Publication Bias: Only "significant" green result gets published
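The arithmetic behind the jelly bean example, as a quick sketch. It assumes the 20 tests are independent and that no color has any real effect; the function names are illustrative.

```python
def expected_false_positives(n_tests, alpha):
    """Average number of 'significant' results when no effect exists."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one of n independent null tests crosses the threshold."""
    return 1 - (1 - alpha) ** n_tests

print(expected_false_positives(20, 0.05))
print(f"{prob_at_least_one_false_positive(20, 0.05):.1%}")
```

So with 20 colors, a spurious "significant" result is more likely than not (roughly 64%), even though each individual test has only a 5% false positive rate.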

Online Dating Study Example

  • 19,000+ participants, very small p-values
  • Divorce rates: 5.96% (online) vs. 7.67% (offline)
  • Happiness: 5.64 vs. 5.48 on 7-point scale
  • Conclusion: Statistical ≠ practical significance
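A sketch of why large samples produce tiny p-values for small differences. It uses a standard pooled two-proportion z-test with hypothetical rates (50% vs. 51%) and hypothetical group sizes, not the study's actual data.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Same 1-percentage-point difference, very different sample sizes (hypothetical).
_, p_small = two_proportion_z_test(0.50, 500, 0.51, 500)
_, p_large = two_proportion_z_test(0.50, 500_000, 0.51, 500_000)
print(f"n = 500 per group:     p = {p_small:.3f}")
print(f"n = 500,000 per group: p = {p_large:.2e}")
```

The effect size is identical in both runs. Only the sample size changes, yet the second p-value is vanishingly small, so the p-value alone says nothing about whether the difference matters in practice.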

Modern Solutions

Pre-registration

Submit analysis plans before data collection

Open Data/Code

Make materials publicly available

Replication Studies

Repeat important findings

Continuous Interpretation

Treat p-values as a continuum of evidence rather than a pass/fail cutoff

Data Context and Social Implications

Data is Never "Raw"

Always collected by specific people for specific purposes

Shaped by social context and available resources

Complex phenomena reduced to categorical variables for computer processing

Data Invisibility Problems

Missing Data Issues

  • LGBTQ+ vaccine data often missing
  • Asian American data aggregated, hiding disparities
  • Marginalized groups underrepresented

Solutions

  • Disaggregated, granular data collection
  • Acting in participants' best interests
  • Privacy protection and responsible management

Real-World Example: 23andMe Bankruptcy Concerns

Concerns about data ownership transfer highlight the importance of ethical data collection and responsible long-term data management practices.

Contemporary AI and Data Science Issues

AI Integration in Education

Challenges

  • Students potentially losing critical thinking skills (MIT study: reported 50% decrease in creativity)
  • Over-reliance on AI for foundational learning
  • Balance between AI assistance and skill development

Opportunities

  • Personalized tutoring available 24/7
  • Support for different learning levels and backgrounds
  • Enhanced accessibility for diverse learners

Best Practices

  • Use AI as "teammate" rather than tool
  • Maintain appropriate "desirable difficulty"
  • Combine AI support with human interaction
  • Focus on foundation building + nuanced discussion

Cultural and Trust Considerations

Cultural Bias

Most AI models trained primarily on Western data, affecting performance across cultures

Trust and Explainability

Humans often can't explain their own decisions - AI systems similarly lack transparency

Recruitment AI Bias

AI systems often amplify existing human biases in hiring decisions

Need for Balance

AI insights combined with human oversight for optimal decision-making

AI-Enhanced Learning Platform

Platform Features

Personalized Podcasts

AI-generated introductions tailored to individual students

AI Tutor (Pascy)

Interactive chat-based learning support

Onboarding Process

Personalization based on student background and interests

Accessibility and Support

Privacy Features

Ability to switch screens if being monitored

Multilingual Support

Accommodating diverse linguistic backgrounds

Integration Features

Pre-work preparation, question submission, and Slack discussions

Key Takeaways and Future Directions

Critical Skills for AI Era

Source Verification: Always request and check AI-provided sources
Iterative Prompting: Refine questions to improve AI responses
Critical Evaluation: Treat AI output as starting point, not final answer
Ethical Awareness: Understand bias, privacy, and fairness implications
Statistical Literacy: Understand sampling, probability, and inference limitations

Research and Development Priorities

Culturally sensitive AI development
Transparency and explainability in AI systems
Ethical data collection and use practices
Accessibility-focused AI applications
Bias detection and mitigation strategies

Session Summary

This session bridged traditional statistical concepts with contemporary AI challenges, giving students both foundational knowledge and practical frameworks for navigating an AI-integrated future. The class explored real-world applications of AI across diverse fields while emphasizing data quality, ethical considerations, and responsible AI development practices.