Class 1 Notes

STAT S-115: Overview and Introduction

Data Science: An Artificial Ecosystem

HDSR Focus

Built entirely around Harvard Data Science Review articles

Participation-Based

Assessment focuses on engagement and critical thinking

AI Integration

First-of-its-kind course incorporating direct AI interaction

Course Structure and Themes

Theme 1

How we can use Generative AI (GAI) to solve problems

Weeks 1-4

Theme 2

How we can solve problems posed by GAI

Weeks 5-7

Interactive Learning

Students will engage directly with article authors through guest presentations and participate in AI-assisted learning activities throughout both themes.

Defining Data Science: What It Is NOT

❌ Not Just Machine Learning

While ML is a component, data science requires broader interdisciplinary thinking

❌ Not Only About Predictions

Inference (understanding "why") is as important as prediction ("what")

❌ Not Only Data Analysis

Hardest parts are often data collection, cleaning, and conceptualization

❌ Not Confined to STEM

Requires skills from humanities, social sciences, and philosophy

❌ Not a Single Discipline

It's a collection of disciplines working together, like "science" itself

💡 The "Artificial Ecosystem" Metaphor

Data science creates an interconnected environment where multiple disciplines collaborate to solve complex problems

Three Perspectives on Data Workflows

Computer Science View (Jeannette Wing)

Linear process workflow:

Data Generation → Collection → Processing → Storage → Management → Analysis → Visualization → Interpretation

Philosophical View (Sabina Leonelli)

Circular process emphasizing human agency at every step:

Data → Models → Knowledge → Actions → Objects → Back to Data
↻ Continuous Cycle

Key insight: Human judgment and agency drive the transformation at each stage, creating a dynamic cycle where data continuously evolves through human interpretation and action.

Information Science View (Christine Borgman)

Complex preservation and curation system:

  • Focus on data "afterlives" and reusability
  • Long-term preservation challenges
  • Metadata and documentation importance
  • Data sharing and reproducibility

Key Insights and Principles

Selection Bias Problem

Understanding why certain data exists and how that affects conclusions
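
A toy simulation can make the selection bias problem concrete. The sketch below is an illustration only, not an example from the class: the population, the threshold rule, and the function name `simulate_selection_bias` are all assumptions chosen for simplicity. It compares the mean of a full population with the mean of the subset that happens to get recorded.

```python
import random
import statistics

def simulate_selection_bias(n=10_000, threshold=0.0, seed=0):
    """Compare the mean of a full population with the mean of a selected subset."""
    rng = random.Random(seed)
    # Hypothetical population: outcomes drawn from a standard normal distribution.
    population = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed selection rule: only outcomes above the threshold are ever recorded.
    observed = [x for x in population if x > threshold]
    return statistics.mean(population), statistics.mean(observed)

full_mean, observed_mean = simulate_selection_bias()
print(f"full population mean: {full_mean:.3f}")
print(f"observed-only mean:   {observed_mean:.3f}")
```

The observed-only mean is pulled well above the population mean because of how the data came to exist, not because of any flaw in the averaging itself.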

Data Quality Over Quantity

Addressing missing data and selection bias is often more critical than using sophisticated algorithms

No Raw Data Principle

All data involves human judgment in collection and conceptualization

Andrew Lo's Drug Approval Research Example

Lo's research demonstrated that addressing missing data and selection bias through statistical imputation was more critical than sophisticated machine learning algorithms for predicting drug approval success.
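
To illustrate the general idea of imputation, here is a minimal sketch on simulated data. It is not the method used in Lo's research: the data-generating model, the missingness rule, and the helper names `fit_line` and `compare_estimates` are hypothetical. It assumes an outcome `y` that is more often missing when a fully observed covariate `x` is low, and compares a complete-case average with an average computed after simple regression imputation.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def compare_estimates(n=5000, seed=1):
    rng = random.Random(seed)
    # Hypothetical data: y depends linearly on a fully observed covariate x.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [2.0 + 1.5 * x + rng.gauss(0.0, 0.5) for x in xs]
    # Assumed missingness rule: y is recorded less often when x is low.
    seen = [rng.random() < (0.9 if x > 0 else 0.3) for x in xs]

    obs_x = [x for x, s in zip(xs, seen) if s]
    obs_y = [y for y, s in zip(ys, seen) if s]

    true_mean = statistics.mean(ys)       # unavailable in practice
    complete_case = statistics.mean(obs_y)  # ignores the missing rows

    # Regression imputation: predict each missing y from its observed x.
    slope, intercept = fit_line(obs_x, obs_y)
    filled = [y if s else intercept + slope * x
              for x, y, s in zip(xs, ys, seen)]
    return true_mean, complete_case, statistics.mean(filled)

true_mean, complete_case_mean, imputed_mean = compare_estimates()
print(f"true mean:          {true_mean:.3f}")
print(f"complete-case mean: {complete_case_mean:.3f}")
print(f"imputed mean:       {imputed_mean:.3f}")
```

In this setup the complete-case mean overstates the true mean, while the imputed mean lands close to it, without any sophisticated model. Real applications would need to check whether missingness can plausibly be explained by observed variables, which this sketch simply assumes.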

Understanding Artificial Intelligence

Michael Jordan's Perspective

Intelligence Augmentation (IA)

AI should augment rather than replace human intelligence

Intelligent Infrastructure

AI as infrastructure that makes human environments more supportive, interesting, and safe

Human-Centric Engineering

Focus on creating useful tools rather than replicating human cognition

AI as Technology vs. Science

AI Technology (Current Reality)

Engineering-focused on creating useful tools and applications

AI Science (Future Aspiration)

Research-focused on understanding intelligence itself

Student Definitions Revealed

Class discussions showed diverse AI definitions ranging from "computer systems simulating human behavior" to "superhuman intelligence," highlighting the field's complexity.

Five Principles for AI in Society

Beneficence

AI should do good and benefit humanity

Non-maleficence

AI should not cause harm

Autonomy

Respect for human decision-making

Justice

Fair distribution of benefits and risks

Explainability

AI systems must be interpretable and transparent

Why Explainability Matters

Unlike established fields such as medicine and engineering, AI faces unusually strong demands for explanation, possibly because society places less trust in computer scientists and statisticians than in medical professionals.

Learning Philosophy and Expectations

Language Learning Analogy

Learning data science is like acquiring new "languages": each discipline has its own grammar, vocabulary, and way of thinking.

Meta-Learning Objectives

  • Develop data acumen and savvy judgment about data quality
  • Build interdisciplinary communication skills
  • Create critical thinking frameworks for AI evaluation
  • Establish ethical reasoning for responsible AI development
  • Acquire panoramic data science literacy

Active Participation

Expected Behaviors

  • Challenge ideas and engage in debate
  • Think critically rather than passively absorb
  • Treat one another as intellectual equals in discussions
  • Acknowledge different experience levels

Panoramic Thinking Development

Ability to navigate technical, philosophical, ethical, and societal dimensions simultaneously.

What's Coming Next

Session 2: Technical Foundations

  • Probability fundamentals
  • Inference and statistical modeling basics
  • Foundation for upcoming topics

Upcoming Sessions

  • Machine learning architecture
  • Deep learning fundamentals
  • Guest author presentations
  • AI-assisted learning activities

Session Summary

This inaugural session established the course's ambitious scope: moving beyond technical data science skills to develop sophisticated, multidisciplinary thinking capabilities essential for navigating an AI-integrated future. The course emphasizes the intersection of technology, philosophy, ethics, and society in understanding both the potential and challenges of artificial intelligence.