CMSC 178DA | Week 05

The Art of Insight

Strategic Principles of Data Visualization

Department of Computer Science

University of the Philippines Cebu

Visual Design & Perception

Why We Visualize

"The greatest value of a picture is when it forces us to notice what we never expected to see."

-- John Tukey

Agenda

Lecture Objectives

Perception

Pre-attentive attributes, Gestalt principles, color theory, and accessibility.

Design

Tufte's data-ink ratio, visual hierarchy, and minimalist frameworks.

Selection

Chart taxonomy, narrative structure, and strategic storytelling.

Part I

The Science of Seeing

Our brains evolved to detect patterns instantly -- but we must design for how humans actually see, not how we think they see.

This section covers pre-attentive processing, Gestalt principles, color theory, and accessibility.

I. Perception

Pre-attentive Processing

Pre-attentive attributes demo: color pop-out and size pop-out
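A minimal Matplotlib sketch of the two pop-out effects in the demo. It assumes synthetic random points, and the highlighted index is an arbitrary choice for illustration. One point differs in color (left panel) or in size (right panel), and everything else is held constant.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 10, 40), rng.uniform(0, 10, 40)  # synthetic points
target = 7  # arbitrary index of the point that should "pop out"

fig, (ax_color, ax_size) = plt.subplots(1, 2, figsize=(8, 4))

# Color pop-out: every point is gray except the target
colors = ["lightgray"] * len(x)
colors[target] = "crimson"
ax_color.scatter(x, y, c=colors, s=60)
ax_color.set_title("Color pop-out")

# Size pop-out: every point shares one size except the target
sizes = np.full(len(x), 30.0)
sizes[target] = 300.0
ax_size.scatter(x, y, s=sizes, color="gray")
ax_size.set_title("Size pop-out")

for ax in (ax_color, ax_size):
    ax.set_xticks([])  # position carries no meaning here, so hide the ticks
    ax.set_yticks([])
```

Changing only one attribute at a time is the point: if the target differed in both color and size, you could not tell which cue did the work.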

Gestalt Principles

Six Gestalt principles: proximity, similarity, enclosure, continuity, closure, connection

The Three Color Palettes

Three color palette types: sequential, diverging, categorical
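A sketch of how the three palette types map onto Matplotlib colormaps. The specific names (`Blues`, `RdBu_r`, `tab10`) are illustrative picks rather than the only valid choices, and the data are synthetic. The diverging case uses `TwoSlopeNorm` so the neutral middle color lands on 0 even when the data range is lopsided.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import TwoSlopeNorm

# One Matplotlib colormap per palette type (illustrative picks)
PALETTES = {
    "sequential": "Blues",    # ordered values, low -> high
    "diverging": "RdBu_r",    # values around a meaningful midpoint (0, a target, an average)
    "categorical": "tab10",   # unordered groups; no color implies "more"
}

density = np.linspace(0, 100, 25).reshape(5, 5)  # synthetic, all non-negative
change = np.linspace(-30, 10, 25).reshape(5, 5)  # synthetic, skewed toward losses

fig, (ax_seq, ax_div) = plt.subplots(1, 2, figsize=(8, 4))
ax_seq.imshow(density, cmap=PALETTES["sequential"])
ax_seq.set_title("Sequential")

# Pin the neutral middle color to 0 even though the range is lopsided (-30 to +10)
norm = TwoSlopeNorm(vmin=change.min(), vcenter=0.0, vmax=change.max())
ax_div.imshow(change, cmap=PALETTES["diverging"], norm=norm)
ax_div.set_title("Diverging (centered on 0)")

# Categorical: pull distinct colors from a qualitative colormap, one per group
group_colors = plt.get_cmap(PALETTES["categorical"]).colors[:3]
```

Without the centered norm, Matplotlib would map the midpoint of the data range (-10) to the neutral color, so small losses would read as "no change".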

Text is UI

The eye hates "ping-ponging" between a legend and the data.

Direct Labeling

Place labels directly next to lines or bars. Eliminate the legend entirely when possible.

Annotations

Don't just show "Sales Dropped." Add context: "Sales dropped due to supply chain outage."
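A minimal Matplotlib sketch of both ideas on this slide, using hypothetical monthly sales for two products. Each line gets its name written at its right end in the line's own color, so no legend is needed. The dip gets an annotation that carries the explanation, not just the observation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

# Hypothetical monthly sales, for illustration only
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = {
    "Product A": [120, 135, 150, 90, 95, 110],
    "Product B": [80, 85, 95, 100, 105, 115],
}

fig, ax = plt.subplots(figsize=(7, 4))
x = range(len(months))
for name, values in sales.items():
    line, = ax.plot(x, values, linewidth=2)
    # Direct label: series name at the end of its own line, in the line's color
    ax.text(len(months) - 1 + 0.1, values[-1], name, color=line.get_color(), va="center")

# Annotation with context, not just "Sales dropped"
ax.annotate("Sales dropped due to\nsupply chain outage",
            xy=(3, 90), xytext=(3.3, 140),
            arrowprops=dict(arrowstyle="->", color="gray"))

ax.set_xticks(list(x))
ax.set_xticklabels(months)
ax.set_xlim(-0.2, len(months) + 0.8)  # leave room on the right for the labels
# Note: no ax.legend() call; the direct labels replace it
```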

Legend vs. direct labels: Product A and Product B

Inclusive Design

Colorblind simulation: normal vision vs deuteranopia comparison
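One practical way to design for color vision deficiency is sketched below: draw colors from a colorblind-safe palette and add a redundant, non-color cue so no series depends on hue alone. The Okabe-Ito palette is widely used for this. The series are hypothetical. Seaborn's `color_palette("colorblind")` is a ready-made alternative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

# Okabe-Ito palette: a widely used colorblind-safe categorical palette
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]
# Blue, orange, bluish green, vermillion; skips the yellow, which is faint on white
palette = [OKABE_ITO[i] for i in (4, 0, 2, 5)]
markers = ["o", "s", "^", "D"]  # redundant encoding: shape also identifies the series

# Hypothetical series, for illustration only
series = {"North": [3, 4, 6, 7], "South": [2, 3, 3, 5],
          "East": [5, 5, 4, 4], "West": [1, 2, 4, 6]}

fig, ax = plt.subplots(figsize=(6, 4))
for i, (name, values) in enumerate(series.items()):
    ax.plot(values, color=palette[i], marker=markers[i], linewidth=2)
    # Direct label in the same color, so no legend lookup is needed
    ax.text(len(values) - 1 + 0.1, values[-1], name, color=palette[i], va="center")
```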
II. Frameworks

Tufte's Data-Ink Ratio

Data-ink ratio comparison: cluttered chart junk vs clean Tufte-style chart
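A small helper that applies Tufte-style cleanup to a Matplotlib `Axes` is sketched below. The function name `maximize_data_ink` is hypothetical, and which elements count as "non-data ink" is a judgment call. Here the sketch removes the top/right frame, gridlines, background fill, and tick marks. The bar values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

def maximize_data_ink(ax):
    """Remove non-data ink from a Matplotlib Axes (a Tufte-inspired cleanup sketch)."""
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # the box frame encodes nothing
    ax.grid(False)                          # gridlines compete with the data
    ax.set_facecolor("white")               # no shaded background or gradient
    ax.tick_params(length=0)                # tick marks are redundant next to labels
    return ax

fig, ax = plt.subplots(figsize=(6, 4))
# Hypothetical values, for illustration only
ax.bar(["Retail", "Services", "Tech", "Other"], [40, 35, 55, 10], color="steelblue")
maximize_data_ink(ax)
```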

The Active Title

Stop labeling charts with "What" (e.g., "Revenue by Quarter"). Label them with "So What?"

The Rule

If the title were the only thing someone read, would they get the point?

Q4 Revenue Breakdown (in Millions)

Bar chart: Retail, Services, Tech, Other

VS

Revenue exceeded targets in Q4, driven by Tech sector

Same bars (Retail, Services, Tech, Other) with a Q4 Target reference and an 80M label

Visual Hierarchy

Guide the viewer's eye in a deliberate order using size, position, and color.

TL;DR

Reading Order

  1. Title -- What is this about?
  2. Main insight -- The key message (largest, boldest)
  3. Supporting detail -- Context and nuance
  4. Source / Notes -- Credibility (smallest)
Example card (numbers mark the reading order):

  1. Course Completion Rate
  2. 84%
  3. +12% vs last semester
  4. Source: Learning Management System database (Fall 2023). Excludes audited enrollments.


Focus Through Gray

Customer Acquisition Cost Spike in Q4

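Render the context in gray and reserve a single accent color for the data point that carries the message. Chart: customer acquisition cost by quarter (axis $0 to $200), with the Q4 value of $240 highlighted.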
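A minimal Matplotlib sketch of "focus through gray" combined with an active title. Only the $240 Q4 value comes from the slide; the Q1 to Q3 values are hypothetical placeholders. `Axes.bar_label` requires Matplotlib 3.4 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
cac = [110, 120, 115, 240]  # hypothetical except the $240 Q4 value shown on the slide
highlight = "Q4"

# Everything is gray except the one bar that carries the message
colors = ["crimson" if q == highlight else "lightgray" for q in quarters]
fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(quarters, cac, color=colors)
# Label only the highlighted bar; the others stay quiet
ax.bar_label(bars, labels=[f"${v}" if q == highlight else "" for q, v in zip(quarters, cac)])
# Active title: states the insight, not just the topic
ax.set_title("Customer acquisition cost spiked to $240 in Q4", loc="left")
```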
Knowledge Check

Tufte's Data-Ink Ratio

Which of the following elements is considered "Chart Junk" and should be erased to maximize the data-ink ratio?

A) Data Line
B) Axis Labels
C) Background Gradient
D) Trend Annotation

Correct Answer: C

Background gradients don't represent data. They are purely decorative "ink" that distracts from the actual information.

Part III

Choosing the Right Chart

Chart selection is strategic, not aesthetic. The wrong chart can mislead even with perfectly correct data.

From bar charts to treemaps -- every form serves a distinct analytical purpose.

BAD 3D Exploding Pie (Hard to read)
GOOD Clean Bar Chart (Easy to compare)
III. Selection

Chart Taxonomy & Questions

Goal | Chart Types | Primary Question Answered
Comparison | Bar, Grouped Bar, Lollipop | "Which [category] is the highest/lowest?"
Distribution | Histogram, Box, Violin | "How is the data shaped or spread?"
Relationship | Scatter, Bubble, Heatmap | "Does A correlate with B?"
Composition | Stacked Bar, Treemap, Donut | "What makes up the total?"
Trend | Line, Area, Sparkline | "How does this change over time?"

Chart Type Gallery

Six chart types: bar, histogram, scatter, box, line, heatmap

Precise Comparisons

Horizontal bar chart of Philippine GDP by region

Distributions & Spread

Distribution charts: histogram, box plot, violin, KDE
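A sketch showing why the choice among distribution charts matters. It assumes synthetic bimodal data (two clusters). The histogram and violin plot reveal both peaks. The box plot summarizes the data with quartiles and hides the bimodality. A KDE curve is left out here because it needs SciPy or seaborn; Matplotlib's `violinplot` computes its own density estimate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
# Synthetic bimodal data: two clusters of 500 values each
data = np.concatenate([rng.normal(40, 5, 500), rng.normal(75, 5, 500)])

fig, (ax_hist, ax_box, ax_violin) = plt.subplots(1, 3, figsize=(10, 3))
counts, edges, _ = ax_hist.hist(data, bins=30, color="steelblue")
ax_hist.set_title("Histogram: both peaks visible")

ax_box.boxplot(data)  # quartile summary, so the two peaks disappear
ax_box.set_title("Box plot: peaks hidden")

ax_violin.violinplot(data)  # mirrored density estimate restores the shape
ax_violin.set_title("Violin: shape restored")
```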

Scatter Plots & Correlation

Scatter plot with trend line and correlation heatmap
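A minimal sketch of a scatter plot with a least-squares trend line and Pearson's r, using synthetic data for 500 students, echoing the study-hours example in the knowledge check. `np.corrcoef` gives the correlation coefficient and `np.polyfit(..., deg=1)` gives the line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data: 500 students whose grades loosely rise with study hours
hours = rng.uniform(0, 20, 500)
grades = np.clip(55 + 2 * hours + rng.normal(0, 8, 500), 0, 100)

r = np.corrcoef(hours, grades)[0, 1]                 # Pearson correlation coefficient
slope, intercept = np.polyfit(hours, grades, deg=1)  # least-squares trend line

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(hours, grades, s=10, alpha=0.4, color="steelblue")  # transparency eases overplotting
xs = np.linspace(hours.min(), hours.max(), 100)
ax.plot(xs, slope * xs + intercept, color="crimson", linewidth=2)
ax.set_xlabel("Study hours")
ax.set_ylabel("Final grade")
ax.set_title(f"Grades rise with study time (r = {r:.2f})", loc="left")
```

Keep in mind that r measures only linear association; it says nothing about causation.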

Composition & Trends

Stacked bar chart and line chart showing composition and trends

The Pie Chart Problem

The human brain is poor at comparing angles and areas; it judges lengths and positions on a common scale far more accurately.

Avoid Pie Charts If:

  • You have more than 3 slices
  • Values are similar (e.g., 24% vs 26%)
  • You want to compare precise values

Better Alternative

Horizontal bar charts or donut charts with direct labels.
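A sketch of the bar-chart alternative: categories sorted by value, with the percentage written directly beside each bar instead of in a legend. The shares are the Philippine export figures from the treemap slide in this deck. `Axes.bar_label` requires Matplotlib 3.4 or later.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

# Shares from the Philippine exports (2023) slide in this deck, in %
shares = {"Electrical & Electronics": 56, "Other Manufactures": 18,
          "Agri & Food": 11, "Machinery": 9, "Metals/Ores": 6}

ordered = sorted(shares.items(), key=lambda kv: kv[1])  # smallest first, so largest is drawn at top
labels = [name for name, _ in ordered]
values = [value for _, value in ordered]

fig, ax = plt.subplots(figsize=(6, 3.5))
bars = ax.barh(labels, values, color="steelblue")
ax.bar_label(bars, labels=[f"{v}%" for v in values], padding=3)  # direct labels, no legend
ax.set_xticks([])  # the direct labels make the value axis redundant
for side in ("top", "right", "bottom"):
    ax.spines[side].set_visible(False)
ax.set_title("Electronics dominate Philippine exports", loc="left")
```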


Small Multiples (Faceting)

FacetGrid: Philippine family income distribution by island group
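seaborn's `FacetGrid` (used on the slide) automates small multiples. For example, `sns.displot(data=df, x="income", col="island_group")` would facet a long-format table; the column names there are hypothetical. The sketch below builds the same idea by hand in Matplotlib so the mechanics are visible. It uses synthetic incomes, not survey data. The key ingredients are shared bin edges and shared axes, so every panel is read on the same scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
# Synthetic incomes (thousand PHP per year), illustrative only, not survey data
groups = {
    "Luzon": rng.lognormal(mean=5.8, sigma=0.5, size=300),
    "Visayas": rng.lognormal(mean=5.6, sigma=0.5, size=300),
    "Mindanao": rng.lognormal(mean=5.5, sigma=0.5, size=300),
}
# One set of bin edges for every panel so bar widths mean the same thing everywhere
bins = np.histogram_bin_edges(np.concatenate(list(groups.values())), bins=30)

# sharex/sharey put every panel on identical scales, which makes them comparable
fig, axes = plt.subplots(1, len(groups), figsize=(10, 3), sharex=True, sharey=True)
for ax, (name, incomes) in zip(axes, groups.items()):
    ax.hist(incomes, bins=bins, color="steelblue")
    ax.set_title(name)
axes[0].set_ylabel("Families")
axes[1].set_xlabel("Annual family income (thousand PHP)")
```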

Heatmaps & Tables

Sometimes you need both the raw numbers and the visual pattern.

Visual Table

Apply a sequential color scale to cell backgrounds to highlight high/low values instantly.

12   35   62   84
18   42   71   94
22   48   76   105
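A Matplotlib sketch of the "visual table", assuming the slide's 12 values form 3 rows by 4 columns. A sequential colormap shades each cell, and the raw number is printed on top. The text switches to white on dark cells; the threshold at the mean is a simple heuristic, not a rule. seaborn's `sns.heatmap(table, annot=True)` produces a similar result in one call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt
import numpy as np

# The slide's example values, read as 3 rows x 4 columns
table = np.array([[12, 35, 62, 84],
                  [18, 42, 71, 94],
                  [22, 48, 76, 105]])

fig, ax = plt.subplots(figsize=(5, 3))
ax.imshow(table, cmap="Blues")  # sequential scale: light = low, dark = high
threshold = table.mean()        # simple heuristic for when a cell is "dark"
for (i, j), value in np.ndenumerate(table):
    # White text on dark cells so every number stays legible
    ax.text(j, i, str(value), ha="center", va="center",
            color="white" if value > threshold else "black")
ax.set_xticks([])
ax.set_yticks([])
```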
IV. Integrity

Common Mistakes to Avoid

Truncated Y-Axis

Exaggerates small differences, misleads viewers about magnitude

Dual Y-Axes

Suggests correlations between unrelated data series, because each axis scale is chosen arbitrarily

3D Charts

Distorts perception, occludes data points, adds zero information

Pie Chart Overload

Too many slices make it impossible to compare angles and areas

Rainbow Scales

Unintuitive, lacking natural order, and often not colorblind-safe

Overcomplicated

Cramming too much information without a clear visual hierarchy


The Truncated Axis Trick

Side-by-side: misleading truncated y-axis vs honest full axis
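A minimal Matplotlib sketch of the side-by-side comparison, using two hypothetical values about 4% apart. Starting the axis at 95 makes B's visible bar five times taller than A's. Starting at 0 shows the true proportion.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

labels = ["A", "B"]
values = [96, 100]  # hypothetical values about 4% apart

fig, (ax_bad, ax_good) = plt.subplots(1, 2, figsize=(8, 3.5))
ax_bad.bar(labels, values, color="gray")
ax_bad.set_ylim(95, 101)  # truncated: visible heights 1 vs 5, so B looks 5x bigger
ax_bad.set_title("Misleading: axis starts at 95")

ax_good.bar(labels, values, color="gray")
ax_good.set_ylim(0, 105)  # bar length encodes value, so the baseline must be 0
ax_good.set_title("Honest: axis starts at 0")
```

The zero-baseline rule is strictest for bars, whose length is the encoding. Line charts of change over time can use a tighter range if the axis is clearly labeled.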

The Dual Axis Trap

Misleading (Dual Axis): Ice Cream Sales (axis 0 to 100k) and Shark Attacks (axis 0 to 10) on one chart

Forces lines to cross or align, implying false causation.

Honest (Two Charts): Sales and Attacks in separate panels

Allows viewers to see trends independently.
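A sketch of the honest alternative in Matplotlib, using hypothetical monthly numbers. Two stacked panels share only the x-axis, and each series keeps its own zero-based y-axis. This avoids a `twinx()` second scale that could be stretched to make the lines track each other.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
# Hypothetical numbers, for illustration only
ice_cream_sales = [20_000, 25_000, 40_000, 60_000, 85_000, 95_000]
shark_attacks = [1, 1, 3, 4, 6, 8]

# Two panels share the x-axis only; each series keeps its own zero-based y-axis
fig, (ax_sales, ax_attacks) = plt.subplots(2, 1, sharex=True, figsize=(6, 5))
ax_sales.plot(months, ice_cream_sales, marker="o")
ax_sales.set_ylabel("Ice cream sales")
ax_sales.set_ylim(bottom=0)
ax_attacks.plot(months, shark_attacks, marker="o", color="darkorange")
ax_attacks.set_ylabel("Shark attacks")
ax_attacks.set_ylim(bottom=0)
# The trap version would call ax_sales.twinx(); its freely chosen second scale
# can make any two rising series appear to move in lockstep.
```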


The 3D Distortion Effect

Distorted (3D): slices A and B

Slice A physically takes up 70% of the ink, but represents only 30% of the data.

Accurate (2D Flat): slices A and B

Honest representation where visual area maps cleanly to the data value.


Pie Chart Overload

Cognitive Overload: pie chart with slices Item A through Item F

Humans cannot accurately compare angles of similar size, and color-coded slices force the eye to dart back and forth to the legend.

Ranked Clarity: bars ranked Item A, Item H, Item C, Item B, Item F, Item D, Others

Encoding values as length allows instant, effortless ranking, and the text labels sit right beside the data.


Simpson's Paradox

A trend appears in groups of data but disappears or reverses when groups are combined.

The Lesson

Aggregated data can lie. Always check for confounding variables that might skew the result.

Aggregated: Trend is POSITIVE
Disaggregated: Trends are NEGATIVE
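A tiny pure-Python illustration of the paradox, using hypothetical data. Within each of two groups, y falls as x rises (slope -1). Because group B sits higher on both axes, pooling the groups produces a positive slope. Group membership is the confounding variable.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical data: two groups, each with a downward trend
group_a = ([0, 1, 2, 3, 4], [10, 9, 8, 7, 6])
group_b = ([10, 11, 12, 13, 14], [30, 29, 28, 27, 26])

slope_a = slope(*group_a)
slope_b = slope(*group_b)
slope_all = slope(group_a[0] + group_b[0], group_a[1] + group_b[1])  # groups pooled

print(f"Group A: {slope_a:+.2f}  Group B: {slope_b:+.2f}  Combined: {slope_all:+.2f}")
# Group A: -1.00  Group B: -1.00  Combined: +1.78
```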
Knowledge Check

Choosing the Right Chart

You need to show the correlation between study hours and final grades for 500 students. Which chart type should you use?

A) Bar Chart
B) Pie Chart
C) Scatter Plot
D) Line Chart

Correct Answer: C

A scatter plot is the standard choice for showing correlation and relationships between two continuous numeric variables.

V. Advanced

Embracing Uncertainty

Data is rarely precise. Don't hide the variance.

Error Bars

Use 95% confidence intervals to show a plausible range for the true (population) mean, not just the sample mean.

Warning: Standard Deviation (SD) and Standard Error (SE) are not the same. Know what you are plotting.
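A pure-Python sketch of the three quantities, assuming a synthetic sample of 100 scores. SD describes the spread of individual observations. SE = SD / sqrt(n) describes the uncertainty in the mean. The 95% interval here uses the normal approximation (z ≈ 1.96), which is reasonable for large samples. For small samples a t critical value (e.g. from `scipy.stats.t.ppf`) is more appropriate.

```python
import math
import random
import statistics

def summarize(sample, confidence=0.95):
    """Mean, SD, SE, and a normal-approximation confidence interval for the mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)   # spread of individual observations
    se = sd / math.sqrt(n)          # uncertainty in the estimate of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return {"mean": mean, "sd": sd, "se": se, "ci": (mean - z * se, mean + z * se)}

rng = random.Random(0)
sample = [rng.gauss(70, 10) for _ in range(100)]  # synthetic exam scores
summary = summarize(sample)
print(f"SD = {summary['sd']:.2f}, SE = {summary['se']:.2f}")
print("95% CI for the mean: ({:.2f}, {:.2f})".format(*summary["ci"]))
```

With n = 100 the SE is one tenth of the SD, so SE bars look ten times tighter than SD bars for the same data. This is why the caption must say which one is plotted. Either half-width can be passed to Matplotlib's `ax.errorbar(x, y, yerr=...)`.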

Panels: High Variance vs. Low Variance

Visualizing Flow

Flow diagram: Input splitting into Output A and Output B

Hierarchical Data

Pie charts fail at showing nested data with many categories. Treemaps excel at it, like showing the breakdown of a country's exports.

TL;DR

Why Treemaps?

They use 100% of the space to show part-to-whole relationships, scaling easily to dozens of categories (like the Philippine export economy).

Philippine Exports (2023)
Electrical & Electronics 56%
Machinery 9%
Metals/Ores 6%
Agri & Food 11%
Other Manufactures 18%
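To show how a treemap turns part-to-whole values into area, here is a pure-Python layout sketch using the export shares above. It uses a simple recursive binary split: items are divided into two groups with totals as equal as possible, and the rectangle's longer side is cut in that ratio. The function name `binary_treemap` is hypothetical. Production libraries typically use the squarified algorithm, which yields more square-like tiles. `plotly.express.treemap` (with `names`, `parents`, `values`) handles layout and rendering in one call.

```python
def binary_treemap(items, x, y, w, h):
    """Lay out (label, value) pairs as rectangles with area proportional to value.

    Returns a list of (label, x, y, width, height).
    """
    if len(items) == 1:
        return [(items[0][0], x, y, w, h)]
    total = sum(value for _, value in items)
    # Choose the split index that best balances the two groups' totals
    best_split, best_gap, running = 1, float("inf"), 0.0
    for i in range(1, len(items)):
        running += items[i - 1][1]
        gap = abs(total - 2 * running)
        if gap < best_gap:
            best_split, best_gap = i, gap
    first, second = items[:best_split], items[best_split:]
    frac = sum(value for _, value in first) / total
    if w >= h:  # cut the longer side so tiles stay closer to square
        return (binary_treemap(first, x, y, w * frac, h)
                + binary_treemap(second, x + w * frac, y, w * (1 - frac), h))
    return (binary_treemap(first, x, y, w, h * frac)
            + binary_treemap(second, x, y + h * frac, w, h * (1 - frac)))


# Shares from the Philippine Exports (2023) slide, in %
exports = {"Electrical & Electronics": 56, "Other Manufactures": 18,
           "Agri & Food": 11, "Machinery": 9, "Metals/Ores": 6}
items = sorted(exports.items(), key=lambda kv: kv[1], reverse=True)
rects = binary_treemap(items, 0.0, 0.0, 100.0, 60.0)
for label, rx, ry, rw, rh in rects:
    print(f"{label:<26} area share = {rw * rh / (100 * 60):.2f}")
```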

Ranking Many Categories


Spatial Data

Maps are powerful but dangerous. Large geographic areas dominate visually even when their data values are small.

The Choropleth

Encodes data values into geographic regions using color intensity. Works best for density-based metrics (per capita, rates).
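The normalization step that makes a choropleth honest is sketched below: convert raw counts to a rate per 100,000 residents before coloring. The region names and numbers are hypothetical placeholders, not real statistics. They are chosen so that the ranking flips between raw counts and rates.

```python
# Hypothetical regions and placeholder numbers, not real statistics
regions = {
    # name: (event count, population)
    "Region X": (1_200, 12_000_000),
    "Region Y": (300, 1_500_000),
    "Region Z": (90, 300_000),
}

def per_100k(count, population):
    """Rate per 100,000 residents."""
    return count * 100_000 / population

rates = {name: per_100k(count, pop) for name, (count, pop) in regions.items()}
print(max(regions, key=lambda r: regions[r][0]))  # Region X: most events in raw counts
print(max(rates, key=rates.get))                  # Region Z: highest rate per resident
print(rates)  # {'Region X': 10.0, 'Region Y': 20.0, 'Region Z': 30.0}
```

Coloring by the raw count would make the populous Region X the darkest. Coloring by rate shows that Region Z is the most affected per resident.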

Geographic Context: interactive maps display true area, exposing how large regions can dominate visual weight regardless of the underlying data density.
Part VI

Driving Action with Narrative

A chart is a character; context is the plot. Without a narrative arc, data points remain static observations instead of catalysts for action.

Structure your presentations like a story -- context, conflict, resolution.

DATA DUMP (Boring) vs. THE STORY (Actionable): the story version is annotated "Sales Spiked!"
VI. Narrative

Data Storytelling

Charts are not the destination; they are the vehicle for the insight.

The 3-Act Structure

  1. Context: What is the baseline?
  2. Conflict: What changed? (The anomaly)
  3. Resolution: What do we do? (The recommendation)

Annotation Best Practices

Example: International Tourist Arrivals (PH), 2018 to 2022 (axis 0 to 8M), annotated "82% Drop: COVID-19 Global Travel Restrictions"
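A Matplotlib sketch of this kind of annotation. The arrival figures are placeholders chosen only to reproduce the layout; substitute official Department of Tourism data before publishing. The percentage in the label is computed from the data rather than typed by hand, so the annotation cannot drift out of sync with the numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop in a notebook
import matplotlib.pyplot as plt

years = [2018, 2019, 2020, 2021, 2022]
# Placeholder arrivals in millions, for layout only; not official figures
arrivals = [7.1, 8.3, 1.5, 0.2, 2.7]

def pct_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100

drop = pct_change(arrivals[1], arrivals[2])  # 2019 -> 2020

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(years, arrivals, marker="o", color="steelblue")
ax.annotate(f"{abs(drop):.0f}% drop\nCOVID-19 global travel restrictions",
            xy=(2020, arrivals[2]), xytext=(2020.2, 6.0),
            arrowprops=dict(arrowstyle="->", color="gray"))
ax.set_xticks(years)
ax.set_ylabel("International arrivals (millions)")
ax.set_title("International Tourist Arrivals (PH)", loc="left")
```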
VII. Application

Strategic Dashboards

Philippine tourism analytics dashboard with KPIs and charts

Anatomy of a Dashboard

KPI cards:
  • Tourist Arrivals: 5.45M (+15.1% YoY)
  • Revenue: P271B (+22.3% YoY)
  • Occupancy Rate: 68.4% (-2.1pp vs target)
  • Avg Stay: 4.2 days (+0.3 days)
Main chart: Monthly Arrivals Trend (2023)
Side panel, Top Destinations: NCR 42%, Cebu 28%, Boracay 18%, Palawan 12%

Plotly: Interactive by Default


Seaborn: Statistical Visualization

Seaborn regplot with regression line and boxplot by island group

Interactive Visualization

1. Overview

See the full picture first

2. Zoom & Filter

Narrow to areas of interest

3. Details

Inspect specific data on demand


Philippine Example: Regional GDP

Chart: Gross Regional Domestic Product (2023), share (%) by region
Activity

Fix This Chart

Imagine a 3D pie chart with 12 slices, rainbow colors, no labels, and the title: "Data".

  1. List at least 3 problems with this chart.
  2. How would you redesign it?
  3. Which chart type would be a better choice?
3 minutes
Industry Tools

The Modern BI Ecosystem

Why leave Python/R?

Libraries like Matplotlib, Seaborn, and Plotly are excellent for ad-hoc analysis and data science workflows. However, deploying insights to hundreds of non-technical stakeholders requires a different toolset.

Scale

Cloud-native architecture to handle millions of rows without crashing the user's browser.

Governance

Centralized semantic models ensure everyone agrees on the definition of "Revenue" or "Active Users".

Interactivity

Self-service drag-and-drop filtering, cross-filtering, and exporting for business users.


Tableau: The Artist's Choice

Tableau revolutionized the BI industry by turning drag-and-drop actions into database queries.

Strengths

Unmatched visual flexibility. The famous "Show Me" feature and its underlying proprietary VizQL (Visual Query Language) allow for rapid, intuitive exploratory data analysis.

Trivia

Founded in 2003 by researchers from Stanford University who commercialized their Department of Defense-funded research. Acquired by Salesforce in 2019 for $15.7 billion.

Tableau Sample Dashboard

Tableau

"Help people see and understand data."

Power BI Sample Dashboard

Power BI

The Enterprise Heavyweight


Power BI: The Corporate Standard

Microsoft's flagship analytics tool, tightly integrated with the Azure and Office 365 ecosystems.

Strengths

Incredibly powerful data modeling via DAX (Data Analysis Expressions) and Power Query (M language). Highly cost-effective for enterprises already using Microsoft E5 licenses.

Trivia

Originally designed under the code name "Project Crescent". It evolved into a behemoth by combining several obscure Excel add-ins (Power Pivot, Power Query, Power View).


Looker: The Developer's BI

A 100% web-based platform that forced the industry to rethink how core metrics are governed.

Strengths

Introduced LookML, a Git-version-controlled semantic layer. This ensures a metric like "Revenue" is defined exactly once in code, meaning every dashboard across the company calculates it exactly the same way.
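The "define once, reuse everywhere" idea can be sketched in Python. This is not LookML syntax, and the table and column names are hypothetical. Every query pulls the metric's SQL expression from a single registry, so two dashboards grouping revenue by different dimensions still compute "revenue" identically.

```python
# Hypothetical single source of truth for metric definitions (Python, not LookML)
METRICS = {
    "revenue": {"table": "orders", "sql": "SUM(orders.amount)"},
    "active_users": {"table": "events", "sql": "COUNT(DISTINCT events.user_id)"},
}

def build_query(metric, dimension):
    """Compose SQL for any dashboard from the one shared metric definition."""
    spec = METRICS[metric]
    return (f"SELECT {dimension}, {spec['sql']} AS {metric} "
            f"FROM {spec['table']} GROUP BY {dimension}")

# Two different dashboards, identical definition of "revenue"
print(build_query("revenue", "orders.region"))
print(build_query("revenue", "orders.month"))
```

Changing the definition in `METRICS` changes it for every consumer at once. That is the governance property the slide attributes to a semantic layer.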

Trivia

Unlike Tableau or Power BI, which commonly import data into their own in-memory engines, Looker does not extract data at all: it translates LookML into SQL and queries the data warehouse (like BigQuery or Snowflake) directly. Google Cloud agreed to acquire Looker in 2019 for $2.6 billion (the deal closed in 2020).

Looker Sample Dashboard

Looker

Governance First, Viz Second

Key Takeaways

  1. Pre-attentive attributes enable instant perception (<200ms)
  2. Tufte's data-ink ratio: maximize data, minimize junk
  3. Match chart type to your message and data type
  4. Use active titles and annotations to tell the story
  5. Always design for accessibility and integrity

Lab 5: Visualization Portfolio

Create publication-quality charts with matplotlib & seaborn using Philippine regional data. Practice chart selection, Tufte's principles, and data storytelling.

tufte.com

seaborn.pydata.org

plotly.com