Playbooks

The Relevance Test

Oct 13, 2025

Are your sales & marketing teams ready for AI-accelerated presales? A 60-minute, evidence-based assessment you can run this week.

Why test at all (and why now):
Global frameworks are clear: AI literacy isn’t just for engineers. Teams must be able to use, judge, and govern AI systems in daily work.
Meanwhile, enterprise data shows adoption is uneven: the orgs compounding gains are the ones integrating AI into weekly workflows (search, writing, analysis, decision support), not just trying demos. Microsoft’s latest Work Trend Index highlights where productivity is actually showing up and where gaps persist.

What really differentiates performance:
Model choice matters, but data and labeling quality usually dominate outcomes; this is the core of “data-centric AI.” Get labels and feedback loops right, and everything compounds.

The 60-Minute Relevance Test (for presales orgs)

How to run it: Pull a small, anonymized slice of your own data (last 30–50 inbound inquiries, last week’s chat transcripts, 2–3 recent campaigns). Put a cross-functional group (1 exec, 2 marketers, 2 sales, 1 ops) in a room for one hour. Score each of the ten tasks 0–3 (0 = can’t do, 3 = crisp + repeatable). Each task carries a 10-point weight, so scale each score to 10 (score ÷ 3 × 10) and sum; the ten tasks add up to a 0–100 Relevance Index. A scoring sketch follows the guide below.

Scoring guide per task
0 = Not attempted / no method
1 = Manual/intuitive, no repeatability
2 = Some structure, partial evidence, could repeat
3 = Clear method, evidence-based, trivial to repeat/share
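
Here is a minimal sketch of the scoring arithmetic, assuming hypothetical task keys for the ten exercises that follow: each 0–3 score scales to its 10-point weight, so the total lands on the 0–100 index.

```python
# Minimal scoring sketch. Task keys are hypothetical; the arithmetic is the point:
# each of the 10 tasks is scored 0-3, scaled to a 10-point weight, and summed to 0-100.

TASKS = [
    "A1_label_intent", "A2_weekly_baselines", "A3_conversation_analysis",
    "B1_one_prompt_three_outputs", "B2_cold_start_persona",
    "C1_rank_todays_list", "C2_tiny_ab", "C3_price_integrity",
    "D1_frequency_fairness", "D2_chat_data_handling",
]

def relevance_index(scores: dict[str, int]) -> float:
    """Scale each 0-3 score to 10 points and sum into a 0-100 Relevance Index."""
    for task, value in scores.items():
        if task not in TASKS or not 0 <= value <= 3:
            raise ValueError(f"{task}={value}: expected a known task scored 0-3")
    return round(sum(scores.get(t, 0) / 3 * 10 for t in TASKS), 1)

# A team scoring 2 on everything lands at ~66.7: "standardize and templatize" territory.
print(relevance_index({t: 2 for t in TASKS}))
```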

A. Data & Signals (30 pts)

  1. Label intent from raw text (10 mins)
    Take 10 recent inbound messages. Label: stage (browse / tour-ready), budget band, urgency signal (e.g., date, family move). Note disagreements and how you resolved them. (Good labels beat “fancier” models over time.)
    Score 0–3.

  2. Define minimal weekly baselines (5 mins)
    From last week’s data, state: lead→tour %, time-to-first-qualified, show-rate, tour→contract %. If you can’t, name the query you would run and from which system. (Can your team ask the right questions of the data? A minimal sketch follows at the end of this section.)
    Score 0–3.

  3. Conversation analysis (10 mins)
    Take 10 chat/call snippets. Classify the top 3 objections and map each to a next action (e.g., “budget-tight → financing explainer”). Note frequency caps or compliance flags. (Signals only matter if they route action.)
    Score 0–3.
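
To make A1–A2 concrete, here is a hedged sketch of what the outputs might look like, with hypothetical field names (stage, budget_band, toured, qualified_at, and so on); any labeling schema two reviewers can agree on works just as well.

```python
# A1: a couple of labeled inbound messages (labels agreed by two reviewers;
# disagreements get noted and resolved, not silently overwritten).
from datetime import datetime

labeled_messages = [
    {"text": "Can we tour Saturday? Moving in June, budget ~2,800/mo",
     "stage": "tour-ready", "budget_band": "2500-3000", "urgency": "date"},
    {"text": "Just curious what 2BRs go for in this area",
     "stage": "browse", "budget_band": "unknown", "urgency": "none"},
]

# A2: minimal weekly baselines from a flat list of lead records (hypothetical schema).
def weekly_baselines(leads: list[dict]) -> dict:
    toured = [l for l in leads if l.get("toured")]
    contracted = [l for l in leads if l.get("contracted")]
    hours_to_qualified = sorted(
        (l["qualified_at"] - l["created_at"]).total_seconds() / 3600
        for l in leads if l.get("qualified_at")
    )
    return {
        "lead_to_tour_pct": round(100 * len(toured) / max(len(leads), 1), 1),
        "tour_to_contract_pct": round(100 * len(contracted) / max(len(toured), 1), 1),
        "median_hours_to_first_qualified":
            hours_to_qualified[len(hours_to_qualified) // 2] if hours_to_qualified else None,
    }

leads = [
    {"created_at": datetime(2025, 10, 6, 9), "qualified_at": datetime(2025, 10, 6, 11),
     "toured": True, "contracted": False},
    {"created_at": datetime(2025, 10, 7, 14), "qualified_at": None,
     "toured": False, "contracted": False},
]
print(weekly_baselines(leads))  # -> 50.0% lead→tour, 0.0% tour→contract, 2.0 hours
```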

B. Tool Use & Prompting (20 pts)

  1. One prompt, three outputs (10 mins)
    Draft one prompt to produce: (a) 3 unit matches for a buyer brief, (b) a 90-sec rep talk track, (c) a 3-message follow-up sequence, with sources/caveats listed. (Measures clarity, guardrails, reuse; a prompt sketch follows this section.)
    Score 0–3.

  2. Cold-start a micro-persona (5 mins)
    From a spreadsheet snippet (5–10 rows), create a persona sketch (price band, location, family signal) and a single testable hypothesis (e.g., “weekday 7–9pm texts lift replies for X persona”).
    Score 0–3.
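
A minimal sketch of the single reusable prompt from B1, with hypothetical placeholders; the guardrails (use only the supplied data, say “unknown” instead of guessing, list caveats) matter more than the exact wording.

```python
# One reusable prompt, three outputs (task B1). Placeholders and wording are illustrative.
PROMPT_TEMPLATE = """\
You are assisting a presales rep. Using ONLY the buyer brief and unit data below, produce:
1) Three unit matches, each with a one-line "why it fits" and the source row it came from.
2) A 90-second rep talk track in plain language (no invented amenities or prices).
3) A 3-message follow-up sequence, with channel and timing for each message.
Rules: if a detail is missing from the data, write "unknown" instead of guessing,
and list all caveats at the end.

Buyer brief:
{buyer_brief}

Unit data (CSV rows):
{unit_rows}
"""

print(PROMPT_TEMPLATE.format(
    buyer_brief="2BR, ~2,800/mo, near schools, June move-in",
    unit_rows="id,beds,rent,available\n204,2,2750,2025-06-01",
))
```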

C. Decision & Experiment (30 pts)

  1. Rank today’s list (10 mins)
    Given 20 leads, produce a Top 5 contact order with why now (signal + channel + hour). Document your rule so someone else could repeat it next week. (We’re testing decisionability, not dashboards; a minimal ranking-rule sketch follows this section.)
    Score 0–3.

  2. Design a tiny A/B (5 mins)
    Cut time-to-first-qualified by 20%. Define one change (e.g., “SMS within 15 min if email unopened”), a sample size proxy, and a stop condition.
    Score 0–3.

  3. Price-integrity play (5 mins)
    List two non-discount levers (e.g., sequencing tours by fit; framing scarcity) and the signal that triggers each—so you preserve price while moving velocity. (Decision policy, not vibes.)
    Score 0–3.
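
A minimal sketch of a documented ranking rule for C1, with assumed signal names and weights; the point is that the rule is written down, so anyone can rerun it next week and argue about the weights.

```python
# Rank today's list (task C1). Signals and weights are assumptions, not prescriptions.
from datetime import datetime, timezone

WEIGHTS = {"tour_request": 5, "budget_stated": 3, "reopened_email": 2}

def lead_score(lead: dict, now: datetime) -> float:
    hours_since_touch = (now - lead["last_touch"]).total_seconds() / 3600
    recency = max(0.0, 3 - hours_since_touch / 24)   # fresher touches score higher
    return recency + sum(WEIGHTS.get(s, 0) for s in lead["signals"])

def top5(leads: list[dict], now: datetime) -> list[dict]:
    ranked = sorted(leads, key=lambda l: lead_score(l, now), reverse=True)
    # "Why now" is explicit, so the ordering is repeatable and reviewable.
    return [{"name": l["name"], "score": round(lead_score(l, now), 1),
             "why_now": ", ".join(l["signals"]) or "recent touch"} for l in ranked[:5]]

now = datetime.now(timezone.utc)
leads = [
    {"name": "Lead A", "last_touch": now, "signals": ["tour_request"]},
    {"name": "Lead B", "last_touch": now, "signals": []},
]
print(top5(leads, now))
```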

D. Governance & Hygiene (20 pts)

  1. Frequency & fairness policy (5 mins)
    Write one paragraph that sets frequency caps, quiet hours, and a broker equity rule (e.g., rotate preview access unless performance > X). Aligns to “use & evaluate AI responsibly.” (A policy-check sketch follows this section.)
    Score 0–3.

  2. Data handling for chat (5 mins)
    Where do transcripts live? Who can export? Retention window? Redaction? Write the rule in one minute. (Privacy & least-privilege basics.)
    Score 0–3.
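
A minimal sketch of the D1 hygiene rules as an executable check, with assumed thresholds (3 touches per lead per week, quiet hours 21:00–08:00); the value is having the policy written down, whatever numbers you choose.

```python
# Frequency cap and quiet hours (task D1) as a one-function check. Thresholds are assumptions.
from datetime import datetime

MAX_TOUCHES_PER_WEEK = 3
QUIET_HOURS = (range(21, 24), range(0, 8))   # no outreach 21:00-08:00 local time

def may_contact(lead: dict, proposed_time: datetime) -> tuple[bool, str]:
    if lead["touches_this_week"] >= MAX_TOUCHES_PER_WEEK:
        return False, "weekly frequency cap reached"
    if any(proposed_time.hour in window for window in QUIET_HOURS):
        return False, "inside quiet hours"
    return True, "ok"

print(may_contact({"touches_this_week": 1}, datetime(2025, 10, 14, 22, 30)))
# -> (False, 'inside quiet hours')
```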

Interpreting your Relevance Index (0–100)

  • 85–100: You’re compounding. Formalize the loops; consider automation next.

  • 60–84: Strong intuition; standardize and templatize (labels, prompts, A/B cadence).

  • <60: High effort, low leverage. Start with labeling, weekly baselines, and one small A/B.

The “Musk Minute.”
Love him or hate him, Musk’s orgs pressure-test relevance with simple, hands-on tests—code reviews at Twitter, and more recently a public drumbeat that high-quality data labeling is underrated and core to xAI’s edge. Take the hint: small, real tasks beat slideware. (Business Insider)

What to do after the test (no tools required)

  • Institutionalize labels. Keep a 50-example library of good and bad labels; review weekly.

  • Make decisions visible. Publish a one-page weekly “Rank & Reason” (Top 20 contacts, why, results).

  • Run tiny A/Bs continuously. One change per week on who/when/how.

  • Add guardrails once. Frequency caps, quiet hours, redaction—write them down.

Bottom line: Relevance isn’t “who works harder.” It’s who turns yesterday’s outcomes into today’s ranked actions—ethically, repeatably, and fast. That’s how data becomes a production factor, not a report.

Further reading:

  • WEF’s AI Literacy Framework (practical competencies beyond tools).

  • OECD AI Literacy (use, understand, critically evaluate).

  • Andrew Ng: Data-Centric AI (why labeling quality compounds).

  • Microsoft Work Trend Index (what AI is actually helping with at work).