Finding your next AI engineer sucks.

Your team is stuck doing interviews and reviewing take-homes — and 84% of devs use AI in ways you'll never see.

LeetCode

Tests memorization, not real work.

Take-homes

You see the output, never the process.

"No AI" rules

Tests a workflow nobody uses anymore.

Assessment Review Dashboard

Three candidates. Same challenge.

Assessment: Recreate HackerNews · 90 min · React + Node · 3 completed

Candidate B

88 min · 6 commits

MAYBE

LIVE

HackerNews Clone

Show HN: I built a full-stack app in 90 min

142 points · 28 comments

The future of AI-assisted development

89 points · 12 comments

Why Rust is eating the world

67 points · 9 comments

Tools

Claude Code

AI Usage

High delegation · low scrutiny

72% accept · 18% modify · 10% reject

Signals

Prompt Clarity

58

Output Scrutiny

42

Tool Fluency

65

Autonomy

30

Timeline

Requirements

✓ Auth

✓ Stories

✓ Voting

✓ Comments

○ Nested comments

○ Pagination

Candidate A

72 min · 14 commits · ★ top performer

STRONG HIRE

LIVE

1. Show HN: I built a full-stack app in 90 min (github.com)

▲ 142 points by user1 3h ago | 28 comments

2. The future of AI-assisted development (blog.example.com)

▲ 89 points by user2 5h ago | 12 comments

3. Why Rust is eating the world (medium.com)

▲ 67 points by user3 7h ago | 9 comments

Tools

Claude Code · Cursor · Magic Patterns · Puppeteer MCP

AI Usage

Low delegation · high scrutiny

35% accept · 45% modify · 20% reject

Signals

Prompt Clarity

92

Output Scrutiny

88

Tool Fluency

85

Autonomy

78

Timeline

Requirements

✓ Auth

✓ Stories

✓ Voting

✓ Comments

✓ Nested comments

✓ Pagination

Candidate A — Prompt Timeline

23 prompts · 72 min session

0:00 PROMPT #1

"We are tasked with recreating HackerNews. Before we write any code, let's scour GitHub to find public repos of people doing this exact project. Find only high quality repos from the last year. I want to study their architecture decisions before we start."

+1 novel prompting technique · research-first approach

0:04 PROMPT #2

"Good finds. Now deep think about the common patterns across these repos. What's the minimal schema we need? I want you to circle back with me to clarify before generating any models."

key term: "deep think" · key term: "circle back to clarify" · +1 verification checkpoint

0:12 PROMPT #3

"Schema looks right. Now scaffold the project — but don't implement any routes yet. Just give me the folder structure, package.json, and DB migrations. I'll use Magic Patterns to generate the UI components separately."

+1 task decomposition · multi-tool orchestration

0:20 PROMPT #4 — Modified AI output

"The auth middleware you generated catches generic errors. That's not good enough — I need specific handlers for TokenExpiredError and InvalidSignatureError. Also, move the JWT secret to an env var, not hardcoded."

+1 error correction · +1 security instinct

0:35 PROMPT #5

"Core features are working. Now I want nested comments — but before you implement, show me just the recursive SQL query you'd use. I want to validate the approach before we build the component tree."

key term: "validate approach" · +1 verification checkpoint

0:52 PROMPT #6

"Last push — add pagination. Use cursor-based, not offset. And run the Puppeteer tests against localhost to make sure everything renders correctly before I submit."

+1 testing awareness · cursor pagination (advanced)

... 17 more prompts in full session →
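Prompt #6 asks for cursor-based rather than offset pagination. The difference shows up when new stories arrive between page loads: an offset page shifts and repeats items, while a cursor page resumes strictly after the last story the client saw. Below is a minimal in-memory sketch, assuming newest-first ordering on a hypothetical `(createdAt, id)` key. `Story`, `pageAtOffset`, and `pageAfter` are illustrative names, not code from the session.

```typescript
// Hypothetical story row; only the fields the sort key needs.
interface Story {
  id: number;
  title: string;
  createdAt: number; // epoch millis
}

// Sort key of the last story the client has already seen.
interface Cursor {
  createdAt: number;
  id: number;
}

// Newest first, with id as a tiebreaker so the order is total.
function byNewest(a: Story, b: Story): number {
  return b.createdAt - a.createdAt || b.id - a.id;
}

function encodeCursor(s: Story): string {
  return `${s.createdAt}:${s.id}`;
}

function decodeCursor(raw: string): Cursor {
  const [createdAt, id] = raw.split(":").map(Number);
  if (!Number.isFinite(createdAt) || !Number.isFinite(id)) {
    throw new Error(`malformed cursor: ${raw}`);
  }
  return { createdAt, id };
}

// True if `s` sorts strictly after the cursor position in newest-first order.
function comesAfter(s: Story, c: Cursor): boolean {
  return s.createdAt < c.createdAt || (s.createdAt === c.createdAt && s.id < c.id);
}

// Offset paging: "skip N rows". Shifts when rows are inserted above the offset.
function pageAtOffset(stories: Story[], limit: number, offset: number): Story[] {
  return [...stories].sort(byNewest).slice(offset, offset + limit);
}

// Cursor paging: "rows strictly after the last one seen", stable under inserts.
function pageAfter(
  stories: Story[],
  limit: number,
  cursor?: string,
): { items: Story[]; next: string | undefined } {
  const sorted = [...stories].sort(byNewest);
  const c = cursor === undefined ? undefined : decodeCursor(cursor);
  const remaining = c === undefined ? sorted : sorted.filter((s) => comesAfter(s, c));
  const items = remaining.slice(0, limit);
  const last = items[items.length - 1];
  // Only hand out a cursor when there is actually another page.
  const next = remaining.length > limit && last !== undefined ? encodeCursor(last) : undefined;
  return { items, next };
}
```

With five stories whose `createdAt` is `id * 1000`, page one returns stories 5 and 4 with cursor `4000:4`. If story 6 is then posted, offset 2 returns story 4 again, while the cursor still yields 3 and 2. In PostgreSQL the same filter can be written as a row comparison, `WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3`, which an index on `(created_at, id)` can serve without walking past skipped rows.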

Candidate C

90 min · 3 commits

NO HIRE

FAILED

✕ Build Failed

ERROR in src/App.tsx:24:8

TS2322: Type 'string' is not assignable to type 'number'.

ERROR in src/components/Story.tsx:11:5

Cannot find module './Comments'

webpack compiled with 2 errors

Tools

Codex CLI · ChatGPT (browser)

AI Usage

Full delegation · zero scrutiny

94% accept · 4% modify · 2% reject

Signals

Prompt Clarity

22

Output Scrutiny

8

Tool Fluency

35

Autonomy

12

Timeline

Requirements

✓ Auth

✓ Stories

✕ Voting

✕ Comments

○ Nested comments

○ Pagination

AI prompting

Manual editing

Reviewing output

Errors / debugging

How it works

From real work to hiring signal.

01

Linear ENG-347

Feature · High Priority

Add threaded commenting to Discussions

Implement a commenting system similar to HackerNews for our Discussions section. Needs nested replies, upvoting, and real-time updates. Reference HN's UX for threading depth.

Assigned: Eng Team · Sprint 14

✦ Spize agent generates assessment

Recreate HackerNews · AI GENERATED

Build a functional HackerNews clone — auth, stories, voting, nested comments. Tests the exact skills from ENG-347 but in a standalone challenge.

⏱ 90 min · React + Node · From: ENG-347

☐ Auth (login/register)

☐ Story submission

☐ Upvoting/downvoting

☐ Threaded comments

☐ Bonus: Pagination

☐ Bonus: Tagging

Use real work. Or let our agent create it.

Point Spize at your actual Linear stories, GitHub issues, or Jira tickets. Our agent analyzes the work your team has done and generates a standalone assessment that tests the same skills — without exposing your codebase.

Or bring your own challenge. Either way, candidates get something that actually matters to your team.

02

Recreate HackerNews

ACTIVE

⏱ 90 min 📦 React + Node 👥 3 invited

Invite sent to candidate@email.com

$ npx @spize/cli abc-123-def

Token expires in 7 days · One-time use

Invite candidates with a single token.

Each candidate gets a unique, expiring token. One command to start — no accounts, no setup, no IDE restrictions. They use their own tools.
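The invite flow is described only by its guarantees: a unique token per candidate, a 7-day expiry, and one-time use. Below is a minimal sketch of a store that enforces those three rules. It assumes an in-memory map and an injectable clock for testing. `InviteStore`, `issue`, and `redeem` are hypothetical names, not Spize's API or the internals of `@spize/cli`.

```typescript
import { randomBytes } from "node:crypto";

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical invite record bound to one candidate and one assessment.
interface Invite {
  assessmentId: string;
  email: string;
  expiresAt: number; // epoch millis
  used: boolean;
}

class InviteStore {
  private invites = new Map<string, Invite>();

  // Issue a random, unguessable token (128 bits, hex-encoded).
  issue(assessmentId: string, email: string, now: number = Date.now()): string {
    const token = randomBytes(16).toString("hex");
    this.invites.set(token, { assessmentId, email, expiresAt: now + SEVEN_DAYS_MS, used: false });
    return token;
  }

  // Redeem succeeds at most once, and only before the expiry time.
  redeem(token: string, now: number = Date.now()): Invite {
    const invite = this.invites.get(token);
    if (!invite) throw new Error("unknown token");
    if (invite.used) throw new Error("token already used");
    if (now >= invite.expiresAt) throw new Error("token expired");
    invite.used = true;
    return invite;
  }
}
```

A production version would more likely persist only a hash of each token and perform the used-check and the mark-as-used in one atomic database update, so two simultaneous redemptions cannot both succeed.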

03

✓ Session active · 1h 28m remaining

─────────────────────────

[14:02] prompt → researched GitHub repos

[14:08] edit → modified auth middleware

[14:15] review → 3 min pause, rewrote logic

[14:22] tool → switched to Magic Patterns

[14:30] flag → caught AI hallucination

They work. We observe.

Every prompt, every tool switch, every pause, every edit. Our agent captures it all silently — no screen recording, no webcam, just AI interaction data.

04

Assessment Results · 3 of 3 completed

Candidate A · STRONG HIRE

Candidate B · MAYBE

Candidate C · NO HIRE

Compare. Decide. Hire.

Side-by-side review with signal scores, prompt timelines, and AI usage profiles. Then each candidate defends their decisions in The Defense.
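The page doesn't say how the percentages in each AI usage profile (for example 35% accept · 45% modify · 20% reject for Candidate A) are computed. Purely as an assumption for illustration: if every piece of AI output is logged with exactly one outcome, the profile is each outcome's share of the total. The one non-obvious detail is rounding. Naive rounding can total 99 or 101 (three equal thirds round to 33 each), so this sketch uses largest-remainder rounding. `SuggestionEvent` and `usageProfile` are hypothetical.

```typescript
type Outcome = "accept" | "modify" | "reject";

// Hypothetical event: what the candidate did with one piece of AI output.
interface SuggestionEvent {
  at: string; // session clock, e.g. "14:08"
  outcome: Outcome;
}

const OUTCOMES: Outcome[] = ["accept", "modify", "reject"];

// Whole-number percentages that always sum to 100 (largest-remainder rounding).
function usageProfile(events: SuggestionEvent[]): Record<Outcome, number> {
  const result: Record<Outcome, number> = { accept: 0, modify: 0, reject: 0 };
  if (events.length === 0) return result;

  const counts: Record<Outcome, number> = { accept: 0, modify: 0, reject: 0 };
  for (const e of events) counts[e.outcome] += 1;

  const exact = OUTCOMES.map((o) => (counts[o] * 100) / events.length);
  const floors = exact.map((x) => Math.floor(x));
  let leftover = 100 - floors.reduce((a, b) => a + b, 0);

  // Give leftover points to the outcomes with the largest fractional parts.
  const byRemainder = OUTCOMES.map((_, i) => i).sort(
    (i, j) => exact[j] - floors[j] - (exact[i] - floors[i]),
  );
  for (const i of byRemainder) {
    if (leftover === 0) break;
    floors[i] += 1;
    leftover -= 1;
  }

  OUTCOMES.forEach((o, i) => {
    result[o] = floors[i];
  });
  return result;
}
```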

What We Surface

Not just what they built — how they built it.

Candidate A — Prompt Timeline

23 prompts · 72 min session

0:00 PROMPT #1

"We are tasked with recreating HackerNews. Before we write any code, let's scour GitHub to find public repos of people doing this exact project. Find only high quality repos from the last year. I want to study their architecture decisions before we start."

+1 novel technique · research-first approach

0:04 PROMPT #2

"Good finds. Now deep think about the common patterns across these repos. What's the minimal schema we need? I want you to circle back with me to clarify before generating any models."

+1 AI proficiency · key term: "deep think"

0:20 PROMPT #4 — Modified AI output

"The auth middleware you generated catches generic errors. That's not good enough — I need specific handlers for TokenExpiredError and InvalidSignatureError. Also, move the JWT secret to an env var, not hardcoded."

+1 error correction · +1 security instinct

0:35 PROMPT #5

"Core features are working. Now I want nested comments — but before you implement, show me just the recursive SQL query you'd use. I want to validate the approach before we build the component tree."

+1 AI proficiency · verification checkpoint

🧪

Novel Techniques

Creative approaches that separate good engineers from great ones. Research-first, multi-tool orchestration, unconventional prompting strategies.

90

🎯

AI Proficiency

Vibing or orchestrating? Do they lead the AI with intent, or just accept whatever comes back?

92

🐛

Error Correction

When the AI hallucinates — do they catch it, fix it, or ship it?

88

🛡️

Security Instincts

Do they think about secrets and trust boundaries — or ship whatever compiles?

85

Post-Assessment

The Defense

Code is done. Now AI interrogates every decision, with questions generated from their actual session data.

⚡

The Defense — Candidate A

Recreate HackerNews · 6 questions · ~15 min

LIVE · Q 3 / 6

📎 CONTEXT

At 0:00 you searched GitHub for HackerNews clones before writing any code. At 0:04 you asked the AI to "deep think" and "circle back to clarify." At 0:20 you rejected the AI's generic error handler and rewrote it.

Q1 RESEARCH APPROACH

You spent the first 4 minutes researching existing HackerNews repos instead of coding. What specifically were you looking for, and how did it change your architecture?

A1 CANDIDATE

I wanted to see how others handled the comment threading: recursive CTEs vs. adjacency list vs. materialized paths. Found two repos using materialized paths, which confirmed my instinct. Saved me from over-engineering the schema.
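A1 settles on materialized paths for comment threading. The sketch below shows why that layout is convenient. It assumes each comment stores a hypothetical `path` of zero-padded ancestor IDs. Sorting rows by path then yields depth-first thread order, and a comment's depth is its number of path segments, so rendering the tree needs no recursive query. `CommentRow`, `addComment`, `threadOrder`, and the padding width are illustrative choices.

```typescript
// Hypothetical comment row. `path` holds zero-padded IDs from the
// top-level comment down to this one, e.g. "0001.0003.0004".
interface CommentRow {
  id: number;
  path: string;
  body: string;
}

const PAD = 4; // assumes IDs stay below 10,000; use a wider width in practice

function addComment(rows: CommentRow[], id: number, body: string, parentId?: number): CommentRow {
  const segment = String(id).padStart(PAD, "0");
  let path = segment;
  if (parentId !== undefined) {
    const parent = rows.find((r) => r.id === parentId);
    if (!parent) throw new Error(`unknown parent ${parentId}`);
    path = `${parent.path}.${segment}`; // child path = parent path + own segment
  }
  const row: CommentRow = { id, path, body };
  rows.push(row);
  return row;
}

// Plain code-unit comparison (not localeCompare): "." sorts before digits,
// so a parent's path sorts before every reply that extends it.
function threadOrder(rows: CommentRow[]): { depth: number; row: CommentRow }[] {
  return [...rows]
    .sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0))
    .map((row) => ({ depth: row.path.split(".").length - 1, row }));
}
```

Fetching one thread is a prefix match, e.g. `path LIKE '0001.%'` in SQL for everything under comment 1. Siblings here sort by ID, oldest first. Ranking replies by votes would need a different sort key within each sibling group.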

Q2 ERROR CORRECTION

At 0:20, Claude generated auth middleware with a generic catch(err) block. You rewrote it to handle TokenExpiredError and InvalidSignatureError separately. Why wasn't the generic handler sufficient?

A2 CANDIDATE

Expired tokens need a 401 with a refresh hint. Invalid signatures are a potential attack — that's a 403, log it, and maybe rate-limit the IP. Lumping them together means you can't distinguish between a user who needs to re-login and someone probing your auth.
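A2 describes splitting the catch-all handler by error type. Below is a minimal, framework-free sketch of that mapping. It uses stand-in error classes named after prompt #4 (`TokenExpiredError`, `InvalidSignatureError`), defined locally for illustration. In the widely used `jsonwebtoken` package, `TokenExpiredError` exists, but a bad signature surfaces as a `JsonWebTokenError` with the message `invalid signature`. The status codes follow A2. Note that RFC 9110 defines 401 for requests lacking valid credentials, so many APIs answer a forged token with 401 as well. `classifyAuthError` and `AuthFailure` are hypothetical names.

```typescript
// Stand-in error types named after prompt #4, defined here for illustration;
// a real app would catch whatever its JWT library throws.
class TokenExpiredError extends Error {
  expiredAt: Date;
  constructor(expiredAt: Date) {
    super("jwt expired");
    this.name = "TokenExpiredError";
    this.expiredAt = expiredAt;
  }
}

class InvalidSignatureError extends Error {
  constructor() {
    super("invalid signature");
    this.name = "InvalidSignatureError";
  }
}

interface AuthFailure {
  status: 401 | 403;
  body: { error: string; hint?: string };
  logAsSecurityEvent: boolean;
}

// Map each verification failure to its own response instead of one generic catch(err).
// Dispatching on `name` also works when Error subclasses are compiled to ES5,
// where `instanceof` checks against the subclass are unreliable.
function classifyAuthError(err: unknown): AuthFailure {
  const name = err instanceof Error ? err.name : "";
  switch (name) {
    case "TokenExpiredError":
      // A real user whose session lapsed: prompt the client to refresh.
      return { status: 401, body: { error: "token_expired", hint: "refresh" }, logAsSecurityEvent: false };
    case "InvalidSignatureError":
      // Possible forgery: refuse and flag for logging / rate limiting (A2's mapping).
      return { status: 403, body: { error: "invalid_signature" }, logAsSecurityEvent: true };
    default:
      // Anything unrecognized fails closed as a plain 401.
      return { status: 401, body: { error: "unauthorized" }, logAsSecurityEvent: false };
  }
}
```

In Express-style middleware, the `catch` block would call `classifyAuthError(err)`, send `status` and `body`, and write a log entry when `logAsSecurityEvent` is true. Per-IP rate limiting is left out here. The signing secret itself would come from an environment variable rather than a literal, as prompt #4 asks.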

Q3 SECURITY

You also moved the JWT secret from a hardcoded string to an environment variable in the same edit. Was that in response to the AI's code, or something you planned from the start?

A3 CANDIDATE

Session: 72 min · Prompts: 23 · Flags: 12

Powered by session data · not vibes

The technical screen
for the AI era.
