Which productivity KPIs fail Annex III scope
Annex III of the EU AI Act lists as high-risk the AI systems used in employment and worker management: systems that decide on promotion or termination, allocate tasks based on individual behaviour or traits, or monitor and evaluate worker performance. Productivity KPIs that ride on those AI systems inherit the same classification, and the four most common KPIs in 2026 dashboards all do. They fail Annex III not because the metric is wrong, but because the metric definition collapses too many decisions into a single AI-derived number, and that number then drives a material employment outcome without the documented human-oversight loop the Act requires.
The four common KPIs that fail most often:
- Single-number productivity score. A composite score, usually 0-100, that blends activity, attendance, and output into one ranked figure. Fails on three counts: the employee cannot inspect the inputs separately, the model behind the composite is rarely documented, and the score routinely drives compensation, promotion, or a performance improvement plan (PIP) without a documented human-loop sign-off. Vendors who lead with this score (ActivTrak, Hubstaff, Insightful, Time Doctor, Teramind) all carry this exposure in different forms.
- Behavioural inference from keystroke or mouse data. A score that infers engagement, focus, or productivity from typing rate, mouse movement, or window-switching patterns. Fails on proportionality (the behavioural signal is not the work output), on transparency (employees rarely understand what the inference is doing), and on Article 5 if the inference drifts into emotion or affect.
- Sentiment or emotion index. A score that runs sentiment analysis on internal chat, AI tone-scoring on customer calls, or affect inference on engagement-survey text. Article 5 directly prohibits AI systems that infer emotions at the workplace outside narrow medical or safety use. Most engagement-survey vendors build this into the platform without making clear where text analysis ends and emotion inference begins.
- Automated stack-rank or leaderboard scoring. A ranked list of employees by an AI-derived metric, used to drive recognition, allocation, or termination. This falls under Annex III as a high-risk evaluation system and can also act as a proxy for the social scoring that Article 5 prohibits. The risk is highest when the ranking is published internally; it persists even when the ranking is kept inside a manager dashboard.
Free: EU AI Act Vendor Readiness Scorecard (2026)
Fourteen questions a CISO and procurement team should score before signing any AI productivity vendor — Annex III coverage, conformity evidence, human oversight, transparency artefacts, and post-market monitoring posture.
The seven KPIs that pass scrutiny by design
Each of the seven below is defined on aggregate work output, has a clear data source the employee can verify, and produces a signal a manager can use without an automated decision attached. Where the metric feeds a material employment decision, the human-oversight loop is documented as part of the metric itself.
| KPI | Definition | Compliance rationale |
|---|---|---|
| 1. Throughput | Units of work completed per period (tickets, deals, releases, calls handled) | Aggregate output, not personal inference. Outside Annex III scope when reported at team or function level. Transparency by definition. |
| 2. Cycle time | Median elapsed time from task start to delivery, by work type | Process metric on work artefacts, not on the worker. No model inference required. Straightforward to tie to a defined purpose under DPDP Section 4. |
| 3. Queue depth | Unresolved work items by age band (0-2 days, 3-7, 8+) | System-state metric. Drives allocation decisions through the queue, not through the employee. Annex III safe. |
| 4. Focus density (team level) | Share of working hours in uninterrupted blocks of 25+ minutes, reported at team level | Aggregate reporting cuts personal inference. Annex III safe when never published per-employee and never tied to comp. |
| 5. On-time delivery rate | Share of work items closed by their committed date, by team or service line | Output-based, contractually anchored, transparent to employees. Outside Annex III scope when team-level. |
| 6. Deep-work share | Share of week spent on deep-work calendar blocks vs meeting / interrupt load | Self-reported via calendar conventions, not inferred. No covert capture. Transparent and disputable. |
| 7. Rework rate | Share of completed work that returns to the queue within 14 days | Quality signal on work output. Annex III safe when used to drive process review, not employee scoring. |
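The work-item metrics in the table (throughput, cycle time, queue depth, on-time delivery rate, rework rate) can all be computed from ticket states alone. The sketch below is a minimal illustration, not a reference implementation: the `WorkItem` record and its field names are hypothetical, the caller is assumed to pass in one team's items for one reporting period, and queue age is assumed to run from the item's start date.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class WorkItem:
    # Hypothetical ticket record. It deliberately carries no employee identifier.
    team: str
    started: datetime
    committed: datetime                  # committed delivery date
    closed: Optional[datetime] = None    # None while the item is still open
    reopened: Optional[datetime] = None  # set if the item returned to the queue


def team_kpis(items: list[WorkItem], now: datetime) -> dict:
    """Compute the work-item KPIs for one team and one period."""
    done = [i for i in items if i.closed is not None]
    open_items = [i for i in items if i.closed is None]

    # Cycle time: elapsed days from start to delivery, summarised by the median.
    cycle_days = [(i.closed - i.started).total_seconds() / 86400 for i in done]
    # On-time: closed on or before the committed date.
    on_time = [i for i in done if i.closed <= i.committed]
    # Rework: returned to the queue within 14 days of closing.
    rework = [
        i for i in done
        if i.reopened is not None and i.reopened - i.closed <= timedelta(days=14)
    ]

    # Queue depth: open items by age band (assumption: age counted from start).
    bands = {"0-2": 0, "3-7": 0, "8+": 0}
    for i in open_items:
        age = (now - i.started).days
        bands["0-2" if age <= 2 else "3-7" if age <= 7 else "8+"] += 1

    return {
        "throughput": len(done),
        "median_cycle_days": median(cycle_days) if cycle_days else None,
        "on_time_rate": len(on_time) / len(done) if done else None,
        "rework_rate": len(rework) / len(done) if done else None,
        "queue_depth": bands,
    }
```

Because the input has no per-person key, the output cannot be broken down by employee, which is the property the compliance rationale column relies on.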
The pattern across all seven: the metric is about the work, not about the worker; the data source is observable, not inferred; the use is allocation or process improvement, not automated employment decision. When a KPI sits on the work-side of that line, Annex III scope rarely attaches. When it crosses to the worker-side, scope attaches and the conformity stack arrives with it.
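Team-level focus density shows how a metric stays on the work side of that line. The sketch below aggregates uninterrupted blocks of 25+ minutes into one figure per team and never returns a per-person value. The `DaySummary` input shape and the `MIN_TEAM_SIZE` suppression threshold are assumptions added for illustration; the source only specifies the 25-minute block and team-level reporting.

```python
from dataclasses import dataclass
from typing import Optional

MIN_BLOCK_MINUTES = 25  # from the KPI definition
MIN_TEAM_SIZE = 5       # assumption: suppress small teams so no individual is inferable


@dataclass
class DaySummary:
    # Hypothetical per-day input; used only as raw material for the team aggregate.
    team: str
    working_minutes: int
    uninterrupted_blocks: list[int]  # block lengths in minutes


def team_focus_density(
    days: list[DaySummary], team_sizes: dict[str, int]
) -> dict[str, Optional[float]]:
    """Share of working minutes spent in blocks of 25+ minutes, per team."""
    focus: dict[str, int] = {}
    total: dict[str, int] = {}
    for d in days:
        qualifying = sum(b for b in d.uninterrupted_blocks if b >= MIN_BLOCK_MINUTES)
        focus[d.team] = focus.get(d.team, 0) + qualifying
        total[d.team] = total.get(d.team, 0) + d.working_minutes

    result: dict[str, Optional[float]] = {}
    for team, minutes in total.items():
        if team_sizes.get(team, 0) < MIN_TEAM_SIZE or minutes == 0:
            result[team] = None  # suppressed rather than reported
        else:
            result[team] = round(focus[team] / minutes, 3)
    return result
```

Returning `None` for small teams is a design choice rather than a requirement stated in the text: a two-person team average is close to a personal score.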
How to migrate from a failing KPI to a compliant one
The replacement is rarely one-for-one. A single-number productivity score collapses three or four signals into one figure; replacing it means accepting that the panel is wider but the readout is fairer. The mapping that works for most teams:
- Single-number productivity score → throughput + cycle time + on-time delivery rate. Three numbers, all output-based, all readable in a 30-second manager scan.
- Behavioural inference → focus density at team level + deep-work share. Picks up the engagement signal without the keystroke/mouse capture.
- Sentiment or emotion index → explicit workload survey + queue depth. Trades inferred affect for declared workload, which is both more accurate and Article 5 safe.
- Automated stack-rank → manager-reviewed quartile bands with a documented human-loop sign-off before any comp or assignment effect. Bands replace rank; the loop replaces the automation.
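The last mapping, bands plus a sign-off gate, can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed design: `quartile_band`, `BandProposal`, `SignOff`, and the `subject_ref` field are hypothetical names, and the cohort values stand in for whatever output metric the team has already documented. The point it shows is that the band cannot reach a compensation or assignment decision until a named reviewer has recorded a rationale.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import quantiles
from typing import Optional


def quartile_band(value: float, cohort: list[float]) -> str:
    """Place a value in a quartile band instead of an exact rank."""
    q1, q2, q3 = quantiles(cohort, n=4)  # needs at least two cohort values
    if value <= q1:
        return "Q1"
    if value <= q2:
        return "Q2"
    if value <= q3:
        return "Q3"
    return "Q4"


@dataclass
class SignOff:
    reviewer: str
    rationale: str
    signed_at: datetime


@dataclass
class BandProposal:
    subject_ref: str  # hypothetical opaque reference, not a score
    band: str
    signoff: Optional[SignOff] = None
    audit_log: list[str] = field(default_factory=list)

    def approve(self, reviewer: str, rationale: str) -> None:
        # The human-loop step: a named reviewer and a written reason are mandatory.
        if not reviewer.strip() or not rationale.strip():
            raise ValueError("reviewer and written rationale are required")
        self.signoff = SignOff(reviewer, rationale, datetime.now(timezone.utc))
        self.audit_log.append(f"approved by {reviewer}")

    def release_for_decision(self) -> str:
        # The gate: no sign-off, no downstream comp or assignment effect.
        if self.signoff is None:
            raise PermissionError("no human sign-off recorded")
        self.audit_log.append("released for decision")
        return self.band
```

Calling `release_for_decision()` before `approve()` raises `PermissionError`, so the automation cannot bypass the loop, and `audit_log` keeps the record a reviewer would ask for.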
The migration usually clears two sign-offs in the same quarter: the Data Protection Officer approves because proportionality is now defensible, and the People Ops head approves because the engagement risk of opaque scoring drops. The procurement evidence trail for the EU AI Act vendor review is in the 7-vendor scorecard, which carries the conformity-evidence questions a buyer should ask before signing.
The India layer — DPDP rationale per KPI
India teams running the same metric stack carry a parallel obligation under the Digital Personal Data Protection Act. Section 4 permits processing only for a lawful purpose, backed by consent or a recognised legitimate use; Section 8 places reasonable-security and breach-notification duties on the employer as Data Fiduciary. The seven KPIs above all clear Section 4 because the purpose is operational performance and the data is observable work output. The four failing KPIs all create Section 4 issues: purpose creep, retention without a defined window, or processing without a recorded consent route.
The DPDP Rules implementing the Act are expected to be notified in late 2025 or 2026, so the India sections of any KPI documentation should flex with the final Rules: name the Data Fiduciary, hedge timelines tied to specific Rule provisions, and reserve the right to update consent text post-notification. The 14 India-specific questions a CISO should score are in the DPDP Rules CISO worksheet.
What a reviewer wants to see
A conformity reviewer or an internal audit team will work down a short checklist per KPI. The KPIs that survive scrutiny are the ones that already carry the answers inside the metric definition.
- Definition. What the metric measures, in one sentence, in plain language.
- Data source. Where the inputs come from — system events, calendar entries, ticket states. Personal inference flagged as such.
- Computation. The formula or model that turns inputs into the output. Auditable, not opaque.
- Human-oversight loop. Which manager reviews which threshold, what action sits on the loop, what gets logged.
- Transparency artefact. What the employee can see about their own data, how they see it, and how they raise a question.
- Dispute and correction route. The named owner of the dispute pathway and the standard response timeline.
- Retention. The window the data is held against, the deletion or aggregation rule, and the proportionality justification.
- Proportionality. The argument for why this signal at this granularity is the minimum needed for the operational purpose.
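One way to carry those answers inside the metric definition is to store them as fields on the definition itself and check for gaps before a KPI ships. The sketch below is a hypothetical structure, not a format the Act or any reviewer prescribes; the `KpiDefinition` class and its field names simply mirror the eight checklist items above.

```python
from dataclasses import dataclass, fields


@dataclass
class KpiDefinition:
    # One field per checklist item; an empty string means "not yet documented".
    name: str
    definition: str = ""
    data_source: str = ""
    computation: str = ""
    human_oversight_loop: str = ""
    transparency_artefact: str = ""
    dispute_route: str = ""
    retention: str = ""
    proportionality: str = ""


def missing_evidence(kpi: KpiDefinition) -> list[str]:
    """Return the checklist items that are still blank, in checklist order."""
    return [
        f.name
        for f in fields(kpi)
        if f.name != "name" and not getattr(kpi, f.name).strip()
    ]
```

A KPI whose `missing_evidence` list is non-empty is not ready for review, whichever signal it measures.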
Frequently asked questions
Which productivity KPIs fall under EU AI Act Annex III?
Annex III names AI systems used to evaluate worker performance, allocate work, or decide promotion and termination as high-risk. In practice that means any productivity KPI that runs through an AI model, drives a material employment decision, or feeds an automated performance signal sits inside Annex III scope. The four KPIs that fail the test most often are single-number productivity scores, behavioural inference scores derived from keystroke or mouse activity, emotion or sentiment indices from chat or calls, and automated stack-rank or leaderboard scoring. Each of these triggers conformity, transparency, and human-oversight obligations from August 2, 2026.
Can a productivity KPI sit outside Annex III scope?
Yes, if three conditions hold. The KPI must be defined on aggregate or process data rather than personal inference. The KPI must not be the sole or primary driver of a material employment decision. And the KPI must be transparent to the employee with a documented dispute route. Throughput, focus density at team level, cycle time, queue depth, and similar process metrics meet all three when implemented without screenshots or keystroke capture.
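The three conditions can be run as a first-pass screen over a KPI inventory before anything goes to counsel. The sketch below is a triage aid under assumptions, not a scope determination: the `KpiUse` record and its field names are hypothetical, and the answers it consumes still have to come from someone who knows how the KPI is built and used.

```python
from dataclasses import dataclass


@dataclass
class KpiUse:
    # Hypothetical description of how a KPI is built and used.
    name: str
    uses_personal_inference: bool       # e.g. keystroke- or sentiment-derived scores
    primary_driver_of_decision: bool    # sole or primary input to pay, promotion, termination
    employee_can_see_and_dispute: bool  # transparency artefact and dispute route both exist


def failed_conditions(use: KpiUse) -> list[str]:
    """List which of the three conditions the KPI fails; empty means all three hold."""
    failures = []
    if use.uses_personal_inference:
        failures.append("defined on personal inference, not aggregate or process data")
    if use.primary_driver_of_decision:
        failures.append("sole or primary driver of a material employment decision")
    if not use.employee_can_see_and_dispute:
        failures.append("no employee transparency or documented dispute route")
    return failures
```

An empty list marks a candidate for the out-of-scope argument; a non-empty list names what to fix or document first.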
Does the EU AI Act ban productivity scoring outright?
No. The Act does not ban productivity measurement. Article 5 bans specific practices including social scoring and workplace emotion inference outside narrow safety or medical use. Annex III classifies broader employment-decision AI as high-risk, which means conformity assessment, technical documentation, human oversight, and post-market monitoring. The practical effect on most productivity scoring is not a ban — it is a duty to evidence proportionality, accuracy, and transparency, which the four common KPIs cannot currently meet.
How should a team replace a failing KPI without losing signal?
Replace single-number productivity scores with a multi-signal panel where each signal is defined on aggregate work output. Replace behavioural inference with team-level focus density and deep-work share. Replace sentiment indices with explicit employee-reported workload surveys. Replace automated leaderboards with manager-reviewed quartile bands that require a human-loop sign-off before any compensation or assignment effect. Most teams find the replacement panel reads better in dashboards and is easier to defend in an HR review.
What evidence will an EU AI Act reviewer want for productivity KPIs?
A reviewer will ask for the metric definition, the data sources, the model or formula used to compute it, the human-oversight loop, the employee transparency artefact, the dispute and correction route, the retention window, and the proportionality rationale. For Annex III high-risk KPIs the reviewer will also expect a conformity assessment, post-market monitoring records, and incident reporting. KPIs designed compliant by default carry this evidence stack inside the metric definition itself; KPIs retrofitted for compliance usually fail one or more of these checks.
Related reading on gStride
- AI workplace policy template 2026 — the seven pillars
- EU AI Act compliant productivity vendors — 7-vendor scorecard
- DPDP Rules — 14 questions India CISOs must score
- Employee productivity scoring — buyer verification framework
- The anti-surveillance productivity stack — pillar guide
- AI productivity intelligence platform — category pillar
See a productivity intelligence platform built on the seven KPIs
gStride ships the throughput, cycle time, queue depth, focus density, on-time, deep-work, and rework metrics as first-class signals, with the human-oversight loop documented and the surveillance defaults off.
