Screening Criteria Setup: Build an Explainable Scorecard
Turn a hiring profile into a weighted, explainable scorecard for AI CV screening. Concrete rubrics, calibration, fairness checks, and GDPR-ready governance.
Most teams have a hiring profile. Far fewer can turn that profile into a scorecard that an AI CV screening tool can apply reliably. The risk is simple: vague criteria amplify noise and bias, while clear, testable rules produce a shortlist you can defend to a hiring manager, a candidate, or a regulator. This guide shows you how to codify role success into an explainable, weighted scorecard and keep it accurate over time.
Step 1: Define observable role-specific success signals
Work backward from what the role must achieve in its first 6 to 12 months. Pick 4 to 8 signals that can be observed in a CV, application answers, work samples, or a portfolio. Every signal must be job-relevant, evidence-based, and scoreable the same way by different reviewers.
Make signals concrete and falsifiable. Examples:
- Demand generation marketer
  - Pipeline impact: sustained SQL growth tied to campaigns the candidate owned.
  - Content engine: ability to plan, produce, and distribute content that converts.
  - Marketing ops: hands-on CRM and marketing automation proficiency.
- Backend engineer
  - Production ownership: shipped services used at scale with on-call exposure.
  - Systems skills: debugging across services, queues, and data stores.
  - Code quality: evidence of reviews, performance work, and testing discipline.
- Customer success manager
  - Renewals and NRR: managed renewals with measurable churn reduction or expansion.
  - Book complexity: portfolio size, ARR responsibilities, enterprise personas.
  - Executive communication: QBRs, documented playbooks, or credible case studies.
Anchor each signal to evidence you can verify. For demand gen, strong proof might be growing SQLs from zero to a consistent monthly run rate, with program attribution. The LinkedIn Automation Case Study: 0 to 40 SQLs in 90 Days illustrates that kind of outcome and the level of impact you can validate through a portfolio or a reference.
Design for equity without lowering the bar. Pair signals that could privilege narrow backgrounds with inclusive alternatives. For example, if you value experience at a known SaaS brand, also accept evidence from open source, bootstrapped products, freelancing, or community leadership that delivered equivalent outcomes. Exclude proxies like native-language phrasing or continuous employment unless they are truly job-critical.
Step 2: Convert signals into a weighted, explainable scorecard
Use a 100-point scorecard so tradeoffs are explicit. Keep 3 to 5 must-haves, a few nice-to-haves, and very few objective disqualifiers tied to non-negotiable constraints.
- Must-haves (60 to 70 points total). Example for demand gen:
  - Sustained SQL growth owned by candidate: 25 points.
  - Built and ran a content engine end to end: 20 points.
  - CRM or automation fluency (HubSpot, Marketo, or equivalent): 15 points.
- Nice-to-haves (20 to 30 points total). Example: ICP overlap with your buyers (10), relevant certification or training (5), cross-functional leadership with sales or product (10).
- Disqualifiers (documented, minimal). Example: cannot operate within required core hours; no work authorization for the jurisdiction.
Write a short rubric for each rule with 0, partial, and full credit descriptions plus acceptable evidence. Example rubric for “Sustained SQL growth” (25 points):
- 0 points: Mentions lead generation without metrics or ownership.
- 12 points: Shows 2- to 3-month spikes, or shared ownership with partial attribution.
- 25 points: 6-plus months of consistent SQL growth attributed to programs led end to end, with volumes or conversion rates listed.
Repeat this clarity for engineering and CS roles. For a backend engineer, “Production ownership” could score 0 points for academic projects only, 10 points for minor features on a service with some on-call exposure, and 20 points for primary ownership of a service used by 50k+ daily users, with incident postmortems the candidate authored.
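The weights and rubric levels above can be expressed as data, which makes the per-rule attribution easy to inspect. The following is a minimal sketch, not Marxel's implementation: the names `Rule`, `RULES`, and `score_candidate` are hypothetical, and the partial-credit values for every rule other than "Sustained SQL growth" are placeholder assumptions, since the text only spells out that one rubric. Note that the example weights listed above sum to 85, so the sketch treats 85 as the maximum rather than guessing how the remaining 15 points are allocated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    kind: str     # "must_have" or "nice_to_have"
    levels: dict  # rubric level -> points awarded at that level

    @property
    def max_points(self) -> int:
        return max(self.levels.values())


# Full-credit weights come from the demand gen example in the text.
# Only sql_growth has a documented partial level (12 points); the other
# partial values below are illustrative assumptions.
RULES = [
    Rule("sql_growth", "must_have", {"none": 0, "partial": 12, "full": 25}),
    Rule("content_engine", "must_have", {"none": 0, "partial": 10, "full": 20}),
    Rule("crm_fluency", "must_have", {"none": 0, "partial": 8, "full": 15}),
    Rule("icp_overlap", "nice_to_have", {"none": 0, "full": 10}),
    Rule("certification", "nice_to_have", {"none": 0, "full": 5}),
    Rule("cross_functional", "nice_to_have", {"none": 0, "full": 10}),
]


def score_candidate(ratings, disqualifiers=()):
    """Return the total plus a per-rule attribution a reviewer can audit.

    ratings maps rule name -> rubric level; unrated rules count as "none".
    disqualifiers is a list of documented reasons, recorded but not scored.
    """
    attribution = {}
    for rule in RULES:
        level = ratings.get(rule.name, "none")
        if level not in rule.levels:
            raise ValueError(f"unknown rubric level {level!r} for {rule.name}")
        attribution[rule.name] = rule.levels[level]
    return {
        "total": sum(attribution.values()),
        "attribution": attribution,
        "disqualified_by": list(disqualifiers),
    }
```

Keeping disqualifiers separate from the point total is a deliberate choice in this sketch: the score stays comparable across candidates, and the documented reason for a decline is stored next to it rather than hidden inside a zero.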
In Marxel, open the Criteria Builder, create one rule per signal, set weights, and attach the rubric text and evidence examples. Map acceptable evidence types to what the system reads reliably, such as resume bullets with metrics, portfolio links, or public repos. The generated per-candidate scorecard highlights which rules fired, the evidence cited, and the points awarded so reviewers can check the same lines.
Decide thresholds before you run live requisitions. A simple pattern that scales:
- 70+ points: fast-track to phone screen within 48 hours.
- 50 to 69 points: manual review within 5 business days.
- Under 50 points: decline unless a reviewer flags a specific exception with rationale.
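The threshold pattern above maps directly onto a small routing function. This is a sketch under the stated cut-offs; the function name `route` and the returned labels are hypothetical, and sending an exception case to manual review is an assumption about how a flagged exception would be handled.

```python
def route(total, exception_rationale=None):
    """Map a scorecard total to the next step in the screening workflow."""
    if total >= 70:
        return "fast_track"      # phone screen within 48 hours
    if total >= 50:
        return "manual_review"   # review within 5 business days
    if exception_rationale:
        # A reviewer flagged a specific exception and wrote down why.
        return "manual_review"
    return "decline"
```

Requiring a non-empty rationale string for exceptions keeps the override auditable: a reviewer cannot rescue a sub-50 profile without recording a reason.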
Step 3: Pilot and calibrate on real resumes
Pilot before scaling. Calibration tightens rubrics, corrects weights, and reveals hidden proxies.
- Assemble a balanced batch. Use 40 to 80 historical or recent applications that include clear hires, near-misses, and clear declines. Redact or mask fields that reveal protected traits where feasible.
- Run the model and export attributions. Generate the explainable shortlist and per-criterion attributions. Save the initial rankings and scores.
- Double-blind human review. Have two trained reviewers rescore 20 top profiles and 20 around the threshold using only the rubric. Do not allow them to see each other’s ratings or the AI scores.
- Measure agreement and quality. Compute inter-rater agreement (target Cohen’s kappa ≥ 0.6). Compare precision and recall versus a gold label set, such as past hires who met ramp goals within 90 days.
- Diagnose and tune. Where disagreement clusters, tighten rubric language, add positive and negative examples, or shift weights in 5-point increments. Remove rules that fire on proxies like school names. Re-run and log deltas in pass rates and precision.
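The agreement and quality measurements in the pilot can be computed with a few lines of standard-library Python. The sketch below assumes each reviewer's ratings are stored as a list of labels in the same candidate order, and that the gold labels are booleans; the function names are illustrative, not part of any tool mentioned here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


def precision_recall(predicted, gold):
    """Precision and recall of shortlist decisions against gold labels (booleans)."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Kappa corrects raw agreement for agreement expected by chance, which is why it is a stricter target than percent agreement: two reviewers who both pass most candidates can agree often and still score a low kappa.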
In Marxel, the attribution view makes it obvious when a rule fires on the wrong evidence. Use it to prune rules that rarely contribute signal, merge overlapping rules, and confirm that top-ranked candidates match hiring manager expectations. Two or three calibration loops usually yield stable thresholds.
Step 4: Audit for fairness, GDPR, and UK context
Treat governance as a product requirement. Your CV screening workflow must be lawful, fair, and transparent, especially for UK and EU candidates.
- Fairness checks. When you have lawful, voluntary self-ID data, sample outcomes across relevant groups and look for disparate impact. As a heuristic, investigate if a group’s pass rate falls below 80% of the highest group’s pass rate. Remove or reweight rules that act as proxies for protected traits.
- Content hygiene. Do not score headshots, names, ages, or postal codes. Prefer outcomes and skills over brand-name proxies. Suppress free-text fields during initial screening if they routinely leak sensitive information.
- GDPR basics. Define purpose and legal basis (often legitimate interests, supported by a legitimate interests assessment, or consent if appropriate). Minimize data to what screening needs, set retention periods (for example, 6 to 12 months for unsuccessful applicants), and honor access, correction, and deletion requests within statutory timelines. Maintain Records of Processing Activities and a candidate-facing privacy notice that explains automated decision-making at a high level.
- UK specifics. Confirm data residency and transfer safeguards if data leaves the UK or EEA. Execute data processing agreements (DPAs) with processors, and include the UK International Data Transfer Agreement (IDTA) or EU Standard Contractual Clauses (SCCs) where required. Document human-in-the-loop steps and provide a clear contact path for objections or queries.
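The 80% heuristic from the fairness check, often called the four-fifths rule, can be sketched as below. The function name and the input shape (a mapping from group to passed and total counts) are assumptions for illustration. A flag is a prompt to investigate, not a legal finding, and ratios computed on small samples are noisy.

```python
def adverse_impact_flags(outcomes, threshold=0.8):
    """Flag groups whose pass rate is below `threshold` times the highest rate.

    outcomes maps group name -> (passed, total). Groups with no applicants
    are skipped because a pass rate cannot be computed for them.
    """
    rates = {
        group: passed / total
        for group, (passed, total) in outcomes.items()
        if total > 0
    }
    if not rates:
        return {}
    best = max(rates.values())
    if best == 0:
        # Nobody passed in any group, so there is no ratio to compare.
        return {group: False for group in rates}
    return {group: (rate / best) < threshold for group, rate in rates.items()}
```

For example, with pass rates of 40%, 30%, and 36%, the second group sits at 75% of the highest rate and would be flagged, while the third sits at 90% and would not.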
Explainability is your safety net. When a candidate asks why they did not advance, you should be able to point to the job-relevant rules and the evidence that determined the score. Marxel’s scorecards provide that attribution, along with an audit log of changes to rules and weights.
Step 5: Governance and updates that stick
A scorecard is living documentation. Treat it like an operating procedure with ownership, versioning, and feedback loops.
- Assign an owner and a reviewer. Make the hiring manager or talent leader the directly responsible individual (DRI) for changes. Require a people ops reviewer to check fairness and legal alignment before publishing.
- Version and log changes. Keep a change log with date, rationale, expected effect, and who approved it. Example entry: 2026-07-12, increased SQL growth weight from 20 to 25 after the pilot showed vanity-metric false positives.
- Recalibrate on a schedule. Quarterly, pull a 50-profile sample, re-score, and compare pass rates and onsite-to-offer conversion. Watch for applicant pool drift or shifts in product strategy that change what good looks like.
- Close the loop with outcomes. Link early signals to post-hire proxies such as ramp time, manager quality-of-hire scores at 90 days, and first-year retention. Retire or rewrite rules that do not predict downstream success.
- Communicate and train. Brief recruiters and interviewers whenever the scorecard changes. Share before-and-after examples so sourcing, messaging, and assessment stay aligned.
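One way to keep the change log honest is to make each weight change a structured record that is validated before it is applied. This is a sketch only: `ChangeLogEntry` and `apply_change` are hypothetical names, and the expected-effect and approver values in the usage example are placeholders rather than details from a real log.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ChangeLogEntry:
    changed_on: date
    rule: str
    old_weight: int
    new_weight: int
    rationale: str
    expected_effect: str
    approved_by: str


def apply_change(weights, entry):
    """Return a new weights dict with the change applied; the input is not mutated."""
    if weights.get(entry.rule) != entry.old_weight:
        # The log and the live scorecard disagree, so refuse to apply.
        raise ValueError(f"{entry.rule} is not currently weighted {entry.old_weight}")
    updated = dict(weights)
    updated[entry.rule] = entry.new_weight
    return updated


# Usage, mirroring the example entry in the text (placeholder effect and approver):
entry = ChangeLogEntry(
    changed_on=date(2026, 7, 12),
    rule="sql_growth",
    old_weight=20,
    new_weight=25,
    rationale="Pilot showed vanity-metric false positives",
    expected_effect="placeholder: fewer false positives in fast-track",
    approved_by="placeholder: people ops reviewer",
)
new_weights = apply_change({"sql_growth": 20, "content_engine": 20}, entry)
```

Checking the old weight before applying the change catches the case where someone edited the scorecard without logging it.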
Common pitfalls to avoid
- Overweighting pedigree and underweighting outcomes. Score shipped work, not brand names.
- Too many rules. More than 10 to 12 often adds noise. Merge or eliminate overlaps.
- Vague criteria like culture fit. If you cannot write a rubric and examples, it does not belong in automated screening.
- Hard disqualifiers that are preferences in disguise. Treat location, degrees, and niche tools as weighted signals unless legally or operationally required.
- No holdout test. Always pilot and measure agreement before trusting the scorecard at scale.
Key takeaways
- Define 4 to 8 observable success signals and tie each to verifiable evidence.
- Translate signals into a 100-point scorecard with clear rubrics, weights, and thresholds.
- Pilot with real resumes, measure kappa, and tune for precision and recall.
- Audit for fairness and design a GDPR-ready process, including DSAR handling and data minimization.
- Treat the scorecard as a living system with owners, versions, and outcome feedback.
With disciplined criteria and an explainable engine, teams move from gut feel to consistent hiring quality. Marxel’s Criteria Builder, attribution-rich scorecards, and audit logs give you structure, speed, and transparency while supporting DEI goals and rigorous review.