
AI software development partner: evaluation checklist for CTOs

A 17-question checklist to evaluate AI-assisted software engineering vendors. Use it before you sign with anyone — including us. We pass this checklist; many vendors don’t. The goal is not to choose Pernix. The goal is to choose well.

By Pernix Engineering Team · May 15, 2026

TL;DR

Treat it as a red flag if a vendor refuses to answer these questions in detail, or answers but cannot show concrete artifacts (sample specs, test reports, references). If a vendor passes, validate in a 14-day paid pilot before signing a long-term contract.

The 17 questions

Show me 3 references I can call directly.

Past 24 months. Not testimonial quotes — phone numbers.

What is your engineer retention rate?

Below 80% over 2 years is a warning. Below 70% is a red flag.

Can I interview the specific engineers proposed for my team?

"You get who we send" is staff augmentation at premium pricing.

Show me a sample specification you wrote for a similar project.

If they don't write specs, they don't have a process.

What is your code review and merge process?

It should include peer review, automated testing, branch protection, and security scanning.

How do your engineers use AI tools day-to-day?

A generic answer means they don't really use them. Look for specific tools, prompts, and review discipline.

What happens if AI-generated code introduces a bug?

"Our humans review everything" is the right answer. Anything else is concerning.

What's your minimum engagement size and notice period?

Lock-in over 6 months without an out clause is a red flag.

Will you do work for free to demonstrate fit?

We do. Many won't.

What's your timezone overlap with my team?

With fewer than 4 hours of overlap, collaboration becomes almost entirely asynchronous. Plan accordingly.

What security controls and compliance posture can you support?

Ask about access policies, AI usage restrictions, NDA process, code handling practices, and regulatory requirements. For regulated environments, SOC 2, ISO 27001, or equivalent controls may apply.

What's your IP transfer model?

You should own everything created for you. The vendor's own model code and templates are the usual exception.

How do you handle a project running late?

"We extend on your dime" is the wrong answer. The right answer involves renegotiating scope.

Can I see code shipped to production for a similar client?

With the client's permission, they should be able to show it.

What's your test coverage standard?

Don’t accept "we don’t have one." Pernix targets 60–80% coverage in actively developed areas.

What happens if your engineer leaves mid-engagement?

The answer should be that pair programming, documentation, and written specs let a replacement onboard in days.

Show me your last 3 client retrospectives.

Honest vendors have honest postmortems. Hidden retrospectives mean hidden problems.
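For the timezone question, you can check overlap before the first call. The sketch below is a minimal Python illustration, not part of any vendor's process. It assumes both teams work 9:00 to 17:00 local time and that you supply each team's UTC offset by hand for a specific date, so daylight saving time has to be reflected in the numbers you pass. The function names, the sample date, and the example cities are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

# Threshold from the checklist: below this, collaboration becomes mostly async.
MIN_OVERLAP_HOURS = 4


def working_window(day: date, start: time, end: time,
                   utc_offset_hours: float) -> tuple[datetime, datetime]:
    """Return one team's working hours on `day` as timezone-aware datetimes.

    The UTC offset is passed explicitly (hypothetical helper), so it must
    already account for daylight saving time on that date.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return (datetime.combine(day, start, tzinfo=tz),
            datetime.combine(day, end, tzinfo=tz))


def overlap_hours(a: tuple[datetime, datetime],
                  b: tuple[datetime, datetime]) -> float:
    """Hours during which both windows are open (0.0 if they never meet)."""
    start = max(a[0], b[0])  # aware datetimes compare by absolute (UTC) time
    end = min(a[1], b[1])
    return max(end - start, timedelta(0)).total_seconds() / 3600


day = date(2026, 5, 15)
client = working_window(day, time(9), time(17), -4)    # e.g. New York in May (EDT)
vendor_a = working_window(day, time(9), time(17), +1)  # e.g. Lisbon in May (WEST)
vendor_b = working_window(day, time(9), time(17), -6)  # e.g. Costa Rica (no DST)

for name, vendor in [("vendor_a", vendor_a), ("vendor_b", vendor_b)]:
    hours = overlap_hours(client, vendor)
    flag = "plan for async" if hours < MIN_OVERLAP_HOURS else "ok"
    print(f"{name}: {hours:.1f} h overlap ({flag})")
# vendor_a: 3.0 h overlap (plan for async)
# vendor_b: 6.0 h overlap (ok)
```

Hard-coded offsets keep the sketch dependency-free; for a real schedule, deriving offsets from the IANA time zone database (Python's `zoneinfo` module) avoids getting daylight saving transitions wrong.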

7 red flags

"AI writes 80% of our code."

No discipline. Run.

"We don't share rates / it depends."

Lack of transparency is a leading indicator of bigger issues.

"Just trust us, we have great reviews."

Reviews without verifiable specifics are marketing copy.

"Our engineers are anonymous to clients."

You can't evaluate what you can't see.

"We don't do test-first."

Then you're paying for technical debt.

"We can start tomorrow."

Real engineering pods don't have idle capacity. They have lead time.

"Our minimum contract is 12 months."

If they're confident, they'll accept a short pilot.

Why AI-generated code isn't enough

The market is flooded with "AI-powered" agencies. The reality: most of them just installed Copilot and added AI to their pitch deck. AI-generated code without spec discipline, code review, test coverage, and senior engineering review is net negative for your codebase. It produces volume that looks like progress while accumulating technical debt at unprecedented speed.

How to validate in 14 days

Don't sign a long-term contract before validating. Run a 14-day paid pilot with a written specification, a defined milestone, and an "if we don't deliver, you don't pay" clause. This is the Pernix model. Demand it from any vendor. In 14 days you will learn:

How they write specs.
How their engineers communicate.
How their code reads under review.
Whether their AI claims match reality.
Whether you want to work with these people for the next year.


Frequently asked questions

How do I know if an AI engineering vendor is actually using AI well?

Ask them to walk you through a specific workflow: what prompts they use, how they review AI output, where human judgment is required. Generic answers mean they have not operationalized it. Look for concrete artifacts — a sample spec, a code review log, test reports. A vendor that uses AI well can show you, not just describe it.

What's the minimum engagement I should consider before signing long-term?

Run a 14-day paid pilot with a written specification, a defined milestone, and an "if we do not deliver, you do not pay" clause. Avoid commitments over 6 months without an out clause. Legitimate vendors that are confident in their work will accept a short pilot before a longer contract.

Should I evaluate the specific engineers, or just the company?

Both. The company's practices and systems set the floor. The specific engineers on your team determine day-to-day quality. Any vendor that will not let you interview the proposed engineers is showing you their sales process, not their team.

What's the biggest red flag when evaluating an AI engineering partner?

Refusing to provide references you can call directly, from the past 24 months, for real conversations. Testimonial quotes on a website are marketing. Direct conversations with past clients reveal real working relationships, real outcomes, and whether the vendor delivered what they promised.
